> The original paper used several medical X-ray datasets that I no longer have access to, so I needed a new dataset with spatial annotations to test the expert attention mechanism. I picked the Ukiyo-eVG dataset: ~11K Japanese woodblock prints.
IMO it would be hard to reproduce the results with an autoresearch setup.
To get CLIP to work properly you typically need large batch sizes, since every other item in the batch acts as a negative for the contrastive loss. So the experiments in the original paper were quite heavy and ran in parallel across 8 GPUs.
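To see why batch size matters so much: the symmetric contrastive loss builds an N×N similarity matrix over the batch, so a bigger batch directly means more (and harder) negatives per example. A minimal numpy sketch of that loss (function names and the temperature value are mine, not from the paper or the CLIP codebase):

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of image/text embedding pairs."""
    # L2-normalize so the dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    # N x N logits: row i scores image i against every text in the batch,
    # so batch size N controls how many negatives each example sees
    logits = img @ txt.T / temperature
    labels = np.arange(logits.shape[0])  # matching pairs sit on the diagonal

    def cross_entropy(l):
        # numerically stable log-softmax over each row
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the image->text and text->image directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

rng = np.random.default_rng(0)
N, d = 8, 32
loss = clip_style_loss(rng.normal(size=(N, d)), rng.normal(size=(N, d)))
```

In the multi-GPU setting this is also why naive data parallelism isn't enough: the per-GPU loss only sees local negatives, so implementations typically gather embeddings across devices before computing the logits matrix.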
That's such a weird switch. There's plenty of free medical imaging data online. Example: https://www.cancerimagingarchive.net/