CLIP typically needs large batch sizes to work well, so the experiments in the original paper were computationally heavy and ran in parallel across 8 GPUs.
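One way to see why batch size matters: a CLIP-style contrastive objective treats every other pair in the batch as a negative, so a batch of N pairs gives each example only N - 1 negatives to contrast against. The sketch below is a minimal NumPy illustration of a symmetric contrastive loss, not the implementation from the paper. The function names and the `temperature=0.07` default are assumptions chosen for illustration.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched (image, text) pairs.

    image_emb, text_emb: arrays of shape (N, D); row i of each is a matched pair.
    temperature: illustrative default (an assumption, not taken from the text).
    """
    # L2-normalize so the dot product is cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix: the diagonal holds the positives,
    # every off-diagonal entry is an in-batch negative.
    logits = image_emb @ text_emb.T / temperature
    idx = np.arange(logits.shape[0])

    # Cross-entropy in both directions: image -> text and text -> image.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)

def negatives_per_example(batch_size):
    # Each positive pair is contrasted against the other pairs in the batch.
    return batch_size - 1
```

With a batch of 32 each example sees 31 negatives, while a batch of 4096 gives 4095. The N x N similarity matrix also needs embeddings from the whole batch at once, which is one reason large-batch contrastive training is usually spread across several GPUs.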