CLIP

google/siglip-base-patch16-256-multilingual · Hugging Face
a year ago
SigLIP is CLIP, a multimodal image-text model, with a better loss function: a pairwise sigmoid loss replaces CLIP's softmax-based contrastive loss. The sigmoid loss operates solely on individual image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows the batch size to be scaled up further, while also performing better at smaller batch sizes.
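To make the "no global normalization" point concrete, here is a minimal pure-Python sketch of a pairwise sigmoid loss. It is not the model's actual training code: the function names (`l2_normalize`, `log_sigmoid`, `sigmoid_pairwise_loss`) are hypothetical, and the default `temperature` and `bias` values are illustrative assumptions. Each image-text pair is scored as an independent binary decision (matching or not), so no softmax over the whole batch is needed.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_pairwise_loss(image_embs, text_embs, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss over a batch of image and text embeddings.

    Assumption: image_embs[i] and text_embs[i] form the matching pair.
    temperature and bias are illustrative defaults, not values taken
    from the model card.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    total = 0.0
    for i, img in enumerate(imgs):
        for j, txt in enumerate(txts):
            similarity = sum(a * b for a, b in zip(img, txt))
            logit = temperature * similarity + bias
            # +1 for the matching pair, -1 for every other pairing.
            label = 1.0 if i == j else -1.0
            # Each pair contributes on its own; no batch-wide softmax.
            total += log_sigmoid(label * logit)
    return -total / len(imgs)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print("aligned:", sigmoid_pairwise_loss(images, aligned_texts))
    print("swapped:", sigmoid_pairwise_loss(images, swapped_texts))
```

In this toy batch the aligned pairing yields a much lower loss than the swapped one. Because every term depends on a single pair only, the sum can be computed in chunks, which is what makes larger batch sizes practical.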
Tags: AI, Image Classification, CLIP