Small, smart, and synthetic: distilling data for pre-trained vision models

Large vision models are now trained once and reused by fitting simple "linear probes" on their frozen features. This paper asks: can a tiny set of synthetic images replace massive real datasets for training those probes?

Enter Linear Gradient Matching: it learns a handful of synthetic images such that, when passed through a frozen feature extractor (e.g., DINO, CLIP), they induce nearly the same gradients in the linear classifier as real data does.
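
For intuition, here is a minimal PyTorch-style sketch of that objective. It assumes a frozen feature extractor, a cross-entropy linear probe, and a cosine gradient-matching loss; the names (backbone, syn_images, probe_w) and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the gradient-matching idea, assuming a frozen backbone
# and a cross-entropy linear probe. All names and choices here (backbone,
# syn_images, the cosine matching loss) are illustrative, not the paper's code.
import torch
import torch.nn.functional as F

def probe_gradient(features, labels, weight):
    """Gradient of the linear probe's cross-entropy loss w.r.t. its weight."""
    logits = features @ weight.t()
    loss = F.cross_entropy(logits, labels)
    # create_graph=True so we can later backprop through this gradient
    return torch.autograd.grad(loss, weight, create_graph=True)[0]

feat_dim, num_classes = 384, 10
backbone = lambda x: x.flatten(1)[:, :feat_dim]  # stand-in for a frozen DINO/CLIP encoder

real_images = torch.randn(64, 3, 32, 32)          # placeholder "real" batch
real_labels = torch.randint(0, num_classes, (64,))
syn_images = torch.randn(num_classes, 3, 32, 32, requires_grad=True)  # learnable synthetic set
syn_labels = torch.arange(num_classes)            # one distilled image per class

probe_w = torch.randn(num_classes, feat_dim, requires_grad=True)
opt = torch.optim.Adam([syn_images], lr=1e-2)     # only the synthetic pixels are optimized

for step in range(200):
    g_real = probe_gradient(backbone(real_images), real_labels, probe_w)
    g_syn = probe_gradient(backbone(syn_images), syn_labels, probe_w)

    # Distillation objective: synthetic data should induce the same probe
    # gradients as real data (cosine distance is one common matching loss).
    match_loss = 1 - F.cosine_similarity(g_real.flatten(), g_syn.flatten(), dim=0)

    opt.zero_grad()
    match_loss.backward()
    opt.step()
```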

  • Outperforms real-image baselines for linear probing in the authors' tests.
  • Generalizes across models: a set distilled with a DINO backbone can train a competitive CLIP probe.
  • Excels on fine-grained categories.
  • Doubles as an interpretability tool—revealing similarity between models’ embedding spaces and flagging spurious correlations on adversarial datasets.

Why it matters: faster prototyping, lower storage and compute, and safer data sharing—without starting from scratch.

Paper: https://arxiv.org/abs/2511.16674v1

Register: https://www.AiFeta.com

#AI #ComputerVision #DatasetDistillation #SelfSupervisedLearning #ML #CLIP #DINO #Interpretability
