SAM Series (for images and videos)

SAM (Segment Anything Model) is a large family of models published by Meta AI that aims to segment components out of content. It started with image segmentation and gradually expanded to video, 3D geometry, and audio.

In this post, we focus on the image and video segmentation models. Segmentation is a classic computer vision task that has been studied extensively for decades, but the SAM series revolutionized the field in the following ways:

  1. From expert model to foundation model: Traditional segmentation models are usually trained for a specific task or dataset. SAM instead introduced a "Promptable Segmentation" task, in which a prompt such as a point or a box indicates what to segment, and it can segment new images zero-shot without any fine-tuning.
  2. Unprecedented dataset scale and strong generalization ability: Previous models were usually trained on roughly 100k images and 1M masks. SAM v1 was trained on 11M images and 1.1B masks, exceeding the scale of previous datasets by more than 100x.
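To make the scale comparison in the second point concrete, here is a minimal sketch that works out the ratios from the approximate figures quoted above. The constant and function names are illustrative only, and the "prior" numbers are the rough orders of magnitude from the text, not measurements of any specific dataset.

```python
# Approximate figures as quoted in the text (orders of magnitude, not exact counts).
PRIOR_IMAGES = 100_000         # "~100k images" for typical earlier datasets
PRIOR_MASKS = 1_000_000        # "~1M masks" for typical earlier datasets
SAM_IMAGES = 11_000_000        # "11M images" for SAM v1
SAM_MASKS = 1_100_000_000      # "1.1B masks" for SAM v1


def scale_ratios():
    """Return how much larger the SAM v1 training data is, per the quoted figures."""
    return {
        "image_ratio": SAM_IMAGES / PRIOR_IMAGES,
        "mask_ratio": SAM_MASKS / PRIOR_MASKS,
        "masks_per_image_prior": PRIOR_MASKS / PRIOR_IMAGES,
        "masks_per_image_sam": SAM_MASKS / SAM_IMAGES,
    }


if __name__ == "__main__":
    for name, value in scale_ratios().items():
        print(f"{name}: {value:g}")
```

Under these assumed figures, the image count grows by about two orders of magnitude and the mask count by about three, so the annotation density (masks per image) also rises by roughly 10x.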