Image annotation is the process of labeling images so that computer vision models can learn to recognize objects, regions, text, or actions. It turns raw pixels into training data by attaching human or model-generated labels to pixels, shapes, or entire images. Teams use annotation to build datasets for object detection, instance and semantic segmentation, keypoint and landmark detection, OCR, classification, and visual search.
Annotation can be manual, model-assisted, or automated. In model-assisted workflows, a pre-trained model proposes labels that humans then correct, a pattern known as human-in-the-loop, which raises throughput while preserving accuracy. Active learning is often used to pick the most informative images for review so the model improves quickly. Outputs are saved in common formats such as COCO JSON, Pascal VOC XML, YOLO TXT, or task-specific masks. In medical imaging, annotations may be saved as 3D label maps aligned to NIfTI volumes or as DICOM overlays.
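To make the active-learning step concrete, here is a minimal sketch that ranks unlabeled images by prediction entropy and surfaces the most uncertain ones for human review. The file names and probability vectors are hypothetical; a real pipeline would pull per-class probabilities from the assisting model:

```python
import math

def entropy(probs):
    """Shannon entropy of a class-probability vector; higher = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_review(predictions, k):
    """Pick the k most uncertain images to route to human annotators.

    predictions: dict mapping image id -> list of class probabilities
    (a hypothetical structure; substitute your model's actual output).
    """
    ranked = sorted(predictions, key=lambda img: entropy(predictions[img]), reverse=True)
    return ranked[:k]

# Hypothetical model outputs for three images.
preds = {
    "img_001.jpg": [0.98, 0.01, 0.01],  # confident -> low review priority
    "img_002.jpg": [0.40, 0.35, 0.25],  # uncertain -> review first
    "img_003.jpg": [0.70, 0.20, 0.10],
}
print(select_for_review(preds, k=2))  # ['img_002.jpg', 'img_003.jpg']
```

Entropy is only one possible acquisition score; margin sampling or disagreement between ensemble members slot into the same selection loop.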
Quality assurance is crucial. Good projects start with a clear label ontology and detailed guidelines, then measure quality with inter-annotator agreement, gold-standard checks, sampling-based review, and feedback loops. Attributes like occlusion, truncation, and confidence scores are often captured to make datasets more useful for training and evaluation.
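For box annotations, one common agreement check is intersection-over-union (IoU) between two annotators' boxes on the same object; pairs below a chosen threshold get escalated for adjudication. A minimal sketch (the [x_min, y_min, x_max, y_max] box format and the 0.9 threshold are assumptions for illustration):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x_min, y_min, x_max, y_max]."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two annotators label the same object; IoU >= 0.9 might count as agreement.
print(round(iou([100, 120, 300, 400], [105, 118, 310, 395]), 3))
```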
Common annotation types used in machine learning:
- Bounding boxes: rectangles around objects, the standard input for object detection.
- Polygons and masks: precise outlines or per-pixel labels for instance and semantic segmentation.
- Keypoints and landmarks: individual points marking joints, corners, or facial features.
- Polylines: open curves for lanes, wires, and other elongated structures.
- Classification tags: image-level labels with no localization.
- Text transcriptions: the characters or words in an image, used for OCR.
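To make those distinctions concrete, the sketch below shows one way a labeling tool might represent the main geometric types internally. The class names and fields are hypothetical, not any particular tool's schema:

```python
from dataclasses import dataclass, field

@dataclass
class BoundingBox:
    label: str
    x: float        # top-left corner, in pixels
    y: float
    width: float
    height: float

@dataclass
class Polygon:
    label: str
    # Closed outline as (x, y) vertices; rasterize it to get a segmentation mask.
    points: list[tuple[float, float]] = field(default_factory=list)

@dataclass
class Keypoint:
    label: str
    x: float
    y: float
    visible: bool = True  # landmarks can be annotated even when occluded

@dataclass
class ImageTag:
    label: str  # image-level classification: no geometry at all
```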
Example
Scenario: You’re building a product-detection model for a fashion e-commerce catalog. Each product photo shows a model wearing multiple items.
What annotators do: draw a tight bounding box around each visible item (e.g., dress, handbag, shoes), assign the matching class label from the project ontology, and flag attributes such as occlusion when one item partially covers another.
What the annotation looks like (conceptually):
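Below is a minimal sketch of one box for this scenario, expressed both as a COCO-style annotation and as the equivalent YOLO TXT line. The image size, ids, and category mapping are made up for illustration, and the "attributes" field is a custom extension rather than part of the core COCO schema:

```python
import json

IMG_W, IMG_H = 640, 960          # hypothetical photo dimensions
x, y, w, h = 180, 420, 240, 380  # box around the skirt: top-left x/y, width, height

# COCO stores absolute pixel coordinates as [x, y, width, height].
coco_annotation = {
    "id": 1,
    "image_id": 101,
    "category_id": 3,              # e.g. 3 = "skirt" in a hypothetical ontology
    "bbox": [x, y, w, h],
    "area": w * h,
    "iscrowd": 0,
    "attributes": {"occlusion": "partial"},  # custom field, not core COCO
}
print(json.dumps(coco_annotation, indent=2))

# YOLO TXT stores one line per object: class id, then the box center and size,
# all normalized to [0, 1] by the image width and height.
cx, cy = (x + w / 2) / IMG_W, (y + h / 2) / IMG_H
print(f"2 {cx:.6f} {cy:.6f} {w / IMG_W:.6f} {h / IMG_H:.6f}")  # class id 2, 0-indexed
```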
Why this helps: per-item boxes and labels teach the detector to localize and classify each product even when garments overlap on the same model. Occlusion attributes let you evaluate the hard cases separately, and because every label follows a consistent ontology, the same dataset can later support catalog tagging and visual search.