YOLO gives a machine a single glance. One forward pass turns pixels into boxes, labels, and confidence scores.

YOLO, short for You Only Look Once, is a family of single-stage, real-time object detectors. It frames detection as one regression problem: a single neural network processes the whole image in one forward pass, divides it into a grid, and predicts bounding boxes and class probabilities at the same time. That unified design is what makes it fast enough for video and live cameras. It contrasts with two-stage detectors of the R-CNN family, which first propose regions and then classify them in a slower multi-step pipeline. The Ultralytics package is the standard toolkit for training, running, and exporting modern YOLO models, and it unifies detection, segmentation, pose estimation, oriented boxes, classification, and tracking under one API.
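To make the grid idea concrete, here is a minimal sketch that assumes the original YOLOv1 settings (a 7×7 grid, 2 boxes per cell, 20 classes). The function names are illustrative only, and later versions predict on several grids at different strides rather than on a single one.

```python
def responsible_cell(cx, cy, img_w, img_h, grid=7):
    """Return (row, col) of the grid cell that contains a box centre given in pixels."""
    col = min(int(cx / img_w * grid), grid - 1)
    row = min(int(cy / img_h * grid), grid - 1)
    return row, col


def output_shape(grid=7, boxes_per_cell=2, num_classes=20):
    """YOLOv1-style output: each cell predicts B boxes (x, y, w, h, confidence) plus C class scores."""
    return (grid, grid, boxes_per_cell * 5 + num_classes)


# An object centred at (300, 100) in a 448x448 image
row, col = responsible_cell(cx=300, cy=100, img_w=448, img_h=448)
print("responsible cell:", row, col)
print("output tensor shape:", output_shape())
```

Everything the network predicts comes out of one tensor of this kind, which is why a single forward pass is enough.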

Licensing matters here. Ultralytics YOLO is released under AGPL-3.0, a strong copyleft licence. Read the licensing section before building anything commercial. A paid Enterprise licence exists for closed-source products.

Where YOLO sits

You work through the Ultralytics Python API or CLI. The model runs on PyTorch, and once trained it exports to many runtimes for server or edge deployment.

  • Application: video analytics, robotics, inspection (your detection or tracking service)
  • Interface: Ultralytics Python API and yolo CLI (train, predict, val, track, export)
  • Model: backbone, neck (FPN + PAN), detection head (feature extraction, fusion, prediction)
  • Runtime and export: PyTorch, ONNX, TensorRT, CoreML / LiteRT (server GPU, Jetson, mobile, edge TPU)

How single-shot detection works

A YOLO network has three parts. The backbone (a CSP-style convolutional network) extracts features. The neck fuses features across scales using a Feature Pyramid Network plus a Path Aggregation Network, so the model sees both fine detail and broad context. The head produces the final predictions. Older versions used anchor boxes, predefined box shapes that predictions adjust; since YOLOv8, Ultralytics models are anchor-free and predict box coordinates directly with a decoupled head. In anchor-based versions such as YOLOv5, the training loss combines a box regression term (Complete IoU), an objectness term, and a classification term. YOLOv8 and YOLO11 drop the separate objectness term and add a distribution focal loss alongside the Complete IoU term.
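The Complete IoU term can be sketched in plain Python. This follows the published CIoU definition (IoU minus a centre-distance penalty and an aspect-ratio penalty). It is an illustration for a single pair of boxes, not the batched tensor code Ultralytics uses, and the function name `ciou` is an assumption.

```python
import math


def ciou(box_a, box_b, eps=1e-9):
    """Complete IoU between two boxes in [x1, y1, x2, y2] format."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU: intersection area over union area
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1
    iou = inter / (wa * ha + wb * hb - inter + eps)

    # Centre-distance penalty, normalised by the squared diagonal of the enclosing box
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4

    # Aspect-ratio consistency penalty
    v = (4 / math.pi ** 2) * (math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return iou - rho2 / c2 - alpha * v


pred = [10, 10, 50, 50]
target = [12, 14, 48, 56]
box_loss = 1.0 - ciou(pred, target)  # box regression loss for this pair
print(round(box_loss, 4))
```

Unlike plain IoU, the value keeps changing when the boxes do not overlap at all, so the loss still gives a useful training signal for badly placed predictions.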

A classic post-processing step, non-maximum suppression (NMS), removes duplicate overlapping boxes for the same object. NMS adds latency and complicates export. YOLOv10 (from Tsinghua University in 2024) pioneered NMS-free detection in the YOLO family using dual label assignment, and current Ultralytics flagships make end-to-end NMS-free inference the default.
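For reference, greedy NMS fits in a few lines. The sketch below is a plain-Python illustration for a single class with a hypothetical `iou_thres` parameter; production code normally runs a vectorised, per-class version on the GPU.

```python
def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thres=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box that overlaps the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thres]
    return keep


boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the second box duplicates the first
```

The loop is sequential and data-dependent, which is part of why it is awkward to fold into an exported graph and why NMS-free heads are attractive for deployment.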

The version lineage

YOLO is not one model but a lineage, and different versions come from different authors under different licences.

| Version | Year | Author | Note |
|---|---|---|---|
| v1 to v3 | 2016-2018 | Joseph Redmon | Original single-stage detector |
| v4 | 2020 | Alexey Bochkovskiy | CSPDarknet, Darknet-native |
| v5 | 2020 | Ultralytics | First PyTorch rewrite |
| v8 | 2023 | Ultralytics | Anchor-free, multi-task |
| v10 | 2024 | Tsinghua University | NMS-free end-to-end |
| YOLO11 | 2024 | Ultralytics | Widely deployed stable release |
| v12 / v13 | 2025 | Academic groups | Attention and hypergraph research lines |
| YOLO26 | 2025-2026 | Ultralytics | Edge-optimised current flagship |

As of 2026, YOLO11 is the most widely deployed stable Ultralytics production model, and YOLO26 is the newer flagship tuned for edge and CPU inference. YOLOv12 and YOLOv13 are separate academic research lineages that run through the Ultralytics package but are not Ultralytics releases. For a transformer-based real-time alternative, see RT-DETR.

Installing Ultralytics

bash
pip install -U ultralytics

The package pulls in PyTorch automatically. For GPU acceleration, install a CUDA-enabled PyTorch build that matches your CUDA version first. Python 3.8 or higher is required.
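A quick way to confirm which device PyTorch can actually use after installation is sketched below. `pick_device` is a hypothetical helper, not part of Ultralytics; it returns a value in the form the `device` argument accepts (`0` for the first CUDA GPU, `"cpu"` otherwise).

```python
def pick_device():
    """Return 0 (first CUDA GPU) if PyTorch can see one, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    return 0 if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Device for training and inference:", device)
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build is most likely a CPU-only wheel or does not match the local CUDA version.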

Running inference on a pretrained model

python
from ultralytics import YOLO

# Load pretrained weights (n = nano; s, m, l, x are larger)
model = YOLO("yolo11n.pt")

# Predict on an image, a video, a folder, a URL, or a webcam index
results = model.predict(source="street.jpg", conf=0.25, save=True)

for r in results:
    for box in r.boxes:
        cls_id = int(box.cls[0])
        conf = float(box.conf[0])
        xyxy = box.xyxy[0].tolist()   # [x1, y1, x2, y2]
        print(model.names[cls_id], round(conf, 3), xyxy)

The same task from the command line:

bash
yolo detect predict model=yolo11n.pt source=street.jpg conf=0.25 save=True
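
A common next step is to turn per-box output into something an application can act on. The sketch below assumes the detections have already been copied out of `r.boxes` into plain `(label, confidence, xyxy)` tuples, matching what the Python loop above prints. `summarise` and `min_conf` are hypothetical names, not Ultralytics API.

```python
from collections import Counter


def summarise(detections, min_conf=0.5):
    """Count detections per class label, ignoring those below min_conf."""
    return Counter(label for label, conf, _ in detections if conf >= min_conf)


# Example detections as (label, confidence, [x1, y1, x2, y2])
detections = [
    ("person", 0.91, [34, 50, 120, 300]),
    ("person", 0.42, [200, 60, 260, 280]),
    ("car", 0.88, [300, 150, 520, 290]),
]
counts = summarise(detections)
print(dict(counts))
```

Filtering again at the application level lets you run the model with a low `conf` to keep recall high and apply a stricter cut-off only where it matters.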

Training on a custom dataset

Point a small data.yaml at your labelled images, then fine-tune from pretrained weights (transfer learning), which needs far less data than training from scratch.

yaml
# data.yaml
path: ./datasets/hardhats
train: images/train
val: images/val
names:
  0: person
  1: helmet
  2: no-helmet

python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")          # start from pretrained weights
model.train(data="data.yaml", epochs=100, imgsz=640, batch=16, device=0)
model.val()                          # evaluate on the val split
model.export(format="onnx")          # export for deployment
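
The `data.yaml` above only locates the images. The annotations themselves are expected in the Ultralytics detection label format: one text file per image, with one line per object in the form `class x_center y_center width height`, all normalised to the 0 to 1 range. The converter below is a sketch with a hypothetical function name, for the case where your annotation tool gives pixel corner coordinates.

```python
def to_yolo_label(cls_id, xyxy, img_w, img_h):
    """Convert a pixel [x1, y1, x2, y2] box to a normalised 'class cx cy w h' label line."""
    x1, y1, x2, y2 = xyxy
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A helmet (class 1 in the data.yaml above) in a 640x480 image
line = to_yolo_label(1, [160, 120, 320, 360], img_w=640, img_h=480)
print(line)  # 1 0.375000 0.500000 0.250000 0.500000
```

Most labelling tools can export this format directly, so a converter like this is usually only needed when migrating an existing dataset.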

From data to deployed detector

Step 1: Label. Annotate images with boxes and write a data.yaml.
Step 2: Train. Fine-tune from pretrained weights on your classes.
Step 3: Export. Convert to ONNX, TensorRT, CoreML, or LiteRT.
Step 4: Deploy. Run on a server GPU, a Jetson, or a mobile device.

How it compares

| | YOLO (Ultralytics) | RT-DETR | Faster R-CNN | SAM |
|---|---|---|---|---|
| Type | Single-stage CNN | Transformer detector | Two-stage CNN | Segmentation foundation model |
| Speed | Very fast, real-time | Real-time on GPU | Not real-time | Not real-time |
| NMS | NMS-free in recent versions | NMS-free | Needs NMS | Not applicable |
| Licence | AGPL-3.0 (Enterprise paid) | Apache 2.0 | Permissive | Apache 2.0 |
| Best for | Real-time detection, edge | High-accuracy real-time on GPU | Max-accuracy offline baselines | Promptable segmentation masks |

Licensing: read this before you ship

Ultralytics YOLO (v5, v8, YOLO11, YOLO26) is AGPL-3.0. This is the load-bearing fact for commercial use:

  • If you use Ultralytics code, architectures, or trained or fine-tuned weights in a product, AGPL requires you to release the complete source of your entire derivative work under AGPL-3.0.
  • The AGPL network clause means this triggers even for a SaaS or internal network service, where users interact over the network but never receive your binary. Ordinary GPL does not cover that case; AGPL does.
  • Fine-tuned weights and a private deployment do not escape the obligation.
  • For a closed-source commercial product, buy the Ultralytics Enterprise licence, which removes the open-source requirement.

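If you cannot open-source your application and will not buy the Enterprise licence, choose a permissively licensed detector instead, such as the original RT-DETR implementation (Apache 2.0). The RT-DETR port bundled with the Ultralytics package is distributed under that package's AGPL-3.0 terms, so it does not avoid the obligation.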

When not to use YOLO

  • A closed-source commercial product with no licence. See the licensing section above. This is the most common and most expensive mistake.
  • Tiny objects or dense, overlapping crowds. Grid-based detection struggles here; specialised small-object or crowd-counting models often do better.
  • Maximum accuracy on cluttered scenes. Transformer detectors like RT-DETR can edge out YOLO on some benchmarks, with permissive licences.
  • Open-vocabulary or zero-shot detection. Standard YOLO detects only its trained classes. To detect anything from a text prompt, use YOLO-World, Grounding DINO, or GLIP.
  • Pixel-perfect masks of arbitrary objects. For promptable, high-fidelity segmentation, use a foundation model like SAM rather than YOLO’s segmentation head.
