YOLO methods

Utilities
Bounding-box inference
NVDEC GPU-accelerated YOLO inference
Pose-estimation inference
YOLO pose-estimation segmentation visualizer
YOLO pose-estimation segmentation inference
Pose-estimation track inference
Pose-estimation track plotting
Pose-estimation plotting
Bounding box plotting
YOLO annotation visualizer
COCO key-points -> YOLO pose-estimation format conversion
COCO key-points -> YOLO bounding box conversion
COCO key-points -> YOLO segmentation conversion
SAM3 -> YOLO segmentation project
SAM3 -> YOLO bounding-box (detection) project
Merge multiple YOLO projects
Multi-animal DeepLabCut predictions -> YOLO pose-estimation annotations format conversion
DeepLabCut predictions -> YOLO pose-estimation annotations
Labelme annotations -> YOLO bounding box annotations
Labelme points -> YOLO keypoints annotations
Labelme points -> YOLO segmentation annotations
SimBA ROIs -> YOLO bounding box annotations
SimBA pose-estimation -> YOLO pose-estimation annotations
SimBA pose-estimation -> YOLO segmentation annotations
SLEAP CSV predictions -> YOLO pose-estimation annotations
SLEAP H5 predictions -> YOLO pose-estimation annotations
SLEAP annotations -> YOLO pose-estimation annotations
LightningPose keypoints -> YOLO bounding box conversion
LightningPose keypoints -> YOLO pose-estimation annotations

Methods for training YOLO models, creating training and validation datasets, and converting datasets specific to behavioral neuroscience into YOLO datasets.

Utilities 

simba.utils.yolo.apply_fixed_bbox_size(data, video_name, img_w, img_h, bbox_size)

Apply a fixed axis-aligned bounding-box size to detected rows in a results table.

The current box center is preserved, then the box is resized to bbox_size (h, w). If the resized box would exceed frame boundaries, the box is shifted so it remains fully inside the image while preserving the requested size.

The function expects YOLO corner columns X1..Y4 and updates them in place on the input dataframe before returning it.

Figure: Each detection box is resized to a fixed size about its own center; a box that would leave the frame is shifted back fully inside while keeping the fixed size.

Parameters

data (pd.DataFrame) – Detection dataframe containing CONFIDENCE and corner coordinate columns X1, Y1, X2, Y2, X3, Y3, X4, Y4.
video_name (str) – Video identifier used in error messages.
img_w (int) – Image width in pixels.
img_h (int) – Image height in pixels.
bbox_size (Tuple[int, int]) – Target fixed bounding-box size as (height, width) in pixels.

Returns

Input dataframe with updated fixed-size bbox coordinates for detected rows.

Return type

pd.DataFrame

Raises

InvalidInputError – If required columns are missing or if bbox_size is larger than image dimensions.
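The center-preserving resize and clamp described above can be sketched in NumPy for a single box. This is an illustrative sketch, not SimBA's implementation: the helper name `fixed_size_box` is hypothetical, and it assumes the four corners (X1..Y4) are ordered top-left, top-right, bottom-right, bottom-left.

```python
import numpy as np


def fixed_size_box(corners: np.ndarray, img_w: int, img_h: int, bbox_size: tuple) -> np.ndarray:
    """Resize one axis-aligned box to bbox_size=(height, width) about its center.

    Hypothetical helper. `corners` is a (4, 2) array of (x, y) points. The output
    corners are ordered top-left, top-right, bottom-right, bottom-left (an
    assumption about the X1..Y4 column order).
    """
    box_h, box_w = bbox_size
    if box_h > img_h or box_w > img_w:
        raise ValueError("bbox_size is larger than the image dimensions")
    # Center of the current box, taken from its extent.
    cx = (corners[:, 0].min() + corners[:, 0].max()) / 2
    cy = (corners[:, 1].min() + corners[:, 1].max()) / 2
    # Top-left corner of the resized box before any shifting.
    x_min, y_min = cx - box_w / 2, cy - box_h / 2
    # Shift the box back inside the frame while keeping its size fixed.
    x_min = min(max(x_min, 0), img_w - box_w)
    y_min = min(max(y_min, 0), img_h - box_h)
    x_max, y_max = x_min + box_w, y_min + box_h
    return np.array([[x_min, y_min], [x_max, y_min], [x_max, y_max], [x_min, y_max]])
```

Clamping the top-left corner to the range [0, img_size - box_size] is what keeps the requested size intact: the box slides rather than shrinks when it touches a frame edge.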

simba.utils.yolo.create_yolo_sample_visualizations(samples, save_dir, names=None, palette='Set1', seg_opacity=0.5, draw_labels=True, verbose=True, source='')

Create annotated visualizations from YOLO-format (sample_name, image, label_str) samples.

Auto-detects annotation type (bounding-box or segmentation) from the label string format and draws the appropriate overlays. Images are saved as PNG files in save_dir.

Parameters

samples (List[Tuple[str, np.ndarray, str]]) – List of (sample_name, image, label_str) tuples produced by a SAM3-to-YOLO converter.
save_dir (Union[str, os.PathLike]) – Directory where annotated images are saved. Created if it does not exist.
names (Optional[Tuple[str, ...]]) – Class names in index order. Required when draw_labels=True; otherwise optional and only used to size the color palette. Default None.
palette (str) – Color palette name. Default 'Set1'.
seg_opacity (float) – Opacity of filled segmentation polygons (0.0–1.0). Default 0.5.
draw_labels (bool) – If True, draw the class name text alongside each box/polygon. Default True.
verbose (bool) – Print progress messages. Default True.
source (str) – Caller class name for log messages.
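Before any overlay can be drawn, each normalized label line has to be mapped back to pixel coordinates. The sketch below shows one way to do that for the two annotation types this function auto-detects (4 values for a box, an even number of at least 6 values for a polygon). The helper name `yolo_label_to_pixel_shapes` and its return format are hypothetical, not SimBA's API.

```python
from typing import List, Tuple


def yolo_label_to_pixel_shapes(label_str: str, img_w: int, img_h: int) -> List[Tuple]:
    """Convert a YOLO label string into pixel-space shapes for drawing.

    Hypothetical helper: returns ("bbox", class_id, (x1, y1, x2, y2)) or
    ("polygon", class_id, [(x, y), ...]) per annotation line.
    """
    shapes = []
    for line in label_str.strip().splitlines():
        tokens = line.split()
        if not tokens:
            continue
        class_id, coords = int(tokens[0]), [float(t) for t in tokens[1:]]
        if len(coords) == 4:
            # Bounding box: normalized center x, center y, width, height.
            xc, yc, w, h = coords
            x1, y1 = round((xc - w / 2) * img_w), round((yc - h / 2) * img_h)
            x2, y2 = round((xc + w / 2) * img_w), round((yc + h / 2) * img_h)
            shapes.append(("bbox", class_id, (x1, y1, x2, y2)))
        elif len(coords) >= 6 and len(coords) % 2 == 0:
            # Segmentation: normalized polygon vertices x1 y1 x2 y2 ...
            pts = [(round(coords[i] * img_w), round(coords[i + 1] * img_h))
                   for i in range(0, len(coords), 2)]
            shapes.append(("polygon", class_id, pts))
    return shapes
```

The resulting pixel shapes could then be rendered with, for example, OpenCV's `cv2.rectangle` and `cv2.fillPoly`, blending the filled polygons with `cv2.addWeighted` to apply an opacity such as seg_opacity.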

simba.utils.yolo.detect_yolo_project_type(label_path)

Detect the YOLO project type (bbox, keypoint, or segmentation) from a single label file.

The first non-empty annotation line is inspected and classified by the number of values following the class id:

bbox: class_id + 4 values (x_center, y_center, w, h).
keypoint: class_id + 4 bbox values + N*3 keypoint values (x, y, visibility), where every visibility flag is 0, 1, or 2.
segmentation: class_id + N*2 polygon vertices (N >= 3, i.e. at least 6 values).

Parameters: label_path (str) – Path to a YOLO-format .txt label file. Must exist and be readable.
Returns: The detected project type: one of 'bbox', 'keypoint', or 'segmentation'. Defaults to 'bbox' when the file is empty or no line can be classified.
Return type: str
Example

>>> detect_yolo_project_type(label_path='/project/labels/frame_0001.txt')
'keypoint'
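The classification rules above can be sketched as follows, operating on label text rather than a file path. This is a hypothetical re-implementation, not SimBA's code: it assumes the keypoint rule is tested before the segmentation rule (a line such as `class + 10 values` could satisfy both value-count patterns, and only the visibility flags tell them apart), and it falls back to 'bbox' as documented.

```python
from typing import Optional


def classify_yolo_line(line: str) -> Optional[str]:
    """Classify one YOLO label line as 'bbox', 'keypoint', 'segmentation', or None."""
    values = [float(v) for v in line.split()[1:]]  # drop the class id
    n = len(values)
    if n == 4:
        return "bbox"
    # Keypoint: 4 bbox values followed by (x, y, visibility) triplets.
    if n > 4 and (n - 4) % 3 == 0:
        visibility = values[6::3]  # every third value, starting after the first (x, y)
        if all(v in (0.0, 1.0, 2.0) for v in visibility):
            return "keypoint"
    # Segmentation: at least three (x, y) polygon vertices.
    if n >= 6 and n % 2 == 0:
        return "segmentation"
    return None


def detect_project_type_from_text(label_text: str) -> str:
    """Inspect the first non-empty line; fall back to 'bbox' (hypothetical helper)."""
    for line in label_text.splitlines():
        if line.strip():
            return classify_yolo_line(line) or "bbox"
    return "bbox"
```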

simba.utils.yolo.export_yolo_model(model_path, export_format, imgsz=256, device=0, int8=False, batch=1, workspace=8, data=None, task=None, dynamic=False, simplify=True, half=False)

Export a YOLO model using Ultralytics model.export.

Wrapper around Ultralytics export that supports common deployment formats (including ONNX and TensorRT engine).

Note

INT8 export is valid for the engine, openvino, and tflite formats and cannot be combined with half=True. For openvino and tflite INT8, a calibration dataset (data yaml) is required, as Ultralytics performs post-training quantization using representative images.

Important

When exporting a segmentation model, the imgsz parameter is critical for mask quality. Segmentation requires pixel-level precision along object boundaries, so spatial detail lost to downscaling hurts segmentation far more than detection or pose tasks. Set imgsz as large as your GPU memory allows. The default 256 may be too coarse for high-quality segmentation masks.

Parameters

model_path (Union[str, os.PathLike]) – Path to source YOLO weights (typically .pt).
export_format (Literal["onnx", "engine", "torchscript", "onnxsimplify", "coreml", "openvino", "pb", "tf", "tflite", "torch", "ncnn", "mnn"]) – Target export format. ncnn and mnn are well suited to ARM/mobile CPUs (e.g. Raspberry Pi).
imgsz (int) – Export input image size in pixels.
device (Union[Literal['cpu'], int]) – Export device ('cpu' or CUDA index).
int8 (bool) – If True, request INT8 quantized export. Requires export_format of 'engine', 'openvino', or 'tflite'. For 'openvino' and 'tflite', a data calibration yaml must be supplied.
batch (int) – Export batch/profile size (must be >= 1). For INT8, ensure calibration data size is at least this value.
workspace (int) – TensorRT workspace budget in GB (must be >= 1).
data (Optional[Union[str, os.PathLike]]) – Optional dataset yaml path used for export/calibration.
task (Optional[Literal["detect", "segment", "classify", "pose", "obb"]]) – Optional explicit YOLO task. Set this to avoid backend task auto-guessing warnings.
dynamic (bool) – If True, build with dynamic input profiles.
half (bool) – If True, request FP16 export where supported.

Returns

Path-like export artifact returned by Ultralytics.

Return type

Union[str, os.PathLike]

Raises

SimBAPAckageVersionError – If Ultralytics is unavailable.
InvalidInputError – On unsupported format or invalid precision combination.

Example

>>> export_yolo_model(
...     model_path=r"F:\netholabs\primintellect_test\mdl\weights\best.pt",
...     export_format='engine',
...     imgsz=256,
...     device=0,
...     int8=True,
...     batch=4,
...     workspace=8,
...     task='detect',
...     dynamic=False,
...     half=False
... )

>>> export_yolo_model(
...     model_path=r"H:\netholabs\openvino\best.pt",
...     export_format='openvino',
...     imgsz=256,
...     device='cpu',
...     int8=True,
...     data=r"H:\netholabs\openvino\map.yaml",
...     task='detect',
... )
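The precision rules in the Note and parameter descriptions above can be expressed as a small pre-flight check. This is a hypothetical helper (`check_export_args`), not SimBA's validation code, and it raises ValueError where SimBA documents InvalidInputError.

```python
INT8_FORMATS = {"engine", "openvino", "tflite"}
CALIBRATION_FORMATS = {"openvino", "tflite"}  # INT8 here needs a data yaml


def check_export_args(export_format: str, int8: bool = False, half: bool = False,
                      data=None, batch: int = 1, workspace: int = 8) -> bool:
    """Hypothetical pre-flight check mirroring the documented export rules."""
    if batch < 1:
        raise ValueError("batch must be >= 1")
    if workspace < 1:
        raise ValueError("workspace must be >= 1 GB")
    if int8 and half:
        raise ValueError("int8 and half cannot both be True")
    if int8 and export_format not in INT8_FORMATS:
        raise ValueError(f"int8 export requires one of {sorted(INT8_FORMATS)}")
    if int8 and export_format in CALIBRATION_FORMATS and data is None:
        raise ValueError(f"int8 {export_format} export requires a calibration data yaml")
    return True
```

With the arguments validated, the export itself maps onto Ultralytics' `YOLO(model_path).export(format=..., imgsz=..., int8=..., half=..., data=..., batch=..., workspace=..., dynamic=..., device=...)`.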

simba.utils.yolo.filter_yolo_keypoint_data(bbox_data, keypoint_data, class_id=None, confidence=None, class_idx=None, confidence_idx=None)

Helper to filter YOLO bounding-box and keypoint data by class ID and/or confidence threshold.

Parameters

bbox_data (np.ndarray) – A 2D array of shape (N, M) representing YOLO bounding box data, where each row corresponds to one detection and contains class and confidence values.
keypoint_data (np.ndarray) – A 3D array of shape (N, K, 3) representing keypoints for each detection, where K is the number of keypoints per detection.
class_id (Optional[int]) – Target class ID to filter detections. Defaults to None.
confidence (Optional[float]) – Minimum confidence threshold to keep detections. Must be in [0, 1]. Defaults to None.
confidence_idx (int) – Index in bbox_data where confidence value is stored. Defaults to 5.
class_idx (int) – Index in bbox_data where class ID is stored. Defaults to 6.
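The filtering can be sketched with a single boolean row mask applied to both arrays, which keeps bounding boxes and their keypoints aligned. This is a hypothetical sketch (`filter_detections`), assuming the documented default column positions (confidence at index 5, class ID at index 6) and that a detection whose confidence equals the threshold is kept.

```python
import numpy as np


def filter_detections(bbox_data: np.ndarray, keypoint_data: np.ndarray,
                      class_id=None, confidence=None, class_idx: int = 6,
                      confidence_idx: int = 5):
    """Return (bbox_data, keypoint_data) rows matching class_id and/or confidence."""
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    mask = np.ones(bbox_data.shape[0], dtype=bool)
    if class_id is not None:
        mask &= bbox_data[:, class_idx] == class_id
    if confidence is not None:
        mask &= bbox_data[:, confidence_idx] >= confidence
    # The same mask indexes axis 0 of both arrays, so rows stay paired.
    return bbox_data[mask], keypoint_data[mask]
```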

simba.utils.yolo.fit_yolo(weights_path, model_yaml, save_path, epochs=25, batch=16, plots=True, imgsz=640, format=None, device=0, verbose=True, workers=8)

Trains a YOLO model using specified initial weights and a configuration YAML file.

Note

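Initial weights can be downloaded from https://huggingface.co/Ultralytics. An example model_yaml is available at https://github.com/sgoldenlab/simba/blob/master/misc/ex_yolo_model.yaml.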

See also

For the recommended wrapper class with parameter validation, see simba.model.yolo_fit.FitYolo.

Parameters

weights_path – Path to the pre-trained YOLO model weights (usually a .pt file). Example weights can be found at https://huggingface.co/Ultralytics.
model_yaml – YAML file containing paths to the training, validation, and testing datasets and the object class mappings. An example YAML file can be found at https://github.com/sgoldenlab/simba/blob/master/misc/ex_yolo_model.yaml.
save_path – Directory path where the trained model, logs, and results will be saved.
epochs – Number of epochs to train the model. Default is 25.
batch – Batch size for training. Default is 16.

Returns

None. The trained model and associated training logs are saved in the specified save_path.

Example

>>> fit_yolo(weights_path=r"C:/troubleshooting/coco_data/weights/yolov8n-obb.pt", model_yaml=r"C:/troubleshooting/coco_data/model.yaml", save_path=r"C:/troubleshooting/coco_data/mdl", batch=16)
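fit_yolo expects model_yaml to describe the dataset paths and class mappings. Assuming the standard Ultralytics dataset-YAML layout (a `path` root, `train`/`val`/optional `test` folders relative to it, and a `names` mapping of class id to class name), such a file could be generated as below. `build_dataset_yaml` is a hypothetical helper; the SimBA example YAML linked above remains the reference template.

```python
from typing import Dict, Optional


def build_dataset_yaml(root: str, names: Dict[int, str], train: str = "images/train",
                       val: str = "images/val", test: Optional[str] = None) -> str:
    """Return the text of an Ultralytics-style dataset YAML (hypothetical helper)."""
    lines = [f"path: {root}", f"train: {train}", f"val: {val}"]
    if test is not None:
        lines.append(f"test: {test}")
    # Class-id -> class-name mapping, written in ascending id order.
    lines.append("names:")
    lines.extend(f"  {idx}: {names[idx]}" for idx in sorted(names))
    return "\n".join(lines) + "\n"


# Usage (not executed here):
# with open("model.yaml", "w") as f:
#     f.write(build_dataset_yaml("/data/project", {0: "mouse"}))
```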

simba.utils.yolo.get_yolo_imgsz_and_batch_size(model, raise_error=True)

Attempt to read the image size and batch size baked into a YOLO model.

Note

For .engine (TensorRT) files both values are read straight from the embedded header and represent the fixed input bindings. For .pt and other formats the values are scraped from the training arguments, so imgsz reflects the training size (a sensible default, not a hard constraint) and batch is frequently unavailable.

See also

read_yolo_metadata() (full metadata dictionary)

Parameters

model (Union[str, os.PathLike, YOLO]) – Path to a YOLO model file, or an already-loaded ultralytics.YOLO instance.
raise_error (bool) – If True (default), raise InvalidInputError when imgsz or batch cannot be found in the model metadata. If False, missing values are returned as None.

Returns

Tuple of (imgsz, batch_size), each an int (or None if not found and raise_error is False).

Return type

Tuple[Optional[int], Optional[int]]

Raises

InvalidInputError – If raise_error is True and imgsz and/or batch is not present in the model metadata.

Example

>>> get_yolo_imgsz_and_batch_size(r'/models/best.engine')
(256, 192)
>>> get_yolo_imgsz_and_batch_size(r'/models/best.pt', raise_error=False)
(640, None)
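Given a metadata dictionary (per the read_yolo_metadata() example on this page, imgsz may be stored as a list such as [256, 256]), the extraction and the raise_error behavior could be sketched as follows. The helper name is hypothetical, taking the first imgsz entry assumes a square input, and ValueError stands in for SimBA's InvalidInputError.

```python
from typing import Any, Dict, Optional, Tuple


def imgsz_and_batch_from_metadata(meta: Dict[str, Any],
                                  raise_error: bool = True) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical helper: pull (imgsz, batch) out of a YOLO metadata dict."""
    imgsz = meta.get("imgsz")
    # imgsz may be an int or an [h, w] list; assume square and take the first entry.
    if isinstance(imgsz, (list, tuple)):
        imgsz = imgsz[0] if imgsz else None
    batch = meta.get("batch")
    imgsz = int(imgsz) if imgsz is not None else None
    batch = int(batch) if batch is not None else None
    if raise_error and (imgsz is None or batch is None):
        raise ValueError(f"Missing metadata: imgsz={imgsz}, batch={batch}")
    return imgsz, batch
```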

simba.utils.yolo.keypoint_array_to_yolo_annotation_str(x, img_h, img_w, padding=None)

Convert a set of keypoints into a YOLO-format annotation string that includes the normalized bounding box and keypoints.

[x_center y_center width height x1 y1 v1 x2 y2 v2 … xn yn vn]

A bounding box is derived from the keypoint extent (min/max x and y, optionally expanded by padding); the box centre, width, height and every keypoint coordinate are normalized to [0, 1] by the image width/height, and each keypoint is followed by its integer visibility flag (2 = visible, 1 = labelled/occluded, 0 = missing). Keypoints at (0, 0) are treated as missing and excluded from the bounding box.

Figure: Keypoints (x, y, visibility) are bounded by a box derived from their extent and normalized by image size into a YOLO pose annotation line.

Parameters

x (np.ndarray) – Array of keypoints with shape (N, 3), where each row contains (x, y, visibility).
img_h (int) – Height of the image.
img_w (int) – Width of the image.
padding (Optional[float]) – Optional padding factor (between 0.0 and 1.0) to expand the bounding box around the keypoints.

Returns

YOLO string representation of the pose-estimation data including bounding box and keypoints.

Return type

str

Example

>>> x = np.array([[100, 200, 2], [150, 250, 2], [120, 240, 1]])
>>> keypoint_array_to_yolo_annotation_str(x=x, img_h=480, img_w=640)
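The conversion described above can be sketched as follows. This is a hypothetical re-implementation (`keypoints_to_yolo_pose_str`), not SimBA's code. Three details are assumptions, flagged again in the comments: padding grows the box width and height by the given fraction split evenly on both sides, the padded box is clipped to the image, and missing (0, 0) keypoints are written as `0 0 0`. The output follows the documented layout, without a leading class id.

```python
import numpy as np


def keypoints_to_yolo_pose_str(x, img_h: int, img_w: int, padding=None, decimals: int = 6) -> str:
    """Hypothetical sketch: (N, 3) keypoints -> 'xc yc w h x1 y1 v1 ...' string."""
    x = np.asarray(x, dtype=float)
    # Keypoints at (0, 0) are treated as missing and ignored for the box.
    present = ~((x[:, 0] == 0) & (x[:, 1] == 0))
    if not present.any():
        raise ValueError("No non-missing keypoints")
    x_min, y_min = x[present, :2].min(axis=0)
    x_max, y_max = x[present, :2].max(axis=0)
    if padding:
        # ASSUMPTION: padding grows width/height by this fraction, split evenly,
        # and the padded box is clipped to the image.
        pad_x = (x_max - x_min) * padding / 2
        pad_y = (y_max - y_min) * padding / 2
        x_min, x_max = max(x_min - pad_x, 0.0), min(x_max + pad_x, float(img_w))
        y_min, y_max = max(y_min - pad_y, 0.0), min(y_max + pad_y, float(img_h))
    box = [(x_min + x_max) / 2 / img_w, (y_min + y_max) / 2 / img_h,
           (x_max - x_min) / img_w, (y_max - y_min) / img_h]
    parts = [f"{v:.{decimals}f}" for v in box]
    for (kx, ky, vis), ok in zip(x, present):
        if ok:
            parts += [f"{kx / img_w:.{decimals}f}", f"{ky / img_h:.{decimals}f}", str(int(vis))]
        else:
            # ASSUMPTION: missing keypoints are written as "0 0 0".
            parts += ["0", "0", "0"]
    return " ".join(parts)
```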

simba.utils.yolo.load_yolo_model(weights_path, verbose=True, format=None, device=0)

Load a YOLO model.

Parameters

weights_path (Union[str, os.PathLike]) – Path to model weights (.pt, .engine, etc).
verbose (bool) – Whether to print loading info.
format (Optional[str]) – Export format, one of VALID_FORMATS or None to skip export.
device (Union[Literal['cpu'], int]) – Device to load the model on: 'cpu' or an integer GPU index.

Example

>>> load_yolo_model(weights_path=r"/mnt/c/troubleshooting/coco_data/mdl/train8/weights/best.pt", format="onnx", device=0)

simba.utils.yolo.read_yolo_metadata(model)

Read metadata from a YOLO model file or loaded YOLO instance.

Supports .engine (TensorRT), .pt (PyTorch), .onnx, .torchscript, and any other format that ultralytics.YOLO can load. For .engine files the embedded JSON header is read directly without loading the model. For all other formats the model is loaded via Ultralytics to extract metadata.

Parameters: model (Union[str, os.PathLike, YOLO]) – Path to a YOLO model file, or an already-loaded ultralytics.YOLO instance.
Returns: Dictionary of model metadata. Common keys: batch, imgsz, task, names, stride, fp16, dynamic.
Return type: dict
Raises: InvalidInputError – If model is not a YOLO instance, not a valid path, or has an unsupported extension.
Example

>>> meta = read_yolo_metadata('/models/best.engine')
>>> meta['batch']
192
>>> meta['imgsz']
[256, 256]
>>> meta = read_yolo_metadata('/models/best.pt')
>>> meta['task']
'detect'
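Reading .engine metadata without loading the model works because Ultralytics' TensorRT exporter prepends a small JSON block to the serialized engine. Assuming that layout (a 4-byte little-endian signed length prefix followed by UTF-8 JSON), a minimal reader might look like this; `read_engine_header` is hypothetical and not part of SimBA.

```python
import json
import struct


def read_engine_header(path) -> dict:
    """Read the JSON metadata header prepended to an Ultralytics .engine file.

    ASSUMPTION: the file starts with a 4-byte little-endian signed int giving the
    length of a UTF-8 JSON blob, followed by the blob and then the engine bytes.
    """
    with open(path, "rb") as f:
        (meta_len,) = struct.unpack("<i", f.read(4))
        return json.loads(f.read(meta_len).decode("utf-8"))
```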

simba.utils.yolo.yolo_predict(model, source, half=False, batch_size=4, stream=False, imgsz=640, iou=0.75, device=0, threshold=0.25, max_detections=300, verbose=True, retina_msk=False)

Produce YOLO predictions.

See also

For recommended wrapper classes that use this function, see simba.model.yolo_inference.YoloInference, simba.model.yolo_pose_inference.YOLOPoseInference, and simba.model.yolo_seg_inference.YOLOSegmentationInference.

Parameters

model (Union[str, os.PathLike]) – Loaded ultralytics.YOLO model. Returned by load_yolo_model().
source (Union[str, os.PathLike, np.ndarray]) – Path to video, video stream, directory, image, or image as loaded array.
half (bool) – Whether to use half precision (FP16) for inference to speed up processing.
stream (bool) – If True, return a generator that yields results one by one. Useful for streams or large videos.
imgsz (int) – Size to resize input images to (square dimension). Must be a positive integer.
iou (float) – IoU threshold for non-maximum suppression. When max_detections > 1, this sets how much two bounding boxes may overlap and still both be kept, e.g., when detecting multiple animals in close contact.
batch_size (Optional[int]) – If stream is False, then the number of images to process in each batch.
device (Union[Literal['cpu'], int]) – Device identifier for inference: 'cpu' to force CPU inference, or an integer GPU index (e.g., 0 for 'cuda:0').
threshold (float) – Confidence threshold for filtering predictions. Only detections with confidence >= threshold are returned. Must be between 0.0 and 1.0.
max_detections (int) – Maximum number of detections per image/frame to return.
verbose (bool) – If True, print inference progress and summary information.

Returns

YOLO results or generator of YOLO results.
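To illustrate how threshold, iou, and max_detections interact, the sketch below implements textbook greedy non-maximum suppression. It is not Ultralytics' implementation (which runs NMS internally and is class-aware by default); this version is class-agnostic, and `greedy_nms` is a hypothetical name.

```python
import numpy as np


def greedy_nms(boxes, scores, iou_thresh=0.75, conf_thresh=0.25, max_det=300):
    """Class-agnostic greedy NMS over (N, 4) boxes in x1, y1, x2, y2 pixel format."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # 1) Drop low-confidence boxes, 2) visit the rest from highest to lowest score.
    candidates = np.flatnonzero(scores >= conf_thresh)
    order = candidates[np.argsort(-scores[candidates])]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    kept = []
    while order.size > 0 and len(kept) < max_det:
        best, rest = order[0], order[1:]
        kept.append(int(best))
        # Intersection of the best box with every remaining candidate.
        ix1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        iy1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        ix2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        iy2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # 3) Suppress candidates that overlap the kept box by more than iou_thresh.
        order = rest[iou <= iou_thresh]
    return kept
```

With a high IoU threshold such as the default 0.75, two heavily overlapping boxes (for example, two animals huddled together) can both survive, whereas a lower threshold would merge them into a single detection.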

Bounding-box inference 

class simba.model.yolo_inference.YoloInference(weights, video_path, verbose=False, save_dir=None, half_precision=True, device=0, batch_size=400, core_cnt=8, threshold=0.25, max_detections=300, max_per_class=None, smoothing_method=None, smoothing_time_window=None, interpolate=False, imgsz=320, bbox_size=None, stream=True)

EXPECTED RUNTIMES
VIDEOS (COUNT)	FRAMES (COUNT)	TIME (S)	STDEV (S)
1	9000	19.69	0.19
2	18000	39.91	0.72
3	27000	59.20	0.29
4	36000	80.82	1.41
BATCH SIZE: 500
IMGSZ: 256
NVIDIA GeForce RTX 4070
CPU COUNT (LOADERS): 16
3 runs
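A quick way to compare these runtime tables is to convert them to throughput. The snippet below does this for the first table only, using its mean times rounded to two decimals.

```python
# Frames and mean wall-clock times (s) from the first runtime table above,
# rounded to two decimals (batch size 500, imgsz 256, RTX 4070).
frames = [9000, 18000, 27000, 36000]
seconds = [19.69, 39.91, 59.20, 80.82]

throughput_fps = [f / s for f, s in zip(frames, seconds)]
for n_videos, fps in enumerate(throughput_fps, start=1):
    print(f"{n_videos} video(s): {fps:.0f} frames/s")
```

The near-constant throughput (roughly 445 to 457 frames/s) indicates that, in this configuration, runtime scales approximately linearly with the total frame count.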

EXPECTED RUNTIMES BOUNDING BOX
VIDEOS (COUNT)	FRAMES (COUNT)	TIME (S)	STDEV (S)
5	9010	11.26	0.57
10	18020	20.88	0.15
20	36040	41.25	1.87
BATCH SIZE: 10
IMGSZ: 256
ORIGINAL SIZE: 1280x1024
NVIDIA GeForce RTX 4070 (NVDECs: 1)
3 runs

EXPECTED RUNTIMES
VIDEOS (COUNT)	FRAMES (COUNT)	TIME (S)	STDEV (S)
1	9000	21.89	2.87
2	18000	41.83	0.48
3	27000	63.08	0.41
4	36000	84.44	1.32
5	45000	103.84	1.17
6	54000	126.29	1.22
7	63000	148.71	1.86
BATCH SIZE: 500
IMGSZ: 288
NVIDIA GeForce RTX 4070
3 runs

EXPECTED RUNTIMES
VIDEOS (COUNT)	FRAMES (COUNT)	TIME (S)	STDEV (S)
1	1500	22.41	1.24
2	3000	44.77	0.22
3	6000	66.59	1.31
4	9000	89.68	1.13
BATCH SIZE: 500
IMGSZ: 256
NVIDIA GeForce RTX 4070
CPU COUNT (LOADERS): 16
3 runs