Using the Body Tracking API
Since V4.0, the AI detection module has been split into two different modules: body tracking and object detection. Each module has its own data structures, methods and parameters. Before that, the body tracking feature was directly integrated into the object detection module.
Body Tracking Configuration
To configure the body tracking module, use BodyTrackingParameters at initialization and BodyTrackingRuntimeParameters to change specific parameters during use.
The initial configuration is set once, when the module is enabled, whereas the runtime configuration can be changed at any time while the module is running.
- BodyTrackingParameters::detection_model: the detection model that enables human body detection. This preset configures the runtime and accuracy of the human body detector:
  - BODY_TRACKING_MODEL::HUMAN_BODY_FAST: real-time performance even on NVIDIA® Jetson™ or low-end GPU cards
  - BODY_TRACKING_MODEL::HUMAN_BODY_MEDIUM: a compromise between accuracy and speed
  - BODY_TRACKING_MODEL::HUMAN_BODY_ACCURATE: state-of-the-art accuracy, requires a powerful GPU
- BodyTrackingParameters::enable_body_fitting: enables the fitting process for each detected person. It must be enabled to retrieve the local rotations of each keypoint; otherwise, that data will be empty.
- BodyTrackingParameters::body_format: the body format output by the ZED SDK. The currently supported body formats are:
  - BODY_FORMAT::BODY_18: an 18-keypoint body model. This is a COCO18 format and is not directly compatible with public software like Unreal or Unity. For this reason, the local rotation and translation of each keypoint are not available with this format.
  - BODY_FORMAT::BODY_34: a 34-keypoint body model. This model is compatible with public software, and all data available for BODY_18 can also be extracted with this format. The enable_body_fitting option must be enabled to use this format.
  - BODY_FORMAT::BODY_38: a 38-keypoint body model. This includes simplified face, hand and foot keypoints.
- BodyTrackingParameters::body_selection (BODY_KEYPOINTS_SELECTION): selects which keypoints of the chosen body format are output:
  - BODY_KEYPOINTS_SELECTION::FULL (default): outputs every keypoint of the body format.
  - BODY_KEYPOINTS_SELECTION::UPPER_BODY: outputs only the keypoints from the hips up (arms, head, torso).
- BodyTrackingParameters::enable_segmentation: computes a 2D mask distinguishing the pixels of each detected person from the background.
- BodyTrackingParameters::max_range: sets an upper depth range (in the UNIT configured at InitParameters level) beyond which people are not detected. Defaults to InitParameters::depth_maximum_distance.
- BodyTrackingParameters::allow_reduced_precision_inference: allows the AI model to run at a lower precision (FP16/INT8) to improve runtime and memory usage, at a small (typically 1-2%) cost in accuracy.
- BodyTrackingParameters::prediction_timeout_s: duration, in seconds, during which the ZED SDK keeps predicting a tracked person’s position after it stops being detected, before switching its tracking state to SEARCHING. Set to 0 to disable prediction.
At runtime, BodyTrackingRuntimeParameters also exposes:
- BodyTrackingRuntimeParameters::detection_confidence_threshold: minimum detection confidence (1-100) for a body to be output.
- BodyTrackingRuntimeParameters::minimum_keypoints_threshold: discards a detected skeleton if fewer than this number of keypoints were detected. Useful for removing unstable fitting results when a person is partially occluded.
- BodyTrackingRuntimeParameters::skeleton_smoothing: amount of temporal smoothing applied to the fitted skeleton, from 0 (none) to 1 (maximum smoothing, more latency).
The code below shows how to set the most commonly used attributes.
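A minimal C++ sketch of such a configuration, assuming the ZED SDK 4.x headers (`sl/Camera.hpp`) are installed. The helper function names and the chosen values (model, body format, confidence threshold of 40) are illustrative assumptions, not recommendations, and the code cannot be built without the SDK:

```cpp
#include <sl/Camera.hpp>

// Hypothetical helper: builds the initialization parameters.
// These are fixed once the module has been enabled.
sl::BodyTrackingParameters makeBodyTrackingParameters() {
    sl::BodyTrackingParameters detection_parameters;
    detection_parameters.detection_model = sl::BODY_TRACKING_MODEL::HUMAN_BODY_ACCURATE;
    detection_parameters.body_format = sl::BODY_FORMAT::BODY_34;  // 34-keypoint model
    detection_parameters.enable_body_fitting = true;  // needed for local rotations
    detection_parameters.enable_tracking = true;      // keep the same ID across frames
    return detection_parameters;
}

// Hypothetical helper: builds the runtime parameters.
// These can be changed between calls to retrieveBodies().
sl::BodyTrackingRuntimeParameters makeBodyTrackingRuntimeParameters() {
    sl::BodyTrackingRuntimeParameters detection_parameters_rt;
    detection_parameters_rt.detection_confidence_threshold = 40;  // illustrative value
    return detection_parameters_rt;
}
```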
If you want to track people’s motion within their environment, you will first need to activate the positional tracking module. Then, set detection_parameters.enable_tracking to true.
With these parameters configured, you can enable the Body Tracking module:
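As a hedged sketch, enabling the module in C++ could look like the following. It assumes an already opened `sl::Camera` and the ZED SDK 4.x headers; the wrapper function `enableBodyTrackingModule` is a hypothetical name introduced for this example:

```cpp
#include <sl/Camera.hpp>
#include <iostream>

// Hypothetical wrapper: enables positional tracking when needed, then body tracking.
// Returns true on success. `zed` must already have been opened with zed.open(...).
bool enableBodyTrackingModule(sl::Camera& zed,
                              const sl::BodyTrackingParameters& detection_parameters) {
    if (detection_parameters.enable_tracking) {
        // Tracking people across frames relies on the positional tracking module.
        sl::PositionalTrackingParameters positional_tracking_parameters;
        sl::ERROR_CODE err = zed.enablePositionalTracking(positional_tracking_parameters);
        if (err != sl::ERROR_CODE::SUCCESS) {
            std::cerr << "enablePositionalTracking failed: " << err << std::endl;
            return false;
        }
    }

    sl::ERROR_CODE err = zed.enableBodyTracking(detection_parameters);
    if (err != sl::ERROR_CODE::SUCCESS) {
        std::cerr << "enableBodyTracking failed: " << err << std::endl;
        return false;
    }
    return true;
}
```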
The Body Tracking module requires a stereo camera equipped with an inertial sensor (IMU). The original ZED (no IMU) and the monocular ZED X One cameras are not supported. See the Body Tracking overview for the full list of supported cameras.
Getting Human Body Data
To get the detected people in a scene, grab a new image with grab(...) and extract them with retrieveBodies(). This process is exactly the same as getting new objects with the Object Detection module.
The sl::Bodies class stores all data regarding the different people present in the scene in its vector<sl::BodyData> body_list attribute. Each person’s data is stored as a sl::BodyData. sl::Bodies also contains the timestamp of the detection, which can help connect the bodies to the images.
All 2D data is related to the left image, while the 3D data is expressed in either the CAMERA or WORLD reference frame, depending on RuntimeParameters.measure3D_reference_frame (given to the grab() function). The 2D data is expressed in the initial camera resolution RESOLUTION. Scaling can be applied if the value is needed in another resolution.
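The rescaling mentioned above is a per-axis multiplication by the ratio of the two resolutions. The sketch below is self-contained and does not use the SDK: `Point2` and `Size2` are hypothetical stand-ins for the SDK's 2D point and resolution types, and `scaleKeypoints` is a name introduced for this example.

```cpp
#include <cassert>
#include <vector>

struct Point2 { float x, y; };          // stand-in for a 2D pixel keypoint
struct Size2 { int width, height; };    // stand-in for an image resolution

// Rescales pixel keypoints expressed in resolution `from` to resolution `to`.
std::vector<Point2> scaleKeypoints(const std::vector<Point2>& keypoints,
                                   Size2 from, Size2 to) {
    assert(from.width > 0 && from.height > 0);
    const float sx = static_cast<float>(to.width) / static_cast<float>(from.width);
    const float sy = static_cast<float>(to.height) / static_cast<float>(from.height);

    std::vector<Point2> scaled;
    scaled.reserve(keypoints.size());
    for (const Point2& kp : keypoints) {
        scaled.push_back({kp.x * sx, kp.y * sy});
    }
    return scaled;
}
```

For example, a keypoint at (100, 200) in a 1920x1080 image maps to (50, 100) in a 960x540 display image.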
For 3D data, the coordinate frame and units can be set by the user using COORDINATE_SYSTEM and UNIT, respectively. These settings are accessible through InitParameters when opening the ZED camera with the open function.
Accessing 2D and 3D body keypoints
Once a sl::BodyData is retrieved from body_list, you can access information such as its ID, position, velocity, label, and tracking_state, as well as its keypoint positions and rotations.
The 2D and 3D keypoint data of a detected person are accessible in a vector of pixel keypoints keypoint_2d and a vector of 3D positions keypoint.
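A hedged C++ sketch of grabbing a frame, retrieving the bodies and walking through their keypoints. It assumes the ZED SDK 4.x headers, an opened camera with body tracking already enabled, and default grab parameters; `printBodies` is a hypothetical function name, and the code cannot be built without the SDK:

```cpp
#include <sl/Camera.hpp>
#include <cstddef>
#include <iostream>

// Hypothetical helper: grabs one frame and prints every detected person's keypoints.
void printBodies(sl::Camera& zed,
                 const sl::BodyTrackingRuntimeParameters& detection_parameters_rt) {
    if (zed.grab() != sl::ERROR_CODE::SUCCESS) return;

    sl::Bodies bodies;  // holds every person detected in the current frame
    if (zed.retrieveBodies(bodies, detection_parameters_rt) != sl::ERROR_CODE::SUCCESS) return;

    for (const sl::BodyData& body : bodies.body_list) {
        std::cout << "Body " << body.id << " at ("
                  << body.position.x << ", " << body.position.y << ", "
                  << body.position.z << ")\n";

        // keypoint_2d holds pixel coordinates in the left image,
        // keypoint holds the matching 3D positions, index by index.
        for (std::size_t i = 0; i < body.keypoint.size(); ++i) {
            const sl::float2& kp_2d = body.keypoint_2d[i];
            const sl::float3& kp_3d = body.keypoint[i];
            std::cout << "  keypoint " << i
                      << " 2D (" << kp_2d.x << ", " << kp_2d.y << ")"
                      << " 3D (" << kp_3d.x << ", " << kp_3d.y << ", " << kp_3d.z << ")\n";
        }
    }
}
```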
See the keypoint index and name correspondence as well as the output skeleton format here.
Getting more results
When fitting is enabled at the initial configuration stage, more results become available according to the chosen BODY_FORMAT. The local rotation and translation of each keypoint become available to the user with the BODY_FORMAT::BODY_34 or BODY_FORMAT::BODY_38 format.
Both the keypoint positions and the joint orientations follow the COORDINATE_SYSTEM convention configured in InitParameters, with positional values expressed in the configured UNIT.
Understanding Joint Orientations
local_orientation_per_joint and global_root_orientation are not expressed the same way, and it is important to distinguish them before using them for retargeting or biomechanical analysis:
- local_orientation_per_joint is the rotation of a keypoint relative to its parent joint in the body’s kinematic chain (the same parent/child relationship used by local_position_per_joint), not relative to the world or the root. For example, the rotation stored for LEFT_ELBOW is relative to its parent, LEFT_SHOULDER, not relative to the camera or world axes. To reconstruct the absolute (world-space) orientation of a given joint, you must compose its local orientation with the local orientations of every ancestor joint up to the root, rather than using a single joint’s value on its own.
- The identity quaternion ([0, 0, 0, 1]) for a given joint’s local_orientation_per_joint corresponds to that joint’s orientation in the body fitting model’s neutral rest pose, which is a T-pose (body upright, arms extended horizontally). This is the same reference pose that the Unreal Engine and Unity integrations require a target avatar’s skeleton to be rigged in, so that the ZED SDK’s local rotations can be applied directly onto the avatar’s bones.
- global_root_orientation is the measured, absolute orientation of the skeleton’s root joint (the pelvis for BODY_34 and BODY_38), expressed in the same reference frame (CAMERA or WORLD, set via RuntimeParameters.measure3D_reference_frame) and COORDINATE_SYSTEM as the rest of the 3D body data. Because it reflects the person’s actual measured orientation, it is generally not identity; it is only the identity quaternion at the instant the root joint happens to be aligned with the reference frame’s axes.
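The composition described above can be sketched without the SDK as a walk down the kinematic chain. Several things here are assumptions made for illustration: the `Quat` struct (stored as x, y, z, w so that identity is [0, 0, 0, 1]), the `parents` index table (the real parent/child table comes from the SDK's skeleton definition for the chosen body format), the usual forward-kinematics order `global(child) = global(parent) * local(child)`, and the choice to seed the root with global_root_orientation rather than its local entry.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Quaternion stored as (x, y, z, w); identity is {0, 0, 0, 1}.
struct Quat { float x, y, z, w; };

// Hamilton product a * b (rotation b applied first, then a).
Quat multiply(const Quat& a, const Quat& b) {
    return {
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z
    };
}

// Renormalizes to limit floating-point drift along long chains.
Quat normalize(const Quat& q) {
    const float n = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
    if (n <= 0.0f) return {0.0f, 0.0f, 0.0f, 1.0f};  // degenerate input: fall back to identity
    return {q.x / n, q.y / n, q.z / n, q.w / n};
}

// parents[i] is the index of joint i's parent, or -1 for the root.
// Assumes every parent is listed before its children (parents[i] < i).
std::vector<Quat> composeGlobalOrientations(const Quat& root_global,
                                            const std::vector<Quat>& local,
                                            const std::vector<int>& parents) {
    assert(local.size() == parents.size());
    std::vector<Quat> global(local.size());
    for (std::size_t i = 0; i < local.size(); ++i) {
        if (parents[i] < 0) {
            global[i] = normalize(root_global);  // root: use the measured global orientation
        } else {
            assert(static_cast<std::size_t>(parents[i]) < i);
            global[i] = normalize(multiply(global[parents[i]], local[i]));
        }
    }
    return global;
}
```

With a three-joint chain whose two child joints are each rotated 90° about Z relative to their parent and an identity root, the last joint ends up rotated 180° about Z in the root's frame, which a single joint's local value would not show.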
Code Example
For code examples, check out the Tutorial and Sample on GitHub.

