Using the Body Tracking API

Since version 4.0 of the ZED SDK, the AI detection module has been split into two separate modules: body tracking and object detection. Each module has its own data structures, methods and parameters. In earlier versions, body tracking was integrated directly into the object detection module.

Body Tracking Configuration

To configure the body tracking module, use BodyTrackingParameters at initialization and BodyTrackingRuntimeParameters to change specific parameters during use.

The initial configuration must be set only once when enabling the module, whereas the runtime configuration can be changed at runtime.

BodyTrackingParameters::detection_model: selects the human body detection model. This preset sets the trade-off between the runtime and the accuracy of the human body detector:
- BODY_TRACKING_MODEL::HUMAN_BODY_FAST: real-time performance even on NVIDIA® Jetson™ or low-end GPU cards
- BODY_TRACKING_MODEL::HUMAN_BODY_MEDIUM: a compromise between accuracy and speed
- BODY_TRACKING_MODEL::HUMAN_BODY_ACCURATE: state-of-the-art accuracy, requires a powerful GPU
BodyTrackingParameters::enable_body_fitting: this enables the fitting process for each detected person. It must be enabled to retrieve the local rotations of each keypoint. Otherwise, the data will be empty.
BodyTrackingParameters::body_format is the body format output by the ZED SDK. The currently supported body formats are:
- BODY_FORMAT::BODY_18: an 18-keypoint body model. This is the COCO18 format, which is not directly compatible with third-party software such as Unreal or Unity. For this reason, the local rotation and translation of each keypoint are not available with this format.
- BODY_FORMAT::BODY_34: a 34-keypoint body model. This model is compatible with third-party software, and all data available for BODY_18 can also be extracted with this format. The enable_body_fitting option must be enabled to use this format.
- BODY_FORMAT::BODY_38: a 38-keypoint body model. This includes simplified face, hand and foot keypoints.
BodyTrackingParameters::body_selection (BODY_KEYPOINTS_SELECTION) selects which keypoints of the chosen body format are output:
- BODY_KEYPOINTS_SELECTION::FULL (default): outputs every keypoint of the body format.
- BODY_KEYPOINTS_SELECTION::UPPER_BODY: outputs only the keypoints from the hips up (arms, head, torso).
BodyTrackingParameters::enable_segmentation: computes a 2D mask distinguishing the pixels of each detected person from the background.
BodyTrackingParameters::max_range: sets an upper depth range (in the UNIT configured at InitParameters level) beyond which people are not detected. Defaults to InitParameters::depth_maximum_distance.
BodyTrackingParameters::allow_reduced_precision_inference: allows the AI model to run at a lower precision (FP16/INT8) to improve runtime and memory usage, at a small (typically 1-2%) cost in accuracy.
BodyTrackingParameters::prediction_timeout_s: duration, in seconds, during which the ZED SDK keeps predicting a tracked person’s position after it stops being detected, before switching its tracking state to SEARCHING. Set to 0 to disable prediction.

At runtime, BodyTrackingRuntimeParameters also exposes:

BodyTrackingRuntimeParameters::detection_confidence_threshold: minimum detection confidence (1-100) for a body to be output.
BodyTrackingRuntimeParameters::minimum_keypoints_threshold: discards a detected skeleton if fewer than this number of keypoints were detected. Useful for removing unstable fitting results when a person is partially occluded.
BodyTrackingRuntimeParameters::skeleton_smoothing: amount of temporal smoothing applied to the fitted skeleton, from 0 (none) to 1 (maximum smoothing, more latency).

The code below shows how to set the most commonly used attributes.

// Set initialization parameters
BodyTrackingParameters detection_parameters;
detection_parameters.detection_model = BODY_TRACKING_MODEL::HUMAN_BODY_ACCURATE; // specific to human skeleton detection
detection_parameters.enable_tracking = true; // Objects will keep the same ID between frames
detection_parameters.enable_body_fitting = true; // Fitting process is called, the user has access to all available data for a person processed by SDK
detection_parameters.body_format = BODY_FORMAT::BODY_34; // selects the 34 keypoints body model for SDK outputs

// Set runtime parameters
BodyTrackingRuntimeParameters detection_parameters_rt;
detection_parameters_rt.detection_confidence_threshold = 40;
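
The ranges documented above for the runtime parameters can be checked before the values are handed to the SDK. The following is a minimal sketch and not part of the ZED SDK: RuntimeSettings and isValid are hypothetical stand-ins that mirror the three runtime fields so the check compiles on its own, the default values are illustrative only, and the bounds are taken from the descriptions above.

```cpp
// Hypothetical stand-in mirroring the three runtime fields described above.
// It is NOT the SDK's BodyTrackingRuntimeParameters type, and the defaults
// below are illustrative values, not the SDK's defaults.
struct RuntimeSettings {
    float detection_confidence_threshold = 40.f; // documented range: 1-100
    int minimum_keypoints_threshold = 0;         // a keypoint count, so >= 0
    float skeleton_smoothing = 0.f;              // documented range: 0-1
};

// Returns true when every field lies inside the range given in the text.
bool isValid(const RuntimeSettings& s) {
    return s.detection_confidence_threshold >= 1.f &&
           s.detection_confidence_threshold <= 100.f &&
           s.minimum_keypoints_threshold >= 0 &&
           s.skeleton_smoothing >= 0.f &&
           s.skeleton_smoothing <= 1.f;
}
```

Running such a check once, before copying the values into BodyTrackingRuntimeParameters, makes an out-of-range setting fail early in your own code.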

If you want to track people’s motion within their environment, you will first need to activate the positional tracking module. Then, set detection_parameters.enable_tracking to true.

if (detection_parameters.enable_tracking) {
    // Set positional tracking parameters
    PositionalTrackingParameters positional_tracking_parameters;
    // Enable positional tracking
    zed.enablePositionalTracking(positional_tracking_parameters);
}

With these parameters configured, you can enable the Body Tracking module:

// Enable body tracking with initialization parameters
zed_error = zed.enableBodyTracking(detection_parameters);
if (zed_error != ERROR_CODE::SUCCESS) {
    cout << "enableBodyTracking: " << zed_error << "\nExit program.";
    zed.close();
    exit(-1);
}

The Body Tracking module requires a stereo camera equipped with an inertial sensor (IMU). The original ZED (no IMU) and the monocular ZED X One cameras are not supported. See the Body Tracking overview for the full list of supported cameras.

Getting Human Body Data

To get the detected people in a scene, grab a new image with grab(...) and extract them with retrieveBodies(). This process is exactly the same as getting new objects with the Object Detection module.

sl::Bodies bodies; // Structure containing all the detected bodies
// grab runtime parameters
RuntimeParameters runtime_parameters;
runtime_parameters.measure3D_reference_frame = sl::REFERENCE_FRAME::WORLD;

if (zed.grab(runtime_parameters) == ERROR_CODE::SUCCESS) {
  zed.retrieveBodies(bodies, detection_parameters_rt); // Retrieve the detected bodies
}

The sl::Bodies class stores all data regarding the different people present in the scene in its vector<sl::BodyData> body_list attribute. Each person’s data is stored as a sl::BodyData. sl::Bodies also contains the timestamp of the detection, which can help connect the bodies to the images.
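
One way to use that timestamp is to pair each detection with the image whose timestamp is closest to it. The sketch below is independent of the SDK and works on plain integer values. nearestTimestampIndex is a hypothetical helper, and it assumes both timestamps have already been converted to the same unit (nanoseconds here) and come from the same clock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: returns the index of the image timestamp that is
// closest to the detection timestamp. On a tie, the earlier index wins.
std::size_t nearestTimestampIndex(const std::vector<std::uint64_t>& image_ts,
                                  std::uint64_t detection_ts) {
    assert(!image_ts.empty());
    std::size_t best = 0;
    std::uint64_t best_gap = std::numeric_limits<std::uint64_t>::max();
    for (std::size_t i = 0; i < image_ts.size(); ++i) {
        // Values are unsigned, so subtract the smaller from the larger
        // to avoid wrap-around.
        const std::uint64_t gap = image_ts[i] > detection_ts
                                      ? image_ts[i] - detection_ts
                                      : detection_ts - image_ts[i];
        if (gap < best_gap) {
            best_gap = gap;
            best = i;
        }
    }
    return best;
}
```

A linear scan is enough for a short buffer of recent frames. For long recordings with sorted timestamps, a binary search would be the more natural choice.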

All 2D data is related to the left image, while the 3D data is expressed in either the CAMERA or WORLD reference frame, depending on RuntimeParameters.measure3D_reference_frame (given to the grab() function). The 2D data is expressed in the initial camera resolution RESOLUTION. Scaling can be applied if the value is needed in another resolution.
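
The scaling mentioned above is a per-axis ratio between the two resolutions. Here is a minimal sketch that uses hypothetical Point2 and Size2 stand-ins rather than SDK types, so that it compiles on its own:

```cpp
#include <cassert>

// Hypothetical stand-ins for a pixel coordinate and an image size.
struct Point2 { float x; float y; };
struct Size2 { int width; int height; };

// Rescales a keypoint expressed in the camera resolution `from`
// to the equivalent pixel position in the target resolution `to`.
Point2 scaleKeypoint(Point2 kp, Size2 from, Size2 to) {
    assert(from.width > 0 && from.height > 0);
    const float sx = static_cast<float>(to.width) / static_cast<float>(from.width);
    const float sy = static_cast<float>(to.height) / static_cast<float>(from.height);
    return {kp.x * sx, kp.y * sy};
}
```

For example, a keypoint at (960, 540) in a 1920x1080 image maps to (480, 270) in a 960x540 image.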

For 3D data, the coordinate frame and units can be set by the user using COORDINATE_SYSTEM and UNIT, respectively. These settings are accessible through InitParameters when opening the ZED camera with the open function.

Accessing 2D and 3D body keypoints

Once a sl::BodyData is retrieved from the body vector, you can access information such as its ID, position, velocity, label, and tracking_state, as well as its keypoint positions and rotations.

The 2D and 3D keypoint data of a detected person are accessible in a vector of pixel keypoints keypoint_2d and a vector of 3D positions keypoint.

// collect all 2D keypoints
for (auto& kp_2d : body.keypoint_2d) {
  // user code using each kp_2d point
}

// collect all 3D keypoints
for (auto& kp_3d : body.keypoint)
{
  // user code using each kp_3d point
}
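
A common first step inside such a loop is to skip keypoints that were not detected, for instance when a person is partially occluded. The sketch below counts the usable 3D keypoints. It relies on an assumption you should verify against the API reference for your SDK version: that undetected 3D keypoints are reported with non-finite coordinates (NaN). Point3 and countValidKeypoints are hypothetical names used so the example compiles without the SDK.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 3D keypoint (the SDK uses its own vector type).
struct Point3 { float x; float y; float z; };

// Counts keypoints whose three coordinates are all finite.
// Assumption: undetected keypoints carry non-finite values such as NaN.
std::size_t countValidKeypoints(const std::vector<Point3>& keypoints) {
    std::size_t valid = 0;
    for (const Point3& kp : keypoints) {
        if (std::isfinite(kp.x) && std::isfinite(kp.y) && std::isfinite(kp.z)) {
            ++valid;
        }
    }
    return valid;
}
```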

See the keypoint index and name correspondence as well as the output skeleton format here.

Getting more results

When fitting is enabled at the initial configuration stage, more results become available according to the chosen BODY_FORMAT. The local rotation and translation of each keypoint become available to the user with the BODY_FORMAT::BODY_34 or BODY_FORMAT::BODY_38 format.

// collect local rotation for each keypoint
for (auto &kp : body.local_orientation_per_joint)
{
   // kp is the local keypoint rotation represented by a quaternion
   // user code
}

// collect local translation for each keypoint
for (auto &kp : body.local_position_per_joint)
{
   // kp is the local keypoint translation
   // user code
}

// get global root orientation
auto global_root_orientation = body.global_root_orientation;

// note that global root translation is available in body.keypoint[root_index] where root_index is the root index of the body model

Both the keypoint positions and the joint orientations follow the COORDINATE_SYSTEM convention configured in InitParameters, with positional values expressed in the configured UNIT.

Understanding Joint Orientations

local_orientation_per_joint and global_root_orientation are not expressed the same way, and it is important to distinguish them before using them for retargeting or biomechanical analysis:

local_orientation_per_joint is the rotation of a keypoint relative to its parent joint in the body’s kinematic chain (the same parent/child relationship used by local_position_per_joint), not relative to the world or the root. For example, the rotation stored for LEFT_ELBOW is relative to its parent, LEFT_SHOULDER, not relative to the camera or world axes. To reconstruct the absolute (world-space) orientation of a given joint, you must compose its local orientation with the local orientations of every ancestor joint up to the root, rather than using a single joint’s value on its own.
The identity quaternion ([0, 0, 0, 1]) for a given joint’s local_orientation_per_joint corresponds to that joint’s orientation in the body fitting model’s neutral rest pose, which is a T-pose (body upright, arms extended horizontally). This is the same reference pose that the Unreal Engine and Unity integrations require a target avatar’s skeleton to be rigged in, so that the ZED SDK’s local rotations can be applied directly onto the avatar’s bones.
global_root_orientation is the measured, absolute orientation of the skeleton’s root joint (the pelvis for BODY_34 and BODY_38), expressed in the same reference frame (CAMERA or WORLD, set via RuntimeParameters.measure3D_reference_frame) and COORDINATE_SYSTEM as the rest of the 3D body data. Because it reflects the person’s actual measured orientation, it is generally not identity — it is only the identity quaternion at the instant the root joint happens to be aligned with the reference frame’s axes.

Code Example

For code examples, check out the Tutorial and Sample on GitHub.