Introduction

Depth from Stereo

The ZED reproduces the way our binocular vision works. Human eyes are horizontally separated by about 65 mm on average, so each eye sees a slightly different view of the world. By comparing these two views, our brain can infer not only depth but also 3D motion in space. Likewise, the ZED has two eyes separated by 12 cm, which capture a high-resolution 3D video of the scene and estimate depth and motion by comparing the displacement of pixels between the left and right images.
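
The link between pixel displacement and depth can be made concrete with the standard rectified stereo model, in which depth is inversely proportional to disparity: Z = f · B / d. The sketch below illustrates that relation only; the focal length and disparity values are hypothetical placeholders, and this is not the ZED's exact internal computation.

```python
# Minimal sketch of stereo triangulation: depth is inversely
# proportional to the pixel displacement (disparity) between the
# left and right views. The numbers are illustrative, not actual
# ZED calibration values.

def depth_from_disparity(disparity_px: float,
                         focal_length_px: float,
                         baseline_m: float) -> float:
    """Z = f * B / d (pinhole stereo model, rectified images)."""
    return focal_length_px * baseline_m / disparity_px

# Example: a hypothetical 700 px focal length, the ZED's 0.12 m
# baseline, and a 20 px disparity give a depth of 4.2 m.
z = depth_from_disparity(disparity_px=20.0,
                         focal_length_px=700.0,
                         baseline_m=0.12)
print(f"Estimated depth: {z:.2f} m")  # Estimated depth: 4.20 m
```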

Depth map

The ZED stores a distance value (Z) for each pixel (X, Y) in the image. The distance is expressed in metric units (for example, meters) and is measured from the back of the camera's left eye to the scene object.
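
As a sketch, the per-pixel depth can be queried through the ZED SDK's Python bindings (pyzed). The calls below follow the API as commonly documented, but exact names may vary between SDK versions, and the pixel coordinates are arbitrary.

```python
# Sketch: query the metric depth (Z) at a pixel with the ZED SDK's
# Python bindings (pyzed). Names may differ across SDK versions.
import pyzed.sl as sl

zed = sl.Camera()
init_params = sl.InitParameters()
init_params.coordinate_units = sl.UNIT.METER  # depth expressed in meters

if zed.open(init_params) != sl.ERROR_CODE.SUCCESS:
    raise RuntimeError("Failed to open the ZED camera")

depth = sl.Mat()
if zed.grab() == sl.ERROR_CODE.SUCCESS:
    zed.retrieve_measure(depth, sl.MEASURE.DEPTH)  # 32-bit depth map
    # Distance from the left eye to the object seen at pixel (640, 360).
    err, z = depth.get_value(640, 360)
    print(f"Depth at (640, 360): {z:.2f} m")

zed.close()
```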

Depth maps captured by the ZED cannot be displayed directly because they are encoded as 32-bit floating-point values. To display a depth map, it must be converted to a monochrome (grayscale) 8-bit representation with values in the range [0, 255], where 255 represents the closest possible depth and 0 the farthest.
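
A minimal sketch of such a conversion, assuming the depth map is available as a NumPy array (here called depth_32f) and using an arbitrary clamping range of 0.5 to 20 m:

```python
# Sketch: convert a 32-bit depth map to an 8-bit grayscale image for
# display, mapping the closest depth to 255 and the farthest to 0.
# `depth_32f` and the clamping range are illustrative assumptions.
import numpy as np

def depth_to_grayscale(depth_32f: np.ndarray,
                       min_m: float = 0.5,
                       max_m: float = 20.0) -> np.ndarray:
    """Clamp depth to [min_m, max_m], then invert-normalize to [0, 255]."""
    # Replace invalid values (NaN/inf) so they map to valid gray levels.
    d = np.nan_to_num(depth_32f, nan=max_m, posinf=max_m, neginf=min_m)
    d = np.clip(d, min_m, max_m)
    # Near objects -> bright (255), far objects -> dark (0).
    norm = (max_m - d) / (max_m - min_m)
    return (norm * 255.0).astype(np.uint8)

# Example with a synthetic 2x2 depth map (values in meters):
demo = np.array([[0.5, 5.0], [10.0, 20.0]], dtype=np.float32)
print(depth_to_grayscale(demo))
# [[255 196]
#  [130   0]]
```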

3D Point Cloud

Another common way of representing depth information is a 3D point cloud. A point cloud can be seen as a depth map in three dimensions: while a depth map contains only the distance (Z) information for each pixel, a point cloud is a collection of 3D points (X, Y, Z) that represent the external surface of the scene and can also contain color information.
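
As a sketch, a depth map can be lifted into a point cloud with the pinhole camera model. The intrinsics (fx, fy, cx, cy) below are hypothetical placeholders, not actual ZED calibration values.

```python
# Sketch: back-project a depth map into a 3D point cloud using the
# pinhole camera model. Intrinsics here are hypothetical.
import numpy as np

def depth_to_point_cloud(depth: np.ndarray,
                         fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Return an (H*W, 3) array of (X, Y, Z) points in camera space."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx  # back-project along the X axis
    y = (v - cy) * z / fy  # back-project along the Y axis
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Example with a flat 2x2 depth map at 2 m:
demo = np.full((2, 2), 2.0, dtype=np.float32)
cloud = depth_to_point_cloud(demo, fx=700.0, fy=700.0, cx=1.0, cy=1.0)
print(cloud.shape)  # (4, 3)
```

In practice, the ZED SDK can return such a point cloud directly (for example as an XYZRGBA measure), so manual back-projection like this is mainly illustrative.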

For more information on Depth Sensing, see the Using the API page.