Basic Definitions

When we capture an image, the 3D world coordinates (x, y, z) undergo two critical transformations: first a rigid transformation from world coordinates into camera coordinates, then a projection from camera coordinates onto the image plane.
This process follows the path WORLD → CAMERA → IMAGE, a sequence known as the Forward Imaging Model. The specific projection from the camera coordinate system to the 2D image plane is called Perspective Projection. Through this process, a 3D object is converted into a 2D image.
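The camera-to-image step can be sketched in a few lines. This is a minimal pinhole-camera sketch, not the full imaging model: it assumes the point is already in camera coordinates, a focal length `f` expressed in pixels, and a principal point at the image origin (all illustrative choices, not from the original text).

```python
def perspective_project(point_cam, f):
    """Project a 3D point given in camera coordinates onto the image plane.

    Pinhole model: u = f * x / z, v = f * y / z.
    Assumes focal length f in pixels and the principal point at the
    image origin (a simplification for illustration).
    """
    x, y, z = point_cam
    u = f * x / z
    v = f * y / z
    return u, v

# A point 10 m in front of the camera, offset 2 m right and 1 m up:
u, v = perspective_project((2.0, 1.0, 10.0), f=500.0)
# u = 500 * 2 / 10 = 100.0, v = 500 * 1 / 10 = 50.0
```

Note how depth z appears only in the denominators: scaling the point by any constant leaves (u, v) unchanged, which is exactly the information loss discussed next.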
In certain applications we need to reverse this process: obtaining 3D coordinates from a 2D point in an image. This challenge is referred to as recovering depth from an image. Essentially, we ask: "Given any point (u, v) in the image, can we determine its corresponding point in the 3D world?" When analyzing this reverse process, we discover that while information is lost during projection, the x and y information is not completely lost. This gives us hope for recovery with additional contextual information. Since we know the 2D coordinates, the 3D point must lie somewhere along a specific projection line (visualized as dotted green lines in the diagram below). The same line used in the original forward projection can assist in the reverse mapping as well.
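The projection line the text describes can be made concrete: every pixel (u, v) back-projects to a ray through the camera center. The sketch below assumes the same illustrative pinhole model with focal length `f` in pixels; the function name is hypothetical.

```python
import numpy as np

def backproject_ray(u, v, f):
    """Return a unit direction vector for the ray through pixel (u, v).

    Any 3D point of the form t * (u/f, v/f, 1) with t > 0 projects
    back to the same pixel, so depth t cannot be recovered from one
    image alone. Assumes a pinhole camera with focal length f.
    """
    direction = np.array([u / f, v / f, 1.0])
    return direction / np.linalg.norm(direction)

ray = backproject_ray(100.0, 50.0, f=500.0)
# Every point t * ray (t > 0) projects to pixel (100, 50).
```

This one-parameter family of candidate points is precisely why a single image is not enough, and why a second view is introduced below.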
Nature has provided us with an elegant solution for perceiving depth through binocular vision. Our two eyes, positioned horizontally at a distance from each other (interpupillary distance), each capture a slightly different perspective of the world. This principle is the foundation of stereoscopic vision and depth perception.
The Stereo Vision Process

This stereo vision approach is the fundamental principle behind many depth-sensing technologies, including 3D reconstruction, autonomous navigation systems, and mixed reality applications.
The points (u_r, v_r) and (u_l, v_l) represent the same 3D point as it appears in the right and left camera images respectively. These corresponding points are crucial for stereo vision because they establish how the same physical point projects differently onto each camera's image plane. The horizontal displacement between these corresponding points is called disparity, and it is inversely proportional to depth: objects closer to the cameras have larger disparities than distant objects.
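The inverse relation between disparity and depth can be shown with a small sketch. It assumes a rectified stereo pair, a focal length `f` in pixels, and a baseline `b` in meters; the numeric values are hypothetical.

```python
def depth_from_disparity(u_left, u_right, f, baseline):
    """Depth from horizontal disparity in a rectified stereo pair.

    disparity d = u_left - u_right;  depth z = f * baseline / d.
    f is in pixels, baseline in meters (illustrative values below).
    """
    d = u_left - u_right
    if d <= 0:
        raise ValueError("disparity must be positive for points in front of the cameras")
    return f * baseline / d

z_near = depth_from_disparity(120.0, 80.0, f=500.0, baseline=0.1)  # d = 40
z_far = depth_from_disparity(105.0, 95.0, f=500.0, baseline=0.1)   # d = 10
# z_near = 1.25 m, z_far = 5.0 m: the larger disparity gives the smaller depth.
```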
The coordinate system in stereo vision is carefully defined to facilitate calculations: conventionally, the origin is placed at the left camera's center, with the x-axis pointing along the baseline toward the right camera and the z-axis pointing into the scene.
To compute the 3D coordinates of a point P(x, y, z) using stereo vision, we locate its projections (u_l, v_l) and (u_r, v_r) in the two images and use their disparity. The 3D coordinates can then be calculated using these formulas (for a simplified rectified case):

x = b · u_l / (u_l − u_r)
y = b · v_l / (u_l − u_r)
z = b · f / (u_l − u_r)

Where f is the focal length of the cameras (in pixels) and b is the baseline, the distance between the two camera centers.
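The triangulation formulas for the simplified rectified case can be sketched directly. The function name and the numeric values are illustrative; `f` is in pixels and `b` in meters.

```python
def triangulate_rectified(u_l, v_l, u_r, f, b):
    """Recover P(x, y, z) from one rectified stereo correspondence.

    Implements x = b*u_l/d, y = b*v_l/d, z = b*f/d with disparity
    d = u_l - u_r. Coordinates are relative to the left camera.
    """
    d = u_l - u_r
    x = b * u_l / d
    y = b * v_l / d
    z = b * f / d
    return x, y, z

x, y, z = triangulate_rectified(u_l=100.0, v_l=50.0, u_r=60.0, f=500.0, b=0.1)
# d = 40, so x = 0.25 m, y = 0.125 m, z = 1.25 m
```

As a consistency check, projecting the recovered point back into the left image (u = f·x/z, v = f·y/z) returns the original pixel (100, 50).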
Applications and Impact

Stereo vision serves as the foundation for numerous depth-sensing technologies and applications.
This fundamental approach to recovering 3D information from 2D projections continues to evolve with advancements in computational efficiency, matching algorithms, and integration with other sensing modalities.