2.4  Two-view geometry


The previous section described the geometry of a single camera. We now examine the case of two camera views capturing the same scene from two different viewpoints. Given two images, we are interested in estimating the 3D structure of the scene. The coordinates of a 3D point P can be estimated in two steps. First, given a selected pixel p1 in image I1, the position of the corresponding pixel p2 in image I2 is estimated. As in the previous section, a pair of corresponding points p1 and p2 is called a point-correspondence. The points p1 and p2 correspond because they are projections of the same point P onto the images I1 and I2. Second, the position of P is calculated by triangulation of the two corresponding points, using the geometry of the two cameras.

The geometry of the two cameras comprises the relative position and orientation of the cameras and the internal geometry of each individual camera. The underlying geometry that describes the relationship between both cameras is known as the epipolar geometry. The estimation of the epipolar geometry is known as weak calibration, as opposed to the strong calibration mentioned earlier in this chapter. The epipolar geometry addresses (among others) the following two aspects.

2.4.1  Epipolar geometry


Let us now describe the geometry of two images and define their mutual relation. Consider a 3D point P that is projected through the camera centers C1 and C2 onto two images at pixel positions p1 and p2, respectively (see Figure 2.9(a)).

Figure 2.9 Epipolar geometry. (a) The epipolar plane is defined by the point P and the two camera centers C1 and C2. (b) Two cameras and image planes with an indication of the terminology of epipolar geometry.


The 3D points P, C1, C2 and the projected points p1, p2 are all located within one common plane. This common plane, denoted π, is known as the epipolar plane. The epipolar plane is fully determined by the back-projected ray going through C1 and p1 and by the camera center C2. The fact that all of these points belong to the epipolar plane provides a constraint for searching point-correspondences. More specifically, given the image point p1, the corresponding point p2 must lie on the intersection of the plane π with the second image plane (denoted I2 in Figure 2.9(a)). The intersection of both planes is a line known as the epipolar line. Therefore, the search for point-correspondences can be limited to a search along the epipolar line instead of an exhaustive search in the image. Note that this constraint is independent of the scene structure and depends only on the epipolar geometry. The epipolar geometry can be described using a 3 × 3 rank-2 matrix, called the fundamental matrix F, which maps a point p1 to its epipolar line l2 in the second image through l2 = Fp1. An example of two views with the computed epipolar lines superimposed onto the images is given in Figure 2.10.
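As a minimal numerical sketch of the relation l2 = Fp1, the NumPy code below builds F from two synthetic cameras and checks that the projection p2 lies on the epipolar line of p1. The camera parameters and all helper names are hypothetical. The projection model λp = KR(P − C) is assumed to match the single-camera model of the previous section, and the closed form F = [e2]× P2 P1⁺ is a standard textbook construction rather than something taken from this section.

```python
import numpy as np

def projection_matrix(K, R, C):
    """3x4 matrix of the assumed pinhole model: lambda * p = K R (P - C)."""
    return K @ R @ np.hstack([np.eye(3), -C.reshape(3, 1)])

def skew(v):
    """Cross-product matrix: skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rotation_y(angle):
    """Rotation about the y axis (illustrative camera orientation)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def fundamental_from_cameras(P1, P2, C1):
    """F = [e2]_x P2 pinv(P1), where e2 is the image of C1 in the second view."""
    e2 = P2 @ np.append(C1, 1.0)
    return skew(e2) @ P2 @ np.linalg.pinv(P1)

# Hypothetical calibration: same intrinsics, slightly different poses.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
C1 = np.array([0.0, 0.0, 0.0])
C2 = np.array([1.0, 0.1, 0.05])
P1 = projection_matrix(K, rotation_y(0.05), C1)
P2 = projection_matrix(K, rotation_y(-0.10), C2)
F = fundamental_from_cameras(P1, P2, C1)

# Project one 3D point (homogeneous coordinates) into both views.
P = np.array([0.3, -0.2, 5.0, 1.0])
p1 = P1 @ P
p1 = p1 / p1[2]
p2 = P2 @ P
p2 = p2 / p2[2]

# Epipolar line of p1 in the second image and distance of p2 to that line.
l2 = F @ p1
distance = abs(l2 @ p2) / np.hypot(l2[0], l2[1])

# The smallest singular value of F vanishes (up to rounding): F is singular.
singular_values = np.linalg.svd(F, compute_uv=False)
print(distance, singular_values / singular_values[0])
```

The distance of p2 to the line l2 is zero up to rounding error, whatever the depth of P, which is the scene-independence noted above.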



Figure 2.10 Two views of a scene with 4 superimposed epipolar lines.


We now introduce some terminology related to the epipolar geometry employed further in this thesis.

As previously highlighted, three-dimensional structural information can be extracted from two images by exploiting the epipolar geometry: the structure is recovered by determining the correspondences between the two views, and each point-correspondence needs to be searched along the epipolar line only. To simplify the search, images are typically captured such that all epipolar lines are parallel and horizontal. In this case, the search for point-correspondences can be performed along the horizontal raster lines of both images. However, it is difficult to align and orient two cameras accurately enough for the epipolar lines to be parallel and horizontal. An alternative approach is to capture two views without alignment and orientation constraints and to transform both images so as to synthesize two novel views with parallel and horizontal epipolar lines. This procedure, called image rectification, is described in the next section.
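To illustrate why horizontal epipolar lines simplify the search, the sketch below looks for a correspondence along a single raster line using a sum-of-absolute-differences (SAD) window. The cost function, the window size, the sign convention (the match of column x is sought at column x − d) and the synthetic data are all assumptions made for illustration; the text above does not prescribe a particular matching criterion.

```python
import numpy as np

def match_along_row(left_row, right_row, x, half_window, max_disparity):
    """Return the shift d that minimises the SAD between a window centred
    at column x in left_row and a window centred at x - d in right_row."""
    ref = left_row[x - half_window : x + half_window + 1]
    best_d, best_cost = 0, np.inf
    for d in range(max_disparity + 1):
        xr = x - d
        if xr - half_window < 0:
            break  # the candidate window would leave the image
        cand = right_row[xr - half_window : xr + half_window + 1]
        cost = np.abs(ref - cand).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Synthetic raster line: the left row is the right row shifted by 7 pixels,
# i.e. left_row[x] == right_row[x - 7].
rng = np.random.default_rng(0)
right_row = rng.random(200)
true_disparity = 7
left_row = np.roll(right_row, true_disparity)

estimated = match_along_row(left_row, right_row, x=100,
                            half_window=4, max_disparity=20)
print(estimated)
```

The search visits at most max_disparity + 1 candidates on one line, instead of every pixel of the second image.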

2.4.2  Image rectification


Image rectification is the process of transforming two images I1 and I2 such that their epipolar lines are horizontal and parallel. This procedure is particularly useful for depth-estimation algorithms because the search of point-correspondences can be performed along horizontal raster image lines. Practically, the image-rectification operation corresponds to a virtual rotation of two cameras, so that they would become aligned. As illustrated by Figure 2.11, an image-rectification technique [33] consists of synthesizing a common image plane I′ and re-projecting the two images I1 and I2 onto this synthetic plane.

Figure 2.11 The image-rectification operation re-projects two images I1 and I2 onto a common synthetic image plane I′.


We now derive the transform between the original image I1 and the rectified image I1′. Consider a pixel p1 in the original image I1 and the corresponding pixel p1′ in the rectified image I1′ (illustrated by Figure 2.12). Without loss of generality, it is assumed that the camera is located at the origin of the world coordinate system.


Figure 2.12 A 3D point (X,Y,Z)T is projected onto the original and rectified image planes I1 and I1′ at pixel position p1 and p1′.


The projection of a 3D point (X,Y,Z)T onto the original and rectified images can be written as

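\[
\lambda_1 p_1 = K_1 R_1 \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
\quad \text{and} \quad
\lambda'_1 p'_1 = K' R' \begin{pmatrix} X \\ Y \\ Z \end{pmatrix},
\tag{2.43}
\]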

with R1, K1 being the rotation and intrinsic matrices of the original camera and R′, K′ those of the (virtual) rectified camera. Solving the first relation for (X,Y,Z)T and substituting the result into the second yields the transformation between the original and rectified image planes:

\[
\frac{\lambda'_1}{\lambda_1}\, p'_1 = \underbrace{K' R' R_1^{-1} K_1^{-1}}_{H_1}\, p_1 .
\tag{2.44}
\]

Similarly, the rectification of image I2 is performed using the relation

\[
\frac{\lambda'_2}{\lambda_2}\, p'_2 = \underbrace{K' R' R_2^{-1} K_2^{-1}}_{H_2}\, p_2 .
\tag{2.45}
\]

In Equation (2.44), the product of the four matrices is a single 3 × 3 matrix H1 that corresponds to a homography transform. We now provide details on the computation of the rotation matrix R′ and the intrinsic matrix K′.
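The relation of Equation (2.44) can be checked numerically. The following sketch projects one 3D point with an original and a rectified camera, both located at the origin, and verifies that the homography K′R′R1⁻¹K1⁻¹ maps p1 onto p1′ with scale factor λ1′/λ1. All matrices are arbitrary example values and the function names are hypothetical.

```python
import numpy as np

def rotation_x(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rotation_y(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rectifying_homography(K_new, R_new, K, R):
    """H = K' R' R^-1 K^-1; the inverse of a rotation is its transpose."""
    return K_new @ R_new @ R.T @ np.linalg.inv(K)

# Example (made-up) original and rectified camera parameters.
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 820.0, 240.0], [0.0, 0.0, 1.0]])
R1 = rotation_y(0.10) @ rotation_x(-0.05)
K_new = np.array([[810.0, 0.0, 320.0], [0.0, 810.0, 240.0], [0.0, 0.0, 1.0]])
R_new = rotation_y(-0.02)

X = np.array([0.4, -0.3, 6.0])      # 3D point, camera at the origin

q1 = K1 @ R1 @ X                    # lambda_1 * p_1
q1_new = K_new @ R_new @ X          # lambda'_1 * p'_1
p1 = q1 / q1[2]
p1_new = q1_new / q1_new[2]

H1 = rectifying_homography(K_new, R_new, K1, R1)
scaled = H1 @ p1                    # equals (lambda'_1 / lambda_1) * p'_1
mapped = scaled / scaled[2]

print(mapped, p1_new)
print(scaled[2], q1_new[2] / q1[2])
```

The check holds for any depth of X, which reflects the fact that a pure rotation about the camera center induces a mapping between the two image planes that does not depend on the scene.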

A. Calculation of matrix R′


A single rotation matrix R′ can rotate the two cameras towards the same direction. The rotation matrix R′ = (r′x,r′y,r′z)T can be calculated as follows.

Note that each vector is normalized by rk := rk∕||rk||, where ||rk|| represents the L2-Norm of the vector and k ∈{x,y,z}.
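One construction often used for this purpose is sketched below. It is an assumption made for illustration and not necessarily the construction intended here: r′x is taken along the baseline C2 − C1, r′y is orthogonal to r′x and to the optical axis of the first camera, and r′z is orthogonal to both. The sketch further assumes that the rows of a rotation matrix are the camera axes expressed in world coordinates, which follows from the assumed model λp = KR(P − C). Function names and numeric values are hypothetical.

```python
import numpy as np

def normalize(v):
    """r_k := r_k / ||r_k|| (L2 norm)."""
    return v / np.linalg.norm(v)

def rectifying_rotation(C1, C2, R1):
    """Rotation whose rows are the axes of the rectified cameras (assumed
    construction, see the lead-in)."""
    r_x = normalize(C2 - C1)                # new x axis along the baseline
    r_y = normalize(np.cross(R1[2], r_x))   # orthogonal to old optical axis and r_x
    r_z = normalize(np.cross(r_x, r_y))     # completes a right-handed frame
    return np.vstack([r_x, r_y, r_z])

# Example values: first camera slightly rotated about the y axis.
c, s = np.cos(0.05), np.sin(0.05)
R1 = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
C1 = np.array([0.0, 0.0, 0.0])
C2 = np.array([1.0, 0.1, 0.05])

R_new = rectifying_rotation(C1, C2, R1)

# R_new is a proper rotation and maps the baseline onto the x axis.
print(R_new @ R_new.T)
print(np.linalg.det(R_new))
print(R_new @ (C2 - C1))
```

Because the second and third rows of this R′ are orthogonal to the baseline, a 3D point has the same vertical coordinate and the same depth in both rectified camera frames, which is what makes the epipolar lines horizontal.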

B. Calculation of matrix K′


The intrinsic parameters of K′ must be equal for both image-rectification operations and can be defined arbitrarily, e.g., K′ = (K1 + K2)/2.

C. Calculation of the bounding box


Equation (2.44) shows that image rectification amounts to a homography transform. Because a homography maps a rectangle to a general quadrilateral of arbitrary size, the size of the output image must be calculated explicitly. The bounding box is calculated for both rectified images and the largest bounding box should be selected as the common bounding box. The upper-left and bottom-right corners of this bounding box are denoted (minx,miny) and (maxx,maxy) (see Figure 2.13). The width and height of the output image can then be written as w′ = maxx − minx and h′ = maxy − miny, respectively. Additionally, as illustrated at the corner points in Figure 2.13, some points with positive coordinates may be projected at negative pixel positions.

Figure 2.13 Bounding box calculation of the rectified images derived from a planar homography transform.


Consequently, we modify the homography transforms H1 and H2 to obtain new homographies H1′ and H2′ that map all input pixels to positive coordinates, where Hi′, for i ∈ {1, 2}, is defined by

\[
H'_i = T H_i
\quad \text{with} \quad
T = \begin{pmatrix} 1 & 0 & -\mathrm{min}_x \\ 0 & 1 & -\mathrm{min}_y \\ 0 & 0 & 1 \end{pmatrix}.
\tag{2.46}
\]

The matrix T corresponds to a translation matrix expressed in homogeneous coordinates.
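The bounding-box computation and the translation T can be sketched as follows. The two homographies are arbitrary example values, the helper names are hypothetical, and the "largest bounding box" is interpreted here as the box that encloses the warped corners of both images, which is an assumption. Warping only the four corners is sufficient as long as no image point is mapped to infinity.

```python
import numpy as np

def warp_points(H, pts):
    """Apply the homography H to an (N, 2) array of pixel positions."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def image_corners(width, height):
    return np.array([[0, 0], [width - 1, 0],
                     [width - 1, height - 1], [0, height - 1]], dtype=float)

def common_bounding_box(homographies, width, height):
    """(min_x, min_y), (max_x, max_y) over the warped corners of all images."""
    corners = image_corners(width, height)
    warped = np.vstack([warp_points(H, corners) for H in homographies])
    return warped.min(axis=0), warped.max(axis=0)

def translation(min_xy):
    """Matrix T of Equation (2.46)."""
    T = np.eye(3)
    T[:2, 2] = -min_xy
    return T

# Example (made-up) rectification homographies for 640 x 480 images.
H1 = np.array([[1.00, 0.05, -30.0], [0.02, 1.00, -12.0], [1e-5, 2e-5, 1.0]])
H2 = np.array([[0.98, -0.03, 15.0], [-0.01, 1.02, 20.0], [-2e-5, 1e-5, 1.0]])

min_xy, max_xy = common_bounding_box([H1, H2], 640, 480)
out_width, out_height = max_xy - min_xy          # w' and h'

T = translation(min_xy)
H1_new, H2_new = T @ H1, T @ H2

# After the translation, all warped corners have non-negative coordinates.
shifted = np.vstack([warp_points(H, image_corners(640, 480))
                     for H in (H1_new, H2_new)])
print(min_xy, out_width, out_height)
print(shifted.min(axis=0))
```

Using one T for both images preserves the row alignment between the two rectified views, since both are shifted by the same vertical offset.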

D. Algorithm summary


The rectification homographies H′1 and H′2 can be defined as
\[
H'_1 = T K' R' R_1^{-1} K_1^{-1}
\quad \text{and} \quad
H'_2 = T K' R' R_2^{-1} K_2^{-1},
\tag{2.47}
\]

with T, R′, K′ defined as in the previous paragraphs. As an example, Figure 2.14 depicts two rectified images with two superimposed horizontal epipolar lines.
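As an end-to-end check of Equation (2.47), the sketch below rectifies the projections of one 3D point in two synthetic views and compares their row coordinates. Several ingredients are assumptions: the projection model λp = KR(P − C), the baseline-aligned construction of R′ (a common choice that is not necessarily the one used here), the fixed example offsets inside T that stand in for −minx and −miny, and all numeric camera parameters.

```python
import numpy as np

def rotation_y(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rectifying_rotation(C1, C2, R1):
    """Assumed construction: x axis along the baseline, rows normalised."""
    r_x = (C2 - C1) / np.linalg.norm(C2 - C1)
    r_y = np.cross(R1[2], r_x)
    r_y = r_y / np.linalg.norm(r_y)
    r_z = np.cross(r_x, r_y)
    return np.vstack([r_x, r_y, r_z / np.linalg.norm(r_z)])

def project(K, R, C, P):
    """Pixel position of P under the assumed model lambda * p = K R (P - C)."""
    q = K @ R @ (P - C)
    return q / q[2]

def apply_homography(H, p):
    q = H @ p
    return q / q[2]

# Two hypothetical cameras with different intrinsics and orientations.
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[820.0, 0.0, 330.0], [0.0, 815.0, 235.0], [0.0, 0.0, 1.0]])
R1, R2 = rotation_y(0.05), rotation_y(-0.08)
C1, C2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.1, 0.05])

R_new = rectifying_rotation(C1, C2, R1)
K_new = (K1 + K2) / 2
T = np.array([[1.0, 0.0, 25.0], [0.0, 1.0, 40.0], [0.0, 0.0, 1.0]])  # example offsets

H1_new = T @ K_new @ R_new @ R1.T @ np.linalg.inv(K1)   # R1^-1 == R1^T
H2_new = T @ K_new @ R_new @ R2.T @ np.linalg.inv(K2)

P = np.array([0.3, -0.2, 5.0])
p1, p2 = project(K1, R1, C1, P), project(K2, R2, C2, P)
r1, r2 = apply_homography(H1_new, p1), apply_homography(H2_new, p2)

row_difference_before = abs(p1[1] - p2[1])
row_difference_after = abs(r1[1] - r2[1])
print(row_difference_before, row_difference_after)
```

Before rectification the two projections lie on different rows; afterwards they share the same row up to rounding error, so the correspondence can be searched along a single raster line.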



Figure 2.14 Two rectified images with two superimposed horizontal epipolar lines.