2.4 Two-view geometry
In the previous section, we described the geometry of a single camera. We now examine the case of two cameras capturing the same scene from different viewpoints. Given two images, we are interested in estimating the 3D structure of the scene. The coordinates of a 3D point P can be estimated in two steps. First, given a selected pixel p_{1} in image I_{1}, the position of the corresponding pixel p_{2} in image I_{2} is estimated. As in the previous section, such a pair of corresponding points p_{1} and p_{2} is called a point-correspondence; the correspondence arises because p_{1} and p_{2} are projections of the same point P onto the images I_{1} and I_{2}, respectively. Second, the position of P is calculated by triangulating the two corresponding points, using the geometry of the two cameras. This geometry comprises the relative position and orientation of the two cameras, as well as the internal geometry of each individual camera. The underlying geometry that describes the relationship between both cameras is known as the epipolar geometry. The estimation of the epipolar geometry is known as weak calibration, as opposed to the strong calibration mentioned earlier in this chapter. The epipolar geometry addresses, among others, the following two aspects.
- Geometry of point-correspondence: given a point in one image, the epipolar geometry constrains the position of the corresponding point in the other image.
- Scene geometry: given point-correspondences and the epipolar geometry of both cameras, a description of the scene structure can be recovered.
2.4.1 Epipolar geometry
Let us now describe the geometry of two images and define their mutual relation. Consider a 3D point P that is projected through the camera centers C_{1} and C_{2} onto two images at pixel positions p_{1} and p_{2}, respectively (see Figure 2.9(a)).
Obviously, the 3D point P, the camera centers C_{1}, C_{2} and the projected points p_{1}, p_{2} are all located within one common plane. This common plane, denoted π, is known as the epipolar plane. The epipolar plane is fully determined by the back-projected ray going through C_{1} and p_{1} and the camera center C_{2}. The property that the previously specified points belong to the epipolar plane provides a constraint for searching point-correspondences. More specifically, considering the image point p_{1}, the corresponding point p_{2} lies on the intersection of the plane π with the second image plane (denoted within I_{2} in Figure 2.9(a)). The intersection of both planes corresponds to a line known as the epipolar line. Therefore, the search for point-correspondences can be limited to a search along the epipolar line instead of an exhaustive search over the whole image. Additionally, it is interesting to note that this constraint is independent of the scene structure and uniquely relies on the epipolar geometry. The epipolar geometry can be described by a 3 × 3 rank-2 matrix, called the fundamental matrix F, which maps a point in the first image to its epipolar line in the second image, i.e., l_{2} = Fp_{1}. An example of two views with the computed epipolar lines superimposed onto the images is given in Figure 2.10.
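As a small numerical sketch of this constraint, the code below builds the epipolar line l_{2} = Fp_{1} and verifies that a point on that line satisfies the epipolar constraint p_{2}^{T}Fp_{1} = 0. The matrix F and the pixel p_{1} are invented illustrative values (a skew-symmetric matrix, which has rank 2), not quantities estimated from real images:

```python
import numpy as np

# Hypothetical rank-2 fundamental matrix (skew-symmetric example) and a
# hypothetical pixel p1 in image I1, in homogeneous coordinates.
F = np.array([[ 0.0, -0.1,  0.2],
              [ 0.1,  0.0, -0.3],
              [-0.2,  0.3,  0.0]])
p1 = np.array([10.0, 20.0, 1.0])

# Epipolar line in I2: l2 = F p1, with line equation a*x + b*y + c = 0
l2 = F @ p1
a, b, c = l2

# Pick any point on the epipolar line (solve the line equation for y)
x = 5.0
p2 = np.array([x, -(a * x + c) / b, 1.0])

# The corresponding point must satisfy the epipolar constraint p2^T F p1 = 0
assert abs(p2 @ F @ p1) < 1e-9
```

This illustrates why the correspondence search reduces from a 2D search over the image to a 1D search along the line l_{2}.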
We now introduce some terminology related to the epipolar geometry employed further in this thesis.
- The epipolar plane is the plane defined by a 3D point and the two camera centers.
- The epipolar line is the line determined by the intersection of the image plane with the epipolar plane.
- The baseline is the line going through the two camera centers.
- The epipole is the image point determined by the intersection of the image plane with the baseline. Equivalently, the epipole corresponds to the projection of one camera center (say C_{1}) onto the image plane of the other camera (say I_{2}), or vice versa.
As previously highlighted, three-dimensional structural information can be extracted from two images by exploiting the epipolar geometry. It was shown that the three-dimensional structure can be extracted by determining point-correspondences in the two views, and that these correspondences can be searched along the epipolar lines only. To simplify the search, images are typically captured such that all epipolar lines are parallel and horizontal. In this case, the search for point-correspondences can be performed along the horizontal raster lines of both images. However, it is difficult to accurately align and orient the two cameras such that the epipolar lines are parallel and horizontal. An alternative approach is therefore to capture two views without alignment and orientation constraints, and to transform both images so as to synthesize two novel views with parallel and horizontal epipolar lines. This procedure, called image rectification, is described in the next section.
2.4.2 Image rectification
Image rectification is the process of transforming two images I_{1} and I_{2} such that their epipolar lines are horizontal and parallel. This procedure is particularly useful for depth-estimation algorithms, because the search for point-correspondences can then be performed along horizontal raster lines of the images. Practically, the image-rectification operation corresponds to a virtual rotation of the two cameras such that their image planes become aligned. As illustrated by Figure 2.11, an image-rectification technique [33] consists of synthesizing a common image plane I′ and reprojecting the two images I_{1} and I_{2} onto this synthetic plane.
Figure 2.11 The image-rectification operation reprojects two images I_{1} and I_{2} onto a common synthetic image plane I′.

We now derive the transform between the original image I_{1} and the rectified image I_{1}′. Let us consider a pixel p_{1} in I_{1} and its projection p_{1}′ in the rectified image I_{1}′ (illustrated by Figure 2.12). Without loss of generality, it is assumed that the camera is located at the origin of the world coordinate system.

The projection of a 3D point (X,Y,Z)^{T } onto the original and rectified images can be written as
 p_{1} = K_{1}R_{1}(X,Y,Z)^{T } and p_{1}′ = K′R′(X,Y,Z)^{T },  (2.43)
with R_{1},K_{1} and R′,K′ being the original and (virtual) rectification matrices, respectively. By recombining both previous relations, we obtain the transformation between the original and rectified image planes defined as
 p_{1}′ = K′R′(K_{1}R_{1})^{−1}p_{1} = K′R′R_{1}^{−1}K_{1}^{−1}p_{1}.  (2.44)
Similarly, the rectification of image I_{2} is performed using the relation
 p_{2}′ = K′R′(K_{2}R_{2})^{−1}p_{2} = K′R′R_{2}^{−1}K_{2}^{−1}p_{2}.  (2.45)
When observing Equation (2.44), it can be noted that the four matrices can be combined into a single 3 × 3 matrix that corresponds to a homography transform. We now provide details on the computation of the rotation matrix R′ and the intrinsic matrix K′.
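The collapse of the four matrices into one homography can be sketched numerically as follows. The intrinsic matrix K_{1} and the small virtual rotation R′ are illustrative values only, standing in for the calibrated and computed matrices of Equation (2.44):

```python
import numpy as np

# Hypothetical original camera: intrinsics K1 and rotation R1
K1 = np.array([[800.0,   0.0, 320.0],
               [  0.0, 800.0, 240.0],
               [  0.0,   0.0,   1.0]])
R1 = np.eye(3)

# Hypothetical virtual camera: same intrinsics K', small rotation R'
# (2 degrees about the y-axis) standing in for the rectifying rotation.
Kp = K1.copy()
t = np.deg2rad(2.0)
Rp = np.array([[ np.cos(t), 0.0, np.sin(t)],
               [ 0.0,       1.0, 0.0      ],
               [-np.sin(t), 0.0, np.cos(t)]])

# Equation (2.44): the four matrices combine into one 3x3 homography
H1 = Kp @ Rp @ np.linalg.inv(K1 @ R1)

# Map a pixel of I1 into the rectified image I1' and dehomogenize
p1 = np.array([320.0, 240.0, 1.0])
p1_rect = H1 @ p1
p1_rect /= p1_rect[2]
```

A single 3 × 3 matrix H_{1} thus suffices to warp the whole image, which is why rectification can be implemented as one homography per view.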
2.4.2.0 A. Calculation of matrix R′
A single rotation matrix R′ can rotate the two cameras towards the same direction. The rotation matrix R′ = (r′^{x},r′^{y},r′^{z})^{T } can be calculated as follows.
- The row r′^{x} is defined parallel to the baseline going through the two camera centers C_{1} and C_{2}, leading to r′^{x} = (C_{2} − C_{1})∕∥C_{2} − C_{1}∥.
- The row r′^{y} can be chosen as r′^{y} = r^{z} × r′^{x}, where r^{z} is the third row vector of the rotation matrix R_{1}. In this case, the new axis r′^{y} is perpendicular to both the new r′^{x} and the old r^{z} of matrix R_{1}.
- The row r′^{z} is defined orthogonal to r′^{x} and r′^{y} such that r′^{z} = r′^{x} × r′^{y}.
Note that each vector is normalized by r^{k} := r^{k}∕∥r^{k}∥, where ∥r^{k}∥ represents the L2-norm of the vector and k ∈{x,y,z}.
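The three steps above can be sketched as follows; the camera centers C_{1}, C_{2} and the rotation R_{1} are illustrative values, and the code simply verifies that the construction yields a proper rotation matrix:

```python
import numpy as np

# Hypothetical camera centers and original rotation (rows are x, y, z axes)
C1 = np.array([0.0, 0.0, 0.0])
C2 = np.array([1.0, 0.1, 0.0])
R1 = np.eye(3)

# Row r'^x: unit vector along the baseline C1 -> C2
rx = (C2 - C1) / np.linalg.norm(C2 - C1)

# Row r'^y: perpendicular to the old z-axis (third row of R1) and to r'^x
ry = np.cross(R1[2], rx)
ry /= np.linalg.norm(ry)

# Row r'^z: completes the orthonormal, right-handed triad
rz = np.cross(rx, ry)

# Stack the rows into R' = (r'^x, r'^y, r'^z)^T
Rp = np.vstack([rx, ry, rz])

# R' must be a proper rotation: orthonormal with determinant +1
assert np.allclose(Rp @ Rp.T, np.eye(3))
assert np.isclose(np.linalg.det(Rp), 1.0)
```

Because r′^{x} follows the baseline, both rectified optical axes become perpendicular to the baseline, which is exactly the condition for horizontal epipolar lines.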
2.4.2.0 B. Calculation of matrix K′
The intrinsic parameters of K′ must be equal for both image-rectification operations and can otherwise be defined arbitrarily, e.g., as the average K′ = (K_{1} + K_{2})∕2.
2.4.2.0 C. Calculation of the bounding box
Basically, Equation (2.44) means that the image rectification corresponds to a homography transform. Because a homography transforms rectangles into quadrilaterals of arbitrary size, the size of the output image needs to be calculated. The bounding-box calculation needs to be carried out for both rectified images, and the largest bounding box should be selected as the common bounding box. The upper-left and bottom-right corners of the two rectified images are denoted (min_{x},min_{y}) and (max_{x},max_{y}), respectively (see Figure 2.13). Subsequently, the width and height of the output image can be written as w′ = max_{x} − min_{x} and h′ = max_{y} − min_{y}, respectively. Additionally, as illustrated by the corner points in Figure 2.13, some points with positive input coordinates may be projected onto negative pixel positions.

Consequently, we modify the homography transforms H_{1} and H_{2} to obtain new homographies H_{1}′ and H_{2}′ that map all input pixels to positive coordinates, where H_{i}′, for i ∈{1, 2}, is defined by
 H_{i}′ = TH_{i}.  (2.46)
The matrix T corresponds to a translation matrix expressed in homogeneous coordinates.
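The bounding-box and translation steps can be sketched as follows. The homography H_{1} and the input image size are hypothetical values; the code maps the four image corners, derives the output size w′ × h′, and builds the translation T that shifts all pixels to positive coordinates:

```python
import numpy as np

# Hypothetical rectifying homography and input image size
H1 = np.array([[1.0,  0.05, -30.0],
               [0.02, 1.0,  -20.0],
               [0.0,  0.0,    1.0]])
w, h = 640, 480

# Map the four image corners through the homography and dehomogenize
corners = np.array([[0, 0, 1], [w, 0, 1], [0, h, 1], [w, h, 1]], dtype=float)
mapped = (H1 @ corners.T).T
mapped /= mapped[:, 2:3]

# Bounding box of the warped quadrilateral
min_x, min_y = mapped[:, 0].min(), mapped[:, 1].min()
max_x, max_y = mapped[:, 0].max(), mapped[:, 1].max()
wp, hp = max_x - min_x, max_y - min_y   # output size w' and h'

# Translation T in homogeneous coordinates; H1' = T H1 (Equation (2.46))
T = np.array([[1.0, 0.0, -min_x],
              [0.0, 1.0, -min_y],
              [0.0, 0.0,   1.0]])
H1p = T @ H1

# After the shift, no corner maps to a negative coordinate
shifted = (H1p @ corners.T).T
shifted /= shifted[:, 2:3]
assert shifted[:, :2].min() >= -1e-9
```

In practice the corners of both rectified images would be mapped, and the common bounding box taken over all eight warped corners, as described above.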
2.4.2.0 D. Algorithm summary
The rectification homographies H_{1}′ and H_{2}′ can be defined as
 H_{i}′ = TK′R′(K_{i}R_{i})^{−1}, for i ∈{1, 2},  (2.47)
with T, R′, K′ defined as in the previous paragraphs. As an example, Figure 2.14 depicts two rectified images with two superimposed horizontal epipolar lines.
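The whole algorithm can be gathered into one short sketch. All camera parameters are illustrative; K′ is taken as the average of K_{1} and K_{2}, and T is taken as the identity (i.e., the bounding-box shift is omitted) to keep the example minimal. As a sanity check, two cameras that are already aligned and displaced purely along the x-axis yield identity homographies:

```python
import numpy as np

def rectify_homographies(K1, R1, C1, K2, R2, C2):
    """Sketch of Equation (2.47): H_i' = T K' R' (K_i R_i)^{-1}.
    Averaging K1, K2 for K' and using T = identity are illustrative choices."""
    # A. Virtual rotation R': new x-axis along the baseline
    rx = (C2 - C1) / np.linalg.norm(C2 - C1)
    ry = np.cross(R1[2], rx)
    ry /= np.linalg.norm(ry)
    rz = np.cross(rx, ry)
    Rp = np.vstack([rx, ry, rz])
    # B. Common virtual intrinsics K' (must be identical for both views)
    Kp = 0.5 * (K1 + K2)
    # C./D. Combine into the rectifying homographies; bounding-box shift
    # T is omitted here (identity) for brevity.
    T = np.eye(3)
    H1 = T @ Kp @ Rp @ np.linalg.inv(K1 @ R1)
    H2 = T @ Kp @ Rp @ np.linalg.inv(K2 @ R2)
    return H1, H2

# Sanity check: two identical, already-aligned cameras separated along x
K = np.array([[700.0, 0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
H1, H2 = rectify_homographies(K, np.eye(3), np.zeros(3),
                              K, np.eye(3), np.array([1.0, 0.0, 0.0]))
```

For this already-rectified configuration, both homographies reduce to the identity, confirming that the algorithm leaves an aligned stereo pair unchanged.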