2.3 Camera calibration
Camera calibration involves the estimation of both extrinsic and intrinsic camera parameters. This section does not present a new calibration algorithm, but instead briefly describes a parameter-estimation technique for calibrating a multi-view camera setup. To compute the camera parameters, a practical technique [108] based on a calibration rig is employed. As with the correction of lens distortion, the estimation of camera parameters relies on a calibration rig with known geometry, such as a checkerboard pattern. Using various perspective views of the checkerboard, the algorithm estimates the position, orientation and internal parameters of the camera. The process of estimating all camera parameters is known as strong calibration, which is addressed in the remainder of this section.
2.3.1 Camera parameters estimation
Let us consider a planar checkerboard that defines a right-handed 3D world coordinate system (see Figure 2.8). The first stage of the camera calibration procedure is to establish correspondences between 2D points in the image and 3D points on the checkerboard, i.e., so-called point-correspondences. In practice, a robust extraction of point-correspondences can be performed by exploiting the topological structure of the checkerboard pattern [90].

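Because each 3D feature point lies in the board plane Z = 0, the projection of each 3D point (X,Y,0)^{T} onto the image plane at pixel position (x,y)^{T} can be written, with λ a homogeneous scaling factor, as
\[ \lambda \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = K \begin{pmatrix} r_1 & r_2 & r_3 & t \end{pmatrix} \begin{pmatrix} X \\ Y \\ 0 \\ 1 \end{pmatrix}, \tag{2.23} \]
which can be reformulated to
\[ \lambda \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = K \begin{pmatrix} r_1 & r_2 & t \end{pmatrix} \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix}, \tag{2.24} \]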
where r_{i} corresponds to the i^{th} column of the rotation matrix R and t = -RC. Because the matrix product K[r_{1} r_{2} t] is a 3 × 3 matrix, it corresponds to a so-called planar homography transform, which is typically denoted as a matrix H. A planar homography transform is a non-singular linear relation between two planes [37]. In our case, the homography transform defines a mapping, linear in homogeneous coordinates, of points between the planar checkerboard (3D position) and the image plane (pixel position). The calculation of the camera parameters requires the estimation of the homography transform H:
\[ \lambda \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = H \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix}, \qquad H = K \begin{pmatrix} r_1 & r_2 & t \end{pmatrix}. \tag{2.25} \]
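The relation between a camera pose and the planar homography can be illustrated with a minimal NumPy sketch. The function names `homography_from_pose` and `apply_homography` are hypothetical and only serve this illustration; the sketch assumes that K, R and t are already known, which is the reverse of the calibration problem treated in this section.

```python
import numpy as np

def homography_from_pose(K, R, t):
    """Build H = K [r1 r2 t] for points lying in the board plane Z = 0."""
    return K @ np.column_stack([R[:, 0], R[:, 1], t])

def apply_homography(H, X, Y):
    """Map a board point (X, Y) to pixel coordinates (x, y)."""
    q = H @ np.array([X, Y, 1.0])
    # Divide by the third component to remove the homogeneous scale factor.
    return q[:2] / q[2]
```

For a board point (X, Y), `apply_homography` should return the same pixel position as the full projection of (X, Y, 0), because the third column of R is multiplied by Z = 0.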
2.3.1.0 A. Homography transform estimation
The homography H can be estimated by detecting point-correspondences p_{i} ↔ P_{i}′, with p_{i} = (x_{i},y_{i},1)^{T} being the pixel position and P_{i}′ = (X_{i},Y_{i},1)^{T} being the world coordinates of the corresponding feature point of index i on the checkerboard. Using the previously introduced notation, Equation (2.25) may be written as
\[ \lambda p_i = H P_i'. \tag{2.26} \]
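In order to eliminate the scaling factor, one can take the cross product of both sides of Equation (2.26) with p_{i}, leading to
\[ p_i \times (\lambda p_i) = p_i \times (H P_i'), \tag{2.27} \]
and since p_{i} × p_{i} = 0_{3}, Equation (2.27) may be written as
\[ p_i \times (H P_i') = 0_3. \tag{2.28} \]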
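Writing the matrix product HP_{i}′ as
\[ H P_i' = \begin{pmatrix} h^{1T} P_i' \\ h^{2T} P_i' \\ h^{3T} P_i' \end{pmatrix}, \tag{2.29} \]
with h^{mT} being the transpose of the m^{th} row of H, we rework Equation (2.28) into
\[ \begin{pmatrix} y_i h^{3T} P_i' - h^{2T} P_i' \\ h^{1T} P_i' - x_i h^{3T} P_i' \\ x_i h^{2T} P_i' - y_i h^{1T} P_i' \end{pmatrix} = 0_3. \tag{2.30} \]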
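Since Equation (2.30) is linear in h^{mT} and noting that h^{mT}P_{i}′ = P_{i}^{′T}h^{m}, Equation (2.30) may be reformulated as
\[ \begin{pmatrix} 0_3^T & -P_i'^T & y_i P_i'^T \\ P_i'^T & 0_3^T & -x_i P_i'^T \\ -y_i P_i'^T & x_i P_i'^T & 0_3^T \end{pmatrix} \begin{pmatrix} h^1 \\ h^2 \\ h^3 \end{pmatrix} = 0_3. \tag{2.31} \]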
Note that the rows of the previous matrix are not linearly independent: up to sign, the third row is the sum of x_{i} times the first row and y_{i} times the second row. Thus, for each point-correspondence, Equation (2.31) provides two linearly independent equations. The first two rows are typically used for solving H. Because the homography transform is written using homogeneous coordinates, the homography H is defined using 8 parameters plus a free 9^{th} homogeneous scaling factor. Therefore, at least 4 point-correspondences providing 8 equations are required to compute the homography. In practice, a larger number of correspondences is employed so that an overdetermined linear system is obtained. By rewriting H in a vector form as h = [h_{11},h_{12},h_{13},h_{21},h_{22},h_{23},h_{31},h_{32},h_{33}]^{T}, n point-correspondences enable the construction of a 2n × 9 linear system, which is expressed by (i corresponds to the index of the feature point, hence i = 1 for the first two rows, i = 2 for the second two rows, etc.)
\[ C h = \begin{pmatrix} 0 & 0 & 0 & -X_1 & -Y_1 & -1 & y_1 X_1 & y_1 Y_1 & y_1 \\ X_1 & Y_1 & 1 & 0 & 0 & 0 & -x_1 X_1 & -x_1 Y_1 & -x_1 \\ \vdots & & & & \vdots & & & & \vdots \\ 0 & 0 & 0 & -X_n & -Y_n & -1 & y_n X_n & y_n Y_n & y_n \\ X_n & Y_n & 1 & 0 & 0 & 0 & -x_n X_n & -x_n Y_n & -x_n \end{pmatrix} h = 0_{2n}. \tag{2.32} \]
Solving this linear system involves the calculation of a Singular Value Decomposition (SVD). The SVD factorizes the matrix as C = UDV^{T}, where the solution h corresponds to the last column of the matrix V, i.e., the right singular vector associated with the smallest singular value. To avoid numerical instabilities, the coordinates of the point-correspondences should be normalized. This technique is known as the normalized Direct Linear Transformation (DLT) algorithm [37]. It should be noted that a nonlinear refinement of the solution may be performed afterwards. However, such a refinement does not yield a significant improvement and, moreover, it involves the calculation of a Jacobian matrix, required for the nonlinear Levenberg-Marquardt optimization.
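The normalized DLT procedure described above can be sketched in NumPy as follows. This is a minimal illustration and not the implementation used in this work: the function names are hypothetical, and the normalization (centroid moved to the origin, mean distance to the origin scaled to √2) is the commonly used choice, assumed here because the text does not specify one.

```python
import numpy as np

def normalize_points(pts):
    """Return homogeneous normalized points and the 3x3 similarity T used.

    Assumed normalization: centroid at the origin, mean distance sqrt(2).
    """
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T

def estimate_homography(board_xy, image_xy):
    """Estimate H with lambda * p_i = H P_i' from n >= 4 correspondences."""
    P, T_board = normalize_points(board_xy)
    p, T_image = normalize_points(image_xy)
    rows = []
    for (X, Y, W), (x, y, w) in zip(P, p):
        # Two linearly independent rows per correspondence (w = W = 1 here).
        rows.append([0.0, 0.0, 0.0, -X, -Y, -W, y * X, y * Y, y * W])
        rows.append([X, Y, W, 0.0, 0.0, 0.0, -x * X, -x * Y, -x * W])
    C = np.asarray(rows)
    # The solution h is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(C)
    H_norm = Vt[-1].reshape(3, 3)
    # Undo the normalization of both point sets.
    H = np.linalg.inv(T_image) @ H_norm @ T_board
    # Fix the free scale; assumes H[2, 2] is not (close to) zero.
    return H / H[2, 2]
```

With noise-free correspondences, `estimate_homography` should recover the generating homography up to numerical precision; with noisy detections it returns the algebraic least-squares solution.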
2.3.1.0 B. Calculation of intrinsic parameters
Assuming that the homography transform H is calculated, the intrinsic parameters can be recovered using the a-priori knowledge that r_{1} and r_{2} are orthonormal. Denoting the homography transform as H = [h_{1} h_{2} h_{3}], it can be derived from Equation (2.24) that [h_{1} h_{2}] = λ′K ⋅ [r_{1} r_{2}], which is equivalent to K^{-1} ⋅ [h_{1} h_{2}] = λ′[r_{1} r_{2}], where λ′ is a homogeneous factor. Because the columns r_{1} and r_{2} of the rotation matrix R are orthonormal, they are perpendicular to each other and have equal norm, so that
\[ \begin{aligned} h_1^T K^{-T} K^{-1} h_2 &= 0, \\ h_1^T K^{-T} K^{-1} h_1 &= h_2^T K^{-T} K^{-1} h_2. \end{aligned} \tag{2.33} \]
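The internal term K^{-T}K^{-1} appears regularly; for further use it is defined as the symmetric matrix B and can be written as
\[ B = K^{-T} K^{-1} = \begin{pmatrix} B_{11} & B_{12} & B_{13} \\ B_{12} & B_{22} & B_{23} \\ B_{13} & B_{23} & B_{33} \end{pmatrix}. \tag{2.34} \]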
This matrix covers the intrinsic parameters of the camera, such as the focal length α = f, the aspect-ratio-related parameter β = ηf and the skew γ = τ. This description corresponds to the notation adopted in the literature [108]. By writing B in a vector form and omitting the duplicate terms that result from the symmetry of B, we find a reduced vector b with only 6 terms, thus b = [B_{11},B_{12},B_{22},B_{13},B_{23},B_{33}]^{T}. Both equations in (2.33) consist of terms of the form h_{k}^{T}Bh_{l}: the bottom equation uses equal indices (k = l) and the top equation uses different indices. The general expression h_{k}^{T}Bh_{l} can be written as
\[ h_k^T B h_l = v_{kl}^T b, \tag{2.35} \]
with
\[ v_{kl} = \begin{pmatrix} h_{k1} h_{l1} \\ h_{k1} h_{l2} + h_{k2} h_{l1} \\ h_{k2} h_{l2} \\ h_{k3} h_{l1} + h_{k1} h_{l3} \\ h_{k3} h_{l2} + h_{k2} h_{l3} \\ h_{k3} h_{l3} \end{pmatrix}, \tag{2.36} \]
and h_{k} being the k^{th} column vector of H with h_{k} = [h_{k1},h_{k2},h_{k3}]^{T }. For each homography or image of the checkerboard, Equation (2.33) can be written as
\[ \begin{pmatrix} v_{12}^T \\ (v_{11} - v_{22})^T \end{pmatrix} b = 0_2. \tag{2.37} \]
Note that the vector b which summarizes the intrinsic parameters, is a 6element vector so that 6 equations are necessary to recover all camera parameters. Therefore, since each homography provides 2 linear equations, at least 3 homographies or captured images are sufficient. Assuming that n homography transforms are known, the linear system composed of n instances of Equation (2.37) can be written as
\[ V b = 0_{2n}, \tag{2.38} \]
with V referring to a 2n × 6 matrix. This linear system can be solved by employing the standard technique of Singular Value Decomposition. Using the computed solution vector b, the intrinsic parameters can be derived [108] as follows. The computed vector b = [B_{11},B_{12},B_{22},B_{13},B_{23},B_{33}]^{T} is expanded into the symmetric matrix B by inserting again the omitted symmetric matrix elements. Because b is only determined up to scale, the matrix B is written as B = λK^{-T}K^{-1} (see Equation (2.34)), and we finally find the intrinsic parameters as
 (2.39) 
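The recovery of the intrinsic parameters from a set of homographies can be sketched as follows. This is a hedged illustration rather than the implementation used in this work: the function names are hypothetical, the principal point is denoted `u0`, `v0` (an assumption, since this section does not name it), and the closed-form expressions are the ones commonly used with this type of plane-based calibration, obtained by solving B = λK^{-T}K^{-1} for an upper-triangular K.

```python
import numpy as np

def v_kl(H, k, l):
    """Vector v_kl such that h_k^T B h_l = v_kl^T b (0-based column indices)."""
    hk, hl = H[:, k], H[:, l]
    return np.array([
        hk[0] * hl[0],
        hk[0] * hl[1] + hk[1] * hl[0],
        hk[1] * hl[1],
        hk[2] * hl[0] + hk[0] * hl[2],
        hk[2] * hl[1] + hk[1] * hl[2],
        hk[2] * hl[2],
    ])

def intrinsics_from_homographies(homographies):
    """Estimate K from at least 3 homographies of a planar pattern."""
    V = []
    for H in homographies:
        V.append(v_kl(H, 0, 1))                  # h_1^T B h_2 = 0
        V.append(v_kl(H, 0, 0) - v_kl(H, 1, 1))  # h_1^T B h_1 = h_2^T B h_2
    _, _, Vt = np.linalg.svd(np.asarray(V))
    b = Vt[-1]
    # b is defined up to scale; choose the sign for which B11 > 0.
    if b[0] < 0:
        b = -b
    B11, B12, B22, B13, B23, B33 = b
    # Closed-form solution; u0, v0 denote the principal point (assumed names).
    v0 = (B12 * B13 - B11 * B23) / (B11 * B22 - B12 ** 2)
    lam = B33 - (B13 ** 2 + v0 * (B12 * B13 - B11 * B23)) / B11
    alpha = np.sqrt(lam / B11)
    beta = np.sqrt(lam * B11 / (B11 * B22 - B12 ** 2))
    gamma = -B12 * alpha ** 2 * beta / lam
    u0 = gamma * v0 / beta - B13 * alpha ** 2 / lam
    return np.array([[alpha, gamma, u0],
                     [0.0, beta, v0],
                     [0.0, 0.0, 1.0]])
```

The sketch assumes noise-free or well-conditioned input. With noisy homographies, the terms under the square roots can become negative, in which case the estimate should be rejected or more views should be added.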
2.3.1.0 C. Calculation of extrinsic parameters
The position and orientation of the camera can be recovered from Equation (2.24) and written as
\[ r_1 = \lambda_1 K^{-1} h_1, \qquad r_2 = \lambda_2 K^{-1} h_2, \qquad t = \lambda_3 K^{-1} h_3. \tag{2.40} \]
Given the vectors r_{1} and r_{2} and bearing in mind that the rotation matrix is orthogonal, the third vector r_{3} to complete the matrix R is found by
\[ r_3 = r_1 \times r_2, \tag{2.41} \]
where × denotes the cross product. The scaling parameters are calculated by λ_{1} = 1∕∥K^{-1}h_{1}∥, λ_{2} = 1∕∥K^{-1}h_{2}∥ and λ_{3} = (λ_{1} + λ_{2})∕2. Theoretically, we have λ_{1} = λ_{2}. However, due to inaccuracies in the feature-point estimation procedure, both terms may differ and have to be treated separately.
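The extrinsic-parameter computation can be sketched as below. The function name is hypothetical. Two steps are assumptions that go beyond the text: the sign of the homography is chosen such that the board lies in front of the camera (positive third component of t), and the camera centre is returned as C = -R^{T}t, which follows from t = -RC.

```python
import numpy as np

def extrinsics_from_homography(K, H):
    """Recover R, t and the camera centre C from a homography H and known K."""
    K_inv = np.linalg.inv(K)
    a1, a2, a3 = K_inv @ H[:, 0], K_inv @ H[:, 1], K_inv @ H[:, 2]
    lam1 = 1.0 / np.linalg.norm(a1)
    lam2 = 1.0 / np.linalg.norm(a2)
    lam3 = 0.5 * (lam1 + lam2)
    r1, r2, t = lam1 * a1, lam2 * a2, lam3 * a3
    # Assumption: H is only known up to sign, so enforce a board in front
    # of the camera (t_z > 0) by flipping r1, r2 and t together.
    if t[2] < 0:
        r1, r2, t = -r1, -r2, -t
    r3 = np.cross(r1, r2)
    R = np.column_stack([r1, r2, r3])
    C = -R.T @ t  # camera centre, from t = -R C
    return R, t, C
```

With noisy homographies, the matrix R obtained this way is in general not exactly orthogonal. A common remedy, not covered in this section, is to replace R by the closest rotation matrix computed with an SVD.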
2.3.1.0 D. Nonlinear refinement of the camera parameters
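To refine the obtained camera parameters, it is possible to perform a nonlinear minimization of a reprojection-error function. This function projects the 3D points onto the image plane and accumulates the squared distances between the projected points and the corresponding detected points. The function is specified by
\[ \sum_{j=1}^{N} \sum_{i=1}^{n} \left\| p_{ij} - \hat{p}(K, R_j, t_j, P_i) \right\|^2, \tag{2.42} \]
where j denotes the image index, i corresponds to the point-correspondence index, p_{ij} is the detected pixel position of feature point i in image j, and \hat{p}(K, R_j, t_j, P_i) is the projection of the checkerboard point P_i into image j.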
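The cost function to be minimized can be sketched as follows. The function names and the argument layout (one (R, t) pair and one array of detected points per image) are assumptions made for this illustration. Lens distortion is not modelled, consistent with correcting it beforehand.

```python
import numpy as np

def project(K, R, t, board_xy):
    """Project board points (X, Y, 0) to pixel coordinates, one row per point."""
    board_xy = np.asarray(board_xy, dtype=float)
    P = np.column_stack([board_xy, np.zeros(len(board_xy))])  # Z = 0 plane
    cam = P @ np.asarray(R).T + np.asarray(t, dtype=float)    # camera frame
    pix = cam @ np.asarray(K).T                                # homogeneous pixels
    return pix[:, :2] / pix[:, 2:3]

def reprojection_error(K, poses, board_xy, observations):
    """Sum of squared distances between detected and projected points.

    poses: list of (R, t), one per image j.
    observations: list of (n, 2) arrays of detected points, one per image j.
    """
    total = 0.0
    for (R, t), observed in zip(poses, observations):
        residual = np.asarray(observed, dtype=float) - project(K, R, t, board_xy)
        total += np.sum(residual ** 2)
    return total
```

A nonlinear least-squares solver can then minimize this quantity over the entries of K and a parameterization of each pose, for instance `scipy.optimize.least_squares` with `method="lm"` applied to the stacked residuals; the choice of solver and parameterization is left open here.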
2.3.1.0 E. Summary of camera calibration steps
The camera calibration algorithm can be summarized as follows.
 Capture N (at least 3) images with the checkerboard pattern and estimate pointcorrespondences.
 Estimate and correct the radial lens distortion.
 Calculate one homography transform for each of the N captured images.
 Using the N homography transforms, calculate the intrinsic and extrinsic parameters.
 Refine the calculated camera parameters as explained in Section D above.
As opposed to the standard technique [108], we perform the correction of the nonlinear radial lens distortion prior to the camera calibration procedure. This slight modification yields more accurate results for obtaining the homographies H_{i} and, afterwards, more accurate calibration parameters.