Camera calibration involves estimating both the extrinsic and the intrinsic camera parameters. This section does not present a new calibration algorithm. Instead, it briefly describes a parameter-estimation technique for calibrating a multi-view camera setup. To compute the camera parameters, a practical technique based on a calibration rig is employed. As with the correction of lens distortion, the estimation of the camera parameters relies on a calibration rig with known geometry, such as a checkerboard pattern. Using several perspective views of the checkerboard, the algorithm estimates the position, orientation and internal parameters of the camera. The process of estimating all camera parameters is known as strong calibration, and it is addressed in the following.
Let us consider a planar checkerboard that defines a right-handed 3D world coordinate system (see Figure 2.8). The first stage of the calibration procedure is to establish correspondences between 2D points in the image and 3D points on the checkerboard, i.e., so-called point-correspondences. In practice, point-correspondences can be extracted robustly by exploiting the topological structure of the checkerboard pattern.
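Because each 3D feature point lies on the board plane $Z = 0$, the projection of each 3D point $(X, Y, 0)^T$ onto the image plane can be written, up to a homogeneous scaling factor $\lambda$, as

$$\lambda \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = K \, [\, r_1 \;\; r_2 \;\; r_3 \;\; t \,] \begin{pmatrix} X \\ Y \\ 0 \\ 1 \end{pmatrix},$$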
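which can be reformulated as

$$\lambda \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = K \, [\, r_1 \;\; r_2 \;\; t \,] \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix},$$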
where $r_i$ denotes the $i$th column of the rotation matrix $R$ and $t = -RC$. Because the matrix product $K[r_1 \; r_2 \; t]$ is a $3 \times 3$ matrix, it corresponds to a so-called planar homography transform, which is typically denoted by a matrix $H$. A planar homography is a non-singular linear transformation between two planes, expressed in homogeneous coordinates. In our case, the homography maps points of the planar checkerboard (3D position) onto the image plane (pixel position). The calculation of the camera parameters therefore requires the estimation of the homography transform $H$, which is defined up to a scale factor:

$$H = K \, [\, r_1 \;\; r_2 \;\; t \,].$$
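The reduction from $K[r_1 \; r_2 \; r_3 \; t]$ to $H = K[r_1 \; r_2 \; t]$ can be checked numerically. The sketch below uses Python with NumPy, which is our choice and not part of the original method. It projects a few board points twice: once with the full projection matrix and once with the homography. The intrinsic values, rotation angles and camera centre are hypothetical, chosen only for illustration. The helper names `rotation`, `project_full` and `project_homography` are also hypothetical.

```python
import numpy as np

# Hypothetical intrinsics [[alpha, gamma, u0], [0, beta, v0], [0, 0, 1]], illustration only.
K = np.array([[800.0, 0.5, 320.0],
              [0.0, 780.0, 240.0],
              [0.0, 0.0, 1.0]])

def rotation(ax, ay):
    """Rotation about the x-axis by ax, followed by a rotation about the y-axis by ay."""
    cx, sx, cy, sy = np.cos(ax), np.sin(ax), np.cos(ay), np.sin(ay)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return Ry @ Rx

R = rotation(-0.2, 0.3)
C = np.array([0.1, -0.2, -1.5])   # hypothetical camera centre in board coordinates
t = -R @ C                        # t = -RC, as in the text

def project_full(X, Y):
    """Pixel position of the board point (X, Y, 0) using K [r1 r2 r3 t]."""
    p = K @ np.column_stack([R, t]) @ np.array([X, Y, 0.0, 1.0])
    return p[:2] / p[2]

# Planar homography: the column r3 drops out because Z = 0 on the board.
H = K @ np.column_stack([R[:, 0], R[:, 1], t])

def project_homography(X, Y):
    """Pixel position of the same board point using H = K [r1 r2 t]."""
    p = H @ np.array([X, Y, 1.0])
    return p[:2] / p[2]

for X, Y in [(0.0, 0.0), (0.05, 0.1), (0.2, -0.1)]:
    print(project_full(X, Y), project_homography(X, Y))
```

Both functions return the same pixel positions, because the $Z = 0$ coordinate multiplies $r_3$ by zero.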
The homography $H$ can be estimated by detecting point-correspondences $p_i \leftrightarrow P'_i$. Here $p_i = (x_i, y_i, 1)^T$ is the pixel position and $P'_i = (X_i, Y_i, 1)^T$ contains the world coordinates of the corresponding feature point of index $i$ on the checkerboard. Using this notation, Equation (2.25) may be written as

$$\lambda_i \, p_i = H P'_i, \tag{2.26}$$

where $\lambda_i$ is an unknown homogeneous scaling factor.
To eliminate the scaling factor, one can take the cross product of $p_i$ with both sides of Equation (2.26), leading to

$$p_i \times \lambda_i p_i = p_i \times H P'_i, \tag{2.27}$$
and since $p_i \times p_i = 0_3$, Equation (2.27) may be written as

$$p_i \times H P'_i = 0_3. \tag{2.28}$$
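Writing the matrix product $HP'_i$ as

$$H P'_i = \begin{pmatrix} h^{1T} P'_i \\ h^{2T} P'_i \\ h^{3T} P'_i \end{pmatrix},$$

where $h^{jT}$ denotes the $j$th row of $H$, the constraint $p_i \times HP'_i = 0_3$ becomes a linear system in the entries of $H$:

$$\begin{pmatrix} 0_3^T & -P'^T_i & y_i P'^T_i \\ P'^T_i & 0_3^T & -x_i P'^T_i \\ -y_i P'^T_i & x_i P'^T_i & 0_3^T \end{pmatrix} \begin{pmatrix} h^1 \\ h^2 \\ h^3 \end{pmatrix} = 0_3. \tag{2.31}$$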
Note that the rows of this matrix are not linearly independent: the third row equals $-x_i$ times the first row plus $-y_i$ times the second row. Thus, for each point-correspondence, Equation (2.31) provides only two linearly independent equations, and the first two rows are typically used for solving $H$. Because the homography is expressed in homogeneous coordinates, $H$ has 9 entries but only 8 degrees of freedom, since it is defined up to a scale factor. Therefore, at least 4 point-correspondences, providing 8 equations, are required to compute the homography. In practice, a larger number of correspondences is employed, so that an over-determined linear system is obtained. Writing $H$ in vector form as $h = [h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32}, h_{33}]^T$, $n$ point-correspondences yield a $2n \times 9$ linear system:

$$C h = \begin{pmatrix} 0_3^T & -P'^T_1 & y_1 P'^T_1 \\ P'^T_1 & 0_3^T & -x_1 P'^T_1 \\ \vdots & \vdots & \vdots \\ 0_3^T & -P'^T_n & y_n P'^T_n \\ P'^T_n & 0_3^T & -x_n P'^T_n \end{pmatrix} h = 0_{2n},$$

where $i$ indexes the feature points, i.e., $i = 1$ for the first two rows, $i = 2$ for the next two rows, etc.
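The rank argument above can be checked for a single correspondence. The following sketch assumes NumPy and an arbitrary hypothetical homography and board point. It builds the three rows of the constraint $p_i \times HP'_i = 0_3$ acting on $h$, then confirms two things: the true $h$ satisfies them, and the third row depends linearly on the first two. The function name `correspondence_rows` is hypothetical.

```python
import numpy as np

def correspondence_rows(p, P):
    """Three rows of p x (H P') = 0, acting on h = [h11, h12, ..., h33]^T.

    p = (x, y, 1) is the pixel position, P = (X, Y, 1) the board position.
    """
    x, y = p[0], p[1]
    P = np.asarray(P, dtype=float)
    z = np.zeros(3)
    return np.array([
        np.concatenate([z, -P, y * P]),
        np.concatenate([P, z, -x * P]),
        np.concatenate([-y * P, x * P, z]),
    ])

# Hypothetical homography and board point, for illustration only.
H = np.array([[1.2, 0.1, 5.0],
              [-0.05, 0.9, 3.0],
              [1e-3, 2e-3, 1.0]])
P = np.array([2.0, 3.0, 1.0])
q = H @ P
p = q / q[2]                       # pixel position in homogeneous form (x, y, 1)

A = correspondence_rows(p, P)
h = H.ravel()                      # row-major order: [h11, h12, h13, h21, ...]
print(np.allclose(A @ h, 0.0))                        # H satisfies all three rows
print(np.allclose(A[2], -p[0] * A[0] - p[1] * A[1]))  # third row is dependent
print(np.linalg.matrix_rank(A))                       # only two independent rows
```

Note that `H.ravel()` uses NumPy's default row-major order, which matches the ordering of $h$ in the text.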
Solving this linear system involves computing a Singular Value Decomposition (SVD), which factorizes the matrix as $C = UDV^T$. The least-squares solution $h$ (with $\|h\| = 1$) is the last column of $V$, i.e., the right singular vector associated with the smallest singular value. To avoid numerical instabilities, the coordinates of the point-correspondences should be normalized beforehand. This technique is known as the normalized Direct Linear Transformation (DLT) algorithm. A non-linear refinement of the solution may be performed afterwards. However, such a refinement does not yield a significant improvement. Moreover, it requires the calculation of a Jacobian matrix for the non-linear Levenberg-Marquardt optimization.
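A minimal sketch of the normalized DLT described above, assuming NumPy. The normalization used here moves the centroid to the origin and scales the mean distance to $\sqrt{2}$. This is the common choice for this algorithm, but it is an assumption, since the text does not specify one. The function names `normalize_points` and `estimate_homography` are hypothetical, and the optional non-linear refinement is omitted.

```python
import numpy as np

def normalize_points(pts):
    """Similarity transform T moving the centroid to the origin and scaling the
    mean distance to the origin to sqrt(2); returns homogeneous points and T."""
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return pts_h @ T.T, T

def estimate_homography(img_pts, board_pts):
    """Normalized DLT. img_pts, board_pts: (n, 2) arrays with n >= 4.

    Returns H with p_i ~ H P'_i, scaled so that H[2, 2] = 1
    (assumes H[2, 2] != 0, i.e. the board origin does not map to infinity).
    """
    p, T_img = normalize_points(img_pts)
    P, T_board = normalize_points(board_pts)
    rows = []
    for (x, y, _), Pi in zip(p, P):
        # Two independent rows of p_i x (H P'_i) = 0 per correspondence.
        rows.append(np.concatenate([np.zeros(3), -Pi, y * Pi]))
        rows.append(np.concatenate([Pi, np.zeros(3), -x * Pi]))
    C = np.array(rows)                             # the 2n x 9 system matrix
    _, _, Vt = np.linalg.svd(C)
    H_norm = Vt[-1].reshape(3, 3)                  # last column of V (last row of V^T)
    H = np.linalg.inv(T_img) @ H_norm @ T_board    # undo both normalizations
    return H / H[2, 2]

# Usage with synthetic, noise-free data (hypothetical homography and board grid).
H_true = np.array([[700.0, 30.0, 320.0],
                   [-20.0, 690.0, 240.0],
                   [0.05, 0.1, 1.0]])
gx, gy = np.meshgrid(np.arange(6) * 0.03, np.arange(5) * 0.03)
board = np.column_stack([gx.ravel(), gy.ravel()])
proj = np.column_stack([board, np.ones(len(board))]) @ H_true.T
img = proj[:, :2] / proj[:, 2:3]
H_est = estimate_homography(img, board)
```

The denormalization follows from $T_{img}\,p \sim \tilde{H}\,T_{board}\,P'$, so that $H = T_{img}^{-1}\tilde{H}\,T_{board}$.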
Once the homography $H$ has been calculated, the intrinsic parameters can be recovered using the a-priori knowledge that $r_1$ and $r_2$ are orthonormal. Denoting the homography as $H = [h_1 \; h_2 \; h_3]$, it can be derived from Equation (2.24) that $[h_1 \; h_2] = \lambda' K [r_1 \; r_2]$. This is equivalent to $K^{-1}[h_1 \; h_2] = \lambda' [r_1 \; r_2]$, where $\lambda'$ is a homogeneous factor. Because $r_1$ and $r_2$ are orthonormal, they are perpendicular to each other and have equal norm³, so that

$$\begin{aligned} h_1^T K^{-T} K^{-1} h_2 &= 0, \\ h_1^T K^{-T} K^{-1} h_1 &= h_2^T K^{-T} K^{-1} h_2. \end{aligned} \tag{2.33}$$
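The term $K^{-T} K^{-1}$ appears repeatedly and is therefore denoted by $B$. This is a symmetric matrix that can be written as

$$B = K^{-T} K^{-1} = \begin{pmatrix} B_{11} & B_{12} & B_{13} \\ B_{12} & B_{22} & B_{23} \\ B_{13} & B_{23} & B_{33} \end{pmatrix}. \tag{2.34}$$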
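This matrix encodes the intrinsic parameters of the camera: the focal length $\alpha = f$, the aspect-ratio-related parameter $\beta = \eta f$, the skew $\gamma = \tau$, and the principal point, denoted here $(u_0, v_0)$. Together they give $K = \begin{pmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{pmatrix}$. This description corresponds to the notation adopted in the literature. Because $B$ is symmetric, it is fully described by its six distinct elements, which are collected in the vector $b = [B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33}]^T$. To express both equations of (2.33) linearly in $b$, we write the general bilinear term $h_k^T B h_l$ as

$$h_k^T B h_l = v_{kl}^T b,$$

with

$$v_{kl} = \left[\, h_{k1}h_{l1},\; h_{k1}h_{l2} + h_{k2}h_{l1},\; h_{k2}h_{l2},\; h_{k3}h_{l1} + h_{k1}h_{l3},\; h_{k3}h_{l2} + h_{k2}h_{l3},\; h_{k3}h_{l3} \,\right]^T$$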
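and $h_k = [h_{k1}, h_{k2}, h_{k3}]^T$ denoting the $k$th column vector of $H$. For each homography, i.e., each image of the checkerboard, the two constraints of Equation (2.33) can then be written as

$$\begin{pmatrix} v_{12}^T \\ (v_{11} - v_{22})^T \end{pmatrix} b = 0_2. \tag{2.37}$$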
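The identity $h_k^T B h_l = v_{kl}^T b$ and the two constraints per image can be verified numerically. The sketch below assumes NumPy. The helpers `v_kl`, `constraint_rows` and `b_from_B` are hypothetical names, and $k, l$ are 1-based to match the text. A random symmetric $B$ checks the identity. A synthetic homography $H = K[r_1 \; r_2 \; t]$, built from hypothetical intrinsics, checks that the true $b$ satisfies both constraints.

```python
import numpy as np

def v_kl(H, k, l):
    """v_kl such that h_k^T B h_l = v_kl^T b, with b = [B11, B12, B22, B13, B23, B33]."""
    hk, hl = H[:, k - 1], H[:, l - 1]      # k, l are 1-based column indices
    return np.array([
        hk[0] * hl[0],
        hk[0] * hl[1] + hk[1] * hl[0],
        hk[1] * hl[1],
        hk[2] * hl[0] + hk[0] * hl[2],
        hk[2] * hl[1] + hk[1] * hl[2],
        hk[2] * hl[2],
    ])

def constraint_rows(H):
    """Two rows per image: [v12^T; (v11 - v22)^T]."""
    return np.vstack([v_kl(H, 1, 2), v_kl(H, 1, 1) - v_kl(H, 2, 2)])

def b_from_B(B):
    """Collect the six distinct elements of a symmetric 3x3 matrix B."""
    return np.array([B[0, 0], B[0, 1], B[1, 1], B[0, 2], B[1, 2], B[2, 2]])

# Identity check with a random symmetric B and a random H.
rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
B_rand = M + M.T
H_rand = rng.normal(size=(3, 3))
lhs = H_rand[:, 0] @ B_rand @ H_rand[:, 1]
rhs = v_kl(H_rand, 1, 2) @ b_from_B(B_rand)

# Constraint check with a synthetic H = K [r1 r2 t] (hypothetical values).
K = np.array([[800.0, 0.5, 320.0], [0.0, 780.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.4), np.sin(0.4)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])   # rotation about y
t = np.array([0.1, -0.05, 1.5])
H = 2.5 * K @ np.column_stack([R[:, 0], R[:, 1], t])          # arbitrary scale
K_inv = np.linalg.inv(K)
b_true = b_from_B(K_inv.T @ K_inv)
residual = constraint_rows(H) @ b_true
print(np.isclose(lhs, rhs), residual)
```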
Note that the vector $b$, which summarizes the intrinsic parameters, has 6 elements but is only defined up to a scale factor. At least 5 independent equations are therefore required to recover it. Since each homography provides 2 linear equations, at least 3 homographies, i.e., 3 captured images, are required in general. Assuming that $n$ homographies are known, stacking $n$ instances of Equation (2.37) yields the linear system

$$V b = 0_{2n},$$
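with $V$ a $2n \times 6$ matrix. This linear system can be solved with the standard SVD technique: the solution $b$ is the right singular vector of $V$ associated with the smallest singular value. Using the computed vector $b$, the intrinsic parameters can be derived as follows. The vector $b = [B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33}]^T$ is first expanded into the symmetric matrix $B$ by re-inserting the omitted symmetric elements. Since $b$ is only known up to scale, we write $B = \lambda K^{-T} K^{-1}$ (Equation 2.34). With $(u_0, v_0)$ denoting the principal point, the intrinsic parameters are then found as

$$\begin{aligned} v_0 &= \frac{B_{12}B_{13} - B_{11}B_{23}}{B_{11}B_{22} - B_{12}^2}, \\ \lambda &= B_{33} - \frac{B_{13}^2 + v_0 \, (B_{12}B_{13} - B_{11}B_{23})}{B_{11}}, \\ \alpha &= \sqrt{\lambda / B_{11}}, \\ \beta &= \sqrt{\frac{\lambda B_{11}}{B_{11}B_{22} - B_{12}^2}}, \\ \gamma &= -\frac{B_{12} \, \alpha^2 \beta}{\lambda}, \\ u_0 &= \frac{\gamma v_0}{\beta} - \frac{B_{13} \, \alpha^2}{\lambda}. \end{aligned}$$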
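The following sketch, assuming NumPy, carries out the steps just described. It stacks the two rows of Equation (2.37) for every homography, solves $Vb = 0$ by SVD and applies the closed-form expressions for the intrinsic parameters. One step is not part of the text and is an assumption on our side. Before building $V$, each homography is premultiplied by a conditioning matrix $N$, built from a hypothetical image size, that maps pixel coordinates to roughly $[-1, 1]$. The reason is that in pixel units the entries of $b$ differ by several orders of magnitude, which degrades the SVD solution. Because $NK$ keeps the upper-triangular form of $K$, the recovered matrix is simply mapped back with $N^{-1}$. The names `intrinsics_from_homographies`, `width` and `height` are hypothetical.

```python
import numpy as np

def _v(H, k, l):
    """v_kl with h_k^T B h_l = v_kl^T b, b = [B11, B12, B22, B13, B23, B33] (0-based k, l)."""
    hk, hl = H[:, k], H[:, l]
    return np.array([hk[0] * hl[0], hk[0] * hl[1] + hk[1] * hl[0], hk[1] * hl[1],
                     hk[2] * hl[0] + hk[0] * hl[2], hk[2] * hl[1] + hk[1] * hl[2],
                     hk[2] * hl[2]])

def intrinsics_from_homographies(Hs, width, height):
    """Closed-form intrinsic matrix K from n >= 3 board homographies."""
    # Assumption (not in the text): condition the problem in normalized pixels.
    s = max(width, height) / 2.0
    N = np.array([[1.0 / s, 0.0, -width / (2.0 * s)],
                  [0.0, 1.0 / s, -height / (2.0 * s)],
                  [0.0, 0.0, 1.0]])
    rows = []
    for H in Hs:
        Hn = N @ H                                # homography in normalized pixels
        rows.append(_v(Hn, 0, 1))                 # v12^T b = 0
        rows.append(_v(Hn, 0, 0) - _v(Hn, 1, 1))  # (v11 - v22)^T b = 0
    V = np.array(rows)                            # 2n x 6
    _, _, Vt = np.linalg.svd(V)
    B11, B12, B22, B13, B23, B33 = Vt[-1]         # b, up to scale and sign
    v0 = (B12 * B13 - B11 * B23) / (B11 * B22 - B12**2)
    lam = B33 - (B13**2 + v0 * (B12 * B13 - B11 * B23)) / B11
    alpha = np.sqrt(lam / B11)
    beta = np.sqrt(lam * B11 / (B11 * B22 - B12**2))
    gamma = -B12 * alpha**2 * beta / lam
    u0 = gamma * v0 / beta - B13 * alpha**2 / lam
    Kn = np.array([[alpha, gamma, u0], [0.0, beta, v0], [0.0, 0.0, 1.0]])
    return np.linalg.inv(N) @ Kn                  # undo the conditioning

def rotation(ax, ay):
    """Rotation about x by ax, followed by a rotation about y by ay."""
    cx, sx, cy, sy = np.cos(ax), np.sin(ax), np.cos(ay), np.sin(ay)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return Ry @ Rx

# Usage with three synthetic, noise-free views (hypothetical values).
K_true = np.array([[800.0, 0.5, 320.0], [0.0, 780.0, 240.0], [0.0, 0.0, 1.0]])
Hs = []
for ax, ay, scale in [(0.4, 0.1, 2.0), (-0.3, 0.45, 0.5), (0.15, -0.5, -1.3)]:
    R = rotation(ax, ay)
    t = np.array([0.1, -0.05, 1.5])
    Hs.append(scale * K_true @ np.column_stack([R[:, 0], R[:, 1], t]))
K_est = intrinsics_from_homographies(Hs, 640, 480)
```

The closed-form expressions are invariant to the sign of $b$, so the arbitrary sign returned by the SVD (and the negative scale of the third view) does not affect the result.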
The position and orientation of the camera can then be recovered from Equation (2.24) as

$$r_1 = \lambda_1 K^{-1} h_1, \qquad r_2 = \lambda_2 K^{-1} h_2, \qquad t = \lambda_3 K^{-1} h_3.$$
Given the vectors $r_1$ and $r_2$, and bearing in mind that the rotation matrix is orthogonal, the third column $r_3$ that completes $R$ is found by

$$r_3 = r_1 \times r_2,$$
where $\times$ denotes the cross product. The scaling factors are calculated as $\lambda_1 = 1/\|K^{-1}h_1\|$, $\lambda_2 = 1/\|K^{-1}h_2\|$ and $\lambda_3 = (\lambda_1 + \lambda_2)/2$. In theory, $\lambda_1 = \lambda_2$. However, due to inaccuracies in the feature-point estimation, the two terms may differ and are therefore treated separately.
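A minimal sketch of this step, assuming NumPy and a known $K$. Two choices are ours and are labeled as assumptions in the code. First, $H$ is only known up to a possibly negative scale, so its sign is chosen such that the board lies in front of the camera ($t_z > 0$). Second, no re-orthogonalization of $R$ is performed, so with noisy data $R$ is only approximately orthogonal. The function name `extrinsics_from_homography` is hypothetical.

```python
import numpy as np

def extrinsics_from_homography(K, H):
    """Recover R and t from H ~ K [r1 r2 t] for a known intrinsic matrix K."""
    K_inv = np.linalg.inv(K)
    h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]
    # Assumption (not in the text): pick the sign of H that places the
    # board in front of the camera (t_z > 0).
    if (K_inv @ h3)[2] < 0:
        h1, h2, h3 = -h1, -h2, -h3
    lam1 = 1.0 / np.linalg.norm(K_inv @ h1)
    lam2 = 1.0 / np.linalg.norm(K_inv @ h2)
    lam3 = 0.5 * (lam1 + lam2)
    r1 = lam1 * (K_inv @ h1)
    r2 = lam2 * (K_inv @ h2)
    r3 = np.cross(r1, r2)             # completes the right-handed rotation matrix
    t = lam3 * (K_inv @ h3)
    # Assumption: no re-orthogonalization, so with noisy H, R is only approximately a rotation.
    return np.column_stack([r1, r2, r3]), t

# Usage with a synthetic homography (hypothetical values).
K = np.array([[800.0, 0.5, 320.0], [0.0, 780.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
t_true = np.array([0.1, -0.05, 1.5])
H = 3.7 * K @ np.column_stack([R_true[:, 0], R_true[:, 1], t_true])
R_est, t_est = extrinsics_from_homography(K, H)
```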
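To refine the obtained camera parameters, a non-linear minimization of a projection-based cost function can be performed. This function projects the 3D points onto the image plane and accumulates the squared differences between the projected and the detected points. Let $p_{ij}$ denote the detected pixel position of feature point $i$ in image $j$, and let $\hat{p}(K, R_j, t_j, P'_i)$ denote the projection of the board point $P'_i$ with the current camera parameters. The function is then specified by

$$\sum_{j=1}^{N} \sum_{i=1}^{n} \left\| p_{ij} - \hat{p}\left(K, R_j, t_j, P'_i\right) \right\|^2,$$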
where j denotes the image index and i corresponds to the point-correspondence index.
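The cost function can be sketched as follows, assuming NumPy. As in this section, radial distortion is assumed to have been corrected beforehand, so the projection is purely linear. The function names `project` and `reprojection_error` are hypothetical. To actually minimize the cost, one option (our choice, not the text's) is SciPy's `scipy.optimize.least_squares` with `method='lm'`. That solver expects the stacked vector of residuals rather than their sum.

```python
import numpy as np

def project(K, R, t, board_pts):
    """Project board points (X, Y, 0) with K [R | t]; returns (n, 2) pixel positions."""
    P = np.column_stack([board_pts, np.zeros(len(board_pts))])
    cam = P @ R.T + t                 # R P + t for every point
    uvw = cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def reprojection_error(K, Rs, ts, board_pts, img_pts_per_view):
    """Sum over images j and points i of ||p_ij - p_hat(K, R_j, t_j, P'_i)||^2."""
    total = 0.0
    for R, t, img_pts in zip(Rs, ts, img_pts_per_view):
        diff = img_pts - project(K, R, t, board_pts)
        total += float(np.sum(diff**2))
    return total

# Usage: perfect observations give zero error; a 1-pixel offset gives 1.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 2.0])
board = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
img = project(K, R, t, board)
img_shifted = img.copy()
img_shifted[0, 0] += 1.0
err_exact = reprojection_error(K, [R], [t], board, [img])
err_shifted = reprojection_error(K, [R], [t], board, [img_shifted])
```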
The camera calibration algorithm can be summarized as follows.
- Capture N (at least 3) images of the checkerboard pattern and extract the point-correspondences.
- Estimate and correct the radial lens distortion.
- For each of the N captured images, calculate its homography transform.
- Using the N homography transforms, calculate the intrinsic and extrinsic parameters.
- Refine the calculated camera parameters as explained in Section D above.
Unlike the standard technique, we correct the non-linear radial lens distortion before the camera calibration procedure. This slight modification yields more accurate homographies $H_i$ and, consequently, more accurate calibration parameters.
³ Note that the first multiplication represents an inner product and that both sides of the second equation are scalars.