2.3  Camera calibration

Camera calibration involves the estimation of both extrinsic and intrinsic camera parameters. This section does not present a new calibration algorithm, but instead shortly describes a parameter-estimation technique for calibrating a multi-view camera setup. To compute the camera parameters, a practical technique [108] based on a calibration rig is employed. As for correcting the lens distortion, the estimation of camera parameters is based on a calibration rig with known geometry, such as a checkerboard pattern. Using various perspective views of the checkerboard, the algorithm estimates the position, orientation and internal parameters of the camera. The process of estimating all camera parameters is known as strong calibration, which will be addressed in the sequel.

2.3.1  Camera parameters estimation

Let us consider a planar checkerboard that defines a right-handed 3D world coordinate system (see Figure 2.8). The first stage of the camera calibration procedure is to establish correspondences between 2D points in the image and 3D points on the checkerboard, i.e., so-called point-correspondences. In practice, a robust extraction of point-correspondences can be performed by exploiting the topological structure of the checkerboard pattern [90].

Figure 2.8 A planar checkerboard pattern defines a 3D world coordinate system, and additionally enables the detection of 3D feature points.

Because each 3D feature point belongs to the board plane Z = 0, the projection of each 2D point onto the image plane can be written as

                               (        )
 (  x )          [           ]     X
λ(  y ) =  [K |0 ]  R   - RC    ||   Y    || ,
               3   0T3    1     (  Z = 0 )
    1                               1

which can be reformulated to

  ( x )                       (  X  )
λ ( y )  =     K [ r r  t ]   (  Y  ) ,
    1      homography transform H   1

where ri corresponds to the ith column of the rotation matrix R and t = -RC. Because the matrix product K[ r r  t ]
  1  2 is a 3 × 3 matrix, it corresponds to a so-called planar homography transform, which is typically denoted as a matrix H. A planar homography transform is a non-singular linear relation between two planes [37]. In our case, the homography transform defines a linear mapping of points between the planar checkerboard (3D position) and the image plane (pixel position). The calculation of the camera parameters requires the estimation of the homography transform H:

  (    )   (                ) (    )
  ( x  )   (  h11  h12  h13 ) ( X  )
λ    y   =    h21  h22  h23      Y   .
     1     ◟--h31--h◝3◜2--h33-◞    1
               homography H
(2.25)  A. Homography transform estimation

The homography H can be estimated by detecting point-correspondences pi ↔Pi′ with pi = (xi,yi,1)T being the pixel position and Pi′ = (Xi,Y i,1)T being the world coordinates of the corresponding feature point of index i on the checkerboard. Using the previously introduced notation, Equation (2.25) may be written as
λpi = HP i.

In order to eliminate the scaling factor, one can calculate the cross product of each term of Equation (2.26), leading to

pi × (λpi) = pi × (HP i),

and since pi ×pi = 03, Equation (2.27) may be written as

pi × HP ′i = 03.

Writing the matrix product HPi′ as

       ⌊ h1T P ′⌋
    ′  ⌈   2T i′⌉
HP i =   h 3TPi′  ,
         h   Pi

with hmT being the transpose 2 of the m -th row of H, we rework Equation (2.28) into

           (     )   [       ]   [                    ]
              xi       h1TP ′i        yih3T P′i - h2T P′i
pi × HP ′i =   yi   ×   h2TP ′i  =    h1T P′i - xih3TPi′   = 03.
              1        h3TP ′i       xih2T P′i - yih1T P′i

Since Equation (2.30) is linear in hmT and noting that hmT Pi′ = Pi′T hm, Equation (2.30) may be reformulated as

⌊   0T3    - P′iT   yiPi′T  ⌋ ⌊ h1  ⌋
⌈   P ′T     0T    - x P ′T ⌉ ⌈ h2  ⌉ = 0 .
     i ′T     3′T     iTi        3      9
  - yiP i   xiPi      03       h

Note that the rows of the previous matrix are not linearly independent: the third row is the sum of -xi times the first row and -yi times the second row. Thus, for each point-correspondence, Equation (2.31) provides two linearly independent equations. The two first rows are typically used for solving H. Because the homography transform is written using homogeneous coordinates, the homography H is defined using 8 parameters plus a free 9th homogeneous scaling factor. Therefore, at least 4 point-correspondences providing 8 equations are required to compute the homography. Practically, a larger number of correspondences is employed so that an over-determined linear system is obtained. By rewriting H in a vector form as h = [h11,h12,h13,h21,h22,h23,h31,h32,h33]T , n pairs of point-correspondences enable the construction of a 2n × 9 linear system, which is expressed by (i corresponds to the index of the feature point, hence i = 1 for first two rows, i = 2 for the second two rows, etc.)

                                                       (      )
(  0    0   0  - X   - Y   - 1  y X      y X     y   )    h11
|                 1     1        1  1     1 1     1  | ||  h12 ||
||  X1  Y1   1   0     0    0   - x1X1   - x1Y1  - x1 || ||  h13 ||
|  0    0   0  - X2  - Y2  - 1  y2X2     y2X2    y2  | ||  h21 ||
||  X2  Y2   1   0     0    0   - x2X2   - x2Y2  - x2 || ||  h22 || = 09.
||   ...   ...   ...   ...     ...     ...     ...        ...      ...  || ||  h23 ||
|(                                                    |) |  h31 |
   0    0   0 - Xn   - Yn  - 1  ynXn    ynXn     yn    |(  h   |)
◟--Xn--Yn---1---0-----0---◝0◜-----xnXn---- xnYn--- xn--◞    32
                          C                               h33

Solving this linear system involves the calculation of a Singular Value Decomposition (SVD). Such an SVD corresponds to reworking the matrix to the form of the matrix product C = UDV T , where the solution h corresponds to the last column of the matrix V . To avoid numerical instabilities, the coordinates of point-correspondences should be normalized. This technique is known as the normalized Direct Linear Transformation (DLT) algorithm [37]. It should be noted that a non-linear refinement of the solution may be performed afterwards. However, such a non-linear refinement does not yield a significant improvement and moreover, it involves the calculation of a Jacobian matrix, required for the non-linear Levenberg-Marquardt optimization.  B. Calculation of intrinsic parameters

Assuming that the homography transform H is calculated, the intrinsic parameters can be recovered using the a-priori knowledge that r1 and r2 are orthonormal. Denoting the homography transform as H = [h1 h2 h3], it can be derived from Equation (2.24) that [h1 h2] = λ′K ⋅ [r1 r2] which is equivalent to K-1 ⋅ [h1 h2] = λ′[r1 r2], where λ′ is a homogeneous factor. Assuming that the vectors r1, r2 of the rotation matrix R are orthonormal, then the vectors r1 and r2 are perpendicular to each other and have equal norm 3, so that
  T   -T    -1
(hT1K -T  ⋅K-1 h2) = 0T, -T    - 1
h1 K    ⋅K   h1 = h 2K    ⋅K   h2.

The internal term K-T K-1 appears regularly, which for further use is defined as B and can be written as

                (  B11  B12  B13  )
B = K  -TK - 1 = ( B    B    B    )
                     12    22   23
  (     1          B13γ  B23  B33       γoy-βox          )
       α2         - α2β                --α2β--
= ||   - -γ2-       γ2+2α22            (-γ2-α2)2o2y+-βγox       || .
  (  γo α-ββo   (- γ2- αα2β)o +βγo   (γ2+ α2)o2-2βαγβoxoy+β2o2+α2β2 )
     --yα2β-x  -----α2βy2----x  -------y---α2β2-----y-----

This matrix covers the intrinsic parameters of the camera, such as the focal length α = f, aspect-ratio related parameter β = ηf and skew γ = τ. This description corresponds to the notation adopted in the literature [108]. By writing B in a vector form and omitting the double terms from the symmetry in B, we find a reduced vector b with only 6 terms, thus b = [B11,B12,B22,B13,B23,B33]T . Referring to Equation (2.33), and starting with the bottom equation and generalizing it with the difference in index in the top equation, we can write the general expression of hkT Bhl as

hT Bh  =  vTb,
  k   l    kl


v  =   [h  h  ,h  h  + h  h  ,h  h ,
 kl    h k1h l1+ kh1 lh2 ,h k2h l1+ kh2 l2h ,h  h  ]T ,
        k3 l1   k1 l3  k3 l2    k2 l3  k3 l3

and hk being the kth column vector of H with hk = [hk1,hk2,hk3]T . For each homography or image of the checkerboard, Equation (2.33) can be written as

[             ]
  (v11 - v22)T  b = 0.

Note that the vector b which summarizes the intrinsic parameters, is a 6-element vector so that 6 equations are necessary to recover all camera parameters. Therefore, since each homography provides 2 linear equations, at least 3 homographies or captured images are sufficient. Assuming that n homography transforms are known, the linear system composed of n instances of Equation (2.37) can be written as

V b = 0,

with V referring to a 2n × 6 matrix. This linear system can be solved by employing the standard technique of Singular Value Decomposition. Using the computed solution vector b, the intrinsic parameters can be derived [108] as follows. The computed vector b = [B11,B12,B22,B13,B23,B33]T is expanded into the symmetric matrix B, by inserting again the omitted symmetric matrix elements. Rewriting the matrix B as B = λK-T K-1 (Equation 2.34), we finally find the intrinsic parameters as

oy =   B11B222-B212  ,
λ = B33 - B13+oy(B12BB13-B11B23),
    ∘ --λ-          11
α = ∘ -B11,-----
β =    --λB11--2,
    - BB11αB22β2-B12
γ = ---1λ2--,
ox = γoβy-  B13αλ2-.
(2.39)  C. Calculation of extrinsic parameters

The position and orientation of the camera can be recovered from Equation (2.24) and written as
r1 = λ1K -1h1,
r2 = λ2K   h2,
t = - RC = λ3K  -1h3.

Given the vectors r1 and r2 and bearing in mind that the rotation matrix is orthogonal, the third vector r3 to complete the matrix R is found by

r3 = r1 × r2,

where × denotes a cross product. The scaling parameters are calculated by λ1 = 1∕∥K-1h1∥ and λ2 = 1∕∥K-1h2∥ and λ3 = (λ1 + λ2)∕2. Theoretically, we have λ1 = λ2. However, due to inaccuracies in the feature-point estimation procedure, both terms may differ and have to be treated separately.  D. Non-linear refinement of the camera parameters

To refine the obtained camera parameters, it is possible to perform a non-linear minimization of a projection function. This function is projecting the 3D points onto the image plane and accumulating the differences between corresponding points. The function is specified by
∑N ∑n         (      [              ]   )
       ∥ pij - [K |03]  Rj   - RjCj   Pij  ∥2,
 j  i                   0      1

where j denotes the image index and i corresponds to the point-correspondence index.  E. Summary of camera calibration steps

The camera calibration algorithm can be summarized as follows.
  1. Capture N (at least 3) images with the checkerboard pattern and estimate point-correspondences.
  2. Estimate and correct the radial lens distortion.
  3. For each captured image, calculate the N homography transforms.
  4. Using the N homography transforms, calculate the intrinsic and extrinsic parameters.
  5. Refine the calculated camera parameters as explained in Section D above.

As opposed to the standard technique [108], we perform the correction of the non-linear radial lens distortion prior to the camera calibration procedure. This slight modification yields more accurate results for obtaining the homographies Hi and, afterwards, more accurate calibration parameters.

2For simplicity, we denote (hm)T = hmT, according to the notation used in [37].

3Note that the first multiplication represents an inner product and the second equation leads to a scalar.