SPIE Membership Get updates from SPIE Newsroom
  • Newsroom Home
  • Astronomy
  • Biomedical Optics & Medical Imaging
  • Defense & Security
  • Electronic Imaging & Signal Processing
  • Illumination & Displays
  • Lasers & Sources
  • Micro/Nano Lithography
  • Nanotechnology
  • Optical Design & Engineering
  • Optoelectronics & Communications
  • Remote Sensing
  • Sensing & Measurement
  • Solar & Alternative Energy
  • Sign up for Newsroom E-Alerts
  • Information for:

SPIE Photonics West 2019 | Call for Papers

SPIE Defense + Commercial Sensing 2019 | Call for Papers

2018 SPIE Optics + Photonics | Register Today



Print PageEmail PageView PDF

Electronic Imaging & Signal Processing

Enhancing camera-motion estimation

A novel projective Newton-on-the-manifold algorithm provides accurate computation of camera position and orientation with reduced complexity.
12 May 2009, SPIE Newsroom. DOI: 10.1117/2.1200905.1610

Estimating a moving camera's motion parameters is a challenging subfield in computer vision and robotics. In general, we first calculate and track a number of feature points for a given sequence of images, which we can then use to reconstruct the scene's initial 3D model, as well as the corresponding camera positions and orientations (‘poses’). Nevertheless, estimates of both the motion and the 3D model are susceptible to a variety of errors. If P is a point reconstructed in 3D space, it will be reprojected on the image, using these erroneous values, at a position , which is different from its real locus p (see Figure 1). To reduce the reprojection error, defined by the distance p and , we must apply nonlinear minimization to simultaneously adjust these errors for all points in all images. This is known as the ‘bundle adjustment’ (BA).1–5

For motion refinement only, BA reduces errors to estimate the pose of a moving camera. The most common applicable minimization scheme is the Levenberg-Marquardt (LM) approach,6,7 which is based on a Newton-type iterative algorithm that finds successively better approximations. With some initial estimates of the camera parameters, LM refines the reprojection errors to reach a global solution. Convergence is guaranteed, provided the initial values are sufficiently close to the global solution (otherwise instability problems might occur). As in any Newton-type algorithm, for each iteration LM computes the gradient and Hessian of the cost function defined by the reprojection errors. In addition, the Hessian used is approximate only, since LM ignores the second-order derivatives of the cost function. To enhance the convergence rate (and thus the accuracy of the results), one can use a general Newton algorithm that computes the full numerical form of the Hessian instead of LM.8 However, this leads to a significant increase in computational complexity. We, therefore, developed a projective Newton-on-the-manifold approach to enhance the accuracy of the motion estimate while not requiring significant increases in computational power.9

Figure 1. The problem setup. O: Camera center, P: Real object, p: Real position, : Erroneous reprojection.

Designing an optimization scheme on a manifold is equivalent to making the algorithm walk iteratively on the surface of the curve described by the motion variables. To do so, the manifold must be differentiable and smooth, so that we can define the distances and angles required to determine the optimization direction. Relevant examples include the Riemannian manifolds, which are equipped with tangent spaces. These allow smooth transitions from one point to another.10–12

The variables defining 3D rigid motion are described by the special Euclidean group SE3, a Riemannian manifold. Its tangent space is defined by the associated Lie algebra (se3). Our projected Newton method exploits the local parametrization of the motion variables on the manifold using the associated Lie algebra. Therefore, we determine the optimization direction (i.e., the gradient and Hessian computation of the cost function) for each iteration on the tangent space. We subsequently project the result back onto SE3to update the motion variables (see Figure 2). This back projection is possible since se3 is related to SE3 through exponential mapping. Finally, we repeat these steps until the minimization converges to the global solution. By analyzing the projected Newton algorithm, we found its computational requirements are 60% lower than for a general Newton algorithm (since the evaluation of both the Hessian and the gradient is simpler).9 We can now use the full numerical form of the Hessian with a numerical complexity close to that of LM. This will also render the results of the projective Newton scheme more accurate. To visualize this, in Figure 3 we show an example application of the two algorithms to the Dinosaur sequence, provided by the visual geometry group at the University of Oxford (UK).

Figure 2. Projective Newton algorithm on the manifold. SE3: Special Euclidian group (a Riemannian manifold), se3: Associated Lie algebra, M: Camera-motion estimate, : 6D Euclidean space.

In conclusion, application of a projective Newton-on-the-manifold algorithm to refine a camera's motion parameters is more accurate than using LM, since the former uses the full numerical form of the Hessian in the optimization. In addition, the complexity involved in the Hessian computation is 60% lower than for a general Newton algorithm and close to that required by LM. We will next integrate our technique in structure-from-motion problems and explore the underlying manifold structure of 3D objects.

Figure 3. Example application of (left) LM and (right) the projective Newton-on-the-manifold algorithm to the Dinosaur sequence.

Michel Sarkis, Klaus Diepold
Lehrstuhl für Datenverarbeitung
Technical University Munich
Munich, Germany