
Abstract

In computer vision, camera pose estimation from correspondences between 3D geometric entities and their projections into the image has been a widely investigated problem. Although most state-of-the-art methods exploit low-level primitives such as points or lines, the emergence of very effective CNN-based object detectors in recent years has paved the way to the use of higher-level features carrying semantically meaningful information. Pioneering works in that direction have shown that modelling 3D objects by ellipsoids and 2D detections by ellipses offers a convenient way to link 2D and 3D data. However, the mathematical formalism most often used in the related literature does not make it easy to distinguish ellipsoids and ellipses from other quadrics and conics, leading to a loss of specificity that is potentially detrimental in some developments. Moreover, the linearization process of the projection equation creates an over-representation of the camera parameters, also possibly causing an efficiency loss. In this paper, we therefore introduce an ellipsoid-specific theoretical framework and demonstrate its beneficial properties in the context of pose estimation. More precisely, we first show that the proposed formalism makes it possible to reduce the ellipsoid pose estimation problem to a position-only or orientation-only estimation problem in which the remaining unknowns can be derived in closed form. Then, we demonstrate that it can be further reduced to a 1 Degree-of-Freedom (1DoF) problem and provide the analytical expression of the pose as a function of that unique scalar unknown. We illustrate our theoretical considerations by visual examples. Finally, we release this work in order to contribute towards more efficient resolutions of ellipsoid-related pose estimation problems.

Keywords: Pose estimation, Object modeling, Ellipsoid, Ellipse
2022

Perspective--Ellipsoid: Formulation, Analysis and Solutions of the Ellipsoid Pose Estimation Problem in Euclidean Space

Vincent Gaudillière (1,2), Gilles Simon (2), Marie-Odile Berger (2)

(1) SnT - Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, 29 Avenue John F. Kennedy, L-1855 Luxembourg, Luxembourg

(2) Loria - Inria - Université de Lorraine, Campus Scientifique, BP 239, F-54506 Vandoeuvre-lès-Nancy, France

1 Introduction

Estimating the relative pose between a camera and a scene has been a core problem of computer vision for many years. Indeed, this task is at the root of many applications, from robot navigation (Bonin-FontOO08) to Augmented Reality (MarchandUS16).

Historically, pose estimation has been addressed by leveraging 2D-3D correspondences between low-level geometric features such as points (PP: Perspective--Point) (LepetitMF09; Hartley2004) or lines (PL: Perspective--Line) (PnL). More recently, the field has been significantly impacted by the rise of deep learning, and pose estimation is now widely addressed by end-to-end trainable methods (HoqueAXMW21). However, while deep learning has proven to be indispensable in solving the problem of perception, it is still not the most accurate choice for every step of a pose estimation pipeline, as shown by a recent object pose estimation challenge (KisantalSPIMD20). Indeed, in this challenge, the two most accurate methods were hybrid approaches in which keypoints are located by deep regression models and then used as inputs to a PP-based solver.

Following the same proven hybrid approach, the appearance of very effective object detectors in recent years (Redmon_2016_CVPR; Redmon_2017_CVPR; YOLOv3; LiuAESRFB16; Girshick_2014_CVPR; Girshick_2015_ICCV) has enabled the replacement of low-level primitives (e.g. points, lines), often extracted in large numbers and carrying limited semantic information, with object-level features providing a deeper scene understanding at a lower computational matching cost. Therefore, the choice of object representation has become crucial.

While modeling 3D objects by cuboids along with their 2D projections by bounding boxes (i.e. outputs of most object detectors) has been investigated in the context of pose computation (context_relevance_iros_2017; wide_baseline_2018; LiMD19), it appears that the ellipse-ellipsoid modeling paradigm has the unparalleled advantage of analytically linking 2D and 3D models (Hartley2004; Eberly-backproj). In other words, ellipsoids always project onto planes in the form of ellipses, and the underlying closed-form projection equation can be leveraged to efficiently solve pose estimation problems (Crocco_2016_CVPR; 7919240; IROS; ISMAR; IROS2; RAL). As an indicator of that increasingly attractive research direction, more and more object detectors have been modeling object projections by ellipses instead of traditional bounding boxes whose sides are parallel to image axes (Li19; PanFWZR21; ZhaoJFLY21; ZinsSB20; DongRPI21; abs-2101-05212).

However, performing pose estimation at the level of objects through ellipse/ellipsoid modeling has mainly been formulated under the standard projective geometry formalism (Hartley2004) (Crocco_2016_CVPR; Gay_2017_ICCV; 7919240; QSLAM; ROB-059; abs-2004-05303; abs-2110-08977), and mostly through least-squares estimations where the unknowns are general quadric surfaces. This framework may present limitations since ellipses (resp. ellipsoids) are specific categories of conics (resp. quadrics) and since the linearization of the projection equation increases the number of apparent unknowns (see Section 2.1). In addition, these papers do not address the case of a small number of ellipse-ellipsoid correspondences, which is of high practical importance when only a few objects are observed or when computing the solutions with minimal sampling size in RANSAC algorithms (FischlerB81).

In this paper, we address the fundamental problem of camera pose estimation from one ellipse-ellipsoid correspondence, referred to as Perspective--Ellipsoid (PE) in what follows. Equivalently, it consists in reconstructing an ellipsoid of known size from its projected ellipse, given the camera intrinsic parameters (ellipsoid pose estimation).

There are several reasons for solving the PE problem. First, on the theoretical side, and except in the case of a spherical object, we demonstrate that the solutions form a variety of dimension 1 and we provide an effective way to reconstruct the camera trajectory (i.e. the solutions). This problem has been addressed in WokesP10 in the particular case of a spheroid (specific ellipsoid having an axis of revolution) but was never considered for general ellipsoids. To the best of our knowledge, we are the first to propose a constructive solution to the PE problem without resorting to any additional approximation or prior knowledge.

The proposed formalism relies on Cartesian coordinates instead of homogeneous ones (Hartley2004), and this enables us to develop an ellipsoid-specific framework. In this study, we also consider two particular cases of important practical interest: (i) computing the ellipsoid position when the orientation is known (ii) computing the orientation when the position is known.

Besides the theoretical aspects, solving the PE problem opens the way towards automatic positioning solutions in texture-less or low-textured environments, for instance leveraging several ellipse-ellipsoid correspondences or one with several point pairs. Industrial or other indoor scenes, in which objects would be approximated by ellipsoids, represent concrete places that could take advantage of these results.

The paper is organized as follows: In Section 2.1, we discuss the current State-of-the-Art object-based pose estimation methods and the limits of the homogeneous representation of ellipses and ellipsoids. In Section 3, we present the Euclidean formulation of the ellipsoid pose estimation problem, previously introduced in Eberly-backproj. Sections 4, 5 and 6 contain our core contributions:

  • In Section 4, we exhibit several mathematical properties of the PE solutions needed in the demonstrations of Sections 5 and 6.

  • Section 5 is dedicated to the specific case of PE where either the position or the orientation of the ellipsoid is known. We show that the problem formulation induces an inherent decoupling between orientation and position, one of which can be inferred in closed form from the other. The orientation-to-position solver was introduced in IROS; ISMAR and then leveraged in IROS2; RAL.

  • The general PE problem is solved in Section 6. We demonstrate that the 6DoF ellipsoid pose estimation problem can be reduced to a 1DoF problem, and present an effective way to reconstruct the solutions.

2 Related Work

2.1 Quadrics-based Pose Estimation

Most methods proposing to solve pose estimation problems at the level of objects using ellipsoid modeling are based on the projective geometry formalism. Any quadric $Q$ is thus linked to its projected conic $C$ by the equation

$$C^* = P\,Q^*\,P^\top \quad \text{(up to a scale factor)},$$

where $P$ is the camera projection matrix (Hartley2004) and $Q^*$, $C^*$ are the dual forms of $Q$ and $C$ ($Q^* = Q^{-1}$ up to a scale factor for a non-singular symmetric matrix $Q$). It is important to note that, under that formalism, ellipses and ellipsoids can be distinguished from other conics and quadrics only by certain algebraic conditions on the entries of $C$ and $Q$, these conditions being difficult to leverage in practice.
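As a small numerical illustration of this relation, the sketch below projects the dual quadric of an axis-aligned ellipsoid with a normalized camera $P = [I \mid 0]$. The ellipsoid (semi-axes 2, 1, 3, centered at depth 5), the helper names and the choice of exact integer arithmetic are assumptions made for the example; they are not taken from the cited methods.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def dual_ellipsoid(semi_axes, center):
    """Dual quadric Q* = T diag(a^2, b^2, c^2, -1) T^T of an axis-aligned ellipsoid."""
    a, b, c = semi_axes
    D = [[a * a, 0, 0, 0], [0, b * b, 0, 0], [0, 0, c * c, 0], [0, 0, 0, -1]]
    T = [[1, 0, 0, center[0]], [0, 1, 0, center[1]], [0, 0, 1, center[2]], [0, 0, 0, 1]]
    return matmul(matmul(T, D), transpose(T))


def project_dual_quadric(P, Q_star):
    """Dual conic C* = P Q* P^T, defined up to scale."""
    return matmul(matmul(P, Q_star), transpose(P))


P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]  # normalized camera at the origin
Q_star = dual_ellipsoid((2, 1, 3), (0, 0, 5))   # hypothetical example ellipsoid
C_star = project_dual_quadric(P, Q_star)
print(C_star)
```

Here the dual conic is diagonal, and inverting it gives the ellipse $x^2/(1/2)^2 + y^2/(1/4)^2 = 1$ in the normalized image plane. Nothing in the linear relation itself forces the result to be an ellipse, which is the loss of specificity discussed above.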

Quadric modeling of objects has often been implemented in the context of Semantic SLAM (ROB-059) for improving the process accuracy through multi-objective optimization (QSLAM; abs-2110-08977; abs-2004-05303). On a theoretical level, Crocco_2016_CVPR addresses the object-based Structure-from-Motion (SfM) problem and introduces an analytical solution to reconstruct both quadric and affine camera poses. The problem is solved in a least-squares sense with the matrix $P$ over-represented by a Kronecker product $P \otimes P$. This work is extended with CAD model priors in Gay_2017_ICCV, while 7919240 present a closed-form solution to the problem of reconstructing a quadric from three calibrated pinhole camera views in which the object projections are detected. However, in this method, nothing ensures that the reconstructed quadric is an ellipsoid, forcing the authors to add a costly non-linear optimization as a post-processing step.

Another limitation of using homogeneous quadrics and an over-representation of $P$ appears in P12Q. In this method, the so-called gold-standard algorithm used to retrieve the camera projection matrix from 2D-3D point correspondences (Hartley2004) is adapted to conic-quadric correspondences. To compute $P$, 12 conic-quadric pairs are required whereas only 6 point pairs are sufficient in the same context.

We argue that these limitations may be due in part to the fact that the homogeneous formulations of ellipses and ellipsoids are not distinguished clearly enough from those of the other members of their geometric families, and also to a non-minimal representation of the projection matrix. To overcome these difficulties, we propose an ellipsoid-specific theoretical framework and highlight its advantages.

2.2 Perspective--Spheroid

In WokesP10, a comprehensive study of the spheroid pose estimation problem is conducted. To this end, the authors introduce a spheroid-specific parameterization of the problem that makes it solvable but prevents the method from being extended to the general case of ellipsoids.

In a nutshell, the authors demonstrate that the spheroid pose estimation problem has two distinct solutions. In Section 6.4, we retrieve the same result by restricting our general formulation to the case of spheroids.

3 Formulation of the Ellipsoid Pose Estimation Problem

3.1 Problem Statement

Following the notations introduced in Eberly-backproj and presented in Fig. 1, we consider an ellipsoid defined by the equation

$$(X - C)^\top A\,(X - C) = 1,$$

where $C$ is the center of the ellipsoid, $A$ is a real positive definite matrix characterizing its orientation and size, and $X$ is any point on it.

Given a center of projection $E$ and a projection plane of normal $N$ which does not contain $E$, the projection of the ellipsoid is an ellipse of center $K$ and of semi-diameters $a$ and $b$. Its principal directions are represented by unit vectors $U$ and $V$, such that $\{U, V, N\}$ is an orthonormal set.

Figure 1: Illustrating the projection plane, projection center, ellipsoid and projected ellipse.

3.1.1 Projection Cone

The projection cone refers to the cone of vertex $E$ tangent to the ellipsoid. According to Eberly-backproj, it is characterized by matrix

$$B = A\Delta\Delta^\top A - (\Delta^\top A\Delta - 1)\,A,$$

where $\Delta = E - C$, so that the points $X$ belonging to the projection cone are those that satisfy the equation $(X - E)^\top B\,(X - E) = 0$. Note that $B$ is a real, symmetric and invertible matrix which has two eigenvalues of the same sign and the third one of the opposite sign.
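The following sketch builds the projection cone matrix in the form $B = A\Delta\Delta^\top A - (\Delta^\top A\Delta - 1)A$ with $\Delta = E - C$ (symbol names, function names and the numerical example are assumptions made for illustration). It then checks that a point where a ray from the camera touches the ellipsoid also lies on the cone. Exact rational arithmetic is used so that the checks are not affected by rounding.

```python
from fractions import Fraction


def projection_cone(A, center, eye):
    """B = A d d^T A - (d^T A d - 1) A with d = eye - center (3x3, list of rows)."""
    d = [e - c for e, c in zip(eye, center)]
    Ad = [sum(a * x for a, x in zip(row, d)) for row in A]
    dAd = sum(x * y for x, y in zip(d, Ad))
    return [[Ad[i] * Ad[j] - (dAd - 1) * A[i][j] for j in range(3)] for i in range(3)]


def quadratic_form(M, v):
    return sum(v[i] * M[i][j] * v[j] for i in range(3) for j in range(3))


# Hypothetical ellipsoid: semi-axes (2, 1, 3), centered at (0, 0, 5), camera at the origin.
A = [[Fraction(1, 4), 0, 0], [0, Fraction(1), 0], [0, 0, Fraction(1, 9)]]
C, E = [0, 0, 5], [0, 0, 0]
B = projection_cone(A, C, E)

# (8/5, 0, 16/5) is a point where a ray from E is tangent to this ellipsoid.
tangent_point = [Fraction(8, 5), 0, Fraction(16, 5)]
on_ellipsoid = quadratic_form(A, [p - c for p, c in zip(tangent_point, C)])
on_cone = quadratic_form(B, [p - e for p, e in zip(tangent_point, E)])
print(B, on_ellipsoid, on_cone)
```

In this example $B$ is diagonal with two negative entries and one positive entry, in agreement with the signature property stated above.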

3.1.2 Backprojection Cone

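The backprojection cone refers to the cone generated by the lines passing through $E$ and any point on the ellipse. Eberly shows that it is characterized by matrix

$$B' = P^\top M P - Q,$$

where

$$M = \frac{UU^\top}{a^2} + \frac{VV^\top}{b^2}, \quad W = \frac{N}{N \cdot (K - E)}, \quad P = I - (K - E)\,W^\top, \quad Q = WW^\top.$$

Here again, the points on the backprojection cone are those that satisfy $(X - E)^\top B'\,(X - E) = 0$, while $B'$ shares the properties of $B$ (real, symmetric, invertible with signature (2,1) or (1,2)).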
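A minimal sketch of this construction is given below, assuming the form $B' = P^\top M P - Q$ with $M = UU^\top/a^2 + VV^\top/b^2$, $W = N / (N \cdot (K - E))$, $P = I - (K - E)W^\top$ and $Q = WW^\top$. The function name and the example ellipse (semi-diameters 1/2 and 1/4, centered on the optical axis in the plane $z = 1$) are hypothetical.

```python
from fractions import Fraction


def outer(u, v):
    return [[x * y for y in v] for x in u]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def backprojection_cone(eye, K, N, U, V, a, b):
    """B' = P^T M P - Q from the ellipse center K, axes U, V and semi-diameters a, b."""
    KE = [k - e for k, e in zip(K, eye)]
    n_dot_KE = sum(n * x for n, x in zip(N, KE))
    W = [n / n_dot_KE for n in N]
    uu, vv, ww, kw = outer(U, U), outer(V, V), outer(W, W), outer(KE, W)
    M = [[uu[i][j] / (a * a) + vv[i][j] / (b * b) for j in range(3)] for i in range(3)]
    P = [[(1 if i == j else 0) - kw[i][j] for j in range(3)] for i in range(3)]
    Pt = [list(col) for col in zip(*P)]
    PtMP = matmul(matmul(Pt, M), P)
    return [[PtMP[i][j] - ww[i][j] for j in range(3)] for i in range(3)]


E = [0, 0, 0]
K = [0, 0, Fraction(1)]                      # ellipse center, in the plane z = 1
U, V, N = [1, 0, 0], [0, 1, 0], [0, 0, 1]    # principal directions and plane normal
a, b = Fraction(1, 2), Fraction(1, 4)        # semi-diameters
B_prime = backprojection_cone(E, K, N, U, V, a, b)

# A point of the ellipse, K + a*(3/5)*U + b*(4/5)*V, and a point further along the same ray.
x = [Fraction(3, 10), Fraction(1, 5), Fraction(1)]
values = [sum(s * x[i] * B_prime[i][j] * s * x[j] for i in range(3) for j in range(3))
          for s in (1, 7)]
print(B_prime, values)
```

Both evaluations vanish: the ellipse point and any other point of the ray through it belong to the cone.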

3.1.3 The Cone Alignment Equation

Given an ellipsoid, a central projection (center and plane) and an ellipse in the projection plane, the ellipse is the projection of the ellipsoid if and only if the projection and backprojection cones are aligned (Eberly-backproj), i.e. if and only if there is a non-zero scalar $\sigma$ such that:

$$A\Delta\Delta^\top A + \mu A = \sigma B' \tag{1}$$

where $\mu = 1 - \Delta^\top A\Delta$.

Note that $\mu$ encapsulates the relative configuration between the camera center $E$ and the ellipsoid center $C$. It is thus negative if $E$ is outside the ellipsoid and positive otherwise.

Equation (1), which we refer to as the Cone Alignment Equation, encodes the ellipsoid pose estimation problem. An equivalent formulation is given by Equation (1’) (see proof of equivalence in Appendix A):

(1’)
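To make the Cone Alignment Equation concrete, the sketch below evaluates the residual $A\Delta\Delta^\top A + \mu A - \sigma B'$ with $\mu = 1 - \Delta^\top A\Delta$ on a hand-built example. All numerical values are assumptions: an ellipsoid of semi-axes (2, 1, 3) seen from a camera 5 units away along its third axis, and $B' = \mathrm{diag}(4, 16, -1)$, which is the backprojection cone of an ellipse of semi-diameters 1/2 and 1/4 centered on the optical axis in the plane $z = 1$.

```python
from fractions import Fraction


def alignment_residual(A, delta, B_prime, sigma):
    """Return mu and the matrix A d d^T A + mu A - sigma B' (zero when the cones align)."""
    Ad = [sum(a * x for a, x in zip(row, delta)) for row in A]
    mu = 1 - sum(x * y for x, y in zip(delta, Ad))
    residual = [[Ad[i] * Ad[j] + mu * A[i][j] - sigma * B_prime[i][j]
                 for j in range(3)] for i in range(3)]
    return mu, residual


def is_zero(M):
    return not any(any(row) for row in M)


A = [[Fraction(1, 4), 0, 0], [0, 1, 0], [0, 0, Fraction(1, 9)]]
B_prime = [[4, 0, 0], [0, 16, 0], [0, 0, -1]]
sigma = Fraction(-1, 9)

# Correct relative position: the two cones coincide.
mu, residual = alignment_residual(A, [0, 0, -5], B_prime, sigma)
# Ellipsoid pushed one unit further away: the equation no longer holds for this sigma.
_, residual_shifted = alignment_residual(A, [0, 0, -6], B_prime, sigma)
print(mu, is_zero(residual), is_zero(residual_shifted))
```

The value obtained for $\mu$ is negative, as expected for a camera located outside the ellipsoid.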

3.2 Pose Problem Analysis

While Equation (1) has been established in the camera coordinate frame, it is also valid in the ellipsoid coordinate frame, where matrix is diagonal, and in the cone coordinate frame where is diagonal. Since and are symmetric, both can be diagonalized using an orthogonal matrix such that . For that reason, Equation (1) remains the same whatever the choice of the coordinate frame (camera, ellipsoid or cone) in which matrices and vectors are expressed. In the following, if there is no restriction on the coordinate frame, we will adopt notations without subscript. Otherwise, subscripts , or will be used.
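The frame invariance stated above can be checked numerically. The sketch below assumes that the cone matrix on the left-hand side of Equation (1) has the form $A\Delta\Delta^\top A + \mu A$ with $\mu = 1 - \Delta^\top A\Delta$, and uses a hypothetical rotation $R$ with rational entries so that the arithmetic stays exact. It verifies that expressing $A$ and $\Delta$ in a rotated frame simply conjugates the cone matrix by $R$, so that the equation keeps the same form.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def cone_matrix(A, d):
    """A d d^T A + mu A with mu = 1 - d^T A d."""
    Ad = [sum(a * x for a, x in zip(row, d)) for row in A]
    mu = 1 - sum(x * y for x, y in zip(d, Ad))
    return [[Ad[i] * Ad[j] + mu * A[i][j] for j in range(3)] for i in range(3)]


# Hypothetical rotation about the z axis with rational entries (3-4-5 triangle).
R = [[Fraction(3, 5), Fraction(-4, 5), 0], [Fraction(4, 5), Fraction(3, 5), 0], [0, 0, 1]]

# Ellipsoid frame: A is diagonal (semi-axes 2, 1, 3); d is an arbitrary offset.
A_e = [[Fraction(1, 4), 0, 0], [0, 1, 0], [0, 0, Fraction(1, 9)]]
d_e = [1, -2, -5]

# The same quantities expressed in the rotated frame.
A_r = matmul(matmul(R, A_e), transpose(R))
d_r = [sum(r * x for r, x in zip(row, d_e)) for row in R]

cone_in_rotated_frame = cone_matrix(A_r, d_r)
rotated_cone = matmul(matmul(R, cone_matrix(A_e, d_e)), transpose(R))
print(cone_in_rotated_frame == rotated_cone)
```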

One can theoretically distinguish between camera pose estimation, which consists in estimating the pose of the camera with respect to its environment, and its reference frame counterpart, i.e. object pose estimation, which consists in estimating the pose of an object with respect to the camera. Fundamentally, the sought transformations are the inverse of each other. In this paper, we may therefore focus on estimating either the pose of the ellipsoid or the pose of the camera, whichever setup is more convenient for the mathematical developments, without loss of generality.

The givens of the problem are the ellipse detected in the image, the camera intrinsic parameters and the ellipsoid size (i.e. lengths of its three radii). Therefore, , and are known, as well as , and (whose entries are all zero). Since expressions of and in specific coordinate frames are known, their eigenvalues are also known. In addition, matrix properties such as trace and determinant, that are fully constrained by the eigenvalues, are also known.

In the paper, we most often work in the camera frame. We thus aim at computing vector and matrix

from which we then retrieve the ellipsoid position and orientation .

In the other case, we aim at computing and to then derive the camera position and orientation .

Finally, vector encodes the relative position between the ellipsoid and the camera, while couple characterizes their relative orientation, denoted . Indeed, its expression in the camera frame is

while its expression in the ellipsoid frame is

recalling that and are known.

Solving Equation (1) therefore consists in determining the value(s) of and the expressions of in a common coordinate frame (camera or ellipsoid).

4 Properties of the Solutions

In this section, we demonstrate some properties of the solutions of (1) and exhibit relationships between the different variables. These results are new except Result 1, which was demonstrated in Eberly-backproj. It must be noted that the positive definite nature of matrix , i.e. what differentiates an ellipsoid from any other quadric, plays a fundamental role in the demonstrations of these properties.

4.1 Link with a Generalized Eigenvalue Problem

Let be a set of solutions of Equation (1). Result 1 shows that they are also solutions of a generalized eigenvalue problem (GoluVanl96).

Result 1.

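If $\Delta$ and $\sigma$ satisfy

$$A\Delta\Delta^\top A + \mu A = \sigma B' \tag{1}$$

they also satisfy

$$A\Delta = \sigma B'\Delta \tag{2}$$

Proof.

Let’s right-multiply (1) by $\Delta$. Since $\Delta^\top A\Delta = 1 - \mu$ is a scalar, the left-hand side can be simplified:

$$A\Delta\,(\Delta^\top A\Delta) + \mu A\Delta = (1 - \mu)\,A\Delta + \mu A\Delta = A\Delta = \sigma B'\Delta. \qquad ∎$$

Since $B'$ is invertible, the generalized eigenvectors and eigenvalues of pair $\{A, B'\}$ are the eigenvectors and eigenvalues of matrix $B'^{-1}A$.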

4.2 Generalized Eigenvalues of

Result 2.

The couple $\{A, B'\}$ has exactly two distinct generalized eigenvalues, which are non-zero and of opposite signs.

Proof.

The generalized eigenvalues of $\{A, B'\}$ are non-zero because $A$ is not singular.

We can then observe that

is an annihilator polynomial of (see proof in Appendix B):

(3)

In linear algebra, the minimal polynomial is defined as the monic annihilator polynomial having the lowest possible degree. It can be shown (lang2002) that (i) divides any annihilator polynomial and (ii) the roots of are identical to the roots of the characteristic polynomial. Since is an annihilator polynomial of degree 2, we can thus infer that , and thus , has at most two distinct eigenvalues.

We are now going to prove by contradiction that has exactly two distinct eigenvalues. Let’s thus assume that the couple has only one eigenvalue with multiplicity 3 denoted .

Since is positive definite and is symmetric, the couple has the following properties (GoluVanl96) (Corollary 8.7.2, p. 462):

  1. their generalized eigenvalues are real,

  2. their reducing subspaces are of the same dimension as the multiplicity of the associated eigenvalues,

  3. their generalized eigenvectors form a basis of , and those with distinct eigenvalues are -orthogonal.

According to property 2. above, we have

i.e.

which is impossible because represents an ellipsoid whereas represents a cone. So has exactly two distinct generalized eigenvalues.

Let’s then denote (multiplicity 1) and (multiplicity 2) these two eigenvalues. Observing that and are the generalized eigenvalues of , we can write, according to minimax (Theorem 3)

If and were of the same sign, then , would be of that sign (since ). Yet, it is impossible since is neither positive nor negative definite (cone). We thus conclude that the two distinct eigenvalues are of opposite signs. ∎

4.3 Characterization of

Let’s denote $\sigma_1$ (multiplicity 1) and $\sigma_2$ (multiplicity 2) the two generalized eigenvalues of $\{A, B'\}$.

Result 3.

$\sigma$ is the generalized eigenvalue of $\{A, B'\}$ with multiplicity 1:

$$\sigma = \sigma_1 \tag{4}$$
Proof.

Let’s consider and the generalized eigenvalues and eigenvectors of , such that .

We are going to prove (4) by contradiction.

Let’s suppose that there is such that is solution of Equation (1).

By injecting these values into (1), we therefore have

where

According to property 2 of the proof of Result 2,

whence, since A is invertible,

However, defining

the subspace of dimension 2 orthogonal to , we observe that, ,

Since is positive definite, , whence it comes

It means that

whence the direct sum

is a subspace of of dimension

We end up with a contradiction since .

As a result, triplets cannot be solutions of (1), thus solutions are necessarily in the form , where . ∎

4.4 Characterizations of

Result 4 demonstrates that the secondary scalar variable $\mu$ is also closely linked to the generalized eigenvalues of $\{A, B'\}$.

Result 4.

$\mu$ is equal to the ratio between the two generalized eigenvalues of $\{A, B'\}$:

$$\mu = \frac{\sigma_1}{\sigma_2} \tag{5}$$
Proof.

Trace of is given by its eigenvalues:

Whence, by squaring the matrix,

Therefore, since and , applying the operator to Equation (3) leads to

which is equivalent to

i.e.

Since $\sigma_1$ and $\sigma_2$ are of opposite signs, Equation (5) shows that $\mu < 0$. ∎

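Result 5 now highlights the connection between $\mu$ and $\sigma$.

Result 5.

$\mu$ and $\sigma$ are linked through Equation (6):

$$\mu^2 = \sigma^3\,\frac{\det(B')}{\det(A)} \tag{6}$$

Proof.

The determinant of $B'^{-1}A$ is given by its eigenvalues:

$$\det(B'^{-1}A) = \sigma_1\sigma_2^2,$$

i.e.

$$\frac{\det(A)}{\det(B')} = \sigma_1\sigma_2^2 \tag{7}$$

One obtains (6) by injecting (5) into (7) and using $\sigma = \sigma_1$. ∎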
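The properties established so far can be checked exactly on a small example. The sketch below is an illustration under stated assumptions, not the procedure used in the paper. It takes a hypothetical non-axis-aligned ellipsoid matrix $A$ and offset $\Delta$, builds an aligned cone $B'$ by construction from $\sigma B' = A\Delta\Delta^\top A + \mu A$ with an arbitrary $\sigma$, and then verifies with rational arithmetic that $A\Delta = \sigma B'\Delta$, that $\sigma$ and $\sigma/\mu$ are generalized eigenvalues of $\{A, B'\}$ of opposite signs (the second one with a two-dimensional eigenspace), and that $\det(A)/\det(B')$ equals the product of the eigenvalues counted with multiplicity.

```python
from fractions import Fraction


def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def pencil(A, B, lam):
    """Matrix A - lam * B of the generalized eigenvalue problem."""
    return [[A[i][j] - lam * B[i][j] for j in range(3)] for i in range(3)]


def is_rank_one(M):
    """True if M is non-zero and all its 2x2 minors vanish."""
    minors = [M[i][k] * M[j][l] - M[i][l] * M[j][k]
              for i in range(3) for j in range(i + 1, 3)
              for k in range(3) for l in range(k + 1, 3)]
    return any(any(row) for row in M) and not any(minors)


# Hypothetical ellipsoid (semi-axes 2, 1, 3 rotated about z) and camera offset.
A = [[Fraction(73, 100), Fraction(-9, 25), 0],
     [Fraction(-9, 25), Fraction(13, 25), 0],
     [0, 0, Fraction(1, 9)]]
delta = [1, -2, -5]

A_delta = [sum(a * x for a, x in zip(row, delta)) for row in A]
mu = 1 - sum(x * y for x, y in zip(delta, A_delta))

# Aligned cone built by construction: sigma * B' = A d d^T A + mu A.
sigma = Fraction(-1, 3)
B_prime = [[(A_delta[i] * A_delta[j] + mu * A[i][j]) / sigma for j in range(3)]
           for i in range(3)]
B_delta = [sum(b * x for b, x in zip(row, delta)) for row in B_prime]

sigma_1, sigma_2 = sigma, sigma / mu
checks = {
    "A d = sigma B' d": A_delta == [sigma * x for x in B_delta],
    "sigma_1 is a generalized eigenvalue": det3(pencil(A, B_prime, sigma_1)) == 0,
    "sigma_2 has a 2D eigenspace": is_rank_one(pencil(A, B_prime, sigma_2)),
    "opposite signs": sigma_1 * sigma_2 < 0,
    "det(A)/det(B') = sigma_1 sigma_2^2": det3(A) / det3(B_prime) == sigma_1 * sigma_2 ** 2,
}
print(mu, checks)
```

Because $B'$ is generated from the equation itself, this only confirms the internal consistency of the results; with a real detection, $B'$ would come from the observed ellipse.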

4.5 Link between and

Result 6.

The scalar and the camera-ellipsoid distance are linked through Equation (8):

(8)
Proof.

Injecting Equations (4) and (5) into (1) leads to

Applying then squaring:

(9)

Furthermore, injecting (4) into (7) leads to the following expression for :

(10)

Equation (8) is then obtained by injecting (10) into (9). ∎

5 Decoupling between Orientation and Position

In this section, we consider two sub-problems of significant practical interest: (i) computing the ellipsoid position when the orientation is known and (ii) computing the orientation when the position is known. We demonstrate that the position can be inferred in closed form from the orientation (Section 5.1, Result 7), while the orientation can be analytically derived from the position up to the ellipsoid symmetries (Section 5.2, Result 8). We refer to these properties as a decoupling between orientation and position.

5.1 Position from Orientation

In this case, or is known. Since and are also known, eigenvalues of , (using Result 3) and (Result 4) can be retrieved. Result 7 then provides that is unique and fully determined.

Result 7.

Assuming that the relative camera-ellipsoid orientation is known, their relative position is given by

(11)

where

Proof.

Injecting Equations (4), (5) and into Equation (1) leads to

Whence, by applying ,

The two vectors define ellipsoid centers that are symmetric with respect to the camera center $E$. The only one that satisfies the chirality constraint (ellipsoid located in front of the camera) is the one whose dot product with vector $N$ is negative (see Fig. 1). ∎

Result 7 highlights the fact that solving Equation (1) may consist only in determining or . Indeed, and can then be uniquely derived. In other words, the relative position is fully constrained by the orientation.

As mentioned above, this result is of high practical interest. Indeed, getting the camera orientation, e.g. from physical sensors or image analysis, is usually easier than getting the camera position, especially indoors where GPS is not usable. In a multi-object scene, the fact that only one ellipse-ellipsoid association is needed to compute the pose allows using a RANSAC-like strategy with low combinatorial cost, both to detect the wrong associations and to choose the correct one when a label is shared by several objects.

A re-localization algorithm based on this strategy was presented in (ISMAR). The system operates in real time from YOLO detections (YOLOv3) and IMU data or vanishing points – both methods were assessed. Figure 2 shows a few qualitative results obtained with images from the standard RGB-D TUM dataset (SturmEEBC12). Quantitative results as well as a detailed analysis of the advantages and limitations of this algorithm can be found in (ISMAR). Other applications of Result 7 are presented in (IROS; IROS2; RAL).

Figure 2: Camera relocalization based on the decoupling between orientation and position, here applied to images from the RGB-D TUM dataset (ISMAR). The first row shows the detected boxes, the inscribed ellipsoids (in yellow) and the outlines of the reprojected ellipsoids (in green), with the automatically generated labels. The blue ellipses correspond to the reprojected ellipsoids when the QuadricSLAM residual error is minimized (QSLAM). The second row shows the reprojected ellipsoids (in green the inliers, in white the outliers and undetected ellipsoids) when using the method in (ISMAR).

5.2 Orientation from Position

In this case or is known. and are unknown, but since their eigenvalues are known, , , and are known. can thus be deduced from Result 6 and from Result 5. Result 8 then explains how to retrieve or and how to derive the orientation in closed-form, up to the cone or ellipsoid symmetries.

Result 8.

Assuming that the relative camera-ellipsoid position is known, their relative orientation is given by eigenvectors of

(1)

in the ellipsoid reference frame, or of

(12)

in the camera reference frame.

Proof.

Injecting (2) into (1) leads to

whence (12) by isolating . ∎

Once for instance is retrieved, one can compute its eigenvalue decomposition:

For a triaxial ellipsoid (), there are 4 solutions for . Therefore, the ellipsoid orientation can be analytically derived from its position up to the ellipsoid symmetries.
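The count of four orientations can be related to the rotations that leave the ellipsoid matrix unchanged. As a simple illustration (an assumption of this example rather than the derivation used in the paper), the sketch below enumerates the 24 rotations that map coordinate axes to coordinate axes and keeps those that preserve a diagonal matrix with three distinct entries, i.e. a triaxial ellipsoid expressed in its own frame.

```python
from fractions import Fraction
from itertools import permutations, product


def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def conjugate(R, A):
    """R A R^T for 3x3 matrices stored as lists of rows."""
    return [[sum(R[i][k] * A[k][l] * R[j][l] for k in range(3) for l in range(3))
             for j in range(3)] for i in range(3)]


def axis_rotations():
    """Signed permutation matrices of determinant 1 (24 rotations)."""
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            R = [[signs[i] if j == perm[i] else 0 for j in range(3)] for i in range(3)]
            if det3(R) == 1:
                yield R


# Hypothetical triaxial ellipsoid with semi-axes 2, 1, 3 in its own frame.
A = [[Fraction(1, 4), 0, 0], [0, 1, 0], [0, 0, Fraction(1, 9)]]
candidates = list(axis_rotations())
symmetries = [R for R in candidates if conjugate(R, A) == A]
print(len(candidates), len(symmetries))
```

The four rotations found are the identity and the half-turns about the three principal axes, so an orientation recovered from the eigenvectors is only defined up to these symmetries.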

6 Closed-form Solutions

In this section, we introduce the core contribution of the paper, that is closed-form solutions to the general PE problem. Explicit 1DoF solutions are provided based on the fact that the ellipsoid is triaxial (all eigenvalues are different) or not, and the cone is circular (two eigenvalues are equal) or not.

In Section 6.1, we first present the different types of ellipsoids and cones along with their possible co-occurrences. An overview of the solutions is given in Section 6.2. In Section 6.3, we consider the case of a triaxial ellipsoid and present for the first time a Necessary and Sufficient Condition (NSC) on to be solution of Equation (1). Then we derive the analytical expressions of the other variables as functions of . In Section 6.4, we address the case of the spheroid (ellipsoid with an axis of revolution). That part makes it possible to retrieve, from another formalism, the results presented in WokesP10. In Section 6.5, we finally present the solutions for the sphere.

In what follows, the problem is solved either in the ellipsoid or in the cone coordinate frame. In brief, the choice is linked to the ability to define a frame associated to the considered structure without ambiguities. The case of the triaxial ellipsoid can thus be addressed in both frames since the two structures are unambiguous. The case of the spheroid is different, and depending on the properties of the cone, solutions are derived in one or the other frame.

6.1 Preliminaries: Co-occurrences of Ellipsoid and Cone Types

In this paper, we address the full ellipsoid pose estimation problem, i.e. we cover every possible type of ellipsoid and thus of cone (see Appendix D). However, a specific type of ellipsoid cannot necessarily be tangent to any type of cone, which we refer to as possible or impossible co-occurrence.

Obviously, only circular cones can be tangent to a sphere. Furthermore, we are going to prove that only a non-circular elliptic cone can be tangent to a triaxial ellipsoid.

Let’s prove it by contradiction, and assume that the projection cone has a revolution axis (circular cone).

Let’s also assume that the ellipsoid center does not belong to that axis. Since the ellipsoid is tangent to the cone, any new ellipsoid obtained by rotating the original one around the cone revolution axis shall still be tangent to the cone, thus be solution of (1). Yet, in this case, the locus of ellipsoid centers would be a circle located in a plane orthogonal to that axis and whose center would belong to it. Every center would thus be at a fixed distance to the cone vertex , whence there would be an infinite number of solutions for the same , given (8). However, this contradicts Equation (19). Therefore, the center of the ellipsoid must belong to the revolution axis of the cone.

If the ellipsoid center belonged to the cone revolution axis, then would be parallel to that axis, i.e. would be an eigenvector of , whence of given (1). However, in such a case, the symmetries of the cone-ellipsoid pair would impose that the tangent ellipse (intersection between the ellipsoid and the polar plane derived from (Wylie)) belongs to a plane orthogonal to the cone revolution axis, that is also a principal axis of the ellipsoid. Therefore, that tangent ellipse should be both a circle (orthogonal section of a circular cone) and a non-circular ellipse (section of an ellipsoid by a plane parallel to one of its principal planes), which is impossible. Therefore, the cone cannot have a revolution axis.

In brief, Table 1 summarizes the possible and impossible co-occurrences between ellipsoid and cone types.

                                Ellipsoid
Projection cone     Triaxial    Spheroid    Sphere
Non-circular        ✓           ✓           ✗
Circular            ✗           ✓           ✓

Table 1: Possible co-occurrences of ellipsoids and projection cones according to their types. ✓ indicates that ellipsoid and projection cone of the corresponding types may occur simultaneously. ✗ indicates that they cannot.

6.2 Overview of the Solutions

In the rest of this section, we determine the solutions of the Cone Alignment Equation (1), and derive the camera-ellipsoid relative poses. To this end, we distinguish between the three different types of ellipsoids (triaxial, spheroid, sphere). We demonstrate, in particular, that

  • there is an infinite number of triaxial fixed-size ellipsoids that are tangent to a given backprojection cone (see Fig. 3),

  • as already demonstrated in WokesP10, there are only two fixed-size spheroid solutions (see Fig. 4).

In the first case, the infinite number of ellipsoids tangent to the cone (or, conversely, the infinite number of cones tangent to the ellipsoid) explains the infinite number of camera solutions (see Fig. 5), and provides a parameterization of them. In the second case, the infinite number of change of basis matrices between the spheroid and the camera explains the infinite number of camera solutions (see Fig. 6). The mathematical developments leading to these results are presented below.

Figure 3: Loci of the centers (black) and principal axes endpoints (red, green, blue) of the ellipsoid solutions. A video is available (footnote 2).
Figure 4: The two spheroid solutions with a non-circular backprojection cone.
Figure 5: Triaxial ellipsoid: locus of cone vertices, i.e. camera centers.
Figure 6: Spheroid: locus of cone vertices, i.e. camera centers.

6.3 The Triaxial Ellipsoid

6.3.1 Solving for

In practice, not all values give rise to a solution of the problem. When the ellipsoid has three distinct radii (triaxial), Theorem 1 provides a characterization of the scalars solutions of (1).

Theorem 1.

Let’s denote the three distinct eigenvalues of . Then is solution of Equation (1) if and only if the three entries of vector are all non-negative:

(13)
(14)

with

(15)
(16)

It must be noted that is the only unknown parameter of vector V(m) since all the other ones derive from and eigenvalues.

Proof.

Let’s assume that Equation (1) is satisfied:

(1)

Therefore, equivalent Equation (1) is also satisfied:

(1)

Since the trace of a product of matrices does not depend on the order of the matrices, we have

and, similarly

Given, in addition, , applying to Equations (1) and (1) leads to the following system:

Although the two scalar unknowns and appear in the right hand side, they can be expressed as functions of a third unknown. Indeed, denoting

and

Equation (6) can be rewritten

whence, by raising it to the power of ,

(17)

Furthermore, given

we have

Therefore, is solution of the following system with unknown :

(18)

The above equations are independent from the considered coordinate frame. Considering the ellipsoid frame and denoting the corresponding expression of , the above system can be rewritten:

i.e.

Since eigenvalues are all different (triaxial ellipsoid), Vandermonde matrix is not singular (cf Appendix C). Therefore, the system can be inverted:

(19)

Left hand side elements are all non-negative, whence the result.

Let’s now assume that the three entries of are all non-negative.

Let be a vector such that

Such a definition is possible due to the non-negativity of the three entries.

One can therefore demonstrate, with the help of computer algebra software (the corresponding Maple code is provided in Appendix E for reference), that, irrespective of the sign assumptions made for entries, the matrix

has the same eigenvalues with same multiplicities as . Given that both matrices are diagonalizable (since symmetric), this amounts to saying that they are similar, and thus that Equation (1) is satisfied. ∎
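The last step of the proof uses the fact that two symmetric matrices sharing the same eigenvalues, with the same multiplicities, are similar through an orthogonal matrix. The following NumPy sketch illustrates that fact only; the function name `orthogonal_similarity` and the example matrices `S1`, `S2` are hypothetical and are not the quantities of Equation (1).

```python
import numpy as np

def orthogonal_similarity(S1, S2, tol=1e-9):
    """Return an orthogonal Q such that Q @ S2 @ Q.T equals S1.

    Assumes S1 and S2 are symmetric with identical eigenvalues
    (multiplicities included); raises ValueError otherwise.
    """
    w1, U1 = np.linalg.eigh(S1)  # ascending eigenvalues, orthonormal eigenvectors
    w2, U2 = np.linalg.eigh(S2)
    if not np.allclose(w1, w2, atol=tol):
        raise ValueError("the two spectra differ")
    return U1 @ U2.T

# Hypothetical example: one spectrum expressed in two random orthonormal bases.
rng = np.random.default_rng(0)
spectrum = np.array([0.5, 2.0, 3.0])
Qa, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Qb, _ = np.linalg.qr(rng.normal(size=(3, 3)))
S1 = Qa @ np.diag(spectrum) @ Qa.T
S2 = Qb @ np.diag(spectrum) @ Qb.T
Q = orthogonal_similarity(S1, S2)
```

The returned `Q` is orthogonal but not necessarily a rotation: its determinant may be -1, which is why eigenvector signs have to be chosen explicitly when a rotation is required.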

6.3.2 Solving for Camera Poses

Theorem 2.

Considering a triaxial ellipsoid, each solution of (1) gives rise to eight backprojection cones tangent to the ellipsoid. These cones are symmetric with respect to the three principal planes of the ellipsoid (see Fig. 7).

In addition, each backprojection cone defines two camera solutions (see Fig. 8).

Figure 7: Illustrating the eight backprojection cones tangent to the triaxial ellipsoid for a given value.
Proof.

Theorem 1 provides a necessary and sufficient condition (NSC) on to be solution of (1). Moreover, its proof exhibits that vectors solutions are expressed in the ellipsoid frame in the form

(20)

where

There are thus eight vectors solutions for a given (thus ), and they are symmetric with respect to the three principal planes of the ellipsoid.
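The count of eight comes from the free choice of sign on each of the three square roots. A minimal sketch of that enumeration, assuming the three squared entries are already known and strictly positive (the numerical values and the name `sign_variants` are placeholders for illustration only):

```python
import itertools
import numpy as np

def sign_variants(squared_entries):
    """Return the vectors obtained by choosing a sign for each square root."""
    roots = np.sqrt(np.asarray(squared_entries, dtype=float))
    return [np.array(signs) * roots
            for signs in itertools.product((1.0, -1.0), repeat=3)]

# Placeholder squared entries, expressed in the ellipsoid frame.
variants = sign_variants([0.25, 1.0, 4.0])

def is_mirror_symmetric(vectors, axis):
    """Check that flipping one coordinate maps the set of vectors onto itself."""
    as_set = {tuple(v) for v in vectors}
    mirrored = set()
    for v in vectors:
        w = v.copy()
        w[axis] = -w[axis]
        mirrored.add(tuple(w))
    return mirrored == as_set
```

If one squared entry is zero, the corresponding sign choice is immaterial and fewer than eight distinct vectors are obtained.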

The cone vertices (i.e. camera positions) can then be derived:

Figure 8: Illustrating the two cameras compatible with each backprojection cone tangent to the triaxial ellipsoid.

Let us now solve for camera orientations. Equation (1) provides the expression of :

(21)

Orientations of the cameras then verify:

Since the cone is non-circular (see Section 6.1), and eigenvectors are defined with minimum ambiguity. By arbitrarily fixing the directions of eigenvectors for instance, there then remain four ways of choosing the directions of eigenvectors so that the change of basis matrix is a rotation matrix. Yet, of the four resulting orientations, only two lead to an ellipsoid located in front of the camera.
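The figure of four can be checked by enumeration: an orthonormal eigenbasis admits eight sign assignments for its columns, and exactly half of them have determinant +1. The sketch below uses an arbitrary orthogonal matrix `U` as a stand-in for an eigenvector basis (the name `proper_sign_flips` is hypothetical), and it does not model the chirality test that keeps two of the four orientations.

```python
import itertools
import numpy as np

def proper_sign_flips(U):
    """Return the column-sign variants of an orthogonal U with determinant +1."""
    rotations = []
    for signs in itertools.product((1.0, -1.0), repeat=3):
        candidate = U * np.array(signs)  # broadcasting flips whole columns
        if np.linalg.det(candidate) > 0:
            rotations.append(candidate)
    return rotations

# Arbitrary orthonormal basis standing in for a set of eigenvectors.
U, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(3, 3)))
rotations = proper_sign_flips(U)
```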

To summarize, to each correspond sixteen camera poses covering eight different positions (see Fig. 8).

6.4 The Spheroid

When the ellipsoid has a revolution axis (i.e. spheroid), we use a different approach since the Vandermonde matrix is now singular and thus cannot be inverted. We then determine the set of spheroids tangent to the backprojection cone, and distinguish between the two possible types of cone. It is worth noting that this problem has already been addressed in WokesP10 using a different parameterization. In particular, the authors show that in the general case (non-circular elliptic cone), there are only two tangent spheroids, and we retrieve this result below.

6.4.1 The Non-circular Elliptic Cone

Let us first consider a non-circular elliptic cone. Expressing the Cone Alignment Equation in the cone coordinate frame, and given that the three eigenvalues are different, solutions can be characterized in a similar way to Theorem 1.

Result 9.

Let’s denote the three distinct eigenvalues of . Then is solution of Equation (1) if and only if the three entries of vector are all non-negative:

(22)
(23)
Proof.

The proof is based on the exact same arguments as the proof of Theorem 1. In particular, is related to the expression of in the cone frame:

It is worth noting that the above result is also valid in the case of a triaxial ellipsoid since the cone is then non-circular (Section 6.1). It can therefore be used to reconstruct the ellipsoids in the camera coordinate frame (see Fig. 2).

Unlike the triaxial ellipsoid for which there is an infinite number of solutions, each one giving rise to a fixed number (16) of camera poses, we demonstrate in Theorem 3 that there is only one solution for the spheroid, which gives rise to an infinite number of camera poses.

Let’s consider (multiplicity 1) and (multiplicity 2) the eigenvalues of . Let’s also consider the eigenvalues of , where and have the same sign (opposed to the sign of ). Finally, let’s assume, even if it means exchanging the roles, that .

Theorem 3.

Considering a spheroid along with a non-circular backprojection cone, there is only one value solution of Equation (1):

(24)

That value gives rise to two spheroids tangent to the cone, that are symmetric with respect to one of the cone principal planes (see Fig. 4).

Proof.

According to Result 9, is solution of (1) if and only if the three entries of the following vector are all non-negative:

Yet, developing the right hand term leads to the following system:

where

and where

The locus of scalars solutions is the subset of on which , and are all non-negative.

To study the variations of these three polynomials, four cases, described in Table 2 and depending on the relative order of and eigenvalues, need to be considered. Only case #1 is addressed below, since other cases can be solved using a similar reasoning.

#1 #2
#3 #4
Table 2: Configurations of the problem depending on and eigenvalues.

Let’s denote the root of with multiplicity 1, and the root with multiplicity 2, such that

In configuration #1, studying the variations of every leads to only one solution to obtain simultaneous non-negative values, which is the root of with multiplicity 2: (the proof is given in Appendix G).

The unique solution is then given by

after replacing by its expression as a function of and eigenvalues.

Furthermore, since is a root of , vectors expressed in the cone frame verify:

Then

Since the sign of the third entry () is fixed under the chirality constraint (ellipsoid located in front of the camera), there remain two possible expressions for . The two resulting vectors are symmetric with respect to the cone principal plane whose normal is the eigenvector corresponding to eigenvalue .

Equation (12) then provides the expressions of in the cone coordinate frame:

One can therefore derive the expressions of and in the camera frame using (known):

then the spheroid centers are:

Now that the spheroid solutions have been determined in the camera coordinate frame, one can deduce the poses of camera solutions.

Result 10.

Considering a spheroid along with a non-circular backprojection cone, the axial symmetry of the spheroid leads to an infinite number of camera solutions. The solutions belong to two planes orthogonal to the revolution axis of the spheroid and located at the same distance from its center (see Fig. 6).

Proof.

Since there is only one possible value for , vectors all have the same norm (cf. Result 6), i.e. the cameras are located at a fixed distance from the spheroid center.

Furthermore, orientations of these cameras verify:

Since the spheroid has a revolution axis, arbitrarily fixing eigenvectors for instance leaves two choices for one of eigenvectors (the one corresponding to the revolution axis) and an infinite number of choices for the other two. ∎
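The infinite family of orientations stems from the invariance of a spheroid under any rotation about its revolution axis. The sketch below checks this numerically on a diagonal matrix with one simple and one double eigenvalue; the values, the choice of the first axis as revolution axis, and the names `rotation_about_first_axis` and `spheroid` are assumptions made for illustration.

```python
import numpy as np

def rotation_about_first_axis(theta):
    """Rotation matrix of angle theta about the first coordinate axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

# Placeholder spheroid matrix: simple eigenvalue on the revolution axis,
# double eigenvalue on the two remaining axes.
spheroid = np.diag([0.25, 1.0, 1.0])

angles = np.linspace(0.0, 2.0 * np.pi, 13)
invariant = all(
    np.allclose(R @ spheroid @ R.T, spheroid)
    for R in map(rotation_about_first_axis, angles)
)
```

Every such rotation can be composed with a valid change of basis to produce another valid one, which yields a one-parameter family of camera orientations.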

6.4.2 The Circular Cone

Considering the case of a circular cone (elliptic cone with a revolution axis), we are going to demonstrate that there is only one spheroid tangent to it (Result 11).

In this case, has two distinct eigenvalues. Let’s denote them (multiplicity 1) and (multiplicity 2).

Result 11.

Considering a spheroid along with a circular backprojection cone, there is only one value solution of Equation (1):

(25)

That value gives rise to one spheroid tangent to the cone, and both revolution axes coincide (see Fig. 9). The distance between the cone vertex and the spheroid center is given by:

(26)
Figure 9: The spheroid solution with a circular backprojection cone.
Proof.

Given that has two distinct eigenvalues, its minimal polynomial is

being an annihilator polynomial of , evaluating gives

where

Left-multiplying by and right-multiplying by gives

(27)

Injecting the expressions of , and as functions of from (18) into (27) and observing that

and that

we obtain, after simplification

Let’s call the above polynomial whose is root:

Developing and , one can observe that it can be rewritten

At this stage, one can note that the sign of is the sign of :

Thus the signs of the roots of are:

Since , the only possible value for it is the second one. Therefore, is given by:

Let’s now focus on , and consider the first two equations of System 18:

Considering the ellipsoid frame, its left hand side can be rewritten in a matrix form:

That Vandermonde matrix is not singular given that , thus the system can be inverted.

After developing the right hand side, we finally obtain

and

Therefore,

(28)

is thus an eigenvector of corresponding to eigenvalue , i.e. coincides with the revolution axis of the spheroid.

Equation (2) then requires that is also an eigenvector of . It must be the one corresponding to the revolution axis of the cone since, if not, the ellipsoid center would be located outside of the cone. In that respect, both axes of revolutions (cone and spheroid) coincide, and is given by:

can then be obtained from (12), then and using (known).

Now that the spheroid solution has been determined in the camera frame, we can infer the corresponding camera poses.

Result 12.

When considering a spheroid along with a circular backprojection cone, there are two solutions for camera position, that are located on the revolution axis of the spheroid and at the same distance from its center, and an infinite number of solutions for camera orientation.

Proof.

Equation (28) gives the two possible solutions for . For reasons of symmetry, both of them are actual solutions.

Furthermore, orientations of these cameras verify:

Just as when considering a non-circular elliptic cone (Result 10), we conclude on the infinite number of camera orientations. ∎

6.5 The Sphere

When the ellipsoid is a sphere, the matrix has the same expression in every basis:

(29)

where is the sphere radius.

Given its observation as a circle in the image, there is obviously an infinite number of camera solutions, which are located at the same distance from the sphere center.

Using the formalism of our study, we can make these properties precise with the following result:

Result 13.

When considering a sphere, there is only one solution of Equation (1):

(30)

That value defines a unique sphere tangent to the cone, whose center lies on the revolution axis of the cone. The distance from the cone vertex to the sphere center is given by:

(31)

The locus of camera positions is then a sphere with radius around the ellipsoid center.

Proof of Result 13 is provided in Appendix H.
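Independently of the eigenvalue-based expression (31), the constant vertex-to-center distance can be sanity-checked with elementary geometry: a sphere of radius r inscribed in a circular cone of half-angle α has its center at distance r / sin α from the vertex. The sketch below verifies this by measuring the distance from the center to one generator of the cone. The parameterization by half-angle and all names are assumptions made for illustration, not the notation of the paper.

```python
import numpy as np

def sphere_center_distance(radius, half_angle):
    """Distance from the cone vertex to the center of the inscribed sphere."""
    return radius / np.sin(half_angle)

def distance_point_to_line(point, direction):
    """Distance from a point to the line through the origin along direction."""
    direction = direction / np.linalg.norm(direction)
    return np.linalg.norm(point - np.dot(point, direction) * direction)

# Placeholder values: cone vertex at the origin, revolution axis along z.
radius, half_angle = 0.5, np.deg2rad(10.0)
d = sphere_center_distance(radius, half_angle)
center = np.array([0.0, 0.0, d])
generator = np.array([np.sin(half_angle), 0.0, np.cos(half_angle)])
gap = distance_point_to_line(center, generator)
```

The value `gap` equals the radius, i.e. the generator is tangent to the sphere, so the camera centers compatible with the observed circle all lie at distance `d` from the sphere center.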

6.6 Examples of Retrieved Poses

We provide in Fig. 10 a few examples of retrieved camera trajectories from one ellipse-ellipsoid correspondence on a real scene from the T-LESS dataset (SturmEEBC12). Ellipsoidal models of objects (all triaxial) were reconstructed using 7919240. Ellipsoids were then reprojected into one image of the sequence using the ground-truth projection matrix. Finally, the trajectories of the camera solutions were retrieved using our method (Section 6.3). In the figure, ellipse and ellipsoid colors coincide with those of the trajectories. Naturally, all the trajectories intersect at the ground-truth camera position . This example illustrates the practical interest of our method. However, a comprehensive study of numerical aspects, e.g. noise robustness, is outside the scope of this paper.

Figure 10: Examples of camera trajectories corresponding to six different ellipse-ellipsoid pairs. The image is extracted from the T-LESS dataset (SturmEEBC12).

7 Conclusion

We propose in this paper a complete characterization of the PE problem and noticeably extend previous works that only partially addressed this problem. Besides its theoretical interest, this paper also proposes a constructive solution of the camera trajectories. The closed-form solution provided for the position-from-orientation case has proven its practical interest. The orientation-from-position solution, even if not yet exploited, may also represent a convenient manner to simplify the pose estimation problem.

Future investigations concern numerical aspects and especially a sensitivity analysis of the method to image noise. Another important concern will be the joint use of the method with a minimal number of other image features, such as points or other ellipse-ellipsoid correspondences, to ensure the computation of a unique solution.

Declarations

The authors declare that they have no conflict of interest.

Appendix A Equivalent Problem Formulations

To prove that Equations (1) and (1) are equivalent, we demonstrate below that implies then implies .

Multiply (1) on the right by to obtain (see Appendix B). Whence

Left-multiplying by

Then right-multiplying by

Finally ()

Multiply (1) on the right by then on the left by to obtain

Then right-multiply by to obtain , whence () .

Injecting that result into the previous equation leads to

Appendix B Proving that

Substituting (2) into (1), we obtain:

We can then deduce the following expression for A:

Whence, denoting the identity matrix and defining , then left-multiplying by , we obtain

Squaring that expression leads to

Defining :

Finally, we have

Whence, denoting ,

Appendix C Vandermonde Matrix

A Vandermonde matrix is a matrix with the terms of a geometric progression in each row or column:

The determinant of a square Vandermonde matrix (when ) is given by

Therefore, is not singular (i.e. ) if and only if all are distinct.
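The determinant formula and the resulting singularity criterion can be checked numerically. The sketch below compares the product of pairwise differences with the determinant computed by NumPy on a Vandermonde matrix; the sample values and the function name `vandermonde_det` are arbitrary choices for illustration.

```python
import itertools
import numpy as np

def vandermonde_det(values):
    """Product of (x_j - x_i) over all pairs i < j."""
    det = 1.0
    for i, j in itertools.combinations(range(len(values)), 2):
        det *= values[j] - values[i]
    return det

distinct = np.array([1.0, 2.0, 4.0])
repeated = np.array([1.0, 2.0, 2.0])

# Rows of the form (1, x, x^2), i.e. increasing powers along each row.
numeric_distinct = np.linalg.det(np.vander(distinct, increasing=True))
numeric_repeated = np.linalg.det(np.vander(repeated, increasing=True))
```

With three distinct values the determinant is non-zero, which is the triaxial situation of Theorem 1. With a repeated value it vanishes, which is why the spheroid requires the separate treatment of Section 6.4.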

Appendix D Ellipsoid and Cone Types

See Tables 3 and 4.

Columns: Triaxial ellipsoid | Spheroid | Sphere
Rows: Illustration | Lengths of principal axes | Eigenvalues of | Signs of eigenvalues | Characteristic polynomial of | Minimal polynomial of
Table 3: The different types of ellipsoids.
Columns: Non-circular elliptic cone | Circular cone
Rows: Illustration | Eigenvalues of | Signs of eigenvalues | Characteristic polynomial of | Minimal polynomial of
Table 4: The different types of elliptic cones.

Appendix E Theorem 1: Maple code

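> with(linalg);
> # Diagonal forms of A and B, each with three symbolic eigenvalues
> A:=matrix([[lA1,0,0],[0,lA2,0],[0,0,lA3]]);
> B:=matrix([[lB1,0,0],[0,lB2,0],[0,0,lB3]]);
> # Transposed Vandermonde matrix built on the eigenvalues of A
> M_A:=transpose(vandermonde([lA1,lA2,lA3]));
> d:=(det(A)/det(B))^(1/3);
> # Right-hand side vector, depending on the scalar m only
> V:=transpose(matrix([[trace(inverse(A))-
    trace(inverse(B))*m/d,1-m^3,
    trace(B)*d*m^2-trace(A)*m^3]]));
> # Squared entries of Delta, then Delta itself (entrywise square roots)
> Delta2:=multiply(inverse(M_A),V);
> Delta:=transpose(matrix([[sqrt(Delta2[1,1]),
    sqrt(Delta2[2,1]),sqrt(Delta2[3,1])]]));
> # Matrix whose eigenvalues are to be compared with those of inverse(B)
> inv_B:=evalm(d/m*(inverse(A)-multiply(Delta,
    transpose(Delta))));
> eigenvalues(inv_B);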

Appendix F Result 9: Maple code

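> with(linalg);
> # Spheroid: A has a double eigenvalue lA2; B has three symbolic eigenvalues
> A:=matrix([[lA1,0,0],[0,lA2,0],[0,0,lA2]]);
> B:=matrix([[lB1,0,0],[0,lB2,0],[0,0,lB3]]);
> # Transposed Vandermonde matrix built on the eigenvalues of B
> M_B:=transpose(vandermonde([lB1,lB2,lB3]));
> d:=(det(A)/det(B))^(1/3);
> # Right-hand side vector, then its rescaling by diag(1, d*m^2, d^2*m^4)
> V_:=transpose(matrix([[trace(inverse(A))-
    trace(inverse(B))*m/d,1-m^3,
    trace(B)*d*m^2-trace(A)*m^3]]));
> V:=multiply(inverse(matrix([[1,0,0],[0,d*m^2
    ,0],[0,0,d^2*m^4]])),V_);
> # Squared entries of Delta, then Delta itself (entrywise square roots)
> Delta2:=multiply(inverse(M_B),V);
> Delta:=transpose(matrix([[sqrt(Delta2[1,1]),
    sqrt(Delta2[2,1]),sqrt(Delta2[3,1])]]));
> # Matrix whose eigenvalues are to be compared with those of inverse(A)
> inv_A:=evalm(multiply(Delta,transpose(Delta))
    +evalm(m/d*inverse(B)));
> eigenvalues(inv_A);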

Appendix G Solving the Polynomial Equation in Theorem 3

In case #1, the signs of eigenvalues ensure that

then that

Let’s denote the root of with multiplicity 1, and the root with multiplicity 2, such that

Considering the signs of and the fact that , we obtain

One can therefore note that the roots of and are negative, while those of are positive. Since, in addition, , we have

Let’s now focus on the signs of and to determine the locus of possible values. Since , their roots verify

Therefore, one can distinguish between two configurations regarding the order of the roots:

or

For the first case, the variations of the three polynomials are presented in Table 5.

0
+ 0 - 0 -
- 0 + 0 +
+
Table 5: Signs of when .

One can observe that the three polynomials are never all non-negative, thus this configuration is impossible. For the second case, however, there is one value for which all three polynomials are non-negative: (see Table 6).

0
+ 0 - 0 -
- 0 + 0 +
+
Table 6: Signs of when .

Appendix H Proof of Result 13

Proof.

Left-multiplying by and right-multiplying it by , we obtain

Injecting the first two equations of System (18), we then have

Developing and leads to

Furthermore, one can observe that

Whence, injecting this into the former equation:

Denoting

the last equation means that is root of the polynomial

Yet, is an obvious root of this polynomial:

Even if obtaining a formal expression of the two other roots is not straightforward, Vieta’s formulas provide the following constraints:

If roots are complex, then is the only possible value for . If they are real, then, since ( eigenvalues are of opposite signs), the second formula requires that and are of the same sign, and the first formula requires that they are positive. Finally,

Corresponding value is:

Applying to Equation (1) gives the value of :

Given

and

the right hand side can be rewritten

i.e.

References