A number of instruments have been built to obtain range images: two-dimensional arrays of numbers that give the depth of a scene along many directions from a central point in the instrument. Instead of measuring the brightness of many points in a scene, as a television camera does, these instruments measure where each point is in three-dimensional space. Both range images and the more conventional intensity images from digital cameras have been used in the computer vision research community to determine the pose of observed objects or surface shapes. “Pose” refers to a complete description of an object's position and orientation. For a rigid object, this requires six numbers (such as X, Y, Z, pitch, yaw, and roll) or six equivalent coordinates. Previous methods for pose estimation all suffer from either a lack of generality or poor time efficiency.
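To make the idea of a range image concrete, the following minimal sketch converts a small array of range values into three-dimensional points. The scanning geometry is an assumption made only for illustration: columns are taken to sample azimuth and rows to sample elevation uniformly over hypothetical fields of view (`h_fov_deg`, `v_fov_deg`), with the instrument at the origin looking along +Z. A real range imaging instrument may use a different geometry.

```python
import math

def range_image_to_points(ranges, h_fov_deg=60.0, v_fov_deg=45.0):
    """Convert a range image (a list of rows of distances) to XYZ points.

    Assumed geometry (hypothetical): each pixel is a ray from the origin,
    with azimuth varying across columns and elevation across rows.
    """
    rows, cols = len(ranges), len(ranges[0])
    points = []
    for i in range(rows):
        # Elevation goes from +v_fov/2 at the top row to -v_fov/2 at the bottom.
        el = math.radians(v_fov_deg * (0.5 - (i + 0.5) / rows))
        for j in range(cols):
            # Azimuth goes from -h_fov/2 at the left to +h_fov/2 at the right.
            az = math.radians(h_fov_deg * ((j + 0.5) / cols - 0.5))
            r = ranges[i][j]
            x = r * math.cos(el) * math.sin(az)
            y = r * math.sin(el)
            z = r * math.cos(el) * math.cos(az)
            points.append((x, y, z))
    return points

# A 2 x 2 range image; each returned point lies at its measured range.
points = range_image_to_points([[1.0, 2.0], [3.0, 4.0]])
```

Under this assumed geometry, the distance of every returned point from the origin equals the corresponding range value, which is the defining property of a range image as described above.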
Pose estimation can be performed using tripod operators (TOs), which are feature extraction operators for surfaces. They are useful for recognition and/or localization (pose estimation) based on range or tactile data. They extract a few sparse point samples in a regimented way so that N surface points yield only N-3 independent scalar features containing all the pose-invariant surface shape information in these points, and no other information. They provide a powerful index into sets of pre-stored surface representations.
A TO consists of three points in space fixed at the vertices of a triangle of fixed edge lengths, and a procedure for making several depth measurements in the coordinate frame of the triangle, which is placed on the surface like a surveyor's tripod. These measurements take the form of arc-lengths along probe curves at which the surface is intersected.
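The coordinate frame of the triangle can be illustrated with a short sketch. The convention used here is an assumption chosen for illustration, not necessarily the one used by the invention: the origin is the centroid, the x-axis runs from the first vertex toward the second, and the z-axis is the triangle normal. The helper names (`sub`, `cross`, `unit`, `triangle_frame`) are hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("degenerate triangle: vertices are collinear or repeated")
    return tuple(c / n for c in v)

def triangle_frame(p0, p1, p2):
    """Return (origin, (x_axis, y_axis, z_axis)) for a triangle.

    Assumed convention (hypothetical): origin at the centroid, x along
    p0 -> p1, z along the triangle normal, y completing a right-handed frame.
    """
    x_axis = unit(sub(p1, p0))
    z_axis = unit(cross(sub(p1, p0), sub(p2, p0)))
    y_axis = cross(z_axis, x_axis)  # already unit length: z and x are orthonormal
    origin = tuple((a + b + c) / 3.0 for a, b, c in zip(p0, p1, p2))
    return origin, (x_axis, y_axis, z_axis)

# A triangle lying in the XY plane yields axes aligned with X, Y, and Z.
origin, axes = triangle_frame((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

The origin and the three axes together encode the six pose parameters of the triangle (three translational, three angular), which is why a single TO placement is enough to fix a full rigid-body pose.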
The objective of this invention is to provide a technique for estimating, in six degrees of freedom, the pose of a surface shape from a range image containing an object that possesses that shape. A software procedure, with associated hardware, performs the estimation. A range image is a two-dimensional array of numbers that represents the distances from a reference point in the range imaging instrument to observed surface points in a scene. All six parameters of the pose of an object are estimated: three translational and three angular parameters.
This technique combines TOs with a technique known as “nonpose-distinctive placement removal.” It consists of two steps. The first is training the system on a new object so that it can estimate the pose of that object when it is seen again in a range image. The second is the actual pose estimation, in which a TO is placed at a random location on a new range image containing the object of interest, and the nearest neighbor of the resulting TO features (the “nearpoint”) is found among the feature-space signatures stored from the training data. If the distance to the nearpoint is less than an appropriate threshold, the surface is recognized, and pose estimation proceeds by computing the six pose parameters of the central triangle of the new TO placement in the coordinate system of the range imaging instrument. The pose parameters associated with the nearpoint are then retrieved. The two pose six-vectors, namely the pose of the central TO triangle in the new image and the retrieved pose of the central TO triangle in the training image, are composed to estimate where the object is with respect to the location of its original model used in training.
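The matching and composition steps can be sketched as follows. This is an illustration under stated assumptions, not the invention's implementation: poses are represented as 4x4 homogeneous transforms rather than six-vectors, the nearest neighbor is found by a brute-force Euclidean search, and the composition is taken to be the new triangle pose multiplied by the inverse of the stored training pose. The names `mat_mul`, `rigid_inverse`, and `estimate_pose` are hypothetical.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rigid_inverse(t):
    """Invert a rigid transform [R | p]: the inverse is [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def estimate_pose(features, t_new, training, threshold):
    """Return the estimated object pose, or None if the surface is not recognized.

    features  -- TO feature vector measured at the new placement
    t_new     -- pose of the central triangle in instrument coordinates
    training  -- list of (feature_vector, triangle_pose_in_training_image)
    threshold -- maximum feature-space distance for a match (assumed Euclidean)
    """
    nearpoint = min(training, key=lambda entry: math.dist(features, entry[0]))
    if math.dist(features, nearpoint[0]) >= threshold:
        return None
    # Assumed composition: maps training-model coordinates to the new image.
    return mat_mul(t_new, rigid_inverse(nearpoint[1]))

def shift_x(d):
    """Pure translation by d along X, used only for this example."""
    return [[1.0, 0.0, 0.0, d], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

training = [((0.0, 0.2), shift_x(1.0)), ((5.0, 5.0), shift_x(-2.0))]
pose = estimate_pose((0.1, 0.2), shift_x(4.0), training, threshold=0.5)
# The matched triangle moved from x = 1 to x = 4, so the object moved by 3 along X.
```

A linear scan is used here only for brevity. With a large training set, a spatial index over the feature space would be the more natural choice for the nearest-neighbor lookup.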
This method can be used for face recognition and to train robots to “see.” Unlike other image-based methods, this approach is completely insensitive to variation in lighting or viewpoint. It differentiates parts in cluttered environments and is applicable to nearly any surface shape. It is easily trained from a range image or computer model and is extremely time-efficient, operating on a time scale of milliseconds. Coupled with a suitable range imaging scanner, this technique enables the automation of many tasks previously relegated to human labor.
The system has been demonstrated for use in parts recognition for automated assembly, and in support of spacecraft docking maneuvers. Other applications include range-based target recognition and mobile robot vision.