Visual Object Recognition and Tracking of Tools
- Created on Saturday, 01 January 2011
This method can be used to track tools held and used by humans, such as surgical tools.
A method has been created to automatically build an algorithm off-line, using computer-aided design (CAD) models, and to apply this at runtime. The object type is discriminated, and the position and orientation are identified. This system can work with a single image and can provide improved performance using multiple images provided from videos.
The spatial processing unit uses three stages: (1) segmentation; (2) initial type, pose, and geometry (ITPG) estimation; and (3) refined type, pose, and geometry (RTPG) calculation. The image segmentation module files all the tools in an image and isolates them from the background. For this, the system uses edge-detection and thresholding to find the pixels that are part of a tool. After the pixels are identified, nearby pixels are grouped into blobs. These blobs represent the potential tools in the image and are the product of the segmentation algorithm.
The second module uses matched filtering (or template matching). This approach is used for condensing synthetic images using an image subspace that captures key information. Three degrees of orientation, three degrees of position, and any number of degrees of freedom in geometry change are included. To do this, a template-matching framework is applied. This framework uses an off-line system for calculating template images, measurement images, and the measurements of the template images. These results are used online to match segmented tools against the templates.
The final module is the RTPG processor. Its role is to find the exact states of the tools given initial conditions provided by the ITPG module. The requirement that the initial conditions exist allows this module to make use of a local search (whereas the ITPG module had global scope). To perform the local search, 3D model matching is used, where a synthetic image of the object is created and compared to the sensed data. The availability of low-cost PC graphics hardware allows rapid creation of synthetic images. In this approach, a function of orientation, distance, and articulation is defined as a metric on the difference between the captured image and a synthetic image with an object in the given orientation, distance, and articulation. The synthetic image is created using a model that is looked up in an object-model database.
A composable software architecture is used for implementation. Video is first preprocessed to remove sensor anomalies (like dead pixels), and then is processed sequentially by a prioritized list of tracker-identifiers.
This work was done by James English, Chu-Yin Chang, and Neil Tardella of Energid Technologies for Johnson Space Center. For more information, download the Technical Support Package (free white paper) at www.techbriefs.com/tsp under the Information Sciences category.
In accordance with Public Law 96-517, the contractor has elected to retain title to this invention. Inquiries concerning rights for its commercial use should be addressed to:
124 Mount Auburn Street
Suite 200 North
Cambridge, MA 02138
Phone No.: (617) 401-7090
Toll Free No.: (888) 547-4100
Refer to MSC-23947-1, volume and number of this NASA Tech Briefs issue, and the page number.