Guiding Robot Planes With Hand Gestures
Aircraft-carrier crew use a set of standard hand gestures to guide planes on the carrier deck. But as robot planes are increasingly used for routine air missions, researchers at MIT are working on a system that would enable unmanned aircraft to follow the same types of gestures. Yale Song, a Ph.D. student in MIT's Department of Electrical Engineering and Computer Science, his advisor, computer science professor Randall Davis, and David Demirdjian, a research scientist at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), recorded a series of videos in which several different people performed a set of 24 gestures commonly used by aircraft-carrier deck personnel. To test their gesture-identification system, they first had to determine the body pose of each subject in each frame of video.
Transcript
00:00:00 Gesture recognition is a novel approach to human-computer interaction that allows you to use your natural body movement to interact with computers. Because gestures are a form of human communication that is natural and expressive, they allow you to concentrate on the task itself, using what you already do, rather than having to learn new ways to interact. Our goal is to enable unmanned vehicles to recognize the aircraft handling gestures already made by deck crews. The aircraft handling gestures use both body posture and
00:00:37 hand shapes, so it is important for our system to know both kinds of information. My research concentrates on developing a vision-based system that recognizes body and hand gestures from a continuous input stream. My system uses a single stereo camera to track body motion and hand shapes simultaneously and combines this information to recognize body-and-hand gestures. We use machine learning to train the system with lots of examples, allowing the system to learn how to recognize each gesture. There are four steps that our system takes to recognize
00:01:19 gestures. First, from the input image obtained from a stereo camera, we calculate 3D images and remove the background. Second, our system estimates 3D body posture by fitting a skeletal body model to the input image. We extract various visual features, including the 3D point cloud, contour lines, and motion history. These features are computed both from the image and from the skeletal model. Then the two sets of features are compared, allowing our program to come up with the most probable posture. Third, once we know the body posture, we know
00:01:57 approximately where the hands are located. We search around each of the estimated wrist positions, compute visual features in that region, and estimate the probability that what we see there is one of the known hand shapes used in aircraft handling: for example, palm open, palm closed, thumb up, and thumb down. As the last step, we combine the estimated body posture and hand shape to determine gestures. We collected twenty-four aircraft handling gestures from twenty people, giving us four hundred sample gestures to use to teach the system to
00:02:38 recognize the gestures. We use a probabilistic graphical model called a latent-dynamic conditional random field (LDCRF). This model learns the distribution of the patterns of each gesture as well as the transitions between gestures. We use this with a sliding window to recognize gestures continuously and apply the multi-layered filtering technique we developed to make the recognition more robust. There is still a considerable amount of work to be done in the field of gesture recognition. Things we continue to work on include improving reliability, adaptability to new gestures, and
00:03:19 developing appropriate feedback mechanisms; for example, the system can say, "I get it" or "I don't get it."
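
The steps described in the transcript can be illustrated with short Python sketches. For the first step, computing a depth (3D) image from the stereo pair and removing the background, a minimal version might look like the following; the use of OpenCV's block-matching stereo and a fixed distance cutoff for "background" are assumptions made for illustration, not details taken from the talk.

    import cv2
    import numpy as np

    def depth_foreground(left_gray, right_gray, focal_px, baseline_m, max_range_m=3.0):
        """Return a depth map in metres with background pixels zeroed out.

        left_gray, right_gray -- rectified 8-bit grayscale stereo pair.
        focal_px, baseline_m  -- focal length in pixels and baseline in metres.
        """
        stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
        # StereoBM returns fixed-point disparities scaled by 16.
        disparity = stereo.compute(left_gray, right_gray).astype(np.float32) / 16.0
        depth = np.zeros_like(disparity)
        valid = disparity > 0
        depth[valid] = focal_px * baseline_m / disparity[valid]  # Z = f * B / d
        depth[depth > max_range_m] = 0.0  # anything beyond the cutoff is background
        return depth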
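
For the second step, estimating body posture, the core idea is to score candidate skeletal postures by comparing features rendered from the skeletal model against features computed from the image, and keep the most probable one. The pose names and short feature vectors below are made up; in the actual system the features include the 3D point cloud, contour lines, and motion history.

    import numpy as np

    def most_probable_pose(observed_features, candidate_features):
        """Return the candidate posture whose model-derived features best
        match the features extracted from the current depth image.

        observed_features  -- descriptor vector for one frame.
        candidate_features -- dict mapping posture name to the same descriptor
                              computed from the skeletal model in that posture.
        """
        best_pose, best_score = None, float("-inf")
        for pose_name, model_features in candidate_features.items():
            # Negative squared distance as a crude, unnormalised log-likelihood.
            score = -float(np.sum((observed_features - model_features) ** 2))
            if score > best_score:
                best_pose, best_score = pose_name, score
        return best_pose

    # Toy usage with made-up feature vectors.
    observed = np.array([0.9, 0.1, 0.4, 0.2])
    candidates = {
        "arms_raised":  np.array([0.8, 0.2, 0.5, 0.1]),
        "arms_lowered": np.array([0.1, 0.9, 0.2, 0.6]),
    }
    print(most_probable_pose(observed, candidates))  # -> arms_raised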
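
For the third step, the sketch below crops a window around an estimated wrist position, computes a simple descriptor for that region, and scores it against reference features for each known hand shape. The crop size, the depth-histogram descriptor, and the template-matching scorer are stand-ins; the transcript does not say which features or classifier the system actually uses for hands.

    import numpy as np

    HAND_SHAPES = ["palm_open", "palm_closed", "thumb_up", "thumb_down"]

    def crop_wrist_region(depth_image, wrist_xy, half_size=24):
        """Cut a square window around the estimated wrist position."""
        x, y = wrist_xy
        h, w = depth_image.shape
        return depth_image[max(0, y - half_size):min(h, y + half_size),
                           max(0, x - half_size):min(w, x + half_size)]

    def region_features(region):
        """Stand-in descriptor: a small normalised depth histogram of the patch."""
        hist, _ = np.histogram(region, bins=8, range=(0.0, 2.0))
        total = hist.sum()
        return hist / total if total else hist.astype(float)

    def hand_shape_probabilities(region, templates):
        """Softmax over similarity to one reference descriptor per hand shape."""
        feats = region_features(region)
        scores = np.array([-np.sum((feats - templates[s]) ** 2) for s in HAND_SHAPES])
        probs = np.exp(scores - scores.max())
        return dict(zip(HAND_SHAPES, probs / probs.sum()))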
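
For the last step, gestures are recognized continuously by running a sequence model over a sliding window of the combined body-and-hand features and smoothing the per-window decisions. In the sketch below, a generic score_window callable stands in for the trained latent-dynamic CRF, and a simple majority vote over recent windows stands in for the multi-layered filter; both are simplifications, not the authors' method.

    from collections import Counter, deque

    def recognize_stream(frames, score_window, window_size=32, vote_depth=5):
        """Yield a smoothed gesture label once per frame after the window fills.

        frames       -- iterable of per-frame feature vectors (body + hand).
        score_window -- callable mapping a list of frames to a gesture label;
                        a stand-in for inference with the trained sequence model.
        """
        window = deque(maxlen=window_size)
        recent_labels = deque(maxlen=vote_depth)
        for frame in frames:
            window.append(frame)
            if len(window) < window_size:
                continue  # not enough context yet
            recent_labels.append(score_window(list(window)))
            # Majority vote over the last few windows suppresses one-off flips.
            label, _ = Counter(recent_labels).most_common(1)[0]
            yield label

In a real pipeline, score_window would wrap the trained model's inference over the window, and the filtering stage could also report when no gesture is confidently recognized, which ties into the feedback mechanisms mentioned at the end of the transcript.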

