Researchers Extract Audio from Visual Information

Researchers at MIT, Microsoft, and Adobe have developed an algorithm that can reconstruct an audio signal by analyzing minute vibrations of objects depicted in video. In one set of experiments, the team was able to recover intelligible speech from the vibrations of a potato-chip bag filmed from 15 feet away through soundproof glass.
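
To make the idea concrete, here is a minimal sketch of the general approach, not the team's actual algorithm: the paper uses phase-based motion analysis with complex steerable pyramids, whereas this toy version uses a crude intensity-based proxy. The function name and its inputs are hypothetical, chosen for illustration.

```python
# A minimal sketch of the idea, NOT the researchers' actual method. The
# real system tracks sub-pixel motion with phase-based analysis; this toy
# version uses frame-to-frame changes in average pixel intensity over a
# patch of the vibrating object as a stand-in for that motion signal.
import numpy as np

def recover_audio_proxy(frames: np.ndarray, fps: float) -> np.ndarray:
    """frames: (num_frames, height, width) grayscale video of the object.
    Returns a 1-D signal sampled at `fps` Hz that tracks tiny vibrations."""
    # Average intensity per frame: sub-pixel vibrations shift edges slightly,
    # which nudges this mean up and down at the vibration frequency.
    signal = frames.reshape(len(frames), -1).mean(axis=1)
    # Subtract a moving average to remove DC offset and slow lighting drift.
    signal = signal - np.convolve(signal, np.ones(31) / 31, mode="same")
    # Normalize to [-1, 1] so the result can be written out as audio.
    peak = np.abs(signal).max()
    return signal / peak if peak > 0 else signal
```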

"When sound hits an object, it causes the object to vibrate,” says Abe Davis, a graduate student in electrical engineering and computer science at MIT and first author on the new paper. “The motion of this vibration creates a very subtle visual signal that’s usually invisible to the naked eye. People didn’t realize that this information was there.”

Reconstructing audio from video requires that the frequency of the video samples — the number of frames of video captured per second — be higher than the frequency of the audio signal; by the Nyquist criterion, faithfully capturing a vibration at a given frequency requires sampling at more than twice that rate. In some of their experiments, the researchers used a high-speed camera that captured 2,000 to 6,000 frames per second.
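
A quick back-of-the-envelope check shows what those frame rates buy. The frame rates below are the ones mentioned above; the helper function itself is an illustrative sketch, not code from the paper.

```python
# By the Nyquist criterion, a camera running at `fps` frames per second can
# only capture vibration frequencies below fps / 2 without aliasing.
def max_recoverable_frequency(fps: float) -> float:
    """Highest vibration frequency representable at a given frame rate, in Hz."""
    return fps / 2.0

for fps in (2_000, 6_000):
    print(f"{fps} fps -> frequencies up to {max_recoverable_frequency(fps):.0f} Hz")
# 2000 fps -> frequencies up to 1000 Hz
# 6000 fps -> frequencies up to 3000 Hz
```

Much of the energy that makes speech intelligible lies in the lower few kilohertz, which is why frame rates in this range can yield recognizable speech.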

The researchers’ technique has obvious applications in law enforcement and forensics, but Davis is more enthusiastic about the possibility of what he describes as a “new kind of imaging.”

“We’re recovering sounds from objects,” he says. “That gives us a lot of information about the sound that’s going on around the object, but it also gives us a lot of information about the object itself, because different objects are going to respond to sound in different ways.” In ongoing work, the researchers have begun trying to determine material and structural properties of objects from their visible response to short bursts of sound.
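
As a speculative sketch of that direction, one could look for an object's resonant frequencies in the power spectrum of the recovered vibration signal, since different materials and structures ring at different frequencies. The function below is a hypothetical helper, and the 20 Hz cutoff is an arbitrary assumption used to skip near-DC drift.

```python
# Speculative sketch: estimate an object's dominant resonant frequencies
# from the recovered vibration signal. This is standard spectral analysis,
# not the researchers' method for inferring material properties.
import numpy as np

def resonant_peaks(signal: np.ndarray, fps: float, top_k: int = 3) -> np.ndarray:
    """Return the `top_k` strongest vibration frequencies, in Hz."""
    # Magnitude spectrum of the zero-mean signal.
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    # Skip the near-DC bins (assumed 20 Hz cutoff), then pick the
    # largest spectral peaks.
    keep = freqs > 20.0
    order = np.argsort(spectrum[keep])[::-1][:top_k]
    return freqs[keep][order]
```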

