Meet SonicSense: Enabling Robot Object Perception

SonicSense is a holistic hardware and software design that enables rich robot object perception through in-hand acoustic vibration sensing. Watch this video to learn how the framework uses in-hand acoustic vibration sensing to advance robot tactile perception.

“Robots today mostly rely on vision to interpret the world,” explained Jiaxun Liu, lead author of the paper and a first-year Ph.D. student in the laboratory of Boyuan Chen, professor of mechanical engineering and materials science at Duke. “We wanted to create a solution that could work with the complex and diverse objects found in everyday life, giving robots a much richer ability to ‘feel’ and understand the world.”



Transcript

00:00:00 We introduce SonicSense, an integrated hardware and software framework that enables acoustic vibration sensing for rich robot object perception. Recent work has leveraged acoustic vibration sensing for object material and category classification, position prediction, estimating the amount and flow of granular material, and, collectively, object spatial reasoning for visual reconstruction.

00:00:23 However, previous work has focused on a small number of primitive objects with homogeneous material composition, constrained settings for data collection, and single-finger testing. It therefore remains unclear whether acoustic vibration sensing can be helpful for object perception under noisy and less controlled conditions.

00:00:44 We present SonicSense, a holistic design spanning both hardware and algorithmic advancements for object perception through in-hand acoustic vibration sensing. Our robot hand has four fingers. A piezoelectric contact microphone is embedded inside each fingertip, and a round counterweight is mounted on the outer shell surface to increase the momentum of the finger motion.

00:01:07 Our intuitive mechanical design enables a range of interactive motion primitives for object perception, including tapping, grasping, and shaking. The embedded contact microphone collects the high-frequency acoustic vibrations created by contact between objects or by object-hand interactions.

00:01:31 Our robot can infer the geometry and inventory status of various objects inside a container from their unique acoustic vibration signatures during interactions. We derive 12 interpretable features based on traditional acoustic signal processing methods to help distinguish these acoustic vibration signatures, and we perform unsupervised nonlinear dimensionality reduction with t-SNE on this 12-dimensional feature vector.

00:01:54 By shaking a container, our robot can successfully distinguish different numbers of dice, or dice of different shapes, inside the container. When water is poured into a bottle held by our robot, we can detect subtle differences in the acoustic signatures that depend on the amount of water already in the bottle. Our robot can also detect different amounts of water inside the bottle when shaking it.
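
The exact 12 features are not enumerated in the video, so the following is only a minimal sketch of this kind of pipeline: a handful of standard, interpretable spectral features computed with librosa and embedded in two dimensions with scikit-learn's t-SNE. The specific feature list and t-SNE settings are illustrative assumptions, not the paper's exact choices.

```python
# Minimal sketch: hand-crafted acoustic features per shaking recording,
# embedded in 2D with t-SNE for unsupervised visualization.
import numpy as np
import librosa
from sklearn.manifold import TSNE

def acoustic_features(y, sr):
    """Return a small vector of interpretable spectral features (illustrative)."""
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr).mean()
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr).mean()
    flatness = librosa.feature.spectral_flatness(y=y).mean()
    zcr = librosa.feature.zero_crossing_rate(y).mean()
    rms = librosa.feature.rms(y=y).mean()
    return np.array([centroid, bandwidth, rolloff, flatness, zcr, rms])

def embed_recordings(recordings):
    """recordings: list of (waveform, sample_rate) tuples from interactions."""
    feats = np.stack([acoustic_features(y, sr) for y, sr in recordings])
    # Unsupervised nonlinear dimensionality reduction to 2D
    return TSNE(n_components=2, perplexity=5, init="pca").fit_transform(feats)
```

Clusters in such a 2D embedding would mirror the qualitative separation described above, e.g. between different dice counts or water levels.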

00:02:15 For more challenging object perception tasks, we develop a dataset of 83 diverse real-world objects. The objects cover nine material categories and a variety of geometries, from simple primitives to complex shapes. Unlike previous work, which relies on a human manually holding the robot's hand to interact with objects or on fixed interaction poses and forces designed for replay, we derive a simple but effective heuristic-based interaction policy to autonomously collect the acoustic vibration responses of objects.

00:02:40 Our policy works well for all of our real-world objects, which span a variety of sizes and geometries. We train a material classification model that takes in the Mel spectrogram of the acoustic vibration signal recorded from the impact sound and learns to predict the material label.
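
As a small illustration of this input representation (not the paper's exact settings), a Mel spectrogram of an impact recording could be computed with librosa; the FFT size, hop length, and number of Mel bands below are assumptions.

```python
# Compute a log-scaled Mel spectrogram from an impact recording (illustrative settings).
import numpy as np
import librosa

def mel_spectrogram(y, sr, n_mels=64):
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                       hop_length=256, n_mels=n_mels)
    return librosa.power_to_db(S, ref=np.max)  # log scale for the classifier input
```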

00:03:02 The network consists of three convolutional neural network layers followed by two MLP layers. The initial version of our method reaches a 0.523 F1 score. However, we observe that object materials are relatively uniform and smooth within local regions; based on this assumption, we can iteratively refine our predictions, and our final average F1 score reaches 0.763.
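
The video states the layer counts but not the kernel sizes, channel widths, or training details, so the PyTorch sketch below only mirrors the stated structure, i.e. three convolutional layers followed by two MLP layers over a Mel spectrogram; all other hyperparameters are illustrative assumptions.

```python
# Sketch of the stated architecture: 3 conv layers + 2 MLP layers on a Mel spectrogram.
# Channel counts, kernel sizes, and the 9 material classes are assumptions.
import torch
import torch.nn as nn

class MaterialClassifier(nn.Module):
    def __init__(self, n_classes=9):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # collapse to a 64-d feature vector
        )
        self.mlp = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, n_classes),          # per-tap material logits
        )

    def forward(self, mel_spectrogram):         # (batch, 1, n_mels, time)
        return self.mlp(self.conv(mel_spectrogram))
```

Per-tap predictions could then be smoothed across neighboring contact points on the object surface, which is one way to realize the local-uniformity assumption behind the iterative refinement described above.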

00:03:25 Our shape reconstruction model takes the sparse and noisy contact points and generates a dense, complete 3D shape of the object. We stack two PointNet layers to encode the input, then feed the resulting global feature vector into a decoder network with fully connected layers to produce the final point cloud. Our results reach an average Chamfer distance score of 0.00876 m.
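
Below is a rough sketch of such a model, assuming a PointNet-style encoder (shared per-point MLPs with max pooling) and a fully connected decoder that emits a fixed-size point cloud. Layer widths, the number of output points, and the Chamfer-distance helper are illustrative, not the paper's exact design.

```python
# Sketch: PointNet-style encoder over sparse contact points, FC decoder to a dense point cloud.
import torch
import torch.nn as nn

class ContactToShape(nn.Module):
    def __init__(self, n_out_points=1024):
        super().__init__()
        # Two PointNet-style layers: shared per-point MLPs, followed by max pooling
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, n_out_points * 3),
        )
        self.n_out_points = n_out_points

    def forward(self, contact_points):              # (batch, n_contacts, 3)
        per_point = self.point_mlp(contact_points)  # (batch, n_contacts, 256)
        global_feat = per_point.max(dim=1).values   # permutation-invariant pooling
        dense = self.decoder(global_feat)
        return dense.view(-1, self.n_out_points, 3)

def chamfer_distance(pred, target):
    """Symmetric Chamfer distance between two point clouds of shape (batch, N, 3)."""
    d = torch.cdist(pred, target)                   # pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()
```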

00:03:50 Predictions on objects with primitive shapes are generally near perfect. In addition, our method can reconstruct objects with complex shapes from only sparse and noisy contact point estimates.

00:04:13 Once an object has been interacted with and its acoustic vibration responses recorded, we aim to have the robot re-identify it from a set of 15 new tapping interactions. We feed both the collection of 15 Mel spectrograms and their associated contact points into the network to predict the object's label among 82 objects in our dataset. Our robot can re-identify the same object with more than 92% accuracy.
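
The fusion and pooling details are not specified in the video; the sketch below simply embeds each of the 15 Mel spectrograms, concatenates each embedding with its contact point, classifies per tap, and averages the logits. All of these choices are illustrative assumptions.

```python
# Sketch: object re-identification from 15 taps (Mel spectrograms + contact points).
import torch
import torch.nn as nn

class ObjectReID(nn.Module):
    def __init__(self, n_objects=82, spec_dim=64):
        super().__init__()
        self.spec_encoder = nn.Sequential(          # embeds one Mel spectrogram
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(16, spec_dim), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(spec_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, n_objects),
        )

    def forward(self, mel_specs, contact_points):
        # mel_specs: (batch, 15, 1, n_mels, time); contact_points: (batch, 15, 3)
        b, k = mel_specs.shape[:2]
        spec_feat = self.spec_encoder(mel_specs.flatten(0, 1)).view(b, k, -1)
        per_tap = torch.cat([spec_feat, contact_points], dim=-1)
        logits = self.classifier(per_tap)           # (batch, 15, n_objects)
        return logits.mean(dim=1)                   # pool evidence over the 15 taps
```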

00:04:37 Our robot is strongly resistant to ambient noise and responds only to vibration signals transmitted through physical contact, which ensures high-quality, reliable sensing data under challenging environmental conditions. The entire robot hand costs $215 to build from commercially available components and 3D-printed parts.

00:05:01 Our experimental results demonstrate the versatility and efficacy of our design on a variety of object perception tasks, including solid and liquid inventory status estimation within containers, material classification, 3D shape reconstruction, and object re-identification. Overall, our method presents unique contributions to tactile perception with acoustic vibrations and opens up new opportunities for future robot designs to build a more robust, complete,

00:05:23 versatile, and holistic perceptual model of the world.