For decades, robots in controlled environments like assembly lines have been able to pick up the same object over and over again. More recently, breakthroughs in computer vision have enabled robots to make basic distinctions between objects. Even then, though, the systems don’t truly understand objects’ shapes, so there’s little the robots can do after a quick pickup.

With the DON system, a robot can perform novel tasks, such as looking at a shoe it has never seen before and successfully grabbing it by its tongue. (Photo: Tom Buehler/CSAIL)

A system called Dense Object Nets (DON) lets robots inspect random objects and visually understand them well enough to accomplish specific tasks without ever having seen them before. The system looks at objects as collections of points that serve as a kind of visual roadmap. This approach lets robots better understand and manipulate items and, most importantly, allows them to pick up a specific object among a clutter of similar objects, a valuable skill for the kinds of machines that companies like Amazon and Walmart use in their warehouses. Someone might use DON to get a robot to grab onto a specific spot on an object, for example, the tongue of a shoe. From that, the robot can look at a shoe it has never seen before and successfully grab its tongue. None of the training data was labeled by humans; the system is “self-supervised” and requires no human annotations.

Two common approaches to robot grasping involve either task-specific learning or creating a general grasping algorithm. Both techniques have obstacles. Task-specific methods are difficult to generalize to other tasks, and general grasping doesn’t get specific enough to deal with the nuances of particular tasks, like putting objects in specific spots. The DON system, however, essentially creates a series of coordinates on a given object that serve as a kind of visual roadmap, giving the robot a better understanding of what it needs to grasp and where.

The system was trained to look at objects as a series of points that make up a larger coordinate system. It can then map different points together to visualize an object’s 3D shape, similar to how panoramic photos are stitched together from multiple photos. After training, if a person specifies a point on an object, the robot can take a photo of that object, identify and match points, and then pick up the object at the specified point.
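The point-matching step described above can be illustrated with a minimal sketch. It is not the team's implementation; it assumes, purely for illustration, that each pixel of a photo has already been turned into a short vector of numbers (a "descriptor") and that matching means finding the pixel in a new photo whose descriptor is closest to that of the point a person specified. The function name `find_best_match`, the array shapes, and the synthetic data are all hypothetical.

```python
import numpy as np


def find_best_match(descriptor_image, target_descriptor):
    """Return (row, col, distance) of the pixel closest to the target descriptor.

    descriptor_image: hypothetical array of shape (H, W, D), one D-dimensional
        descriptor per pixel of the new photo.
    target_descriptor: array of shape (D,), the descriptor of the point
        a person specified on a reference photo.
    """
    # Euclidean distance between every pixel's descriptor and the target.
    distances = np.linalg.norm(descriptor_image - target_descriptor, axis=-1)
    # Index of the smallest distance, converted back to (row, col).
    row, col = np.unravel_index(np.argmin(distances), distances.shape)
    return int(row), int(col), float(distances[row, col])


# Synthetic stand-in for a reference photo's descriptors (assumed data).
rng = np.random.default_rng(0)
reference = rng.normal(size=(4, 5, 3))

# The "specified point": pixel at row 2, column 3 of the reference photo.
target = reference[2, 3]

# Simulate a new photo in which the object has shifted one pixel to the right.
new_image = np.roll(reference, shift=1, axis=1)

row, col, dist = find_best_match(new_image, target)
print(row, col, dist)
```

In this toy setup the specified point is recovered one column to the right of where it started, at row 2, column 4. A real system would also need the network that produces the descriptors and a way to turn the matched pixel into a grasp, neither of which is shown here.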

In one set of tests done on a soft caterpillar toy, a Kuka robotic arm powered by DON could grasp the toy’s right ear from a range of different configurations. This showed that, among other things, the system can distinguish left from right on symmetrical objects. When testing on a bin of different baseball hats, DON could pick out a specific target hat even though all of the hats had very similar designs and none of them had appeared in the training data.

In factories, robots often need complex part feeders to work reliably. DON, however, can understand objects’ orientations, so a robot could take a picture and then grasp and adjust the object accordingly. In the future, the team hopes to improve the system so that it can perform specific tasks with a deeper understanding of the corresponding objects, like learning how to grasp an object and move it.

For more information, contact Rachel Gordon at 617-258-0675.