Training interactive robots may one day be an easy job for everyone, even those without programming expertise. Roboticists are developing automated robots that can learn new tasks solely by observing humans. At home, domestic robots could learn how to do routine chores; in the workplace, robots could be shown how to perform many duties.
Making progress on that vision, researchers have designed a system that lets these types of robots learn complicated tasks that would otherwise hinder them with too many confusing rules. One such task is setting a dinner table under certain conditions.
At its core, the Planning with Uncertain Specifications (PUnS) system gives robots the humanlike planning ability to simultaneously weigh many ambiguous — and potentially contradictory — requirements to reach an end goal. In doing so, the system always chooses the most likely action to take, based on a “belief” about some probable specifications for the task it is supposed to perform.
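One way to picture planning under a belief is the minimal Python sketch below. It is an illustration only, not the researchers' implementation: the three ordering rules, their probabilities, and the names `BELIEF`, `score`, and `best_plan` are all hypothetical. Each candidate placement order is scored by the total probability of the rules it satisfies, and the highest-scoring order wins.

```python
from itertools import permutations
from typing import Callable, List, Tuple

Plan = Tuple[str, ...]


def before(first: str, second: str) -> Callable[[Plan], bool]:
    """Build a rule that holds when `first` is placed earlier than `second`."""
    return lambda plan: plan.index(first) < plan.index(second)


# Hypothetical belief: (probability, rule) pairs. The rules contradict each
# other, so no single plan can satisfy all of them at once.
BELIEF: List[Tuple[float, Callable[[Plan], bool]]] = [
    (0.6, before("plate", "bowl")),
    (0.3, before("fork", "plate")),
    (0.1, before("bowl", "fork")),
]


def score(plan: Plan) -> float:
    """Total probability of the rules that this plan satisfies."""
    return sum(prob for prob, rule in BELIEF if rule(plan))


def best_plan(items: List[str]) -> Plan:
    """Pick the placement order with the highest satisfied probability."""
    return max(permutations(items), key=score)


chosen = best_plan(["plate", "bowl", "fork"])
print(chosen, f"{score(chosen):.2f}")
```

In this toy setup the chosen order satisfies the two most probable rules and gives up on the least probable one, which is the kind of trade-off between contradictory requirements described above.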
The researchers compiled a dataset with information about how eight objects — a mug, glass, spoon, fork, knife, dinner plate, small plate, and bowl — could be placed on a table in various configurations. A robotic arm first observed randomly selected human demonstrations of setting the table with the objects. Then, the arm was tasked with automatically setting a table in a specific configuration in real-world experiments and in simulation, based on what it had seen.
To succeed, the robot had to weigh many possible placement orderings, even when items were purposely removed, stacked, or hidden. Normally, all of that would confuse robots, but the researchers’ robot made no mistakes over several real-world experiments and only a handful of mistakes over tens of thousands of simulated test runs.
Robots plan well when a task comes with clear specifications that describe what the robot must accomplish, given its actions, environment, and end goal. Learning to set a table by observing demonstrations, by contrast, is full of uncertain specifications. Items must be placed in certain spots, depending on the menu and where guests are seated, and in certain orders, depending on an item’s immediate availability or social conventions. Current planning approaches cannot handle such uncertain specifications. A popular approach is “reinforcement learning,” a trial-and-error machine-learning technique that rewards and penalizes robots for their actions as they work to complete a task. But for tasks with uncertain specifications, it’s difficult to define clear rewards and penalties, so robots never fully learn right from wrong.
PUnS enables a robot to hold a “belief” over a range of possible specifications. The belief itself can then be used to dish out rewards and penalties. The system is built on “linear temporal logic” (LTL), an expressive language that enables robotic reasoning about current and future outcomes. The researchers defined templates in LTL that model various time-based conditions such as what must happen now, must eventually happen, and must happen until something else occurs. The robot’s observations of 30 human demonstrations for setting the table yielded a probability distribution over 25 different LTL formulas. Each formula encoded a slightly different preference — or specification — for setting the table. That probability distribution becomes its belief.
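The three kinds of time-based conditions mentioned above can be sketched over a finite sequence of table states. The Python below is a simplified illustration, not the PUnS code: the helper names (`now`, `eventually`, `until`, `conj`), the three formulas, their probabilities, and the rule of adding probability for satisfied formulas and subtracting it for violated ones are all assumptions made for this example.

```python
from typing import Callable, List, Set, Tuple

State = Set[str]          # items that are on the table at one step
Trace = List[State]       # the table state after each step
Formula = Callable[[Trace], bool]


def holds(state: State, literal: str) -> bool:
    """A literal is an item name, or '!item' for 'item is not on the table'."""
    if literal.startswith("!"):
        return literal[1:] not in state
    return literal in state


def now(literal: str) -> Formula:
    """Must hold in the first state of the trace."""
    return lambda trace: bool(trace) and holds(trace[0], literal)


def eventually(literal: str) -> Formula:
    """Must hold in at least one state of the trace."""
    return lambda trace: any(holds(s, literal) for s in trace)


def until(keep: str, release: str) -> Formula:
    """`keep` must hold at every step before `release` first holds."""
    def check(trace: Trace) -> bool:
        for state in trace:
            if holds(state, release):
                return True
            if not holds(state, keep):
                return False
        return False  # `release` never happened
    return check


def conj(*parts: Formula) -> Formula:
    """All sub-formulas must hold."""
    return lambda trace: all(p(trace) for p in parts)


all_placed = conj(eventually("plate"), eventually("bowl"), eventually("fork"))

# Hypothetical belief: probabilities over three candidate specifications.
belief: List[Tuple[float, Formula]] = [
    (0.5, conj(now("!plate"), all_placed)),         # empty start, any order
    (0.3, conj(all_placed, until("!bowl", "plate"))),   # plate before bowl
    (0.2, conj(all_placed, until("!plate", "fork"))),   # fork before plate
]


def reward(trace: Trace) -> float:
    """Assumed scoring rule: +p for each satisfied formula, -p for each violated."""
    return sum(p if f(trace) else -p for p, f in belief)


trace = [set(), {"plate"}, {"plate", "bowl"}, {"plate", "bowl", "fork"}]
print(f"{reward(trace):.2f}")
```

Here the example trace satisfies the first two formulas and violates the third, so the belief rewards it overall while still penalizing the one convention it ignored.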
In simulations asking the robot to set the table in different configurations, it made only six mistakes in 20,000 tries. In real-world demonstrations, it behaved much as a human would when performing the task. If an item such as the fork wasn’t initially visible, for instance, the robot would finish setting the rest of the table without it. Then, when the fork was revealed, the robot would set it in the proper place.
Next, the researchers hope to modify the system to help robots change their behavior based on verbal instructions, corrections, or a user’s assessment of the robot’s performance.