Researchers have designed an algorithm that allows an autonomous ground vehicle to improve its existing navigation systems by watching a human drive. The approach, called adaptive planner parameter learning from demonstration (APPLD), fuses machine learning from demonstration with more classical autonomous navigation systems. Rather than completely replacing a classical system, APPLD learns how to tune the existing system to behave more like the human demonstration. The deployed system retains the benefits of classical navigation systems, such as optimality, explainability, and safety, while remaining flexible enough to adapt to new environments.

A single demonstration of human driving, provided using an Xbox wireless controller, allowed APPLD to learn how to tune the vehicle’s existing autonomous navigation system differently depending on the local environment. For example, when in a tight corridor, the human driver slowed down and drove carefully. After observing this behavior, the autonomous system learned to reduce its own maximum speed and increase its computation budget in similar environments. This allowed the vehicle to navigate autonomously in tight corridors where it had previously failed.
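The corridor example can be pictured as a lookup from the local environment to a set of planner settings. The short Python sketch below is only an illustration of that idea, not the researchers' code: the context names, the clearance threshold, the parameter values, and the use of a sample count as a stand-in for "computation budget" are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannerParams:
    """Hypothetical planner settings; the real system's parameters are not given in the article."""
    max_speed: float   # metres per second
    num_samples: int   # assumed proxy for the planner's computation budget


# Assumed parameter sets, one per local context (values are made up for illustration).
PARAMS_BY_CONTEXT = {
    "open_space": PlannerParams(max_speed=1.5, num_samples=8),
    "tight_corridor": PlannerParams(max_speed=0.4, num_samples=20),
}


def classify_context(min_clearance_m: float, corridor_threshold_m: float = 0.6) -> str:
    """Label the surroundings from the smallest obstacle distance (assumed heuristic)."""
    if min_clearance_m < corridor_threshold_m:
        return "tight_corridor"
    return "open_space"


def select_params(min_clearance_m: float) -> PlannerParams:
    """Return the planner settings for the current surroundings."""
    return PARAMS_BY_CONTEXT[classify_context(min_clearance_m)]
```

Calling `select_params` with a small clearance returns the slower, more computation-heavy settings, mirroring the cautious driving the human demonstrated in the corridor.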

The team’s experiments showed that, after training, the APPLD system navigated the test environments more quickly and with fewer failures than the classical system alone. Additionally, the trained APPLD system often navigated the environment faster than the human who trained it.

From a machine learning perspective, APPLD contrasts with end-to-end learning systems that attempt to learn the entire navigation system from scratch. Such approaches tend to require large amounts of data and may lead to behaviors that are neither safe nor robust. APPLD leverages the parts of the control system that have been carefully engineered, while focusing its machine learning effort on the parameter tuning process, which is often done based on a single person’s intuition.
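To make the idea of learning only the parameters concrete, here is a minimal Python sketch under stated assumptions. The toy planner, its two parameters (`max_speed` and `gain`), the demonstration data, and the exhaustive grid search are all invented for illustration; the article does not say which planner parameters or which optimizer the researchers used. The sketch keeps the planner itself fixed and searches for the settings whose output best matches what the human did.

```python
import itertools


def toy_planner(clearance, max_speed, gain):
    """Hand-written stand-in for a classical planner: speed grows with clearance, up to a cap."""
    return min(max_speed, gain * clearance)


def imitation_loss(params, demo):
    """Mean squared difference between the planner's speed and the human's demonstrated speed."""
    max_speed, gain = params
    errors = [(toy_planner(c, max_speed, gain) - v) ** 2 for c, v in demo]
    return sum(errors) / len(errors)


def fit_params(demo, max_speeds, gains):
    """Try every candidate parameter pair and keep the one closest to the demonstration."""
    candidates = itertools.product(max_speeds, gains)
    return min(candidates, key=lambda p: imitation_loss(p, demo))


# Made-up demonstration: (clearance in metres, speed the human chose in m/s).
demo = [(0.5, 0.25), (1.0, 0.5), (2.0, 1.0), (4.0, 1.0)]

best = fit_params(demo, max_speeds=[0.5, 1.0, 1.5], gains=[0.25, 0.5, 1.0])
print(best)
```

Because only two numbers are being learned, a handful of demonstration points is enough here, which is the contrast the paragraph above draws with end-to-end systems that must learn the whole navigation behavior from data.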

APPLD represents a paradigm in which people without expert knowledge in robotics can help train and improve autonomous vehicle navigation in a variety of environments. Rather than small teams of engineers trying to manually tune navigation systems in a small number of test environments, a virtually unlimited number of users would be able to provide the system the data it needs to tune itself to an unlimited number of environments.

Current autonomous navigation systems typically must be re-tuned by hand for each new deployment environment. This process must be done by someone with extensive training in robotics and requires trial and error until the right system settings are found. In contrast, APPLD tunes the system automatically by watching a human drive it, something that anyone with video game controller experience can do. During deployment, APPLD also allows the system to re-tune itself in real time as the environment changes.

The team will test the APPLD system in a variety of outdoor environments and experiment with a wider variety of existing autonomous navigation approaches. Additionally, the researchers will investigate whether additional sensor information, such as camera images, can help the system learn more complex behaviors, for example tuning the navigation system to operate under varying conditions like different terrain or the presence of other objects.

For more information, contact the U.S. Army CCDC Army Research Laboratory Public Affairs at 703-693-6477.