Let's say two vehicles are heading right at each other down a narrow street.

If you're behind the wheel in this kind of tight, challenging scenario, you can negotiate with the drivers around you. You can pull over to the side of the road and wave the oncoming driver through the narrow gap. Through that interaction, you can work out maneuvers that keep everyone safe and get them to their destinations.

A self-driving car has a tougher challenge: it must somehow gauge nearby drivers and their willingness to play nice.

A new algorithm under development can guide an autonomous vehicle through tough traffic on a crowded, narrow street.

The algorithm, built by researchers at the Carnegie Mellon University Argo AI Center for Autonomous Vehicle Research, makes its decisions by modeling different levels of driver cooperativeness — how likely a driver is to pull over to let another driver pass.

With "Multi-Agent Reinforcement Learning," or MARL, the team, led by researcher Christoph Killing, got autonomous vehicles to exhibit human-like behaviors, including defensive driving and interpreting the behavior of other agents — in simulation, so far.

The algorithm has not been used on a vehicle in the real world, but the results are promising, thanks to the model's reward-based system.

"We incentivize interactions with safety in mind," said Killing, a former visiting research scholar in the School of Computer Science's Robotics Institute  and now part of the Autonomous Aerial Systems Lab at the Technical University of Munich.

{youtube} https://www.youtube.com/watch?v=5njRSHcHMBk {/youtube}

In a short Q&A with Tech Briefs below, Christoph explains more about how his team's incentive-based model navigates tough traffic situations, where there are no official rules of the road.

Tech Briefs: Would you characterize your model as more cooperative or aggressive, when navigating a challenge that requires a little bit of both?

Christoph Killing: As in any driving scenario, autonomous vehicles should put safety first and follow all traffic rules. However — and this is the beauty and challenge of the scenario considered — there are no coordinating traffic rules in this kind of situation (in contrast to a four-way stop intersection, for example). Two vehicles with equal right of way essentially have to negotiate who goes first and who waits.

If both vehicles are purely focused on safety, they will both pull over. The key challenge we faced in our research was: How do we get one vehicle to pull over and the other to go (not both pulling over, not both going) when each makes its own decision without any central coordinator?

We incentivize interactions with safety in mind; crashing at speed is worse than timing out — but time-outs also result in a small penalty to incentivize agents to learn to interact and pass by each other.
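
A minimal sketch of what such a reward scheme might look like in code follows; the function name and the specific numeric values are illustrative assumptions, not the team's actual implementation. The point is only that crashing costs far more than timing out, while a small time-out penalty keeps "both cars just wait" from being an attractive outcome.

```python
# Illustrative sketch of a safety-first reward scheme; all values are assumed.

def step_reward(crashed: bool, timed_out: bool, passed_bottleneck: bool) -> float:
    """Return the reward for one agent at the end of an episode step."""
    if crashed:
        return -10.0   # assumed large penalty: crashing at speed is the worst outcome
    if timed_out:
        return -1.0    # assumed small penalty: discourages both cars simply waiting
    if passed_bottleneck:
        return 5.0     # assumed positive reward for successfully getting through
    return -0.01       # assumed tiny per-step cost to keep the negotiation moving
```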

Tech Briefs: What are the main parameters your model uses to execute the drive? What criteria does the algorithm base its decisions on?

Christoph Killing: Our algorithm perceives what would be available on an actual car. We have distance and relative velocity measurements around the front of the car (see Fig. 2 in the report here). Notably, compared to related work, we do not use a bird's-eye view of the scenario but an egocentric perspective. This makes it a little bit trickier, since we now have blind spots. This observation is augmented by further parameters, such as the cooperativeness mentioned above, which tells the agent how aggressively to behave, but also the current steering angle and throttle position (which you would also be aware of when driving yourself in this scenario).
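
As a rough illustration of such an egocentric observation, the sketch below assembles ranged measurements and the agent's own internal quantities into one vector. The function name, input shapes, and ordering are assumptions for illustration, not the paper's specification.

```python
import numpy as np

def build_observation(front_distances: np.ndarray,
                      relative_velocities: np.ndarray,
                      cooperativeness: float,
                      steering_angle: float,
                      throttle: float) -> np.ndarray:
    """Assemble one egocentric observation vector: ranged measurements around the
    front of the car plus the agent's own internal quantities (illustrative only)."""
    return np.concatenate([
        front_distances,       # distances measured along beams covering the frontal arc
        relative_velocities,   # relative velocity of whatever each beam hits
        [cooperativeness],     # how aggressively this agent is told to behave
        [steering_angle],      # current steering angle
        [throttle],            # current throttle position
    ])
```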

Tech Briefs: What is still challenging for the algorithm to get right?

Christoph Killing: There are two main challenges: overly aggressive pairings and overly passive pairings. (Compare the visualizations here.) Notably, our policies are able to negotiate the scenario most of the time. Yet, human passengers might be quite unhappy with their cars performing some of the maneuvers shown here.

Tech Briefs: What does the algorithm do when it’s clear that an opposing driver is being an aggressive, “bad” driver? Or an overly “cooperative” one?

Christoph Killing: We test our driving policies by assigning a cooperativeness value to each vehicle, telling it how aggressively to behave. Each only knows its own cooperativeness, not that of the opposing car. These cooperativeness values translate to driving behaviors in a quite straightforward manner: An uncooperative driver is only interested in its own progress. A highly cooperative driver doesn’t mind which vehicle makes progress first, as long as somebody goes. These values are fixed throughout the interaction.

(We do not consider “losing your temper.” I am not going to do a deep dive here; let’s just leave it at “for mathematical reasons.”)
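
One plausible way the fixed cooperativeness value Killing describes could be folded into an agent's objective is to blend its own progress with the other vehicle's. The function below is a hedged guess at that idea, not the paper's actual formula.

```python
def blended_progress_reward(own_progress: float,
                            other_progress: float,
                            cooperativeness: float) -> float:
    """Hypothetical blending of progress terms by a cooperativeness value c in [0, 1]:
    an uncooperative agent (c = 0) is rewarded only for its own progress, while a
    highly cooperative agent (c = 1) is equally happy when either vehicle gets
    through. Illustrative assumption, not the paper's formulation."""
    c = cooperativeness
    return (1.0 - c) * own_progress + c * max(own_progress, other_progress)
```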

Tech Briefs: Does part of the model require a kind of “read” of the opposing driver?

Christoph Killing: A word about the “read”: In robotics, we distinguish between the state of the world (i.e., the planet Earth as it is right now) and an observation. Our vehicles do not have a memory module. So, how do we deal with things we do not see at the moment?

Let’s say, for instance, that you are on a Zoom call with somebody. You perceive a partial observation of the planet Earth, so to speak. The other party takes a coffee mug from outside the field of view of their camera, takes a sip, and puts it back down outside their camera's field of view. If you only take into consideration the very last observation you made after the mug was put down and are asked what they are drinking, you simply do not know (because there is no memory). Yet, if you stack together (we call it "concatenate") several observations from the past few seconds, you can infer something about the state of the world, since you then see the mug being moved across several frames. Based on how rapidly they move it, you might even be able to tell something about their mood.

Equally, in our scenario, each car only knows about the other agent based on what it can observe from its observation space (shown in Fig. 2 in the paper). Internal states (the cooperativeness value of the other car, for example) are unknown. We concatenate several of those partial observations for each vehicle to allow it to implicitly form a belief about how cooperative the other vehicle might be. We don’t do this manually; we have the deep neural network, the artificial intelligence, absorb the task. This neural net also has to learn the answer to your question, namely what to do once it notices a certain aggressiveness or overly cooperative behavior.
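
A common way to implement this kind of concatenation is a simple frame stack that feeds the last few observations to the policy network. The class below is a generic sketch of that idea; the stack length and flattening scheme are illustrative choices, not taken from the paper.

```python
from collections import deque
import numpy as np

class ObservationStack:
    """Keep the last k egocentric observations and concatenate them, so the policy
    network can infer hidden quantities (such as the other car's cooperativeness)
    from how the scene changes over time. Parameters here are illustrative."""

    def __init__(self, k: int, obs_dim: int):
        self.frames = deque([np.zeros(obs_dim)] * k, maxlen=k)

    def push(self, obs: np.ndarray) -> np.ndarray:
        """Add the newest observation and return the stacked input for the policy."""
        self.frames.append(obs)
        return np.concatenate(list(self.frames))  # shape: (k * obs_dim,)
```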

Tech Briefs: How does the model note an "aggressive" or "cooperative" behavior, and respond accordingly?

Christoph Killing: An overly aggressive agent might, for instance, just proceed right into this bottleneck of the scenario, essentially forcing the other agent to wait. An overly cooperative agent would — as soon as the full extent of the bottleneck is perceivable by its sensors — slow down and wait. Here our policy is trained to immediately select the complementary action: detect a slow-down and go, or vice versa.
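
As a caricature of that complementary behavior, one could hand-code the rule Killing describes. The real policy is a trained neural network, not a rule, so the threshold and names below are purely illustrative.

```python
def complementary_action(other_speed_history: list, slowdown_threshold: float = -0.5) -> str:
    """Hand-written caricature of the learned behavior: if the opposing car is
    clearly braking before the bottleneck, go; otherwise yield. Illustrative only."""
    if len(other_speed_history) < 2:
        return "yield"  # not enough history yet: stay cautious
    trend = other_speed_history[-1] - other_speed_history[0]
    return "go" if trend < slowdown_threshold else "yield"
```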

Tech Briefs: What’s next for this research?

Christoph Killing: Plenty of things, but three major points: First, the current work pits an autonomous vehicle against another autonomous vehicle only. We will need to extend this to an autonomous vehicle confronted with a human driver and see how well we cooperate with them. Second, in our work vehicles can only move forward; we do not allow reversing, although that could help recover from situations where we are stuck. Third, our work is currently simulation only. Transferring it to a real-world solution is a major step we need to take at some point.
