Swarming is a method of operations where multiple autonomous systems act as a cohesive unit by actively coordinating their actions. Future multi-domain battles will require swarms of dynamically coupled, coordinated, heterogeneous mobile platforms to overmatch enemy capabilities and threats targeting U.S. forces.

The Army is looking to swarming technology to execute time-consuming or dangerous tasks. Finding optimal guidance policies for these swarming vehicles in real time is a key requirement for enhancing warfighters’ tactical situational awareness.

Reinforcement learning provides a way to optimally control uncertain agents to achieve multi-objective goals when a precise model for the agent is unavailable; however, existing reinforcement learning schemes can be applied only in a centralized manner, which requires pooling the state information of the entire swarm at a central learner. This drastically increases computational complexity and communication requirements, resulting in unreasonably long learning times.

To solve this issue, researchers tackled the large-scale, multi-agent reinforcement learning problem. The main goal of this effort is to develop a theoretical foundation for data-driven optimal control for large-scale swarm networks, where control actions will be taken based on low-dimensional measurement data instead of dynamic models.

The current approach, called Hierarchical Reinforcement Learning (HRL), decomposes the global control objective into multiple hierarchies: microscopic control at the level of multiple small groups and macroscopic control at the level of the broad swarm. Each hierarchy has its own learning loop with its own local or global reward function. Running the loops in parallel significantly reduces the learning time.
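The two-level loop structure can be illustrated with a toy numerical sketch. This is not ARL's implementation; the group size, the local reward (negative within-group variance), and the compressed state (one group mean) are all assumptions chosen for illustration.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(1)

# Toy swarm: 12 scalar-state agents split into 3 groups of 4.
groups = [rng.standard_normal(4) for _ in range(3)]
before = [np.var(g) for g in groups]

def local_step(x, lr=0.5):
    """Microscopic loop update: shrink the within-group spread.
    The implied local reward here is the negative variance of the group."""
    return x - lr * (x - x.mean())

# The group-level loops are independent, so they can run in parallel.
with ThreadPoolExecutor() as pool:
    groups = list(pool.map(local_step, groups))

# The macroscopic loop sees only a compressed state: one summary per group,
# rather than the full 12-dimensional swarm state.
compressed = np.array([g.mean() for g in groups])
```

The key point the sketch captures is dimensionality: the swarm-level loop operates on 3 numbers instead of 12, which is what makes the parallel hierarchy cheaper than a single centralized learner.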

Online reinforcement learning control of a swarm boils down to solving a large-scale algebraic matrix Riccati equation using the swarm’s input-output data. The researchers’ initial approach to solving this equation was to divide the swarm into multiple smaller groups and run group-level local reinforcement learning in parallel while executing global reinforcement learning on a lower-dimensional compressed state from each group.
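For context, the model-based baseline that such data-driven schemes approximate can be computed directly when the dynamics are known. The sketch below is illustrative only: the matrices are random stand-ins, and in the data-driven setting the solution would be learned from input-output measurements rather than from A and B.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)
n, m = 8, 2                       # small stand-in for a large swarm state
A = rng.standard_normal((n, n)) * 0.3   # (unknown, in the RL setting) dynamics
B = rng.standard_normal((n, m))         # input matrix
Q = np.eye(n)                     # state cost weight
R = np.eye(m)                     # control cost weight

# Solve the continuous algebraic Riccati equation:
#   A'P + PA - P B R^{-1} B' P + Q = 0
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)   # optimal state feedback u = -K x

# The Riccati residual should be numerically zero.
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
```

For a swarm, n grows with the number of agents, so solving (or learning) this single large equation centrally is what becomes intractable and what the group-level decomposition avoids.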

The current HRL scheme uses a decoupling mechanism that allows the team to hierarchically approximate a solution to the large-scale matrix equation by first solving the local reinforcement learning problems and then synthesizing the global control from the local controllers (by solving a least squares problem) instead of running global reinforcement learning on the aggregated state. This further reduces the learning time. Experiments have shown that, compared to a centralized approach, HRL reduced the learning time by 80% while limiting the optimality loss to 5%.
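One plausible reading of this local-then-synthesize step can be sketched as follows. Each group solves its own small Riccati problem; the local gains are assembled into a block-diagonal swarm gain; and a least-squares correction is fit so that the input channel best cancels the inter-group coupling. The coupling matrix and the specific least-squares objective here are assumptions for illustration, not the team's actual formulation.

```python
import numpy as np
from scipy.linalg import block_diag, solve_continuous_are

rng = np.random.default_rng(2)
n_g, m_g, n_groups = 3, 1, 2   # per-group state/input dims, number of groups

# Local step: each group solves its own small Riccati problem in isolation.
A_loc, B_loc, K_loc = [], [], []
for _ in range(n_groups):
    A_i = rng.standard_normal((n_g, n_g)) * 0.3
    B_i = rng.standard_normal((n_g, m_g))
    P_i = solve_continuous_are(A_i, B_i, np.eye(n_g), np.eye(m_g))
    A_loc.append(A_i); B_loc.append(B_i)
    K_loc.append(B_i.T @ P_i)   # local gain (R = I, so K_i = B_i' P_i)

A_blk, B_blk = block_diag(*A_loc), block_diag(*B_loc)
K_blk = block_diag(*K_loc)      # block-diagonal control from local learners

# Global step: fit a correction gain by least squares so that B_blk @ K_corr
# best matches the (here randomly chosen) inter-group coupling Delta.
Delta = rng.standard_normal(A_blk.shape) * 0.1
mask = np.kron(np.eye(n_groups), np.ones((n_g, n_g)))
Delta[mask > 0] = 0             # keep only off-block (inter-group) terms
K_corr, *_ = np.linalg.lstsq(B_blk, Delta, rcond=None)

K_global = K_blk + K_corr       # synthesized swarm-level gain
```

The small least-squares problem replaces a full reinforcement learning loop on the aggregated state, which is where the additional savings in learning time would come from.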

Current HRL efforts will enable the development of control policies for swarms of unmanned aerial and ground vehicles so that they can optimally accomplish different mission sets even though the individual dynamics of the swarming agents are unknown.

The team is working to further improve the HRL control scheme by considering optimal grouping of agents in the swarm to minimize computation and communication complexity while limiting the optimality gap. They are also investigating the use of deep recurrent neural networks to learn and predict the best grouping patterns and the application of developed techniques for optimal coordination of autonomous air and ground vehicles in Multi-Domain Operations in dense urban terrain.

For more information, contact the U.S. Army CCDC Army Research Laboratory Public Affairs at 703-693-6477.