Approximating Model Predictive Control Using Imitation Learning
Watch this video to see work showing that diffusion-based AMPC significantly outperforms L2-regression-based approximate MPC for multimodal action distributions. In contrast to most prior imitation-learning (IL) research, this work also runs the diffusion-based controller at a higher rate and in joint space rather than end-effector space.
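Why L2 regression struggles here can be seen in a toy example: minimizing mean squared error over a bimodal set of expert actions collapses the prediction to the mean of the modes, an action the expert never produces. A minimal sketch (the synthetic data is an assumption, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical multimodal expert data: for the same state, the MPC
# returns one of two joint-velocity solutions (e.g. due to the
# redundant degree of freedom or different solver local minima).
actions = np.concatenate([
    rng.normal(-1.0, 0.05, 500),   # mode A
    rng.normal(+1.0, 0.05, 500),   # mode B
])

# An L2-regression policy minimizes mean squared error, so for a fixed
# input its optimal prediction is the conditional mean of the targets.
l2_prediction = actions.mean()
print(l2_prediction)  # near 0.0: between the modes, matching neither
```

A diffusion model trained on the same data would instead place probability mass on both modes and sample from one of them, which is the multimodality argument made in the video.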
Transcript
00:00:00 In this video, we provide supplementary material for our paper on diffusion-based approximate model predictive control. MPC is an optimization-based control framework that provides feasible and locally optimal policies by solving an optimization problem at every closed-loop step. As an example, we control a seven-degree-of-freedom robotic arm by
00:00:20 commanding joint velocities for high-speed tracking of the end-effector's full pose. The MPC solutions in this task can be multimodal due to both the redundant degree of freedom and the numerical solver finding different local minima. However, due to long computation times in the numerical optimization, complex MPC formulations cannot be deployed on dynamic or
00:00:41 unstable systems at high control rates. We observed high overshoot that potentially led to self-collisions. To overcome this limitation, previous works have used least-squares regression to approximate the MPC solution distribution and deploy it at high control rates, bypassing online optimization. However, as least-squares models cannot model multimodal
00:01:03 distributions, they fail to maintain the local optimality, feasibility, and theoretical guarantees provided by the MPC formulation. Instead, we found that diffusion models can effectively capture the different modes of the MPC solution distribution. This avoids mode averaging and leads to better retention of MPC's theoretical guarantees, such as
00:01:24 feasibility and convergence. Since diffusion models capture the different modes of the MPC, naive deployment can result in high jerk due to inconsistent mode selection in subsequent closed-loop steps. To address this, we propose gradient guidance to condition the distribution on the previously picked mode, as well as early stopping of noise injection to ensure
00:01:51 smooth commands. With these improvements, we can sample from the distribution in closed loop in less than 1 millisecond on a GPU using just five denoising steps. We deploy the system at 250 Hz and observe better stability and convergence compared to the original MPC. We also explore various online sampling strategies to leverage the
00:02:15 diffusion model's ability to capture multiple modes from the underlying MPC distribution. These strategies include clustering for democratic voting or selecting the mode with the minimum cost. We hope to inspire researchers and practitioners to adopt diffusion models as fast approximators of MPC. Thank you for watching.
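The two online sampling strategies mentioned at the end, clustering for democratic voting and minimum-cost selection, can be sketched over a batch of diffusion samples. The greedy distance-based clustering, the function names, and the cost function below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def select_action(samples, cost_fn=None, tol=0.5):
    """Pick one command from a batch of diffusion samples.

    Sketch of the two strategies described in the video: group the
    samples into modes (here via greedy distance-based clustering,
    an assumed stand-in), then either take the largest cluster
    ("democratic voting") or the cluster center with minimum cost.
    """
    samples = np.asarray(samples, dtype=float)
    clusters = []                               # list of lists of samples
    for s in samples:
        for c in clusters:
            if np.linalg.norm(s - np.mean(c, axis=0)) < tol:
                c.append(s)                     # close to an existing mode
                break
        else:
            clusters.append([s])                # start a new mode
    centers = [np.mean(c, axis=0) for c in clusters]
    if cost_fn is None:
        # Democratic voting: return the center of the most-populated mode.
        return centers[int(np.argmax([len(c) for c in clusters]))]
    # Minimum cost: return the mode center that scores best.
    return min(centers, key=cost_fn)

# Two modes in a 1-D joint-velocity space; the mode near +1 has more votes.
batch = np.array([[0.98], [1.02], [1.00], [-1.01], [-0.99]])
print(select_action(batch))                               # voting picks ~[1.0]
print(select_action(batch, cost_fn=lambda a: abs(a[0] + 1)))  # cost picks ~[-1.0]
```

Voting favors the mode the model is most confident about, while minimum-cost selection can recover the better local optimum when a cheap cost evaluation is available at deployment time.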

