Next-Token Prediction Meets Full-Sequence Diffusion
Diffusion Forcing combines the strengths of full-sequence diffusion models and next-token models. At sampling time it can act as either one, or as a mix of the two, for different applications without retraining. Watch this video to learn more.
“With Diffusion Forcing, we are taking a step to bringing video generation and robotics closer together,” says senior author Vincent Sitzmann, MIT assistant professor and member of CSAIL, where he leads the Scene Representation group.