Four ways to take the gradient · animated paper teaser

All four panels show the same denoising trajectory; each stages its own way of querying Q. The animation loops automatically.

Try it: drag the center of the blue contours (the peak of Q) in any panel, and all gradient arrows recompute live. Only the gradient at BPTT's endpoint and the QGF gradient always point toward the peak; the OOD question-mark arrow ignores the drag, while the BPTT gradient propagated back to a_t keeps jittering.

① Behavior flow policy: Pure flow denoising. Start from Gaussian noise z ~ N and follow the velocity field v_t step by step to the clean action a₁. No Q is involved; this is the "chassis" being guided.
② BPTT: First roll out (dashed) to a₁ and take ∇Q at a₁ (trustworthy), then multiply through the chain of per-step Jacobians da/da to backpropagate hop by hop to a_t. As the chain grows long, the returned arrow jitters violently (high variance), and the computation is expensive.
③ Noisy-point gradient (OOD): Query Q directly at the partially denoised a_t. Q has never seen such a half-finished input, so the gradient direction is untrustworthy: it is marked with a question mark and points every which way.
④ QGF (ours): Jump in one step (dashed) along v_t to the clean estimate â₁, take ∇Q at â₁ (trustworthy), and set the Jacobian J = I to carry that gradient back to a_t unchanged. Cheap, stable, and pointing the right way.
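
The three gradient estimates in panels ② to ④ can be sketched numerically. The following is a minimal, hypothetical illustration, not the paper's implementation. It assumes a scalar action, a toy quadratic Q with its peak at PEAK, a hand-written velocity field, Euler integration, and the one-step clean estimate â₁ = a_t + (1 − t)·v_t(a_t). All function names and constants are invented for this sketch. Because the toy Q is defined everywhere, it cannot reproduce the OOD failure of panel ③; it only shows where each method queries Q.

```python
import math

PEAK = 1.5    # hypothetical peak of the toy critic Q
TARGET = 1.0  # hypothetical point the toy behavior flow drifts toward


def q(a):
    """Toy critic Q(a) = -0.5 * (a - PEAK)^2 (an assumption, not a learned Q)."""
    return -0.5 * (a - PEAK) ** 2


def q_grad(a):
    """Analytic dQ/da for the toy critic."""
    return PEAK - a


def velocity(a, t):
    """Toy stand-in for the learned velocity field v_t(a); t is unused here."""
    return (TARGET - a) + 0.3 * math.sin(a)


def velocity_jac(a, t):
    """Analytic dv/da for the toy velocity field."""
    return -1.0 + 0.3 * math.cos(a)


def rollout(a_t, t, n_steps):
    """Panel 1: Euler-integrate from time t to 1.

    Returns the clean action a_1 and the per-step Jacobians da_{k+1}/da_k.
    """
    dt = (1.0 - t) / n_steps
    a, jacobians = a_t, []
    for k in range(n_steps):
        s = t + k * dt
        jacobians.append(1.0 + dt * velocity_jac(a, s))
        a = a + dt * velocity(a, s)
    return a, jacobians


def grad_bptt(a_t, t, n_steps=10):
    """Panel 2: grad Q at a_1, pulled back through every step's Jacobian."""
    a_1, jacobians = rollout(a_t, t, n_steps)
    g = q_grad(a_1)
    for jac in reversed(jacobians):
        g = jac * g  # scalar case; in higher dimensions this is jac.T @ g
    return g


def grad_noisy(a_t):
    """Panel 3: query Q directly at the noisy point a_t."""
    return q_grad(a_t)


def grad_qgf(a_t, t):
    """Panel 4: grad Q at the one-step estimate, carried back with J = I."""
    a_hat = a_t + (1.0 - t) * velocity(a_t, t)  # assumed one-step jump
    return q_grad(a_hat)


if __name__ == "__main__":
    a_t, t = -0.4, 0.3
    print("BPTT :", grad_bptt(a_t, t))
    print("noisy:", grad_noisy(a_t))
    print("QGF  :", grad_qgf(a_t, t))
```

In this sketch, BPTT needs one velocity evaluation and one Jacobian per rollout step, whereas the QGF-style estimate needs a single velocity evaluation and no Jacobian at all. With a single rollout step the two differ only by the factor that QGF replaces with the identity.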