Q-guided flow · 2-D animation

Heatmap = the critic Q's value landscape (bright = high value). Thin white lines = the velocity field the particles follow. Bright dots = action samples during real denoising. Drag inside the canvas to move the reward peak a*.

Legend: Q value (low → high) · action particles · data modes / reward peaks

How to play:
① Set the guidance strength to 0 for pure flow: particles fill all 5 data modes (the distribution learned by behavior cloning).
② Raise the strength and Q pulls the particles onto the modes near the reward peak. This is "guiding the flow with Q."
③ Drag the reward peak; the whole velocity field and the particles' landing spots track it live.
④ Crank the strength very high and particles leave the data modes and pile directly onto a* (a sign of over-guidance, i.e. exploiting the critic).
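The steps above can be reproduced offline with a small NumPy sketch. It is not the code behind the animation, just one plausible reading of it under stated assumptions. The 5 data modes are Gaussians on a circle, so the behavior-cloned flow has a closed-form velocity. The critic Q is a single Gaussian bump centered at a*. Guidance is assumed to be additive, `v_guided = v_base + strength * ∇Q`. All names and constants (`MODES`, `MODE_STD`, `Q_WIDTH`, `sample`, the Euler step count) are hypothetical and chosen for illustration.

```python
import numpy as np

# Hypothetical toy setup: 5 data modes on a circle of radius 2.
K = 5
_angles = 2 * np.pi * np.arange(K) / K
MODES = 2.0 * np.stack([np.cos(_angles), np.sin(_angles)], axis=1)  # (K, 2)
MODE_STD = 0.2  # assumed width of each data mode
Q_WIDTH = 1.0   # assumed width of the reward bump around a*


def base_velocity(x, t):
    """Unguided flow velocity for the path x_t = (1 - t) * noise + t * data.

    With standard-normal noise and Gaussian-mixture data, the marginal
    velocity E[data - noise | x_t = x] has the closed form used here.
    """
    var_t = (1 - t) ** 2 + (t * MODE_STD) ** 2
    diff = x[:, None, :] - t * MODES[None, :, :]         # (N, K, 2)
    logits = -np.sum(diff ** 2, axis=-1) / (2 * var_t)   # (N, K)
    logits -= logits.max(axis=1, keepdims=True)          # stable softmax
    resp = np.exp(logits)
    resp /= resp.sum(axis=1, keepdims=True)              # mode responsibilities
    coef = (t * MODE_STD ** 2 - (1 - t)) / var_t
    per_mode = MODES[None, :, :] + coef * diff           # (N, K, 2)
    return np.sum(resp[:, :, None] * per_mode, axis=1)   # (N, 2)


def q_value(x, a_star):
    """Toy critic: one Gaussian bump with its peak at a_star."""
    return np.exp(-np.sum((x - a_star) ** 2, axis=-1) / (2 * Q_WIDTH ** 2))


def q_grad(x, a_star):
    """Analytic gradient of the toy critic with respect to the action x."""
    return -(x - a_star) / Q_WIDTH ** 2 * q_value(x, a_star)[:, None]


def sample(strength, a_star, n=400, steps=100, seed=0):
    """Euler-integrate the guided field from noise (t=0) to actions (t=1)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, 2))
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * (base_velocity(x, t) + strength * q_grad(x, a_star))
    return x


if __name__ == "__main__":
    a_star = np.array([1.2, 0.6])  # reward peak placed off the data modes
    for strength in (0.0, 1.0, 50.0):
        actions = sample(strength, a_star)
        print(strength, q_value(actions, a_star).mean())
```

With `strength=0.0` the integration only follows the base flow, which corresponds to step ①. A moderate value biases which modes the particles commit to, as in step ②. Changing `a_star` plays the role of dragging the peak in step ③. In this toy model a very large strength should let the `strength * ∇Q` term outweigh the base velocity near a*, which is the over-guidance regime of step ④. The exact thresholds depend on the assumed `Q_WIDTH` and `MODE_STD`.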