Flow policy: denoise noise into an action

Denoise steps F: 8

Current step: 0 / 8

A flow policy starts from Gaussian noise x⁰ and moves it through F deterministic "denoise" steps, x⁰→x¹→…→x^F, ending at an action. The blue dots are the sample cloud being refined; the teal targets are the modes of a multi-modal action distribution (a plain unimodal Gaussian policy could cover only one of them). The whole path is deterministic given x⁰: the only randomness is the initial noise draw. Press "re-roll noise" to see different draws of x⁰ flow to different modes.
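
The denoising loop can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation behind the demo. The mode locations in `MODES`, the hand-written `velocity` field, and the `denoise` helper are all hypothetical. A real flow policy would replace `velocity` with a learned network, typically also conditioned on the observation.

```python
import numpy as np

# Hypothetical 2-D action modes (stand-ins for the teal targets).
MODES = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])


def velocity(x, t):
    """Toy velocity field that points each sample at its nearest mode.

    Assumption: this hand-written field stands in for a learned network.
    """
    # Squared distance from every sample to every mode, shape (N, M).
    d2 = ((x[:, None, :] - MODES[None, :, :]) ** 2).sum(axis=-1)
    nearest = MODES[d2.argmin(axis=1)]
    # Straight-line field: following it from time t reaches the mode at t = 1.
    return (nearest - x) / (1.0 - t)


def denoise(x0, num_steps=8):
    """Run F = num_steps deterministic Euler steps x^0 -> x^1 -> ... -> x^F."""
    x = x0.copy()
    path = [x.copy()]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so velocity() never divides by zero
        x = x + dt * velocity(x, t)
        path.append(x.copy())
    return np.stack(path)  # shape (F + 1, N, 2)


# The only randomness: the initial Gaussian noise draw x^0.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((16, 2))
path = denoise(x0, num_steps=8)
```

Calling `denoise` twice with the same `x0` returns the same path, which is the determinism described above. Drawing a fresh `x0` (for example, from a generator with a different seed) plays the role of the re-roll button: different starting points can end at different modes.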