offline RL · flow policies
denoise noise into an action · reverse the flow to fill in the trajectory · guide it to higher value
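A minimal sketch of the first and last steps (noise to action, then value guidance), under loudly stated assumptions. It uses a straight-line, flow-matching style flow and a toy 1-D behavior policy whose dataset actions sit at -1 and +1. The exact velocity of that toy flow stands in for a learned network, and state conditioning is omitted. An action is produced by Euler-integrating from Gaussian noise at t = 0 to t = 1. The names `behavior_velocity`, `sample_action`, `q_grad` and `guidance_scale` are hypothetical. The "guidance" shown is an assumed form, a scaled gradient of Q added to the velocity. It is one possible way to steer toward higher value, not a specific published method.

```python
import math
import random


def behavior_velocity(x, t, modes=(-1.0, 1.0)):
    """Velocity field for a toy 1-D behavior policy (hypothetical stand-in
    for a learned network v(a_t, t | s); state conditioning omitted).

    Assumed setup: noise a_0 ~ N(0, 1), dataset actions a_1 drawn equally
    from `modes`, straight paths a_t = (1 - t) * a_0 + t * a_1.
    For that setup the exact velocity is (E[a_1 | a_t] - a_t) / (1 - t).
    """
    sigma = 1.0 - t  # given a_1 = m, a_t ~ N(t * m, (1 - t)^2)
    logits = [-((x - t * m) ** 2) / (2.0 * sigma**2) for m in modes]
    top = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(l - top) for l in logits]
    expected_target = sum(w * m for w, m in zip(weights, modes)) / sum(weights)
    return (expected_target - x) / (1.0 - t)


def sample_action(a0, q_grad=None, guidance_scale=0.0, n_steps=50):
    """Euler-integrate the flow from noise a0 (t = 0) to an action (t = 1).

    If q_grad is given, guidance_scale * dQ/da is added to the velocity at
    every step (an assumed, gradient-based form of value guidance).
    """
    dt = 1.0 / n_steps
    x = a0
    for k in range(n_steps):
        t = k * dt  # last step starts at t = 1 - dt, so 1 - t never reaches 0
        v = behavior_velocity(x, t)
        if q_grad is not None:
            v += guidance_scale * q_grad(x)
        x += dt * v
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    # Unguided sampling from random noise lands in one of the two data modes.
    print(sample_action(rng.gauss(0.0, 1.0)))

    # Toy critic (assumption): Q(a) = a, so dQ/da = 1 and +1 is the better mode.
    plain = sample_action(-0.5)  # ends in the -1 mode
    guided = sample_action(-0.5, q_grad=lambda a: 1.0, guidance_scale=2.0)
    print(plain, guided)  # guided ends on the positive, higher-value side
```

The design choice worth noting: guidance does not retrain the policy. It only bends the sampling trajectory. Because the behavior flow still pulls samples onto the dataset's modes, the guided action stays near an in-distribution action, which matters in offline RL, while the critic decides which mode it lands in.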