offline RL · flow policies

Reversal Q-Learning

denoise noise into an action · reverse the flow to fill in the trajectory · guide it to higher value
