Pathwise gradient ascent on Q

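This is the actor update that DDPG and SAC run, in one picture. Q(s, ·) is the landscape over the 2-D action space (bright = high value). Drag the action anywhere; the arrow is the pathwise gradient ∇_a Q(s, a). Hit ascend and the action walks uphill along that gradient, a ← a + η ∇_a Q(s, a) for a step size η, until it settles on a high-value mode. That mode is a local maximum of Q(s, ·), so which one the action reaches depends on where it starts. This is exactly "push the action toward higher value," and it costs a single backprop hop from Q to the action. (In the real algorithms the gradient continues from the action into the actor's parameters, and SAC also adds an entropy bonus, but the core step is this one.) RQL applies this very same step, but to each denoising step of the flow.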
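A minimal sketch of the ascent the figure animates, under loud assumptions: the critic is a hypothetical toy Q(s, ·) for one fixed state, built from two Gaussian bumps (`MODES`, `q_value`, `grad_q`, and `ascend` are illustrative names, not from any library or from RQL), and the gradient is written out analytically where a real actor update would get ∇_a Q from autodiff. The step size η = 0.12 is likewise just an illustrative choice. It updates the action directly, as in the picture, rather than an actor network.

```python
import math

# Hypothetical toy critic Q(s, .) for ONE fixed state s: two Gaussian bumps
# over a 2-D action space. Centers, heights, and widths are assumptions.
MODES = [
    # (center_x, center_y, height, width)
    (0.6, 0.5, 1.0, 0.35),    # higher-value mode
    (-0.7, -0.4, 0.6, 0.30),  # lower-value mode
]


def q_value(a):
    """Q(s, a) for the fixed state s."""
    x, y = a
    total = 0.0
    for cx, cy, h, w in MODES:
        total += h * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * w * w))
    return total


def grad_q(a):
    """Analytic dQ/da; stands in for the backprop hop from Q to the action."""
    x, y = a
    gx = gy = 0.0
    for cx, cy, h, w in MODES:
        bump = h * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * w * w))
        gx += bump * (-(x - cx) / (w * w))
        gy += bump * (-(y - cy) / (w * w))
    return gx, gy


def ascend(a, eta=0.12, steps=200):
    """Repeat a <- a + eta * grad_a Q(s, a); return the whole trajectory."""
    path = [a]
    for _ in range(steps):
        gx, gy = grad_q(a)
        a = (a[0] + eta * gx, a[1] + eta * gy)
        path.append(a)
    return path


if __name__ == "__main__":
    # Starting near the higher bump vs. near the lower bump.
    for start in [(0.2, 0.2), (-0.4, -0.2)]:
        end = ascend(start)[-1]
        print(start, "->", end, "Q =", q_value(end))
```

Started at (0.2, 0.2) the action climbs to the taller bump; started at (-0.4, -0.2) it stops on the shorter one even though a higher mode exists. That is the local-maximum behavior described above: the pathwise gradient only sees the slope under the current action.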