5

Hit backprop and watch the gradient (amber pulse) flow back into the actor. DDPG and SAC actors produce the action in one forward pass, so the gradient ∇aQ reaches the parameters in a single hop. The naive black-box treatment of a flow would force the gradient through all F steps (BPTT). RQL's trick — a value V(s, xᶠ, f) defined on every partial action — means each step gets its own value surface, so the gradient travels just one hop per step, never through the whole chain. Same cheap, stable hop as DDPG/SAC, just applied F times.