
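Walk the value along the chain $x^0 \to \dots \to x^F \to s' \to x'^0 \to \dots$. Inside one action the denoise steps are deterministic, reward-free, and undiscounted, so in RQL the value is flat across the whole block and equals the Q-value of the final action $x^F$. $\gamma$ is applied once, only at the real environment transition $s \to s'$. That is why $V(s,x^0,0) = \dots = V(s,x^F,F) = Q(s,x^F)$: every denoise index $f$ shares one TD target, $r + \gamma V(s',x'^0,0)$, which does not depend on $f$, and the horizon never inflates. The naive alternative, discounting every denoise step, is the mistake. $\gamma$ is then applied $F+1$ times per environment step instead of once, so the value decays roughly $F$ times faster, and each environment step becomes $F+1$ MDP steps, stretching the horizon by the same factor.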
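To make the contrast concrete, here is a minimal Python sketch, assuming a toy deterministic episode with known per-step rewards, termination after the last step, and denoise indices $f = 0, \dots, F$. It computes exact values by backward recursion rather than training a critic, and the names `chain_values` and `env_target` are hypothetical illustrations, not part of any RQL codebase. With `naive=False` every entry in a block equals the final action's Q and matches the one-step environment target; with `naive=True` the values shrink by $\gamma$ at each denoise step.

```python
from typing import List


def chain_values(rewards: List[float], F: int, gamma: float, naive: bool) -> List[List[float]]:
    """Backward recursion over x_t^0 -> ... -> x_t^F -> s_{t+1} -> x_{t+1}^0 -> ...

    Returns values[t][f] = V(s_t, x_t^f, f) for env step t and denoise index f in 0..F.
    Assumption (toy setup): rewards[t] is a hypothetical reward for the action executed
    at env step t, and the episode terminates after the last step (value 0 beyond it).
    """
    H = len(rewards)
    values = [[0.0] * (F + 1) for _ in range(H)]
    next_value = 0.0  # V(s_{t+1}, x_{t+1}^0, 0); zero after termination
    # RQL: denoise steps are undiscounted. Naive: every denoise step is discounted.
    denoise_discount = gamma if naive else 1.0
    for t in reversed(range(H)):
        # Final index f = F: the action is executed, the reward arrives,
        # and gamma is applied for the real environment transition.
        values[t][F] = rewards[t] + gamma * next_value
        # Earlier indices f < F: reward-free, deterministic denoise steps.
        for f in reversed(range(F)):
            values[t][f] = denoise_discount * values[t][f + 1]
        next_value = values[t][0]
    return values


def env_target(values: List[List[float]], rewards: List[float], t: int, gamma: float) -> float:
    """One-step target r_t + gamma * V(s_{t+1}, x_{t+1}^0, 0); note it takes no f argument."""
    next_v = values[t + 1][0] if t + 1 < len(values) else 0.0
    return rewards[t] + gamma * next_v


if __name__ == "__main__":
    rewards, F, gamma = [1.0, 1.0, 1.0], 4, 0.9
    for name, naive in [("RQL", False), ("naive", True)]:
        vals = chain_values(rewards, F, gamma, naive)
        print(name, [[round(v, 3) for v in block] for block in vals])
```

Running it prints each block's values for both settings. In the naive setting the recursion reduces to $Q_t = r_t + \gamma^{F+1} Q_{t+1}$, which is exactly the extra per-step discounting described above; in the RQL setting it stays $Q_t = r_t + \gamma Q_{t+1}$.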