The data has three peaks (a = −2, 0, 1), and the true optimum is a* = 1. The critic Q has a "spurious bump" near a ≈ 3, an out-of-distribution (OOD) misjudgment. Switch the guidance method, adjust the weight, and watch where the particles land.
Hint: with no guidance, the particles scatter across the three peaks. Switch to QGF and raise the weight, and they concentrate at a* = 1, where the true return is highest. Switch to OOD and raise the weight, and the "spurious bump" pulls the particles toward a ≈ 3: the critic's Q is very high there, but the true return is very low. This is OOD guidance exploiting the critic's error.
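For readers without the interactive widget, the same qualitative behavior can be reproduced with a small toy script. This is a minimal sketch under stated assumptions, not the demo's actual code: the Gaussian shapes of `true_return` and `critic`, the peak width `DATA_STD`, and both guidance functions are hypothetical stand-ins. In particular, `guide_in_support` (resampling data actions by exp(weight · Q)) only mimics the "stays on the data" behavior attributed to QGF and is not the QGF algorithm, and `guide_unconstrained` (plain gradient ascent on Q) only mimics the OOD mode.

```python
import math
import random

PEAKS = (-2.0, 0.0, 1.0)  # data modes from the text
DATA_STD = 0.2            # assumed width of each data peak


def true_return(a):
    """Assumed ground-truth return: a single bump maximized at a* = 1."""
    return math.exp(-0.5 * (a - 1.0) ** 2)


def critic(a):
    """Assumed critic: true return plus a spurious bump centered at a = 3."""
    return true_return(a) + 2.0 * math.exp(-0.5 * (a - 3.0) ** 2)


def critic_grad(a):
    """Analytic derivative of critic(a) with respect to a."""
    return (-(a - 1.0) * math.exp(-0.5 * (a - 1.0) ** 2)
            - 2.0 * (a - 3.0) * math.exp(-0.5 * (a - 3.0) ** 2))


def sample_data(n, rng):
    """Draw n actions from an equal-weight mixture over the three peaks."""
    return [rng.gauss(rng.choice(PEAKS), DATA_STD) for _ in range(n)]


def guide_in_support(particles, weight, rng):
    """Stand-in for guidance that stays on the data: resample the data
    actions with probability proportional to exp(weight * Q)."""
    q_max = max(critic(a) for a in particles)  # subtract max for stability
    w = [math.exp(weight * (critic(a) - q_max)) for a in particles]
    return rng.choices(particles, weights=w, k=len(particles))


def guide_unconstrained(particles, weight, steps=2000, lr=0.05):
    """Stand-in for the OOD mode: gradient ascent on weight * Q, with
    nothing tying the particles to the data."""
    out = []
    for a in particles:
        for _ in range(steps):
            a += lr * weight * critic_grad(a)
        out.append(a)
    return out


def fraction_near(particles, center, radius=0.5):
    """Share of particles within `radius` of `center`."""
    return sum(abs(a - center) <= radius for a in particles) / len(particles)


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)
data = sample_data(300, rng)                   # no guidance
in_support = guide_in_support(data, 5.0, rng)  # stays on the data
unconstrained = guide_unconstrained(data, 5.0) # free to leave the data

for name, ps in [("none", data), ("in-support", in_support),
                 ("unconstrained", unconstrained)]:
    print(f"{name:>13}: near a=1 {fraction_near(ps, 1.0):.2f}, "
          f"near a=3 {fraction_near(ps, 3.0):.2f}, "
          f"mean Q {mean([critic(a) for a in ps]):.2f}, "
          f"mean true return {mean([true_return(a) for a in ps]):.2f}")
```

Under these assumptions, the unconstrained particles should end up with the highest mean critic value and a low mean true return, which is the exploitation effect described in the hint. The in-support particles can only land on actions that already appear in the data, so the spurious bump cannot pull them to a ≈ 3.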