Steel-blue = base πbase; teal = steered π★ ∝ πbase·eQ/α; amber dashes = value Q(a). The red shaded walls are outside the support (zero base mass) — the global optimum hides behind the right wall. Crank 1/α and the steered policy only ever climbs to the in-support ceiling; the gap Δ doesn't budge. Widen the base and Δ collapses.