0.56
0.08
achieved E[Q]
in-support ceiling
irreducible gap Δ

Steel-blue = base πbase; teal = steered π ∝ πbase·eQ/α; amber dashes = value Q(a). The red shaded walls are outside the support (zero base mass) — the global optimum hides behind the right wall. Crank 1/α and the steered policy only ever climbs to the in-support ceiling; the gap Δ doesn't budge. Widen the base and Δ collapses.