[Figure: tail vs. reality gap, free (predict, target) pairs]

Red = the tail the policy predicted at time t and then threw away. Teal = what reality actually did over that same window (revealed only later). The shaded gap between them is a free, reality-grounded training target: "you predicted this, the world did that." Every discarded tail, once the future arrives, hands you a labeled (prediction, ground-truth) pair at no extra labeling cost. Scrub t to harvest more of them.
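
The harvesting step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method behind the figure: it assumes each prediction made at time t is a fixed-length list covering times t, t+1, ..., that the policy keeps only the first `keep` steps and discards the rest, and that reality is a single scalar series. The names `harvest_pairs`, `gap`, and `keep` are hypothetical, and mean absolute error is just one possible way to score the shaded gap.

```python
from typing import List, Sequence, Tuple

# One harvested training example: (discarded predicted tail, what actually happened).
Pair = Tuple[List[float], List[float]]


def harvest_pairs(
    predictions: Sequence[Sequence[float]],
    reality: Sequence[float],
    keep: int,
) -> List[Pair]:
    """Pair each discarded tail with the reality later revealed for the same window.

    predictions[t] is the horizon predicted at time t (assumed to cover
    times t, t+1, ...). Only the first `keep` steps were used; the rest is the tail.
    """
    pairs: List[Pair] = []
    for t, pred in enumerate(predictions):
        start, end = t + keep, t + len(pred)
        if end > len(reality):
            continue  # the future for this tail has not fully arrived yet
        tail = list(pred[keep:])
        if not tail:
            continue  # nothing was discarded at this step
        pairs.append((tail, list(reality[start:end])))
    return pairs


def gap(pair: Pair) -> float:
    """Mean absolute difference between a predicted tail and reality (assumed metric)."""
    tail, truth = pair
    return sum(abs(p - y) for p, y in zip(tail, truth)) / len(tail)


# Toy data: three predictions of horizon 4, five steps of observed reality.
predictions = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
reality = [0, 1, 2, 2, 3]
pairs = harvest_pairs(predictions, reality, keep=1)
```

In this toy run the prediction made at t = 2 yields no pair yet, because its window extends past the last observed step; it becomes harvestable once one more step of reality arrives.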