[Figure: What determines real-robot success. Sim correlation (blue): ~ already eaten by SOP + priors. Residual (red), "only real exploitable": success is decided here; contact dynamics, sensor noise, long tail.]

Simulation is a cheaper evaluator, but it is biased exactly where it matters. The part of sim that correlates with reality (blue) is mostly something you can already capture with a good SOP and known priors. What's left is the residual (red): contact dynamics, sensor noise, materials and lighting, the real long tail. That residual is precisely what decides real-robot success, and it can only be measured on hardware. You can't bootstrap a guarantee about the real robot purely from sim.
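To make the "biased exactly where it matters" point concrete, here is a toy Monte Carlo sketch. It assumes, hypothetically, that each policy's real score splits into a shared component that sim reproduces and an independent residual that sim never observes. The Gaussian distributions, the standard deviations, and the names `simulate`, `pearson`, and `correlation_ceiling` are illustrative choices, not measurements or anyone's actual evaluation pipeline.

```python
import math
import random
import statistics


def simulate(n_policies=2000, shared_sd=1.0, residual_sd=2.0, seed=0):
    """Hypothetical toy model: real score = shared + residual; sim sees only shared."""
    rng = random.Random(seed)
    sim_scores, real_scores = [], []
    for _ in range(n_policies):
        shared = rng.gauss(0.0, shared_sd)      # what sim and reality agree on (blue)
        residual = rng.gauss(0.0, residual_sd)  # contact, sensor noise, long tail (red)
        sim_scores.append(shared)               # sim is blind to the residual
        real_scores.append(shared + residual)   # hardware sees both parts
    return sim_scores, real_scores


def pearson(xs, ys):
    """Plain Pearson correlation (avoids needing Python 3.10's statistics.correlation)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_ceiling(shared_sd, residual_sd):
    """Best achievable sim-vs-real correlation when sim never observes the residual."""
    return shared_sd / math.sqrt(shared_sd ** 2 + residual_sd ** 2)


sim, real = simulate()
r = pearson(sim, real)
print(f"empirical r = {r:.3f}, ceiling = {correlation_ceiling(1.0, 2.0):.3f}")
print(f"share of real-score variance sim can explain ~ {r ** 2:.0%}")
```

Under these assumed numbers the correlation ceiling is 1/sqrt(5), about 0.45, so sim explains only around a fifth of the variation in real outcomes. Running more sim evaluations does not raise that ceiling, because the residual is simply absent from the sim signal; only hardware rollouts observe it.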