Wind Tunnel Theory: The Engineering Endgame of Robot Learning

Why the data → model manifold can’t be crossed by intuition, why iteration speed is the next battlefield, and where the data flywheel hits escape velocity.

One machine, two media. Left: 3D airflow streams over a wing that keeps upgrading across eras (biplane → P-51 → swept jet → delta fighter), the way real wind tunnels searched for better airfoils generation by generation. Right: the same loop for robots; data clips blow onto the policy’s capability manifold, and wherever verified data lands, the surface lights up from gray to success-green. Air ↔ data, wing ↔ policy, lift ↔ how well it works.

A field with no north star

In most engineering problems, you know the destination before you set out. How much load the bridge must carry, how high the chip must clock — the target is a number you can write down and compute ahead of time. You walk toward that number; how fast you walk is a question of skill, but the direction was clear from the start.

Robot learning is not like that.

When you train a VLA (Vision-Language-Action) model to operate in the real world, you face a night sky with no north star. You cannot say “this batch of data, trained into this model, will perform this well on the real robot” — because between (data distribution, architecture, recipe) and (downstream real-robot performance) there is no closed form, no formula you can evaluate in advance. The map objectively exists, but the only tool humanity currently has to evaluate it is to actually run the training, then actually put the model on the machine and run it.

This isn’t because we aren’t clever enough. It is an intrinsic property of the map. Admitting that is step one to doing robot learning right.

The wind tunnel: when theory gives no answer

A hundred-odd years ago, aeronautical engineers faced the same predicament.

Why does a wing generate lift? The theoretical argument over that question ran for decades. But the Wright brothers didn’t wait for theory to settle it. They built a wind tunnel: a box that blows air over an airfoil at a controlled speed and measures the lift and drag. Inside that box they systematically tested some two hundred wing shapes, recorded how each behaved, and picked the best.

The value of the wind tunnel is not that it “picked a good airfoil.” If someone says “a wind tunnel just judges which airfoil is better — low value,” they’ve missed the entire point. The whole value of the wind tunnel is this: for a function theory cannot predict, it is the only reliable measuring instrument. Without it you can only guess; with it you can measure.

Robot learning needs its own wind tunnel: the engineering system that closes the loop between training and real-robot evaluation, that can systematically measure “this data → this model → this real-robot performance.” Its reason to exist is identical to the Wright brothers’ wooden box: on this problem, experiment is the only evaluator, and theory is not.
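
What “measure” means in practice: a real-robot evaluation is a finite batch of pass/fail rollouts, so the instrument reports an estimate with error bars, not a single number. A minimal sketch, assuming binary success per rollout; the Wilson score interval used here is one standard choice for a binomial proportion, picked for illustration rather than prescribed by the text, and the two checkpoints’ counts are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass/fail success rate (z = 1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie in [0, trials]")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Two hypothetical checkpoints, 20 real-robot rollouts each.
a = wilson_interval(18, 20)
b = wilson_interval(15, 20)
overlap = a[0] <= b[1] and b[0] <= a[1]
print(f"A: {a[0]:.2f}-{a[1]:.2f}  B: {b[0]:.2f}-{b[1]:.2f}  overlap={overlap}")
```

With 20 rollouts each, the 18/20 checkpoint (roughly 0.70 to 0.97) and the 15/20 checkpoint (roughly 0.53 to 0.89) still overlap, so that trial budget cannot reliably rank them.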

SOP shapes the input, but never touches the function

A common misreading: as long as you make the data-collection SOP detailed and standardized enough, performance will follow.

That treats the model as a system that “executes a procedure.” But the model isn’t written, it’s learned. What an SOP can decide is which region of data space you sampled — i.e. what you fed in. What it cannot decide is the shape of the map from “the distribution you fed in” to “the policy behavior you learned.” In between sits an opaque learning process. You can shape the input; you cannot shape the function from input to behavior. That is the heart of the manifold problem: between data distribution and model behavior lies a high-dimensional, curved manifold whose shape your intuition cannot trace, and the SOP — that chisel — simply can’t reach it.

Some know-how genuinely can be front-loaded: data ratios, baseline quality filtering, recipe priors known to work. Put those in before you start and you’ll waste fewer turns. But they act on the input end. The stretch from input to model performance — no prior walks it for you.

There’s a subtler trap in over-specifying the steps. A pristine, hyper-detailed collection SOP, with every grasp and every angle written down, looks rigorous, yet it often underperforms. The people executing it drift in how they understand and intend each step, so much of that fine needlework is lost in practice: spend a whole tired day on one task and you quietly converge on the single least-effort path. In learning terms, one mode comes to dominate the distribution, and that is exactly what guts a dataset’s value, because coverage and diversity are what generalization feeds on.
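
One way to put a number on “one mode comes to dominate”: tag each demonstration with the strategy it used (a hypothetical tagging step; the text does not prescribe one) and compute the effective number of modes, the exponential of the Shannon entropy of the strategy mix. A minimal sketch with made-up tags:

```python
import math
from collections import Counter


def effective_modes(strategy_labels: list[str]) -> float:
    """exp(Shannon entropy) of the strategy mix: k for k equally common
    strategies, falling toward 1.0 as a single strategy dominates."""
    if not strategy_labels:
        raise ValueError("no demonstrations")
    counts = Counter(strategy_labels)
    n = len(strategy_labels)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return math.exp(entropy)


# Hypothetical grasp-strategy tags for two 100-demo batches of one task.
fresh = ["top"] * 25 + ["side"] * 25 + ["pinch"] * 25 + ["regrasp"] * 25
tired = ["top"] * 91 + ["side"] * 3 + ["pinch"] * 3 + ["regrasp"] * 3
print(round(effective_modes(fresh), 2), round(effective_modes(tired), 2))
```

Both batches contain all four strategies, but the tired batch behaves like roughly 1.5 of them: a count of distinct modes hides exactly the collapse the paragraph describes.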

So the thing to optimize is always the model’s final performance, never the human’s choreography. Don’t legislate how people should move; give them a final-acceptance SOP (“the robot must succeed across these generalization scenarios”) and let them work out how to get there. It’s the same reason you wouldn’t manage a capable employee with a rigid, step-by-step script: you hand them the Claude API and a stack of hard tasks, and judge them on the outcome.
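
A final-acceptance SOP stated as outcomes rather than motions can be as small as a per-scenario threshold check. A minimal sketch; the scenario names, the 0.8 success threshold, and the 10-trial minimum are hypothetical placeholders, not values from the text.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    name: str  # hypothetical generalization scenario
    successes: int
    trials: int


def accept(results: list[ScenarioResult], min_rate: float = 0.8,
           min_trials: int = 10) -> tuple[bool, list[str]]:
    """Outcome-only acceptance: every scenario must clear `min_rate` over at least
    `min_trials` real-robot trials. Nothing here constrains how demos were collected."""
    if not results:
        return False, []  # an empty test suite is not a pass
    failing = [r.name for r in results
               if r.trials == 0 or r.trials < min_trials or r.successes / r.trials < min_rate]
    return not failing, failing


ok, failing = accept([
    ScenarioResult("unseen_table", 17, 20),
    ScenarioResult("dim_lighting", 12, 20),
    ScenarioResult("novel_objects", 9, 10),
])
print(ok, failing)
```

The check only reads results, which is the point: collectors are free to choose any motions, and the batch is accepted or bounced on what the robot does.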

Simulation is biased, exactly where it matters

So can simulation route around it?

Sim is a cheaper evaluator. It’s useful, and it’s biased. The sim-to-real gap isn’t noise; it’s systematic bias, and it’s biased in the deadliest places: contact dynamics, sensor noise, materials and lighting, the long tail of real operating conditions — precisely the parts a model must learn and a simulator struggles most to reproduce.

Sim and real are correlated, true. But the exploitable part of that correlation is largely something your SOP and priors already capture. The residual that remains is exactly the part that decides real-robot success, and it can only be measured on hardware. You cannot bootstrap a guarantee about real-world performance from sim alone.

So what blows through the wind tunnel must, in the end, be real wind.
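
In practice that division of labor is often a two-stage evaluator: the simulator prunes candidates cheaply, and hardware makes the decision. A minimal sketch; `sim_score`, `real_success_rate`, the checkpoint names, and the `keep` budget are hypothetical stand-ins, not any particular team’s pipeline.

```python
from typing import Callable, Sequence, TypeVar

C = TypeVar("C")


def select_policy(
    candidates: Sequence[C],
    sim_score: Callable[[C], float],
    real_success_rate: Callable[[C], float],
    keep: int = 3,
) -> C:
    """Stage 1: rank by the cheap, biased simulator and keep the top `keep`.
    Stage 2: choose among the survivors by real-robot success rate only."""
    if not candidates or keep < 1:
        raise ValueError("need candidates and keep >= 1")
    shortlist = sorted(candidates, key=sim_score, reverse=True)[:keep]
    return max(shortlist, key=real_success_rate)


# Hypothetical scores: the simulator's favorite is not the real robot's.
sim = {"ckpt_a": 0.95, "ckpt_b": 0.90, "ckpt_c": 0.88, "ckpt_d": 0.40}
real = {"ckpt_a": 0.55, "ckpt_b": 0.80, "ckpt_c": 0.70, "ckpt_d": 0.85}
best = select_policy(list(sim), sim.__getitem__, real.__getitem__, keep=3)
print(best)
```

Hardware overrules sim’s top pick and chooses `ckpt_b`. Note what the bias costs, though: `ckpt_d`, the best checkpoint on the real robot, never reaches hardware because sim ranked it last.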

Iteration isn’t rework — it’s how the field breathes

If the destination is unknowable, the SOP can’t reach the function, and sim is biased, only one path remains: train → evaluate on the real robot → locate the failure → collect data targeting it → retrain. Go around once, then go around again.
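
The loop itself is only a few lines of control flow; all the difficulty lives inside the calls. A sketch under stated assumptions: `train`, `evaluate_on_robot`, and `collect_targeted` are hypothetical stand-ins for whole subsystems, “locate the failure” is reduced to thresholding per-scenario success rates, and the stopping rule (a target rate or a round budget) is our choice.

```python
from typing import Any, Callable


def iterate(
    dataset: list[Any],
    train: Callable[[list[Any]], Any],
    evaluate_on_robot: Callable[[Any], dict[str, float]],
    collect_targeted: Callable[[list[str]], list[Any]],
    target: float = 0.9,
    max_rounds: int = 10,
) -> tuple[Any, int]:
    """train -> evaluate on the real robot -> locate failures -> collect targeted
    data -> retrain. Returns (last policy, rounds used)."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be >= 1")
    for round_idx in range(1, max_rounds + 1):
        policy = train(dataset)
        success_by_scenario = evaluate_on_robot(policy)  # measured, not predicted
        failures = [s for s, rate in success_by_scenario.items() if rate < target]
        if not failures:
            return policy, round_idx
        # Failures are only known after this round's measurement; aim new data at them.
        dataset = dataset + collect_targeted(failures)
    return policy, max_rounds


# Toy stand-ins (hypothetical): a "policy" is just the set of scenarios it has data for.
SCENARIOS = ["drawer", "cloth", "glass"]


def toy_train(data):
    return set(data)


def toy_eval(policy):
    return {s: (0.95 if s in policy else 0.2) for s in SCENARIOS}


def toy_collect(failed):
    return list(failed)


policy, rounds = iterate(["drawer"], toy_train, toy_eval, toy_collect)
print(rounds, sorted(policy))
```

In the toy run, round one measures failures on cloth and glass, targeted collection adds data for them, and round two passes.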

This gets misread two ways, both wrong.

One misreading is “rework” — as if needing to retrain means you botched the last round. No. The first time you train, you have no idea where the model will fail, because failure modes are only exposed after you train and test. You can’t pre-collect problems you don’t yet know exist. Each iteration isn’t patching holes; it’s using the last round’s measured results to illuminate the next blind spot. This is convergence, not remediation.
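
The “locate the failure” step is where last round’s measurements choose the next blind spot. A minimal sketch, assuming each real-robot episode is logged as a (scenario tag, success) pair; splitting the next collection budget in proportion to observed failures is one simple heuristic, hypothetical rather than taken from the text.

```python
from collections import defaultdict


def plan_collection(episodes: list[tuple[str, bool]], budget: int) -> dict[str, int]:
    """Split `budget` new demonstrations across scenarios in proportion to the
    number of failed real-robot episodes observed in each."""
    failures: dict[str, int] = defaultdict(int)
    for scenario, success in episodes:
        if not success:
            failures[scenario] += 1
    total = sum(failures.values())
    if total == 0:
        return {}  # nothing failed: no measured blind spot to target yet
    plan = {s: budget * f // total for s, f in failures.items()}
    # Integer division drops a remainder; hand it to the worst scenarios first.
    leftover = budget - sum(plan.values())
    for s in sorted(failures, key=failures.get, reverse=True)[:leftover]:
        plan[s] += 1
    return plan


# Hypothetical eval log: (scenario tag, succeeded?) for 30 rollouts.
log = ([("drawer", True)] * 9 + [("drawer", False)]
       + [("cloth", True)] * 4 + [("cloth", False)] * 6
       + [("glass", True)] * 7 + [("glass", False)] * 3)
print(plan_collection(log, budget=50))
```

For the example log this plans 30 cloth, 15 glass, and 5 drawer demos. Weighting by failure rate instead of failure count, or by how much each scenario matters, are equally defensible choices.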

The other misreading cuts deeper: “great teams don’t need to iterate; needing iteration means you’re weak.”

That’s half true. Capability genuinely compresses iteration — a team with deep priors knows which architectures work, which ratios are good starting points, which hyperparameters not to bother trying, so it converges fast with few mistakes. A weak team might take ten turns; a strong team three. But no team takes zero turns.

Sliding from “iteration count can be compressed by capability” to “iteration can be eliminated by capability” is a quiet swap of one claim for another. It also hides an unfalsifiable trap: succeed with little data and that was “as expected”; need more iteration and “you’re weak.” An argument that’s right no matter the outcome isn’t deep; it’s empty, because it makes no checkable prediction.

The hard counter-evidence: the most advanced VLA and robot-learning teams run the largest, densest train-eval-iterate loops and ablation studies there are. They don’t run experiments because they’re weak; they run them because they actually understand how this field works. An ablation is a controlled experiment — the scientific method itself. Calling it “a sign you don’t understand the algorithm” is like saying every scientist who runs controlled experiments doesn’t understand their field.
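
An ablation is a controlled experiment: hold a baseline fixed, change one factor, measure the difference. A minimal sketch; the config keys (`data_mix`, `action_chunk`) and the `toy_run` scorer are hypothetical, and in reality each call to `run` is a full training job plus real-robot evaluation.

```python
from typing import Any, Callable


def one_factor_ablation(
    baseline: dict[str, Any],
    variants: dict[str, list[Any]],
    run: Callable[[dict[str, Any]], float],
) -> dict[tuple[str, Any], float]:
    """Change exactly one key at a time; report each variant's score minus baseline."""
    base_score = run(baseline)
    deltas: dict[tuple[str, Any], float] = {}
    for key, values in variants.items():
        for value in values:
            config = {**baseline, key: value}  # every other factor held fixed
            deltas[(key, value)] = run(config) - base_score
    return deltas


# Hypothetical baseline and variants.
baseline = {"data_mix": "single_site", "action_chunk": 8}
variants = {"data_mix": ["multi_site"], "action_chunk": [4, 16]}


def toy_run(cfg: dict[str, Any]) -> float:
    """Stand-in for train + real-robot eval; returns a made-up success rate."""
    score = 0.5
    if cfg["data_mix"] == "multi_site":
        score += 0.25
    if cfg["action_chunk"] == 16:
        score += 0.125
    return score


result = one_factor_ablation(baseline, variants, toy_run)
print(result)
```

One-factor-at-a-time is the simplest design and misses interactions between factors; factorial designs catch those at the price of more runs.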

The frontier runs on exactly this loop: RT-1 → RT-2 → Open X-Embodiment/RT-X at Google DeepMind, π0 → π0.5 → π0.7 at Physical Intelligence, Octo and OpenVLA from Berkeley/Stanford, Gemini Robotics, NVIDIA’s GR00T. None of them skipped the turns — they ran more of them, faster.

10 can be compressed to 3, but never to 1

So can a smarter algorithm push data efficiency to the limit — so little iteration you barely need any?

That’s a direction with taste, worth chasing forever. But state the boundary clearly.

Think of “wasted” exploration like this: a team that flails by intuition might need 10 units of cost before it finds the path. With solid engineering — closing the collect↔eval loop, making every round produce reusable conclusions, letting know-how truly accumulate — you can compress that 10 to 3, even 2. That’s the real value of engineering capability.

But you can’t compress it to 1. Not for lack of effort, but because that last stretch — the map from data to model performance — has no analytic solution under current science. It can only be measured. You can make measurement extremely fast, cheap, and disciplined, pushing toward the physical limit of the problem; but you cannot make the act of measuring itself disappear. The set of engineering methods that pushes exploration cost toward that physical limit is the answer — and its name is the wind tunnel.

(Aside: the “generalize from little data” abilities — few-shot in large models, fast adaptation of foundation models — are themselves products of first scaling on massive data, not shortcuts around scaling. Few-shot generality is a prize of scaling, not a substitute for it. You can’t skip the foundation and move straight into the penthouse.)

The real divide: collection and inference live in different teams

Here the biggest engineering problem in robot learning today surfaces — not an algorithm, but an org chart.

In many teams, data collection and model inference are split. One group collects data by SOP; another trains models and watches the results. Between them sit handoffs, processes, and separate KPIs. The collection team never sees what its data did inside the model, and the model team can’t quickly reach back upstream to change what gets collected.

But a wind tunnel is, in essence, a closed loop. Blow, measure, adjust the airfoil, blow again — it has to happen fast, in one circuit, by one pair of hands. If the person designing the airfoil and the person measuring lift belong to two departments and every handoff takes three days, the wind tunnel is dead — its entire value is in iteration speed, and the split kills speed outright.

So the fix isn’t only technical; it’s organizational design: use an engineering system to merge data collection and model inference into one team and one loop. Compress the latency from “we measured the model failing in scenario X” to “we collected data targeting scenario X” from weeks to days. That latency sets your iteration speed, and iteration speed is the next battlefield.
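
“Latency is iteration speed” is plain arithmetic: loop turns per period equal the period divided by one turn’s end-to-end latency, and every cross-team handoff adds to that latency. A back-of-envelope sketch; all stage durations are hypothetical, and the 3-day handoff mirrors the example above.

```python
def loop_latency_days(stages: dict[str, float], handoffs: int, handoff_days: float) -> float:
    """End-to-end days for one collect -> train -> real-eval -> diagnose turn."""
    return sum(stages.values()) + handoffs * handoff_days


def turns_per_quarter(latency_days: float, quarter_days: float = 90.0) -> int:
    """Whole loop turns that fit in a quarter."""
    if latency_days <= 0:
        raise ValueError("latency must be positive")
    return int(quarter_days // latency_days)


# Hypothetical stage durations in days.
stages = {"collect": 3, "train": 2, "real_eval": 1, "diagnose": 1}
split_teams = loop_latency_days(stages, handoffs=4, handoff_days=3)  # a handoff at every stage boundary
one_loop = loop_latency_days(stages, handoffs=0, handoff_days=3)     # one team, one loop
print(split_teams, turns_per_quarter(split_teams))  # 19 4
print(one_loop, turns_per_quarter(one_loop))        # 7 12
```

Same work per stage, three times the turns per quarter; the entire difference is handoff latency.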

The data flywheel’s escape velocity

So far the wind tunnel sounds like a treadmill: collect, measure, collect again, forever. But the loop has a threshold built into it — and crossing it changes everything.

Here’s the mechanic. As the cumulative pool of high-quality, generalizable data grows, two things happen at once. The policy generalizes to new tasks from less and less fresh data, and the collection step itself becomes more automatable, because a good-enough policy can drive its own collection rollouts, auto-label them, and pre-screen its next batch. At some scale the loop starts producing more usable data than each new task consumes. The flywheel no longer needs you to push it.
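
A toy model of the threshold, with every number invented for illustration: the fresh data a new task needs shrinks as the pool grows, while the usable data the loop gathers on its own grows and saturates. The escape point is where the second curve overtakes the first. Both curves and their constants are assumptions, not measurements from any real system.

```python
from typing import Optional


def data_needed_per_task(pool_hours: float) -> float:
    """Hypothetical: fresh hours a new task needs, shrinking as the pool grows."""
    return 500.0 / (1.0 + pool_hours / 10_000.0)


def autonomous_yield(pool_hours: float) -> float:
    """Hypothetical: usable hours per round the policy collects, labels and
    screens on its own; grows with the pool and saturates at 400."""
    return 400.0 * pool_hours / (pool_hours + 50_000.0)


def escape_pool_size(step: float = 1_000.0, limit: float = 1e7) -> Optional[float]:
    """Smallest pool size (on a `step` grid) where the loop yields at least as
    much usable data per round as one new task consumes; None if never."""
    pool = 0.0
    while pool <= limit:
        if autonomous_yield(pool) >= data_needed_per_task(pool):
            return pool
        pool += step
    return None


print(escape_pool_size())
```

With these made-up curves the crossing falls between 26,000 and 27,000 pool-hours, so the 1,000-hour grid reports 27,000. Change the constants and the crossing moves; what the toy illustrates is only the shape of the argument, a falling need meeting a rising supply.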

That crossing deserves a name. The first cosmic velocity — orbital velocity — is the speed at which a body stops falling back to Earth and starts holding its own orbit. The data loop has the same threshold: below it you burn human fuel just to stay aloft; above it the loop sustains its own orbit and the marginal human cost slides toward zero — toward zero-shot deployment on genuinely new tasks. I’ll call that line the data escape velocity (数据逃逸速度) — the robot-learning flywheel’s first cosmic velocity.

Past it, the cost curve keeps falling, and somewhere along the way it crosses a second, more worldly mark: the point where success rates are high enough and marginal cost low enough that deployment pays for itself. That’s the commercial-viability band, the first place real money is made, well before the asymptotic zero-shot dream.
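
The “pays for itself” mark can be written as a break-even condition. A sketch under a simple cost model that is our assumption, not the text’s: each attempt costs c, a success earns v, and a failure costs an extra f for cleanup or human intervention. Expected profit per attempt, p·v − c − (1 − p)·f, is non-negative exactly when p ≥ (c + f) / (v + f).

```python
def break_even_success_rate(value_per_success: float, cost_per_attempt: float,
                            cost_per_failure: float = 0.0) -> float:
    """Minimum success rate p with p*v - c - (1 - p)*f >= 0, i.e. p >= (c + f) / (v + f).
    A result above 1.0 means no success rate makes the deployment pay."""
    if value_per_success + cost_per_failure <= 0:
        raise ValueError("value_per_success + cost_per_failure must be positive")
    return (cost_per_attempt + cost_per_failure) / (value_per_success + cost_per_failure)


# Hypothetical unit economics, in arbitrary currency units per attempt.
print(break_even_success_rate(value_per_success=2.0, cost_per_attempt=0.5, cost_per_failure=1.0))  # 0.5
```

Cutting either cost term lowers the required success rate, so success rate and marginal cost trade off against each other on the way into the band.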

This isn’t pure theory anymore; the curve is being sighted. Generalist’s GEN-0 (Nov 2025), trained on ~270,000 hours of real manipulation and growing ten thousand hours a week, was the first clean demonstration that robotics has scaling laws — performance climbs predictably with data and compute. Five months later GEN-1 (Apr 2026) reported crossing into commercial viability, lifting average task success from ~64% to ~99%. Physical Intelligence frames the same dynamic as a data flywheel: more deployed robots → more diverse real experience → better generalization → less data per new home or task → more deployments → more feedback. π0.5 already cleans kitchens it has never seen. These are early arcs of one curve, bending toward escape velocity.
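
Success gains are easier to read as failure reductions: taken at face value, the GEN-1 figures quoted above (~64% → ~99% success) mean failures fell from ~36% to ~1%, roughly a 36× cut. The scaling-law claim itself is usually checked by fitting a power law on log-log axes. A minimal sketch, assuming the common form error ≈ a·D^(−b); that form is our assumption, since the text does not say what GEN-0 fits, and the data below is synthetic.

```python
import math


def fit_power_law(data_sizes: list[float], errors: list[float]) -> tuple[float, float]:
    """Least-squares fit of log(error) = log(a) - b * log(D); returns (a, b)."""
    if len(data_sizes) != len(errors) or len(data_sizes) < 2:
        raise ValueError("need at least two paired points")
    xs = [math.log(d) for d in data_sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("data sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope


# Synthetic sanity check: points generated from error = 3 * D**-0.3 are recovered.
sizes = [1e3, 1e4, 1e5, 1e6]
errors = [3 * d ** -0.3 for d in sizes]
a, b = fit_power_law(sizes, errors)
print(round(a, 6), round(b, 6))
```

A fitted (a, b) predicts that 10× more data multiplies error by 10^(−b); whether that extrapolation holds is, again, something only the next real measurement can confirm.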

One caution worth keeping: despite the name, the data escape velocity is the loop’s first cosmic velocity (orbital speed), not its second (escape velocity in the physical sense). Reaching orbit means the loop sustains itself, not that it has left the planet. Reality still gates every step; the wind tunnel never shuts off. What changes is who’s pushing.
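
For reference, the two physical thresholds the metaphor borrows, for a body at Earth’s surface radius (R ≈ 6.371 × 10^6 m, GM ≈ 3.986 × 10^14 m³/s², air drag ignored):

```latex
\begin{aligned}
v_1 &= \sqrt{\frac{GM}{R}} \approx 7.9\ \text{km/s} && \text{(first cosmic velocity: circular orbit)}\\
v_2 &= \sqrt{\frac{2GM}{R}} = \sqrt{2}\,v_1 \approx 11.2\ \text{km/s} && \text{(second cosmic velocity: escape)}
\end{aligned}
```

The data loop’s threshold is the analogue of v₁: fast enough to stay up, not fast enough to leave.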

Iteration speed is the battlefield

Data will keep growing, and so will its noise. Hoping to judge “which data is useful, which isn’t” by algorithmic intuition won’t work, because the necessary link from data to model lives on a manifold intuition can’t trace, and can only be measured by iterating.

As both data scale and noise rise, whoever can complete the collect → train → real-robot validate → targeted re-collect loop fastest crosses that starless sky fastest — and reaches escape velocity first. The competition in robot learning is shifting from “whose algorithm is cleverer” to “whose iteration loop is shorter.” The former still matters; the latter decides the endgame.

The wind tunnel can’t hand you a map with the destination written on it. What it hands you is an instrument that, in a place with no map, measures the direction one step at a time. In robot learning — a field with no north star — that is probably the closest thing to truth we get to have.

No north star, so we build the wind tunnel.

References

The frontier teams whose published work runs exactly this train → real-eval → iterate loop, at scale:

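RT-1 (Brohan et al.): Robotics Transformer for Real-World Control at Scale, arXiv:2212.06817 · RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, arXiv:2307.15818
Open X-Embodiment / RT-X (Open X-Embodiment Collaboration): Robotic Learning Datasets and RT-X Models, arXiv:2310.08864 · Octo (Octo Model Team): An Open-Source Generalist Robot Policy, arXiv:2405.12213 · OpenVLA (Kim et al.): An Open-Source Vision-Language-Action Model, arXiv:2406.09246
π0 (Physical Intelligence): A Vision-Language-Action Flow Model for General Robot Control, arXiv:2410.24164 · π0.5 (Physical Intelligence): A Vision-Language-Action Model with Open-World Generalization, arXiv:2504.16054 · π0.7 (Physical Intelligence): A Steerable Generalist Robotic Foundation Model, pi.website/blog/pi07, arXiv:2604.15483
Data flywheel and escape velocity. GEN-0 (Generalist AI): Embodied Foundation Models That Scale with Physical Interaction, generalistai.com/blog · GEN-1 (Generalist AI): Scaling Embodied Foundation Models to Mastery, Apr 2026 · π0.5 blog (Physical Intelligence): open-world generalization and the data flywheel, pi.website/blog/pi05
Gemini Robotics (Google DeepMind): Bringing AI into the Physical World, arXiv:2503.20020 · GR00T N1 (NVIDIA): An Open Foundation Model for Generalist Humanoid Robots, arXiv:2503.14734
ALOHA / ACT (Zhao et al.): Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware, arXiv:2304.13705 · Mobile ALOHA (Fu, Zhao, Finn): Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation, arXiv:2401.02117 · Diffusion Policy (Chi et al.): Visuomotor Policy Learning via Action Diffusion, arXiv:2303.04137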
