Cell Lab
A self-managing digital "service cell" under hidden disturbances: a UNI active-inference controller vs rule-based and random baselines, designed so it can be proven wrong.
A hidden 216-state world (health × demand × resource × dependency × config) is hit by disturbance families it never announces. Controllers see only four noisy telemetry modalities and must keep the cell viable. The UNI controller infers a belief q(s) and plans by expected free energy; every number below is labeled (VFE / EFE / surprisal / ambiguity / risk / RecoveryScore). This is a benchmark, and a UNI loss is reported as plainly as a win.
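To make the risk and ambiguity labels concrete, here is a minimal sketch of a one-step expected free energy for a toy discrete model. It uses the textbook decomposition (EFE = risk + ambiguity), which may differ from the UNI controller's own code. The likelihood `A`, predicted belief `qs`, and preference distribution `C` are hypothetical stand-ins, since the source does not show the 216-state model.

```python
import math

def expected_free_energy(qs, A, C):
    """One-step EFE = risk + ambiguity for a discrete model (illustrative only).

    qs: predicted state belief q(s | policy), a list of probabilities
    A:  likelihood p(o | s), indexed as A[o][s]
    C:  preferred observation distribution p(o), all entries > 0
    """
    n_obs, n_states = len(A), len(qs)
    # Predicted observations: q(o) = sum_s p(o | s) q(s)
    qo = [sum(A[o][s] * qs[s] for s in range(n_states)) for o in range(n_obs)]
    # Risk: KL divergence from predicted to preferred observations
    risk = sum(q * math.log(q / C[o]) for o, q in enumerate(qo) if q > 0)
    # Ambiguity: expected entropy of the likelihood under q(s)
    ambiguity = 0.0
    for s in range(n_states):
        h = sum(-A[o][s] * math.log(A[o][s]) for o in range(n_obs) if A[o][s] > 0)
        ambiguity += qs[s] * h
    return risk + ambiguity, risk, ambiguity

# Hypothetical two-state, two-observation example
A = [[0.9, 0.2],
     [0.1, 0.8]]
efe, risk, ambiguity = expected_free_energy([0.5, 0.5], A, [0.8, 0.2])
```

A policy with lower EFE is preferred: it is expected to land closer to preferred observations (low risk) through states whose telemetry is informative (low ambiguity).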
Marginal probability the agent assigns to each level of each hidden factor, from its 216-state belief. The agent never observes these directly — it infers them from telemetry.
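As a rough sketch of how per-factor marginals can be read off a flat joint belief: the factor names come from the text, but the level counts below are an assumption chosen only so that their product is 216. The source does not state the real sizes or the state ordering.

```python
from itertools import product

# ASSUMPTION: hypothetical level counts (3 * 3 * 3 * 2 * 4 = 216).
FACTORS = {"health": 3, "demand": 3, "resource": 3, "dependency": 2, "config": 4}

def marginals(belief, factors=FACTORS):
    """Sum a flat joint belief down to one distribution per hidden factor."""
    names = list(factors)
    out = {name: [0.0] * size for name, size in factors.items()}
    # product() walks joint states in row-major order (last factor varies fastest);
    # this ordering is also an assumption about how the belief is laid out.
    joint_states = product(*(range(size) for size in factors.values()))
    for p, levels in zip(belief, joint_states):
        for name, level in zip(names, levels):
            out[name][level] += p
    return out

uniform_belief = [1.0 / 216] * 216
uniform_marginals = marginals(uniform_belief)
```

With a uniform joint belief every marginal is uniform too; a belief concentrated on a single joint state yields a one-hot marginal for each factor.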
Disturbances are hidden from the agent — it must infer trouble from telemetry alone.
Depth-2 = 100 policies (live). Depth-3 = 1000 (heavier; offline bench uses depth-3).
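The policy counts are consistent with enumerating every action sequence over 10 actions (10² = 100, 10³ = 1000). The sketch below assumes that reading; the action count and the exhaustive enumeration are inferences, not something the source states.

```python
from itertools import product

N_ACTIONS = 10  # ASSUMPTION: inferred from 100 = 10**2 and 1000 = 10**3

def enumerate_policies(depth, n_actions=N_ACTIONS):
    """Return every action sequence of the given length as a tuple of action ids."""
    return list(product(range(n_actions), repeat=depth))

depth2 = enumerate_policies(2)
depth3 = enumerate_policies(3)
```

Each extra planning step multiplies the number of policies to score by the action count, which is why depth-3 is the heavier setting.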
Mulberry32 seed — any run reproduces exactly from its seed.
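For reference, a Python port of the widely circulated Mulberry32 generator is sketched below. How the app consumes the stream (which draws feed disturbances, noise, or the random baseline) is not shown in the source, so this only illustrates why a fixed seed gives a repeatable sequence.

```python
MASK32 = 0xFFFFFFFF

def mulberry32(seed):
    """Return a function yielding floats in [0, 1) from a 32-bit seed."""
    state = seed & MASK32

    def next_float():
        nonlocal state
        # Advance the 32-bit state by a fixed odd increment
        state = (state + 0x6D2B79F5) & MASK32
        # Two multiply-xorshift mixing rounds, truncated to 32 bits
        t = ((state ^ (state >> 15)) * (state | 1)) & MASK32
        t = (t ^ (t + (((t ^ (t >> 7)) * (t | 61)) & MASK32))) & MASK32
        return (t ^ (t >> 14)) / 4294967296

    return next_float

rng_a = mulberry32(12345)
rng_b = mulberry32(12345)
seq_a = [rng_a() for _ in range(5)]
seq_b = [rng_b() for _ in range(5)]  # identical to seq_a: same seed, same stream
```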
| Controller | RecoveryScore |
|---|---|
| Run a comparison to populate. | |
The downloaded run carries the seed + per-tick log + scores so anyone can reproduce or refute it.
- If a baseline (rule-based or random) matches or beats UNI on RecoveryScore across seeds, the "active inference helps here" claim is falsified for this world.
- If the UNI controller's belief is uninformative (high entropy throughout), its planning signal (EFE) is noise.
- If results don't reproduce from the seed, the benchmark is broken.
- If a Rao-style structural controller wins, that is a real result — shown, not hidden.
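The "uninformative belief" condition above can be checked with a normalized Shannon entropy, sketched here. The 0.9 threshold and the helper names are hypothetical; the source does not say how "high entropy throughout" is quantified.

```python
import math

def normalized_entropy(belief):
    """Shannon entropy of a belief divided by its maximum, log(len(belief))."""
    h = sum(-p * math.log(p) for p in belief if p > 0)
    return h / math.log(len(belief))

def belief_is_uninformative(belief_history, threshold=0.9):
    """True if the belief stays near-uniform at every tick (hypothetical threshold)."""
    return all(normalized_entropy(b) >= threshold for b in belief_history)

uniform = [1.0 / 216] * 216
certain = [1.0] + [0.0] * 215
```

A value near 1 means the belief is close to uniform over the 216 states; a value near 0 means it is concentrated on a few states.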
Upload a knowledge structure (knowledge_nodes + relations) — the operational conditions a self-managing cell recognizes from telemetry, and the remediations between them — then run it head-to-head against UNI on shared seeds. The Rao-native controller acts directly from that graph; it is our own structural controller (Class C — code-indicated), framed after Mikkilineni (2022), Information 13(1):24, and it does not reproduce Rao's verified method. The same structure is also compiled into UNI priors (the UNI-translated controller). Results are reported honestly: a Rao-native win is shown as plainly as a UNI win — and so is a rule-based or random win.
database_flaky is the committed case where a heuristic already beats UNI — a fair place to try to prove UNI wrong.
| Controller | median RecoveryScore (6 seeds) | outcome |
|---|---|---|
| Load the example (or upload your own structure), then run the head-to-head. Contenders: Rao-native, UNI-translated, UNI (no overlay), rule-based, random. | ||
Higher RecoveryScore is better. A difference counts as significant only when the bootstrap 95% CI on the median paired difference excludes 0; it is never judged from a single seed. The whole run is reproducible from its seeds.
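A minimal sketch of a percentile bootstrap for the median paired difference follows. The resample count, the percentile method, and the example scores are assumptions for illustration; the benchmark's exact procedure is not given in the source.

```python
import random
from statistics import median

def paired_differences(scores_a, scores_b):
    """Per-seed score differences; both lists must use the same seeds in order."""
    return [a - b for a, b in zip(scores_a, scores_b)]

def bootstrap_median_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the median of paired differences."""
    rng = random.Random(seed)  # fixed seed so the interval itself is reproducible
    n = len(diffs)
    # Resample the differences with replacement and record each resample's median
    medians = sorted(median(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lo = medians[int((alpha / 2) * n_boot)]
    hi = medians[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def is_significant(diffs, **kwargs):
    """True when the CI lies entirely on one side of 0."""
    lo, hi = bootstrap_median_ci(diffs, **kwargs)
    return lo > 0 or hi < 0

# Hypothetical RecoveryScores for two controllers on six shared seeds
controller_a = [0.71, 0.65, 0.80, 0.74, 0.69, 0.77]
controller_b = [0.60, 0.58, 0.66, 0.61, 0.63, 0.59]
diffs = paired_differences(controller_a, controller_b)
```

Pairing by seed removes seed-to-seed variation from the comparison, so the interval reflects the controller difference and not how hard a given seed happened to be.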
| Disturbance family | Ranked by RecoveryScore (controller: score) | UNI vs neural (median Δ, 95% CI) |
|---|---|---|
| Loading… | ||
Ranked from the committed cell-bench-cache.json (no in-browser recompute; seeds shown above). "sig" = the bootstrap 95% CI for the median paired difference excludes 0. A UNI loss (neural or rule-based ahead) is shown as plainly as a win.