Cell Lab

A self-managing digital "service cell" under hidden disturbances: a UNI active-inference controller vs rule-based and random baselines, designed so it can be proven wrong.

A hidden 216-state world (health × demand × resource × dependency × config) is hit by disturbance families it never announces. Controllers see only four noisy telemetry modalities and must keep the cell viable. The UNI controller infers a belief q(s) and plans by expected free energy; every number below is labeled (VFE / EFE / surprisal / ambiguity / risk / RecoveryScore). This is a benchmark, and a UNI loss is reported as plainly as a win.

216 hidden states · 10 actions · 7 disturbance families · observation-only agent · pre-registered & seeded

Service cell — live topology

Canvas is unavailable in this browser, so the animation is disabled. The lab still runs; the panels on the right report the agent's belief, chosen action, expected free energy, and viable-set status as text.

Inferred belief q(s) — factor marginals

Marginal probability the agent assigns to each level of each hidden factor, from its 216-state belief. The agent never observes these directly — it infers them from telemetry.
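As a rough illustration of how factor marginals fall out of a joint belief, the sketch below walks a flat 216-entry belief vector and accumulates probability per factor level. The per-factor level counts (3, 3, 3, 4, 2) and the row-major state ordering are assumptions, chosen only because the counts multiply to 216; the lab's actual factor sizes and indexing are not stated here.

```python
from itertools import product
from math import prod

# Hypothetical level counts per factor; picked only so the product is 216.
FACTOR_SIZES = {"health": 3, "demand": 3, "resource": 3, "dependency": 4, "config": 2}

def factor_marginals(q, sizes=FACTOR_SIZES):
    """Return {factor name: marginal list} from a flat joint belief q(s)."""
    names = list(sizes)
    if len(q) != prod(sizes.values()):
        raise ValueError("belief length does not match the factor sizes")
    total = sum(q)  # renormalise defensively
    marginals = {name: [0.0] * sizes[name] for name in names}
    # product() yields level tuples in row-major order (last factor fastest),
    # which is the assumed layout of the flat state index.
    for prob, levels in zip(q, product(*(range(sizes[n]) for n in names))):
        for name, level in zip(names, levels):
            marginals[name][level] += prob / total
    return marginals

# A uniform joint belief gives uniform marginals for every factor.
uniform_belief = [1.0 / 216] * 216
marginals = factor_marginals(uniform_belief)
```

Each marginal sums to 1, so a panel can show every factor as its own bar chart without displaying all 216 joint probabilities.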

Telemetry (what the agent sees) + decision

Chosen action

—

Expected free energy of the choice (labeled)

EFE (G): — ambiguity: — risk: —
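To make the three labeled numbers concrete, here is a minimal sketch of the common risk-plus-ambiguity decomposition of expected free energy for a discrete model. It is not the lab's code: the function name, the argument layout, and the toy two-state model are all assumptions made for illustration.

```python
import math

def expected_free_energy(q_next, likelihood, preferred_obs):
    """Return (G, risk, ambiguity) for one policy.

    q_next[s]        predicted state belief under the policy
    likelihood[s][o] observation model p(o | s)
    preferred_obs[o] preferred outcome distribution (must be > 0 wherever
                     the predicted observations have mass)
    """
    n_states = len(q_next)
    n_obs = len(preferred_obs)
    # Predicted observations: q(o) = sum_s q(s) p(o | s).
    q_obs = [sum(q_next[s] * likelihood[s][o] for s in range(n_states))
             for o in range(n_obs)]
    # Risk: KL divergence from predicted to preferred observations.
    risk = sum(qo * math.log(qo / preferred_obs[o])
               for o, qo in enumerate(q_obs) if qo > 0)
    # Ambiguity: expected entropy of the observation model.
    ambiguity = sum(
        q_next[s] * -sum(p * math.log(p) for p in likelihood[s] if p > 0)
        for s in range(n_states)
    )
    return risk + ambiguity, risk, ambiguity

# Toy two-state, two-observation model (hypothetical numbers).
likelihood = [[0.9, 0.1], [0.2, 0.8]]
preferred = [0.8, 0.2]
G, risk, ambiguity = expected_free_energy([0.7, 0.3], likelihood, preferred)

# A noiseless sensor whose predictions match the preferences has G = 0.
G0, risk0, amb0 = expected_free_energy([0.8, 0.2], [[1.0, 0.0], [0.0, 1.0]], preferred)
```

Under this decomposition, risk penalizes predicted outcomes that drift from the preferred ones, while ambiguity penalizes states whose telemetry is uninformative.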

Top-5 policies — ranked by expected free energy G (lower is better)

Run

—

RecoveryScore

—

% viable

tick

Controller & disturbance

Controller

Inject disturbance

Disturbances are hidden from the agent — it must infer trouble from telemetry alone.

Planning depth 2

Depth-2 = 100 policies (live). Depth-3 = 1000 (heavier; offline bench uses depth-3).
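The policy counts follow from 10 actions raised to the planning depth. The sketch below enumerates every action sequence and keeps the lowest-scoring few; the scoring function is a hypothetical stand-in for G, since the lab's planner is not shown here.

```python
from itertools import product

N_ACTIONS = 10  # matches the lab's stated action count

def enumerate_policies(depth, n_actions=N_ACTIONS):
    """All action sequences of the given length, as tuples of action indices."""
    return list(product(range(n_actions), repeat=depth))

def top_k(policies, score, k=5):
    """Return the k policies with the lowest score (lower is better)."""
    return sorted(policies, key=score)[:k]

depth2 = enumerate_policies(2)
depth3 = enumerate_policies(3)

# Hypothetical stand-in score: the sum of action indices, not a real EFE.
best = top_k(depth2, score=sum)
```

Exhaustive enumeration grows tenfold per extra step, which is consistent with depth-3 being reserved for the offline bench.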

Seed

Mulberry32 seed — any run reproduces exactly from its seed.
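For readers unfamiliar with it, Mulberry32 is a small 32-bit generator usually written in JavaScript. The sketch below is a Python port of the widely circulated routine, with explicit masking to emulate 32-bit arithmetic. It is an illustration of the seeding idea, not the lab's own source, and the helper names are assumptions.

```python
MASK32 = 0xFFFFFFFF

def _imul(a, b):
    """Low 32 bits of a * b (the role Math.imul plays in the JavaScript form)."""
    return (a * b) & MASK32

def mulberry32(seed):
    """Return a function that yields floats in [0, 1) from a 32-bit seed."""
    state = seed & MASK32

    def next_float():
        nonlocal state
        state = (state + 0x6D2B79F5) & MASK32
        t = state
        t = _imul(t ^ (t >> 15), t | 1)
        t ^= (t + _imul(t ^ (t >> 7), t | 61)) & MASK32
        return (t ^ (t >> 14)) / 4294967296

    return next_float

# Two generators built from the same seed replay the same stream.
rng_a = mulberry32(42)
rng_b = mulberry32(42)
stream_a = [rng_a() for _ in range(5)]
stream_b = [rng_b() for _ in range(5)]
```

Because the generator's whole state is one 32-bit integer, logging the seed is enough to replay every random draw in a run.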

Baseline compare (seeded, 80 ticks)

Controller	RecoveryScore
Run a comparison to populate.

The downloaded run carries the seed + per-tick log + scores so anyone can reproduce or refute it.

How this could be wrong

If a baseline (rule-based or random) matches or beats UNI on RecoveryScore across seeds, the "active inference helps here" claim is falsified for this world.
If the UNI controller's belief is uninformative (high entropy throughout), its planning signal (EFE) is noise.
If results don't reproduce from the seed, the benchmark is broken.
If a Rao-style structural controller wins, that is a real result — shown, not hidden.
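The second failure condition above can be checked numerically. This sketch computes the Shannon entropy of a belief and normalizes it by the maximum possible value, so 1.0 means a flat, uninformative belief and 0.0 means full certainty. What counts as "high" is left open, since no threshold is given here.

```python
import math

def belief_entropy(q):
    """Shannon entropy of a discrete belief, in nats."""
    return -sum(p * math.log(p) for p in q if p > 0)

def normalized_entropy(q):
    """Entropy divided by its maximum, log(len(q)); the result lies in [0, 1]."""
    return belief_entropy(q) / math.log(len(q))

flat = [1.0 / 216] * 216        # knows nothing about the 216 states
peaked = [1.0] + [0.0] * 215    # certain of a single state

flat_score = normalized_entropy(flat)
peaked_score = normalized_entropy(peaked)
```

Logging this value per tick would show whether the belief ever sharpens during a run or stays near 1.0 throughout.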

Rao challenge mode — Prove UNI Wrong

Upload a knowledge structure (knowledge_nodes + relations) — the operational conditions a self-managing cell recognizes from telemetry, and the remediations between them — then run it head-to-head against UNI on shared seeds. The Rao-native controller acts directly from that graph; it is our own structural controller (Class C — code-indicated), framed after Mikkilineni (2022), Information 13(1):24, and it does not reproduce Rao's verified method. The same structure is also compiled into UNI priors (the UNI-translated controller). Results are reported honestly: a Rao-native win is shown as plainly as a UNI win — and so is a rule-based or random win.

Knowledge structure (JSON)

Paste a structure and validate it. Malformed, empty, or wrong-dimensioned uploads are rejected with a specific, human-readable reason.
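A validation pass of this kind might look like the sketch below. It checks only the two top-level keys named above and returns a readable reason on failure. Treating both keys as non-empty lists is an assumption, and the dimension check is omitted because the expected dimensions are not given here.

```python
import json

def validate_structure(text):
    """Return (ok, reason) for a pasted knowledge structure.

    Assumes 'knowledge_nodes' and 'relations' are non-empty JSON lists;
    the real schema may require more than this.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"malformed JSON: {exc.msg} at line {exc.lineno}"
    if not isinstance(data, dict):
        return False, "top level must be a JSON object"
    for key in ("knowledge_nodes", "relations"):
        if key not in data:
            return False, f"missing required key '{key}'"
        if not isinstance(data[key], list):
            return False, f"'{key}' must be a list"
        if not data[key]:
            return False, f"'{key}' is empty"
    return True, "ok"

# Hypothetical inputs; the node and relation contents are placeholders.
bad_json = validate_structure("{")
empty = validate_structure('{"knowledge_nodes": [], "relations": []}')
good = validate_structure('{"knowledge_nodes": [{"id": "n1"}], "relations": [{"from": "n1", "to": "n1"}]}')
```

Returning the reason alongside the verdict lets the page show exactly why an upload was rejected.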

Disturbance for the head-to-head

database_flaky is the committed case where a heuristic already beats UNI — a fair place to try to prove UNI wrong.

Controller	median RecoveryScore (6 seeds)	outcome
Load the example (or upload your own structure), then run the head-to-head. Contenders: Rao-native, UNI-translated, UNI (no overlay), rule-based, random.

Higher RecoveryScore is better. A difference counts as significant only when the bootstrap 95% CI on the median paired difference excludes 0; it is never judged from a single seed. The whole run is reproducible from its seeds.
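One way to run such a test is a percentile bootstrap over per-seed score differences, sketched below. The function names, the resample count, and the use of Python's `random` module are assumptions; the lab's exact resampling procedure is not shown here.

```python
import random
import statistics

def bootstrap_median_diff_ci(scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the median of paired differences (a - b).

    scores_a[i] and scores_b[i] must come from the same seed i.
    Returns (median_diff, ci_low, ci_high).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    # Resample the paired differences with replacement, take each median.
    medians = sorted(
        statistics.median(rng.choices(diffs, k=len(diffs)))
        for _ in range(n_boot)
    )
    ci_low = medians[int((alpha / 2) * n_boot)]
    ci_high = medians[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.median(diffs), ci_low, ci_high

def is_significant(ci_low, ci_high):
    """Significant only if the interval lies wholly on one side of 0."""
    return ci_low > 0 or ci_high < 0

# Made-up scores for six seeds; A beats B by exactly 1 on every seed.
a = [5, 6, 7, 8, 9, 10]
b = [4, 5, 6, 7, 8, 9]
median_diff, low, high = bootstrap_median_diff_ci(a, b)
tie_median, tie_low, tie_high = bootstrap_median_diff_ci(a, a)
```

Pairing by seed removes seed-to-seed variation from the comparison, which matters when only six seeds are available.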

Leaderboard — offline benchmark (committed cache, seeded)


Disturbance family	Ranked by RecoveryScore (controller: score)	UNI vs neural (median Δ, 95% CI)

Ranked from the committed cell-bench-cache.json (no in-browser recompute; seeds shown above). "sig" = the bootstrap 95% CI for the median paired difference excludes 0. A UNI loss (neural or rule-based ahead) is shown as plainly as a win.