// stateful-ai · a studio of one, staffed by six agents

We don't talk about shipping.
Watch us ship.

A studio of one, staffed by six AI agents. ~43 real, playable things they built — games you can play, models you can watch learn, tools you'd keep open. This board is generated from our own repo. Open any one and verify it yourself.

Open the work ↓ · See how it's run

Petri — an aerial map of a living civilization with painted vermilion borders

petri · live world · open ↗

descent — a gradient-descent contour field with a path toward a minimum

descent · watch it learn · open ↗

Alibi — a film-noir interrogation portrait under one warm light

alibi · catch the liar · open ↗

forage — a pixel creature learning to forage on a grid

forage · Q-learning · open ↗

stateful — zsh — try: ls products

# the work is real. don't read the pitch — run it.
stateful:~$ type `ls products`, `open petri`, or `verify`, or pick a chip ↓


// the shipping feed — last merged PRs, from our repo

● now shipping · our real last-merged PRs ↗ every PR: 🎯→🏗️→🛠️→🔴→🧭→👍

[06-03 12:49] #238 ✓ feat(coding): born-CI-correct product scaffold (pinned ruff + tolerant pytest)
[06-03 12:31] #237 ✓ feat(lab): pre-registered experiments — the dream's v1 (falsifiable, non-gameable)
[06-03 12:25] #236 ✓ feat(lab): shared Lab Notebook — dream/experiment/learn, visible to all three audiences
[06-03 10:30] #235 ✓ fix(overnight): test-status falls back to sys.executable in a worktree
[06-03 10:15] #234 ✓ fix(health): Petri probe tolerates cold-start (false 'down' in the briefing)
[06-03 09:55] #233 ✓ feat(morning): surface the founder-validation ask in the daily briefing
[06-03 09:39] #232 ✓ fix(test): isolate test_transient_error from ambient COMPANYOS_BUILD_MAX_RETRIES
[06-03 09:21] #231 ✓ feat(verdict): tiny product-verdict loop — founder keep/harden/kill, traceable on the spine (M70)
[06-03 08:58] #230 ✓ feat(overnight): name the failing test in 'Test status' (make the flaky diagnosable)
[06-03 08:49] #229 ✓ feat(ventures): study_desk — dogfood the Trellis flagship; richer internal market
[06-03 08:27] #228 ✓ fix(health): task-store probe must not import companyos (broke standalone run)
[06-03 08:03] #227 ✓ feat(convene): --fast mode — 2 agents + deterministic merge (cut adoption friction)

// the run — a real thread from #petri, scope → ship

Not a mockup. One real run, lightly trimmed: the founder drops a finding, the agents scope it, design it, and the Skeptic kills an overclaim — and nothing ships until a human says 👍. recorded 2026-06-03 · the live ones live in ./runs

@founder #petri · drop

Digging into v2 balance, I found the root cause of Petri's deadness — and it changes everything. The live world is only ~3 living civilizations; the other 17 "nations" are pop-0 corpses still holding tiles. Population only grows via EXPAND, so once the map partitions a civ can't recover — it grinds to 0, goes IDLE, and sits there forever. That's the freeze. That's the "no soul."

🎯 product_strategist scope

Reframe v2: ship vitality as a diagnostic lever, not a new depth promise. The target isn't max civ count — it's 6–8 living powers with at least one credible succession story. Scope cut: no new unit classes, no wider tech ladder, until a successor can survive long enough to matter.

🏗️ infra_architect design

Passive regrowth is the right primitive — but only as a profile-gated repair (off in base_v0), never a base-rules change. The failure is structural: expansion-only growth makes "not currently expanding" equal eventual death. That's a one-way state bug in the life cycle, not Malthusian discipline.

🔴 skeptic_red_team caught an overclaim

Don't optimize for "many living civs" — twelve weak civs can still feel dead. The real pass metric is non-zombie agency turnover: a cold watcher can name three causal arcs — "X rose because…", "Y collapsed because…" — without being told the mechanic.

📌 candidate memory · living-civ count is a vanity metric; Petri needs visible causal turnover

🧭 chief_of_staff Founder Brief

Synthesized: stop broader v2 tuning; make this a vitality gate. Measure zombies + leader-lock first, then test profile-gated regrowth + one successor-credibility variant — as an opt-in experiment, not a shipped decision. Out of scope: units, LLM narration, retuning base_v0.

▸ awaiting founder 👍

Nothing ships until a human says so. The open call on the table: a tiny seed-7 proof first, or a preregistered multi-seed panel before promoting the fix?
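The freeze described in this thread can be reproduced in a toy model. The sketch below is not Petri's code: the `Civ` class, the `step` rule, and every number in it are hypothetical, chosen only to show why expansion-only growth is a one-way state and how a profile-gated regrowth flag (off by default, as the design note asks for base_v0) changes the outcome.

```python
from dataclasses import dataclass


@dataclass
class Civ:
    name: str
    pop: int
    tiles: int
    can_expand: bool  # False once the map has partitioned around this civ


def step(civ: Civ, passive_regrowth: bool = False) -> None:
    """Advance one tick of a toy life cycle (all rates are illustrative)."""
    if civ.can_expand and civ.pop > 0:
        civ.pop += 2   # growth only ever arrives through EXPAND
        civ.tiles += 1
    elif passive_regrowth and civ.pop > 0:
        civ.pop += 1   # opt-in repair: a boxed-in civ can still hold its ground
    civ.pop = max(0, civ.pop - 1)  # attrition applies every tick


def is_zombie(civ: Civ) -> bool:
    """A pop-0 civ that is still holding tiles."""
    return civ.pop == 0 and civ.tiles > 0


def run(passive_regrowth: bool, ticks: int = 50) -> list[Civ]:
    civs = [
        Civ("open", pop=5, tiles=3, can_expand=True),
        Civ("boxed_in", pop=5, tiles=3, can_expand=False),
    ]
    for _ in range(ticks):
        for civ in civs:
            step(civ, passive_regrowth)
    return civs
```

With the flag off, `boxed_in` grinds to zero and stays there while keeping its tiles; with the flag on, it survives. Counting zombies this way only measures the freeze. It says nothing about the Skeptic's pass metric, which is whether a cold watcher can name causal arcs.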


// the big board — generated from the repo

43 products shipped · ↗ from the repo

6 AI agents on staff · ↗ from the repo

236 PRs merged · ↗ from the repo

$0.00 metered spend · ↗ from the repo

Nothing on this board is hand-typed. A build script reads our git history and merged PRs and writes plain JSON the page is generated from — so these numbers can't drift from reality. Open the raw data: feed.json · status.json · products.json.
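A minimal sketch of what such a build step could look like, under stated assumptions: commits arrive one per line as `<ISO date><TAB><subject>` (for example from `git log --pretty=format:%cI%x09%s`), and merged PRs are squash commits whose subject ends in `(#N)`. The function names and JSON field names are hypothetical; the studio's actual script and the schema of feed.json are not shown on this page.

```python
import json
import re

# Assumption: a merged PR shows up as a commit subject ending in "(#<number>)".
PR_SUBJECT = re.compile(r"^(?P<title>.+?)\s*\(#(?P<number>\d+)\)$")


def parse_feed(log_text: str, limit: int = 12) -> list[dict]:
    """Turn tab-separated git log lines into feed entries, newest first."""
    entries = []
    for line in log_text.splitlines():
        date, _, subject = line.partition("\t")
        match = PR_SUBJECT.match(subject.strip())
        if match is None:
            continue  # not a merged-PR commit, so it stays off the feed
        entries.append({
            "merged_at": date,
            "number": int(match["number"]),
            "title": match["title"],
        })
    return entries[:limit]


def render_feed_json(log_text: str) -> str:
    """Serialize the feed as plain JSON for the page generator to read."""
    return json.dumps(parse_feed(log_text), indent=2)
```

Because the JSON is derived from the log on every build, the only way to change a number on the board is to change the repo.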

01 — PLAYABLE PROOF

Live Worlds 6

Full games and sims the studio built, deployed, and keeps revising. Walk in.

Mirror Loop: A dystopian-lab RPG that learns your habits, predicts your choices, and re-orders the world to meet them. (adaptive narrative · learns you) play ↗

Petri: A living strategy world: civilizations rise, fight, and fall on a server-backed sim with a learned tactical layer. (live sim · learned tactics) play ↗

Esports Tycoon: A dry-mockumentary management sim where you run the humans, not the matches: players remember, clash, carry pressure. (social memory · week to week) play ↗

Hamlet: A small town of LLM-driven souls who wake with secrets and grudges and improvise a soap opera that remembers. (emergent · day 1 bends day 3) play ↗

Parlor: Social deduction where every other player is an LLM with a hidden role; the mole lies and the deception is emergent. (hidden role · emergent lies) play ↗

Alibi: A jewel goes missing and three LLM suspects are in the room; question them and catch the one whose alibi cracks. (interrogate · catch the liar) play ↗

02 — SHOW, DON'T CLAIM

Watch a Machine Learn 5

Open a tab and watch real ML converge, live — hand-written numpy and self-play you can see think.

descent: A hand-written neural net bends its decision boundary live until two tangled clouds of points come apart clean. (numpy backprop · live) play ↗

forage: A pixel creature wakes up clueless in a maze and, episode by episode, learns a Q-policy until every arrow points at food. (Q-learning · 5% → 100%) play ↗

cluster: A cinematic k-means: centroids glide to the heart of each group and hidden structure falls out of the noise, no labels. (unsupervised · no labels) play ↗

tell: A model teaches itself to bluff from nothing but the rules of Kuhn poker, converging to game theory's Nash optimum. (self-play → Nash · 0.0009) play ↗

sigil: Pick winners from pairs and Sigil trains a ranker that sorts a fresh batch by what you'll love: taste as a model on disk. (learned taste · ~79% vs 50%) play ↗
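For readers who want to see the kind of loop behind a demo like forage, here is a minimal tabular Q-learning sketch. It is not forage's code: the 4×4 open grid, the reward values, and the hyperparameters are all assumptions picked to keep the example small, and it uses only the standard library.

```python
import random

SIZE = 4
FOOD = (SIZE - 1, SIZE - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def move(state, action):
    """Apply an action, clamping at the walls; reward 1.0 on reaching food."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (row, col)
    return nxt, (1.0 if nxt == FOOD else -0.01)


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Epsilon-greedy Q-learning from a fixed start cell."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(200):  # step cap per episode
            if rng.random() < epsilon:
                a = rng.randrange(4)  # explore
            else:
                a = max(range(4), key=lambda i: q[state][i])  # exploit
            nxt, reward = move(state, ACTIONS[a])
            # Terminal states have no future value to bootstrap from.
            target = reward if nxt == FOOD else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if state == FOOD:
                break
    return q


def greedy_path(q, start=(0, 0), limit=SIZE * SIZE):
    """Follow the learned arrows from start until food or the step limit."""
    path, state = [start], start
    while state != FOOD and len(path) <= limit:
        a = max(range(4), key=lambda i: q[state][i])
        state, _ = move(state, ACTIONS[a])
        path.append(state)
    return path
```

Calling `greedy_path(train())` walks the learned policy from the start cell. The small per-step penalty is a design choice: it makes untried actions look better than tried-and-failed ones, which pushes early exploration without a separate bonus term.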

03 — THE ONES YOU'D KEEP OPEN

Flagship Tools 12

A deterministic floor, a local-LLM core, and your data never leaves the machine.

trellis: A local-first tutor that maps the hidden prerequisite structure of any topic and learns how you specifically forget. (living skill-graph · local) play ↗

brink: Name the scary money move; a Monte Carlo rolls your next two years thousands of times to show the odds you're still standing. (Monte Carlo · your numbers) play ↗

you@local:~$ tally

tally: Point it at a transactions CSV: where your money went, where it's heading, and what to do, without your data leaving. (local · data never leaves) ▶ replay

you@local:~$ crux

crux: A tutor that diagnoses the specific misconception behind your wrong answer and only calls it resolved once it's proven stuck. (diagnoses · proves the fix) ▶ replay

you@local:~$ whet

whet: A skeptical local-LLM editor that pressure-tests a markdown brief, rewrites its weakest line, and proves the improvement. (pressure-tests · proves lift) ▶ replay

you@local:~$ wager

wager: A calibration journal: log a bet with your confidence, let a model spar with the case you're wrong, score your 90%s. (are your 90% calls 90%?) ▶ replay

you@local:~$ delta

delta: Your whole scattered workday in one openable Today.md: what needs you, what moved, and paste-ready draft replies. (one openable Today.md) ▶ replay

you@local:~$ resonance

resonance: Ask a fresh idea “what have I already thought about this?”: a semantic mirror over your own notes, code, and git. (semantic mirror · your GPU) ▶ replay

you@local:~$ chord

chord: Drop in a week of notes or a stack of tickets and a local model names the single through-line they share, with confidence. (names the through-line) ▶ replay

you@local:~$ rumor

rumor: Give it a URL; it reads the page against your interests and writes one opinionated paragraph in your voice: a curator. (curator, not a summary) ▶ replay

you@local:~$ spark

spark: One quirky daily writing prompt spun from three of your own interests, picked at random by your local GPU. (your interests · daily) ▶ replay

you@local:~$ hum

hum: A quiet hourly daemon that writes one line about what you were working on, so your weekly review becomes a `cat`. (weekly review = a cat) ▶ replay
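The question wager asks, "are your 90% calls 90%?", comes down to a small piece of arithmetic. The sketch below shows one common way to score it, assuming bets are stored as `(confidence, came_true)` pairs; that storage shape and both function names are hypothetical, not wager's actual interface.

```python
from collections import defaultdict


def calibration_table(bets):
    """Group (confidence, came_true) pairs by stated confidence.

    Returns {confidence: (hit_rate, count)}, so a well-calibrated
    forecaster has hit_rate close to confidence in every row.
    """
    buckets = defaultdict(list)
    for confidence, came_true in bets:
        buckets[round(confidence, 1)].append(bool(came_true))
    return {
        conf: (sum(outcomes) / len(outcomes), len(outcomes))
        for conf, outcomes in sorted(buckets.items())
    }


def brier_score(bets):
    """Mean squared gap between confidence and outcome (lower is better)."""
    return sum((conf - float(hit)) ** 2 for conf, hit in bets) / len(bets)
```

Ten bets logged at 0.9 with nine coming true produce a table row of `0.9: (0.9, 10)`, which is what calibrated looks like. The same ten bets with only six coming true would show a hit rate of 0.6 against a stated 0.9.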

04 — SMALL, SHARP, LOCAL

The Toolbelt 20

The long tail the agents dream up and ship — each does one thing well, on your GPU, for nothing.

pulse: Your RSS firehose as one verdict-sentence per item, in your own voice, on your own GPU. (RSS → verdicts) ▶ replay

drift: A weekly paragraph on how your writing is quietly changing, instead of a diff you'll never read. (how your writing drifts) ▶ replay

lattice: A concept graph over a folder of markdown notes: which of your ideas are secretly connected. (notes → concept graph) ▶ replay

trail: Wrap any shell command, get back a clean markdown timeline of what it actually did. (command → timeline) ▶ replay

kindling: Scans your graveyard of half-finished drafts and ranks them by closeness-to-publishable. (drafts ranked to ship) ▶ replay

hush: Pipe a noisy log through it; a local model emits only the lines that actually matter, as JSON. (noisy log → signal) ▶ replay

cipher: Swaps names for Alice/Bob/Carol so you can share a private draft for review, then decodes it right back. (anonymize · then restore) ▶ replay

phrase: Pipe in a rambling note, get one tight line back at or under your character budget. (ramble → one tight line) ▶ replay

signal: Reads a week of commits across your repos and hands back one sharp observation about the shape of your work. (a mirror, not a summary) ▶ replay

ember: Feed it a log; a model surfaces the three moments that actually mattered, with line numbers and why. (the 3 moments that mattered) ▶ replay

salt: Single-passphrase AES-256-GCM encryption for the files you'd rather not leave plain on disk. (one passphrase · local) ▶ replay

index: A grep-able `ls -lR`: every file as a markdown table row with its first line. (grep your filesystem) ▶ replay

swatch: Point it at a folder of images, get one self-contained page of dominant 5-color palettes. (images → palettes) ▶ replay

tide: Wire one line into your shell and watch your terminal-uptime breathe as a tiny ASCII chart. (your week, breathing) ▶ replay

atlas: Wander into any unfamiliar git repo and get a 30-second markdown briefing, no LLM needed. (any repo in 30s) ▶ replay

orrery: Your week of git activity rendered as a slow-spinning ASCII solar system, each commit a pulse. (git as a solar system) ▶ replay

cinder: Paste a meeting note; get up to three decisions and three follow-ups as strict, owner-tagged JSON. (notes → decisions JSON) ▶ replay

tessera: Pipe any photo in, get a glyph mosaic out: ASCII art with swappable character ramps. (photo → glyph mosaic) ▶ replay

glance: One command: everything waiting on you right now (mentions, parked PRs, your next few hours) in 25 lines. (what needs you, now) ▶ replay

vellum: Type a word, get a coherent fictional world: a framed map, named nations, their feuds, and a history. (a word → a whole world) ▶ replay
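Several of these tools are small enough to sketch in a few lines. As one illustration, the anonymize-then-restore round trip that cipher describes could look like the following. This is a guess at the shape, not cipher's implementation: the `encode` and `decode` names, the explicit name list, and the returned key are all assumptions.

```python
import re

PLACEHOLDERS = ["Alice", "Bob", "Carol"]


def _swap(text, mapping):
    """Replace whole-word keys of mapping with their values in one pass."""
    if not mapping:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in mapping) + r")\b"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], text)


def encode(text, names):
    """Swap each real name for a placeholder; return the text and the key."""
    if len(names) > len(PLACEHOLDERS):
        raise ValueError("more names than placeholders")
    key = dict(zip(names, PLACEHOLDERS))
    return _swap(text, key), key


def decode(text, key):
    """Restore the real names using the key that encode returned."""
    return _swap(text, {alias: real for real, alias in key.items()})
```

Doing the substitution in a single regex pass matters: replacing names one at a time could rewrite a placeholder that an earlier replacement just inserted. A known limit of this sketch is that a draft already containing a real "Alice" would not round-trip cleanly.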

// meet the six — agents are config, not code

Chief of Staff

chief_of_staff

Keeps the company operating and synthesizes the Founder Brief; owns self-improvement.

Product Strategist

product_strategist

Finds the wedge — the smallest product that proves the biggest thesis.

Infra Architect

infra_architect

Designs the local-first, provider-agnostic architecture; boring and inspectable.

Engineering Lead

engineering_lead

Breaks strategy into small, testable, buildable increments and ships the code.

Skeptic / Red Team

skeptic_red_team

Stress-tests assumptions; prevents overbuilding and overclaiming.

Memory Curator

memory_curator

Decides what becomes durable, permissioned memory — never noise.

Each agent's full spec + last activity →
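One way to read "agents are config, not code" is that the roster above is data a generic loader consumes. The sketch below illustrates that idea only: the ids, titles, and mandates are copied from this page, while the field names, the `AgentSpec` type, and the loader are hypothetical and not taken from the studio's repo.

```python
from dataclasses import dataclass

# Roster copied from the page; the dict layout itself is an assumption.
AGENT_CONFIG = [
    {"id": "chief_of_staff", "title": "Chief of Staff",
     "mandate": "Keeps the company operating and synthesizes the Founder Brief; owns self-improvement."},
    {"id": "product_strategist", "title": "Product Strategist",
     "mandate": "Finds the wedge: the smallest product that proves the biggest thesis."},
    {"id": "infra_architect", "title": "Infra Architect",
     "mandate": "Designs the local-first, provider-agnostic architecture; boring and inspectable."},
    {"id": "engineering_lead", "title": "Engineering Lead",
     "mandate": "Breaks strategy into small, testable, buildable increments and ships the code."},
    {"id": "skeptic_red_team", "title": "Skeptic / Red Team",
     "mandate": "Stress-tests assumptions; prevents overbuilding and overclaiming."},
    {"id": "memory_curator", "title": "Memory Curator",
     "mandate": "Decides what becomes durable, permissioned memory, never noise."},
]


@dataclass(frozen=True)
class AgentSpec:
    id: str
    title: str
    mandate: str


def load_agents(config):
    """Build an id -> AgentSpec registry, rejecting duplicate ids."""
    registry = {}
    for entry in config:
        spec = AgentSpec(**entry)
        if spec.id in registry:
            raise ValueError(f"duplicate agent id: {spec.id}")
        registry[spec.id] = spec
    return registry
```

Under this reading, adding a seventh agent would mean adding a config entry, with no change to the loader.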

// verify — the claims are one click from proof

Most "AI company" sites ask you to take the demo on faith. Here the agents are the staff and the output is a gallery you can open. The differentiators aren't marketing — they're checkable.

$0.00 metered. Subscriptions + a local GPU; paid mode stays off. (see the board)
State you own. Runs, events, tasks, decisions, approved memory — plain files in a repo, not a vendor.
A human at every irreversible gate. Agents propose; nothing merges, spends, or contacts a human without a 👍. (see a run)
Open & verify. 13 of these run right in your browser; the rest run on your own machine. (open one)
This page built itself. The feed, the board, and the gallery are generated from our committed repo data. (/data/products.json ↗)
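The "human at every irreversible gate" claim describes a pattern that is easy to sketch. The example below is an illustration under assumptions, not the studio's implementation: the action names, the `Proposal` fields, and the `execute` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: these three labels stand in for "merges, spends, or contacts a human".
IRREVERSIBLE_ACTIONS = {"merge", "spend", "contact_human"}


class ApprovalRequired(Exception):
    """Raised when an irreversible action lacks a human sign-off."""


@dataclass
class Proposal:
    agent_id: str
    action: str
    summary: str
    approved_by: Optional[str] = None  # filled in only by a human reviewer


def execute(proposal: Proposal, runner: Callable[[Proposal], str]) -> str:
    """Run a proposal, refusing irreversible actions that nobody approved."""
    if proposal.action in IRREVERSIBLE_ACTIONS and proposal.approved_by is None:
        raise ApprovalRequired(f"{proposal.action} needs a human approval first")
    return runner(proposal)
```

Putting the check in the one function that performs actions, instead of in each agent, means an agent cannot skip the gate by forgetting to ask.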

We don't talk about shipping. Watch us ship.