1.1 What it is

Ranks the operator, not the model — four integers in, full ledger out.

SigRank is the operator leaderboard for AI — it ranks the operator, not the model, by the architecture of their token cascade. The console is the field at a glance: every operator, every metric, scored from four raw integers and ranked live.

Volume is noise. Yield is signal. The same four token counts reveal whether you compound signal or burn it — and whether you're a Burner, a Builder, or a 10×er.

What the signature is — and isn't. SigRank measures the token-cascade signature honestly: a real coordinate of how an operator works the tools — leverage, efficiency, the shape of their cascade. It is not a verdict on the quality of the work itself, and it doesn't claim to be. Read it as one signal, set beside the operator's actual work — together they say more than either does alone.

The local agent (MCP)

The SigRank local agent is an MCP that reads your token counts straight from local session logs — 15+ platforms supported, including Claude Code, Codex, Amp, Gemini CLI, GitHub Copilot CLI, Goose, Kilo, and more — and keeps your live cascade in sync with the board and your operator profile. You never touch a number; the agent is the verifier. It counts tokens; it never reads the content of your prompts or replies.

What it does

Zero-paste, on-device read: tokenpull reads local session logs from 15+ platforms and counts the four token pillars across 7d / 30d / 90d / all-time. No copy-paste, nothing to assemble by hand.

Publishes in one call: tokenpull_submit posts each window's pillars; the server re-scores authoritatively, so your Υ Yield, Leverage, SNR & 10xDEV land on the board and profile and update live.

Stays strictly passive: read-only against telemetry, emits no prompt of its own. It measures without disturbing what it measures; only the four counts ever leave your machine.
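As a rough illustration of the windowed count described above, the sketch below sums four pillar counts over 7d / 30d / 90d / all-time windows. The `Usage` record schema and the `window_totals` helper are hypothetical names chosen for this example; the real adapters read each platform's own log format, which is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Usage:
    """One usage record: a timestamp plus four pillar counts (hypothetical schema)."""
    ts: datetime
    input: int
    output: int
    cache_create: int
    cache_read: int


# Window name -> length in days (None means all-time).
WINDOWS = {"7d": 7, "30d": 30, "90d": 90, "all": None}
PILLARS = ("input", "output", "cache_create", "cache_read")


def window_totals(records, now):
    """Sum the four pillars for each window ending at `now`."""
    totals = {}
    for name, days in WINDOWS.items():
        cutoff = None if days is None else now - timedelta(days=days)
        kept = [r for r in records if cutoff is None or r.ts >= cutoff]
        totals[name] = {p: sum(getattr(r, p) for r in kept) for p in PILLARS}
    return totals


# Made-up records, for illustration only.
now = datetime(2026, 6, 27, tzinfo=timezone.utc)
records = [
    Usage(now - timedelta(days=1), 100, 50, 200, 4000),
    Usage(now - timedelta(days=20), 300, 90, 500, 9000),
    Usage(now - timedelta(days=200), 700, 10, 100, 1000),
]
totals = window_totals(records, now)
print(totals["7d"])
```

Only counts appear in the record type; nothing in this sketch touches prompt or reply text, which mirrors the token-only constraint stated above.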

Supported platforms

tokenpull reads local session logs from 15+ AI coding platforms. Each adapter reads that platform's own log format — you don't reconfigure anything.

Platform	Log path	Pillar coverage
Claude Code	~/.claude/projects/	Full 4-pillar
Codex CLI	~/.codex/sessions/	Full 4-pillar
Amp	~/.local/share/amp/	Full 4-pillar
Kimi	~/.kimi/sessions/	Full 4-pillar
pi-agent	~/.pi/agent/	Full 4-pillar
OpenClaw / ClawdBot	~/.openclaw/	Full 4-pillar
Droid / Factory	~/.factory/sessions/	Full 4-pillar
Codebuff	~/.config/manicode/	Full 4-pillar
Kilo	~/.local/share/kilo/	Full 4-pillar
Hermes Agent	~/.hermes/state.db	Full 4-pillar
Gemini CLI	~/.gemini/tmp/	Estimated cache-write
GitHub Copilot CLI	~/.copilot/otel/	Needs COPILOT_OTEL_ENABLED=true
Qwen	~/.qwen/projects/	Estimated cache-write
Goose	sessions.db	Estimated cache-write
OpenCode	~/.local/share/opencode/	No raw token fields in logs

“Estimated cache-write” means that platform's log format doesn't expose cache-creation tokens; the other three pillars are exact. Env-var overrides let you point any adapter at a custom log path.

Install

# install globally (recommended)
npm install -g sigrank

# or run without installing
npx sigrank

# wire into Claude Code — .mcp.json
{
  "mcpServers": {
    "sigrank": { "command": "npx", "args": ["sigrank"] }
  }
}

In a terminal it opens the TUI. Wired into your AI client it starts the MCP stdio server automatically — no extra config. Verified on Node ≥18, macOS + Linux.

CLI commands

Command	What it does
sigrank	Full tabbed TUI: Dashboard / Trends / Compare / Board / Watch / Connect. Default in a terminal.
sigrank tui	Same as above, launched explicitly. Keys: 1–6 or ← → switch tabs, R refresh, Q quit.
sigrank enroll	Sign in: paste a key from signalaf.com → Settings → "New key". (Or in the TUI: Connect tab, key 6.)
sigrank submit	Publish your verified runs to the board. (Or press [S] from any read tab in the TUI.)
sigrank board	Live leaderboard from signalaf.com; auto-refreshes every 30s.
sigrank compare	Source audit: tokenpull vs ccusage vs token-dash vs tokscale, with delta %.
sigrank watch	Live cascade meter: re-reads logs on every poll, shows what moved.
sigrank --help	Full command reference with all flags and platform options.

MCP tools — callable by your AI client

When wired into Claude Code or Cursor, your AI agent can call these tools directly — no paste, no copy-out.

Tool	What it does
tokenpull	On-device read → 4-window cascade. Zero paste, token-only.
tokenpull_submit	Read + publish to the board in one call. Server re-scores authoritatively.
tokenpull_compare	All four sources side-by-side: tokenpull / ccusage / token-dash / tokscale with delta % per pillar.
rank_paste	Score a ccusage / tokscale paste locally. Returns Υ + narration card.
rank_windows	Score all four windows from a dashboard paste at once.
submit_paste	Rank a paste AND publish it to the board in one call.
submit_verified	Sign + POST the verified cascade to /api/v1/snapshots (the ranked path).
enroll	Paste a key from Settings → "New key" → bind this device (signed submit).
get_leaderboard	Live leaderboard from signalaf.com, any window.
get_operator	One operator's live profile by codename.
watch_tokenpull	Streaming cascade snapshot; diffs on each poll.

Open by design — the cascade math is public; proprietary threshold cuts stay server-side. Canonical anchor: rank_paste reproduces MO§ES Υ 18,436.98 exactly.

How the MCP feeds your operator profile

The agent is the data pipeline between your local session logs and your public operator profile at signalaf.com. Here is the exact path, step by step.

01 Agent reads your local logs: tokenpull reads local session logs from 15+ platforms (Claude Code, Codex, Amp, Kimi, Gemini CLI, GitHub Copilot CLI, Goose, Kilo, Hermes, and more) and counts the four token pillars across each window. Never prompt content; only the four integers.

02 Cascade derived on-device: the cascade math runs locally: Υ Yield, SNR, Leverage, Velocity, 10xDEV, and your class tier. You see your full cascade before anything leaves your machine.

03 Pillars submitted to the board API: tokenpull_submit posts the four canonical pillars per window. The server re-scores them authoritatively (proprietary threshold cuts apply server-side). Only the four integers are transmitted.

04 Your operator profile updates live: the board entry links to your operator profile at signalaf.com/user/[codename]. All cascade metrics (Υ Yield, SNR, Leverage, 10xDEV, class tier, per-window history) render on your profile card. The profile is the public face of your cascade.

The profile is not separate from the MCP. Every cascade metric your profile displays (Υ Yield, SNR, Leverage, 10xDEV, class tier, per-window history) originates from a tokenpull_submit call (or a manual paste through the calculator). The profile is the read surface; the agent is the write path.

The contamination constraint (non-negotiable)

Any live observer that prompts generates the tokens it measures. We learned this directly: a memory observer that auto-prompts (low-input / high-output) inflated a real operator's output by ~25% — visible openly on the live board as the inflated-vs-clean pair (rows 2 and 3). So every SigRank instrument that touches a live session is read-only against telemetry and emits no prompt — no auto-memory, no keep-alive, no self-query. Verified-passive, or it re-contaminates every operator running it. This is a hard requirement, not a caution — and it is the moat: the instrument that doesn't disturb what it measures.

This is the same rule that governs signature drift: the agent is the live reader the drift instrument runs on, and because it never prompts, the drift it reports is the operator's own — not the observer's. A live observer that prompted would inflate the very numbers it reports; this one cannot.

Status

The agent is how board entries become exact and live — vs. the manual paste calculator, which runs your numbers but does not save to the board or update your profile. Account + review still gate the board so it stays honest.

Token counts only — never prompt content. Verified-passive by design.

The Three Degrees of Leverage

Read it as a token cascade: Cache : Input : Output. Research pegs the average user near 7 : 2 : 1 (~2 input tokens per output, on a ~7 cache); input-normalized that's 3.5 : 1 : 0.5. We surveyed 10 power users (median ~500B total tokens) at about 22 : 1 : 0.08, output traded for cache. The top operator on the live board is 367 : 1 : 1.50: every input returns multiple outputs on a deep cache. Three degrees of leverage, each a real skill, and the distance between them learnable.

Sources: the top operator is measured live from the all-time board (auto-pulled, refreshed daily); the power-user median is a measured survey (n=10). Both derived from canonical four-pillar token telemetry. Token counts only. AA 7:2:1 is a modeled baseline from Artificial Analysis methodology (not measured; a reference floor).

Metric	Average users*	Power users†	Top operator to date‡
Υ Yield	1.57	1.51	552.53
SNR	0.33	0.07	0.60
Velocity (O/I)	0.50	0.08	1.50
Leverage (CR/I)	3.2×	22.3×	367.3×
10xDEV (log₁₀)	0.50	1.35	2.56
Efficiency (vs AA 4.0)	1.00	5.61	96.01
Operating Ratio (C:I:O)	3.5 : 1 : 0.50	22 : 1 : 0.08	367 : 1 : 1.50

10xDEV read on the log anchor

10xDEV is an exponent, not a multiplier: each whole point is a 10× jump in real cascade amplification (linear = 10^10xDEV).

Degree	10xDEV	Linear amplification (10^x)
Average users (AA 7:2:1)*	0.50	3.2×
Power-user median	1.35	22.4×
Top operator to date	2.56	367.3×

Top operator vs AA baseline: +2.06 decades = ~115× more amplification
Top operator vs power-user median: +1.21 decades = ~16× more

10xDEV is an anchor: the telescoping identity (10^10xDEV = cache_read/input) locks the exponent to leverage, so it can't be inflated independently; it has to be earned through the full cascade. Gaining two full points is ~2 orders of magnitude of real amplification, which is why it moves slowly and means a lot.
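The decade arithmetic above can be checked in a few lines. This is a minimal sketch using the rounded exponents from the table; the dictionary keys and function names are chosen for this example. Because the exponents are rounded to two decimals, linear values recomputed from them can differ slightly from the measured leverage (for instance, 10^2.56 is about 363, while the measured top-operator leverage is 367.3×).

```python
# 10xDEV exponents copied from the table above (rounded to two decimals).
TEN_X_DEV = {
    "average (AA 7:2:1, modeled)": 0.50,
    "power-user median": 1.35,
    "top operator": 2.56,
}


def linear_amplification(ten_x_dev: float) -> float:
    """Convert the log10 exponent back to a linear multiplier."""
    return 10 ** ten_x_dev


def decade_gap(a: str, b: str) -> tuple[float, float]:
    """Return (gap in decades, linear ratio) between two degrees."""
    gap = TEN_X_DEV[a] - TEN_X_DEV[b]
    return gap, 10 ** gap


for name, x in TEN_X_DEV.items():
    print(f"{name}: 10xDEV {x:.2f} -> {linear_amplification(x):.1f}x")

top_vs_avg = decade_gap("top operator", "average (AA 7:2:1, modeled)")
top_vs_power = decade_gap("top operator", "power-user median")
print(f"top vs average: +{top_vs_avg[0]:.2f} decades = ~{top_vs_avg[1]:.0f}x")
print(f"top vs power-user median: +{top_vs_power[0]:.2f} decades = ~{top_vs_power[1]:.0f}x")
```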

Sources & provenance

Top operator to date · measured

The top real operator on the live SigRank board so far (MO§ES™ — the owner's own observer-stripped run). Source: signalaf.com/compare (retrieved 2026-06-27, canonical board compute). Derived from canonical four-pillar token telemetry (input / output / cache_create / cache_read). Token counts only, no prompt content.

Power-user median (n=10) · measured

Operators #5–14 of the live board (public tokscale.ai footprints surfaced on SigRank). Median, not mean: n=10 is small and right-skewed (one operator = 64% of the field's total tokens), so the median is the honest “typical operator.” Source: signalaf.com/board/30d (retrieved 2026-06-21).

Average users (AA 7:2:1) · modeled, not measured

A synthetic reference operator built from the Artificial Analysis blended-price ratio: “we calculate a blended price assuming a 7:2:1 ratio of cache hit, input, and output tokens” (Artificial Analysis methodology) (retrieved 2026-06-21). Ratio cache:input:output = 7:2:1 → input-normalized 3.5:1:0.50. The cache term was split create/read (cw=0.7, cr=6.3) to give the reference a defined cascade (a modeling choice). This row is a constructed reference, not telemetry from a real operator. Do not cite as measured.

Metric definitions

SNR = O/(I+O) · Velocity = O/I · Leverage = cache_read/I · 10xDEV = log₁₀(transmission × commitment × reuse) · Efficiency = (cache+O)/I ÷ 4.0 (AA baseline 4.0 = (7+1)/2) · Υ = (cache_read × O) / I² · Operating Ratio = cache : input=1 : output. Telescoping identity: (O/I)(C_create/O)(C_read/C_create) = cache_read/input, so 10^10xDEV = Leverage.
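The definitions above translate directly into code. The sketch below is an illustration, not the server's scoring path: the `Pillars` type and `cascade_metrics` function are names invented for this example, and "cache" in the Efficiency formula is assumed to mean cache_create + cache_read (which matches the stated AA baseline of (7+1)/2 = 4.0). The example input is the modeled AA 7:2:1 reference with the cw=0.7 / cr=6.3 split described under Sources & provenance.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pillars:
    """The four token pillars (counts, or ratios for a modeled reference)."""
    input: float
    output: float
    cache_create: float
    cache_read: float


def cascade_metrics(p: Pillars) -> dict:
    """Compute the published metric definitions from four pillars."""
    transmission = p.output / p.input        # O / I
    commitment = p.cache_create / p.output   # C_create / O
    reuse = p.cache_read / p.cache_create    # C_read / C_create
    return {
        "snr": p.output / (p.input + p.output),
        "velocity": p.output / p.input,
        "leverage": p.cache_read / p.input,
        "ten_x_dev": math.log10(transmission * commitment * reuse),
        # Assumption: "cache" = cache_create + cache_read.
        "efficiency": ((p.cache_create + p.cache_read + p.output) / p.input) / 4.0,
        "yield": (p.cache_read * p.output) / p.input ** 2,
    }


# Modeled AA 7:2:1 reference: input 2, output 1, cache 7 split as 0.7 / 6.3.
aa_reference = Pillars(input=2.0, output=1.0, cache_create=0.7, cache_read=6.3)
metrics = cascade_metrics(aa_reference)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Rounded to the table's precision, these values line up with the Average users column (Υ about 1.57, SNR 0.33, Velocity 0.50, Leverage about 3.2×, 10xDEV about 0.50, Efficiency 1.00).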

* Modeled baseline: synthesized from the AA 7:2:1 ratio, not measured. The cache create/read split is a modeling assumption. Treat as a reference floor, not a real operator's telemetry.

‡ Top operator to date: the gold column is the highest real operator measured on the live board so far (MO§ES™, the owner). The claude-mem memory observer (an MCP that auto-prompts memory, low-input/high-output) inflated the raw owner row by ~25% of output; the figure shown here is the observer-stripped read. The inflated vs clean pair is shown openly on the live board, the instrument measuring its own contamination and removing it.

All signal is monitored. All drift is noted. · SigRank · MO§ES™ · Ello Cello LLC · Token counts only, never prompt content.

Verification & Integrity Tests

SigRank ranks operators on token telemetry. The obvious question: how do you know the numbers aren't fabricated, gamed, or bot-generated? Every result here comes from a real run on real data — and where a test failed its first form, we show that too, because a test that can't fail isn't a test.

Why these tests exist

The cascade thesis says operator token usage is a multiplicative process — each stage compounds on the last. Multiplicative processes leave statistical fingerprints that fabricated or mechanical data don't reproduce. To fake a high rank, a forger would have to simultaneously fake the right first-digit distribution, the right internal arithmetic, the right concentration, and the right human activity schedule — in one self-consistent file. Each test closes one of those escape routes.

Test 1 — Benford's Law (first-digit conformity)

If session totals come from a genuine multiplicative work process, their leading digits should follow Benford's Law — P(first digit = d) = log₁₀(1 + 1/d). The theory was never fitted to digits; it predicts this as a side effect. Pre-registered kill condition (declared before seeing data): Nigrini MAD > 0.015 = nonconformity.
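For readers who want to reproduce the statistic, here is a minimal sketch of the first-digit test: the Benford expectation, the mean absolute deviation (MAD) of observed first-digit proportions from it, and the 0.015 kill condition quoted above. Function names are chosen for this example, and the sample data (powers of two, a standard Benford-conforming sequence) is a stand-in, not SigRank session data.

```python
import math
from collections import Counter

# Benford expectation: P(first digit = d) = log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n: int) -> int:
    """Leading decimal digit of a non-zero integer."""
    return int(str(abs(n))[0])


def benford_mad(values) -> float:
    """Mean absolute deviation between observed and Benford first-digit proportions."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts.get(d, 0) / n - BENFORD[d]) for d in range(1, 10)) / 9


def fails_kill_condition(mad: float, threshold: float = 0.015) -> bool:
    """True when MAD exceeds the pre-registered nonconformity threshold."""
    return mad > threshold


# Stand-in sample: the first 100 powers of two.
sample = [2 ** k for k in range(100)]
mad = benford_mad(sample)
print(f"MAD = {mad:.5f}, nonconform = {fails_kill_condition(mad)}")
```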

First result — the registered prediction FAILED:

Set	n	MAD	Verdict
All agents	544	0.01604	NONCONFORM
Claude only	487	0.01896	NONCONFORM
Codex only	51	0.01793	NONCONFORM

Raw session totals did not conform. We report this plainly — the first prediction was falsified. But the failure was diagnostic: digit 1 was under-represented, 5 and 9 over-represented — the textbook signature of lower-bound truncation. The cause is mechanical: every coding session begins with ~20–23k tokens of cached system prompt — an additive constant on top of the multiplicative process — which starves the leading-1 bucket and breaks Benford.

The fix confirmed the mechanism:

Approach	n	MAD	Verdict
All sessions (raw)	544	0.01604	NONCONFORM
Sessions > 10× floor	269	0.03193	NONCONFORM
Floor-subtracted (value − 22k)	532	0.01109	ACCEPTABLE

Subtracting the measured floor — removing the additive constant and leaving the multiplicative remainder — recovers conformity. Subsetting does not fix it; subtraction does. Synthetic simulation reproduced the whole story (pure multiplicative conforms at 0.00974; +22k floor breaks it to 0.03253, matching the data; floor-subtracted recovers to 0.00787). The mechanism reproduces in synthesis — it's not a story told after the fact.
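The floor mechanism can be reproduced in outline with synthetic data. The sketch below is not the simulation that produced the figures above: it uses a log-normal draw as a stand-in for a multiplicative process, with distribution parameters, seed, and sample size picked arbitrarily for illustration, so its MAD values will not match the reported ones. It only shows the qualitative pattern: adding a constant ~22k floor distorts the first-digit distribution, and subtracting the same floor restores the original values exactly.

```python
import math
import random
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_mad(values) -> float:
    """Mean absolute deviation of first-digit proportions from Benford."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts.get(d, 0) / n - BENFORD[d]) for d in range(1, 10)) / 9


FLOOR = 22_000                 # additive system-prompt floor from the text
rng = random.Random(7)         # arbitrary seed, for repeatability only

# Stand-in multiplicative process: integer token totals from a log-normal draw.
pure = [max(1, round(rng.lognormvariate(10.0, 2.0))) for _ in range(5000)]
with_floor = [v + FLOOR for v in pure]          # what raw session totals look like
recovered = [v - FLOOR for v in with_floor]     # floor-subtracted remainder

mad_pure = benford_mad(pure)
mad_floor = benford_mad(with_floor)
mad_recovered = benford_mad(recovered)
print(f"pure:            {mad_pure:.5f}")
print(f"+22k floor:      {mad_floor:.5f}")
print(f"floor-subtracted:{mad_recovered:.5f}")
```

With the floor added, no value can fall below 22,000, so leading 1s only appear from 100,000 upward; that is the starved leading-1 bucket the text describes.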

The defensible claim: the multiplicative cascade is Benford-conforming once the measured additive system-prompt floor is removed. The raw version is falsified and we say so; the floor-corrected version holds and is mechanistically motivated — a stronger result than naive conformity. The test had teeth, fired, and revealed a real artifact (the floor) that is now itself a tracked quantity.

Test 2 — the bot control (Hermes)

A natural-conformity claim is only meaningful if something fails it. Among the sessions was a set of 5 automated probe runs (“hermes”): totals 4208, 4152, 4115, 4222, 4258. Every first digit was 4. Zero digit diversity — a fixed-size mechanical probe, exactly the non-Benford signature a bot produces. This is the control that gives Test 1 meaning: the method distinguishes a multiplicative human process from a constant-size machine process.
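The control is easy to verify from the five totals quoted above. A minimal sketch (variable names are illustrative):

```python
import math
from collections import Counter

# The five automated probe totals quoted in the text.
hermes_totals = [4208, 4152, 4115, 4222, 4258]

first_digits = [int(str(t)[0]) for t in hermes_totals]
distinct_digits = set(first_digits)

# Relative spread: how far apart the largest and smallest run are.
mean_total = sum(hermes_totals) / len(hermes_totals)
relative_spread = (max(hermes_totals) - min(hermes_totals)) / mean_total

# First-digit MAD against Benford, for comparison with the 0.015 threshold.
counts = Counter(first_digits)
n = len(first_digits)
probe_mad = sum(
    abs(counts.get(d, 0) / n - math.log10(1 + 1 / d)) for d in range(1, 10)
) / 9

print(distinct_digits, f"spread={relative_spread:.3f}", f"MAD={probe_mad:.3f}")
```

Five values is far too few for a meaningful Benford fit on its own; the point of the control is the absence of any digit diversity, which the single-element digit set makes plain.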

Test 3 — the telescoping identity (internal-consistency lock)

The cascade has three stages — transmission (O/I), commitment (Create/O), and reuse (Read/Create). Their product must equal cache_read/input exactly, because the intermediate terms cancel:

(O/I) × (Create/O) × (Read/Create) = Read/Input

So 10^(10xDEV) = Leverage, by identity — not by fit. An operator cannot inflate their amplification exponent independently of their leverage; the two are bound by algebra. A fabricated row with a high 10xDEV but the wrong Read/Input ratio fails the identity and is detectable. We recompute this on every operator from the raw four pillars; it holds for every legitimate row.
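The cancellation can be demonstrated with exact rational arithmetic. In this sketch the function names, the example pillar values, and the 0.005 tolerance (half the rounding step of a two-decimal 10xDEV) are assumptions made for illustration; the text does not specify how the server performs its check.

```python
import math
from fractions import Fraction


def stage_product(input_: int, output: int, cache_create: int, cache_read: int) -> Fraction:
    """Transmission x commitment x reuse, computed exactly."""
    transmission = Fraction(output, input_)       # O / I
    commitment = Fraction(cache_create, output)   # Create / O
    reuse = Fraction(cache_read, cache_create)    # Read / Create
    return transmission * commitment * reuse


def identity_holds(pillars, claimed_ten_x_dev: float, tol: float = 0.005) -> bool:
    """Check a claimed 10xDEV against log10(Read / Input) from the raw pillars."""
    input_, output, cache_create, cache_read = pillars
    leverage = stage_product(input_, output, cache_create, cache_read)
    return abs(claimed_ten_x_dev - math.log10(float(leverage))) <= tol


# Hypothetical row: input, output, cache_create, cache_read.
row = (2000, 1000, 700, 6300)

product = stage_product(*row)
print(product, "==", Fraction(row[3], row[0]))   # intermediate terms cancel
print(identity_holds(row, 0.50))                 # claim consistent with the pillars
print(identity_holds(row, 2.56))                 # claim not supported by the pillars
```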

Test 4 — content-free verification (the privacy license)

A separate experiment (EXP-007) established that conserved structure is detectable without reading content: across negation-paraphrase pairs, surface overlap was zero (Jaccard 0.00) while semantic equivalence was complete (NLI 1.00) — “You must not smoke” and “No smoking” converge to one kernel. The consequence: a statistical witness (token counts) is a legitimate instrument for a conservation-driven process. The no-content-access design is not a privacy compromise we tolerate — it is the architecture this result predicts. We rank the four integers; we never see what you typed.

Test 5 — the threat model (failure taxonomy → countermeasures)

Gaming attempt	Countermeasure
Score inflation / single-metric overclaim	Composite scoring; no single metric escalates rank
Fake convergence on pre-processed numbers	Server recomputes everything from the RAW payload
High leverage with inverted meaning (idle re-read)	Convergence + concentration-band check
Merging metrics to blur a weak one	Components stay separately binding

What's still being hardened (stated honestly)

Cadence (Test 6, in development): human activity is bursty with heavy tails (Barabási, Nature 435, 207, 2005) and carries 1/f timing noise (Gilden, Science, 1995); machines are periodic or Poisson. Session timestamps already carry the data for a timing-domain humanity test. Not yet deployed.
Data provenance note: the Benford figures above were computed on a 544-session sample transcribed by hand from session JSON. They are real and reproducible from that sample, but canonical published numbers should be regenerated from source telemetry. We flag this rather than hide it.

Sources

Benford's Law: Nigrini, M. (2012), Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection.
Human burst dynamics: Barabási, A.-L. (2005), “The origin of bursts and heavy tails in human dynamics,” Nature 435, 207.
1/f cognitive noise: Gilden, D. L., Thornton, T., & Mallon, M. W. (1995), “1/f noise in human cognition,” Science 267, 1837.
AA pricing baseline (7:2:1): Artificial Analysis, Language Model Benchmarking Methodology.
All token-telemetry results: computed from canonical four-pillar session data. Methods and scripts are reproducible; raw transcripts are not published (privacy).

Token counts only — never prompt content. Tests are run, not asserted.

Signature Drift — the tune meter

Every operator has a signature: the characteristic shape of their token cascade — the proportions between output, cache-write, and cache-read, anchored to input. Signature drift measures how far a stretch of work has moved from that shape. Zero drift = locked in tune; rising drift = the cascade is desyncing from the operator's own calibrated peak. It's a measure of shape, not magnitude — the same log family as 10xDEV.

Shape, not size

Drift is computed in log-space on purpose. Working twice as hard across the board (every axis doubled) is still in tune — the shape is unchanged — so it reads as zero drift. Going off on a single axis (lots of cache-write, no reuse) breaks the shape and reads as real drift. Naive similarity measures get this backwards: one axis can dominate the vector and mask a badly desynced cascade as a false “high.” The log-shape read fixes that — no single component can dominate, and being 0.5× or 2× off counts equally. The exact formulation, thresholds, and per-operator calibration are SigRank proprietary internals.
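Since the actual formulation is proprietary, the following is only a generic illustration of why a log-space shape measure behaves as described; it is not SigRank's drift formula. It assumes the signature is the three input-anchored ratios (output, cache-write, cache-read, each divided by input) and uses a plain Euclidean distance between their logarithms. A cosine similarity on the raw counts is included as one example of a naive measure that a single large axis can dominate.

```python
import math


def log_shape(input_, output, cache_create, cache_read):
    """Log of each pillar relative to input: the 'shape' of a cascade."""
    return tuple(math.log(x / input_) for x in (output, cache_create, cache_read))


def log_drift(current, baseline) -> float:
    """Illustrative drift: Euclidean distance between two log-shapes."""
    a, b = log_shape(*current), log_shape(*baseline)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(current, baseline) -> float:
    """Naive similarity on raw (output, cache_create, cache_read) counts."""
    a, b = current[1:], baseline[1:]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hypothetical pillars: (input, output, cache_create, cache_read).
baseline = (2000, 1000, 700, 6300)
doubled = tuple(2 * v for v in baseline)   # every axis doubled
write_2x = (2000, 1000, 1400, 6300)        # cache-write 2x, rest unchanged
write_half = (2000, 1000, 350, 6300)       # cache-write 0.5x, rest unchanged
desynced = (2000, 1000, 70, 6300)          # cache-write 10x too low

print(log_drift(doubled, baseline))        # same shape
print(log_drift(write_2x, baseline), log_drift(write_half, baseline))
print(cosine_similarity(desynced, baseline), log_drift(desynced, baseline))
```

In this toy setup the doubled cascade shows no drift, the 2× and 0.5× cases are penalized equally, and the 10× desync still scores above 0.99 on cosine similarity because cache-read dominates the raw vector.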

Where it runs — three time-scales

Macro (ship now)

One drift number per session: a live session-level “tune meter,” updating as the session total moves. Contamination-free, no live hook required.

Window (the bridge)

Drift on the change between successive polls, as a rolling time-series. This is where drift becomes a cadence instrument (the timing / burstiness layer). Flagged as a windowed estimate.

Micro (gated)

True per-turn drift: a UX tune-meter. Requires per-turn granularity the source must expose, and only ever as a strictly passive reader.

Sequence: session drift (now, safe) → window-delta drift (the cadence research) → true per-turn micro (passive-only, gated on granularity).

The contamination constraint (non-negotiable)

The drift instrument and the SigRank local agent are governed by the same rule — see the local agent (MCP). The agent is how drift is read live, and the constraint is why it can be trusted.

SigRank runs on MO§ES™ — the Modus Operandi §ignal Scaling Expansion System. It's a governance framework that came out of a published conservation law for language. This section covers where it came from, what the law says, what the evidence shows, how governance works inside it, who it's for, and what we're building on top of it.

Where this came from

The founder studied sociology and history at SUNY Geneseo, UB, and University of Hawaii at Hilo. Ran Pacific Northwest operations for Invisible Children. Held board seats at KEDS (2006–2008) and Horizon Health Services (2012–2018). Different world, but the same question underneath: how do you keep commitment intact when it passes through a lot of hands?

Then DJMP Inc. — a Buffalo contracting operation, started in 2011, taken from zero to $1M/yr with a team of 40. Projects ranging $10k–$500k. Real operations, real governance, real consequences when things drift.

Running governed AI across that operation, something was missing. The leaderboards measured the models. Nobody measured the operator — the person actually steering the AI, making the calls, deciding what to keep and what to cut. The augmentation layer was invisible. So the founder built a way to measure it, found a conservation law underneath it, published the law, patented the enforcement architecture, and ran it against the field. That's where SigRank and MO§ES™ came from — not from a market thesis, from an operational gap.

The Conservation Law of Commitment

C(T(S)) ≈ C(S) with enforcement; C(T(S)) < C(S) without it.

In plain terms: when you transform a piece of language — compress it, translate it, summarize it, rewrite it — the commitment content (the obligations, prohibitions, and modal constraints: “shall,” “must not,” “unless,” “is entitled to”) either survives or it doesn't. With an enforcement gate in the transformation pipeline, it survives. Without one, it decays. This isn't a guideline or a best practice — it's a measurable property of language under compression, and it's falsifiable.

The law is published under CC-BY-4.0 (DOI: 10.5281/zenodo.20029607). The enforcement architecture (MO§ES™) is patent-pending. The law itself is open.

What the evidence shows

Seven experiments (EXP-001 through EXP-007) tested the law on a 20-signal canonical corpus, running 10 recursive iterations each, using bidirectional NLI entailment and Jaccard surface stability as oracles. Three results worth pulling out:

EXP-003: 13 of 20 signals held NLI bidirectional entailment = 1.00 across all 10 iterations under the gate. That's invariance under recursion, not a tautology.
EXP-006: Only 2 of 4 paper claims survived self-referential recursion. The harness fails when commitment structure isn't robust — which is the point. The law is falsifiable and the experiments can break it.
EXP-007: An NP-negation probe separated semantic commitment from lexical surface form. Jaccard degraded while NLI held — the commitment survived even when the surface words changed.

Separately, a 5-phase architecture stress test measured 80–85% structural coherence across a four-module system. Standard probability says four modules at 80% standalone viability should produce ~41% series-system viability (0.8×0.8×0.8×0.8 = 0.4096). The governance layer inverted that.

Full experimental record: DOI: 10.5281/zenodo.19105225

Governance in the action path

Most approaches to AI governance sit outside the model — firewalls that drop packets after the logic has already corrupted, sandboxes that box an agent that still hallucinates inside the box, post-hoc audits that tell you how you were breached after the damage is done. Every one of them patches after the fact.

MO§ES™ doesn't build a better cage. It governs from inside the loop — in the execution path, not before it, not after it. The enforcement gate sits where the transformation happens. Commitment that passes through the gate survives. Commitment that doesn't, doesn't. The conservation law is what makes this a property of the system rather than a policy someone has to remember to follow.

The practical difference: a violation doesn't kill the workflow. The loop realigns to its original parameters and steers back. The agent stays fluid, bound to intent, instead of dying in a dead end or looping indefinitely inside a sandbox that only secures the perimeter.

Who this is for

SigRank measures the operator, not the model. If you're the person steering the AI — deciding what to keep, what to cut, what to ask next — the board is about you. A few groups who get something specific out of it:

Builders and developers — see how your AI-assisted workflow actually performs. The token cascade shows whether you're burning tokens or compounding them. Compare against the field instead of guessing.
Creators and writers — measure augmentation efficiency, not just output volume. The cascade reveals whether the AI is helping you think or just generating text. The four pillars separate signal from noise.
Students and researchers — benchmark your AI collaboration patterns against established operators. See what efficient operator-AI interaction looks like, with real numbers behind it.
Enterprise teams — quantify operator effectiveness for hiring, training, and tooling decisions. The board is an objective surface, not a self-reported one. Signed snapshots mean the numbers are verifiable.

In the SIGNOMY layer, agents carry provenance and build trust. SigRank is where that provenance starts — the operator's measured record becomes portable.

What we're building on it

The law is the substrate. MO§ES™ is the enforcement architecture. On top of that, a stack of products — each one a different surface for the same gate:

AQUA — application workflow tooling with reusable submission memory. Answer banks, submission memory, application filling. The workflow layer and the first wedge.
SigRank — the leaderboard you're looking at. AI operator efficiency, measured by token cascade, verified by signed snapshots. The intelligence layer.
KA§§A — voice AI runtime that uses commitment kernel caching to cut redundant NLU work in multi-agent flows. In practice: 50s → 6.5s per 5-turn call.
SIGNOMY (signomy.xyz) — a governed agent marketplace where agents register, build trust, take missions, and carry provenance. The marketplace becomes a constitutional economy rather than a listing board. The top layer — where everything below it becomes operational behavior.

The goal is straightforward: make governance a property of the execution path, not a policy document someone reads once. If the law holds — and the evidence says it does — then commitment survives transformation when the gate is present. Every product above is a different surface for the same gate.

How this connects to the board

Every snapshot on the SigRank leaderboard is ed25519-signed on the operator's device and verified server-side. Token counts only — no message content is ever read or stored. The commitment being conserved is the integrity of the measurement itself: what the operator measured is what the board records, with no drift in between.

The governance layer maps directly to how the board works: signed operator identity, token counts only, leaderboard ranking, platform-agnostic collection, ed25519 verification, and a public board with open data. The leaderboard works because the data passed through the gate — not because someone reviewed it after the fact.

More at mos2es.com · benchmarks
