1.1 What it is
Ranks the operator, not the model — four integers in, full ledger out.
SigRank is the operator leaderboard for AI — it ranks the operator, not the model, by the architecture of their token cascade. The console is the field at a glance: every operator, every metric, scored from four raw integers and ranked live.
Volume is noise. Yield is signal. The same four token counts reveal whether you compound signal or burn it — and whether you're a Burner, a Builder, or a 10×er.
What the signature is — and isn't. SigRank measures the token-cascade signature honestly: a real coordinate of how an operator works the tools — leverage, efficiency, the shape of their cascade. It is not a verdict on the quality of the work itself, and it doesn't claim to be. Read it as one signal, set beside the operator's actual work — together they say more than either does alone.
Run the local agent (MCP / CLI)
The SigRank agent reads your local session logs across 15+ AI coding platforms (Claude Code, Codex, Gemini CLI, Copilot CLI, Amp, Goose, Kilo, and more) and counts the four token pillars per window: zero paste, token counts only, never your prompt content. One command:
```bash
# install once (recommended)
npm install -g sigrank

# ...or run with no install
npx sigrank
```
npx sigrank opens your dashboard: the cascade across every detected platform and window, the 5-source token comparison, and your board position. Submitting to the board is managed from your profile (sign in, then publish), so your numbers land verified and stay yours. The full command and tool reference is on the local-agent page.
Run numbers — paste ccusage output
Run ccusage --json in your terminal and paste the output below to see your cascade and projected rank. The real token counts are extracted directly from the paste, which can be full JSON, a partial fragment, a Codex export, or four bare numbers. Calculator only: nothing is saved to the board.
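A parser for those paste formats could look roughly like this. It is a minimal sketch, not SigRank's parser, and it rests on assumptions: that ccusage JSON uses the camelCase keys inputTokens, outputTokens, cacheCreationTokens and cacheReadTokens, that a full export carries a totals object, and that four bare numbers arrive in the order input, output, cache-write, cache-read.

```python
import json
import re

# Assumed ccusage JSON field names, in pillar order: input, output, cache-write, cache-read.
PILLAR_KEYS = ("inputTokens", "outputTokens", "cacheCreationTokens", "cacheReadTokens")


def parse_paste(text: str) -> tuple[int, int, int, int]:
    """Extract (input, output, cache_create, cache_read) from a pasted blob."""
    text = text.strip()

    # Case 1: four whitespace-separated integers (assumed pillar order).
    if re.fullmatch(r"\d+(\s+\d+){3}", text):
        i, o, cw, cr = (int(n) for n in text.split())
        return i, o, cw, cr

    # Case 2: a complete JSON document; prefer its "totals" object if present.
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict) and isinstance(data.get("totals"), dict):
        totals = data["totals"]
        return tuple(int(totals.get(key, 0)) for key in PILLAR_KEYS)

    # Case 3: a partial fragment; sum every "key": number occurrence per pillar.
    sums = []
    for key in PILLAR_KEYS:
        matches = re.findall(rf'"{key}"\s*:\s*(\d+)', text)
        sums.append(sum(int(m) for m in matches))
    return tuple(sums)
```

The fragment path sums every match it finds, so a fragment that contains both per-day rows and a totals block would double-count; a real parser would need to pick one.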
Estimate without token counts (advanced)
No token counts handy? This fallback estimates a rough cascade from coarse activity proxies (sessions, turns, account age) at reduced confidence. It is an approximation only — the board ranks on the four token pillars, so run the local agent (or paste ccusage --json above) for a real read.
Paste not parsing, a reader we don't support yet, or just want to talk? Reach us; we read everything.
The local agent (MCP)
The SigRank local agent is an MCP server that reads your token counts straight from local session logs and keeps your live cascade in sync with the board and your operator profile. It supports 15+ platforms, including Claude Code, Codex, Amp, Gemini CLI, GitHub Copilot CLI, Goose, Kilo, and more. You never touch a number; the agent is the verifier. It counts tokens and never reads the content of your prompts or replies.
What it does
Supported platforms
tokenpull reads local session logs from 15+ AI coding platforms. Each adapter reads that platform's own log format — you don't reconfigure anything.
“Estimated cache-write” means that platform's log format doesn't expose cache-creation tokens; the other three pillars are exact. Env-var overrides let you point any adapter at a custom log path.
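One adapter can be sketched to show the idea. This is a hypothetical example, not tokenpull's code. It assumes Claude Code-style JSONL transcripts under ~/.claude/projects whose records carry an Anthropic-style message.usage object (input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens). The override variable SIGRANK_CLAUDE_LOG_DIR is an invented name used only for illustration. Only the integer usage counters are read; message content is never inspected.

```python
import json
import os
from pathlib import Path
from typing import Iterable

# Anthropic-style usage fields mapped to the four pillars (assumed log shape).
USAGE_FIELDS = {
    "input_tokens": "input",
    "output_tokens": "output",
    "cache_creation_input_tokens": "cache_create",
    "cache_read_input_tokens": "cache_read",
}


def resolve_log_dir(env_var: str, default: str) -> Path:
    """Return the env-var override if set, else the platform default path."""
    return Path(os.environ.get(env_var) or default).expanduser()


def sum_usage(lines: Iterable[str]) -> dict[str, int]:
    """Sum token counters from JSONL lines, ignoring everything except usage."""
    totals = {pillar: 0 for pillar in USAGE_FIELDS.values()}
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        message = record.get("message") if isinstance(record, dict) else None
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        for field, pillar in USAGE_FIELDS.items():
            totals[pillar] += int(usage.get(field) or 0)
    return totals


def count_pillars(log_dir: Path) -> dict[str, int]:
    """Walk every *.jsonl file under log_dir and total the four pillars."""
    totals = {pillar: 0 for pillar in USAGE_FIELDS.values()}
    for path in sorted(log_dir.rglob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for pillar, value in sum_usage(handle).items():
                totals[pillar] += value
    return totals


# SIGRANK_CLAUDE_LOG_DIR is a hypothetical override name, not a documented variable.
log_dir = resolve_log_dir("SIGRANK_CLAUDE_LOG_DIR", "~/.claude/projects")
```

Calling count_pillars(log_dir) then returns the four totals for that directory. An adapter for another platform would swap in that platform's field names and default path.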
Install
```bash
# install globally (recommended)
npm install -g sigrank

# or run without installing
npx sigrank
```

Wire it into Claude Code via .mcp.json:

```json
{
  "mcpServers": {
    "sigrank": { "command": "npx", "args": ["sigrank"] }
  }
}
```

In a terminal it opens the TUI. Wired into your AI client, it starts the MCP stdio server automatically, with no extra config. Verified on Node ≥18, macOS + Linux.
CLI commands
MCP tools — callable by your AI client
When wired into Claude Code or Cursor, your AI agent can call these tools directly — no paste, no copy-out.
| Tool | What it does |
|---|---|
| tokenpull | On-device read → 4-window cascade. Zero paste, token-only. |
| tokenpull_submit | Read + publish to the board in one call. Server re-scores authoritatively. |
| tokenpull_compare | All four sources side-by-side: tokenpull / ccusage / token-dash / tokscale, with delta % per pillar. |
| rank_paste | Score a ccusage / tokscale paste locally. Returns Υ + narration card. |
| rank_windows | Score all four windows from a dashboard paste at once. |
| submit_paste | Rank a paste AND publish it to the board in one call. |
| submit_verified | Sign + POST the verified cascade to /api/v1/snapshots (the ranked path). |
| enroll | Paste a key from Settings → "New key" → bind this device (signed submit). |
| get_leaderboard | Live leaderboard from signalaf.com, any window. |
| get_operator | One operator's live profile by codename. |
| watch_tokenpull | Streaming cascade snapshot; diffs on each poll. |

Open by design: the cascade math is public, while proprietary threshold cuts stay server-side. Canonical anchor: rank_paste reproduces MO§ES Υ 18,436.98 exactly.
How the MCP feeds your operator profile
The agent is the data pipeline between your local session logs and your public operator profile at signalaf.com. Here is the exact path, step by step.
The profile is not separate from the MCP: the MCP is the write path, and the profile is the read surface. Every cascade metric your profile displays (Υ Yield, SNR, Leverage, 10xDEV, class tier, per-window history) originates from a tokenpull_submit call (or a manual paste through the calculator).
The contamination constraint (non-negotiable)
Any live observer that prompts generates the tokens it measures. We learned this directly: a memory observer that auto-prompts (low-input / high-output) inflated a real operator's output by ~25% — visible openly on the live board as the inflated-vs-clean pair (rows 2 and 3). So every SigRank instrument that touches a live session is read-only against telemetry and emits no prompt — no auto-memory, no keep-alive, no self-query. Verified-passive, or it re-contaminates every operator running it. This is a hard requirement, not a caution — and it is the moat: the instrument that doesn't disturb what it measures.
This is the same rule that governs signature drift: the agent is the live reader the drift instrument runs on, and because it never prompts, the drift it reports is the operator's own — not the observer's. A live observer that prompted would inflate the very numbers it reports; this one cannot.
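The arithmetic shows why a prompting observer cannot be tolerated. This minimal sketch uses the Υ, Velocity and SNR definitions given under Metric definitions and simplifies by assuming the observer adds only output tokens (a real one also adds some input). Under that assumption, a ~25% output inflation lifts Υ and Velocity by the full 25% and moves SNR less.

```python
def metrics(i: float, o: float, cache_read: float) -> dict[str, float]:
    """Output-sensitive cascade metrics, per the published definitions."""
    return {
        "yield": cache_read * o / i**2,
        "velocity": o / i,
        "snr": o / (i + o),
    }


def inflation(i: float, o: float, cache_read: float, observer_share: float = 0.25) -> dict[str, float]:
    """Contaminated / clean ratio per metric; assumes the observer adds output only."""
    clean = metrics(i, o, cache_read)
    dirty = metrics(i, o * (1 + observer_share), cache_read)
    return {name: dirty[name] / clean[name] for name in clean}


# Input-normalized top-operator shape from the board table: 367 : 1 : 1.50.
print(inflation(1.0, 1.5, 367.3))
```

Because Υ is linear in output, any output an observer generates flows straight into the ranked number.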
The agent is how board entries become exact and live. The manual paste calculator, by contrast, runs your numbers but does not save them to the board or update your profile. An account and review still gate the board, so it stays honest.
Token counts only — never prompt content. Verified-passive by design.
Every score starts from four raw token counts — nothing else, and never your prompt content. The whole cascade is derived from these.
- Input: fresh prompt tokens you send, the cost of asking.
- Output: tokens the model generates back, the work produced.
- Cache read: tokens served from cache, cheap reuse of held context.
- Cache write: tokens written to cache, context you build forward.
- Total: the sum of all four pillars, the raw scale of the work.
- Blended $/M: blended USD per 1,000,000 tokens, the wallet pillar.
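The two derived rows are plain arithmetic on the four counts. In this minimal sketch, total spend in USD is an assumed extra input, since token counts alone carry no prices.

```python
def pillar_totals(i: int, o: int, cache_read: int, cache_write: int, spend_usd: float) -> tuple[int, float]:
    """Return (total tokens, blended USD per 1,000,000 tokens).

    spend_usd is an assumed input: the four counts do not include prices.
    """
    total = i + o + cache_read + cache_write
    if total == 0:
        raise ValueError("no tokens recorded")
    return total, spend_usd / total * 1_000_000


total, usd_per_m = pillar_totals(2_000, 1_000, 6_300, 700, spend_usd=0.05)
print(total, round(usd_per_m, 2))
```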
From the four pillars we derive the token cascade — the metrics the board actually ranks on. Token-only; no word-era proxies.
The Three Degrees of Leverage
Read it as a token cascade: Cache : Input : Output. The Artificial Analysis pricing baseline models the average user near 7 : 2 : 1 (2 input tokens per output token, on a cache of 7); input-normalized, that's 3.5 : 1 : 0.5. We surveyed 10 power users (median ~500B total tokens) at about 22 : 1 : 0.08, trading output for cache. The top operator on the live board runs at 367 : 1 : 1.50: every input token returns 1.5 output tokens on a deep cache. These are three degrees of leverage, each a real skill, and the distance between them is learnable.
Sources: the top operator is measured live from the all-time board (auto-pulled, refreshed daily); the power-user median is a measured survey (n=10). Both derived from canonical four-pillar token telemetry. Token counts only. AA 7:2:1 is a modeled baseline from Artificial Analysis methodology (not measured; a reference floor).
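Input-normalizing a Cache : Input : Output ratio divides every term by the input term. That is how 7 : 2 : 1 becomes 3.5 : 1 : 0.5. A minimal sketch:

```python
def normalize(cache: float, inp: float, out: float) -> tuple[float, float, float]:
    """Rescale a Cache : Input : Output ratio so that input = 1."""
    if inp == 0:
        raise ValueError("input term must be nonzero")
    return cache / inp, 1.0, out / inp


print(normalize(7, 2, 1))  # (3.5, 1.0, 0.5)
```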
| Metric | Average users* | Power users† | Top operator to date‡ |
|---|---|---|---|
| Υ Yield | 1.57 | 1.51 | 552.53 |
| SNR | 0.33 | 0.07 | 0.60 |
| Velocity (O/I) | 0.50 | 0.08 | 1.50 |
| Leverage (CR/I) | 3.2× | 22.3× | 367.3× |
| 10xDEV (log₁₀) | 0.50 | 1.35 | 2.56 |
| Efficiency (vs AA 4.0) | 1.00 | 5.61 | 96.01 |
| Operating Ratio (C:I:O) | 3.5 : 1 : 0.50 | 22 : 1 : 0.08 | 367 : 1 : 1.50 |
10xDEV read on the log anchor
10xDEV is an exponent, not a multiplier: each whole point is a 10× jump in real cascade amplification (linear = 10^10xDEV).
| Degree | 10xDEV | Linear amplification (10^x) |
|---|---|---|
| Average users (AA 7:2:1)* | 0.50 | 3.2× |
| Power-user median | 1.35 | 22.4× |
| Top operator to date | 2.56 | 367.3× |
- Top operator vs AA baseline: +2.06 decades = ~115× more amplification
- Top operator vs power-user median: +1.21 decades = ~16× more
10xDEV is an anchor: the telescoping identity (10^10xDEV = cache_read/input) locks the exponent to leverage, so it can't be inflated independently; it has to be earned through the full cascade. Gaining two full points means exactly two orders of magnitude (100×) of real amplification, which is why it moves slowly and means a lot.
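Because 10xDEV is a base-10 exponent, the gap between two degrees converts back to a linear multiple as 10 raised to the difference. This minimal sketch uses the 10xDEV values from the table above and reproduces the two comparisons listed under it.

```python
def decade_gap(high: float, low: float) -> tuple[float, float]:
    """Return (decades apart, linear multiple) for two 10xDEV exponents."""
    decades = high - low
    return decades, 10 ** decades


top, power, average = 2.56, 1.35, 0.50
print(decade_gap(top, average))  # about (2.06, 114.8)
print(decade_gap(top, power))    # about (1.21, 16.2)
```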
Sources & provenance
Top operator to date · measured
The top real operator on the live SigRank board so far (MO§ES™ — the owner's own observer-stripped run). Source: signalaf.com/compare (retrieved 2026-06-27, canonical board compute). Derived from canonical four-pillar token telemetry (input / output / cache_create / cache_read). Token counts only, no prompt content.
Power-user median (n=10) · measured
Operators #5–14 of the live board (public tokscale.ai footprints surfaced on SigRank). Median, not mean: n=10 is small and right-skewed (one operator = 64% of the field's total tokens), so the median is the honest “typical operator.” Source: signalaf.com/board/30d (retrieved 2026-06-21).
Average users (AA 7:2:1) · modeled, not measured
A synthetic reference operator built from the Artificial Analysis blended-price ratio: “we calculate a blended price assuming a 7:2:1 ratio of cache hit, input, and output tokens” (Artificial Analysis methodology) (retrieved 2026-06-21). Ratio cache:input:output = 7:2:1 → input-normalized 3.5:1:0.50. The cache term was split create/read (cw=0.7, cr=6.3 on the un-normalized 7:2:1 scale, i.e. 0.35 and 3.15 per unit of input) to give the reference a defined cascade (a modeling choice). This row is a constructed reference, not telemetry from a real operator. Do not cite as measured.
Metric definitions
- SNR = O/(I+O)
- Velocity = O/I
- Leverage = cache_read/I
- 10xDEV = log₁₀(transmission × commitment × reuse), with transmission = O/I, commitment = C_create/O, reuse = C_read/C_create
- Efficiency = (cache+O)/I ÷ 4.0, where cache = cache_create + cache_read and the AA baseline 4.0 = (7+1)/2
- Υ = (cache_read × O) / I²
- Operating Ratio = cache : input (=1) : output
- Telescoping identity: (O/I)(C_create/O)(C_read/C_create) = cache_read/input, so 10^10xDEV = Leverage.
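The definitions translate directly into code. This minimal sketch covers only the public cascade math; the proprietary class cuts are not modeled. It treats "cache" in Efficiency as cache_create + cache_read, the reading consistent with the AA baseline's (7+1)/2. It also leaves 10xDEV undefined when a stage ratio would divide by zero or the product would be zero.

```python
import math

AA_BASELINE = 4.0  # (7 + 1) / 2 from the AA 7:2:1 reference


def cascade(i: float, o: float, cw: float, cr: float) -> dict:
    """Derive the published cascade metrics from the four pillars."""
    if i <= 0:
        raise ValueError("input tokens must be positive")
    cache = cw + cr  # assumption: total cache = create + read
    result = {
        "snr": o / (i + o),
        "velocity": o / i,
        "leverage": cr / i,
        "efficiency": (cache + o) / i / AA_BASELINE,
        "yield": cr * o / i**2,
        "operating_ratio": (cache / i, 1.0, o / i),
        "ten_x_dev": None,
    }
    # 10xDEV multiplies the three stage ratios; each needs a nonzero denominator
    # and log10 needs a positive product.
    if o > 0 and cw > 0 and cr > 0:
        transmission = o / i
        commitment = cw / o
        reuse = cr / cw
        result["ten_x_dev"] = math.log10(transmission * commitment * reuse)
    return result


# The modeled AA reference: cache 7 (split cw 0.7 / cr 6.3), input 2, output 1.
aa = cascade(i=2, o=1, cw=0.7, cr=6.3)
print(aa)
```

Rounded, this lands on the modeled Average-users column: SNR 0.33, Velocity 0.50, Leverage 3.15 (shown as 3.2×), 10xDEV 0.50, Efficiency 1.00, Υ 1.575 (shown as 1.57). 10 ** ten_x_dev also equals leverage up to floating-point error, which is the telescoping identity at work.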
* Modeled baseline: synthesized from the AA 7:2:1 ratio, not measured. The cache create/read split is a modeling assumption. Treat as a reference floor, not a real operator's telemetry.
‡ Top operator to date: the gold column is the highest real operator measured on the live board so far (MO§ES™, the owner). The claude-mem memory observer (an MCP that auto-prompts memory, low-input/high-output) inflated the raw owner row by ~25% of output; the figure shown here is the observer-stripped read. The inflated vs clean pair is shown openly on the live board, the instrument measuring its own contamination and removing it.
All signal is monitored. All drift is noted. · SigRank · MO§ES™ · Ello Cello LLC · Token counts only, never prompt content.
Verification & Integrity Tests
SigRank ranks operators on token telemetry. The obvious question: how do you know the numbers aren't fabricated, gamed, or bot-generated? Every result here comes from a real run on real data — and where a test failed its first form, we show that too, because a test that can't fail isn't a test.
Why these tests exist
The cascade thesis says operator token usage is a multiplicative process — each stage compounds on the last. Multiplicative processes leave statistical fingerprints that fabricated or mechanical data don't reproduce. To fake a high rank, a forger would have to simultaneously fake the right first-digit distribution, the right internal arithmetic, the right concentration, and the right human activity schedule — in one self-consistent file. Each test closes one of those escape routes.
Test 1 — Benford's Law (first-digit conformity)
If session totals come from a genuine multiplicative work process, their leading digits should follow Benford's Law — P(first digit = d) = log₁₀(1 + 1/d). The theory was never fitted to digits; it predicts this as a side effect. Pre-registered kill condition (declared before seeing data): Nigrini MAD > 0.015 = nonconformity.
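The first-digit test fits in a few lines. This is a minimal illustration of the standard Nigrini first-digit MAD, not SigRank's script. The band edges (0.006 / 0.012 / 0.015) are Nigrini's first-digit conformity thresholds, and the kill condition above uses the last of them.

```python
import math
from collections import Counter
from typing import Iterable

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_mad(values: Iterable[int]) -> float:
    """Mean absolute deviation of first-digit shares from Benford.

    Assumes positive integer token totals; non-positive values are skipped.
    """
    digits = [int(str(v)[0]) for v in values if v > 0]
    if not digits:
        raise ValueError("need at least one positive value")
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - BENFORD[d]) for d in range(1, 10)) / 9


def nigrini_verdict(mad: float) -> str:
    """Nigrini's first-digit conformity bands."""
    if mad <= 0.006:
        return "CLOSE"
    if mad <= 0.012:
        return "ACCEPTABLE"
    if mad <= 0.015:
        return "MARGINAL"
    return "NONCONFORM"


hermes = [4208, 4152, 4115, 4222, 4258]
powers_of_two = [2**k for k in range(1, 1001)]
print(nigrini_verdict(first_digit_mad(hermes)))         # every first digit is 4
print(nigrini_verdict(first_digit_mad(powers_of_two)))  # a classic Benford sequence
```

The five hermes totals from Test 2 score a MAD of about 0.20, far past the 0.015 kill line. Powers of two, a textbook Benford sequence, sit near zero.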
First result — the registered prediction FAILED:
| Set | n | MAD | Verdict |
|---|---|---|---|
| All agents | 544 | 0.01604 | NONCONFORM |
| Claude only | 487 | 0.01896 | NONCONFORM |
| Codex only | 51 | 0.01793 | NONCONFORM |
Raw session totals did not conform. We report this plainly — the first prediction was falsified. But the failure was diagnostic: digit 1 was under-represented, 5 and 9 over-represented — the textbook signature of lower-bound truncation. The cause is mechanical: every coding session begins with ~20–23k tokens of cached system prompt — an additive constant on top of the multiplicative process — which starves the leading-1 bucket and breaks Benford.
The fix confirmed the mechanism:
| Approach | n | MAD | Verdict |
|---|---|---|---|
| All sessions (raw) | 544 | 0.01604 | NONCONFORM |
| Sessions > 10× floor | 269 | 0.03193 | NONCONFORM |
| Floor-subtracted (value − 22k) | 532 | 0.01109 | ACCEPTABLE |
Subtracting the measured floor — removing the additive constant and leaving the multiplicative remainder — recovers conformity. Subsetting does not fix it; subtraction does. Synthetic simulation reproduced the whole story (pure multiplicative conforms at 0.00974; +22k floor breaks it to 0.03253, matching the data; floor-subtracted recovers to 0.00787). The mechanism reproduces in synthesis — it's not a story told after the fact.
The defensible claim: the multiplicative cascade is Benford-conforming once the measured additive system-prompt floor is removed. The raw version is falsified and we say so; the floor-corrected version holds and is mechanistically motivated — a stronger result than naive conformity. The test had teeth, fired, and revealed a real artifact (the floor) that is now itself a tracked quantity.
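A toy simulation reproduces the floor mechanism. This minimal sketch is not the simulation behind the figures above, and it makes three assumptions. Session sizes are drawn log-uniform over five decades (100 to 10M tokens, an exactly Benford-shaped distribution). A fixed 22k floor is then added, and finally subtracted again. Because the floor here is constant, subtraction recovers the pure values exactly; a real floor that varies (~20–23k) would make recovery noisier.

```python
import math
import random
from collections import Counter

FLOOR = 22_000  # fixed additive floor (assumption; the real one varies)


def first_digit_mad(values):
    """Nigrini first-digit MAD against Benford for positive integers."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - math.log10(1 + 1 / d)) for d in range(1, 10)) / 9


rng = random.Random(7)  # fixed seed so the run is repeatable
pure = [int(10 ** rng.uniform(2, 7)) for _ in range(20_000)]  # assumed log-uniform sizes
floored = [v + FLOOR for v in pure]                           # additive system-prompt floor
recovered = [v - FLOOR for v in floored if v - FLOOR > 0]     # floor-subtracted

for label, sample in (("pure", pure), ("+floor", floored), ("floor-subtracted", recovered)):
    print(f"{label:>17}: MAD = {first_digit_mad(sample):.5f}")
```

The pure and floor-subtracted samples score identically and well under the 0.015 kill line. The floored sample piles its small sessions into leading digits 2 to 4 and lands deep in nonconformity. Exact values depend on the seed and sample size.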
Test 2 — the bot control (Hermes)
A natural-conformity claim is only meaningful if something fails it. Among the sessions was a set of 5 automated probe runs (“hermes”): totals 4208, 4152, 4115, 4222, 4258. Every first digit was 4. Zero digit diversity — a fixed-size mechanical probe, exactly the non-Benford signature a bot produces. This is the control that gives Test 1 meaning: the method distinguishes a multiplicative human process from a constant-size machine process.
Test 3 — the telescoping identity (internal-consistency lock)
The cascade has three stages — transmission (O/I), commitment (Create/O), and reuse (Read/Create). Their product must equal cache_read/input exactly, because the intermediate terms cancel:
(O/I) × (Create/O) × (Read/Create) = Read/Input
So 10^(10xDEV) = Leverage, by identity — not by fit. An operator cannot inflate their amplification exponent independently of their leverage; the two are bound by algebra. A fabricated row with a high 10xDEV but the wrong Read/Input ratio fails the identity and is detectable. We recompute this on every operator from the raw four pillars; it holds for every legitimate row.
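The identity check is cheap to run on any submitted row. This minimal sketch assumes a row carries the four raw pillars plus a claimed 10xDEV; the tolerance is an arbitrary illustrative choice.

```python
import math


def identity_holds(i: float, o: float, cw: float, cr: float, claimed_dev: float,
                   rel_tol: float = 1e-6) -> bool:
    """True if a claimed 10xDEV matches the leverage rebuilt from the raw pillars."""
    if min(i, o, cw, cr) <= 0:
        raise ValueError("every pillar must be positive for the stage ratios")
    transmission, commitment, reuse = o / i, cw / o, cr / cw
    product = transmission * commitment * reuse  # O and C_create cancel: equals cr / i
    leverage = cr / i
    # rel_tol is an illustrative tolerance, not a SigRank parameter
    return (math.isclose(product, leverage, rel_tol=1e-9)
            and math.isclose(10 ** claimed_dev, leverage, rel_tol=rel_tol))


honest = identity_holds(1_000, 1_500, 15_000, 367_300, math.log10(367.3))
forged = identity_holds(1_000, 80, 500, 22_300, 2.56)  # high exponent, low leverage
print(honest, forged)  # True False
```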
Test 4 — content-free verification (the privacy license)
A separate experiment (EXP-007) established that conserved structure is detectable without reading content: across negation-paraphrase pairs, surface overlap was zero (Jaccard 0.00) while semantic equivalence was complete (NLI 1.00) — “You must not smoke” and “No smoking” converge to one kernel. The consequence: a statistical witness (token counts) is a legitimate instrument for a conservation-driven process. The no-content-access design is not a privacy compromise we tolerate — it is the architecture this result predicts. We rank the four integers; we never see what you typed.
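The surface-overlap half of that result is easy to reproduce. This minimal sketch computes word-set Jaccard similarity on the quoted paraphrase pair, assuming simple lowercase word tokenization. The NLI half needs an entailment model and is not shown.

```python
import re


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity |A ∩ B| / |A ∪ B| (lowercase word tokens assumed)."""
    words_a = set(re.findall(r"[a-z']+", a.lower()))
    words_b = set(re.findall(r"[a-z']+", b.lower()))
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


print(jaccard("You must not smoke", "No smoking"))  # 0.0: no shared surface words
```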
Test 5 — the threat model (failure taxonomy → countermeasures)
| Gaming attempt | Countermeasure |
|---|---|
| Score inflation / single-metric overclaim | Composite scoring; no single metric escalates rank |
| Fake convergence on pre-processed numbers | Server recomputes everything from the RAW payload |
| High leverage with inverted meaning (idle re-read) | Convergence + concentration-band check |
| Merging metrics to blur a weak one | Components stay separately binding |
What's still being hardened (stated honestly)
- Cadence (Test 6, in development): human activity is bursty with heavy tails (Barabási, Nature 435, 207, 2005) and carries 1/f timing noise (Gilden, Science, 1995); machines are periodic or Poisson. Session timestamps already carry the data for a timing-domain humanity test. Not yet deployed.
- Data provenance note: the Benford figures above were computed on a 544-session sample transcribed by hand from session JSON. They are real and reproducible from that sample, but canonical published numbers should be regenerated from source telemetry. We flag this rather than hide it.
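A timing-domain test could start from the burstiness coefficient of Goh and Barabási (2008), B = (σ − μ)/(σ + μ) over inter-event gaps. B is −1 for a perfectly periodic process, about 0 for a Poisson process, and approaches 1 for heavy-tailed bursts. This standard measure is swapped in here for illustration only; it is not the Test 6 design.

```python
import statistics
from typing import Sequence


def burstiness(timestamps: Sequence[float]) -> float:
    """Goh-Barabási burstiness of the gaps between sorted event times (illustrative)."""
    times = sorted(timestamps)
    gaps = [b - a for a, b in zip(times, times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three events")
    mu = statistics.fmean(gaps)
    sigma = statistics.pstdev(gaps)
    if mu + sigma == 0:
        raise ValueError("all events share one timestamp")
    return (sigma - mu) / (sigma + mu)


periodic = [60.0 * k for k in range(20)]      # a poll every minute
bursty = [0, 1, 2, 3, 4, 300, 301, 302, 900]  # clusters separated by long pauses
print(round(burstiness(periodic), 3), round(burstiness(bursty), 3))
```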
Sources
- Benford's Law: Nigrini, M. (2012), Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection.
- Human burst dynamics: Barabási, A.-L. (2005), “The origin of bursts and heavy tails in human dynamics,” Nature 435, 207.
- 1/f cognitive noise: Gilden, D. et al. (1995), Science 267, 1837.
- AA pricing baseline (7:2:1): Artificial Analysis, Language Model Benchmarking Methodology.
- All token-telemetry results: computed from canonical four-pillar session data. Methods and scripts are reproducible; raw transcripts are not published (privacy).
Token counts only — never prompt content. Tests are run, not asserted.
Signature Drift — the tune meter
Every operator has a signature: the characteristic shape of their token cascade — the proportions between output, cache-write, and cache-read, anchored to input. Signature drift measures how far a stretch of work has moved from that shape. Zero drift = locked in tune; rising drift = the cascade is desyncing from the operator's own calibrated peak. It's a measure of shape, not magnitude — the same log family as 10xDEV.
Shape, not size
Drift is computed in log-space on purpose. Working twice as hard across the board (every axis doubled) is still in tune — the shape is unchanged — so it reads as zero drift. Going off on a single axis (lots of cache-write, no reuse) breaks the shape and reads as real drift. Naive similarity measures get this backwards: one axis can dominate the vector and mask a badly desynced cascade as a false “high.” The log-shape read fixes that — no single component can dominate, and being 0.5× or 2× off counts equally. The exact formulation, thresholds, and per-operator calibration are SigRank proprietary internals.
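A generic log-space distance illustrates the shape-not-size property. This is a hypothetical stand-in, explicitly not SigRank's proprietary formulation. It compares the input-anchored ratios O/I, C_create/I and C_read/I against a calibrated signature, using the root-mean-square of their natural-log differences. A uniform scale-up then cancels, and a 2× or 0.5× miss on one axis costs the same.

```python
import math
from typing import Sequence


def shape_drift(sample: Sequence[float], signature: Sequence[float]) -> float:
    """RMS log-distance between two (input, output, cache_create, cache_read) cascades.

    Hypothetical stand-in, not SigRank's formula. Each cascade is anchored
    to its own input, so only shape matters.
    """
    if min(*sample, *signature) <= 0:
        raise ValueError("all four pillars must be positive to take logs")
    i_s, *rest_s = sample
    i_g, *rest_g = signature
    logs = [math.log((s / i_s) / (g / i_g)) for s, g in zip(rest_s, rest_g)]
    return math.sqrt(sum(x * x for x in logs) / len(logs))


signature = (1_000, 1_500, 15_000, 367_300)
doubled = tuple(2 * x for x in signature)      # same shape, twice the work
write_heavy = (1_000, 1_500, 30_000, 367_300)  # cache-write doubled, reuse unchanged
write_light = (1_000, 1_500, 7_500, 367_300)   # cache-write halved
print(shape_drift(doubled, signature), shape_drift(write_heavy, signature), shape_drift(write_light, signature))
```

Working in log-ratios also keeps the huge cache-read axis from swamping the small output axis, the failure mode the text attributes to naive similarity measures.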
Where it runs — three time-scales
- Session drift: one drift number per session, a live session-level “tune meter” that updates as the session total moves. Contamination-free, no live hook required.
- Window-delta drift: drift on the change between successive polls, a rolling time-series. This is where drift becomes a cadence instrument (the timing / burstiness layer). Flagged as a windowed estimate.
- Per-turn micro drift: true per-turn drift, a UX tune-meter. Requires per-turn granularity the source must expose, and only ever as a strictly passive reader.
Sequence: session drift (now, safe) → window-delta drift (the cadence research) → true per-turn micro (passive-only, gated on granularity).
The drift instrument and the SigRank local agent are governed by the same contamination constraint, set out under the local agent (MCP). The agent is how drift is read live, and the constraint is why it can be trusted.
Measured alongside
SigRank doesn't measure tokens in a vacuum — it builds on a small ecosystem of token-usage tools, and ranks the architecture of the cascade on top of what they count. Credit where it's due:
Independent project — not affiliated with or endorsed by the tools above; names belong to their authors. Token counts only.
Read the full write-up on the index-refinement page.
The nine classes
Nine cascade classes, from TRANSMITTER down. The ranges shown are qualitative cuts; exact breakpoints are still calibrating as the leaderboard fills, so class assignments may shift.
Nine classes. One ladder.
Your class is identity. Your rank is position. Class is assigned from your cascade SNR and 10xDEV — whichever is more restrictive wins. TRANSMITTER is rare on purpose.
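The "more restrictive wins" rule amounts to taking the lower of two tier indices. In this minimal sketch the cut values are invented placeholders, since the real breakpoints are proprietary and still calibrating. Numbering the tiers 0 to 8 with 8 at the top is also an assumption of the sketch.

```python
from bisect import bisect_right
from typing import Sequence


def tier(value: float, cuts: Sequence[float]) -> int:
    """Number of ascending cuts the value reaches: 0 = bottom tier."""
    return bisect_right(cuts, value)


def assign_class(snr: float, ten_x_dev: float, snr_cuts: Sequence[float], dev_cuts: Sequence[float]) -> int:
    """Nine-tier class index: the more restrictive (lower) of the SNR and 10xDEV tiers."""
    return min(tier(snr, snr_cuts), tier(ten_x_dev, dev_cuts))


# Hypothetical, invented cuts; eight ascending cuts give nine tiers (0..8).
SNR_CUTS = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60]
DEV_CUTS = [0.25, 0.50, 0.75, 1.00, 1.50, 2.00, 2.50, 3.00]
print(assign_class(0.45, 1.35, SNR_CUTS, DEV_CUTS))
```

With these placeholder cuts the SNR alone would reach tier 6, but the 10xDEV reaches only tier 4, so the operator lands in tier 4.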
SigRank runs on MO§ES™ — the Modus Operandi §ignal Scaling Expansion System. It's a governance framework that came out of a published conservation law for language. This section covers where it came from, what the law says, what the evidence shows, how governance works inside it, who it's for, and what we're building on top of it.
Where this came from
The founder studied sociology and history at SUNY Geneseo, UB, and University of Hawaii at Hilo. Ran Pacific Northwest operations for Invisible Children. Held board seats at KEDS (2006–2008) and Horizon Health Services (2012–2018). Different world, but the same question underneath: how do you keep commitment intact when it passes through a lot of hands?
Then DJMP Inc. — a Buffalo contracting operation, started in 2011, taken from zero to $1M/yr with a team of 40. Projects ranging $10k–$500k. Real operations, real governance, real consequences when things drift.
Running governed AI across that operation, something was missing. The leaderboards measured the models. Nobody measured the operator — the person actually steering the AI, making the calls, deciding what to keep and what to cut. The augmentation layer was invisible. So the founder built a way to measure it, found a conservation law underneath it, published the law, patented the enforcement architecture, and ran it against the field. That's where SigRank and MO§ES™ came from — not from a market thesis, from an operational gap.
The Conservation Law of Commitment
C(T(S)) ≈ C(S) with enforcement; C(T(S)) < C(S) without it.
In plain terms: when you transform a piece of language — compress it, translate it, summarize it, rewrite it — the commitment content (the obligations, prohibitions, and modal constraints: “shall,” “must not,” “unless,” “is entitled to”) either survives or it doesn't. With an enforcement gate in the transformation pipeline, it survives. Without one, it decays. This isn't a guideline or a best practice — it's a measurable property of language under compression, and it's falsifiable.
The law is published under CC-BY-4.0 (DOI: 10.5281/zenodo.20029607). The enforcement architecture (MO§ES™) is patent-pending. The law itself is open.
What the evidence shows
Seven experiments (EXP-001 through EXP-007) tested the law on a 20-signal canonical corpus, running 10 recursive iterations each, using bidirectional NLI entailment and Jaccard surface stability as oracles. Three results worth pulling out:
- EXP-003: 13 of 20 signals held NLI bidirectional entailment = 1.00 across all 10 iterations under the gate. That's invariance under recursion, not a tautology.
- EXP-006: Only 2 of 4 paper claims survived self-referential recursion. The harness fails when commitment structure isn't robust — which is the point. The law is falsifiable and the experiments can break it.
- EXP-007: An NP-negation probe separated semantic commitment from lexical surface form. Jaccard degraded while NLI held — the commitment survived even when the surface words changed.
Separately, a 5-phase architecture stress test measured 80–85% structural coherence across a four-module system. Standard probability says four independent modules in series, each at 80% standalone viability, should produce ~41% system viability (0.8×0.8×0.8×0.8 = 0.4096). The governance layer overturned that expectation: the combined system held at 80–85% instead of falling to ~41%.
Full experimental record: DOI: 10.5281/zenodo.19105225
Governance in the action path
Most approaches to AI governance sit outside the model: firewalls that drop packets after the logic has already been corrupted, sandboxes that box in an agent that still hallucinates inside the box, post-hoc audits that tell you how you were breached after the damage is done. Every one of them patches after the fact.
MO§ES™ doesn't build a better cage. It governs from inside the loop — in the execution path, not before it, not after it. The enforcement gate sits where the transformation happens. Commitment that passes through the gate survives. Commitment that doesn't, doesn't. The conservation law is what makes this a property of the system rather than a policy someone has to remember to follow.
The practical difference: a violation doesn't kill the workflow. The loop realigns to its original parameters and steers back. The agent stays fluid, bound to intent, instead of dying in a dead end or looping indefinitely inside a sandbox that only secures the perimeter.
Who this is for
SigRank measures the operator, not the model. If you're the person steering the AI — deciding what to keep, what to cut, what to ask next — the board is about you. A few groups who get something specific out of it:
- Builders and developers — see how your AI-assisted workflow actually performs. The token cascade shows whether you're burning tokens or compounding them. Compare against the field instead of guessing.
- Creators and writers — measure augmentation efficiency, not just output volume. The cascade reveals whether the AI is helping you think or just generating text. The four pillars separate signal from noise.
- Students and researchers — benchmark your AI collaboration patterns against established operators. See what efficient operator-AI interaction looks like, with real numbers behind it.
- Enterprise teams — quantify operator effectiveness for hiring, training, and tooling decisions. The board is an objective surface, not a self-reported one. Signed snapshots mean the numbers are verifiable.
In the SIGNOMY layer, agents carry provenance and build trust. SigRank is where that provenance starts — the operator's measured record becomes portable.
What we're building on it
The law is the substrate. MO§ES™ is the enforcement architecture. On top of that, a stack of products — each one a different surface for the same gate:
- AQUA — application workflow tooling with reusable submission memory. Answer banks, submission memory, application filling. The workflow layer and the first wedge.
- SigRank — the leaderboard you're looking at. AI operator efficiency, measured by token cascade, verified by signed snapshots. The intelligence layer.
- KA§§A — voice AI runtime that uses commitment kernel caching to cut redundant NLU work in multi-agent flows. In practice: 50s → 6.5s per 5-turn call.
- SIGNOMY (signomy.xyz) — a governed agent marketplace where agents register, build trust, take missions, and carry provenance. The marketplace becomes a constitutional economy rather than a listing board. The top layer — where everything below it becomes operational behavior.
The goal is straightforward: make governance a property of the execution path, not a policy document someone reads once. If the law holds — and the evidence says it does — then commitment survives transformation when the gate is present. Every product above is a different surface for the same gate.
How this connects to the board
Every snapshot on the SigRank leaderboard is ed25519-signed on the operator's device and verified server-side. Token counts only — no message content is ever read or stored. The commitment being conserved is the integrity of the measurement itself: what the operator measured is what the board records, with no drift in between.
The governance layer maps directly to how the board works: signed operator identity, token counts only, leaderboard ranking, platform-agnostic collection, ed25519 verification, and a public board with open data. The leaderboard works because the data passed through the gate — not because someone reviewed it after the fact.
More at mos2es.com · benchmarks