Early access: cascade metrics are real (derived from canonical token telemetry); the operator field is a curated seed.

How We Got Here — Refining the Index

Before live operators populated the board, the Index had to be bracketed and stress-tested against entries we understood completely — all of them our own usage (MO§ES), measured deliberately at the extremes. These are those entries, kept here as the record of how the methodology was tuned and where it's honest about its own limits. They are no longer on the live ranking; this page is where they belong.

1. One operator, two readers — the bracket

ENTRY                 | READING             | Υ YIELD   | 10×DEV | ◌ COMP | NOTE
MO§ES — ccusage read  | min input (1.25M)   | 18,436.98 | 3.31   | 0.969  | most-favorable reading — the ceiling
MO§ES — tokscale read | max input (partial) | 16.24     | 1.76   | 0.218  | own worst-case reading — the floor

Both rows are the same operator's same activity, counted by two different token tools. ccusage reads a small fresh-input footprint (Υ 18,437); tokscale reads a larger, partial input (Υ 16). That is a roughly 1,135× swing (18,436.98 ÷ 16.24) from the reader alone, and it is exactly why we do not lead with the raw multiplier. Υ = cache_read × output ÷ input², so the squared input in the denominator makes Υ hypersensitive to a small input count: halving input alone quadruples Υ. The honest figures are the ones that survive a hostile reading:

  • The rank. On every reader and every axis, the operator is #1. The ordinal does not wobble.
  • The reader-matched 6.3×. Put everyone on the same reader (the volume-style one) and even this operator's worst input reading beats the strongest high-volume operator by 6.3×. That is the courtroom number.
  • 10×DEV. The log view tames the input² blow-up: 1.76 vs the field's best 1.50 — clearly ahead, on a sane scale. The telescoping identity locks the exponent to earned leverage, so it can't be inflated independently.

The lesson the bracket taught: put every operator on one reader. The moment the field is measured the same way, the comparison stops being arguable — and the bracket [16.24 … 18,437] itself becomes the robustness story ("best even at its own worst-case measurement").
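
The input² sensitivity behind the bracket can be checked with a few lines of Python. This is a minimal sketch that uses only the formula as stated on this page; the function name `upsilon` and the token counts passed to it are hypothetical, chosen to illustrate the scaling, and are not MO§ES telemetry. The final step assumes cache_read and output are identical across both readers, which the page does not claim.

```python
def upsilon(cache_read: float, output: float, input_tokens: float) -> float:
    """Υ = cache_read × output ÷ input², as stated on this page."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    return cache_read * output / input_tokens ** 2


# Hypothetical token counts, chosen only to illustrate the scaling.
base = upsilon(cache_read=4_000_000, output=50_000, input_tokens=10_000)
doubled = upsilon(cache_read=4_000_000, output=50_000, input_tokens=20_000)
print(base / doubled)  # 4.0: doubling input alone cuts Υ to a quarter

# Swing between the two published readings of the same activity.
swing = 18_436.98 / 16.24
print(round(swing))  # 1135

# ASSUMPTION: cache_read and output are the same under both readers.
# Under that assumption, the input ratio that would explain the swing
# by itself is the square root of the swing.
print(round(swing ** 0.5, 1))  # 33.7
```

Under that assumption, a roughly 34× disagreement about input between two readers would be enough to move Υ by three orders of magnitude, which is why the reader-matched comparison carries more weight than the raw multiplier.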

2. With memory vs. clean — what the Index credits

ENTRY               | READING              | Υ YIELD  | 10×DEV | ◌ COMP | NOTE
static seed · ✱mem  | WITH memory          | 2,308.80 |        | 0.768  | context reused
static seed · clean | CLEAN (mem stripped) | 1,695.42 |        | 0.723  | same work, no reuse

The two static seeds are the same workload run twice: once with memory/context reuse and once clean (observer stripped). The memory-on run yields about 36% higher Υ (2,308.80 vs. 1,695.42). We staged the pair on purpose: it lets the board show the inflation gap publicly, honesty as a feature, and it confirms the cascade credits the thing that actually makes an operator efficient: reusing prior context (cache_read) instead of re-paying for it every turn.
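
The size of the memory gap follows directly from the two Υ values in the table. A short sketch of the arithmetic, where `relative_uplift` is a hypothetical helper name and not part of the Index ruleset:

```python
def relative_uplift(with_memory: float, clean: float) -> float:
    """Fractional gain of the memory-on run over the clean run."""
    return (with_memory - clean) / clean


# Υ values taken from the two static-seed rows above.
uplift = relative_uplift(2_308.80, 1_695.42)
print(f"{uplift:.1%}")  # 36.2%
```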

Why they're here, not on the board

These entries did their job — they bracketed the scale, exposed its sensitivity to the reader, and confirmed the cascade credits context reuse. Leaving a seed-era self-anchor at the top of a public ranking would misrepresent the live field, so they retire into the record. The live board ranks real operators; the methodology these entries refined is the one now scoring everyone.

Υ = cache_read × output ÷ input². Pillars and yields shown are real measured outputs; the weighting that maps them to Υ is part of the ruleset and stays server-side.