About This Project

Why we built it, what we assume, and how scoring works

What this site is

A leaderboard that ranks the world's best AI models by both intelligence and cost efficiency—then shows you which country is winning.

Use it to compare models side by side, find the best value for your use case, and see where the US-China AI race stands as of the latest data snapshot.

Why we built it

AI is a race, one that increasingly looks like a two-way contest between the United States and China. In a fast-moving landscape, debates tend to collapse into benchmark cherry-picking or price-only arguments. We wanted a simple, repeatable way to answer two questions: “How strong is a model?” and “How much leverage do you get per dollar?”

This site turns that race into something you can track: who has the most capable models, who has the best economics, and who can place more models into the global Top 10.

🇺🇸 USA

Frontier capability, research leadership, and premium performance.

🇨🇳 China

Scaling, iteration speed, and cost-efficient deployment at volume.

The dashboard stays explicit about assumptions. If you disagree, you can still use the raw IQ and Value components directly.

Scoring model

Unified Score (0–1000)

Capability-focused (90%) with value efficiency (10%). Capability is computed via a two-pass scoring pipeline so that models aren’t penalised for skipping benchmarks their peers chose to publish.

Unified = 10 × (0.9 × norm(AvgIQ) + 0.1 × norm(Value))
  • Pass 1 — Initial Top 10: participation-weighted average across every benchmark to identify the leading 10 models in the cohort.
  • Pass 2 — Qualified rescoring: the qualified set consists of the benchmarks reported by at least 8 of the Initial Top 10 models. Every model in the cohort is then rescored as a flat average (not participation-weighted) over the qualified set.
  • Per-benchmark normalization: Percentage benchmarks (xx.x%) use their raw 0–100 score directly. Non-percentage benchmarks with a documented range (e.g. CodeArena Elo 1000–2000) use that range. Anything else falls back to cohort min–max. Category aggregates (Reasoning, Math, Coding, …) are excluded entirely.
  • Value: AvgIQ / (Input $/M + Output $/M); both AvgIQ and Value are cohort-normalized to 0–100 before blending into Unified.
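The per-benchmark normalization rules above can be sketched in Python as follows. This is a minimal illustration, not the site's actual code: the function name, the DOCUMENTED_RANGES table, and the handling of a zero-width range are assumptions made for the example. Only the CodeArena range (1000–2000) comes from the text.

```python
# Hypothetical lookup of documented ranges; only the CodeArena entry is from the text.
DOCUMENTED_RANGES = {"CodeArena": (1000.0, 2000.0)}


def normalize_benchmark(
    name: str,
    raw: float,
    is_percentage: bool,
    cohort_scores: list[float],
) -> float:
    """Map one raw benchmark score onto a 0-100 scale."""
    if is_percentage:
        # Percentage benchmarks use their raw 0-100 score directly.
        return raw
    if name in DOCUMENTED_RANGES:
        # Non-percentage benchmark with a documented range.
        lo, hi = DOCUMENTED_RANGES[name]
    else:
        # Anything else falls back to the cohort's observed min and max.
        lo, hi = min(cohort_scores), max(cohort_scores)
    if hi == lo:
        # Assumption: a zero-width range maps to the midpoint.
        return 50.0
    return 100.0 * (raw - lo) / (hi - lo)
```

For example, a CodeArena Elo of 1500 lands at 50 on the 0–100 scale, while a percentage score such as 87.5% passes through unchanged.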

National score (USA vs China)

National totals sum the Unified Scores for models from each nation in the combined leaderboard (sorted by Unified). This rewards both peak performance and depth of high-performing models.

Eligibility rule
Only models that appear in the global Top 10 contribute to a country’s national total.
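As a rough sketch, the national total with the Top 10 eligibility rule could be computed like this. The record layout (dicts with "country" and "unified" keys) and the function name are assumptions for illustration, and ties at the cutoff are broken by input order here, which the text does not specify.

```python
from collections import defaultdict


def national_totals(models: list[dict], top_n: int = 10) -> dict[str, float]:
    """Sum Unified Scores per country over the global Top N only."""
    # Combined leaderboard, sorted by Unified Score (highest first).
    ranked = sorted(models, key=lambda m: m["unified"], reverse=True)
    totals: dict[str, float] = defaultdict(float)
    for m in ranked[:top_n]:
        # Models outside the Top N contribute nothing to their country.
        totals[m["country"]] += m["unified"]
    return dict(totals)
```

Because only Top 10 entries count, a country gains both by placing more models in the Top 10 and by having those models score higher.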

Key assumptions

1) Intelligence is non-negotiable

Cheap output is not “value” if the model cannot solve hard problems.

2) Value is a multiplier, not a substitute

A strong model becomes more powerful when it’s also affordable to deploy at scale.

3) Two-pass scoring keeps comparisons honest

Pass 1 picks the Initial Top 10 with participation weighting; Pass 2 then rescores every model as a flat average over only the benchmarks that at least 8 of those 10 actually reported. Sparse benchmarks (fewer than 4 cohort models reporting) are dropped entirely.
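A minimal Python sketch of the two passes is shown below. Several details are assumptions, since the text does not pin them down: "participation-weighted" is read as weighting each benchmark by the number of cohort models reporting it, Pass 2 averages over the qualified benchmarks a model actually reports, and a model with no qualified scores gets 0. Scores are assumed to be already normalized to 0–100, and the sparse-benchmark filter is left out for brevity. All function names are hypothetical.

```python
Scores = dict[str, dict[str, float]]  # model -> benchmark -> normalized 0-100 score


def participation(scores: Scores) -> dict[str, int]:
    """Count how many models report each benchmark."""
    counts: dict[str, int] = {}
    for reported in scores.values():
        for bench in reported:
            counts[bench] = counts.get(bench, 0) + 1
    return counts


def pass1_top(scores: Scores, top_n: int = 10) -> list[str]:
    """Pass 1: rank by participation-weighted average, keep the leaders."""
    counts = participation(scores)

    def weighted_avg(reported: dict[str, float]) -> float:
        # Assumption: each benchmark's weight is its cohort participation count.
        total_w = sum(counts[b] for b in reported)
        if total_w == 0:
            return 0.0
        return sum(counts[b] * s for b, s in reported.items()) / total_w

    ranked = sorted(scores, key=lambda m: weighted_avg(scores[m]), reverse=True)
    return ranked[:top_n]


def qualified_set(scores: Scores, leaders: list[str], min_reporting: int = 8) -> set[str]:
    """Benchmarks reported by at least `min_reporting` of the Pass 1 leaders."""
    counts = participation({m: scores[m] for m in leaders})
    return {b for b, n in counts.items() if n >= min_reporting}


def pass2_avg_iq(scores: Scores, qualified: set[str]) -> dict[str, float]:
    """Pass 2: flat (unweighted) average over the qualified benchmarks."""
    out: dict[str, float] = {}
    for model, reported in scores.items():
        vals = [s for b, s in reported.items() if b in qualified]
        # Assumption: models with no qualified scores get 0.
        out[model] = sum(vals) / len(vals) if vals else 0.0
    return out
```

Typical use chains the three steps: leaders = pass1_top(scores), then qualified = qualified_set(scores, leaders), then avg_iq = pass2_avg_iq(scores, qualified).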

4) Pricing uses total cost

Value divides AvgIQ by total cost (Input $/M + Output $/M) and is then linearly normalized across the cohort before it enters the Unified blend.
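The Value and Unified formulas can be sketched as below. The text says only that AvgIQ and Value are cohort-normalized to 0–100 with linear normalization; min–max scaling is an assumption here, as are the function names and the treatment of a cohort where every model has the same value. Prices are per million tokens and assumed to be non-zero.

```python
def min_max_0_100(values: dict[str, float]) -> dict[str, float]:
    """Linearly rescale a cohort's values to 0-100 (assumed min-max scaling)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Assumption: an all-equal cohort maps to 100 for every model.
        return {k: 100.0 for k in values}
    return {k: 100.0 * (v - lo) / (hi - lo) for k, v in values.items()}


def unified_scores(
    avg_iq: dict[str, float],
    input_price: dict[str, float],
    output_price: dict[str, float],
) -> dict[str, float]:
    """Unified = 10 * (0.9 * norm(AvgIQ) + 0.1 * norm(Value)), on a 0-1000 scale."""
    # Value: AvgIQ per dollar of total cost (Input $/M + Output $/M).
    value = {m: avg_iq[m] / (input_price[m] + output_price[m]) for m in avg_iq}
    iq_n = min_max_0_100(avg_iq)
    val_n = min_max_0_100(value)
    return {m: 10.0 * (0.9 * iq_n[m] + 0.1 * val_n[m]) for m in avg_iq}
```

With a 90/10 split, the cheapest model in a cohort can gain at most 100 of the 1000 points from price alone, which is how the blend keeps capability dominant.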

5) “All” views are for context

The headline national scores are based on the Top 10 filter, by design.

6) This is a snapshot, not a truth oracle

Benchmarks and pricing shift. Treat rankings as a periodic audit, not a permanent leaderboard.

Benchmarks included

Only benchmarks reported by at least 2 models are included. All 18 active benchmarks, sorted by participation count (shown in parentheses):

CodeArena (20), GPQA (17), AIME 2025 (16), SWE-Bench Verified (16), TerminalBench (10), HLE (9), MMLU (8), BrowseComp (8), ARC-AGI v2 (6), Toolathlon (5), TAU2 Retail (5), CharXiv-R (3), MMMU-Pro (3), ScreenSpot Pro (3), SimpleQA (3), MCP Atlas (3), OSWorld (2), MMMU (2)

Data notes

Scores and pricing are sourced from each model's linked reference pages. Vendor pricing can change quickly; this is why the dashboard is presented as an audit at a point in time.

Not affiliated with any model provider. No guarantees are made about correctness, completeness, or fitness for any purpose.