About This Project

Why we built it, what we assume, and how scoring works

What this site is

A leaderboard that ranks the world's best AI models by both intelligence and cost efficiency—then shows you which country is winning.

Use it to: Compare models side-by-side, find the best value for your use case, and see how the US-China AI race is unfolding in real time.

Why we built it

AI is a race—one that increasingly looks like a two-front contest between the United States and China. In a fast-moving landscape, debates tend to collapse into benchmark cherry-picking or price-only arguments. We wanted a simple, repeatable way to answer: “How strong is a model?” and “How much leverage do you get per dollar?”

This site turns that race into something you can track: who has the most capable models, who has the best economics, and who can place more models into the global Top 10.

🇺🇸 USA

Frontier capability, research leadership, and premium performance.

🇨🇳 China

Scaling, iteration speed, and cost-efficient deployment at volume.

The dashboard stays explicit about assumptions. If you disagree, you can still use the raw IQ and Value components directly.

Scoring model

Unified Score (0–1000)

Capability-focused (90%) blended with value efficiency (10%): participation-weighted, benchmark-normalized capability is combined with cost efficiency on a 0–100 scale, then scaled ×10 to give a 0–1000 score.

Unified = 10 × (0.9 × norm(AvgIQ) + 0.1 × norm(Value))
  • Per-benchmark normalization: Each benchmark is min–max normalized (0–100) across the cohort; benchmarks with only one participant are skipped.
  • Participation weighting: Benchmarks are weighted by participation fraction (bench count ÷ max bench count).
  • Value: AvgIQ / (Input $/M + Output $/M); both AvgIQ and Value are cohort-normalized to 0–100 before blending.
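For readers who want to see the arithmetic end to end, here is a minimal Python sketch of the blend described above. The function names, data shapes, and the participation-weight reading (each benchmark weighted by its participant count divided by the highest participant count) are our illustrative assumptions, not the dashboard's actual code.

def min_max_norm(values):
    # Min-max normalize a list of numbers onto 0-100.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]

def unified_scores(models, benchmarks):
    # models: {name: {"scores": {bench: raw}, "in_per_m": float, "out_per_m": float}}
    # benchmarks: {bench: [participating model names]}, each with 2+ participants
    max_part = max(len(p) for p in benchmarks.values())

    # 1) Per-benchmark normalization across the cohort of participants.
    norm = {}
    for bench, participants in benchmarks.items():
        raw = [models[m]["scores"][bench] for m in participants]
        for m, s in zip(participants, min_max_norm(raw)):
            norm[(m, bench)] = s

    # 2) Participation-weighted AvgIQ per model
    #    (weight = participants / max participants; assumed reading of the rule above).
    avg_iq = {}
    for name in models:
        num = den = 0.0
        for bench, participants in benchmarks.items():
            if name in participants:
                w = len(participants) / max_part
                num += w * norm[(name, bench)]
                den += w
        avg_iq[name] = num / den if den else 0.0

    # 3) Value = AvgIQ / (input $/M + output $/M).
    value = {name: avg_iq[name] / (m["in_per_m"] + m["out_per_m"])
             for name, m in models.items()}

    # 4) Cohort-normalize both components to 0-100, blend 90/10, scale x10 -> 0-1000.
    names = list(models)
    iq_n = dict(zip(names, min_max_norm([avg_iq[n] for n in names])))
    val_n = dict(zip(names, min_max_norm([value[n] for n in names])))
    return {n: 10.0 * (0.9 * iq_n[n] + 0.1 * val_n[n]) for n in names}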

National score (USA vs China)

National totals sum the Unified Scores of each nation's models in the combined leaderboard (ranked by Unified Score). This rewards both peak performance and depth of high-performing models.

Eligibility rule
Only models that appear in the global Top 10 contribute to a country’s national total.
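As a sketch under the same assumptions (a nation label per model plus the unified scores from the sketch above), the national total is just a filtered sum over the global Top 10:

def national_totals(unified, nations, top_n=10):
    # unified: {model: Unified Score}; nations: {model: "USA" or "China"}
    # Only models in the global Top 10 (by Unified Score) are eligible.
    top = sorted(unified, key=unified.get, reverse=True)[:top_n]
    totals = {}
    for model in top:
        nation = nations.get(model)
        if nation is not None:
            totals[nation] = totals.get(nation, 0.0) + unified[model]
    return totals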

Key assumptions

1) Intelligence is non-negotiable

Cheap output is not “value” if the model cannot solve hard problems.

2) Value is a multiplier, not a substitute

A strong model becomes more powerful when it’s also affordable to deploy at scale.

3) Benchmarks are normalized and participation-weighted

Each benchmark is min–max normalized; single-participation benchmarks are skipped; weights follow participation fraction.

4) Pricing uses total cost

Value uses total cost (Input + Output) with linear normalization inside the Unified blend.
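As an illustrative example with made-up numbers: a model with an AvgIQ of 80, priced at $3 per million input tokens and $15 per million output tokens, has a raw Value of 80 / (3 + 15) ≈ 4.4; that Value is then normalized to 0–100 against the cohort before it enters the 10% side of the blend.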

5) “All” views are for context

The headline national scores are based on the Top 10 filter, by design.

6) This is a snapshot, not a truth oracle

Benchmarks and pricing shift. Treat rankings as a periodic audit, not a permanent leaderboard.

Benchmarks included

Only benchmarks with at least two participating models are included. All 18 active benchmarks, ordered by participation count:

CodeArena (20), GPQA (17), AIME 2025 (16), SWE-Bench Verified (16), TerminalBench (10), HLE (9), MMLU (8), BrowseComp (8), ARC-AGI v2 (6), Toolathlon (5), TAU2 Retail (5), CharXiv-R (3), MMMU-Pro (3), ScreenSpot Pro (3), SimpleQA (3), MCP Atlas (3), OSWorld (2), MMMU (2)

Data notes

Scores and pricing are sourced from each model's linked reference pages. Vendor pricing can change quickly, which is why the dashboard is presented as a point-in-time audit.

Not affiliated with any model provider. No guarantees are made about correctness, completeness, or fitness for any purpose.