Total Score
0.0
Avg IQ Index 0.0
Avg Value Index 0.0

Leaderboard

Benchmarks Included

Intelligence vs. Value Matrix

Visualizing the trade-off between raw capability (IQ Index) and cost-efficiency (Value Index). The ideal model sits in the top-right quadrant.

USA Models
China Models
• Size = Unified Score
💎 High IQ, Low Value

Premium frontier models

πŸ† High IQ, High Value

The sweet spot: Smart AND affordable

Company Leaderboard

Which companies are leading, based on their models in the global Top 10? Each company's score is the sum of the scores of its models that made the cut.

Scoring Methodology

Unified Power Score (0–1000)

Weighted blend emphasizing capability (90%) over cost efficiency (10%).

Unified = 10 × (0.9 × norm(AvgIQ) + 0.1 × norm(Value))

AvgIQ and Value are each normalized to a 0–100 scale within the cohort (the factor of 10 then gives the 0–1000 range), so every score is relative to the models in the current run.
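A minimal Python sketch of the Unified formula, assuming norm(·) is a linear min–max rescaling to 0–100 within the cohort. The methodology says only that values are normalized within the cohort; min–max is an assumption, chosen because it produces the stated 0–1000 range. The helper names `min_max_0_100` and `unified_scores` are hypothetical, as is giving every model 50 when the whole cohort is tied.

```python
def min_max_0_100(values: list[float]) -> list[float]:
    """Rescale linearly so the cohort minimum maps to 0 and the maximum to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # ASSUMPTION: a fully tied cohort gets the midpoint for every model.
        return [50.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def unified_scores(avg_iq: list[float], value: list[float]) -> list[float]:
    """Unified = 10 * (0.9 * norm(AvgIQ) + 0.1 * norm(Value)), on a 0-1000 scale."""
    iq_n = min_max_0_100(avg_iq)
    val_n = min_max_0_100(value)
    return [10 * (0.9 * i + 0.1 * v) for i, v in zip(iq_n, val_n)]
```

For example, `unified_scores([60, 70, 80], [2.0, 8.0, 4.0])` gives roughly `[0.0, 550.0, 933.3]`: the third model ranks first on capability even though the second has the best Value, which reflects the 90/10 weighting.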

AvgIQ (two-pass)

Pass 1 uses a participation-weighted average across every benchmark to pick the initial Top 10. Pass 2 keeps only the benchmarks reported by at least 8 of those 10 models (the qualified set) and takes a flat, unweighted average over them for every model in the cohort.

This stops a model from being penalised for skipping an easy-win benchmark everyone else posted; we compare models only on tests the top tier actually shares.
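The two-pass procedure can be sketched as follows. The methodology does not define "participation-weighted", so this sketch assumes each benchmark is weighted by how many cohort models report it; it also assumes that in Pass 2 each model is averaged over the qualified benchmarks it actually reports. Input scores are assumed to be already normalized to 0–100, and all function names are hypothetical.

```python
from collections import Counter

# model -> benchmark -> score already normalized to 0-100
Scores = dict[str, dict[str, float]]


def pass1_scores(scores: Scores) -> dict[str, float]:
    """Pass 1: participation-weighted average over every benchmark a model reports."""
    # ASSUMPTION: each benchmark's weight is the number of cohort models reporting it.
    participation = Counter(b for per_model in scores.values() for b in per_model)
    result = {}
    for model, per_model in scores.items():
        total_w = sum(participation[b] for b in per_model)
        weighted = sum(participation[b] * s for b, s in per_model.items())
        result[model] = weighted / total_w if total_w else 0.0
    return result


def two_pass_avg_iq(scores: Scores, top_n: int = 10, min_reports: int = 8) -> dict[str, float]:
    """Pass 2: flat average over benchmarks shared by the initial top models."""
    p1 = pass1_scores(scores)
    initial_top = sorted(p1, key=p1.get, reverse=True)[:top_n]
    # Qualified set: benchmarks reported by at least `min_reports` initial top models.
    counts = Counter(b for m in initial_top for b in scores[m])
    qualified = {b for b, c in counts.items() if c >= min_reports}
    avg_iq = {}
    for model, per_model in scores.items():
        # ASSUMPTION: average over the qualified benchmarks this model reports;
        # a model reporting none of them gets 0.0.
        shared = [s for b, s in per_model.items() if b in qualified]
        avg_iq[model] = sum(shared) / len(shared) if shared else 0.0
    return avg_iq
```

With a three-model cohort `{"A": {"x": 90, "y": 80}, "B": {"x": 70, "y": 60, "z": 100}, "C": {"x": 50}}`, `top_n=2` and `min_reports=2`, the qualified set is `{x, y}`, so B's result on `z`, which only B reports, no longer counts: AvgIQ comes out as A = 85.0, B = 65.0, C = 50.0.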

Percentage benchmarks (GPQA, MMMU-Pro, etc.) use their raw 0–100 score directly. Non-percentage benchmarks with a known range (e.g. CodeArena Elo, 1000–2000) are scaled against that range. All other benchmarks fall back to cohort min–max scaling. Category aggregates (Reasoning, Math, Coding, …) are excluded to prevent double-counting.
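The per-benchmark normalization rules might look like the sketch below. It assumes known-range benchmarks are mapped linearly onto 0–100 (values outside the range are not clamped here) and lists only the benchmark names mentioned above; the lookup tables and `normalize_benchmark` are hypothetical, and a real registry would cover every benchmark.

```python
from typing import Optional

# Hypothetical lookup tables; only names mentioned in the methodology are listed.
PERCENTAGE_BENCHMARKS = {"GPQA", "MMMU-Pro"}
KNOWN_RANGES = {"CodeArena Elo": (1000.0, 2000.0)}
CATEGORY_AGGREGATES = {"Reasoning", "Math", "Coding"}


def normalize_benchmark(name: str, raw: dict[str, float]) -> Optional[dict[str, float]]:
    """Map one benchmark's raw scores (model -> score) onto a 0-100 scale."""
    if name in CATEGORY_AGGREGATES:
        return None  # aggregates are excluded to prevent double-counting
    if name in PERCENTAGE_BENCHMARKS:
        return dict(raw)  # already 0-100, used as-is
    if name in KNOWN_RANGES:
        lo, hi = KNOWN_RANGES[name]  # fixed, published range
    else:
        lo, hi = min(raw.values()), max(raw.values())  # cohort min-max fallback
        if hi == lo:
            # ASSUMPTION: a fully tied cohort maps to the midpoint.
            return {model: 50.0 for model in raw}
    return {model: 100.0 * (v - lo) / (hi - lo) for model, v in raw.items()}
```

Under this sketch, an Elo of 1500 maps to 50.0 and 1800 to 80.0, regardless of which other models are in the cohort.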

Value (efficiency)

Value = AvgIQ ÷ (Input $/M + Output $/M), where $/M is the price in dollars per million tokens. Lower cost or higher capability increases Value.

Value is also normalized across the cohort before contributing to Unified.
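A small sketch of the Value formula, with prices in dollars per million tokens. The function name is hypothetical, and raising an error for a zero combined price (for example, a free model) is an assumption, since the methodology does not say how such models are handled. Normalization across the cohort would be applied afterwards, as with AvgIQ.

```python
def value_index(avg_iq: float, input_per_m: float, output_per_m: float) -> float:
    """Value = AvgIQ / (input $/M + output $/M)."""
    total_price = input_per_m + output_per_m
    if total_price <= 0:
        # ASSUMPTION: zero or negative combined price is rejected rather than scored.
        raise ValueError("combined price must be positive")
    return avg_iq / total_price
```

`value_index(80.0, 2.0, 8.0)` returns `8.0`, while a weaker but cheaper model, `value_index(60.0, 0.5, 1.5)`, returns `30.0`.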

National Score

For USA and China, we sum the Unified scores of that nation’s models inside the global Top 10.

This rewards both peak performance and depth in the top of the table.
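The National Score is a sum of Unified scores over the global Top 10, which can be sketched as a simple group-and-sum. The Company Leaderboard is described the same way, so this sketch assumes it also sums Unified scores. The `RankedModel` record, its field names, and the helper functions are hypothetical; the list passed in is assumed to contain only the Top 10.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RankedModel:
    """Hypothetical record for one model in the global Top 10."""
    name: str
    company: str
    nation: str
    unified: float


def sum_by(top10: list[RankedModel], attr: str) -> dict[str, float]:
    """Sum Unified scores grouped by an attribute such as 'company' or 'nation'."""
    totals: dict[str, float] = defaultdict(float)
    for model in top10:
        totals[getattr(model, attr)] += model.unified
    return dict(totals)


def national_scores(top10: list[RankedModel]) -> dict[str, float]:
    """National Score for USA and China; a nation with no Top 10 models scores 0."""
    by_nation = sum_by(top10, "nation")
    return {nation: by_nation.get(nation, 0.0) for nation in ("USA", "China")}
```

Because scores are summed rather than averaged, a nation with several strong models can outscore one with a single leader, which is the depth the methodology rewards.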

Not affiliated with any model provider. No guarantees are made about correctness, completeness, or fitness for any purpose.