Visualizing the trade-off between raw capability (IQ Index) and cost-efficiency (Value Index). The ideal model sits in the top-right quadrant.
Premium frontier models
The sweet spot: Smart AND affordable
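For readers who want to reproduce the quadrant view, here is a minimal matplotlib sketch. The model names and coordinates are made-up placeholders; axis assignments follow the description above (capability vertical, cost-efficiency horizontal).

```python
import matplotlib.pyplot as plt

# Made-up example points: (IQ Index, Value Index) per model.
models = {
    "Premium frontier": (90, 1.0),
    "Sweet spot":       (82, 4.0),
    "Budget":           (55, 4.5),
}

fig, ax = plt.subplots()
for name, (iq, value) in models.items():
    ax.scatter(value, iq)
    ax.annotate(name, (value, iq), textcoords="offset points", xytext=(5, 5))

# Dashed median lines split the plane into four quadrants; the ideal
# model lands top-right (high IQ and high Value).
values = sorted(v for _, v in models.values())
iqs = sorted(iq for iq, _ in models.values())
ax.axvline(values[len(values) // 2], linestyle="--", linewidth=0.8)
ax.axhline(iqs[len(iqs) // 2], linestyle="--", linewidth=0.8)
ax.set_xlabel("Value Index (cost-efficiency)")
ax.set_ylabel("IQ Index (raw capability)")
plt.show()
```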
Which companies are leading based on their models in the Global Top 10? Each company's score is the sum of the Unified scores of its models that made the cut.
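A minimal sketch of that aggregation, assuming each Top 10 entry is a dict carrying a precomputed unified score plus company and nation labels (these field names are hypothetical):

```python
# Sum Unified scores per group across the models in the global Top 10.
# Works for companies here and for nations further below; the field
# names ("company", "nation", "unified") are illustrative assumptions.
from collections import defaultdict

def group_scores(top10: list[dict], key: str) -> dict[str, float]:
    totals: defaultdict[str, float] = defaultdict(float)
    for model in top10:
        totals[model[key]] += model["unified"]
    # Sort highest total first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```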
Weighted blend emphasizing capability (90%) over cost efficiency (10%).
AvgIQ and Value are normalized within the cohort so every run is self-balancing.
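As a sketch, with min–max normalization standing in for whatever normalizer the pipeline actually uses (variable names are assumptions):

```python
# Normalize both signals within the cohort, then blend 90% capability
# with 10% cost efficiency. Names and the min-max choice are assumptions.
def minmax(xs: list[float]) -> list[float]:
    lo, hi = min(xs), max(xs)
    if hi == lo:                 # degenerate cohort: everyone ties
        return [0.5] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def unified(avg_iq: list[float], value: list[float]) -> list[float]:
    iq_n, val_n = minmax(avg_iq), minmax(value)
    return [0.9 * i + 0.1 * v for i, v in zip(iq_n, val_n)]
```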
Pass 1 uses a participation-weighted average across every benchmark to pick the Initial Top 10. Pass 2 keeps only benchmarks reported by at least 8 of those 10 models (the qualified set) and flat-averages them for the entire cohort.
This stops a model from being penalized for skipping an easy-win benchmark everyone else posted: we compare models only on tests the top tier actually shares.
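A sketch of the two passes, assuming scores maps each model to its reported (already normalized) benchmark scores, and interpreting "participation-weighted" as weighting each benchmark by how many models report it, which is an assumption about the exact scheme:

```python
# Pass 1: participation-weighted average over every benchmark picks an
# Initial Top 10. Pass 2: keep only benchmarks reported by >= 8 of those
# 10 models, then flat-average them for the whole cohort.
def avg_iq(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    # How many models reported each benchmark (the participation weight).
    counts: dict[str, int] = {}
    for reported in scores.values():
        for bench in reported:
            counts[bench] = counts.get(bench, 0) + 1

    def weighted_avg(reported: dict[str, float]) -> float:
        total = sum(counts[b] for b in reported)
        return sum(counts[b] * s for b, s in reported.items()) / total

    # Pass 1: Initial Top 10 by participation-weighted average.
    top10 = sorted(scores, key=lambda m: weighted_avg(scores[m]),
                   reverse=True)[:10]

    # Pass 2: qualified set = benchmarks reported by >= 8 of the Top 10.
    qualified = {b for b in counts
                 if sum(b in scores[m] for m in top10) >= 8}

    def flat_avg(reported: dict[str, float]) -> float:
        shared = [s for b, s in reported.items() if b in qualified]
        return sum(shared) / len(shared) if shared else 0.0

    # Flat (unweighted) average over the qualified set, entire cohort.
    return {m: flat_avg(r) for m, r in scores.items()}
```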
Percentage benchmarks (GPQA, MMMU-Pro, etc.) use their raw 0–100 score directly. Non-percentage benchmarks with a known range (CodeArena Elo 1000–2000) use that range. Anything else falls back to cohort min–max. Category aggregates (Reasoning, Math, Coding, …) are excluded to prevent double-counting.
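A sketch of that per-benchmark normalization; the range table and tagging sets below are illustrative stand-ins for however benchmarks are actually classified:

```python
# Normalize one benchmark's scores to 0-1. The range table and the
# tag sets are assumptions about the schema, not the real pipeline.
KNOWN_RANGES = {"CodeArena Elo": (1000.0, 2000.0)}
PERCENTAGE = {"GPQA", "MMMU-Pro"}             # raw 0-100 scores
AGGREGATES = {"Reasoning", "Math", "Coding"}  # excluded: double-counting

def normalize(bench: str, raw: dict[str, float]) -> dict[str, float] | None:
    if bench in AGGREGATES:
        return None                       # category aggregates are dropped
    if bench in PERCENTAGE:
        lo, hi = 0.0, 100.0
    elif bench in KNOWN_RANGES:
        lo, hi = KNOWN_RANGES[bench]
    else:                                 # fallback: cohort min-max
        lo, hi = min(raw.values()), max(raw.values())
    span = (hi - lo) or 1.0
    return {model: (s - lo) / span for model, s in raw.items()}
```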
Value = AvgIQ ÷ (Input $/M + Output $/M). Lower cost with higher capability increases Value.
Value is also normalized across the cohort before contributing to Unified.
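In code, under the same hypothetical naming:

```python
# Value = AvgIQ / (input + output price, $ per million tokens),
# then min-max normalized across the cohort before entering Unified.
def value_scores(avg_iq: dict[str, float],
                 prices: dict[str, tuple[float, float]]) -> dict[str, float]:
    raw = {m: avg_iq[m] / (inp + out) for m, (inp, out) in prices.items()}
    lo, hi = min(raw.values()), max(raw.values())
    span = (hi - lo) or 1.0
    return {m: (v - lo) / span for m, v in raw.items()}
```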
For USA and China, we sum the Unified scores of that nation's models inside the global Top 10.
This rewards both peak performance and depth at the top of the table.
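This is the same aggregation as the company leaderboard, reusing the group_scores sketch from above with the nation field:

```python
# e.g. {"USA": 4.2, "China": 3.7} -- illustrative numbers only.
country_totals = group_scores(top10, "nation")
```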
Not affiliated with any model provider. No guarantees are made about correctness, completeness, or fitness for any purpose.