Model performance
Walk-forward backtest: each match is predicted from the ratings as they stood before it, and only then are the ratings updated. Scoring is out-of-sample from 2015 onward and limited to matches where both players had ≥20 prior matches: 102,388 matches scored.
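The predict-then-update order can be sketched as below. This is a minimal illustration using a plain Elo update, not the shipped model: the K-factor of 32, the 1500 starting rating, and the names `elo_expected` and `walk_forward` are assumptions made for the example, and the 2015 cutoff is left out for brevity.

```python
from collections import defaultdict


def elo_expected(r_a, r_b):
    """Standard Elo win probability for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def walk_forward(matches, k=32.0, start=1500.0, min_prior=20):
    """Score chronologically ordered (winner, loser) pairs.

    Returns the pre-match probability assigned to the eventual winner
    for every match where both players had at least `min_prior` earlier
    matches. k and start are illustrative assumptions.
    """
    ratings = defaultdict(lambda: start)
    played = defaultdict(int)
    scored = []
    for winner, loser in matches:
        # 1. Predict from ratings as they stood BEFORE the match.
        p_win = elo_expected(ratings[winner], ratings[loser])
        if played[winner] >= min_prior and played[loser] >= min_prior:
            scored.append(p_win)
        # 2. Only now update the ratings with the observed result.
        delta = k * (1.0 - p_win)
        ratings[winner] += delta
        ratings[loser] -= delta
        played[winner] += 1
        played[loser] += 1
    return scored
```

Because the prediction is recorded before the update, no match ever contributes to its own forecast, which is what keeps the scores out-of-sample.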
- Brier score: 0.2147 (lower is better; 0.25 = coin flip)
- Accuracy: 65.3% (favourite called correctly)
- Log loss: 0.6174 (lower is better)
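For reference, the three headline metrics can be computed as in the sketch below. The function names are illustrative, `probs` is assumed to hold the predicted probability that a given player wins, and `outcomes` is 1 if that player won and 0 otherwise. A constant 0.5 forecast gives a Brier score of 0.25 and a log loss of ln 2 ≈ 0.6931, the coin-flip reference values.

```python
import math


def brier(probs, outcomes):
    """Mean squared difference between forecast and outcome (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def accuracy(probs, outcomes):
    """Share of matches where the side given more than 50% actually won.

    A forecast of exactly 0.5 is counted as a pick against the player
    (an arbitrary tie-break chosen for this sketch).
    """
    hits = sum((p > 0.5) == (y == 1) for p, y in zip(probs, outcomes))
    return hits / len(probs)
```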
By model
| Model | Brier | Log loss | Accuracy |
|---|---|---|---|
| Ensemble (shipped) | 0.2147 | 0.6174 | 65.3% |
| Glicko-only | 0.2148 | 0.6177 | 65.4% |
| Elo-only | 0.2230 | 0.6362 | 63.4% |
| Surface-blended | 0.2207 | 0.6309 | 64.1% |
| Baseline (coin flip) | 0.2500 | 0.6931 | 49.8% |
Tuned ensemble: 90% Glicko / 10% Elo (temperature 1.8).
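One way such a blend could be applied is sketched below. The page gives only the weights and the temperature, so everything else here is an assumption: that the two probabilities are combined in logit space, that the temperature divides the blended logit, and the function names themselves. The shipped ensemble may combine them differently.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))


def blend(p_glicko, p_elo, w_glicko=0.9, temperature=1.8):
    """Weighted blend of two win probabilities with temperature scaling.

    ASSUMPTION: blending happens in logit space and the temperature
    divides the blended logit; the source does not specify either.
    """
    z = w_glicko * logit(p_glicko) + (1.0 - w_glicko) * logit(p_elo)
    return sigmoid(z / temperature)
```

Under these assumptions a temperature above 1 pulls forecasts toward 50%, so `blend(0.8, 0.8)` comes out lower than 0.8 while `blend(0.5, 0.5)` stays at 0.5.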
Brier by surface
- Hard 0.2131 (n=58,443)
- Clay 0.2163 (n=37,166)
- Grass 0.2204 (n=6,779)
Calibration
When the model says X%, players actually win ≈X%.
| Predicted | Actual |
|---|---|
| 7% | 6% |
| 16% | 16% |
| 25% | 25% |
| 35% | 35% |
| 45% | 44% |
| 55% | 53% |
| 65% | 63% |
| 75% | 73% |
| 84% | 83% |
| 93% | 92% |
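A calibration table of this kind can be built by grouping forecasts into probability bins and comparing the mean forecast in each bin with the observed win rate. The sketch below assumes ten equal-width bins, which is consistent with the ten rows shown but is not confirmed by the page; `calibration_table` is an illustrative name.

```python
def calibration_table(probs, outcomes, n_bins=10):
    """Return (mean predicted, observed win rate, count) per non-empty bin.

    Bins are equal-width over [0, 1]; a forecast of exactly 1.0 falls
    into the last bin. The bin count is an assumption of this sketch.
    """
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)
        bins[i][0] += p   # sum of forecasts in the bin
        bins[i][1] += y   # number of wins in the bin
        bins[i][2] += 1   # number of forecasts in the bin
    return [(sp / n, sy / n, n) for sp, sy, n in bins if n > 0]
```

A well-calibrated model produces rows where the first two values are close, as in the table above.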
Updated 2026-06-07. Singles only; Elo + Glicko-2 ensemble.