Model performance

Walk-forward backtest: each match is predicted from the ratings as they stood before it, and only then do the ratings update. Scoring is out-of-sample from 2015 onward and covers only matches in which both players had ≥20 prior matches: 102,388 matches scored.
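
The predict-then-update loop can be sketched as below. This is a minimal illustration, not the shipped pipeline: it uses a plain Elo update as a stand-in for the rating systems, and the function names, the K-factor of 32, the 1500 starting rating and the `(year, winner, loser)` input shape are all assumptions made for the example.

```python
from collections import defaultdict


def elo_expected(r_a, r_b):
    """Standard Elo win probability for a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def walk_forward(matches, k=32.0, start_year=2015, min_prior=20):
    """Walk-forward scoring over chronologically sorted (year, winner, loser).

    Returns the pre-match probability the model gave to the eventual winner
    for every match that passes the out-of-sample and experience filters.
    k and the 1500 starting rating are hypothetical values.
    """
    rating = defaultdict(lambda: 1500.0)
    played = defaultdict(int)
    scored = []
    for year, winner, loser in matches:
        # 1. Predict from the ratings as they stood BEFORE this match.
        p_winner = elo_expected(rating[winner], rating[loser])
        # 2. Score only out-of-sample matches between established players.
        if (year >= start_year
                and played[winner] >= min_prior
                and played[loser] >= min_prior):
            scored.append(p_winner)
        # 3. Only now let the result update the ratings and match counts.
        delta = k * (1.0 - p_winner)
        rating[winner] += delta
        rating[loser] -= delta
        played[winner] += 1
        played[loser] += 1
    return scored
```

Because the prediction is taken before step 3, no match ever contributes to its own forecast, which is what makes the scored numbers out-of-sample.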

Brier score: 0.2147 (lower is better · 0.25 = coin flip)
Accuracy: 65.3% (favourite called correctly)
Log loss: 0.6174 (lower is better)
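
For reference, the three headline metrics can be computed as in this sketch, using the standard definitions of Brier score and binary log loss. The function names and the toy inputs are assumptions for illustration; `probs` holds the predicted win probability for one player per match and `outcomes` holds 1 if that player won, else 0.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared error between predicted probability and the 0/1 result."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def accuracy(probs, outcomes):
    """Share of matches where the side given more than 50% actually won."""
    hits = sum((p > 0.5) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(probs)


# A forecaster that always says 50% gives the coin-flip reference values.
coin = [0.5, 0.5, 0.5, 0.5]
results = [1, 0, 1, 1]
print(brier_score(coin, results))         # 0.25
print(round(log_loss(coin, results), 4))  # 0.6931
```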

By model

Model                  Brier    Log loss  Accuracy
Ensemble (shipped)     0.2147   0.6174    65.3%
Glicko-only            0.2148   0.6177    65.4%
Elo-only               0.2230   0.6362    63.4%
Surface-blended        0.2207   0.6309    64.1%
Baseline (coin flip)   0.2500   0.6931    49.8%

Tuned ensemble: 90% Glicko / 10% Elo (temperature 1.8).
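
One way such a blend could be computed is sketched below. Two things are assumptions, since this page does not state them: that the 90/10 weights mix the two win probabilities linearly, and that the temperature divides the log-odds of the blended probability (so a temperature above 1 pulls forecasts toward 50%). The function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_probability(p_glicko, p_elo, w_glicko=0.9, temperature=1.8):
    # ASSUMPTION: linear blend in probability space, 90% Glicko / 10% Elo.
    blended = w_glicko * p_glicko + (1.0 - w_glicko) * p_elo
    # ASSUMPTION: temperature scaling applied to the blended log-odds.
    return sigmoid(logit(blended) / temperature)


# Example: a confident blended forecast is softened toward 50%.
print(ensemble_probability(0.80, 0.70))
```

Under these assumptions the blend stays symmetric: swapping the two players gives the complementary probability.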

Brier by surface

  • Hard 0.2131 (n=58,443)
  • Clay 0.2163 (n=37,166)
  • Grass 0.2204 (n=6,779)
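
As a consistency check, the per-surface figures can be pooled back into the overall number. The Brier score is a per-match mean, so the overall score should equal the match-weighted average of the surface scores, up to rounding in the published four-decimal values. The variable names are illustrative.

```python
# (Brier, n) per surface, copied from the list above.
surfaces = {
    "Hard": (0.2131, 58_443),
    "Clay": (0.2163, 37_166),
    "Grass": (0.2204, 6_779),
}

total_n = sum(n for _, n in surfaces.values())
pooled = sum(brier * n for brier, n in surfaces.values()) / total_n

print(total_n)           # 102388
print(round(pooled, 4))  # 0.2147
```

The surface counts sum to the 102,388 scored matches, and the weighted average matches the headline Brier score of 0.2147.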

Calibration

When the model says X%, players actually win ≈X%.

Predicted   Actual
7%          6%
16%         16%
25%         25%
35%         35%
45%         44%
55%         53%
65%         63%
75%         73%
84%         83%
93%         92%
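
A calibration table of this kind is typically built by bucketing forecasts and comparing the mean forecast in each bucket with the observed win rate. The sketch below assumes ten equal-width bins, which fits the ten rows shown but is not confirmed here; the function name and return shape are hypothetical.

```python
def calibration_table(probs, outcomes, n_bins=10):
    """Return (mean predicted, observed win rate, count) per non-empty bin.

    ASSUMPTION: equal-width bins over [0, 1]; a forecast of exactly 1.0
    falls into the top bin.
    """
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)
        bins[i][0] += p   # running sum of forecasts
        bins[i][1] += y   # running count of wins
        bins[i][2] += 1   # matches in this bin
    return [(p_sum / n, y_sum / n, n) for p_sum, y_sum, n in bins if n]


# Toy usage: two low forecasts that lost, two high forecasts that won.
for predicted, actual, n in calibration_table([0.05, 0.07, 0.93, 0.95],
                                              [0, 0, 1, 1]):
    print(f"{predicted:.0%} predicted, {actual:.0%} actual, n={n}")
```

A well-calibrated model gives rows where the two percentages are close, as in the table above.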

Updated 2026-06-07. Singles only; Elo + Glicko-2 ensemble.