Model performance

Walk-forward backtest: each match is predicted from the ratings as they stood before it, and only then are the ratings updated. Scoring is out-of-sample from 2015 onward and counts only matches where both players had ≥20 prior matches. 102,388 matches scored.
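The predict-then-update ordering can be sketched as follows. This is a minimal illustration, not the shipped model: it assumes a plain Elo rating with a hypothetical K-factor of 32 and a starting rating of 1500. The function name `walk_forward`, the `(year, winner, loser)` tuple layout, and the parameters `min_prior` and `score_from` are all made up for the example.

```python
from collections import defaultdict


def expected_score(r_a, r_b):
    """Standard Elo win probability for a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def walk_forward(matches, k=32.0, start=1500.0, min_prior=20, score_from=2015):
    """Predict each match from pre-match ratings, then update the ratings.

    matches: iterable of (year, winner, loser) in chronological order.
    Returns the probability the model gave to the eventual winner, for
    matches from `score_from` onward where both players already had at
    least `min_prior` earlier matches. K-factor and start rating are
    assumed values, not taken from the source.
    """
    ratings = defaultdict(lambda: start)
    played = defaultdict(int)
    scored = []
    for year, winner, loser in matches:
        # 1. Predict using only information available before the match.
        p_win = expected_score(ratings[winner], ratings[loser])
        eligible = played[winner] >= min_prior and played[loser] >= min_prior
        if year >= score_from and eligible:
            scored.append(p_win)
        # 2. Only now let the result move the ratings.
        delta = k * (1.0 - p_win)
        ratings[winner] += delta
        ratings[loser] -= delta
        played[winner] += 1
        played[loser] += 1
    return scored
```

Earlier matches still update the ratings even when they are not scored, so the ratings are already warmed up when scoring begins.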

Brier score: 0.2147 (lower is better; 0.25 = coin flip)

Accuracy: 65.3% (favourite called correctly)

Log loss: 0.6174 (lower is better)
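For reference, the three headline metrics can be computed as in the sketch below. It assumes the standard binary definitions (the page does not spell out its formulas), with `probs` holding the predicted win probability for one designated player per match and `outcomes` holding 1 if that player won, else 0. The function names are illustrative.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and the 0/1 result."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def accuracy(probs, outcomes):
    """Share of matches where the side given more than 50% actually won."""
    hits = sum((p > 0.5) == (y == 1) for p, y in zip(probs, outcomes))
    return hits / len(probs)
```

A forecast of 0.5 for every match gives a Brier score of exactly 0.25 and a log loss of ln 2 ≈ 0.6931, whatever the results. These are the coin-flip reference points.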

By model

Model	Brier	Log loss	Accuracy
Ensemble (shipped)	0.2147	0.6174	65.3%
Glicko-only	0.2148	0.6177	65.4%
Elo-only	0.2230	0.6362	63.4%
Surface-blended	0.2207	0.6309	64.1%
Baseline (coin flip)	0.2500	0.6931	49.8%

Tuned ensemble: 90% Glicko / 10% Elo (temperature 1.8).
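The page does not say how the two models are combined or where the temperature enters. As one plausible reading, the sketch below assumes the 90/10 weights are applied to log-odds and that the blended log-odds are divided by the temperature. Both choices, and the names `blend`, `w_glicko`, and `temperature`, are assumptions for illustration only.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def blend(p_glicko, p_elo, w_glicko=0.9, temperature=1.8):
    """Weighted log-odds blend with temperature scaling.

    ASSUMPTION: weights act on log-odds and the temperature divides the
    blended log-odds; the source only states the weights and the value 1.8.
    """
    z = w_glicko * logit(p_glicko) + (1.0 - w_glicko) * logit(p_elo)
    return sigmoid(z / temperature)
```

Under this reading, a temperature above 1 pulls every forecast toward 50%, and swapping the two players yields the complementary probability.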

Brier by surface

Hard 0.2131 (n=58,443)
Clay 0.2163 (n=37,166)
Grass 0.2204 (n=6,779)

Calibration

When the model says X%, players actually win ≈X%.

[Calibration chart; values as extracted, in page order: 16%, 25%, 35%, 45%, 44%, 55%, 53%, 65%, 63%, 75%, 73%, 84%, 83%, 93%, 92%]
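A calibration check of this kind is typically built by bucketing forecasts and comparing each bucket's average forecast with its observed win rate. The sketch below assumes ten equal-width bins. The bin count and the name `calibration_table` are illustrative, not taken from the page.

```python
def calibration_table(probs, outcomes, n_bins=10):
    """Bucket forecasts into equal-width bins and summarise each non-empty bin.

    Returns a list of (mean predicted probability, observed win rate, count),
    ordered from the lowest bin to the highest. The bin count is an assumption.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # A forecast of exactly 1.0 goes into the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if not members:
            continue
        n = len(members)
        mean_pred = sum(p for p, _ in members) / n
        win_rate = sum(y for _, y in members) / n
        rows.append((mean_pred, win_rate, n))
    return rows
```

A well-calibrated model produces rows whose first two entries are close to each other in every bin.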

Updated 2026-06-07. Singles only; Elo + Glicko-2 ensemble.