Calibration & Track Record

Reliability, Brier decomposition, track record, and AI-vs-benchmark performance
Why the VNX ensemble — selective by design
Overall Brier (lower is better)
Reliability (lower is better)
Resolution (higher is better)
Mean CLV (signal-time proxy)
Predictions
Resolved
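This page does not define Mean CLV. A common definition of closing-line value, assumed here, measures how far the market price moved toward the side the model favoured between the signal and the close. The "signal-time proxy" label suggests the entry price is the market price at signal time rather than an executed fill. The sketch below uses that reading. The `Signal` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    # Hypothetical record: model probability, market YES price at signal
    # time (the proxy entry price) and market YES price at close, all in [0, 1].
    model_prob: float
    signal_price: float
    close_price: float


def clv(s: Signal) -> float:
    """Closing-line value for one signal, in probability points.

    Positive when the market moved toward the side the model favoured:
    a YES lean (model_prob > signal_price) gains if the price rose, and a
    NO lean gains if it fell. A signal with no lean contributes 0.
    """
    if s.model_prob > s.signal_price:
        direction = 1.0
    elif s.model_prob < s.signal_price:
        direction = -1.0
    else:
        direction = 0.0
    return direction * (s.close_price - s.signal_price)


def mean_clv(signals: list) -> float:
    # Average CLV across all signals; 0.0 for an empty list.
    if not signals:
        return 0.0
    return sum(clv(s) for s in signals) / len(signals)
```

For example, a YES lean entered at 0.50 that closes at 0.60 and a NO lean entered at 0.40 that closes at 0.30 both score +0.10, so their mean CLV is 0.10.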
Brier vs Market — Pooled (Methodology / Transparency)
Transparency, not the headline. This view pools every prediction, including the hedged band where the ensemble deliberately abstains, so it understates directional edge. The confident-band result carries the value story (see top of page).
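The following sketch shows how pooling can hide edge. It compares the ensemble's Brier score with the market's, first on every prediction and then on a confident band only. The page does not say how it defines the hedged band. Here, a prediction counts as confident when the model probability lies at least a hypothetical `band = 0.15` away from 0.5. The `(model_prob, market_prob, outcome)` row layout is also an assumption.

```python
def brier(probs, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_vs_market(rows, band=0.15):
    """Return (pooled, confident), each a (model_brier, market_brier) pair.

    rows: iterable of (model_prob, market_prob, outcome) tuples.
    band: hypothetical threshold; a row is 'confident' when
          |model_prob - 0.5| >= band. A pair is None if its subset is empty.
    """
    rows = list(rows)
    confident = [r for r in rows if abs(r[0] - 0.5) >= band]

    def pair(subset):
        if not subset:
            return None
        outcomes = [r[2] for r in subset]
        model = brier([r[0] for r in subset], outcomes)
        market = brier([r[1] for r in subset], outcomes)
        return model, market

    return pair(rows), pair(confident)
```

Take four illustrative rows: `(0.8, 0.6, 1)`, `(0.2, 0.4, 0)`, `(0.5, 0.7, 1)` and `(0.5, 0.3, 0)`. The pooled view scores the model at 0.145 and the market at 0.125, so the market looks better. The confident band keeps only the first two rows and scores the model at 0.04 and the market at 0.16.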
By Confidence — where the AI commits vs hedges
By Horizon — calibration across resolution timescales
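Both breakdowns come down to grouping resolved predictions into buckets and scoring each bucket separately. The sketch below covers the horizon view. It assumes horizon means days from prediction to resolution and uses hypothetical bucket edges of 7, 30 and 90 days. The page does not state its actual buckets.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bucket edges in days: [0, 7), [7, 30), [30, 90), [90, inf)
EDGES = [7, 30, 90]
LABELS = ["<7d", "7-30d", "30-90d", ">=90d"]


def brier_by_horizon(rows, edges=EDGES, labels=LABELS):
    """Per-bucket Brier score.

    rows: iterable of (prob, outcome, horizon_days).
    Returns {label: (brier, count)} for non-empty buckets only.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for p, o, days in rows:
        label = labels[bisect_right(edges, days)]  # which interval days falls in
        sums[label] += (p - o) ** 2
        counts[label] += 1
    return {k: (sums[k] / counts[k], counts[k]) for k in labels if counts[k]}
```

To get the By Confidence breakdown, key each row on `abs(prob - 0.5)` instead of `horizon_days` and use confidence edges.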
Platform Calibration — AIA Forecaster Reliability
Brier Decomposition (Murphy)
The Brier decomposition requires resolved predictions spanning multiple probability bins. Data are still accumulating from resolved AIA Forecaster predictions.
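Murphy's decomposition splits the Brier score into three parts: Brier = reliability − resolution + uncertainty. Predictions are grouped into probability bins. Reliability measures the gap between each bin's mean forecast and its observed frequency. Resolution measures how far bin frequencies spread from the base rate. This is why the panel needs several bins: with only one bin, resolution is zero. The sketch below assumes ten equal-width bins, which may differ from what the dashboard uses.

```python
def murphy_decomposition(probs, outcomes, n_bins=10):
    """Return (reliability, resolution, uncertainty) for binary outcomes.

    Forecasts are grouped into n_bins equal-width bins on [0, 1].
    Brier ~= reliability - resolution + uncertainty; the identity is exact
    only when every forecast in a bin has the same value.
    """
    n = len(probs)
    base_rate = sum(outcomes) / n
    bins = {}
    for p, o in zip(probs, outcomes):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins.setdefault(k, []).append((p, o))
    rel = res = 0.0
    for members in bins.values():
        n_k = len(members)
        mean_p = sum(p for p, _ in members) / n_k   # mean forecast in bin
        freq = sum(o for _, o in members) / n_k     # observed frequency in bin
        rel += n_k * (mean_p - freq) ** 2
        res += n_k * (freq - base_rate) ** 2
    unc = base_rate * (1 - base_rate)
    return rel / n, res / n, unc
```

For forecasts `[0.2, 0.2, 0.8, 0.8]` with outcomes `[0, 1, 1, 1]`, the parts are 0.065, 0.0625 and 0.1875. These recombine to the direct Brier score of 0.19.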
Wealth-Replay — third-Kelly bankroll over the calibration window
The 4-Component Decomposition: where the residual variance lives
Skill vs Luck — bootstrap decomposition of track-record edge
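The page does not describe its bootstrap, so this is only one plausible reading of a skill-vs-luck test. For each resolved prediction, compute the model's edge over the market as market Brier minus model Brier. Then resample those edges with replacement and look at the distribution of the mean. If few resamples have a mean edge at or below zero, the edge is less likely to be a small-sample artifact. The function name, the 95% percentile interval and the default `n_boot` are assumptions. The method also treats predictions as independent, which correlated markets violate.

```python
import random


def bootstrap_edge(model_probs, market_probs, outcomes, n_boot=10_000, seed=0):
    """Bootstrap the per-prediction Brier edge of the model over the market.

    edge_i = (market_i - o_i)^2 - (model_i - o_i)^2, positive when the model
    scored better. Returns (mean_edge, (lo, hi) 95% percentile interval,
    share of resamples whose mean edge is <= 0).
    """
    edges = [(m - o) ** 2 - (p - o) ** 2
             for p, m, o in zip(model_probs, market_probs, outcomes)]
    n = len(edges)
    rng = random.Random(seed)  # seeded for reproducibility
    means = sorted(sum(rng.choices(edges, k=n)) / n for _ in range(n_boot))
    lo = means[int(0.025 * n_boot)]
    hi = means[min(n_boot - 1, int(0.975 * n_boot))]
    share_nonpositive = sum(1 for x in means if x <= 0) / n_boot
    return sum(edges) / n, (lo, hi), share_nonpositive
```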
Worst Calls
All-time worst calls (date filter not applied)
Market | Predicted % | Actual | Mistake Type | Severity
No resolved predictions with significant error
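The table's Mistake Type and Severity categories are not defined here, so the sketch below only ranks resolved predictions by squared error. It keeps those above a hypothetical "significant" cutoff of 0.25, the squared error of a coin-flip forecast. The threshold, row layout and function name are all assumptions. An empty result corresponds to the empty-state message above.

```python
def worst_calls(rows, threshold=0.25, limit=10):
    """Rank resolved predictions by squared error, worst first.

    rows: iterable of (market_name, predicted_prob, outcome).
    threshold: hypothetical cutoff for a 'significant' error.
    Returns up to `limit` tuples of
    (market_name, predicted_pct, outcome, squared_error).
    """
    scored = [
        (name, round(100 * p, 1), o, (p - o) ** 2)
        for name, p, o in rows
        if (p - o) ** 2 > threshold
    ]
    scored.sort(key=lambda r: r[3], reverse=True)
    return scored[:limit]
```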