Calibration & Track Record

Reliability, Brier decomposition, track record, and AI-vs-benchmark performance

Why the VNX ensemble — selective by design

Overall Brier (lower is better)

Reliability (lower is better)

Resolution (higher is better)

Mean CLV (signal-time proxy)

Predictions (resolved)

Brier vs Market — Pooled (Methodology / Transparency)

This panel is shown for transparency and is not the headline result. It pools every prediction, including the hedged band where the ensemble deliberately abstains, so it understates directional edge. The confident-band read is the value story (see top of page).
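To make the pooled versus confident-band distinction concrete, here is a minimal sketch that scores a model and a market on the same resolved questions, once over all predictions and once with the hedged band excluded. The function names, the input layout, and the hedged band of 0.40 to 0.60 are assumptions for illustration; the page does not state the band the ensemble actually uses.

```python
def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_vs_market(model, market, outcomes, hedge_band=(0.4, 0.6)):
    """Compare model and market Brier scores, pooled and confident-band only.

    hedge_band is an ASSUMED abstention range on the model probability;
    predictions inside it are dropped from the confident-band read.
    """
    lo, hi = hedge_band
    rows = list(zip(model, market, outcomes))
    confident = [r for r in rows if not (lo <= r[0] <= hi)]

    def summarize(subset):
        if not subset:
            return None  # nothing resolved in this slice yet
        m, k, o = zip(*subset)
        return {"n": len(subset), "model": brier(m, o), "market": brier(k, o)}

    return {"pooled": summarize(rows), "confident": summarize(confident)}


# Toy data, not real track-record numbers.
result = brier_vs_market(
    model=[0.9, 0.5, 0.1, 0.5],
    market=[0.7, 0.5, 0.3, 0.6],
    outcomes=[1, 1, 0, 0],
)
```

In this toy input the two hedged predictions dilute the gap between model and market in the pooled row, which is the effect the note above describes.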

By Confidence — where the AI commits vs hedges

By Horizon — calibration across resolution timescales

Platform Calibration — AIA Forecaster Reliability

Brier Decomposition (Murphy)

The Brier decomposition requires resolved predictions that span multiple probability bins. Data is still accumulating from resolved AIA Forecaster predictions.
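For reference, the Murphy decomposition splits the Brier score into reliability, resolution, and uncertainty, with Brier = reliability - resolution + uncertainty. The sketch below is one way to compute it, assuming ten equal-width probability bins and plain Python lists as input. The function name and the binning are illustrative and are not taken from the dashboard's implementation. The identity is exact only when all forecasts in a bin share the same value; otherwise a small within-bin residual remains.

```python
from collections import defaultdict


def murphy_decomposition(probs, outcomes, n_bins=10):
    """Return (reliability, resolution, uncertainty) for binary outcomes.

    probs    : forecast probabilities in [0, 1]
    outcomes : resolved results, 0 or 1
    n_bins   : number of equal-width bins (an assumed default)
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")

    n = len(probs)
    base_rate = sum(outcomes) / n

    # Group forecasts into equal-width bins; p == 1.0 goes in the last bin.
    bins = defaultdict(list)
    for p, o in zip(probs, outcomes):
        k = min(int(p * n_bins), n_bins - 1)
        bins[k].append((p, o))

    reliability = 0.0
    resolution = 0.0
    for members in bins.values():
        n_k = len(members)
        mean_p = sum(p for p, _ in members) / n_k  # average forecast in bin
        mean_o = sum(o for _, o in members) / n_k  # observed frequency in bin
        reliability += n_k * (mean_p - mean_o) ** 2
        resolution += n_k * (mean_o - base_rate) ** 2

    uncertainty = base_rate * (1.0 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Toy data, not real track-record numbers.
rel, res, unc = murphy_decomposition([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 0])
```

If every forecast lands in a single bin, the observed frequency in that bin equals the base rate, so resolution is zero, which is why the panel needs predictions spread across several bins before it can show anything useful.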

Wealth-Replay — third-Kelly bankroll over the calibration window

Le 4-Component Decomp — where the residual variance lives

Skill vs Luck — bootstrap decomposition of track-record edge

Worst Calls

All-time worst calls (date filter not applied)

Market	Predicted %	Actual	Mistake Type	Severity
No resolved predictions with significant error