calibration

Track record

Brier score by category and track: editorial (LLM-based reading) and markets (prediction-market-derived). Lower is better: 0.00 is perfect, 0.25 is what a constant 50% forecast scores (a coin flip), and 1.00 is maximally wrong. Only resolved forecasts from active categories are included.

How this is computed

Every forecast targets a category-specific event class with a 14-day horizon. After 14 days the operator adjudicates whether the predicted event class actually occurred (yes/no), or marks it unresolvable if the event class was too vague to grade.
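The resolution lifecycle described above can be sketched as a small data model. This is only an illustration: the names `Forecast`, `Resolution`, `HORIZON`, and `awaiting_adjudication` are hypothetical and do not come from the site's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional

# The 14-day horizon stated in the text.
HORIZON = timedelta(days=14)


class Resolution(Enum):
    """The three outcomes an operator can record (labels are assumed)."""
    YES = "yes"
    NO = "no"
    UNRESOLVABLE = "unresolvable"


@dataclass
class Forecast:
    category: str
    created_at: datetime
    resolution: Optional[Resolution] = None  # None until adjudicated

    @property
    def due_at(self) -> datetime:
        """Moment at which the forecast becomes eligible for adjudication."""
        return self.created_at + HORIZON

    def awaiting_adjudication(self, now: datetime) -> bool:
        """True once the horizon has passed and no outcome is recorded yet."""
        return self.resolution is None and now >= self.due_at


example = Forecast("example-category", datetime(2024, 1, 1, tzinfo=timezone.utc))
print(example.due_at.date())  # 2024-01-15
```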

Editorial forecasts come from a model reading the news pool. Each is scored 0–100, and the probability used for Brier is score / 100.

Markets forecasts come from prediction-market data. The probability used for Brier is the liquidity-weighted average of the matched markets' probabilities at the time the forecast was snapshotted (already in [0, 1]; no scale conversion).
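A liquidity-weighted average of this kind could look like the sketch below. It assumes each matched market is represented as a (probability, liquidity) pair; the function name and the handling of zero total liquidity are assumptions, not the site's documented behavior.

```python
from typing import Sequence, Tuple


def liquidity_weighted_probability(
    markets: Sequence[Tuple[float, float]],
) -> float:
    """Average market probabilities, weighting each by its liquidity.

    `markets` holds (probability in [0, 1], liquidity >= 0) pairs.
    """
    total_liquidity = sum(liquidity for _, liquidity in markets)
    if total_liquidity <= 0:
        # Assumed policy: refuse to produce a probability without any weight.
        raise ValueError("matched markets have no liquidity")
    weighted_sum = sum(p * liquidity for p, liquidity in markets)
    return weighted_sum / total_liquidity


# A thin market at 60% and a market three times as deep at 30%:
# the result lands closer to 30% (about 0.375).
snapshot = [(0.60, 1000.0), (0.30, 3000.0)]
print(liquidity_weighted_probability(snapshot))
```

Weighting by liquidity means a thinly traded market cannot move the snapshot probability as much as a deep one.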

For each resolved forecast, Brier = (p − o)² where p is the forecast probability and o is the realized outcome (1 if yes, 0 if no). Mean Brier is the average over all resolved forecasts in a category × track cell.
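As a minimal sketch of the scoring described above, the following computes per-forecast Brier and the mean per category × track. The record layout (`ResolvedForecast`, `raw_value`) and the category name are hypothetical; only the formulas come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class ResolvedForecast:
    category: str
    track: str        # "editorial" or "markets"
    raw_value: float  # editorial: score 0-100; markets: probability in [0, 1]
    occurred: bool    # operator's yes/no adjudication


def probability(forecast: ResolvedForecast) -> float:
    """Put both tracks on the [0, 1] scale used for Brier."""
    if forecast.track == "editorial":
        return forecast.raw_value / 100.0
    return forecast.raw_value


def brier(p: float, occurred: bool) -> float:
    """Brier = (p - o)^2 with o = 1 for yes and 0 for no."""
    outcome = 1.0 if occurred else 0.0
    return (p - outcome) ** 2


def mean_brier_by_cell(
    forecasts: Iterable[ResolvedForecast],
) -> Dict[Tuple[str, str], float]:
    """Average Brier over the resolved forecasts in each (category, track)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for forecast in forecasts:
        cell = (forecast.category, forecast.track)
        totals[cell] += brier(probability(forecast), forecast.occurred)
        counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}


# A 50% forecast scores 0.25 whichever way the event resolves.
print(brier(0.5, True), brier(0.5, False))  # 0.25 0.25

sample = [
    ResolvedForecast("example-category", "editorial", 80, True),
    ResolvedForecast("example-category", "editorial", 30, False),
    ResolvedForecast("example-category", "markets", 0.5, True),
]
print(mean_brier_by_cell(sample))
```

In the sample, the editorial cell averages roughly (0.04 + 0.09) / 2 = 0.065, while the single 50% markets forecast scores 0.25.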

Unresolvable counts are surfaced separately so an operator can't improve their score by skipping hard cases.
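One way to surface unresolvable counts next to the score is sketched below. The status labels ("yes", "no", "unresolvable") and the `unresolvable_share` field are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def resolution_summary(statuses: Iterable[str]) -> Dict[str, float]:
    """Count graded vs. unresolvable forecasts for one category x track.

    Each status is assumed to be "yes", "no", or "unresolvable".
    """
    counts = Counter(statuses)
    resolved = counts["yes"] + counts["no"]
    unresolvable = counts["unresolvable"]
    adjudicated = resolved + unresolvable
    share = unresolvable / adjudicated if adjudicated else 0.0
    return {
        "resolved": resolved,          # these feed the mean Brier
        "unresolvable": unresolvable,  # reported, never scored
        "unresolvable_share": share,
    }


print(resolution_summary(["yes", "no", "unresolvable", "no"]))
# {'resolved': 3, 'unresolvable': 1, 'unresolvable_share': 0.25}
```

A low mean Brier alongside a high unresolvable share is a signal that the score rests on a small, possibly cherry-picked set of graded forecasts.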