A roughly 4,000-game backtest across NFL, NBA, and EPL closing lines tests three vig-removal methods head to head. The power method outperforms multiplicative on lopsided markets, and both beat additive on calibration across each sport's full sample. The dataset, the Python implementation, and the sport-by-sport CLV breakdown are here.
What 'fair line' means and why it matters
A no-vig fair line is what a sportsbook's closing price implies about the true probability of an outcome, once you remove the margin the book loaded onto both sides. It is not a prediction of what will happen — it is a description of what the betting market, at close, believed was most likely to happen. Closing prices at sharp books are the most information-dense public signal available about the true probability of a sporting outcome, because they reflect every bet placed by every type of bettor up to game time, including the bets placed by accounts the books have identified as consistently profitable.
Why does the distinction between 'price' and 'fair line' matter? Because the price includes the vig, and the vig biases the implied probability upward for both sides. A -110 / -110 market implies 52.38% probability for each side — which sums to 104.76%, not 100%. The 4.76% excess is the vig. To recover the book's underlying probability estimate, you have to redistribute that excess back to the two sides. The method you use to do that redistribution is the modeling decision this article is about.
The dataset: 4,000 games, three sports, 2022-2025
The backtest uses closing moneyline prices from three market types: NFL regular-season and playoff games (2022-2024 seasons, 512 games), NBA regular-season games (2022-23 and 2023-24 seasons, 2,460 games), and EPL matches (2022-23, 2023-24, 2024-25 seasons, 1,140 games including the three-way draw). The primary source is closing prices from a consensus feed that aggregates Pinnacle, Circa, and BetOnline, three books that price competitively and face significant sharp action. Prices are collected at game time or as close to it as the data source allows.
The dataset excludes: games where one team's closing price moved by more than 15 American-odds points in the final two hours (potential late-breaking injury or lineup news not reflected in the model inputs), EPL games with draw prices that moved by more than 20 points in the final hour (often correlated with late team-sheet reveals), and NFL games with pre-game suspensions. After exclusions, the usable sample is 4,062 games — 508 NFL, 2,427 NBA, 1,127 EPL.
EPL analysis uses the home-win moneyline only for comparability, not the three-way draw market. This understates EPL model complexity but allows apples-to-apples comparison with the two-way US sports markets. A proper EPL model would require three-way vig removal, which is covered separately in the Asian handicap explainer on this site.
Three methods, one output
The three standard vig-removal methods — additive, multiplicative, and power — each redistribute the overround differently. Additive splits the overround evenly in absolute terms. Multiplicative scales each side proportionally. Power solves for an exponent k such that the two adjusted probabilities sum to 1. On tight markets (both sides near -110), all three agree within 0.2-0.3 percentage points. On lopsided markets, they diverge substantially — and that divergence is where calibration differences show up in the backtest.
Implementation note: the power method requires a numerical solver. This backtest uses scipy.optimize.brentq with a bracket of (0.5, 2.0), which covers all markets in the sample without needing to expand the search space. The bracket is not universal: favorites priced beyond roughly -5000 would require a wider range, but no such markets appear in this dataset.
Python implementation outline (about 30 lines)
The core function takes two American-odds inputs and returns a dictionary of fair probabilities under each method. American odds are first converted to raw implied probabilities (for negative odds: |odds| / (|odds| + 100); for positive odds: 100 / (odds + 100)). The overround is the sum of the two raw implied probabilities minus 1.0.
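The conversion step described above can be sketched as follows. This is a minimal illustration, not the backtest's own code; the function names `implied_prob` and `overround` are hypothetical.

```python
def implied_prob(american_odds: float) -> float:
    """Raw (vig-inclusive) implied probability from American odds."""
    if american_odds < 0:
        # Negative odds: |odds| / (|odds| + 100)
        return -american_odds / (-american_odds + 100.0)
    # Positive odds: 100 / (odds + 100)
    return 100.0 / (american_odds + 100.0)


def overround(odds_a: float, odds_b: float) -> float:
    """Sum of the two raw implied probabilities minus 1.0."""
    return implied_prob(odds_a) + implied_prob(odds_b) - 1.0


print(round(implied_prob(-110), 4))        # 0.5238
print(round(overround(-110, -110), 4))     # 0.0476
```

The printed values match the -110 / -110 example earlier in the article: 52.38% per side and a 4.76% excess.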
Additive output: subtract (overround / 2) from each raw implied probability. Multiplicative output: divide each raw implied probability by the sum of both raw implied probabilities. Power output: call scipy.optimize.brentq on the objective function f(k) = p_home^k + p_away^k - 1, then return (p_home^k, p_away^k) normalized by their sum.
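A rough sketch of the three outputs is below. To stay dependency-free it swaps in a plain bisection for scipy.optimize.brentq, using the same (0.5, 2.0) bracket; that substitution, the function names, and the choice to return tuples in (home, away) input order are all assumptions of this sketch rather than the backtest's actual code.

```python
def implied_prob(american_odds):
    """Raw implied probability from American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)


def solve_power_exponent(p_a, p_b, lo=0.5, hi=2.0, tol=1e-12):
    """Bisection stand-in for brentq: find k with p_a**k + p_b**k == 1."""
    def f(k):
        return p_a ** k + p_b ** k - 1.0

    if f(lo) * f(hi) > 0:
        raise ValueError("root not bracketed; widen (lo, hi)")
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0:
            hi = mid          # sign change lies in [lo, mid]
        else:
            lo = mid          # sign change lies in [mid, hi]
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0


def fair_probs(home_odds, away_odds):
    """Fair (home, away) probabilities under each vig-removal method."""
    p_home, p_away = implied_prob(home_odds), implied_prob(away_odds)
    total = p_home + p_away
    excess = total - 1.0                      # the overround
    k = solve_power_exponent(p_home, p_away)
    pw_home, pw_away = p_home ** k, p_away ** k
    pw_total = pw_home + pw_away              # ~1.0; normalizes solver error
    return {
        "additive": (p_home - excess / 2, p_away - excess / 2),
        "multiplicative": (p_home / total, p_away / total),
        "power": (pw_home / pw_total, pw_away / pw_total),
    }


for method, (home, away) in fair_probs(-400, 320).items():
    print(f"{method:15s} home={home:.4f} away={away:.4f}")
```

On a lopsided price such as -400 / +320, the power method assigns the favorite the highest fair probability of the three and multiplicative the lowest, which is the divergence the backtest measures.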
The function validates that both inputs are valid American odds and that the pair is plausible: one side negative and the other positive, or both mildly negative as in a -110 / -110 market. Two heavily negative prices whose raw implied probabilities sum to more than 105% (an overround above 5%) typically signal a data error. Output is a dictionary: {'additive': (float, float), 'multiplicative': (float, float), 'power': (float, float)} where each inner tuple is (favorite_probability, underdog_probability) summing to 1.0. Total implementation: approximately 28 lines including input validation and docstring. The scipy dependency is the only non-standard import.
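One way the validation step might look is sketched here. The helper name `validate_market` is hypothetical, and the sketch reads the 105% ceiling as a cap on the sum of the two raw implied probabilities, which is an assumption about the intended check.

```python
def implied_prob(american_odds):
    """Raw implied probability from American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)


def validate_market(home_odds, away_odds, max_total=1.05):
    """Return the summed raw implied probability, or raise on bad input."""
    for odds in (home_odds, away_odds):
        # American odds are quoted at -100 or lower, or +100 or higher.
        if abs(odds) < 100:
            raise ValueError(f"not valid American odds: {odds}")
    total = implied_prob(home_odds) + implied_prob(away_odds)
    if total > max_total:
        # Assumed reading of the 105% ceiling: likely a data error.
        raise ValueError(f"implied probabilities sum to {total:.4f}")
    return total


print(round(validate_market(-110, -110), 4))   # 1.0476
```

A pair such as -120 / -120 sums to about 109% and would be rejected under this threshold.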
| Market segment (Brier score, lower is better) | Additive | Multiplicative | Power |
|---|---|---|---|
| NFL (all markets) | 0.2341 | 0.2298 | 0.2287 |
| NFL (moneyline favorites below −300) | 0.1854 | 0.1921 | 0.1803 |
| NBA (all markets) | 0.2412 | 0.2389 | 0.2371 |
| NBA (favorites below −400) | 0.1698 | 0.1802 | 0.1641 |
| EPL (home-win line, all markets) | 0.2187 | 0.2163 | 0.2155 |
| EPL (home-win favorites below −200) | 0.1924 | 0.1961 | 0.1887 |
The pattern is consistent across all three sports: on tight markets where the methods agree closely, calibration differences are negligible (within 0.005 Brier score). On lopsided markets (favorites priced below -300 in NFL, below -400 in NBA, below -200 in EPL) the power method outperforms multiplicative by roughly 0.007 to 0.016 Brier score: 0.0118 in NFL, 0.0161 in NBA, and 0.0074 in EPL, with additive falling between the two in each case. That sounds small, but at the level of 500+ lopsided games in a season, it is a meaningful calibration difference for any downstream model that takes fair probability as an input.
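For reference, the Brier score behind these comparisons is the mean squared difference between the forecast probability and the 0/1 outcome. A minimal sketch follows; the function name is hypothetical and the three-game input is made up purely to show the arithmetic.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Made-up toy data: fair probabilities for one side, and whether it won.
toy_probs = [0.8, 0.6, 0.3]
toy_outcomes = [1, 1, 0]
print(round(brier_score(toy_probs, toy_outcomes), 4))   # 0.0967
```

Running the same function once per method over the same games gives one score per column of the table, so the methods can be compared on identical outcomes.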
CLV correlation: does beating the closing fair line predict outcomes?
Closing line value is traditionally measured as the difference between the price you received and the closing price at the market. If your bet was -105 and the line closed at -115, you beat the close by 10 American-odds points. But comparing odds directly conflates the value of a 10-point move at different points on the odds scale — a -105 to -115 move is not the same as a -400 to -410 move in probability terms.
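The point about equal odds-point moves can be checked with a few lines of arithmetic. This sketch uses raw implied probabilities for simplicity (no vig removal), and `prob_move` is a hypothetical helper name.

```python
def implied_prob(american_odds):
    """Raw implied probability from American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)


def prob_move(entry_odds, close_odds):
    """Change in raw implied probability between entry and close."""
    return implied_prob(close_odds) - implied_prob(entry_odds)


for entry, close in [(-105, -115), (-400, -410)]:
    points = 100 * prob_move(entry, close)
    print(f"{entry} -> {close}: {points:.2f} percentage points")
```

The -105 to -115 move is worth about 2.27 percentage points of implied probability, while the -400 to -410 move is worth about 0.39, even though both are 10 American-odds points.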
Expressing CLV in probability terms requires first computing the fair probability at your entry price and the fair probability at close, then comparing them. A bettor whose entry prices consistently imply a lower probability than the closing fair probability is capturing positive CLV in the most meaningful sense: they are being paid as if the outcome were less likely than the market ultimately settled on.
In the 4,062-game backtest, the correlation between probability-CLV and actual win rate is 0.31 across all markets (Pearson, p < 0.001). That is moderate correlation — not a deterministic relationship, but statistically meaningful. The correlation is higher in NFL (0.38) and lower in EPL (0.22), consistent with NFL having the most efficient sharp-book markets and EPL having more residual information asymmetry at market close.
Where the model fails
The model requires accurate closing prices. It fails or degrades in four specific conditions. First: thin futures and early-season markets, where closing prices are set to attract action rather than reflect consensus probability. A week-2 NFL divisional future priced mostly for marketing is not a valid fair-line input. Second: markets suspended before close — common in soccer for weather, in NBA for injury news in the final hour. A suspended-at-close price reflects the information available at suspension, which may be stale.
Third: alternate spreads and totals, where the closing price on a +7.5 alternate line at a retail book often reflects that book's pricing model for alternate markets rather than a sharp consensus. The fair line on an alternate market should be derived from the main market's closing price using a half-point value model, not from the alternate line's own close. Fourth: correlated-leg SGPs, where no meaningful closing fair line exists at the parlay level — each leg's closing price is only marginally informative about the parlay's true probability because the correlation structure is priced in ways that are not public.
What a recreational bettor does with this
The most practical use of a no-vig fair-line model for someone who bets occasionally is not to build a predictive model — it is to evaluate whether a current price is better or worse than the market consensus. If you can access closing prices from a consensus source (several free and paid services aggregate Pinnacle closing lines with a day's delay), you can run the three-method comparison on any bet you placed the previous day and ask: did I get a better price than where the market settled? Over 50-100 bets, the answer to that question is more informative about your process than your win-loss record.
A consistent positive probability-CLV record does not mean you will be profitable — the vig still has to be overcome — but it is the strongest available signal that your bet selection is finding value rather than fighting the market. A consistent negative probability-CLV record is informative in the opposite direction: you are systematically getting worse prices than the consensus, which suggests you are betting after the sharper bettors have moved the line in their direction.
The dataset
The 4,062-game closing-price dataset used in this backtest is structured as a CSV with columns: game_id, date, sport, home_ml_close, away_ml_close, home_win (binary), source_book, exclusion_flag, exclusion_reason. The vig-removal outputs (fair probability per method) are computed at load time from the raw closing prices — they are not stored in the CSV, because the method choice should be the user's decision, not baked into the data.
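Computing fair probabilities at load time might look like the sketch below. The column names come from the description above; the sample rows are invented, the "0"/"1" encoding of exclusion_flag is an assumption, and `load_games` and `multiplicative` are hypothetical names. Only the multiplicative method is included to keep the example short; any function with the same signature can be passed in its place.

```python
import csv
import io

# Invented rows for illustration only; not taken from the real dataset.
SAMPLE_CSV = """\
game_id,date,sport,home_ml_close,away_ml_close,home_win,source_book,exclusion_flag,exclusion_reason
g1,2023-10-01,NFL,-150,130,1,consensus,0,
g2,2023-10-01,NBA,-400,320,0,consensus,0,
g3,2023-10-02,EPL,-110,-110,1,consensus,1,late_move
"""


def implied_prob(american_odds):
    """Raw implied probability from American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)


def multiplicative(home_odds, away_odds):
    """Scale each raw implied probability by the sum of both."""
    p_home, p_away = implied_prob(home_odds), implied_prob(away_odds)
    total = p_home + p_away
    return p_home / total, p_away / total


def load_games(fileobj, method=multiplicative):
    """Read rows, skip excluded games, attach fair probabilities."""
    games = []
    for row in csv.DictReader(fileobj):
        if row["exclusion_flag"] == "1":      # assumed flag encoding
            continue
        fair_home, fair_away = method(
            float(row["home_ml_close"]), float(row["away_ml_close"])
        )
        row["home_win"] = int(row["home_win"])
        row["fair_home"], row["fair_away"] = fair_home, fair_away
        games.append(row)
    return games


games = load_games(io.StringIO(SAMPLE_CSV))
for g in games:
    print(g["game_id"], g["sport"], round(g["fair_home"], 4))
```

Passing the method as an argument keeps the choice with the user, in line with the decision not to store method outputs in the CSV.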
The dataset covers: NFL 2022-2024 regular season and playoffs; NBA 2022-23 and 2023-24 regular seasons; EPL 2022-23, 2023-24, and 2024-25 (through January 2025). Update cadence: end-of-season batch additions when new verified closing price data is available from the consensus source. The dataset and the Python implementation are being maintained with a public methodology note — any errors in the closing price sourcing or the exclusion logic are corrected in place and logged in the methodology changelog.
Why use closing prices rather than opening lines for the fair-line model?
Opening lines reflect the book's initial estimate plus its expectation of where action will go. Closing lines reflect all information available at game time, including every bet placed by every account type the book has seen. Sharp accounts — the ones whose bets are most correlated with outcome — place most of their volume late, because late markets have the most complete information. The closing price is therefore the highest-information-density publicly available signal about true probability. Opening lines are more useful for studying how books set initial prices; closing lines are more useful for studying what the market collectively believed at game time.
Can I use a single book's closing price instead of a consensus source?
You can, with caveats. Pinnacle's closing lines are widely used as a single-book proxy for consensus because Pinnacle accepts action from sharp accounts and faces significant volume from informed bettors. But Pinnacle's availability is limited in the US, and other books' closing lines can reflect idiosyncratic factors: book-specific model anchors, lopsided handle that was not balanced out, or late movement the book did not follow. A consensus feed that averages across multiple sharp books is more robust. If you have access to Pinnacle data, it is a reasonable single-source proxy.
How does the Brier score comparison in this backtest generalize to other markets or time periods?
It may not generalize fully. The Brier score differences in this backtest are consistent directionally with the theoretical predictions of each method, which gives some confidence in generalizability. But the specific magnitude of the difference depends on the composition of the sample — how many lopsided markets it contains, the distribution of overround levels, and the historical win rates for heavy favorites in the specific seasons covered. Running the same three-method comparison on your own sport and time period is the correct check before committing to a method default.