What one week of refitting taught me about Spain, Argentina, and the limits of my own model

The 2026 World Cup kicks off tomorrow. My model thinks Spain wins it. So does Opta. We agree on almost nothing else at the top.

Opta’s three favorites are Spain, France, England. Mine are Spain, Argentina, France. We both put Spain first, then we part ways immediately. Opta has England third at 11.2 percent. My model has England fifth at 6.6 percent and Argentina second at 14.2 percent, ahead of France, where Opta has those two the other way round. Two models, the same public data philosophy, a genuine disagreement about who the second-best team in the world is.

That gap is the most interesting thing I can show you, and it is worth understanding where it comes from. It comes from a week of watching the model change its mind.

The first run was confident in a way it had not earned

When I first stood the engine up, Spain led comfortably. Its lead over the nearest challenger was 7.4 percentage points. That felt clean. It also felt like the kind of number that should make you suspicious, because a 48-team tournament with the variance football carries does not usually hand you a runaway favorite this early.

The model was not wrong, exactly. It was just reading a thinner slice of evidence than it would have a week later. Friendlies were doing more work in the fit than they deserved to. The match-importance weighting treated a June tune-up too much like a real fixture. So when Spain looked good in low-stakes football, the model believed it a little too readily.

The warm-ups did the teaching

Then the teams actually played, and the model got to watch.

Argentina was the story. Three nil over Iceland, two nil over Honduras. Not glamorous opposition, but the manner mattered, and the model rewards a team that wins the way it is supposed to win. Spain, meanwhile, drew Iraq before beating Peru. A draw with Iraq is not a crisis, but it is information, and the model took it as such.

By the final refit the 7.4-point lead had more than halved, to 3.4. Spain 17.6 percent, Argentina 14.2. The top of the board had gone from a procession to a contest. None of this was me overriding the model. It was the model doing what it is built to do, which is update when the world gives it new results.

The chasing pack reshuffled at the same time. Colombia firmed from 4.6 to 5.6 on strong tune-ups. Morocco moved from 3.5 to 4.5. England slid to 6.6 after only edging New Zealand one nil, the kind of narrow win over weak opposition that the model reads as a mild warning rather than a triumph. France held at 11.0, Brazil at 8.8.

I also changed the engine, not just the inputs

Two things under the hood moved between the first run and the last.

The first was match-importance weighting. I rebuilt how the model values different kinds of fixtures. The Nations League now counts as a near-major rather than something closer to a friendly, which is the correct reading of how seriously teams now take it. Friendlies are discounted harder. This is the change that pulled the early over-confidence out of the system.
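To make the idea concrete, here is a minimal sketch of importance weighting in a fit objective. It is not the engine's code: the weight values, the competition labels, and the `weighted_log_loss` helper are all illustrative assumptions, chosen only to show a near-major counting for more than a friendly.

```python
import math

# Hypothetical importance weights. These numbers are assumptions for
# illustration only; the model's real values are not published here.
IMPORTANCE = {
    "world_cup": 1.0,
    "nations_league": 0.8,  # treated as a near-major
    "qualifier": 0.7,
    "friendly": 0.2,        # discounted hard
}


def weighted_log_loss(matches):
    """Importance-weighted mean negative log-likelihood.

    Each match is a pair (competition, p_observed), where p_observed is the
    probability the model gave to the result that actually happened.
    """
    total_weight = 0.0
    total_loss = 0.0
    for competition, p_observed in matches:
        w = IMPORTANCE[competition]
        total_weight += w
        total_loss += -w * math.log(p_observed)
    return total_loss / total_weight


# The same surprise (a result the model gave 10 percent) costs far less
# when it happens in a friendly than when it happens in the Nations League.
surprise_in_friendly = weighted_log_loss(
    [("friendly", 0.1), ("nations_league", 0.6)]
)
surprise_in_nations_league = weighted_log_loss(
    [("friendly", 0.6), ("nations_league", 0.1)]
)
```

Under weights like these, a fit that minimizes the loss is pulled less by a June tune-up than by a competitive fixture, which is the behavior described above.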

The second was calibration discipline. I held the validation work to the same standard I would hold any quantitative claim to. Across 990 internationals over the last twelve months the ensemble runs a mean ranked probability score of 0.169, against 0.278 for an uninformed baseline. Lower is better on this score, and zero is a perfect forecast. That is the number I trust most in the whole exercise, because it is the one that tells me the model is actually adding information rather than dressing up noise.
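For readers who have not met the metric, this is a minimal sketch of the standard ranked probability score for a three-outcome match, ordered home win, draw, away win. The function names are mine for illustration, and this is not the validation code behind the 0.169 figure.

```python
def ranked_probability_score(probs, outcome_index):
    """RPS for one match with ordered outcomes, e.g. (home win, draw, away win).

    probs: forecast probabilities in outcome order, summing to 1.
    outcome_index: position of the result that actually occurred.
    Lower is better; 0.0 is a perfect forecast.
    """
    cum_forecast = 0.0
    cum_outcome = 0.0
    score = 0.0
    # Compare cumulative forecast with cumulative outcome at every
    # threshold except the last, where both always equal 1.
    for i, p in enumerate(probs[:-1]):
        cum_forecast += p
        if i == outcome_index:
            cum_outcome = 1.0
        score += (cum_forecast - cum_outcome) ** 2
    return score / (len(probs) - 1)


def mean_rps(forecasts, outcomes):
    """Average RPS over a set of matches."""
    scores = [ranked_probability_score(p, o) for p, o in zip(forecasts, outcomes)]
    return sum(scores) / len(scores)


uniform = [1 / 3, 1 / 3, 1 / 3]
rps_uniform_home_win = ranked_probability_score(uniform, 0)  # 5/18
rps_uniform_draw = ranked_probability_score(uniform, 1)      # 1/9
```

Because the score works on cumulative probabilities, a forecast that leans toward a home win is punished less when the match is drawn than when the away side wins. A flat one-third forecast scores 5/18, about 0.278, whenever either side wins, and 1/9 on a draw.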

Why I disagree with Opta, and why I am comfortable with it

Back to the hook. Opta has France second and England third. I have Argentina second and England fifth.

The honest answer is that Opta sees things I do not. Their models carry player-level data: injuries, suspensions, recent-form weighting at the individual level. Mine works from team-level signals only. It does not know which winger pulled up in training on Tuesday. So when we diverge materially, the smart prior is usually that the difference traces to player information they have and I do not.

But that cuts both ways. A team-level model has its own discipline. It is harder to talk into a narrative. It saw Argentina win the way contenders win and moved them up, without being anchored to a preseason consensus that had France as the European challenger. I am not claiming my second-place call is better than Opta’s. I am claiming it is honestly arrived at, and that the disagreement is the kind worth publishing rather than hiding.

What the single path says, and why you should not believe it exactly

The model’s most-likely path ends Spain 1-0 Argentina, with France taking third. Read that as a story the model finds plausible, not a prediction it is confident in. Every fixture in that bracket is the single highest-probability outcome at that node, and the joint probability of all of them landing exactly as drawn is small. The championship numbers, built from 10,001 simulated tournaments, are where the real belief lives. Spain 17.6, Argentina 14.2, France 11.0, Brazil 8.8, England 6.6, Colombia 5.6.
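The arithmetic behind that warning is simple enough to sketch. The per-match probabilities below are made up for illustration and are not model outputs, and the calculation treats the picks as independent, which is a simplification.

```python
import math


def path_probability(node_probs):
    """Joint probability that every pick on a path lands.

    Treats the picks as independent, which is a simplifying assumption.
    """
    return math.prod(node_probs)


# Hypothetical example: a team favored at 60 percent in each of five
# knockout matches. Every single pick is the likeliest outcome at its node,
# yet the whole run comes off less than 8 percent of the time.
five_round_run = path_probability([0.6] * 5)
```

The longer the chain of picks, the smaller the product gets, which is why the full bracket is a story and the simulated championship percentages are the forecast.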

That is where the model stands the night before kickoff. Tomorrow the tournament starts feeding it real results again, and the refitting starts over. The next update lands after Round 1.

I will let the model keep changing its mind in public. That is the whole point of doing this where you can watch.

Risk Premium Research. Forecasts are probabilistic, built from public data only, and for research purposes rather than betting advice. The model carries no insider or player-level information.


6/10/2026

How My World Cup Model Changed Its Mind
