6/10/2026

How My World Cup Model Changed Its Mind

What one week of refitting taught me about Spain, Argentina, and the limits of my own model

The 2026 World Cup kicks off tomorrow. My model thinks Spain wins it. So does Opta. We agree on almost nothing else at the top.

Opta’s three favorites are Spain, France, England. Mine are Spain, Argentina, France. We both put Spain first, then we part ways immediately. Opta has England third at 11.2 percent. My model has England fifth at 6.6 and Argentina second at 14.2, a team Opta ranks below France. Two models, the same public data philosophy, a genuine disagreement about who the second-best team in the world is.

Thanks for reading Risk Premium! Subscribe for free to receive new posts and support my work.

That gap is the most interesting thing I can show you, and it is worth understanding where it comes from. It comes from a week of watching the model change its mind.

Forecast Wc2026 V10
491KB ∙ PDF file

The first run was confident in a way it had not earned

When I first stood the engine up, Spain led comfortably. The gap to the field at the top was 7.4 points. That felt clean. It also felt like the kind of number that should make you suspicious, because a 48-team tournament with the variance football carries does not usually hand you a runaway favorite this early.

The model was not wrong, exactly. It was just reading a thinner slice of evidence than it would have a week later. Friendlies were doing more work in the fit than they deserved to. The match-importance weighting treated a June tune-up too much like a real fixture. So when Spain looked good in low-stakes football, the model believed it a little too readily.

The warm-ups did the teaching

Then the teams actually played, and the model got to watch.

Argentina was the story. Three nil over Iceland, two nil over Honduras. Not glamorous opposition, but the manner mattered, and the model rewards a team that wins the way it is supposed to win. Spain, meanwhile, drew Iraq before beating Peru. A draw with Iraq is not a crisis, but it is information, and the model took it as such.

By the final refit the 7.4-point lead had halved to 3.4. Spain 17.6 percent, Argentina 14.2. The top of the board had gone from a procession to a contest. None of this was me overriding the model. It was the model doing what it is built to do, which is update when the world gives it new results.

The chasing pack reshuffled at the same time. Colombia firmed from 4.6 to 5.6 on strong tune-ups. Morocco moved from 3.5 to 4.5. England slid to 6.6 after only edging New Zealand one nil, the kind of narrow win over weak opposition that the model reads as a mild warning rather than a triumph. France held at 11.0, Brazil at 8.8.

I also changed the engine, not just the inputs

Two things under the hood moved between the first run and the last.

The first was match-importance weighting. I rebuilt how the model values different kinds of fixtures. The Nations League now counts as a near-major rather than something closer to a friendly, which is the correct reading of how seriously teams now take it. Friendlies are discounted harder. This is the change that pulled the early over-confidence out of the system.

The second was calibration discipline. I held the validation work to the same standard I would hold any quantitative claim to. Across 990 internationals over the last twelve months the ensemble runs a mean ranked probability score of 0.169, against 0.278 for an uninformed baseline. That is the number I trust most in the whole exercise, because it is the one that tells me the model is actually adding information rather than dressing up noise.

Why I disagree with Opta, and why I am comfortable with it

Back to the hook. Opta has France second and England third. I have Argentina second and England fifth.

The honest answer is that Opta sees things I do not. Their models carry player-level data: injuries, suspensions, recent-form weighting at the individual level. Mine works from team-level signals only. It does not know which winger pulled up in training on Tuesday. So when we diverge materially, the smart prior is usually that the difference traces to player information they have and I do not.

But that cuts both ways. A team-level model has its own discipline. It is harder to talk into a narrative. It saw Argentina win the way contenders win and moved them up, without being anchored to a preseason consensus that had France as the European challenger. I am not claiming my second-place call is better than Opta’s. I am claiming it is honestly arrived at, and that the disagreement is the kind worth publishing rather than hiding.

What the single path says, and why you should not believe it exactly

The model’s most-likely path ends Spain 1-0 Argentina, with France taking third. Read that as a story the model finds plausible, not a prediction it is confident in. Every fixture in that bracket is the single highest-probability outcome at that node, and the joint probability of all of them landing exactly as drawn is small. The championship numbers, built from 10,001 simulated tournaments, are where the real belief lives. Spain 17.6, Argentina 14.2, France 11.0, Brazil 8.8, England 6.6, Colombia 5.6.
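A rough sketch of why the single path carries so little weight: multiply the probabilities of every outcome along it. The 0.60 per-match figure is a made-up illustration, not model output, and treating matches as independent is a simplifying assumption.

```python
import math

def joint_path_probability(match_probs):
    """Probability that every listed outcome happens, assuming independence."""
    return math.prod(match_probs)

# Hypothetical: the favored outcome in each of the 104 matches has p = 0.60.
favored = [0.60] * 104
p_exact_bracket = joint_path_probability(favored)
print(f"{p_exact_bracket:.2e}")
```

Even with a generous probability at every node, the product over a full bracket collapses toward zero, which is why the simulated championship frequencies are the better summary.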

That is where the model stands the night before kickoff. Tomorrow the tournament starts feeding it real results again, and the refitting starts over. The next update lands after Round 1.

I will let the model keep changing its mind in public. That is the whole point of doing this where you can watch.

Risk Premium Research. Forecasts are probabilistic, built from public data only, and for research purposes rather than betting advice. The model carries no insider or player-level information.




from Risk Premium https://ift.tt/Yku6rEZ
via IFTTT

6/06/2026

What a Football Model Teaches You About Forecasting Markets


In five days the World Cup kicks off. I built a model to forecast it. The model is not the point. The point is that football, unlike markets, grades you in public and on a deadline.


Most forecasting lives in a comfortable fog. You make a call, the world moves, and by the time the outcome arrives the question has changed enough that nobody checks. Markets are the worst offender. A view on equities in June is unfalsifiable by December because ten other things happened in between. Football has no such mercy. The whistle blows, the score is the score, and four weeks from now everyone can see whether the model was right.

So I treated the World Cup as a stress test of the same discipline I apply everywhere else.

Forecast Wc2026
490KB ∙ PDF file

The approach, in outline

The model is an ensemble. It combines more than one independent statistical engine, each estimating team strength and match outcomes a different way, then blends them. It layers in external strength signals beyond raw results, and it recalibrates for the things simple models get wrong, draws chief among them. On top of that sits a Monte Carlo simulation: the tournament is played forward 10,001 times over the new 48-team, 104-match format, and the championship probabilities are the frequencies that fall out.
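The Monte Carlo layer can be illustrated with a toy version. This is a minimal sketch, not the model described here: the eight teams, their ratings, the ratio-based win probability, and the straight knockout format are all hypothetical stand-ins. Only the idea is the same: play the tournament many times and read title probabilities off the frequencies.

```python
import random
from collections import Counter

# Hypothetical strength ratings for an 8-team knockout; not the model's inputs.
RATINGS = {"A": 2.0, "B": 1.6, "C": 1.4, "D": 1.2,
           "E": 1.0, "F": 0.9, "G": 0.8, "H": 0.7}

def play(a, b, rng):
    """Return the winner; P(a wins) = rating_a / (rating_a + rating_b)."""
    p_a = RATINGS[a] / (RATINGS[a] + RATINGS[b])
    return a if rng.random() < p_a else b

def simulate_once(teams, rng):
    """Play a single-elimination bracket down to one champion."""
    alive = list(teams)
    while len(alive) > 1:
        alive = [play(alive[i], alive[i + 1], rng)
                 for i in range(0, len(alive), 2)]
    return alive[0]

def title_probabilities(n_sims=10_001, seed=42):
    """Championship probability = share of simulated tournaments won."""
    rng = random.Random(seed)
    wins = Counter(simulate_once(RATINGS, rng) for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in RATINGS}

probs = title_probabilities()
for team, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {p:.1%}")
```

Fixing the seed makes a run reproducible, which matters when comparing one refit against the next.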

I am keeping the internals to myself. The value of a model is not the idea, which is freely available in any sports-analytics paper, it is the calibration, the choices, and the hours of getting the details right. What I will share is the output and, more importantly, the evidence that it works.

The discipline that actually matters

Anyone can produce a forecast. The question is whether you can produce one that beats doing nothing. So the model is validated the way I validate anything before I trust it.


Walk-forward backtest, no peeking: every prediction is made using only data available before that match. Over the last 12 months of international football, 997 matches, the model scored a mean Ranked Probability Score of 0.165. An uninformed forecast scores 0.278. A perfect one scores 0. Outcome accuracy came in at 50.8 percent against a 33.3 percent baseline for blind three-way guessing. Across five recent major tournaments it beat both naive benchmarks and every single component model in the battery.
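For readers who have not met the metric, here is a minimal sketch of the Ranked Probability Score for a three-way football outcome. The function name and argument layout are illustrative choices. The score compares cumulative forecast probabilities with the cumulative observed outcome, so a forecast that puts weight near the actual result is penalized less than one that puts it far away.

```python
def rps(forecast, outcome_index):
    """Ranked probability score for one match.

    forecast: probabilities in order (home win, draw, away win).
    outcome_index: 0, 1, or 2 for the result that occurred.
    """
    cum_forecast = 0.0
    cum_outcome = 0.0
    total = 0.0
    # Sum squared gaps between cumulative forecast and cumulative outcome
    # over the first (number of categories - 1) categories.
    for i in range(len(forecast) - 1):
        cum_forecast += forecast[i]
        cum_outcome += 1.0 if i == outcome_index else 0.0
        total += (cum_forecast - cum_outcome) ** 2
    return total / (len(forecast) - 1)

uniform = [1 / 3, 1 / 3, 1 / 3]
print(round(rps(uniform, 0), 3))   # uniform forecast, match ends in a home win
print(rps([1.0, 0.0, 0.0], 0))     # perfect forecast
```

A uniform one-third forecast scores 5/18, about 0.278, on any match that ends in a win for either side, and 1/9 on a draw. A perfect forecast scores 0.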

That last point is the one I care about most. A model has to earn its complexity. If the elaborate version cannot beat a simple rule, the elaborate version is vanity. This one clears the bar, but only modestly, and I will say so plainly: the edge is real and it is small.

What it says

Spain, champions, at 19.6 percent. Then Argentina at 12.2, France at 12.1, Brazil at 8.6, England at 7.6. The top four hold 52 percent of the title probability between them. The most likely final is Spain against Argentina, with France and Brazil meeting in the third-place game.

Read those numbers correctly. Spain at 19.6 percent means Spain loses this tournament four times out of five. The single most likely bracket, the one where every favorite advances exactly as projected, has a joint probability close to zero. That is not a flaw in the model. It is the truth about football, and the same truth holds for markets. The headline call is the least interesting number on the page. The distribution is the forecast.


Why this connects to the day job

Three habits carry directly from this exercise into how I think about markets.

First, ensemble over conviction. Two independent methods that disagree tell you more than one method you happen to like. Where they agree, lean in. Where they diverge, the gap is information, not noise to be smoothed away.

Second, first principles on uncertainty. A probability is a statement about long-run frequency, not a prediction. Spain at 19.6 percent and a portfolio position sized to a 20 percent base rate are the same kind of claim. Treat them the same way.

And backtest honestly or do not bother. The temptation in every model is to let a little future information leak into the fit and admire the result. The whole value of the football version is that the leak gets exposed in public, on a fixed date, with no second question to hide behind. If a process cannot survive that, it should not be running your money either.

What happens next

This is version one. I will keep iterating up to kickoff and through the tournament as results land. The model will be wrong about specific matches, often. The test is not whether Spain lifts the trophy. The test is whether, across 104 matches, the probabilities turn out to be calibrated. That is the only thing worth measuring, and unlike most of what I do, you will all get to watch it happen.



from Risk Premium https://ift.tt/QabToZv
via IFTTT

6/01/2026

A Reminder That Coherence Exists

Listening to Bill Burns's insider interview is such a positive and uplifting exercise. It seems we can still have voices from the US that are intellectually rigorous, with coherent, mature, and measured speech. Perhaps hope is not lost after all, but these kinds of voices need to return to the US administration urgently. https://www.economist.com/insider/inside-defence/how-to-handle-americas-adversaries

- Pedro


5/29/2026

Who actually created the value?

Separating the CEO from the hand they were dealt

A new Claude Code skill that grades any public-company CEO on what is theirs, plus ten anonymized cases to show it working.

I have been having a lot of fun building skills in Claude Code lately. A skill is just a packaged set of instructions and tools that teaches the model to do one thing well and repeatably. Once it exists, I can point it at a new input and get the same structured output every time, without rebuilding the logic by hand.


The latest one grades CEOs. Give it any public-company chief executive and it returns the same disciplined read: a setup, a scorecard, two composite scores, and a verdict.

The problem it fixes

Most CEO commentary is captured by the company’s own narrative. Management foregrounds the metrics it hit and quietly reframes the ones it missed. Headline results get flattered by cyclical tailwinds, by one-time items, by recovery off a depressed base, and by the execution of a strategy that someone else designed. Read enough earnings calls and you start scoring the press release rather than the person.

The skill is built to refuse that. It strips the headline back to what is attributable to the CEO. It credits real value-add fully, and declines to credit luck, inertia, or accounting noise.

How it works

The conceptual core is two measures that look similar and are not.

The first is the delta from baseline. I score the company’s qualitative state at handover, then again today, and take the difference. It answers a narrow question: does the company look better or worse than when this person took over? Useful as a sanity check. Not a measurement of value creation.

The second is the residual. It asks whether the CEO beat a peer-median replacement holding the same hand, over the same calendar window. Actual performance minus expected performance, where expected is what a competent peer would have delivered with the same inheritance in the same cycle. This is the number that matters, because it controls for both the cycle and the hand.

The two often disagree, and the disagreement is the interesting part. A CEO can post a positive delta and a deeply negative residual: the company looks better on paper, yet lagged every peer in the same cycle. The reverse happens too. A steward can inherit a near-perfect franchise, watch the qualitative score barely move, post a slightly negative delta, and still compound far ahead of the sector. Positive residual, negative delta. The franchise does the work, and the question becomes whether the leader added anything on top of it.
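The two measures can be sketched in a few lines. This is an illustration under stated assumptions, not the skill's implementation: the scores, the performance figures, and the use of a plain peer median as the "expected" value are all hypothetical.

```python
from statistics import median

def delta_from_baseline(score_at_handover, score_today):
    """Change in the company's qualitative state over the tenure."""
    return score_today - score_at_handover

def residual(actual_performance, peer_performances):
    """Actual minus expected, with expected taken as the peer median
    over the same calendar window."""
    return actual_performance - median(peer_performances)

# Hypothetical CEO: the company looks better than at handover but lagged peers.
d = delta_from_baseline(score_at_handover=5.0, score_today=6.0)
r = residual(actual_performance=0.40, peer_performances=[0.55, 0.70, 0.90])
print(f"delta {d:+.1f}, residual {r:+.2f}")
```

A positive delta with a negative residual is the first pattern described above: better on paper, behind every peer in the same cycle.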

Around those two measures sits the machinery: a thirteen-dimension scorecard running from strategic vision to succession quality, and four adjustment disciplines that separate recurring from one-time, cyclical from structural, inherited from originated, and input metrics from output metrics. There are a couple of hard rules I cannot override. An ouster under cause caps the governance score. Only an originated, non-consensus vision earns a 9 or a 10, which keeps the top of the scale honest.
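The two hard rules amount to caps applied after scoring. A minimal sketch follows, with loud assumptions: the dictionary keys are illustrative names, and `governance_cap=3` is a placeholder because the actual cap is not stated here. The only number taken from the text is that a 9 or 10 is reserved for an originated, non-consensus vision, which implies a ceiling of 8 otherwise.

```python
def apply_hard_rules(scores, ousted_for_cause, vision_originated,
                     vision_non_consensus, governance_cap=3):
    """Return a copy of the scorecard with the two hard rules enforced.

    governance_cap is a placeholder value, not the framework's real cap.
    """
    out = dict(scores)
    if ousted_for_cause:
        out["governance"] = min(out["governance"], governance_cap)
    if not (vision_originated and vision_non_consensus):
        # A 9 or 10 is reserved for originated, non-consensus visions.
        out["strategic_vision"] = min(out["strategic_vision"], 8)
    return out

raw = {"strategic_vision": 9, "governance": 7}
capped = apply_hard_rules(raw, ousted_for_cause=True,
                          vision_originated=False, vision_non_consensus=True)
print(capped)
```

Encoding the rules as caps rather than as inputs to a weighted average is what makes them impossible to argue around case by case.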

The whole thing is built to fight my own bias toward a generous 7. Most CEOs, in most tenures, land at 4 to 6 once the rigor is applied evenly.

Ten cases, no names

To show the framework working without grading anyone by name, I ran ten anonymized cases across industries, from semiconductors to media and entertainment. The industry stays visible. The identity does not. Anonymizing also lets me publish it freely and let the method speak rather than the personalities.

The spread runs from 2.8 to 8.8, and the ordering is clean rather than collapsing into a noncommittal middle.

Founder-originators sit at the top. They built the position the company holds, so the cycle helped them but did not create them. There is also a hired CEO near the top, which matters: the high marks are not reserved for founders, they are reserved for originated theses, and a non-founder who originates a winning, non-consensus strategy earns the same credit.

Value-destroyers ousted under cause sit at the bottom, where a high inherited baseline makes the destruction worse, not better, because the head start was squandered.

The most useful finding sits in the middle. The framework is consistently harsher than sell-side on competent operators of inherited premium franchises, and it refuses to read a cyclical recovery as skill. The market prices the franchise and the rebound. The framework grades the leader. Both views are defensible, and the gap between them is exactly the signal worth surfacing.

Ceo Assessment Anonymized
363KB ∙ PDF file

What it cannot do

A note on honesty, because a framework that pretends to see everything is worse than useless. This one scores observable performance. It cannot detect undisclosed fraud, hidden accounting irregularities, or operational misbehavior that has not surfaced yet. When I tested it against historical cases where the outcome is now known, four matched cleanly and one only partially: the executive who looked strong right up until a concealed scandal broke. That case is the blind spot, and it is a standing reminder to pair this kind of scorecard with separate fraud-and-governance diligence rather than treat the composite as the whole picture.

The point

The point is not to dunk on chief executives. It is to keep score in a way that survives the disciplines, that treats a turnaround CEO and a fortress steward by the same rules, and that answers the only question worth asking: who created value relative to the hand they were dealt, rather than whose company has the nicer headline.

The full ten-case deck is attached. Built, as ever, with a fair amount of fun, in Claude Code.


This is published for informational and educational purposes only. It is not investment advice, nor a recommendation, offer, or solicitation. The cases are illustrative and anonymized; no company or individual is named, and any identification a reader infers is the reader’s own. Views are my own, current only as of the date shown, and may change. Past performance is not indicative of future results. Do your own research.




from Risk Premium https://ift.tt/huArlU5
via IFTTT

5/22/2026

Just read this Bartleby column in The Economist. Witty, sharp, painfully on point. It got me thinking about all the corporate buzzwords and slop we hear in meetings and town halls. And, full confession, that we sometimes deploy ourselves. Whoever has never sinned, cast the first stone 😀. So I turned the laugh into a game: Town Hall Bingo (attached). Six cards, five in a row to win, "Velocity pivot" enshrined as the free center square in honor of the article. Print it, hand it out before the next all-hands, and let me know how it goes. Introducing "Velocity pivot": the corporate world's Lorem ipsum 👇 https://www.economist.com/business/2026/05/14/introducing-velocity-pivot

- Pedro


5/10/2026

Fela Kuti: Fear No Man by Jad Abumrad

Sometimes in life you get overwhelmed, in this case positively overwhelmed, by a book, a piece of music, a painting, a movie… Something sweeps you off your feet and makes you stop, think and reflect on what you just experienced.

Fela Kuti – Fear No Man was one of those cases. An extremely well-crafted podcast series (13 episodes in total) about the life and work of Fela Kuti, it gave me so much more than I was expecting when I started it. Its narrative quality and care sometimes put me in a state of flow while listening, that state where time becomes relative and goes by quickly.


Why did I like it so much?

1. It introduced me to a major 20th-century artist I had been completely ignorant of.

2. It introduced me to his music and work, which, on top of everything, I really liked.

3. It gave me a brief walk-through of recent Nigerian history, through the lens of his life.

4. It made me stop and think, more than once, about how people live their lives in an environment so different, rich, and by the same token difficult and complex, so far away from my reality.

5. It taught me music concepts I was not aware of: the “ostinato” rhythm, which makes me feel at home, and “counterpoint,” which is the base of his music and, surprise, surprise, ties his music to Bach’s (surprised?).

6. The author did not sugar-coat the most controversial areas of Fela Kuti’s life, which adds a strong plus to the full narrative.


If you want to jump to a completely different world and reality without leaving yours, if you want to get to know Fela Kuti, or if you already know him and want to deepen that knowledge, do not waste this opportunity. Start the journey.

I hope you like it as much as I did, and that by the end you just feel a little sad because the series is over.



from Risk Premium https://ift.tt/6QYRdro
via IFTTT

5/03/2026

Argos, Second and Final Ticker: XOM

A week ago Argos published its first ticker call. Today, as promised, the second and final of the pair: ExxonMobil. Spot $152.75. The Argos call is HOLD with negative skew. The Monte Carlo base case sits 17% below today’s quote, the bull tail just barely reaches above it, and the disagreement with Street is no longer where it looked a week ago.

This piece walks through what the engine sees, where it disagrees with consensus, and why the call lands at HOLD rather than SELL despite 86.6% of the simulated paths closing below the print. Then a meta-observation on the two published calls, a note on what’s next, and a longer note on what won’t be in the next post.

Xom Argos Full 2026 05 03
2.3MB ∙ PDF file

The data the call is sitting on

Before the numbers, the disclosure: the engine runs on financials retrieved from SEC EDGAR. The most recent retrieved filing is the FY2025 10-K (period 2025-12-31, filed 2026-02-18). XOM publicly released Q1 2026 results on its corporate IR site earlier this week, but the structured 10-Q with XBRL-tagged financials had not yet filed on EDGAR as of the run date. The engine consequently does not pick up Q1 2026 actuals. The Y1 anchor extrapolates from FY2025 actuals plus drift, not from Q1 2026 reported numbers.

Re-run is queued for the moment the 10-Q files on EDGAR, expected around 2026-05-05.

This matters more for this call than for the prior one. Q1 is a real test, and the call is genuinely contingent on it. If revenue prints +17% in line with Street’s FY26 expectation, the MC base case re-prices toward $145 to $160 and the HOLD becomes a missed opportunity. If revenue prints in line with the model’s $336B Y1 anchor (roughly flat year-over-year), Street’s +15% revenue gap collapses toward us and the HOLD with negative skew is the right call. Either outcome will be testable within days of this article going out.


The distribution Argos sees

The Monte Carlo base case is $127.55. That’s a 16.5% gap to the $152.75 spot. 86.6% of 10,001 simulated paths over a 7-year forecast horizon terminate below today’s price. The P10 to P90 band runs from $101.36 to $156.76, and that detail matters: Bull P90 just barely reaches above spot. Roughly 13% of the simulated paths land at-or-above today’s quote. That’s not nothing.

The DCF sanity check sits at $125.74, inside the MC interquartile range ($113.50 to $142.41). DCF and MC are aligned methodologically. The MC base sits $1.81 above the DCF, a difference that reflects right-tail optionality the deterministic single-path DCF cannot price. Y1 EBIT in MC is $41,992M (12.6% margin); Y1 EBIT in DCF is $43,246M (12.9% margin). Both are anchored on the FY2025 LTM actual of $41,871M.

Now the shape. The MC distribution itself is near-symmetric around its own base: distance from base to P10 is $26, distance from base to P90 is $29. But the distance from spot to P10 is $51, and the distance from spot to P90 is just $4. That asymmetry, structural negative skew versus the market, is what the call is built on. Most outcomes are below spot, the bull tail reaches above, but the bear tail extends much further down than the bull tail extends up.
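The asymmetry is plain arithmetic on the figures already quoted, and a few lines make it easy to recheck when the numbers move. Variable names are illustrative; the values are the ones in the text.

```python
spot, base = 152.75, 127.55
p10, p90 = 101.36, 156.76

# (downside, upside) measured from the MC base and from today's price.
around_base = (base - p10, p90 - base)
around_spot = (spot - p10, p90 - spot)

# Discount of the MC base case to spot.
gap_to_spot = (spot - base) / spot

print([round(x) for x in around_base])   # [26, 29]
print([round(x) for x in around_spot])   # [51, 4]
print(f"{gap_to_spot:.1%}")              # 16.5%
```

Measured from its own base the distribution is close to symmetric. Measured from spot, the downside distance is more than twelve times the upside distance.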

For comparison: Street median target $165.50 (n=22, mean rec 2.36 / “buy”, split 4 SB / 7 B / 13 H / 1 S). Street is roughly 30% above the MC median. That’s a real disagreement, and it isn’t where you might first guess.

The disagreement with Street is on revenue, not margins

The instinctive first read for an integrated oil major trading at an FY25 EBIT margin of 12.6% with Street targets implying a much richer profitability path would be that the disagreement is about margins. It isn’t.

Street FY26E revenue is $389B. The model’s Y1 anchor is $336B. That’s a +15% revenue drift differential at the top line. Flowed through at an FY24-style 14% EBIT margin (which is roughly what the model’s calibrated forward path supports by Y7), the revenue gap alone produces about $12B more EBIT than the model, which closes much of the gap between the $127.55 MC base case and the Street’s $165.50 target.
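One way to reproduce the figures in this paragraph, offered as a reading rather than the engine's own calculation: apply the 14% margin to Street's full revenue number and compare the result with the MC Y1 EBIT quoted earlier. Using Y1 EBIT as the comparison point is an assumption.

```python
street_rev_b = 389.0       # Street FY26E revenue, $B
model_rev_b = 336.0        # model Y1 anchor, $B
model_y1_ebit_b = 41.992   # MC Y1 EBIT quoted earlier, $B
margin = 0.14              # FY24-style EBIT margin

rev_gap_pct = street_rev_b / model_rev_b - 1      # top-line differential
street_style_ebit_b = street_rev_b * margin       # EBIT at Street revenue
extra_ebit_b = street_style_ebit_b - model_y1_ebit_b

print(f"revenue gap {rev_gap_pct:.1%}")
print(f"EBIT at Street revenue ${street_style_ebit_b:.1f}B")
print(f"extra EBIT vs model ${extra_ebit_b:.1f}B")
```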

The terminal multiple is not the source of disagreement either. The model’s blended exit at 6.73x EV/EBITDA is sector-aligned: Damodaran sector average is 6.30x, live peer median is 7.16x (CVX 11.26x, SHEL 11.54x, COP 7.16x, BP 4.30x, TTE 5.63x). XOM trades at 12.12x trailing today, which is the elevated print, but the model isn’t penalizing the equity for staying at 12x. The exit assumption is that XOM normalizes to a sector-aligned multiple over the forecast horizon. Street isn’t disagreeing with that.
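The peer median is easy to verify from the five multiples listed. The sketch also checks one possible blend: an equal-weight average of the Damodaran figure and the peer median happens to reproduce 6.73x, though the actual blend weights are not stated, so treat that part as an assumption.

```python
from statistics import median

peer_ev_ebitda = {"CVX": 11.26, "SHEL": 11.54, "COP": 7.16,
                  "BP": 4.30, "TTE": 5.63}
damodaran_sector_avg = 6.30

peer_median = median(peer_ev_ebitda.values())
# Assumed equal weights; the post does not state how the exit is blended.
equal_weight_blend = (damodaran_sector_avg + peer_median) / 2

print(peer_median)                    # 7.16
print(round(equal_weight_blend, 2))   # 6.73
```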

What Street is doing, in effect, is taking a higher oil deck or a faster Upstream/Energy ramp than the model’s peer-anchored drift priors allow. The drift priors are: Upstream 3.5% (anchored on XOM’s own 2024 Investor Day +4.3%/yr volume guide, with a -0.5pp transition haircut), Energy Products 1.0% (refining-capacity-vs-demand compression risk), Chemical Products 2.5% (industry +3% mid-cycle from IHS/WoodMac), Specialty Products 2.5% (blended mature 1.5% with high-growth specialty chemistry 3-4%). Reasonable analysts can disagree on these calibrations. Q1 will pressure-test them.


Why this is a HOLD and not a SELL

86.6% probability that the long-run trajectory ends below today’s quote sounds like a SELL. It isn’t, and the reason matters.

First, the bull tail. Bull P90 $156.76 reaches above spot. About 13% of the 10,001 paths terminate at-or-above today’s price. Those paths require Upstream margin sustained near 71% with continued Permian and Guyana ramp, Chemical mid-cycle reversion to 8-10% SOI, and Specialty Products 2x earnings target on track. All credible state-of-the-world combinations, not a tail of fantasies. You cannot short a megacap when one in eight of your own modeled paths reaches the strike.

Second, the floor. ExxonMobil’s balance sheet is fortress-grade by every standard credit metric. Altman Z-Score 4.67, deep Safe, and the highest among integrated majors (CVX 3.40, SHEL 3.03, COP 3.26, TTE 2.19, BP 1.76). Interest coverage 69x. Net-debt-to-EBITDA 0.6x. AA-equivalent credit. The Beneish M-Score is -2.75 (Unlikely category for earnings manipulation, 8 of 8 indicators valid). Distress probability is 0.0% across all 7 forecast years. The recovery floor at $9.19 a share is never invoked in any of 10,001 paths.

Third, the carry. XOM returns approximately $16B annually to shareholders through dividends and buybacks. Dividend yield is 2.7%. While the gap to fair value closes, the equity holder is paid to wait.

Put together: 86% of paths below spot is real, and is the basis for the negative-skew designation. But the structural reasons not to be short (a credible upper tail, fortress credit, and $16B of annual capital return) are enough to disqualify a confident SELL on a megacap. The default action is HOLD: trim if overweight versus benchmark, and defer new money until either the Q1 2026 10-Q resets the Y1 anchor or spot pulls toward the $127 to $130 region. The bull tail is where the upside lives if Street is right; the balance sheet is what protects the floor if the model is right.
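For context on the Altman Z-Scores cited in the floor argument, here is a sketch of the original Z-Score formula for publicly traded manufacturers and its conventional zone cutoffs. Whether the engine uses this exact variant or these cutoffs is not stated, so that is an assumption. The balance-sheet inputs in the formula check are made up.

```python
def altman_z(working_capital, retained_earnings, ebit, market_value_equity,
             sales, total_assets, total_liabilities):
    """Original Altman Z-Score for publicly traded manufacturers."""
    return (1.2 * working_capital / total_assets
            + 1.4 * retained_earnings / total_assets
            + 3.3 * ebit / total_assets
            + 0.6 * market_value_equity / total_liabilities
            + 1.0 * sales / total_assets)

def zone(z):
    """Conventional cutoffs: above 2.99 safe, below 1.81 distress, else grey."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"

# Scores quoted in the post, mapped to the conventional zones.
quoted = {"XOM": 4.67, "CVX": 3.40, "SHEL": 3.03,
          "COP": 3.26, "TTE": 2.19, "BP": 1.76}
zones = {ticker: zone(z) for ticker, z in quoted.items()}
print(zones)

# Formula check on hypothetical inputs.
z_example = altman_z(working_capital=10, retained_earnings=20, ebit=10,
                     market_value_equity=100, sales=100,
                     total_assets=100, total_liabilities=50)
```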

What XOM taught the engine

XOM was harder than the first ticker. Different business model, different segment structure, different reporting conventions, different XBRL conventions, different capital-allocation history. The engine had to flex.

A few specific lessons that translated into code:

The first ticker reported D&A folded inside COGS. XOM reports D&A as its own income statement line. This sounds trivial; it is not. It changed the EBIT identity check in the engine, the COGS extraction logic, the Yearly_BS_PnL Excel formulas in two locations, the Yearly_CFS aggregation, and the DCF and MC display layers. A new per-ticker config flag, da_separate_from_cogs, now wires the right EBIT formula across all of those touch-points based on filer convention.
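A minimal sketch of how such a flag can select the EBIT identity. Only the flag name `da_separate_from_cogs` comes from the text. The `FilerConfig` class, the simplified four-line P&L, and the figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FilerConfig:
    ticker: str
    da_separate_from_cogs: bool  # flag name from the post

def ebit(revenue, cogs, sga, da, cfg):
    """EBIT under either filer convention.

    If D&A is its own income statement line, subtract it explicitly.
    If it is already folded into COGS, subtracting it again would double count.
    """
    operating_costs = cogs + sga
    if cfg.da_separate_from_cogs:
        operating_costs += da
    return revenue - operating_costs

# Same economics, two reporting conventions (hypothetical figures).
folded = FilerConfig("FOLDED", da_separate_from_cogs=False)
separate = FilerConfig("SEPARATE", da_separate_from_cogs=True)
ebit_folded = ebit(revenue=100, cogs=60, sga=15, da=10, cfg=folded)      # COGS includes D&A
ebit_separate = ebit(revenue=100, cogs=50, sga=15, da=10, cfg=separate)  # D&A on its own line
print(ebit_folded, ebit_separate)
```

Both conventions should land on the same EBIT for the same underlying business, which is what an identity check can assert.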

XOM uses different revenue concepts across years. The custom XBRL tag xom:TotalRevenuesAndOtherIncome doesn’t exist in 2019 to 2021 filings, where the company tagged revenue as us-gaap:Revenues. Pre-ASC 606 filings (2016) require adding us-gaap:ExciseAndSalesTaxes to the COGS sum. The pension expense tag changed naming convention between FY2018 and FY2019. None of this is exotic. It’s the normal reality of financial reporting evolving over time. But it’s the kind of thing that silently corrupts a calibration if it isn’t handled.
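The tag drift described above is usually handled with an ordered fallback. A small sketch, assuming each filing has already been parsed into a dict of concept name to value. That shape, the function name, and the dollar figures are hypothetical; the two tag names are the ones quoted in the text.

```python
# Revenue concepts in priority order; both tag names are quoted in the post.
REVENUE_CONCEPTS = ["xom:TotalRevenuesAndOtherIncome", "us-gaap:Revenues"]

def pick_revenue(facts):
    """Return (concept, value) for the first revenue concept present.

    facts: dict mapping XBRL concept name -> reported value (assumed shape).
    Raises instead of returning 0 so a missing tag cannot silently
    corrupt a calibration.
    """
    for concept in REVENUE_CONCEPTS:
        if concept in facts:
            return concept, facts[concept]
    raise KeyError(f"no revenue concept found; tried {REVENUE_CONCEPTS}")

# Hypothetical parsed filings with made-up values.
recent = {"xom:TotalRevenuesAndOtherIncome": 350.0}
older = {"us-gaap:Revenues": 260.0}
print(pick_revenue(recent))
print(pick_revenue(older))
```

Failing loudly on a missing concept is the design choice that matters: a default of zero would flow straight into the fit.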

The Yearly_CFS tab was aggregating quarterly data by calendar year while Yearly_BS_PnL was aggregating by fiscal year. Both calendar-year and fiscal-year are December for the first ticker (and for AAPL, MSFT, WMT) so the bug was silent on those filers. XOM is also a December fiscal-year filer, so the bug was still silent here, but the audit caught it. It would have shown up loudly the next time we hit a non-December filer. Caught, fixed, and the audit harness now runs end-to-end after every report build.
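Why the mismatch stays silent for a December year-end and surfaces for anything else can be shown with a small labeling function. This is a sketch under an assumed convention (a fiscal year is named after the calendar year in which it ends); the function and the example dates are hypothetical.

```python
from datetime import date

def fiscal_year(period_end, fy_end_month=12):
    """Fiscal-year label for a quarter ending on period_end.

    Assumed convention: a fiscal year is named after the calendar year
    in which it ends.
    """
    if period_end.month <= fy_end_month:
        return period_end.year
    return period_end.year + 1

quarter_end = date(2025, 12, 27)  # hypothetical quarter end

# December fiscal year-end: fiscal and calendar labels agree.
print(fiscal_year(quarter_end, fy_end_month=12), quarter_end.year)
# September fiscal year-end: the same quarter belongs to the next fiscal year.
print(fiscal_year(quarter_end, fy_end_month=9), quarter_end.year)
```

Aggregating one tab by `period_end.year` and another by `fiscal_year(...)` gives identical buckets in the first case and shifted buckets in the second.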

The segment SOI extraction was reading parent-only NetIncomeLoss + Tax for FY25 ($44.8B), but the Note 3 segment SOI total in the 10-K is $46.2B, which includes NCI and interest. Adding NetIncomeLossAttributableToNoncontrollingInterest and InterestExpense to the segment SOI concepts closes the gap. Small, easy to miss, structural.

The SGA-overhead calibrator had two structural bugs that compounded. It floored quarterly overhead readings at zero, biasing theta upward (negative readings clipped, mean shifted up). And both DCF and MC projectors were seeding mean reversion from the calibrator’s own theta, which made mean reversion a no-op (margin held flat at theta for all years instead of reverting toward it). Fixed to seed from the true LTM observation. The fix lifted XOM Y1 EBIT by $2.8B / +7%.
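Both bugs are easy to reproduce in miniature. The sketch below is not the engine's code: the readings, the reversion speed `kappa`, and the discrete update rule are hypothetical. It shows that clipping negative readings pushes the calibrated mean up, and that a mean-reverting path started at theta never moves.

```python
from statistics import mean

def calibrate_theta(readings, floor_at_zero=False):
    """Long-run mean of quarterly overhead-ratio readings."""
    if floor_at_zero:
        readings = [max(r, 0.0) for r in readings]  # the bug: clips negatives
    return mean(readings)

def project(start, theta, kappa, years):
    """Mean-reverting path: each year closes a fraction kappa of the gap to theta."""
    path, x = [], start
    for _ in range(years):
        x = x + kappa * (theta - x)
        path.append(x)
    return path

readings = [0.03, -0.01, 0.02, -0.02, 0.04]   # hypothetical quarterly readings
theta_biased = calibrate_theta(readings, floor_at_zero=True)
theta = calibrate_theta(readings)

ltm = 0.05                                    # hypothetical LTM observation
flat = project(start=theta, theta=theta, kappa=0.3, years=7)   # seeded from theta
reverting = project(start=ltm, theta=theta, kappa=0.3, years=7)

print(theta_biased > theta)                   # flooring biases theta upward
print(all(v == theta for v in flat))          # seeding from theta is a no-op
```

Seeding from the observed LTM value is what gives the path somewhere to revert from.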

In total: roughly twelve engine fixes since the first ticker, plus two new validation layers that fire at engine load (P&L identity, BS identity, CFS bridge, magnitude breaches), plus a new sector profile library that lets future filers in the same business model inherit XOM’s lessons automatically (a CVX, COP, or BP run would now load with the right tag overrides on first attempt), plus a regression harness that catches silent engine drift across runs.

Each new ticker pushes the engine harder. That is the design intent. The engine is now tangibly more robust than it was a week ago because XOM forced the issue.


The third ticker we ran but won’t publish

For clarity: a third ticker, Goodyear (GT), was run through the same engine on the same public-filings-only diet as the other two, with no insider input of any kind. Its results appear nowhere: not in this post, not in the deck, not in the report.

I work at Goodyear, which makes publishing a public valuation call inappropriate, full stop. The methodology was identical to the other two. The decision to not publish is the only difference.

This is worth saying out loud because there’s an obvious question lurking. Has the engine been validated on three names or two? The honest answer is three. The third just isn’t visible.

Two for two, both negative-skew HOLDs

A meta-observation worth flagging.

The first ticker came back HOLD with negative bias. The second, this one, comes back HOLD with negative skew. Both sit materially below their respective Street consensus targets. Two for two on the published side, both bearish-leaning HOLDs.

There are two non-exclusive explanations.

The framework may be conservative by construction. Drift priors are anchored on peer evidence and corporate guidance with mild haircuts, not on consensus revenue projections. Cost ratios are calibrated on multi-year history and mean-revert toward calibrated theta values, not toward analyst-implied targets. Bayesian shrinkage pulls drifts toward zero in low-evidence regimes. Vasicek interest-rate dynamics give a non-trivial probability mass to higher-rate paths. None of these is wrong; all of them tilt the engine toward outputs below Street.

Or the late-cycle US tape is genuinely priced rich. Two large-cap US equities in different sectors both showing meaningful gaps to consensus is a small sample, but it is a sample. P/E re-rate compression risk shows up as a Tornado driver in this report; it isn’t an exotic concern.

Probably some of both. Worth flagging before the next run rather than after, and worth designing the next layer to pull in names where the call wouldn’t be bearish by default.

What’s next: a Graham-anchored screener

The next pillar of Argos, still unnamed, is a screening tool. It is anchored on Graham’s Intelligent Investor framework, with three updates for the world Graham did not write in.

Buybacks treated as quasi-dividends. A literal reading of Graham’s defensive-investor screen disqualifies almost every quality compounder of the last twenty years on the dividend criterion alone, because firms have rationally moved cash returns toward repurchases. A modern screen has to reflect that.

Intangibles-aware valuation. Graham’s price-to-book ceiling of 1.5x is unreachable for any asset-light business and meaningless for businesses where the assets that matter (brands, networks, software stacks) don’t sit on the balance sheet. The modernized version replaces book-value gates with EV/IC or sector-relative metrics.

Rate-regime-aware P/E thresholds. Graham’s 15x P/E ceiling was pegged to a rate world that produced earnings yields with margin over high-grade bonds. That world hasn’t existed for decades. The threshold has to flex with the ten-year yield and the equity risk premium.
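Two of these updates reduce to simple formulas, sketched here as illustrations only. The screener's actual rules and weights are not public, so both functions, and especially the inversion used for the P/E ceiling, are assumptions, as are the input figures.

```python
def shareholder_yield(dividends, buybacks, market_cap):
    """Cash returned to shareholders as a fraction of market cap,
    counting repurchases alongside dividends."""
    return (dividends + buybacks) / market_cap

def pe_ceiling(ten_year_yield, equity_risk_premium):
    """Highest P/E at which earnings yield still covers the bond yield
    plus a required premium. Illustrative rule, not the screener's formula."""
    return 1.0 / (ten_year_yield + equity_risk_premium)

# Hypothetical inputs, in $B and decimal rates.
sy = shareholder_yield(dividends=2.0, buybacks=6.0, market_cap=200.0)
ceiling = pe_ceiling(ten_year_yield=0.04, equity_risk_premium=0.03)
print(f"shareholder yield {sy:.1%}, P/E ceiling {ceiling:.1f}x")
```

Under this rule the ceiling tightens as rates rise and loosens as they fall, which is the flex the third update asks for.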

On top of the Graham spine sit Argos-native overlays: Z-score, ROIC versus WACC spread, EV/EBITDA versus peer dispersion, drift-to-trailing gap, distribution width. Graham is the gatekeeper, Argos signals are the prioritization layer once a name has passed.

The point of the screener: the engine knows how to value one ticker. The next layer decides which ticker is worth the run. Secondarily, it addresses the watch-out from the previous section by pre-filtering for margin-of-safety names rather than letting the engine work on a randomly selected late-cycle book.

Going quieter from here

A note on what to expect from this Substack going forward.

The build-in-public phase served a specific purpose. It established that the methodology exists, that the engine works on hard cases, and that the calls are reproducible by anyone with public filings and time. Two reports, two decks, one merged PDF, several thousand words of methodology trace. Mission accomplished on that front.

The next phase is shipping and seeing whether the calls hold up. Deployment of the screener and live tracking of the first two calls is targeted for the next one to two months. Results, if any are worth reporting, in six to twelve.

Less detail going forward, by design and not by accident. There are commercial reasons (a screener isn’t useful if its weights are public). There are intellectual reasons (a year of live tracking is more informative than another paper portfolio). And there are practical reasons (the build-in-public phase consumes time that ought to be spent on the build).

I’ll surface results when they exist and are honest. If the calls underperformed, that will be in the post too.


The XOM report and deck are attached to this post. The first ticker’s report is in the prior Substack post.



from Risk Premium https://ift.tt/6pGHThb
via IFTTT