Diamond Radar Went 65% Last Run. Let's Talk About the Misses.

A postmortem of the June 12 roster update: every best call, every whiff, and the pattern hiding in both.

Shaun, the Headghoul of RGL

Diamond Radar just posted its weakest graded run of the year, and I'm going to walk you through every miss instead of burying them.

The June 12 roster update came in at 65.6% — 145 of 221 upgrade calls correct. Six weeks earlier, the May 8 window hit 76.7%. So either the model got worse or the run got harder. The honest answer is mostly the second thing, with an asterisk on the first. Either way, the only postmortem worth reading is the one that shows the receipts, so here they are.

The public scorecard. 70.5% precision across two graded windows, 42% recall, methodology v1. Every number in this post traces back to this page.

The scorecard

Two numbers up top, both pulled straight from the public track record page — nothing massaged.

The June 12 run:

Window          Predictions  Hits  Misses  Hit rate
June 12 window  221          145   76      65.6%

The season so far:

Scope               Predictions  Hits  Hit rate  Recall
All graded windows  393          277   70.5%     42%

Two things the number-on-a-marketing-page version of this post would not tell you.

First, "the season so far" is two windows. Two. SDS shipped ten roster updates this year and eight of them moved nothing worth grading — minor tweaks, no real attribute changes — so Diamond Radar has only had two actual report cards: May 8 and June 12. Anyone selling you a "season-long accuracy" stat off a sample of two is selling you a vibe. I'm telling you it's two so you can weight it accordingly.

Second, recall is 42%. That means of all the real upgrades SDS handed out, Diamond Radar flagged about four in ten ahead of time. Hit rate measures how often it's right when it points at a card. Recall measures how many of the real movers it caught at all. Those are different questions, and 42% is the more humbling answer. Diamond Radar is decent at not crying wolf. It is not yet good at catching everything.
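If the difference between the two numbers is still fuzzy, here is a minimal sketch in Python. The counts are the scorecard's. The number of real upgrades is not published anywhere in this post, so the sketch backs it out from the 42% figure, assuming a "hit" means a flagged card that really did get upgraded. Treat that implied total as a rough estimate, not a stat.

```python
# Counts from the scorecard above (all graded windows).
predictions = 393        # upgrade calls Diamond Radar made
hits = 277               # calls that graded as correct
published_recall = 0.42  # recall figure from the scorecard


def hit_rate(hits: int, predictions: int) -> float:
    """Share of flagged cards that were right (precision)."""
    return hits / predictions


def recall(hits: int, real_upgrades: int) -> float:
    """Share of real upgrades that were flagged ahead of time."""
    return hits / real_upgrades


# ASSUMPTION: the real-upgrade total is not in the post. This backs it
# out from the published 42%, so it is only as good as that rounding.
implied_real_upgrades = round(hits / published_recall)

print(f"hit rate: {hit_rate(hits, predictions):.1%}")
print(f"implied real upgrades: about {implied_real_upgrades}")
print(f"recall: {recall(hits, implied_real_upgrades):.1%}")
```

Same numerator, different denominator. Hit rate divides by the calls the tool made. Recall divides by the upgrades that actually happened, which is why it can sit so much lower.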

Where it printed: nobody's bullpen

Here's the part that actually makes the tool worth running. The five best calls from June 12:

Player           Pos  Called  Landed  Beat
Bryce Miller     RP   69      76      +7
Walbert Ureña    SP   64      71      +7
Sam Antonacci    LF   59      65      +6
Orlando Ribalta  RP   59      64      +5
Shane Drohan     SP   66      70      +4

Look at that list. Not a name on it you'd recognize from a Diamond lineup. Sub-70 OVR arms and a prospect or two — the exact guys nobody is refreshing the market for. That's the whole edge. Diamond Radar reads real MLB performance against what the card currently is, and unheralded relievers having quietly great weeks are where real performance and stale attributes drift furthest apart before anyone notices.

And it's not a one-window fluke. Go back to May 8 and the best calls were Antonio Senzatela (55 → 66), Connor Prielipp (59 → 68), John King, Jose Fernandez, Juan Morillo — a reliever convention. Same shape, two runs in a row: the money is in the names you'd scroll right past.

The June 12 best calls, straight off the track record page. Five names, zero you'd draft on purpose, and every one of them beat its predicted OVR.

Where it whiffed: the stars

Now the misses. Same window, the three that stung:

Player              Pos  Called  Landed  Off by
Fernando Tatis Jr.  RF   86      83      −3
Zack Wheeler        SP   90      89      −1
Brice Turang        2B   82      80      −2

Notice anything? Every name you do recognize is in the miss column. Diamond Radar called for an 86 Tatis; SDS stamped an 83. It ran three points hot on the most famous card in the window.

And — you saw this coming — May 8 did the same thing. The notable misses there were Taylor Ward, Mike Trout, and Marcelo Mayer. Established guys. The model ran hot on the names everybody already watches.

So here's the unexpected thing I didn't go looking for, and it's the most useful sentence in this post: across both graded runs, the hits were no-name arms and the misses were stars. Diamond Radar is sharpest exactly where the market is asleep, and softest exactly where everyone's already paying attention. Which, if you think about it, is the opposite of how a tool would behave if it were just reading hype. It is reading something else. The something else is the part I'm not going to publish, for the same reason I won't publish the OpScore formula — the moment the recipe is public, every spreadsheet on Reddit reverse-engineers it and the edge is gone. But the behavior is right here in the data, and the behavior says: trust it most on the guys you've never heard of.

And the misses: every name you actually recognize. Tatis called for an 86, stamped an 83. The model runs hot exactly where the spotlight is.
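The hot-on-stars, cold-on-no-names pattern can be checked against the two June 12 tables with a few lines of Python. This is a rough sketch, and it only covers the eight highlighted rows from this post, not all 221 calls, so it illustrates the pattern rather than proving it. The function and variable names are mine.

```python
from statistics import mean

# (player, called OVR, landed OVR), copied from the two June 12 tables.
best_calls = [
    ("Bryce Miller", 69, 76),
    ("Walbert Ureña", 64, 71),
    ("Sam Antonacci", 59, 65),
    ("Orlando Ribalta", 59, 64),
    ("Shane Drohan", 66, 70),
]
notable_misses = [
    ("Fernando Tatis Jr.", 86, 83),
    ("Zack Wheeler", 90, 89),
    ("Brice Turang", 82, 80),
]


def signed_errors(rows):
    """Landed minus called: negative means the call ran hot."""
    return [landed - called for _, called, landed in rows]


best_err = signed_errors(best_calls)
miss_err = signed_errors(notable_misses)

# Highest called OVR among the best calls vs lowest among the misses.
highest_called_hit = max(called for _, called, _ in best_calls)
lowest_called_miss = min(called for _, called, _ in notable_misses)

print("best calls:", best_err, "mean", mean(best_err))
print("misses:    ", miss_err, "mean", mean(miss_err))
print("called OVR gap:", highest_called_hit, "vs", lowest_called_miss)
```

Every highlighted best call was called under 70 and landed higher. Every highlighted miss was called at 82 or above and landed lower.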

The calibration: the confidence label is the whole game

This is why the June run "dropped" and it's less scary than 65.6% looks. Diamond Radar tags every call with a confidence band. Here's June 12 split by band:

Confidence  Predictions  Hits  Hit rate
High        190          130   68.4%
Medium      15           7     46.7%
Low         16           8     50.0%

The same run, split by confidence band. The green bar is the part you bet on. The yellow and grey bars are the part you watch.

The high band carried the run. The medium and low bands were coin flips: 15 of 31, basically noise. That's part of why the headline number slid from May. In May, Diamond Radar made 172 calls and only 9 of them were medium-or-low confidence. In June it made 221 calls and 31 of them were medium-or-low. It threw more darts, and the extra darts landed in the bands where it admits it's guessing. The bigger share of the slide, though, is the high-confidence core itself, which softened from 77.9% in May to 68.4% in June. It still held up as the part you'd actually bet on.

The lesson is the boring, correct one: the confidence label is not decoration. When Diamond Radar says high, treat it as a real lean. When it says medium or low, treat it as a name to watch, not a card to buy. If you only ever acted on the high band, your personal hit rate this run would've been 68.4%, not 65.6%.
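The band arithmetic is easy to rerun yourself. The sketch below uses only figures quoted in this post: the June 12 band table and May's 77.9% high-band rate. The last number is a what-if I'm adding for illustration: June's exact call counts, but with the high band hitting at May's rate. It is a hypothetical, not a graded result.

```python
# June 12 calls by confidence band, from the table above:
# band -> (predictions, hits).
june_bands = {"high": (190, 130), "medium": (15, 7), "low": (16, 8)}

# May 8 high-band hit rate, as quoted in this post.
MAY_HIGH_RATE = 0.779


def pooled_rate(bands):
    """Total hits over total predictions across the given bands."""
    predictions = sum(p for p, _ in bands.values())
    hits = sum(h for _, h in bands.values())
    return hits / predictions


june_overall = pooled_rate(june_bands)
june_high_only = pooled_rate({"high": june_bands["high"]})
june_mid_low = pooled_rate(
    {name: counts for name, counts in june_bands.items() if name != "high"}
)

# HYPOTHETICAL: keep June's call counts and its actual medium/low hits,
# but let the high band hit at May's rate instead of June's.
high_predictions, _ = june_bands["high"]
mid_low_hits = june_bands["medium"][1] + june_bands["low"][1]
total_predictions = sum(p for p, _ in june_bands.values())
june_if_high_held = (
    high_predictions * MAY_HIGH_RATE + mid_low_hits
) / total_predictions

print(f"June overall:          {june_overall:.1%}")
print(f"June high band only:   {june_high_only:.1%}")
print(f"June medium + low:     {june_mid_low:.1%}")
print(f"June if high held May: {june_if_high_held:.1%}")
```

On these numbers the high band alone is 68.4%, the medium and low bands together are 48.4%, and the what-if lands near 73.8%. Read that as a rough split, not a verdict: dropping the medium and low calls buys back a few points, and the softer high band looks like the larger share of the move from May's 76.7%.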

What I'd act on, and what I wouldn't

Pulling it together into something you can use before the next Friday update:

  • Buy the high-confidence no-names. A sub-70 reliever flagged high, sitting at a few hundred stubs, is the trade with the best risk-to-reward on the board. The downside is a card that was already cheap. The upside is the +7 you saw above, before the market repriced it.
  • Don't chase the stars off this tool. If Diamond Radar likes your 86 Tatis to climb, that's the call it's most likely to run hot on. Use market timing on the expensive cards, not upgrade predictions.
  • Ignore the medium and low band for buying. Watch it, don't fund it.
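For the spreadsheet crowd, the three rules above boil down to something like the Python sketch below. It is a simplification and every name in it is hypothetical. The 70 cutoff comes from the "sub-70" wording in the first rule. Treating every card at 70 or above like a star is my shortcut, since the post doesn't put a number on "star".

```python
# ASSUMPTION: cutoff taken from the "sub-70 reliever" wording above.
NO_NAME_OVR_CUTOFF = 70


def suggested_action(confidence: str, current_ovr: int) -> str:
    """Map a Diamond Radar call to one of the three rules above.

    Simplification: any card at or above the cutoff is handled like a
    star, which is stricter than the post's own wording.
    """
    if confidence.lower() != "high":
        return "watch"              # medium/low: watch it, don't fund it
    if current_ovr < NO_NAME_OVR_CUTOFF:
        return "buy"                # high-confidence no-name
    return "use market timing"      # don't chase stars off this tool


# Hypothetical cards, not real calls from the board.
for band, ovr in [("High", 64), ("High", 86), ("Medium", 59), ("Low", 90)]:
    print(band, ovr, "->", suggested_action(band, ovr))
```

The confidence check comes first on purpose: a medium or low call stays a watch no matter how cheap the card is.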

The point of Diamond Radar was never to be right every Friday. It's to be early on the unglamorous cards often enough that the wins bury the misses — and even on its worst graded run, the win column was full of guys the rest of the market hadn't looked at yet.

A thing you didn't ask for

I don't trade on Diamond Radar. I built it, I can see every high-confidence call the moment it scores, and I sit on my hands.

That's a deliberate rule, not modesty. The whole tool runs on one thing — you trusting the number — and the fastest way to poison that is for the guy holding the model to front-run the people reading it. Buying Bryce Miller at a few hundred stubs the night before a bump I called would be dealing myself off the bottom of a deck everyone else is playing straight. So I don't have a tidy "I bought low and flipped high" story to sell you here, and I never will. That's on purpose. The receipts above are the only flex I'm comfortable taking — the calls are public, graded, and the same for you as they are for me.

The honest sign-off

This is methodology v1 and a sample of two graded windows. The hit rate will move. Recall has a lot of room to climb, and that's the number I'm most focused on next. When the algorithm changes, the track record page stamps a new methodology version so you can see exactly when the math moved and judge the before-and-after honestly — no quietly recutting old grades to look smarter.

If you want the whole picture of how this thing works, the overview is here. And the live board, updated nightly, is at /radar.

I'm Shaun. The model missed on Tatis. The comments are open.