Don't fire Styer

Posted on February 24, 2026
Reading time: 12 minutes

In early December last year, Team USA played Team Europe in the 31st Mosconi Cup, a pool tournament run by Matchroom Pool pitting five players representing USA against five players representing Europe. Each matchup was a race to five racks. These were the teams and each player’s Fargo rating:

Europe | USA
Josh Filler (859) | Fedor Gorst (847)
Jayson Shaw (834) | Shane van Boening (846)
Moritz Neuhausen (819) | Skyler Woodward (812)
David Alcaide (817) | Tyler Styer (791)
Pijus Labutis (812) | Billy Thorpe (778)

Europe brought a strong team. Josh Filler is the top-ranked player in the world by Fargo rating (which works like chess’s Elo rating system). About two months before the Mosconi Cup, Neuhausen won the Peri 9 Ball Open, then lost to Pijus Labutis in the Hanoi Open final a week later. David Alcaide won the Philippines Open two weeks after that. Jayson Shaw didn’t win a major tournament in 2025 but remains among the 15 highest-rated players in the world and lost the Philippines Open semifinal on an inexplicable 9 ball miss.

On the USA side, Fedor Gorst is typically among the top-ranked players on the World Nineball Tour and was the top-ranked player in 2025, and Shane van Boening is a legend who consistently has one of the top five ratings in the world. Styer and Thorpe were the only two players on either team ranked outside the world top 100, while Woodward is rated similarly to Alcaide, Labutis, and Neuhausen. Team USA’s recent form was also worse than Team Europe’s. In the three tournaments I mentioned above:

  • Thorpe and Woodward both made the quarterfinals of the Peri 9 Ball Open, where Woodward lost 10-2 to Neuhausen.
  • Gorst, Woodward, and Thorpe all lost in the last 64 at the Hanoi Open.
  • Neither Gorst nor Thorpe advanced to stage 2 of the Philippines Open.

In World Nineball Tour rankings, Shaw was the lowest ranked player on team Europe at 16th, while van Boening was the second highest ranked player on Team USA at 15th (Styer, Woodward, and Thorpe were ranked 26th, 27th, and 34th)1.

It went poorly for the United States, who lost 11-3. Reddit’s solution is to fire Tyler Styer.2

11-3 is a drubbing, but whether you look at ratings, recent tournament results, or tour rankings, Europe were clear favorites. Because of the obvious gap, I was curious just how bad an 11-3 result was relative to expectations, since, with two non-flubbed game endings, the score is 9-5 after 14 games, which is… I mean it’s not close, but it’s sort of possible to imagine a comeback from there. So what would have been a reasonable expectation for how many matchups the US team should win?

My guess was that given the differences in the players’ ratings on the two teams, 11-3 wasn’t that unlucky, but I was wrong – 11-3 was about a 1 in 20 outcome given the players’ ratings. The bad news for Team USA though is that they still lost a big majority of the simulated Mosconi Cups that didn’t end 11-3.

If you want to read how I tried to answer that question, check out nerd stuff. If you just want to see some plots, you can skip to plots.

Nerd stuff

Generating a schedule

According to the rules posted on the Matchroom website, matches on each day proceed in a set order, and within certain series of matches, every player on a team has to play at least once. The three kinds of matchups are team matchups, doubles matchups, and singles matchups. All matchups are races to five racks.

Team USA lost 11-3 with the actual schedule, but that schedule is only one of many possible schedules. Even in the first team match, there are 120 different lineups each of the two teams can choose, so there are 120 x 120 = 14,400 possible lineup pairs for day one / match one. On the broadcast, we were told that teams can’t re-use the same lineup from a previous team matchup, so the second team matchup only had 119 x 119 = 14,161 possible pairs of lineups. That makes over 200 million lineup combinations across just the first two team matches.
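Those counts are easy to sanity-check with a few lines of Python (the player names are just labels; any five work):

```python
from itertools import permutations

players = ["Gorst", "van Boening", "Woodward", "Styer", "Thorpe"]

# 5! = 120 possible play orders for one team's lineup
lineups = list(permutations(players))
print(len(lineups))           # 120

# each team picks its order independently for the first team match
print(120 * 120)              # 14400

# with no repeated lineups, the second team match has 119 * 119
# possible pairs, on top of the 14,400 first-match pairs
print(120 * 120 * 119 * 119)  # 203918400
```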

There are way too many possible schedules to simulate all of them. I added an argument to the command line interface for running the simulation to control how many schedules to generate. For each day, I randomly picked the play order for each team subject to the rules posted on the Matchroom website, i.e. respecting the constraints about sets of matches in which all five players on a team had to play at least once. I didn’t bother requiring the team matchup lineups to vary.3
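Here’s a minimal sketch of that sampling. The constraint check is simplified relative to the real Matchroom rules (which group matches into specific sets per day), and the function names are illustrative:

```python
import random

TEAM = ["Gorst", "van Boening", "Woodward", "Styer", "Thorpe"]

def random_lineup(team):
    """A uniformly random play order for a team match."""
    return random.sample(team, k=len(team))

def random_day(team, n_singles=2, n_doubles=2):
    """Rejection-sample the day's singles and doubles slots until
    every player appears at least once across them."""
    while True:
        singles = [random.choice(team) for _ in range(n_singles)]
        doubles = [random.sample(team, k=2) for _ in range(n_doubles)]
        used = set(singles) | {p for pair in doubles for p in pair}
        if used == set(team):
            return singles, doubles

singles, doubles = random_day(TEAM)
```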

Here’s one random schedule I created, so you can check whether the schedules I generated were legal (Europe won this one 69% of the time, and the US won about 7.8 points on average):

Example schedule
Day 1
  • Team match: Styer/Thorpe/van Boening/Woodward/Gorst vs. Filler/Alcaide/Shaw/Labutis/Neuhausen
  • Doubles: van Boening/Thorpe vs. Alcaide/Filler
  • Singles: Styer vs. Shaw
  • Doubles: Woodward/Gorst vs. Labutis/Neuhausen
Day 2
  • Team match: Woodward/van Boening/Styer/Thorpe/Gorst vs. Labutis/Shaw/Filler/Neuhausen/Alcaide
  • Singles: Woodward vs. Labutis
  • Doubles: Styer/van Boening vs. Shaw/Labutis
  • Singles: Woodward vs. Filler
  • Doubles: Gorst/Thorpe vs. Alcaide/Neuhausen
Day 3
  • Team match: Styer/Gorst/Woodward/Thorpe/van Boening vs. Shaw/Labutis/Filler/Neuhausen/Alcaide
  • Singles: Styer vs. Shaw
  • Doubles: Woodward/Styer vs. Neuhausen/Shaw
  • Singles: Thorpe vs. Filler
  • Doubles: van Boening/Gorst vs. Labutis/Alcaide
  • Singles: Woodward vs. Filler
Day 4
  • Singles: Woodward vs. Neuhausen
  • Singles: Gorst vs. Labutis
  • Singles: Thorpe vs. Shaw
  • Singles: van Boening vs. Filler
  • Singles: Styer vs. Alcaide
  • Singles: van Boening vs. Alcaide

Picking a winner for each matchup

Fargo ratings can be translated into win probabilities per rack. The Wikipedia page on Elo rating systems says that, for two players with ratings R_A and R_B, the win probability for player A is

E_A = 1 / (1 + 10^((R_B − R_A) / s))

where s is some “scaling factor” and E_A is the probability that player A wins whatever thing the rating applies to. In pool, the rating applies to individual racks.

The FargoRate FAQ explains:

When two players are 100 points apart, say a 300 versus a 400, the ratio of game wins will be near 1:2, as in 5 games to 10 games, or 50 games to 100 games.

That’s roughly consistent with a scaling factor of 400 as in the Wikipedia examples, so that’s the value I picked (with s = 400, a 100-point gap gives odds of about 1.78:1 rather than exactly 2:1, which is close enough for this exercise).
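In code, that conversion is a one-liner (the function name is mine):

```python
def rack_win_prob(rating_a, rating_b, s=400):
    """Elo-style probability that player A wins a single rack."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / s))

rack_win_prob(791, 834)  # Styer vs. Shaw, per rack: ~0.44
```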

Singles matchups

Singles matchups are straightforward. In each rack, the player representing Team USA has some probability of winning, let’s say it’s 40%. I generated a random number between 0 and 1, and if it was less than that probability, I gave Team USA the rack. I repeated this until one of the players reached five racks.
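A race to five with a fixed per-rack probability looks something like this (a sketch of the approach; the names are illustrative):

```python
import random

def race_to(target, p_usa):
    """Simulate one race; return True if the USA side reaches
    `target` racks first."""
    usa = europe = 0
    while usa < target and europe < target:
        if random.random() < p_usa:
            usa += 1
        else:
            europe += 1
    return usa == target

# with a 40% chance per rack, the USA side wins a race to five
# roughly 27% of the time
wins = sum(race_to(5, 0.40) for _ in range(100_000))
```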

Here’s how the probability of victory in each rack changes based on the difference in rating between two players:

Team matchups

Team matchups are cycles of individual matchups. The team matchup cycles through the five players from each team in orders determined by the teams’ captains. In each rack, picking a winner is exactly the same as in a singles matchup.
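A sketch of that rotation under the same per-rack model (reusing the Elo-style conversion from earlier; function and variable names are illustrative):

```python
import random

def rack_win_prob(ra, rb, s=400):
    """Elo-style per-rack win probability for the first player."""
    return 1 / (1 + 10 ** ((rb - ra) / s))

def team_match(usa_lineup, europe_lineup, ratings, target=5):
    """Race to `target` racks where rack i is played by the i-th
    players (mod 5) in each captain's chosen order. Returns True
    if the USA side wins."""
    usa = europe = rack = 0
    while usa < target and europe < target:
        a = usa_lineup[rack % len(usa_lineup)]
        b = europe_lineup[rack % len(europe_lineup)]
        if random.random() < rack_win_prob(ratings[a], ratings[b]):
            usa += 1
        else:
            europe += 1
        rack += 1
    return usa == target
```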

Doubles matchups

There’s probably some mathematically correct way to combine ratings for pairs of players, but I don’t know what it is. To pick probabilities for each team, I took the average of the win probabilities of each of the four possible matchups, e.g. if Woodward and Thorpe played Labutis and Alcaide, I calculated win probabilities for Woodward vs. Labutis, Woodward vs. Alcaide, Thorpe vs. Labutis, and Thorpe vs. Alcaide, then I averaged those four values.4

After I had a win probability, I picked a winner for the matchup the same way as in singles matchups.
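The averaging step in code (this is just the ad hoc approach described above, not an officially sanctioned way to rate doubles teams):

```python
def rack_win_prob(ra, rb, s=400):
    """Elo-style per-rack win probability for the first player."""
    return 1 / (1 + 10 ** ((rb - ra) / s))

def doubles_win_prob(usa_pair, europe_pair, ratings):
    """Average the win probabilities of the four cross matchups."""
    probs = [rack_win_prob(ratings[a], ratings[b])
             for a in usa_pair for b in europe_pair]
    return sum(probs) / len(probs)

ratings = {"Woodward": 812, "Thorpe": 778, "Labutis": 812, "Alcaide": 817}
doubles_win_prob(("Woodward", "Thorpe"), ("Labutis", "Alcaide"), ratings)
# ~0.47 per rack for the US pair
```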

Checking my implementation

Being able to generate schedules and results is a good start, but I needed a way to validate that the results made sense. To test whether they were reasonable, I matched up two fake teams against the actual Team Europe. One team was all Josh Fillers, i.e., in every matchup except singles against the real Josh Filler, Team Filler Clones was favored. That team won the Mosconi Cup about 85% of the time. The other team was all me, with my 553 rating. That team won 0% of the time.5 Both results seemed reasonable to me. Here’s the win percentage curve for homogeneous teams of players at a range of ratings between mine and a mix of Josh Filler and a Terminator. I generated 1,000 schedules and simulated each one 50 times.

Clone teams look like they’d win about half the time with a rating near 830, which makes intuitive sense, since that’s around the average for Team Europe’s ratings.

The results

I generated 10,000 schedules and simulated each one 100 times. This plot shows how often Team USA won each number of points:

This plot answers the main question I had, and the answer is that only 3 points for Team USA is pretty rare. About 95% of the time they do better than that. Unfortunately, there’s a big gap between doing better than three points and winning. About a third of the time, the US team loses with 8, 9, or 10 points; another third-ish of the time they win; and in the remaining third they lose with 7 or fewer points.

So why not fire Styer?

Team USA were the US’s five highest ranked players in Matchroom’s World Nineball Tour rankings and the first, second, third, seventh, and twelfth highest-rated US players by Fargo rating. Team Europe were Europe’s third, fifth, sixth, seventh, and tenth ranked players by World Nineball Tour rankings and first, third, eleventh, thirteenth, and seventeenth highest-rated players by Fargo rating. It’s easy to look at the missed 9 balls and imagine a universe where the US team makes them instead and goes on to win the whole thing, but the odds were against them from there anyway. It’s even easier to imagine different players representing the US team altogether, but it’s hard to come up with a way to pick the teams that gives the US an advantage.

If instead of the actual teams you let each team bring their top five players by Fargo rating, the matchup gets worse. Those teams would be:

Europe | USA
Josh Filler (859) | Fedor Gorst (847)
Francisco Sanchez Ruiz (846) | Shane van Boening (846)
Jayson Shaw (834) | Skyler Woodward (812)
Wojciech Szewczyk (832) | Mike Dechaine⁶ (803)
Albin Ouschan (831) | Thorsten Hohmann (793)

In sims of these matchups, the US won about 26% of the time.

The US wins fewer points on average in this matchup, wins 3 or fewer points about 8% of the time instead of 5%, and wins the Cup overall 8 percentage points less often.

If both countries instead brought their top teams by World Nineball Tour rankings, the US team doesn’t change at all, and Team Europe is pretty similar. Shaw, Alcaide, and Neuhausen are gone, but Kaci, Sanchez Ruiz, and Krause bring about as many total rating points.

Europe | USA
Kaci (831) | Gorst (847)
Sanchez Ruiz (846) | van Boening (846)
Filler (859) | Styer (791)
Krause (794) | Woodward (812)
Labutis (812) | Thorpe (778)

With similar teams to the actual matchups, it’s not surprising that Team USA wins a similar number of Mosconi Cups in this simulation or that the shape of the curve looks about the same as the first plot:

Europe can send many different strong teams. For the US, that isn’t the case – this team was close to as good as it gets in terms of Fargo ratings and was the best team the US could assemble by World Nineball Tour rankings, but it’s a significant underdog against any of the European teams.

Pool doesn’t currently have a strong statistical backing. It’s not like baseball, where there’s a small number of outcomes for a plate appearance, or basketball, where it’s easy to track how often an offense can generate open, valuable shots, or hockey / soccer, where you can tell whether a team is winning on average by whether they’re keeping possession in attacking areas. Without a robust statistical explanation of how one player beats another, it’s easy to fixate on specific high leverage events to explain a loss.

I think in this case, focusing on Styer’s misses misses the point. Overall, Team Europe won 63 of the 108 racks played. Hand the two flubbed 9 balls to Styer and that drops to 61 out of 108. This US team against that Europe team was always losing. It could have happened less dramatically, but a comeback was unlikely either way. Hypothetical other US teams against other European teams have the same disadvantage. Firing Styer is one thing you could do if you wanted to assemble a different team with a better chance to win, but the first problem you’d run into is finding someone to fill the slot as good as Tyler Styer.


  1. All ratings and rankings are quoted as of what I could see on 2026/02/10.↩︎

  2. Billy Thorpe and Skyler Woodward too. To be fair to the person who posted that/to un-strawman a bit, they weren’t specifically claiming that bringing someone other than Styer would have made the US win the Mosconi Cup.↩︎

  3. I decided that the probability of identical team matchups was low enough, and that the probability of those identical team matchups affecting the results was low enough, that it wasn’t important. There’s a 1/120 chance that a team’s lineup for the second team match matches the first one and a 1/60 chance that the third team match lineup matches either of the first two, for about a 2.5% chance total. Over a bunch of simulations, I’ll get some repeats. Oh well.↩︎

  4. I also ran the simulation with averaging the ratings and calculating the win probabilities with the average ratings. Results were broadly similar.↩︎

  5. Not rounded – literally zero of the 100,000 sims I ran. One discouraging/funny thing I learned is that I could be rated 200 points higher – basically, get twice as good as I currently am, twice – and a team made up entirely of that superhuman would still win the Mosconi Cup only about half a percent of the time. I’m pretty good! Pros are monsters though.↩︎

  6. Never mind that Dechaine played only four tournaments in 2025, none of them at the level of the big international tournaments.↩︎


If you have anything to say about anything in this post, feel free to email the author