Bates Motel: A Rankings Story

In the preliminary end-of-season USAU rankings released yesterday, one of the most surprising teams was Tufts-B, ranked #34 overall. What the rankings don’t show is that, according to the USAU algorithm, Bates Orange Whip, from Lewiston, Maine, is the best team in the country.

Before I explain how that happened, here are a few relevant points about how the Top 25 algorithm works:

Every sanctioned game between properly-rostered teams is included in the calculation. These are the games highlighted in blue on each team’s Score Reporter page. Teams need 10 sanctioned games to be included in the rankings. If a team’s opponent was not properly rostered, that game still counts towards the 10 games, but the result isn’t included in the calculation. If a team played fewer than 10 sanctioned games, that team still has a rating, which is hidden.
A team’s rating for each game is determined by a combination of their opponent’s rating and the result of the game. A team can earn a maximum of 606 points for a win (and a minimum of -606 for a loss). In any normal-length game, a team earns maximum points for scoring at least twice as many points as their opponent. I’m going to use blowout win and blowout loss to refer specifically to results earning maximum (or minimum) ratings points.
If the initial ratings difference between two teams is greater than 606 points, the higher-rated team would hurt their rating even if they won 15-0. In previous years, these results were included in the ratings calculation. I discussed how this hurt 2011 Whitman in this article from a year ago.
This year, USAU modified the algorithm so that, for all intents and purposes, blowouts that hurt the winning team’s rating (and help the losing team’s rating) are excluded from the calculation.
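The blowout and exclusion rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not USAU's actual code: the function names are hypothetical, the sketch ignores shortened games, and it says nothing about how closer results are scored.

```python
MAX_SWING = 606  # most rating points a single game can add or subtract, per the rules above


def is_blowout(winner_score: int, loser_score: int) -> bool:
    """True when the winner scored at least twice as many points as the loser.

    Assumes a normal-length game; shortened games are not handled here.
    """
    return winner_score >= 2 * loser_score


def blowout_is_excluded(winner_rating: float, loser_rating: float) -> bool:
    """True when a blowout win would still drag the winner's rating down.

    The best possible game rating is loser_rating + MAX_SWING, so if the gap
    between the teams exceeds MAX_SWING the result is dropped (this year's rule).
    """
    return winner_rating - loser_rating > MAX_SWING


print(is_blowout(15, 7))    # 15 >= 14
print(is_blowout(15, 10))   # 15 < 20
print(blowout_is_excluded(2070, 444))
```

For example, a 15-7 win counts as a blowout, while a 15-10 win does not.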

How to deal with these blowouts is a difficult question. This year, USAU has chosen to calculate ratings in a way that does not punish teams for blowout wins. This seems reasonable, but it also has a couple of potentially negative consequences. First, teams no longer have an incentive to play a consistently strong schedule. Second, many results get ignored, which decreases the robustness of the rankings.

The rankings work well for comparing two given teams when the teams have lots of head-to-head games and common opponents. For example, comparing Stanford and Texas A&M to determine the last D-I open strength bid, we have one head-to-head result and three common opponents. In this case, we should expect the algorithm to do a reasonably good job.

As the number of games needed to connect two teams increases, the algorithm becomes less effective. Comparing Pacific Lutheran to Truman State for the last D-III strength bid, the shortest path between the two teams is three games. Pacific Lutheran lost 14-12 to St. John’s, who beat North Park 13-7, who lost to Truman State 15-13. A large distance between two teams may not be a problem if there are a variety of paths between the two teams. But when we start ignoring lots of games, we can end up with some questionable outputs.
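To make "shortest path" concrete, here is a minimal sketch that treats teams as nodes and games as undirected edges, then runs a breadth-first search. The `shortest_path` helper is hypothetical, and the game list contains only the three results quoted above rather than the full season.

```python
from collections import deque


def shortest_path(games, start, goal):
    """Return the shortest chain of teams linking start to goal, or None."""
    # Build an undirected graph: each game connects its two teams.
    graph = {}
    for team_a, team_b in games:
        graph.setdefault(team_a, set()).add(team_b)
        graph.setdefault(team_b, set()).add(team_a)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Only the three games mentioned in the paragraph above.
games = [
    ("Pacific Lutheran", "St. John's"),
    ("St. John's", "North Park"),
    ("North Park", "Truman State"),
]

path = shortest_path(games, "Pacific Lutheran", "Truman State")
print(path)
print(len(path) - 1)  # number of games in the chain: 3
```

Dropping excluded games removes edges from this graph, which is how a team can end up connected to everyone else through only one or two opponents.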

So now let’s talk about Bates College, the algorithm’s highest rated team in the country. Once we take out the excluded games, the graph of games played looks like this:

Tufts-B’s Evan Ferber. (Photo by Danny Solow)

Bates, Tufts-B, and Kean are only connected to the outside world of the rest of the teams in the country through Tufts-C and Delaware-B. Tufts-C (#218 of 253, rating 444) and Delaware-B (#209, rating 517) seem to be reasonably rated. Kean’s rating is calculated using only the results against Tufts-C (13-4 win) and Delaware-B (15-10 win). Kean’s loss to Tufts-B doesn’t matter, because that game is Tufts-B’s only connection to the outside world. From the two included games, we get a rating of about 962 for Kean.

Now all of Tufts-B’s games were blowout wins (ignoring the games against Bates for now). We take only the game against the highest-rated team, which is Kean, and ignore all the rest. The losses to Bates don’t matter, because Tufts-B is the only connection from Bates to the outside world. Tufts-B ends up with a rating of 1561, good for #34 in the country.

Bates won every game they played in a blowout, so again we only take the best one(s), which are 11-5 and 15-7 wins over Tufts-B. From these, we get a rating of about 2167 for Bates. That’s the best rating in the country, well above #1 Wisconsin’s 2070.
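The last step of that chain is simple arithmetic, sketched below. It assumes a team's rating is the plain average of its included game ratings, and that each blowout win is worth the opponent's rating plus 606. The averaging is my simplification for illustration; the variable names are hypothetical.

```python
MAX_SWING = 606        # points earned for a blowout win
tufts_b_rating = 1561  # Tufts-B's rating from the rankings
wisconsin_rating = 2070

# Bates' only included games are the two blowout wins over Tufts-B.
bates_game_ratings = [tufts_b_rating + MAX_SWING for _ in ("11-5", "15-7")]

# Assumption: a team's rating is the simple average of its game ratings.
bates_rating = sum(bates_game_ratings) / len(bates_game_ratings)

print(bates_rating)                     # 2167.0
print(bates_rating - wisconsin_rating)  # 97.0
```

The same step applied to Kean's rating of about 962 lands close to Tufts-B's 1561, though not exactly on it, since the figures quoted here are approximate.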

Bates only played 8 sanctioned games, so they aren’t included in the rankings. If they had recorded two more 11-5 wins over the worst B teams they could find, they’d be sitting at #1 in the rankings right now.

Tufts-B in all their glory.

How good are Tufts-B and Bates really?

Tufts-B is one of the best B teams in the country, as can be seen by their results. According to Captain Evan Ferber, “Tufts has a very large and incredibly deep Ultimate Program. Tufts B is one of the strongest B teams in the country because of the depth of the Tufts program and Tufts playing style and play calls.” Captain Kevin Herbard says: “Tufts is a very deep program so our B team usually consists of players who could make A teams at all but the most competitive universities. We have a lot of intelligent players and a lot of athletes who are still polishing their skills.” Using the RRI algorithm, Tufts-B is #147 in the college division, a ranking their captains consider “fairly accurate.”

Bates recorded two big wins over Tufts-B, and is ranked #75 by RRI, almost exactly equal to Bentley (the #2 team in D-III). Captains Connor Abernathy, Fergus Moynihan, Henry Mauck, and David Smith are looking forward to showing what Bates can do in the series:

The Bates team this year is the product of hard-work and team-building that’s gone on for the past two years. Last year, we were fortunate to take on a former Red Tide player, Brian York, as a volunteer coach. Under his generous guidance, the team has risen to the challenge of tougher off-season conditioning and skill-building. This Bates team is the best it’s looked, probably ever. Last year was the first time Bates made it out of the Conference tournament and we’ve only worked harder since. Are we as good as teams like Bentley? We certainly think we can compete against teams of that caliber and we’re out to prove it.

So is the rankings algorithm broken?

Sort of. This year, it appears the rankings worked out well. There aren’t any clearly undeserving teams earning Nationals bids for their region. But the examples of Tufts-B and Bates illustrate problems with the algorithm, which is exactly what Tufts-B wanted to happen. Ferber explains:

At the beginning of our season we noticed that we could make the main flaw in USAU’s ranking algorithm work to our advantage. By playing almost exclusively teams against whom we were confident we could beat by double their score, we played the system and caused ourselves to be ranked ridiculously high.

We weren’t trying to gain anything personally, or for our team, by gaming the system, so therefore we think it might help to mention that we did this intentionally. Yes, it most likely earned us a bid to New England D1 Regionals, but more importantly: We are hoping that it will exemplify how flawed the ranking algorithm is. If a nationally competitive team had tried to do what we did this season, they could have potentially earned their region an extra bid to nationals by playing in less competitive tournaments and 606-ing every team.

Should we go back to the way the rankings were calculated in previous years? Keep it the way it is now? Or is there some alternative that solves all our problems?

Addendum: Just for fun, here’s the shortest path I could find between Bates and Wisconsin (using only non-excluded games): Bates beat Tufts-B 15-7, who beat Kean 15-7, who beat Delaware-B 15-10, who lost to Georgetown-B 9-7, who beat East Stroudsburg 13-11, who lost to Messiah 13-8, who lost to Pennsylvania 15-10, who lost to UNC-Wilmington 13-8, who lost to Whitman 13-7, who beat Wisconsin 13-10.

Feature photo of Tufts’ Ariel Rascoe by Danny Solow