An Interpretation and Critique of the USAU Ranking Algorithm
There has been quite a lot of discussion thus far about the USAU ranking algorithm, yet I don’t feel the algorithm has been broken down in a transparent, everyman’s way. I set out to do that below, then suggest a fix for what I consider the most important flaw in the current algorithm (nerds: the concavity of the 400/x element). I’ll finish with some further insights and issues that have been brought up in the past.
I should probably qualify myself before starting. I care about the algorithm because I’m a co-captain of Sub Zero and have an interest in improving/understanding everything related to our sport. Our 16th-place ranking on August 29th influenced me to figure out how the rankings actually work. I’m not trying to manipulate things to favor Sub Zero, which really wouldn’t even be possible since there is a 0% chance anything I say here will change this year’s rankings. I don’t think anything should be changed this year anyway.
Away from the field, I am a 4th year Biostatistics PhD student at the University of Minnesota and I recently passed my qualifying exam, so I don’t entirely suck at this stuff. A large component of my education is basically mathematical (or statistical) interpretation, which is what I’m trying to do here. I’m not an expert on ranking algorithms, and before this new device was put into place, I hadn’t really thought at all about how to develop a ranking. Over Labor Day weekend I spent a lot of time thinking about the algorithm and these are my insights.
USAU Rankings Algorithm Breakdown
I want to begin by saying the current algorithm does a fine job: it spits out quite reasonable rankings and I think our sport is all the better for it. Size bids and growth bids make for a less competitive nationals, so the current algorithm is better than what we had before. That being said, the current algorithm has been aptly criticized on this site and is definitely in need of some improvements. The improvements I suggest here would not drastically shift the rankings, but would likely flip a handful of teams and, at the extreme, move a team a few spots up or down. But when the difference between 17th and 16th is so important, these minor shifts can really be meaningful.
So how does the current algorithm really work, and what is wrong or controversial about it? We have probably all read the details on the USAU site, but recall that the algorithm uses the following construction: game_PR = opp_PR ± 400/x, where x = max(2/3, 2.5*(ls/ws)^2), ls and ws denote the losing and winning score, respectively, and the last element is added for the winner and subtracted for the loser. Your PR is then a weighted average of all the individual game_PR scores. I’m not going to go into the weighting scheme here; I’ll stick to the equation above.
The opp_PR piece allows your ranking to reflect your strength of schedule, a component that isn’t going to change much once the game begins. Yes, the worse you beat a team, the lower their PR will get (and vice versa for a loss), but this change is averaged over all their other games, and as the number of games accumulates, your current game matters less and less to your opponent’s PR. Figuring out exactly how much your current game will change your opponent’s PR can be done, but because of the weighting it gets convoluted pretty quickly.
Thinking about it simply, if you beat a team in their 10th game of the season and they get the maximum 600 subtracted from their game_PR, this 600 is averaged over those ten games and their PR will drop by about 60 points, so you’ve effectively only gained 540 points for that one win. The point is that the opp_PR component reflects strength of schedule, is relatively constant within the current game, and slightly reduces your gross gain/loss.
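To make the dilution concrete, here is a back-of-the-envelope sketch in Python (the helper name is mine, and it assumes a simple average over the opponent’s games rather than USAU’s actual weighting):

```python
# Hypothetical sketch: how the PR points gained from a win are diluted by
# the drop in the opponent's PR. Assumes the opponent's PR is a simple
# (unweighted) average of their game_PR scores.
def effective_gain(gross_pts, opp_games_played):
    """Net PR gained once the opponent's PR drop is spread over their games."""
    opp_pr_drop = gross_pts / opp_games_played  # their PR falls by this much
    return gross_pts - opp_pr_drop              # your opp_PR term falls too

# A maximum 600-point win in the opponent's 10th game nets about 540 points.
print(effective_gain(600, 10))  # 540.0
```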
Lastly, this component is what moves teams around when they are at home practicing: if a team’s collective strength of schedule goes up, they too will go up and vice versa. So when you are at home, root for the teams you’ve played and root against those you haven’t. This is an essential component to the algorithm, but once you’ve decided on your schedule for the year, you have very little control over it.
Once a game has begun, the 400/x piece is what your team can control. We can write this algorithm generally as game_PR = opp_PR ± PR_pts, calling the latter part the PR_pts component. This component is responsible for distributing PR_pts based on who won, what the winning score was, and the margin of victory.
This component can have any form, though some are more sensible than others. In the current construction, the 400/x piece is tough to understand on its own, but if you just start plugging in possible game scores its mechanics become quite clear. We have the luxury of knowing that the score at the end of a frisbee game will be one of a relatively small discrete set of pairs. So let’s just look at how the PR_pts are distributed across the likely game scores under the current construction:
So if your team wins 15-12, you get 250 PR points! Since x = max(2/3, 2.5*(ls/ws)^2), the 2/3 is what caps the PR_pts at 600 (400*3/2 = 600). This is what I refer to as the mercy threshold: once a team has enough breaks, any additional breaks don’t get them any additional gain in the rankings. This is one of my critiques of the current construction: the mercy threshold says a team that loses 15-7 is equivalent to a team that loses 15-0. This is a loss of information, and since we only have about 15 games from which to derive rankings, we want to minimize any losses of information.
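The table’s values can be reproduced with a short script (a sketch of the 400/x rule as described above; the function name is mine):

```python
# The current PR_pts rule: 400/x with x = max(2/3, 2.5*(ls/ws)^2).
def current_pr_pts(ws, ls):
    """PR points added for the winner (and subtracted for the loser)."""
    x = max(2.0 / 3.0, 2.5 * (ls / ws) ** 2)  # 2/3 is the mercy threshold
    return 400.0 / x                          # capped at 400 * 3/2 = 600

# A few rows for games to 15: note that 15-12 is worth 250 points, and
# every score from 15-7 on down is flattened to the 600-point cap.
for ls in (14, 12, 8, 7, 0):
    print(f"15-{ls}: {current_pr_pts(15, ls):.2f}")
```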
Another thing to note in the above table is that winning on universe point is always worth about 185 PR points. The last thing to note: for a given winning score, the marginal value of a break increases as the game gets less close, until the mercy threshold kicks in. This pops out if we look directly at how the marginal value of a break changes:
Looking at a winning score of 15, we see that a team would gain 29.34 PR points by winning by 2 instead of 1. Notice how these marginal values increase as the margin of victory increases, which means that the value of breaks is increasing as the game gets less competitive. I find this counter-intuitive and against the way our sport actually functions. In reality, breaks in closer games are more valuable, and the larger your lead, the less valuable another break becomes.
What is absurd about the above table is that there are ‘super’ breaks sitting out there right before the 600 threshold kicks in. The break that pushes a game to 15-8 rather than 15-9 is worth an astonishing 118.06 PR points. Recall, the break that prevents universe point is only worth 29.34 PR points (~25% of that value). Then, because of the mercy threshold, the marginal value of a break drops sharply: the break that pushes a game to 15-7 versus 15-8 is only worth 37.5 PR points (~33% of the ‘super’ break’s value). There is no possible rationale for this value structure. If a team were smart, they would throw out a universe point line whenever they have the opportunity to get, or are in jeopardy of giving up, the ‘super’ break. Recall that the break that wins you the game is only worth about ~185 PR points; getting ‘super broken’ wipes out ~66% of the value of a previous universe point win.
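The marginal values are easy to verify with the same 400/x rule (the helper is redefined here so the snippet stands alone):

```python
# Marginal value of each additional break under the current 400/x rule.
def current_pr_pts(ws, ls):
    x = max(2.0 / 3.0, 2.5 * (ls / ws) ** 2)
    return 400.0 / x

# For a game to 15: the marginals grow as the game gets less close, spike
# at the 'super' break (15-8 vs 15-9), then collapse at the mercy threshold.
for ls in range(13, 6, -1):
    step = current_pr_pts(15, ls) - current_pr_pts(15, ls + 1)
    print(f"15-{ls} vs 15-{ls + 1}: +{step:.2f}")
```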
The above table is what I find to be the most glaring issue with the current algorithm. Under the current algorithm, the most important thing a team can do is win games…which is what we want. But under the current algorithm the solid wins (e.g., 15-14 through about 15-9) are severely undervalued. The most undervalued games are the 2-point wins, wins that are hard fought and often the best of the weekend. The flip side is that close losses are not as detrimental to a team, though looking at the loss side of things, it has a more emotional interpretation to say the 15-8 loss is unreasonably punished. Thankfully, as I mentioned before, the PR_pts component can have any form, so we can easily fix it.
My Suggested Fix
I take a hybrid approach. I assign a 200 PR point value to universe point wins in regulation and to universe point wins in any game to fewer than 15 points. This slightly increases the value of winning relative to the current algorithm. For games to 15 points that go to overtime, I give slightly less of a bonus because I believe a long game to 15 that goes to OT shows that the two teams are very evenly matched. As such, I drop the value of an OT win. I also stick with the maximum of 600 PR points, but instead of using a mercy threshold, which is a blatant loss of information, I distribute the 400 PR points so that the value of each additional break decreases by a constant 20%.
Mathematically, we can redefine the PR_pts component for a game that ends in regulation as PR_pts = UniverseWinPts + (TotalPts - UniverseWinPts)*((1-p) - (1-p)^diff)/((1-p) - (1-p)^ws), where UniverseWinPts is the PR point value of a universe point win, TotalPts is the total PR points to be distributed (the maximum number of possible PR points in one game), diff is the margin of victory, and ws still denotes the winning score. To complete the allocation, we assign reasonable amounts of PR points for a game to 15 that ends in OT. The p component is the percentage by which the value of an additional break decreases. This formula and its properties are derived from a truncated Geometric(p) distribution: the fraction involving p is given by (F(diff-1) - F(0))/(F(ws-1) - F(0)), where F(x) is the cumulative distribution function of a Geometric(p) random variable. Other nice allocation schemes could be derived in the same manner by switching the distribution.
If we want the scale of the PR rankings to stay the same, we should assign 600 total points, and to keep the value of a universe point win similar we give it 200 points. Lastly, letting the value of additional breaks decrease by 20%, we get the following fix for the current PR_pts construction: PR_pts = 200 + 400*(0.8 - 0.8^diff)/(0.8 - 0.8^ws).
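As a sketch, the proposed regulation-game rule looks like this in Python (parameter names are mine; OT games would be handled separately, as described above):

```python
# Proposed PR_pts: a 200-point base for winning plus 400 points distributed
# so each additional break is worth a constant factor (1 - p) of the last.
def proposed_pr_pts(ws, diff, univ_pts=200.0, total_pts=600.0, p=0.2):
    """Winner's PR points for a regulation game decided by `diff` points."""
    q = 1.0 - p  # geometric decay factor for each extra break
    return univ_pts + (total_pts - univ_pts) * (q - q ** diff) / (q - q ** ws)

print(proposed_pr_pts(15, 1))   # universe point win: 200.0
print(proposed_pr_pts(15, 15))  # shutout: the full 600.0
```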
Enough math. Let’s look at the points allotted for various final scores and the marginal gain for each break under my suggested construction.
I somewhat arbitrarily set the values of OT wins, but I think they are very reasonable. We see that my construction can still distinguish between a team that gets beat 15-6 and a team that gets beat 15-0. There are no abrupt changes anymore, and this is in line with reality. Let’s look at the marginal values of breaks:
Each break is worth 20% less than the last, and if you want faster decay in value just increase p. There are no ‘super breaks’ and solid wins are no longer undervalued. This is easier to understand too.
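The constant 20% decay can be checked numerically (the helper is redefined so the snippet stands alone):

```python
# Each break's marginal value is (1 - p) times the previous one.
def proposed_pr_pts(ws, diff, univ_pts=200.0, total_pts=600.0, p=0.2):
    q = 1.0 - p
    return univ_pts + (total_pts - univ_pts) * (q - q ** diff) / (q - q ** ws)

marginals = [proposed_pr_pts(15, d) - proposed_pr_pts(15, d - 1) for d in range(2, 16)]
ratios = [later / earlier for earlier, later in zip(marginals, marginals[1:])]
print([round(r, 6) for r in ratios])  # every ratio is 0.8 (up to rounding)
```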
These changes would take very little time for USAU to make and they wouldn’t in any way sacrifice the work they’ve already done on the algorithm. These changes wouldn’t drastically change the rankings but I think they would give slightly better rankings. I’ll prove this by running my proposed algorithm and the current algorithm on the Labor Day results from this past weekend:
Algorithm Comparisons on Labor Day Results
[Table: Teams | My PR | My Rank | USAU PR | USAU Rank]
Don’t pay too much attention to the actual PR values. You shouldn’t compare these values across different algorithms, and especially not to the current PRs on the USAU site. What you can pay attention to is the actual ranking and the distance between teams. Regarding the ranking, my algorithm and the current USAU algorithm agree quite closely, which is encouraging. However, my algorithm ranks Chain above Rhino, and the USAU algorithm has them flipped. This flip occurs because Chain had a bunch of 15-11 wins, which are undervalued by the current algorithm, while Rhino had a couple more wins that hit the ‘super break’, with lots of value relative to the breaks before it. Chain also beat Rhino 15-9, so in this case I’d argue my algorithm is outperforming the current one.
Another thing to note: in the current algorithm, Sockeye and Doublewide are essentially indistinguishable, while in my algorithm Sockeye has a wider lead over Doublewide. This happens for the same reason cited above: Sockeye’s wins are in the 15-11 range, which are undervalued, whereas Doublewide had a couple more wins where they hit the ‘super’ break to boost their rating. Again, I think my algorithm has a better outcome here, with Sockeye clearly leading Doublewide in the rankings.
Random Other Issues and Insights
I’ll finish with quick suggestions on forfeits, convergence, and communication lines. I suggest a team that forfeits gets 606’d, as the lingo goes (though it is really 600’d): the winning team gets the maximum benefit and the forfeiting team gets the maximum pain. Under my proposed changes, all a team would need to do is score a single point and they would be better off not forfeiting, and even if they don’t score a goal the outcome is the same as forfeiting. I doubt anyone would forfeit a sanctioned game again under my system.
Convergence of the algorithm is not terribly concerning when you consider the common round robin approach to tournament formats. What is a little concerning is the rumor that USAU only runs 20 iterations. My algorithm (as well as the USAU algorithm) converged in 18 iterations on the Labor Day results, but there is no reason the algorithm shouldn’t be run to convergence. Convergence means there is no substantial change in the rankings on subsequent iterations; this can be checked by running the algorithm until the absolute sum of the changes over an iteration is < 1 or some other small value. I’m not too concerned about this, as 20 iterations seems reasonable, and I’d bet that USAU is running this thing until convergence anyway.
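For the curious, here is a toy sketch of running such a rating system to convergence (this is not USAU’s implementation: the simple-average update, the stopping rule, and the example games are all invented for illustration):

```python
# Iterate team ratings until the absolute sum of changes in one pass drops
# below `tol`. games is a list of (winner, loser, pr_pts) tuples.
def iterate_ratings(games, teams, tol=1.0, max_iters=1000):
    ratings = {t: 1000.0 for t in teams}
    for _ in range(max_iters):
        new = {}
        for t in teams:
            game_prs = []
            for winner, loser, pts in games:
                if winner == t:
                    game_prs.append(ratings[loser] + pts)   # opp_PR + PR_pts
                elif loser == t:
                    game_prs.append(ratings[winner] - pts)  # opp_PR - PR_pts
            new[t] = sum(game_prs) / len(game_prs) if game_prs else ratings[t]
        if sum(abs(new[t] - ratings[t]) for t in teams) < tol:  # converged
            return new
        ratings = new
    return ratings

# Invented example: A beats B for 250 pts; A and B each blow out C for 600.
games = [("A", "B", 250.0), ("A", "C", 600.0), ("B", "C", 600.0)]
print(iterate_ratings(games, ["A", "B", "C"]))
```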
The last issue that could potentially cause problems is the lack of communication lines between cohorts of teams. For simplicity, let’s say a cohort of tier 2 teams never plays a sanctioned game against a team that has played a sanctioned game against a tier 1 team (or a game against a team that played a game against a team who played a tier 1 team, and so on). In this case, the algorithm has no way to distinguish the relative strength of the two cohorts. This is a possibility, though right now nearly all the sanctioned tournaments draw a couple of tier 1 teams. If sanctioned tournaments become more widespread, a tier requirement might be in order; otherwise the top teams would need to make sure to get some games against the lower teams to prove their superiority.
The other potential issue, which lingered in the college series for a few weeks (I call it the Michigan State effect), arises if cohort 1 and cohort 2 have only one game connecting them (only one line of communication) and that game is an upset (e.g., Michigan State over Oregon): the algorithm is going to push the winning cohort above the losing one. This is what it should do, but if there are few communication lines between cohorts of teams, some funky stuff can result from an unlikely upset. Because of these two potential issues, I think more and more sanctioning is the best way to go, and teams should try to (or be required to) attend two levels of tournaments as this happens.
I’d love to get feedback on some of this stuff and keep the discussion going. I hope you have a better understanding of the mechanics of the USAU algorithm. Don’t be too angry with it, it works alright as it stands and I believe it is a step in the right direction. These changes I suggest here would continue to improve the system and I hope they are well received.