Rankings: Under the Hood

March 8, 2012, 4:57am

This is the first piece in a multi-part series that will look in depth at the way the rankings are built, examine some specific examples, look at the implications of the new system and make recommendations for improvement.  This article wouldn’t be possible without the generous help of Sholom Simon, the creator and caretaker of the ranking algorithm.  (Note: As of 2011, the ranking algorithm is housed with USAU.)

The discussion and the math that follow can be complicated at times, and there are aspects of them, notably the effects of iteration and time of entry, that still aren’t entirely clear to me.  However, it is important to know that overall, I think the rankings work.  To me, they pass the eyeball test – they look right.  All right, let’s get started.

Two pieces, but really four

Each game you play earns you points, based mainly on two pieces.  All the games you play are then averaged together to determine your ranking.  The two pieces are the ranking of the team you play and the bonus points you earn (or lose) based on the score.  The rankings then change in a couple of different ways.  First, scores decay: older results count for less than more recent ones.  Second, the algorithm cranks the numbers iteratively.  This means the averages are computed from the current set of rankings, then recomputed from the rankings that result, then recomputed again.  This is repeated 20 times, although typically the rankings change very little after 10 iterations.

It’s all who you play

When you play another team, you automatically get the points of that team’s ranking.  When Wisconsin stepped on the field to play Santa Barbara on Saturday morning at Stanford Invite, Wisconsin earned 1387 points, the amount of Black Tide’s ranking.  Tide earned Wisco’s 1736.  These points are allocated even before the first point is played.  This is the part of the rankings that has always seemed the weirdest and most unjust.  There are two big mitigating factors here.  First, the role that point differential plays in a game is huge.  Second, this usage of team ranking builds in a reward for doing well.  The better you play and the farther you go in a tournament, the better your opponents are and the higher their rankings.  This effectively builds in a reward for quarters, semis and finals.

Short version: You want to play highly ranked teams, because their rankings add into yours.

Just win, baby!

After a game is finished, you earn (or lose) points based on the ratio of points scored.  The formula is 400/x where x is either 0.66 or 2.5*(losing score/winning score)^2, whichever is bigger.  Since x is the denominator of the fraction, the smaller it is the more points you will earn (or lose).  Also, since the points are figured via ratio, a 5-4 game scores the same as a 10-8 or 15-12.  On a practical level, the 0.66 floor on x, which caps the bonus, will trigger any time the winner scores roughly twice as many points as the loser, or more.  So 15-7 scores the same as 15-3 and 15-0; all hit the cap.  How many points is that?  606!  Since that is +606 for the winner and -606 for the loser, a blowout is a 1212 point swing!  (“Hey bro, how was your game?” “Dude, it sucked.  We got 606ed.”)  How much is winning worth?  Even the closest possible game, 17-16, comes out to about 180 points for each team, which is a 360 point swing.
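For anyone who wants to plug in their own scores, here is a minimal Python sketch of the bonus formula as described above.  The function name score_bonus is my own invention, and this is an illustration of the 400/x arithmetic, not the actual USAU code.

```python
def score_bonus(winning_score: int, losing_score: int) -> float:
    """Bonus for the winner; the loser gets the same amount with a minus sign."""
    ratio = losing_score / winning_score
    # x is never allowed below 0.66, so the bonus tops out at 400 / 0.66.
    x = max(0.66, 2.5 * ratio ** 2)
    return 400 / x


# A few of the scores mentioned in the text, from blowout to nail-biter.
for winner, loser in [(15, 0), (15, 7), (15, 12), (17, 16)]:
    print(f"{winner}-{loser}: {score_bonus(winner, loser):.1f}")
```

Trying 15-8 next to 15-7 shows where the limit on x stops mattering: 15-8 is just close enough that the bonus dips below the 606 cap.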

Short version: Win.  Losing sucks and it costs you lots of points.

Time heals all wounds

Your games aren’t all averaged in evenly because the algorithm uses a weighted average.  The longer ago a game was played, the less it counts.  This is most easily explained by an example; since I was going to figure it out for Oregon anyway, that is the team I will use.  This year Fugue is going to go to three sanctioned tournaments: Corvegas, President’s Day and Stanford Invite.  (I made a couple of simplifications for ease of calculations, but I don’t think they diminish the essential point.  The first simplification is to assume that each tournament happened on a single day.  In reality, Sunday games count slightly more than Saturday games.  The second was to run the calculations based on the end of the regular season, April 1st.  The rankings probably won’t be run until Tuesday the 7th.  To be fair, I didn’t ask.)

The formula gives Corvegas a weight of 0.42, President’s Day a weight of 0.48 and Stanford Invite a weight of 0.56.  This yields percentages of: Corvegas 29%, Pres Day 33% and Stanford 39%.  One of the big decisions Fugue made was to not attend Centex.  (It was a money issue.)  Centex is only 7 days before the rankings run, so it has a weight of 0.83.  This would change the percentages to: Corvegas 19%, Pres Day 21%, Stanford 24.5% and Centex 36.5%.
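The step from weights to percentages is just each weight divided by the total.  Here is a small Python sketch of that step; the function name shares is mine, and it assumes each tournament contributes the same number of games.  It starts from the rounded weights quoted above, so its output can land up to about a point away from the percentages in the text.

```python
def shares(weights: dict[str, float]) -> dict[str, float]:
    """Turn per-tournament weights into percentage shares of the weighted average."""
    total = sum(weights.values())
    return {name: 100 * weight / total for name, weight in weights.items()}


# Rounded weights as quoted in the text.
without_centex = {"Corvegas": 0.42, "President's Day": 0.48, "Stanford Invite": 0.56}
with_centex = {**without_centex, "Centex": 0.83}

for label, weights in [("No Centex", without_centex), ("With Centex", with_centex)]:
    print(label)
    for name, pct in shares(weights).items():
        print(f"  {name}: {pct:.1f}%")
```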

What can you take away from this?  In the no-Centex case, even weighting would yield percentages of 33% for each tournament.  29%, 33% and 39% aren’t that far off.  In the Centex case, you would expect 25% a tourney, and 19%, 21% and 24.5% aren’t that far off.  Centex’s 36.5% is a pretty big share, but only by comparison: it is still smaller than Stanford Invite’s 39% in the first, no-Centex scenario.

Short version: Games decay, but not that much.  How many games you play has a much bigger impact than when you play them.

Run it again.  Run it again.  Run it again.  Run it again.  Run it again.

Each week the rankings are run, they are iterated 20 times.  The first round of new rankings is calculated using the existing rankings.  This generates new rankings.  These rankings are then used to generate a second round of new rankings.  And a third and a fourth and a fifth…twenty times.  Typically, after ten iterations, most teams’ rankings have settled onto an integer value (which is reported) and are only changing at smaller and smaller decimal values.

The iterative function serves to bind all the pieces together, so that even though CUT hasn’t played Ego yet, both have played Wisconsin and Pitt and these games help hold all the teams in place.  As more and more teams play more and more games, this single transitive thread broadens into a vast web connecting and tying all the teams in order.
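To get a feel for the iteration, here is a toy Python sketch that combines the pieces described in this article: each game is worth the opponent's current ranking plus or minus the score bonus, a team's ranking is the average of its games, and the whole thing is rerun 20 times.  Everything here is an assumption for illustration: the teams A, B and C, the scores, the starting value of 1000, the equal weighting (no decay) and the choice to update all teams at once.  It is not the real USAU routine.

```python
def bonus(winning_score, losing_score):
    """Score bonus from the 400/x formula, with x limited to at least 0.66."""
    return 400 / max(0.66, 2.5 * (losing_score / winning_score) ** 2)


def iterate_ratings(ratings, games, rounds=20):
    """games is a list of (winner, loser, winning_score, losing_score) tuples."""
    for _ in range(rounds):
        game_points = {team: [] for team in ratings}
        for winner, loser, w_score, l_score in games:
            b = bonus(w_score, l_score)
            # Each side gets the opponent's current rating, plus or minus the bonus.
            game_points[winner].append(ratings[loser] + b)
            game_points[loser].append(ratings[winner] - b)
        # New ratings are plain averages built from the previous round's ratings;
        # a team with no games keeps its old rating.
        ratings = {
            team: sum(points) / len(points) if points else ratings[team]
            for team, points in game_points.items()
        }
    return ratings


# Hypothetical three-team round robin, everyone starting at 1000.
start = {"A": 1000.0, "B": 1000.0, "C": 1000.0}
games = [("A", "B", 15, 10), ("B", "C", 15, 12), ("A", "C", 15, 7)]
final = iterate_ratings(start, games)
for team, rating in final.items():
    print(f"{team}: {rating:.2f}")
```

Calling iterate_ratings with rounds=10 and comparing against rounds=20 shows how little this toy example moves in the later passes.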

In conversation with Sholom, he made clear there is still some mystery to this very complex process.  Even after 20 years of official rankings and simulations, it is impossible to predict exactly how things will behave.  For example, it is unclear how much effect early games have.  Do early wins cast ripples throughout the system; ripples that are slightly higher?  Will echoes of these ripples remain even after many iterations?  In theory, they should; in practice, it is very difficult to tell.  There are some super-nerdy potential problems with any iterative process: strange attractors and divergence.  Sholom said he had never seen a strange attractor appear in the rankings, but that without the dampening effect of the min and max functions (look at the formula) divergence would be a possibility.  With dampeners, it doesn’t appear to be an issue.

Short version: The iterative routine mysteriously binds all the results together.

Up next

Tomorrow, Bryan Jones is going to hit you up with some detailed analysis using specific teams from 2011.
