Thursday, March 20, 2014

"Doing Math": Biz's Bracketology

I will preface this post by saying that the last time I watched a basketball game for more than ten minutes was when I was 11, with my grandma. I have never been interested in basketball (my family is of short, Polish stock), and have only the faintest idea of what March Madness is. But, I have a competitive nature, and I love applying my math and stats skills and my intuition. So, this year I decided to jump on the bracket bandwagon.
            To determine my bracket I used various sources:
·         FiveThirtyEight’s NCAA Tournament Predictions (accessible at
I relied heavily on this tool, more so than seed, because of the breadth that it covers (from the pre-season rankings, various professionals’ estimates, the power rankings, player injuries, and geography, among many other factors).
·         The Washington Post’s list that contains the number of upsets per round each year (accessible at
This helped me determine how many upsets I should include in my bracket. Some upsets were already statistically determined by FiveThirtyEight’s tool, but others I decided for myself.
·         The tournament results from the past five years
·         Seed
·         My own (slight) biases
Mostly based on living in Michigan, my love for Oregon, and the graduate schools I applied to.
Looking at the past five years, I noticed that usually about one team from the previous year’s Final Four makes it to the current year’s Final Four. In three out of the last five instances, that team won. In the following table (one row per year, 2009 to 2013, with each team’s seed before its name), the teams marked with an asterisk were in the Final Four the year before:
2009: 1 North Carolina* (won), 2 Michigan State, 1 Connecticut, 3 Villanova
2010: 1 Duke (won), 5 Michigan State*, 5 Butler, 2 West Virginia
2011: 3 Connecticut (won), 11 Virginia Commonwealth, 8 Butler*, 4 Kentucky
2012: 1 Kentucky* (won), 4 Louisville, 2 Ohio State, 2 Kansas
2013: 1 Louisville* (won), 4 Michigan, 4 Syracuse, 9 Wichita State
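
As a rough check on the "three out of five" observation, the table above can be encoded and scanned for teams that appear in consecutive years. This is only a sketch: the 2008 field is not in the table and is added here as an assumption so that 2009 has a previous year to compare against, and the names `final_fours` and `repeat_teams` are made up for illustration.

```python
# Final Four fields as (seed, team); the champion is listed first in each year.
# ASSUMPTION: the 2008 row is not in the post's table; it is included only so
# that 2009 has a "previous year" to compare against.
final_fours = {
    2008: [(1, "Kansas"), (1, "Memphis"), (1, "North Carolina"), (1, "UCLA")],
    2009: [(1, "North Carolina"), (2, "Michigan State"), (1, "Connecticut"), (3, "Villanova")],
    2010: [(1, "Duke"), (5, "Michigan State"), (5, "Butler"), (2, "West Virginia")],
    2011: [(3, "Connecticut"), (11, "Virginia Commonwealth"), (8, "Butler"), (4, "Kentucky")],
    2012: [(1, "Kentucky"), (4, "Louisville"), (2, "Ohio State"), (2, "Kansas")],
    2013: [(1, "Louisville"), (4, "Michigan"), (4, "Syracuse"), (9, "Wichita State")],
}


def repeat_teams(fields):
    """Map each year to the teams that were also in the previous year's Final Four."""
    repeats = {}
    for year in sorted(fields):
        previous = fields.get(year - 1)
        if previous is None:
            continue  # no earlier year to compare against
        previous_names = {name for _, name in previous}
        repeats[year] = [name for _, name in fields[year] if name in previous_names]
    return repeats


repeats = repeat_teams(final_fours)

# Years in which the champion (first entry) was one of the repeat teams.
repeat_champions = [
    year for year, teams in repeats.items() if final_fours[year][0][1] in teams
]

for year, teams in repeats.items():
    note = "(won)" if year in repeat_champions else ""
    print(year, teams, note)
print(len(repeat_champions), "of", len(repeats), "years had a repeat team win the title")
```

Five years is a very small sample, so this only confirms the count; it says nothing about whether the pattern is predictive.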

This year Louisville, Michigan, and Wichita State are all in the same region, so only one of them can go to the Final Four. From what I’ve read, Louisville has been mis-seeded because of their easy schedule during the year, and according to FiveThirtyEight they have a 38% chance of getting to the Final Four (against 14% for Wichita State and Michigan), so I decided to include them. I have Ohio State beating Syracuse in the second round as one of my second-round upsets, so Syracuse can’t make it to the Final Four.
            Additionally, as you can see from the table, each time a team has shown up at the Final Four in two consecutive years, it has done as well or better in the second year than in the first. Thus, although Michigan State and Louisville are seeded the same, and Michigan State does have Tom Izzo, I decided to have Louisville going to the semifinals (FiveThirtyEight also shows that Louisville has a much better chance of reaching the semifinals than Michigan State).
            That is as far as I took the insights that I gained from the last five years’ Final Four history. Instead, I took into account Florida’s place as a #1 seed, as well as the fact that repeat champions are rare (the last team to win the title two years in a row was Florida, in 2006 and 2007), and gave Louisville a respectable home in second place.
            I used the number of upsets per round over the past few years to guide the number of upsets I would have per round.
Number of upsets
            Since 2009, the number of upsets in round 1 has been 10 four times and 7 once. When FiveThirtyEight projected that a lower seed would beat a higher seed, I put that in as one of my upsets. My biases account for the rest: NC State (12) beating Saint Louis (5), even though FiveThirtyEight projects otherwise (I’ll be attending NC State in the fall to obtain a master’s in Data Analytics); Harvard over Cincinnati (Harvard has done well the past few years, I looked into applying to Harvard’s statistics program, and FiveThirtyEight gives Harvard a 42% chance of winning to Cincinnati’s 58%, which is relatively close); George Washington over Memphis (FTE estimates 55/45 in favor of Memphis, but I applied to GW’s Data Analytics program and was accepted); and Stanford over New Mexico (Stanford’s stats program is ranked #1 in the US).
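
To put a number on how much these bias picks might cost, here is a small sketch using only the two probabilities quoted above (Harvard at 42%, George Washington at 45%). NC State and Stanford are left out because no figure is given for them, and treating the games as independent is an assumption of this sketch, not something FiveThirtyEight states.

```python
from math import prod

# Win probabilities quoted in the post (from FiveThirtyEight).
# NC State and Stanford are omitted: the post gives no number for them.
bias_picks = {"Harvard": 0.42, "George Washington": 0.45}

# Expected number of these picks that turn out to be right.
expected_correct = sum(bias_picks.values())

# ASSUMPTION: the games are independent, so joint probabilities multiply.
all_correct = prod(bias_picks.values())
none_correct = prod(1 - p for p in bias_picks.values())

print(f"expected correct picks: {expected_correct:.2f}")
print(f"chance both are right:  {all_correct:.3f}")
print(f"chance both are wrong:  {none_correct:.3f}")
```

Under those assumptions, the two picks are expected to yield a bit less than one correct result between them, which is the price of choosing with the heart.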
            The number of upsets in round 2 has been between 1 and 6 over the past five years, though mostly toward the higher end of that range. I chose Ohio State over Syracuse almost arbitrarily, mostly because I’ve heard of Ohio State and I haven’t heard much about Syracuse. Also, I needed an upset, and FTE estimated that Ohio State has a 40% chance of winning that round versus Syracuse’s 50%, which is not too much of a difference. So, Ohio State is one of my upsets for this round. Oregon is my other upset, mostly because my family bought a house there this year. I visited Oregon over winter break with my family and fell in love. I don’t have high hopes for them, but I thought that maybe I could send enough positive vibes their way for them to beat number 2-seeded Wisconsin.
            In round 3 I have more upsets than I would normally like: I have three, while the range over the past five years has been between 2 and 3, with the mode being 2. I already explained why I think Louisville will go to the finals; they are one of my upsets. FTE’s projections led me to pick Duke over Michigan, and my home-state bias (as well as Izzo) helped me choose Michigan State over Virginia.
            For the Final Four game I have 1 upset; the only time there has been an upset in the last 5 years was in 2009. But I see Louisville as a strong team, and FTE has them at a 1% better chance of getting to the finals than Arizona, so I chose to have them beating Arizona.
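
The upset counts quoted above can be checked against my picks with a few lines of Python. This is a minimal sketch: the round-1 history is the five values given in the post, the round-2 and round-3 ranges are the ones stated above, and the helper name `within_range` is made up for illustration. My own round-1 count and the Final Four history are left out because the post does not give them precisely.

```python
from statistics import mean, mode

# Round-1 upset counts since 2009, as stated in the post: 10 four times, 7 once.
round1_history = [10, 10, 10, 10, 7]
print("round 1 mean:", mean(round1_history), "mode:", mode(round1_history))

# For later rounds the post gives only a historical (low, high) range,
# plus the number of upsets picked in this bracket.
picks = {
    "round 2": {"historical_range": (1, 6), "my_upsets": 2},
    "round 3": {"historical_range": (2, 3), "my_upsets": 3},
}


def within_range(bounds, count):
    """True if count falls inside the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= count <= high


for round_name, info in picks.items():
    ok = within_range(info["historical_range"], info["my_upsets"])
    print(round_name, "picked", info["my_upsets"], "within history:", ok)
```

Both picked counts fall inside the historical ranges, with round 3 sitting at the top of its range.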

            Overall, I have enjoyed making my bracket. I have never thought of myself as wanting to go into sports statistics, but now I can see how people are drawn to it. I can definitely see myself watching sports more often if I view it as a competition or as an opportunity to use my statistical abilities.

1 comment:

  1. Fun, and good exposition of your process. I feel like I can ask you this question due to your statistical expertise: how strongly is the past performance of seeds relevant to making predictions? Or is the presumed similarity between the #5's (for example) this year and in previous years enough to justify using past results? How could you test whether past tournaments should be taken into consideration?