I will preface this post by saying that the last
time I watched a basketball game for more than ten minutes was when I was 11,
with my grandma. I have never been interested in basketball (my family is of
short, Polish stock), and have only the faintest idea of what March Madness is.
But, I have a competitive nature, and I love applying my math and stats skills
and my intuition. So, this year I decided to jump on the bracket bandwagon.
To determine my bracket I used various sources:
· FiveThirtyEight’s NCAA Tournament Predictions (accessible at http://fivethirtyeight.com/interactives/march-madness-predictions/). I relied heavily on this tool, more so than seed, because of the breadth that it covers (from the pre-season rankings, various professionals’ estimates, the power rankings, player injuries, and geography, among many other factors).
· The Washington Post’s list that contains the number of upsets per round each year (accessible at http://www.washingtonpost.com/wp-srv/special/sports/ncaa-march-madness-bracket-guide/). This helped me determine how many upsets I should include in my bracket. Some upsets were already statistically determined by FiveThirtyEight’s tool, but others I decided for myself.
· The tournament results from the past five years
· Seed
· My own (slight) biases, mostly based on living in Michigan, my love for Oregon, and the graduate schools I applied to.
From the past five years I noticed that, usually,
around one team from the previous year’s Final Four makes it to the current year’s
Final Four. In three out of the last five instances, that team won. In the
following table, the underlined teams were in the Final Four the year before:
2009 | 2010 | 2011 | 2012 | 2013
1 North Carolina (won) | 1 Duke (won) | 3 Connecticut (won) | 1 Kentucky (won) | 1 Louisville (won)
2 Michigan State | 5 Michigan State | 11 Virginia Commonwealth | 4 Louisville | 4 Michigan
1 Connecticut | 5 Butler | 8 Butler | 2 Ohio State | 4 Syracuse
3 Villanova | 2 West Virginia | 4 Kentucky | 2 Kansas | 9 Wichita State
This year Louisville,
Michigan, and Wichita State are all in the same region, so only one of them can
go to the Final Four. Louisville has, from what I’ve read, been mis-seeded
because of their easy schedule during the year, and according to
FiveThirtyEight they have a 38% chance at getting to the Final Four (against
Wichita State and Michigan’s 14%), so I decided to put Louisville in my Final Four. I have Ohio
State beating Syracuse as one of my second-round upsets,
so Syracuse can’t make it to the Final Four.
Additionally, as you can see from the table, each time a team has shown up at the Final Four
in two consecutive years, it has done as well or better in the second year than
in the first. Thus, although Michigan State and Louisville are seeded the same,
and Michigan State does have Tom Izzo, I decided to have Louisville going to
the Semi-finals (also, on FiveThirtyEight it shows that Louisville has a much
better chance at going to the Semi-finals than State).
That is as far as I took the insights that I gained from the last five years’ Final
Four history—instead, I took into account Florida’s place as a #1 seed as well
as the fact that no team has won the title two years in a row since Florida in 2006 and 2007,
and gave Louisville a respectable home in second place.
I
used the number of upsets per round over the past few years to guide the number
of upsets I would have per round.
Round | 1 | 2 | 3 | 4 | 5
Number of upsets | 7 | 2 | 3 | 2 | 1
Since 2009, the number of upsets in round 1 has been 10 four times and 7 once. When
FiveThirtyEight projected that a lower seed would beat a higher seed, I
put that in as one of my upsets. My biases account for the rest: NC State
(12) beating Saint Louis (5), even though FiveThirtyEight projects otherwise (I’ll
be attending NC State in the fall to obtain a master’s in Data Analytics);
Harvard over Cincinnati (Harvard has done well the past few years, I looked
into applying to Harvard’s statistics program, and FiveThirtyEight gives Harvard
a 42% chance of winning to Cincinnati’s 58%, which is relatively close); George Washington
over Memphis (FTE estimates 55/45 in favor of Memphis, but I applied to GW’s Data
Analytics program and was accepted); and Stanford over New Mexico (Stanford’s
stats program is ranked #1 in the US).
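As a rough illustration of the round 1 reasoning above, here is a minimal Python sketch. The bracket itself was filled in by hand, so this is not the method actually used; the function name pick_upsets and the dictionary layout are assumptions made up for this example. It summarizes the five historical round 1 upset counts quoted in the text and ranks candidate upsets by the underdog's win probability, using only the two FiveThirtyEight figures quoted above.

```python
from statistics import mean, mode

# Round 1 upset counts for the five tournaments since 2009, as stated in the
# post: 10 upsets four times and 7 once (the year order is not given).
round1_upsets = [10, 10, 10, 10, 7]


def pick_upsets(underdog_win_prob, k):
    """Return the k underdogs with the highest win probability (hypothetical helper)."""
    ranked = sorted(underdog_win_prob.items(), key=lambda item: item[1], reverse=True)
    return [team for team, _ in ranked[:k]]


# Underdog win probabilities quoted in the post (FiveThirtyEight figures).
underdogs = {"Harvard": 0.42, "George Washington": 0.45}

print(mean(round1_upsets))        # 9.4
print(mode(round1_upsets))        # 10
print(pick_upsets(underdogs, 1))  # ['George Washington']
```

Ranking by the underdog's probability is only one way to choose; it says nothing about the picks that were driven by personal bias rather than by the projections.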
The
number of upsets in round 2 has been between 1 and 6 over the past five years,
though mostly the number of upsets is in the higher range. I chose Ohio State
over Syracuse almost arbitrarily, but more because I’ve heard of Ohio State and
I haven’t heard much about Syracuse. Also, I needed an upset, and FTE estimated
that Ohio State has a 40% chance of winning that round versus Syracuse’s
50%, which is not too much of a difference. So, Ohio State is my upset for this round.
Oregon is my other upset, mostly because my family bought a house there this
year. I visited Oregon over winter break with my family, and fell in love. I
don’t have high hopes for them, but I thought that maybe I could send enough positive
vibes their way for them to beat number 2-seeded Wisconsin.
In round 3 I have more upsets than I would normally like: I have three, while
the count over the past five years has been either 2 or 3, with the mode
being 2. I already explained why I think Louisville will go to the finals; they are
one of my upsets. FTE’s projections led me to pick Duke over Michigan, and my
home-state bias (as well as Izzo) helped me choose Michigan State over
Virginia.
For the Final Four game I have 1 upset. The only time there has been an upset in the
last 5 years was in 2009, but I see Louisville as a strong team, and FTE has
them at a 1% chance of getting to the finals over Arizona, so I chose to have
them beating Arizona.
Overall,
I have enjoyed making my bracket. I have never thought of myself as wanting to
go into sports statistics, but now I can see how people are drawn to it. I can
definitely see myself watching sports more often if I treat it as a
competition or as an opportunity to use my statistical abilities.
Fun, and good exposition of your process. I feel like I can ask you this question due to your statistical expertise: how relevant is the past performance of seeds to making predictions? Or is the presumed similarity between the #5's (for example) this year and in previous years enough to justify using past results? How could you test whether past tournaments should be taken into consideration?