In [ ]:
%load_ext sage
pretty_print_default(True)
latex.matrix_delimiters("[", "]")

## Names: Yihan Wu, Hanwen Zhang, Jackie Crowley, Cole Meier

List all group member names here.

# Sports Ranking Project

This project concerns the use of linear algebra to rate sports teams. Before you begin, you should read the paper *A Mathematical Rating System* by Roland Minton (you may skip the proofs of Theorems 1 and 2; you will find a link to this paper on the course Moodle page). Let's use Sage to work through the first point-spread ratings example. We will follow the convention established in the paper: our data will be presented as an $n \times (n+4)$ matrix, where the first $n$ columns contain the schedule data and the last four columns contain wins, losses, points for, and points against. Evaluate the next cell.

In [3]:
RecordExample = matrix(QQ, [
[0, 2, 0, 2, 4, 0, 60, 28],
[2, 0, 2, 0, 2, 2, 50, 50],
[0, 2, 0, 2, 2, 2, 50, 50],
[2, 0, 2, 0, 0, 4, 28, 60], ])

RecordExample


[ 0  2  0  2  4  0 60 28]
[ 2  0  2  0  2  2 50 50]
[ 0  2  0  2  2  2 50 50]
[ 2  0  2  0  0  4 28 60]

Next, we'll create the ratings matrix. You'll need to carefully review the paper to understand what's going on here. When you execute the cell below, the ratings matrix (RM) from the paper will be created. When you execute the subsequent cell, the ratings matrix will be displayed.

You need to carefully read through the cell below so that you will be able to generalize it in part A of the assignment (see below). After you read through the next cell, you should evaluate it.
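For reference, here is the system the cell below builds, written in notation introduced here for illustration (it may differ from Minton's): $g_{ij}$ is the number of games between teams $i$ and $j$ (entry $(i, j)$ of the schedule), $g_i = \sum_j g_{ij}$ is the number of games team $i$ played, and $r_i$ is team $i$'s rating. Row $i$ of RM encodes the equation

$$g_i\, r_i - \sum_{j \neq i} g_{ij}\, r_j \;=\; \sum_{j} g_{ij}\,(r_i - r_j) \;=\; \mathrm{PF}_i - \mathrm{PA}_i .$$

For RecordExample, writing the ratings as $a, b, c, d$, the first row reads $4a - 2b - 2d = 60 - 28 = 32$. Because the schedule is symmetric, each diagonal entry equals the sum of the off-diagonal schedule entries in its row, so every row of the schedule part of RM sums to zero; this is why the ratings are determined only up to an additive constant.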

In [4]:
# First, let's create a 4 x 5 matrix of zeros. This will eventually be our ratings matrix (RM);
# the last column will contain points for (PF) minus points against (PA).

RM = matrix(QQ, 4, 5)

# We will fill in the ratings matrix RM one row at a time, left to right and top to bottom.
# Recall that indentation is key.

for r in [0..3]:  # Since there are four rows, the row indices go from 0 to 3.
    for c in [0..4]:  # Since there are five columns, the column indices go from 0 to 4.

        # If c < 4, then we're creating the portion of the ratings matrix that is determined
        # by the schedule (the first four columns of RecordExample). To fill in the entries,
        # I've divided it into three cases below. In the first case, you're working on the main diagonal,
        # and in the second case you're working off the main diagonal. The last case is for filling in
        # the column with points for minus points against.

        if c < 4 and r == c:
            # Since r == c, we are working on the main diagonal. The entry of the ratings matrix
            # should therefore be the number of games played by the team corresponding to the
            # current column, which is the sum of the entries in that column of the schedule.
            RM[r, c] = sum(RecordExample.columns()[c])
        elif c < 4 and r != c:
            # Here we fill in the off-diagonal entries in the portion of the ratings matrix
            # determined by the schedule. In Minton's paper, you'll see that we need only negate
            # the corresponding entries in the record matrix.
            RM[r, c] = -1*RecordExample[r, c]
        elif c == 4:
            # This case fills in the last column: points for minus points against. Points for are
            # in the column with index 6 (the next-to-last column), and points against are in the
            # column with index 7 (the last column).
            RM[r, c] = RecordExample[r, 6] - RecordExample[r, 7]


Execute the next line and you'll see the ratings matrix from the paper.

In [10]:
RM

[  4  -2   0  -2  32]
[ -2   4  -2   0   0]
[  0  -2   4  -2   0]
[ -2   0  -2   4 -32]

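In the cell below, Sage puts the matrix into reduced row echelon form. The row of zeros means the ratings are determined only up to an additive constant, so we set the fourth team's rating to 0 and read the other three from the last column. As in the article, we find that the point-spread ratings are 12, 8, 4, and 0.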

In [6]:
RM.rref()

[ 1  0  0 -1 12]
[ 0  1  0 -1  8]
[ 0  0  1 -1  4]
[ 0  0  0  0  0]
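The row reduction above can be reproduced without Sage. The sketch below is a plain-Python illustration, not the notebook's method: the function name `solve_ratings` is hypothetical, and it assumes a connected schedule. It pins the last team's rating to 0 (the same choice as reading the rref with $d = 0$), drops that team's equation (redundant, since all the equations add up to $0 = 0$), and solves the rest exactly with `fractions.Fraction` and Gauss-Jordan elimination.

```python
from fractions import Fraction


def solve_ratings(rm):
    """Solve a ratings system, pinning the last team's rating to 0.

    rm is a list of n rows, each with n coefficients followed by one
    right-hand-side entry (e.g. points for minus points against).
    Assumes a connected schedule, so the reduced system is nonsingular.
    """
    n = len(rm)
    m = n - 1
    # Keep the first n - 1 equations and unknowns: the last equation is
    # redundant and the last rating is fixed at 0.
    aug = [[Fraction(x) for x in row[:m]] + [Fraction(row[-1])] for row in rm[:m]]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][m] for r in range(m)] + [Fraction(0)]


# Ratings matrix RM for RecordExample (point-spread column last).
RM_EXAMPLE = [
    [4, -2, 0, -2, 32],
    [-2, 4, -2, 0, 0],
    [0, -2, 4, -2, 0],
    [-2, 0, -2, 4, -32],
]
print([str(x) for x in solve_ratings(RM_EXAMPLE)])  # ['12', '8', '4', '0']
```

Any other choice for the pinned rating shifts every rating by the same amount and leaves the predicted margins unchanged.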

A. Next, I want you to create a function RatingsMatrix(Record) whose input, Record, is a matrix that contains the schedule, wins, losses, points for, and points against (formatted as in the example above). This function should compute a ratings matrix augmented by two columns: wins minus losses (in the next-to-last column) and points for minus points against (in the last column). Finally, your function should return the ratings matrix. I have provided a template for your function below. Your function should work for a Record of any size (i.e., any number of teams).
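For readers who want to check their work outside Sage, here is a plain-Python sketch of the same construction. It assumes the record is a list of rows in the $n \times (n+4)$ layout; `ratings_matrix` is a hypothetical name and is not a substitute for the Sage function the assignment asks for. Entries are `Fraction`s so the arithmetic is exact, like Sage's `QQ`.

```python
from fractions import Fraction


def ratings_matrix(record):
    """Hypothetical plain-Python counterpart of RatingsMatrix.

    record is a list of n rows: n schedule entries, then wins, losses,
    points for, and points against. Returns n rows of n + 2 Fractions.
    """
    n = len(record)
    rm = []
    for r in range(n):
        row = []
        for c in range(n):
            if r == c:
                # Games played by team c: the sum of schedule column c.
                row.append(Fraction(sum(record[k][c] for k in range(n))))
            else:
                # Off-diagonal: negated number of games between r and c.
                row.append(Fraction(-record[r][c]))
        wins, losses, points_for, points_against = record[r][n:n + 4]
        row.append(Fraction(wins - losses))                # win ratings column
        row.append(Fraction(points_for - points_against))  # point-spread column
        rm.append(row)
    return rm


RECORD_EXAMPLE = [
    [0, 2, 0, 2, 4, 0, 60, 28],
    [2, 0, 2, 0, 2, 2, 50, 50],
    [0, 2, 0, 2, 2, 2, 50, 50],
    [2, 0, 2, 0, 0, 4, 28, 60],
]
for row in ratings_matrix(RECORD_EXAMPLE):
    print([str(x) for x in row])
```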

In [8]:
def RatingsMatrix(Record):
    # In the above example, the number of teams was just 4, but here it depends on the given Record. So let's
    # start by nailing down the number of teams. The number of teams is the same as the number of rows
    # in the matrix "Record". To find the number of rows in a matrix, see the Introduction to Sage notebook.
    NumTeams = Record.nrows()

    # Create a NumTeams x (NumTeams + 2) matrix over QQ with all zero entries. We have "+2" since we need
    # two extra columns: one for wins minus losses and one for points for minus points against.
    RM = matrix(QQ, NumTeams, NumTeams + 2)

    # Now, fill in the ratings matrix RM, one row at a time, using the data from the matrix Record. Use part
    # A as a model. Use the "if" and first "elif" to fill in the portion of the ratings matrix determined
    # by the schedule. Use the second "elif" to fill in the column containing wins minus losses. Use the last
    # "elif" to fill in the column containing points for minus points against. Replace "#YOUR CODE HERE#" with
    # your own code, but do not delete the colons! Maintain the given indentation.
    # The last four columns of Record hold wins, losses, points for, and points against, in that order.

    for r in [0..NumTeams-1]:
        for c in [0..NumTeams+1]:
            if c < NumTeams and r == c:
                # Diagonal: games played by team c (the sum of schedule column c).
                RM[r, c] = sum(Record.column(c))
            elif c < NumTeams and r != c:
                # Off-diagonal: negate the number of games between teams r and c.
                RM[r, c] = -1*Record[r, c]
            elif c == NumTeams:
                # Next-to-last column: wins minus losses.
                RM[r, c] = Record[r, Record.ncols()-4] - Record[r, Record.ncols()-3]
            elif c == NumTeams+1:
                # Last column: points for minus points against.
                RM[r, c] = Record[r, Record.ncols()-2] - Record[r, Record.ncols()-1]

    return RM


Test your function by evaluating the line below. It should produce the win ratings and point-spread ratings in the last two columns.

In [9]:
RatingsMatrix(RecordExample).rref()


[  1   0   0  -1 3/2  12]
[  0   1   0  -1   1   8]
[  0   0   1  -1 1/2   4]
[  0   0   0   0   0   0]

Here are two more records to test your code against. Simply execute the following cell to see whether you pass each test.

In [18]:
Test1 = matrix(QQ, [
[0, 19, 19, 19, 19, 6, 7, 6, 7, 7, 6, 7, 7, 6, 7, 85, 57, 608, 463],
[19, 0, 19, 19, 19, 7, 7, 6, 6, 6, 6, 7, 7, 7, 7, 53, 89, 444, 510],
[19, 19, 0, 19, 19, 6, 6, 7, 7, 7, 7, 7, 6, 7, 6, 63, 79, 546, 608],
[19, 19, 19, 0, 19, 6, 6, 7, 7, 7, 7, 7, 7, 6, 6, 66, 76, 547, 628],
[19, 19, 19, 19, 0, 7, 7, 7, 7, 6, 6, 7, 6, 7, 6, 75, 67, 564, 536],
[6, 7, 6, 6, 7, 0, 19, 19, 19, 19, 7, 6, 7, 7, 7, 53, 86, 487, 613],
[7, 7, 6, 6, 7, 19, 0, 19, 19, 19, 7, 6, 7, 6, 7, 79, 63, 614, 516],
[6, 6, 7, 7, 7, 19, 19, 0, 19, 19, 7, 6, 6, 7, 7, 68, 74, 566, 590],
[7, 6, 7, 7, 7, 19, 19, 19, 0, 19, 6, 7, 7, 7, 7, 79, 63, 555, 526],
[7, 6, 7, 7, 6, 19, 19, 19, 19, 0, 7, 7, 7, 6, 6, 87, 55, 563, 521],
[6, 6, 7, 7, 6, 7, 7, 7, 6, 7, 0, 19, 19, 19, 19, 70, 72, 600, 603],
[7, 7, 7, 7, 7, 6, 6, 6, 7, 7, 19, 0, 19, 19, 19, 69, 73, 636, 652],
[7, 7, 6, 7, 6, 7, 7, 6, 7, 7, 19, 19, 0, 19, 19, 80, 62, 550, 503],
[6, 7, 7, 6, 7, 7, 6, 7, 7, 6, 19, 19, 19, 0, 19, 66, 71, 552, 608],
[7, 7, 6, 6, 6, 7, 7, 7, 7, 6, 19, 19, 19, 19, 0, 70, 76, 632, 587]])

Test1_Output = matrix(QQ, [
[142,  -19,  -19,  -19,  -19,   -6,   -7,   -6,   -7,   -7,   -6,   -7,   -7,   -6,   -7,   28,  145],
[ -19,  142,  -19,  -19,  -19,   -7,   -7,   -6,   -6,   -6,   -6,   -7,   -7,   -7,   -7,  -36,  -66],
[ -19,  -19,  142,  -19,  -19,   -6,   -6,   -7,   -7,   -7,   -7,   -7,   -6,   -7,   -6,  -16,  -62],
[ -19,  -19,  -19,  142,  -19,   -6,   -6,   -7,   -7,   -7,   -7,   -7,   -7,   -6,   -6,  -10,  -81],
[ -19,  -19,  -19,  -19,  142,   -7,   -7,   -7,   -7,   -6,   -6,   -7,   -6,   -7,   -6,    8,   28],
[  -6,   -7,   -6,   -6,   -7,  142,  -19,  -19,  -19,  -19,   -7,   -6,   -7,   -7,   -7,  -33, -126],
[  -7,   -7,   -6,   -6,   -7,  -19,  142,  -19,  -19,  -19,   -7,   -6,   -7,   -6,   -7,   16,   98],
[  -6,   -6,   -7,   -7,   -7,  -19,  -19,  142,  -19,  -19,   -7,   -6,   -6,   -7,   -7,   -6,  -24],
[  -7,   -6,   -7,   -7,   -7,  -19,  -19,  -19,  144,  -19,   -6,   -7,   -7,   -7,   -7,   16,   29],
[  -7,   -6,   -7,   -7,   -6,  -19,  -19,  -19,  -19,  142,   -7,   -7,   -7,   -6,   -6,   32,   42],
[  -6,   -6,   -7,   -7,   -6,   -7,   -7,   -7,   -6,   -7,  142,  -19,  -19,  -19,  -19,   -2,   -3],
[  -7,   -7,   -7,   -7,   -7,   -6,   -6,   -6,   -7,   -7,  -19,  143,  -19,  -19,  -19,   -4,  -16],
[  -7,   -7,   -6,   -7,   -6,   -7,   -7,   -6,   -7,   -7,  -19,  -19,  143,  -19,  -19,   18,   47],
[  -6,   -7,   -7,   -6,   -7,   -7,   -6,   -7,   -7,   -6,  -19,  -19,  -19,  142,  -19,   -5,  -56],
[  -7,   -7,   -6,   -6,   -6,   -7,   -7,   -7,   -7,   -6,  -19,  -19,  -19,  -19,  142,   -6,   45]])

Test2 = matrix(QQ,4,8,[
[0, 2, 1, 2, 4, 1, 31, 5],
[2, 0, 2, 0, 2, 2, 23, 15],
[1, 2, 0, 2, 2, 3, 15, 33],
[2, 0, 2, 0, 1, 3, 5, 21] ])

Test2_Output = matrix(QQ,4,6,[
[  5,  -2,  -1,  -2,   3,  26],
[ -2,  4,  -2,   0,   0,   8],
[ -1,  -2,   5,  -2,  -1, -18],
[ -2,   0,  -2,   4,  -2, -16] ])

if RatingsMatrix(Test1) == Test1_Output:
    print("Passed test 1.")
else:
    print("Failed test 1.")

if RatingsMatrix(Test2) == Test2_Output:
    print("Passed test 2.")
else:
    print("Failed test 2.")

Passed test 1.
Passed test 2.

Now, use your function in the rest of the project.

In [1]:
Problem5 = matrix(QQ, [
[0, 1, 0, 3, 4, 0, 40, 8],
[1, 0, 3, 0, 3, 1, 32, 22],
[0, 3, 0, 1, 1, 3, 26, 34],
[3, 0, 1, 0, 0, 4, 6, 40] ])

Problem7 = matrix(QQ, [
[0, 1, 1, 3, 4, 1, 40, 18],
[1, 0, 3, 0, 3, 1, 32, 22],
[1, 3, 0, 1, 2, 3, 36, 34],
[3, 0, 1, 0, 0, 4, 6, 40] ])

Problem8 = matrix(QQ, [
[5, -1, -1, -3, 3],
[-1, 4, 3, 0, 2],
[-1, -3, 5, -1, -1],
[3, 0, -1, 4, -4] ])


B. Do exercises 5, 7 and 8 in Minton's paper. For exercise 5, be sure to interpret the results: how do the win and point-spread ratings rank the teams? For exercise 7, don't forget to consider both win and point-spread ratings in the last question. For exercise 8, be sure to explain the choices you make! (Think of this problem as a single mini-essay, where over the course of the essay you will answer all the questions posed in problems 5, 7, and 8.) Feel free to make use of the function you created in part A!

Solution:

Start here ...

In [ ]:
#5. As in the example above, we used the RatingsMatrix function to augment the schedule with a win-ratings column and a point-spread column, then row reduced.
# Point-spread ratings: the last column of the rref gives a - d = 10, b - d = 8, and c - d = 4. Adding the same constant to every rating leaves all
# differences unchanged, so d may be chosen arbitrarily; with d = 0 we get a = 10, b = 8, c = 4. A difference of two ratings is a predicted margin:
# Team A should beat Team D by 10 points, and Team A should beat Team B by (a - d) - (b - d) = 2 points. Ranking from high to low gives A, B, C, D,
# so Team A has the highest point-spread rating.
# Win ratings: these are computed the same way, except the right-hand side is wins minus losses (a win counts +1 and a loss -1). With d = 0, the
# next-to-last column gives a = 5/4, b = 1, c = 1/4, so A is 1/4 higher than B and B is 3/4 higher than C. Ranking from high to low again gives A, B, C, D.
# If each team played every other team twice, A's projected wins minus losses would be 2(a - b) + 2(a - c) + 2(a - d) = 5, and B's would be
# 2(b - a) + 2(b - c) + 2(b - d) = 3 (C's would be -3 and D's -5). Team A has the highest win rating.
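The projection in exercise 5 can be computed directly. In this sketch (the helper `projected_differentials` is hypothetical), a team's projected wins minus losses is $\sum_j g_{ij}(r_i - r_j)$. With the actual Problem 5 schedule this reproduces each team's recorded wins minus losses, which is the equation the win ratings solve; with a hypothetical schedule in which every pair of teams meets twice, it gives the balanced-schedule projection.

```python
from fractions import Fraction as F


def projected_differentials(schedule, ratings):
    """Projected wins minus losses for each team: the sum over opponents j of
    g_ij * (r_i - r_j), where g_ij is the number of games between i and j."""
    n = len(ratings)
    return [sum(schedule[i][j] * (ratings[i] - ratings[j]) for j in range(n))
            for i in range(n)]


# Win ratings for Problem 5 read from the rref, with d pinned to 0.
win_ratings = [F(5, 4), F(1), F(1, 4), F(0)]
actual = [[0, 1, 0, 3], [1, 0, 3, 0], [0, 3, 0, 1], [3, 0, 1, 0]]
# Hypothetical balanced schedule: every pair of teams meets twice.
twice_each = [[0 if i == j else 2 for j in range(4)] for i in range(4)]

print([str(x) for x in projected_differentials(actual, win_ratings)])      # ['4', '2', '-2', '-4']
print([str(x) for x in projected_differentials(twice_each, win_ratings)])  # ['5', '3', '-3', '-5']
```

The first line matches the Problem 5 records (A 4-0, B 3-1, C 1-3, D 0-4); the projected differentials always sum to zero, since every win is someone else's loss.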

In [ ]:
#7. Compared with Problem 5, Teams A and C played one extra game against each other, which C won 10-0, as a comparison of Problem5 and Problem7 shows:
# A's record has one more loss and 10 more points against, and C's record has one more win and 10 more points for. Nothing changes for Teams B and D.
# Team D is still the weakest team under both the point-spread ratings and the win ratings (d = 0 in both). Team A, however, drops from first to third
# in the point-spread ratings and from first to second in the win ratings.
# Point-spread ratings: as in exercise 5, row reducing gives a = 42/5, b = 56/5, c = 44/5, and d = 0, so the ranking from highest to lowest is B, C, A, D.
# Team A is third, compared with first in exercise 5.
# Win ratings: as in exercise 5, using wins minus losses as the last column and row reducing gives b = 7/5, a = 21/20, c = 17/20, and d = 0, so B is
# 7/20 higher than A, A is 1/5 higher than C, and C is 17/20 higher than D.
# If each team played every other team twice, as in exercise 5, the projected wins minus losses would be 9/5 for A, 23/5 for B, 1/5 for C, and -33/5 for D.
# Comparing these numbers, Team A has the second-highest win rating.
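Ranking teams and reading off the gaps between neighbours is easy to get wrong by hand, since the rref gives each rating relative to $d$, not relative to the next team. This plain-Python sketch (the helper `ranked_gaps` is hypothetical) sorts the Problem 7 ratings and reports each consecutive gap.

```python
from fractions import Fraction as F


def ranked_gaps(names, ratings):
    """Sort teams from highest to lowest rating and report the gap between
    each team and the next one down (hypothetical helper)."""
    order = sorted(zip(names, ratings), key=lambda pair: pair[1], reverse=True)
    return [(hi[0], lo[0], hi[1] - lo[1]) for hi, lo in zip(order, order[1:])]


teams = ["A", "B", "C", "D"]
point_spread_7 = [F(42, 5), F(56, 5), F(44, 5), F(0)]  # from the rref of Problem 7
win_7 = [F(21, 20), F(7, 5), F(17, 20), F(0)]

for hi, lo, gap in ranked_gaps(teams, win_7):
    print(f"{hi} is {gap} above {lo}")
# B is 7/20 above A
# A is 1/5 above C
# C is 17/20 above D
```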

In [ ]:
#8. We project that Team A would win 3 games and lose 1, Team B would win 4 and lose 0, Team C would win 1 and lose 3, and Team D would win 0 and lose 4. We chose to round the ratings this way based on the reduced row echelon form of the win ratings matrix. Team B has the highest win rating, shown in the last column of the augmented win-ratings matrix, so we project that B wins the most games; Team D has the lowest win rating, so we project that D loses the most games.

In [12]:
RatingsMatrix(Problem5).rref()

[   1    0    0   -1  5/4   10]
[   0    1    0   -1    1    8]
[   0    0    1   -1  1/4    4]
[   0    0    0    0    0    0]
In [6]:
RatingsMatrix(Problem7).rref()

[    1     0     0    -1 21/20  42/5]
[    0     1     0    -1   7/5  56/5]
[    0     0     1    -1 17/20  44/5]
[    0     0     0     0     0     0]
In [3]:
(Problem8).rref()

[     1      0      0      0   1/16]
[     0      1      0      0   9/16]
[     0      0      1      0  -1/16]
[     0      0      0      1 -17/16]

C. Find data for a real sports league over a single season (recent: 2018 - 2020) to which you can apply the ratings system. Here are a few warnings about the data:

• Your schedule should be connected (see Theorem 2 in Minton's paper).
• The sum of the wins should equal the sum of the losses, and the sum of the points for should equal the sum of the points against.
• The schedule should be symmetric, in the sense that if entry $(i, j)$ is 5 then entry $(j, i)$ should also be 5.  This is because if team $i$ plays team $j$ five times, then of course team $j$ plays team $i$ five times.
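The data checks above are easy to automate before building a ratings matrix. This plain-Python sketch (the helper `check_record` is hypothetical; the record is a list of rows in the $n \times (n+4)$ layout) also checks that the diagonal is zero. For connectedness it treats teams as vertices and games as edges and runs a breadth-first search, which is our reading of the condition in Theorem 2.

```python
from collections import deque


def check_record(record):
    """Return a list of problems in an n x (n + 4) record (schedule, wins,
    losses, points for, points against); an empty list means none were found."""
    n = len(record)
    problems = []
    for i in range(n):
        if record[i][i] != 0:
            problems.append(f"team {i} is scheduled against itself")
        for j in range(i + 1, n):
            if record[i][j] != record[j][i]:
                problems.append(f"schedule not symmetric at ({i}, {j})")
    totals = [sum(row[n + k] for row in record) for k in range(4)]  # W, L, PF, PA
    if totals[0] != totals[1]:
        problems.append("total wins != total losses")
    if totals[2] != totals[3]:
        problems.append("total points for != total points against")
    # Connectedness: breadth-first search from team 0 over pairs that met at least once.
    seen, queue = {0}, deque([0])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if record[i][j] > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    if len(seen) < n:
        problems.append("schedule is not connected")
    return problems


RECORD_EXAMPLE = [
    [0, 2, 0, 2, 4, 0, 60, 28],
    [2, 0, 2, 0, 2, 2, 50, 50],
    [0, 2, 0, 2, 2, 2, 50, 50],
    [2, 0, 2, 0, 0, 4, 28, 60],
]
# Hypothetical four-team league split into two pairs that never meet each other.
SPLIT_LEAGUE = [
    [0, 1, 0, 0, 1, 0, 7, 3],
    [1, 0, 0, 0, 0, 1, 3, 7],
    [0, 0, 0, 1, 1, 0, 5, 2],
    [0, 0, 1, 0, 0, 1, 2, 5],
]
print(check_record(RECORD_EXAMPLE))  # []
print(check_record(SPLIT_LEAGUE))    # ['schedule is not connected']
```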

Write up an analysis of your chosen example.  Do the results seem reasonable? You will need to bring in some information not in the record to answer this question. Here are a few things you might choose to ponder when assessing your results:

• How did the win ratings stack up against subsequent games or published rankings?
• How did the point spread ratings stack up against subsequent games or published rankings?
• When the win ratings were tied, could the point-spread ratings be used effectively to rank the teams?
• Discuss the predictive power of the ratings and the reasonableness of any expectations that they are indeed predictive.
• Bring in any other knowledge of the sport, or of the teams involved, to shed light on any differences between the predictions and reality.
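One way to explore the tie-breaking question is to sort by win rating first and use the point-spread rating only to order teams whose win ratings are equal. This is a sketch with a hypothetical helper and made-up ratings, not a prescribed method.

```python
def rank_with_tiebreak(names, win_ratings, spread_ratings):
    """Rank teams by win rating, using the point-spread rating only to order
    teams whose win ratings are equal (hypothetical helper)."""
    order = sorted(range(len(names)),
                   key=lambda i: (win_ratings[i], spread_ratings[i]),
                   reverse=True)
    return [names[i] for i in order]


# Made-up ratings for illustration: P and Q tie on win rating.
print(rank_with_tiebreak(["P", "Q", "R"], [1, 1, 0], [3, 7, 0]))  # ['Q', 'P', 'R']
```

Exact rational ratings, as Sage produces over `QQ`, make "tied" well defined; with floating-point ratings a tolerance would be needed.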

Points will be awarded on a sliding scale for the depth and thoroughness of the analysis.

Solution:

Start here ...

In [ ]: