assignments / 2020-03-20 / presentation-aneelamr.ipynb

This presentation looks at the singular value decomposition (SVD) and how it can be applied.

The Singular-Value Decomposition, or SVD for short, is a matrix decomposition method for reducing a matrix to its constituent parts in order to make certain subsequent matrix calculations simpler.

Before we continue, let's quickly recall how to compute the SVD in Sage. Here is an example from lecture.

In [2]:

import numpy
import scipy.linalg

A = numpy.array([[2, 1, 1], [1, -1, 3], [0, 1, -2]])
# scipy returns the singular values as a 1-D array s and the third factor as V^T
U, s, Vt = scipy.linalg.svd(A)
print(U)
print(s)
print(Vt)

[[-0.33618447 0.90453943 -0.26227547]
[-0.80113368 -0.12824982 0.5845826 ]
[ 0.49514123 0.4066453 0.7677726 ]]
[4.12369962 2.23233078 0.10863116]
[[-0.3573254 0.23282259 -0.90449555]
[ 0.75294802 0.64481239 -0.13147722]
[ 0.55261907 -0.72801828 -0.40571116]]

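In Sage, the command A.SVD() returns a triple (U, S, V) such that A = U S V^T, where U and V are orthogonal matrices and S is a "diagonal" matrix with the same shape as A (so not necessarily square). The columns of U are the left singular vectors of A, and the columns of V are the right singular vectors of A. Note that scipy.linalg.svd, used in the cell above, behaves slightly differently: it returns the singular values as a 1-D array, and its third output is V^T rather than V, so the right singular vectors are the rows of that output.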
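The factorization can be checked numerically. The following is a minimal sketch that reuses the matrix from the lecture example and assumes only NumPy and SciPy; the names `Sigma` and `A_rebuilt` are introduced here for illustration and do not appear in the lecture code.

```python
import numpy as np
import scipy.linalg

A = np.array([[2, 1, 1], [1, -1, 3], [0, 1, -2]], dtype=float)

# scipy returns the singular values as a 1-D array and the third factor as V^T
U, s, Vt = scipy.linalg.svd(A)

# Rebuild the diagonal factor and multiply the three pieces back together
Sigma = np.diag(s)
A_rebuilt = U @ Sigma @ Vt

print(np.allclose(A_rebuilt, A))           # does the product reproduce A?
print(np.allclose(U.T @ U, np.eye(3)))     # is U orthogonal?
print(np.allclose(Vt @ Vt.T, np.eye(3)))   # is V orthogonal?
```

All three checks should print True, up to floating-point tolerance.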

We often work with datasets that have many columns (features), not all of which are useful. A very practical and common application of the SVD is to take such a large matrix and reduce it to smaller matrices that may be more meaningful for our analysis. For example, the matrix may contain many insignificant entries. To do this, we compute the SVD of the original data and keep only the k largest singular values in Sigma: we select the first k columns of Sigma and the first k rows of V^T.

Here is an example borrowed from https://machinelearningmastery.com/singular-value-decomposition-for-machine-learning/

In [48]:

import numpy
from numpy import diag
from numpy import zeros
from scipy.linalg import svd

# define a matrix
A = numpy.array([
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]])
print(A)
# singular-value decomposition
U, s, VT = svd(A)
# create m x n Sigma matrix
Sigma = zeros((A.shape[0], A.shape[1]))
# populate the leading m x m block of Sigma with the singular values
Sigma[:A.shape[0], :A.shape[0]] = diag(s)
# select the number of singular values to keep
n_elements = 2
Sigma = Sigma[:, :n_elements]
VT = VT[:n_elements, :]
# reconstruct
B = U.dot(Sigma.dot(VT))
print(B)
# transform
T = U.dot(Sigma)
print(T)

[[ 1 2 3 4 5 6 7 8 9 10]
[11 12 13 14 15 16 17 18 19 20]
[21 22 23 24 25 26 27 28 29 30]]
[[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
[11. 12. 13. 14. 15. 16. 17. 18. 19. 20.]
[21. 22. 23. 24. 25. 26. 27. 28. 29. 30.]]
[[-18.52157747 6.47697214]
[-49.81310011 1.91182038]
[-81.10462276 -2.65333138]]

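In this example, we started with a 3x10 array and ended up with a 3x2 matrix T, which may be more helpful for machine learning and PCA. The reconstruction B matches A exactly because the rows of A differ by a constant vector, so A has rank 2 and two singular values are enough to describe it.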
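The reduced matrix can also be computed without U, by projecting A onto the first k right singular vectors. The sketch below uses the same 3x10 matrix and assumes NumPy and SciPy; the names `T_from_U` and `T_from_V` are illustrative and not part of the borrowed example.

```python
import numpy as np
from scipy.linalg import svd

# Same 3 x 10 matrix as in the example: rows 1..10, 11..20, 21..30
A = np.arange(1, 31, dtype=float).reshape(3, 10)

U, s, VT = svd(A)
k = 2

# Route 1: first k columns of U, each scaled by its singular value
T_from_U = U[:, :k] * s[:k]

# Route 2: project A onto the first k right singular vectors
T_from_V = A @ VT[:k, :].T

print(np.allclose(T_from_U, T_from_V))   # both routes give the same 3 x 2 matrix
print(np.linalg.matrix_rank(A))          # numerical rank of A
```

The second route is convenient when new rows arrive later, since they can be projected with the same k rows of V^T.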

We have our very familiar ToothGrowth data set. In class we looked at how to make pivot tables from it, but let's apply the SVD to this data set and see what we can learn from it.

In [21]:

import statsmodels.api as sm

tooth = sm.datasets.get_rdataset('ToothGrowth').data

/ext/sage/sage-9.0/local/lib/python3.7/site-packages/statsmodels/datasets/utils.py:185: FutureWarning: `item` has been deprecated and will be removed in a future version
return dataset_meta["Title"].item()

In [8]:

tooth = tooth.drop(columns=['supp'])

In [9]:

tooth

| | len | dose |
|---|---|---|
| 0 | 4.2 | 0.5 |
| 1 | 11.5 | 0.5 |
| 2 | 7.3 | 0.5 |
| 3 | 5.8 | 0.5 |
| 4 | 6.4 | 0.5 |
| 5 | 10.0 | 0.5 |
| 6 | 11.2 | 0.5 |
| 7 | 11.2 | 0.5 |
| 8 | 5.2 | 0.5 |
| 9 | 7.0 | 0.5 |
| 10 | 16.5 | 1.0 |
| 11 | 16.5 | 1.0 |
| 12 | 15.2 | 1.0 |
| 13 | 17.3 | 1.0 |
| 14 | 22.5 | 1.0 |
| 15 | 17.3 | 1.0 |
| 16 | 13.6 | 1.0 |
| 17 | 14.5 | 1.0 |
| 18 | 18.8 | 1.0 |
| 19 | 15.5 | 1.0 |
| 20 | 23.6 | 2.0 |
| 21 | 18.5 | 2.0 |
| 22 | 33.9 | 2.0 |
| 23 | 25.5 | 2.0 |
| 24 | 26.4 | 2.0 |
| 25 | 32.5 | 2.0 |
| 26 | 26.7 | 2.0 |
| 27 | 21.5 | 2.0 |
| 28 | 23.3 | 2.0 |
| 29 | 29.5 | 2.0 |
| 30 | 15.2 | 0.5 |
| 31 | 21.5 | 0.5 |
| 32 | 17.6 | 0.5 |
| 33 | 9.7 | 0.5 |
| 34 | 14.5 | 0.5 |
| 35 | 10.0 | 0.5 |
| 36 | 8.2 | 0.5 |
| 37 | 9.4 | 0.5 |
| 38 | 16.5 | 0.5 |
| 39 | 9.7 | 0.5 |
| 40 | 19.7 | 1.0 |
| 41 | 23.3 | 1.0 |
| 42 | 23.6 | 1.0 |
| 43 | 26.4 | 1.0 |
| 44 | 20.0 | 1.0 |
| 45 | 25.2 | 1.0 |
| 46 | 25.8 | 1.0 |
| 47 | 21.2 | 1.0 |
| 48 | 14.5 | 1.0 |
| 49 | 27.3 | 1.0 |
| 50 | 25.5 | 2.0 |
| 51 | 26.4 | 2.0 |
| 52 | 22.4 | 2.0 |
| 53 | 24.5 | 2.0 |
| 54 | 24.8 | 2.0 |
| 55 | 30.9 | 2.0 |
| 56 | 26.4 | 2.0 |
| 57 | 27.3 | 2.0 |
| 58 | 29.4 | 2.0 |
| 59 | 23.0 | 2.0 |

Here we take the values of len and dose from the dataset above and convert them into NumPy arrays.

In [16]:

arr1 = tooth['len'].to_numpy()
arr2 = tooth['dose'].to_numpy()

In [19]:

# we cleaned the data by taking the values of len and dose and converting them into a matrix
A2 = matrix([arr1, arr2])
A2.str()

'[ 4.2 11.5 7.3 5.8 6.4 10.0 11.2 11.2 5.2 7.0 16.5 16.5 15.2 17.3 22.5 17.3 13.6 14.5 18.8 15.5 23.6 18.5 33.9 25.5 26.4 32.5 26.7 21.5 23.3 29.5 15.2 21.5 17.6 9.7 14.5 10.0 8.2 9.4 16.5 9.7 19.7 23.3 23.6 26.4 20.0 25.2 25.8 21.2 14.5 27.3 25.5 26.4 22.4 24.5 24.8 30.9 26.4 27.3 29.4 23.0]\n[ 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0]'

In [24]:

U, s, V = scipy.linalg.svd(A2)
print(s)  # the singular values of this matrix

[157.43375176 2.88336744]

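We were able to convert a data set into a matrix and find its singular values, which are the square roots of the eigenvalues of A2 A2^T. This is really useful for principal component analysis (which is usually carried out on centered data): the relative sizes of the singular values indicate how much of the matrix A2 each singular direction accounts for. Here the first singular value (about 157.4) is far larger than the second (about 2.9), so a single direction captures almost all of the matrix.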
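The link between singular values and eigenvalues can be checked on a small case. The sketch below is illustrative only: it assumes plain NumPy and uses six (len, dose) pairs copied from the table above (rows 0, 1, 10, 12, 20, 21) rather than the full data set; `A_small` and `share` are names introduced here.

```python
import numpy as np

# Six (len, dose) pairs copied from the table (rows 0, 1, 10, 12, 20, 21)
A_small = np.array([
    [4.2, 11.5, 16.5, 15.2, 23.6, 18.5],   # len
    [0.5, 0.5, 1.0, 1.0, 2.0, 2.0],        # dose
])

# Singular values only, largest first
s = np.linalg.svd(A_small, compute_uv=False)

# Eigenvalues of the symmetric 2 x 2 matrix A A^T, sorted largest first
eigvals = np.sort(np.linalg.eigvalsh(A_small @ A_small.T))[::-1]

print(s**2)
print(eigvals)

# Fraction of the total squared magnitude carried by each singular value
share = s**2 / np.sum(s**2)
print(share)
```

The first two printed arrays should agree up to rounding, and `share` gives a rough sense of how dominant the first direction is.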

Another really useful application of the SVD is image reconstruction. Here we take an image, convert it into a matrix, and reconstruct it from its SVD. This is useful for image compression: starting from a high-resolution image, we can keep only the largest singular values to get an approximation that needs less storage, at the cost of some detail.

In [29]:

import pylab

# read the image and average over axis 2 (the color channels)
A = pylab.mean(pylab.imread('blackcat.png'), 2)

pylab is a simple plotting tool. Here we take the mean over axis 2, the color-channel axis, so that the image becomes a single two-dimensional grayscale array that can be treated as a matrix.
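To see what the mean over axis 2 does to the array shape, here is a small sketch. It assumes plain NumPy and uses a random 4x5 array with 3 channels as a hypothetical stand-in for the output of imread; the names `rgb` and `gray` are illustrative.

```python
import numpy as np

# Hypothetical stand-in for an image array: 4 rows, 5 columns, 3 color channels
rng = np.random.default_rng(0)
rgb = rng.random((4, 5, 3))

# Averaging over axis 2 collapses the channel axis, leaving one value per pixel
gray = np.mean(rgb, axis=2)

print(rgb.shape)    # three-dimensional: rows, columns, channels
print(gray.shape)   # two-dimensional: rows, columns
```

If a PNG carries an alpha channel, the array has four channels and the alpha values are averaged in as well, so it may be worth checking the shape before taking the mean.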

In [30]:

B = matrix(A)  # convert the image array into a matrix

We now have our image as a matrix, so we can compute its SVD.

In [38]:

U, s, V = B.SVD()
n = 32  # number of singular values we are using
C = list(range(n))

In [39]:

for j in range(n):
    # rank-one term: column j of U times row j of V^T, scaled by the j-th singular value
    C[j] = (U[:, j] * V.transpose()[j, :]) * s[j, j]
D = sum(C)  # add up the terms computed from the SVD to reconstruct the image

In [40]:

matrix_plot(D)  # plot the reconstructed image
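The loop above adds up rank-one terms. With NumPy arrays the same truncated reconstruction can be written with slices, as in the sketch below. This is an assumption-laden illustration: `B` here is a small random stand-in, not the cat image, and it uses `numpy.linalg.svd` (which returns V^T and a 1-D array of singular values) instead of Sage's `SVD()`.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.random((8, 6))   # hypothetical stand-in for the image matrix
U, s, Vt = np.linalg.svd(B, full_matrices=False)
k = 3                    # number of singular values kept

# Sum of rank-one terms s_j * u_j * v_j^T, mirroring the loop
D_loop = sum(s[j] * np.outer(U[:, j], Vt[j, :]) for j in range(k))

# The same matrix built from sliced factors
D_sliced = (U[:, :k] * s[:k]) @ Vt[:k, :]
print(np.allclose(D_loop, D_sliced))

# The Frobenius error is determined by the discarded singular values
err = np.linalg.norm(B - D_sliced)
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))
```

The last check shows why dropping small singular values changes the image only slightly: the error depends only on the values that were left out.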

We've been able to reconstruct some of the image of the cat! Let's explore a little more. What happens when we decrease the number of singular values?

In [42]:

n = 10  # number of singular values we are using
C = list(range(n))
for j in range(n):
    C[j] = (U[:, j] * V.transpose()[j, :]) * s[j, j]
D = sum(C)
matrix_plot(D)  # plot the reconstructed image

We can see that the image is much blurrier, but it is described by fewer components, so less data needs to be stored. Now let's do the opposite and increase the number of singular values used to reconstruct the image. The image I picked is 256x256, so there are at most 256 singular values. I set n to 256 here, which means we are not really doing any compression.
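A rough count makes the trade-off concrete. Storing a rank-k approximation of an m x n matrix takes k columns of U, k singular values, and k rows of V^T. The sketch below compares that count with the m*n entries of the full matrix; `svd_storage` is a hypothetical helper written for this illustration, and the counts are numbers of stored values, not actual PNG file sizes.

```python
def svd_storage(m, n, k):
    """Values stored for a rank-k SVD: k columns of U, k singular values, k rows of V^T."""
    return k * (m + n + 1)

m = n = 256
full = m * n   # entries in the original matrix

for k in (10, 32, 256):
    print(k, svd_storage(m, n, k), full)
```

For a 256x256 image this gives 5130 values at k = 10 and 16416 at k = 32, against 65536 for the full matrix. At k = 256 the factors need 131328 values, more than the image itself, so only k up to 127 saves space by this count.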

In [45]:

n = 256  # number of singular values we are using
C = list(range(n))
for j in range(n):
    C[j] = (U[:, j] * V.transpose()[j, :]) * s[j, j]
D = sum(C)
matrix_plot(D)  # plot the reconstructed image

Problem 1 solution

In [62]:

A = pylab.mean(pylab.imread('pirateflag.png'), 2)
B = matrix(A)  # convert the image into a matrix
U, s, V = B.SVD()
n = 32  # number of singular values we are using
C = list(range(n))
for j in range(n):
    # put the image back together from the factors of the decomposition
    C[j] = (U[:, j] * V.transpose()[j, :]) * s[j, j]
D = sum(C)  # use the values computed in the SVD to reconstruct the image
matrix_plot(D)  # plot the reconstructed image

1b)

In [64]:

n = 256  # number of singular values we are using
C = list(range(n))
for j in range(n):
    # put the image back together from the factors of the decomposition
    C[j] = (U[:, j] * V.transpose()[j, :]) * s[j, j]
D = sum(C)  # use the values computed in the SVD to reconstruct the image
matrix_plot(D)  # plot the reconstructed image

In [60]:

iris = sm.datasets.get_rdataset('iris').data
iris = iris.drop(columns=['Species'])
arr1 = iris['Sepal.Length'].to_numpy()
arr2 = iris['Sepal.Width'].to_numpy()
arr3 = iris['Petal.Length'].to_numpy()
arr4 = iris['Petal.Width'].to_numpy()
A2 = matrix([arr1, arr2, arr3, arr4])
U, s, V = scipy.linalg.svd(A2)
print(s)

[95.95991387 17.76103366 3.46093093 1.88482631]

/ext/sage/sage-9.0/local/lib/python3.7/site-packages/statsmodels/datasets/utils.py:185: FutureWarning: `item` has been deprecated and will be removed in a future version
return dataset_meta["Title"].item()

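The singular values above fall off quickly, so most of the structure in the four measurements is captured by the first one or two singular directions. We can use this to reduce the data to a few components and then train a model on them, for example to predict what an average sepal or petal length would look like if we know the sepal or petal width. TruncatedSVD in sklearn is a ready-made tool for the reduction step. It is a dimensionality-reduction transformer rather than a classifier, so its output would be passed to a separate model.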
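The reduction step can be sketched with plain NumPy. This is a hedged illustration, not the sklearn implementation: `X` is a synthetic stand-in with the same shape as the iris measurements (150 rows, 4 columns, one row per flower, which is the transpose of A2 above), and the names `components`, `Z_train` and `Z_new` are introduced here.

```python
import numpy as np

# Synthetic stand-in for the 150 x 4 iris measurements (rows = flowers)
rng = np.random.default_rng(0)
X = rng.random((150, 4))

# Learn the directions from one part of the data only
X_train, X_new = X[:120], X[120:]
U, s, Vt = np.linalg.svd(X_train, full_matrices=False)

k = 2
components = Vt[:k, :]   # top-k right singular vectors, one per row

# Project both the training rows and unseen rows onto those directions
Z_train = X_train @ components.T
Z_new = X_new @ components.T

print(Z_train.shape, Z_new.shape)
```

The reduced arrays `Z_train` and `Z_new` have two columns instead of four and could then be handed to whatever classifier or regression model is being trained.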

Sources:

- Professor's lecture notes
- https://blogs.uoregon.edu/math342sp16lipshitz/
- https://see.stanford.edu/materials/lsoeldsee263/16-svd.pdf
- https://personal.utdallas.edu/~herve/Abdi-SVD2007-pretty.pdf
- https://staff.imsa.edu/~fogel/LinAlg/PDF/50 Application of the SVD.pdf
- https://machinelearningmastery.com/singular-value-decomposition-for-machine-learning/