Applications of SVD
Final project by Aditya Neelamraju
About this presentation
This presentation looks at the singular value decomposition (SVD) and at some of the ways it can be applied.
What is SVD?
The Singular-Value Decomposition, or SVD for short, is a matrix decomposition method for reducing a matrix to its constituent parts in order to make certain subsequent matrix calculations simpler.
Before we continue, let's quickly recall how to compute the SVD in Sage. Here is an example from lecture.
The command A.SVD() returns a triple (U,S,V) so that A=USV^T; U and V are orthogonal matrices; and S is a “diagonal” (but not necessarily square) matrix with the same shape as A, whose diagonal entries are the singular values of A. So, the columns of U are left singular vectors of A, and the columns of V are right singular vectors of A.
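As a rough cross-check outside Sage, the same factorization can be sketched with NumPy. The 2x3 matrix below is a made-up example, not the one from lecture. Note that `np.linalg.svd` returns the singular values as a 1-D array and returns V^T rather than V, so the rectangular matrix S has to be built by hand.

```python
import numpy as np

# Hypothetical 2x3 example matrix (an assumption, not the lecture example).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# full_matrices=True gives a square U (2x2) and a square V^T (3x3);
# s is a 1-D array holding the singular values in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Build the rectangular "diagonal" matrix S with the same shape as A.
S = np.zeros(A.shape)
S[:len(s), :len(s)] = np.diag(s)

# Multiply the three factors back together.
A_rebuilt = U @ S @ Vt
print(np.allclose(A, A_rebuilt))
```

The columns of `U` are the left singular vectors and the rows of `Vt` (the columns of V) are the right singular vectors.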
SVD for dimensionality reduction
We often work with datasets that have a lot of columns (features), many of which might not be very useful. A very practical and common application of SVD is taking these big matrices and decomposing them into smaller matrices that might be more meaningful in our analysis. For example, many of the entries in our matrix might contribute very little information. To get rid of them, we can perform an SVD on the original data and keep only the k largest singular values in Sigma. This means selecting the first k columns of Sigma and the first k rows of V^T.
Here is an example borrowed from https://machinelearningmastery.com/singular-value-decomposition-for-machine-learning/
In this example, we started off with a 3x10 array and ended up with a 3x2 matrix, which might be more helpful when doing machine learning and PCA.
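The reduction from 3x10 to 3x2 can be sketched in NumPy as follows. This is a minimal illustration and not necessarily the code from the linked article: the matrix of the integers 1 to 30 and the choice k = 2 are assumptions made here for the example.

```python
import numpy as np

# A 3x10 matrix holding the integers 1..30 (illustrative choice).
A = np.arange(1, 31, dtype=float).reshape(3, 10)

# Reduced SVD: U is 3x3, s has 3 entries, Vt is 3x10.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                       # number of singular values to keep
Sigma_k = np.diag(s[:k])    # k x k block of Sigma
Vt_k = Vt[:k, :]            # first k rows of V^T

# Reduced representation of the 3 rows of A: 3 x k.
T = U[:, :k] @ Sigma_k

# The same 3 x k matrix, obtained by projecting A onto the first k
# right singular vectors.
T_alt = A @ Vt_k.T

# Rank-k approximation of A, back in the original 3x10 shape.
A_approx = T @ Vt_k

print(T.shape)
print(np.allclose(T, T_alt))
```

Either `T` or `T_alt` can serve as the smaller feature matrix; `A_approx` shows what information those k components retain.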
SVD for data analysis
We have our very familiar ToothGrowth data set. In class we looked at how to make pivot tables from it, but let's use the SVD on this data set and see what we can learn from it.
| | len | dose |
---|---|---|
0 | 4.2 | 0.5 |
1 | 11.5 | 0.5 |
2 | 7.3 | 0.5 |
3 | 5.8 | 0.5 |
4 | 6.4 | 0.5 |
5 | 10.0 | 0.5 |
6 | 11.2 | 0.5 |
7 | 11.2 | 0.5 |
8 | 5.2 | 0.5 |
9 | 7.0 | 0.5 |
10 | 16.5 | 1.0 |
11 | 16.5 | 1.0 |
12 | 15.2 | 1.0 |
13 | 17.3 | 1.0 |
14 | 22.5 | 1.0 |
15 | 17.3 | 1.0 |
16 | 13.6 | 1.0 |
17 | 14.5 | 1.0 |
18 | 18.8 | 1.0 |
19 | 15.5 | 1.0 |
20 | 23.6 | 2.0 |
21 | 18.5 | 2.0 |
22 | 33.9 | 2.0 |
23 | 25.5 | 2.0 |
24 | 26.4 | 2.0 |
25 | 32.5 | 2.0 |
26 | 26.7 | 2.0 |
27 | 21.5 | 2.0 |
28 | 23.3 | 2.0 |
29 | 29.5 | 2.0 |
30 | 15.2 | 0.5 |
31 | 21.5 | 0.5 |
32 | 17.6 | 0.5 |
33 | 9.7 | 0.5 |
34 | 14.5 | 0.5 |
35 | 10.0 | 0.5 |
36 | 8.2 | 0.5 |
37 | 9.4 | 0.5 |
38 | 16.5 | 0.5 |
39 | 9.7 | 0.5 |
40 | 19.7 | 1.0 |
41 | 23.3 | 1.0 |
42 | 23.6 | 1.0 |
43 | 26.4 | 1.0 |
44 | 20.0 | 1.0 |
45 | 25.2 | 1.0 |
46 | 25.8 | 1.0 |
47 | 21.2 | 1.0 |
48 | 14.5 | 1.0 |
49 | 27.3 | 1.0 |
50 | 25.5 | 2.0 |
51 | 26.4 | 2.0 |
52 | 22.4 | 2.0 |
53 | 24.5 | 2.0 |
54 | 24.8 | 2.0 |
55 | 30.9 | 2.0 |
56 | 26.4 | 2.0 |
57 | 27.3 | 2.0 |
58 | 29.4 | 2.0 |
59 | 23.0 | 2.0 |
Here, we take the values of len and dose from the data set above and convert them into NumPy arrays.
We were able to turn the data set into a matrix and find its singular values. This is really useful for principal component analysis (PCA): once the columns are centered, the squared singular values are proportional to the eigenvalues of the covariance matrix, so they show how much of the variation in the matrix A2 lies along each principal direction.
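The link between singular values and PCA can be checked numerically. The sketch below is only an illustration: it copies six (len, dose) rows from the table above rather than the full data set, and the centering step and the name `explained` are assumptions, since how A2 was built is not shown here.

```python
import numpy as np

# Six (len, dose) rows copied from the table above: rows 0, 1, 10, 11, 20, 21.
A2 = np.array([[4.2, 0.5],
               [11.5, 0.5],
               [16.5, 1.0],
               [16.5, 1.0],
               [23.6, 2.0],
               [18.5, 2.0]])

# Center each column so the SVD lines up with PCA (an assumption here).
centered = A2 - A2.mean(axis=0)

# Singular values only, in decreasing order.
s = np.linalg.svd(centered, compute_uv=False)

# Eigenvalues of the sample covariance matrix, sorted in decreasing order.
cov_eigs = np.linalg.eigvalsh(np.cov(centered, rowvar=False))[::-1]

# Squared singular values divided by (n - 1) match the covariance eigenvalues.
print(np.allclose(s**2 / (len(A2) - 1), cov_eigs))

# Fraction of the total variance captured by each principal direction.
explained = s**2 / np.sum(s**2)
print(explained)
```

Because len is measured on a much larger scale than dose, the first direction is likely to be dominated by len unless the columns are also rescaled.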
SVD for image compression and reconstruction
Another really useful application of SVD is image reconstruction. Here we will take an image, convert it into a matrix, and reconstruct it from its SVD. This is useful for compressing images and saving space: we can start from a high-resolution image and use the SVD to get a smaller, lower-quality approximation that needs fewer numbers to store.
pylab is a simple plotting tool. Here we take the mean over the image's color channels, so that the picture becomes a single two-dimensional grayscale matrix that pylab can display and that we can decompose.
We now have our image as a matrix. That is great, because we can now compute its SVD.
We've been able to reconstruct some of the image of the cat! Let's explore a little more. What happens when we decrease the number of singular values?
We can see that the image is much blurrier, but the file is also smaller, since it has fewer components. Now let's do the opposite and increase the number of singular values we use to reconstruct the image. The image I picked is 256x256, so the diagonal matrix can have at most 256 nonzero singular values. I set n to 256 here, but that means we are not really doing any compression.
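The trade-off between n and image quality can be sketched as below. Since the cat image is not available here, the code uses a synthetic 256x256 grayscale matrix as a stand-in; the helper names `reconstruct`, `relative_error` and `stored_numbers` are hypothetical, and counting stored numbers as n(rows + cols + 1) is a simple estimate rather than a real file size.

```python
import numpy as np

# Synthetic 256x256 "image": a smooth pattern plus a little noise
# (a stand-in for the real picture, which is an assumption here).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 256)
image = np.outer(np.sin(4 * x), np.cos(3 * x)) \
    + 0.05 * rng.standard_normal((256, 256))

U, s, Vt = np.linalg.svd(image, full_matrices=False)

def reconstruct(n):
    """Rank-n approximation built from the n largest singular values."""
    return U[:, :n] @ np.diag(s[:n]) @ Vt[:n, :]

def relative_error(n):
    """Frobenius-norm error of the rank-n approximation, relative to the image."""
    return np.linalg.norm(image - reconstruct(n)) / np.linalg.norm(image)

def stored_numbers(n, shape=image.shape):
    """Numbers needed to keep n columns of U, n singular values, n rows of V^T."""
    rows, cols = shape
    return n * (rows + cols + 1)

for n in (5, 50, 256):
    print(n, stored_numbers(n), relative_error(n))
```

Under this count, the original image needs 256 * 256 = 65536 numbers, so keeping all 256 singular values stores more numbers than the image itself; the factorization only saves space when n is well below about 128.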
Problem 1. a) There is an image in this repository called pirateflag. Your job is to convert this image into a matrix using the method above, and then reconstruct it from its SVD. (Hint: the resolution of the image is 256x256.) b) Make the reconstructed image as clear as possible.
Problem 2. a) Take the iris data set. Convert the data set into a matrix by only taking into account sepal and petal width and length. b) Find its singular values. c) What can you gain from this? (The last question is open ended, as there are many insights one can gain using PCA.)
Problem 1 solution
Problem 2 solution
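Using the singular values above, we can see how much of the variation in the four measurements is captured by the first one or two components. The reduced data can then be used to train a model, for example a regression that predicts sepal or petal length from sepal or petal width, or a classifier that predicts the species. TruncatedSVD in sklearn is a ready-made tool for the reduction step. Note that it is a dimensionality-reduction transformer rather than a classifier, so a separate model still has to be trained on its output.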
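As a sketch of the reduction step only, the code below applies scikit-learn's `TruncatedSVD` to the four iris measurements. Loading the data with `load_iris`, keeping two components, and fixing `random_state=0` are choices made for this illustration, not part of the original solution.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import TruncatedSVD

# 150 x 4 matrix: sepal length, sepal width, petal length, petal width.
X = load_iris().data

# Keep the two largest singular directions (illustrative choice of 2).
svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(X)   # reduced data, 150 x 2

print(Z.shape)
print(svd.singular_values_)
print(svd.explained_variance_ratio_)
```

TruncatedSVD does not center the columns, so its components are not identical to those of PCA; the reduced matrix `Z` is what a separate classifier or regression model would be trained on.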