Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupport News AboutSign UpSign In
| Download
Project: math480-2016
Views: 2158

Math 480: Open Source Mathematical Software

2016-05-18

William Stein

**Lectures 23: Numpy Arrays (vs Sage Matrices) **

Today:

  • A podcast interview of me about Sage just appeared today. If you like podcasts, you might find it interesting.

  • Clarification of question 1 a-b: use text3d and get raw latex code that look ugly!

  • Start screencast

  • Basics of numpy (which you won't learn in your homework, so learn it NOW)

Basic Numpy Tutorial

%auto import numpy as np %default_mode python

1. Making numpy arrays

np.array(range(15))
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
v = np.arange(15) # different way to make above v
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
v.shape
(15,)
a = v.reshape(3, 5) a
array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]])
# Or we could do this: np.array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]])
array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]])
a.shape
(3, 5)
a.ndim
2
a.dtype
dtype('int64')
a.itemsize
8
a.size
15
type(a)
<type 'numpy.ndarray'>

Challenge 1:

  • Make a numpy ndarray a with a.shape==(4,)

  • Make a numpy ndarray a with a.shape==(2,3,4)

  • Make a numpy ndarray a with a.ndim==4

  • Make a numpy ndarray a with a.dtype not dtype('int64')

  • Make a numpy ndarray a with itemsize 0

︠ed4c9bec-6c4b-4ef6-afd7-34e751efb535︠ ︠b8535290-e79f-4b7c-b350-67fa46cb26af︠ ︠9f61ed87-dd77-4b58-a2d0-d3b72e645854︠ ︠ab45cd1a-1658-4874-a564-fbece79df231i︠ %md ### 2. More about making arrays

2. More about making arrays

Explicitly specifying the data type at creation time:

c = np.array([ [1,2], [3,4] ], dtype=complex) c
array([[ 1.+0.j, 2.+0.j], [ 3.+0.j, 4.+0.j]])
c.dot(c)
array([[ 7.+0.j, 10.+0.j], [ 15.+0.j, 22.+0.j]])
np.zeros( (3,4) )
array([[ 0., 0., 0., 0.], [ 0., 0., 0., 0.], [ 0., 0., 0., 0.]])
np.zeros( (3,2,2) )
array([[[ 0., 0.], [ 0., 0.]], [[ 0., 0.], [ 0., 0.]], [[ 0., 0.], [ 0., 0.]]])
np.ones( (3,3) )
array([[ 1., 1., 1.], [ 1., 1., 1.], [ 1., 1., 1.]])
np.arange(0, 2, 0.3) # points ever .3
array([ 0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])
np.linspace(0, 2, 5) # 5 points between 0 and 2
array([ 0. , 0.5, 1. , 1.5, 2. ])
︠e3ffadbd-faea-4447-a4fa-ad1b9b0283fai︠ %md The following is an example of a "**vectorized operation**", where we apply a function to an array, which applies that function componentwise.

The following is an example of a "vectorized operation", where we apply a function to an array, which applies that function componentwise.

v = np.linspace(0, 2, 5) v w = np.sin(v) w
array([ 0. , 0.5, 1. , 1.5, 2. ]) array([ 0. , 0.47942554, 0.84147098, 0.99749499, 0.90929743])
np.sin(0.5)
0.47942553860420301

**Challenge 2: **

  • Make a numpy array with 10 equally spaced values between 0 and 1.

  • Use %timeit to see how long np.sin(...) takes when applied to a numpy array v with one million entries in it. Compare that to the time of doing np.array([np.sin(x) for x in v]) and np.array([math.sin(x) for x inv]). Do you see the value of vectorization?

  • Does np.sin(a) work if a is an array with shape (2,2)?

︠b9afd03d-4634-41b3-9873-a19b892e1fd9︠ ︠1c0a31e1-f94d-476d-af87-e1008bd19506s︠ ︠653858a3-b32f-41cc-90cb-939ec1fc0d0fi︠ %md ### 3. Numpy arrays "do not respect math"... ... in the same sense that some people [do not respect wood](https://youtu.be/17hnNOFaRH4)...

3. Numpy arrays "do not respect math"...

... in the same sense that some people do not respect wood...

print "Numpy:" a = np.array([[1,2],[3,4]]) np.exp(a) print "Sage:" b = matrix(RDF,[[1,2],[3,4]]) exp(b)
Numpy: array([[ 2.71828183, 7.3890561 ], [ 20.08553692, 54.59815003]]) Sage: [51.968956198705044 74.73656456700328] [112.10484685050491 164.07380304920997]
print "Numpy:" a * a print "Sage:" b * b
Numpy: array([[ 1, 4], [ 9, 16]]) Sage: [ 7.0 10.0] [15.0 22.0]

This is just an arbitrary choice, made differently by mathematicians and engineers, so you have to be aware of it.

np.exp(a)
array([[ 2.71828183, 7.3890561 ], [ 20.08553692, 54.59815003]])
b.apply_map(exp)
[ 2.718281828459045 7.38905609893065] [20.085536923187668 54.598150033144236]
# and... a.dot(a)
array([[ 7, 10], [15, 22]])

Challenge 3: Strangely, I don't know how to:

  • compute Sage's matrix exponential exp(b) using numpy

  • do numpy's componentwise matrix multiplication a*a using Sage!

(no answer here -- I really don't know how to do these things. If anybody figure it out, let me know.)

︠b2338570-820d-403c-9ccb-c4452a7fd470︠ ︠996253de-56ef-477a-a9fa-1d97c8da5605︠ ︠3cd57739-a860-4fa2-90f0-c90a5f8bf8f5i︠ %md ### 4. Binary Operations

4. Binary Operations

a = np.array([[1,2],[4,5]]) b = np.array([[3,5],[2,4]]) a + b
array([[4, 7], [6, 9]])
a * b
array([[ 3, 10], [ 8, 20]])
a.dot(b)
array([[ 7, 13], [22, 40]])

Another major philosophical difference betweeen Sage and Numpy is that if you do a binary operation involving numpy arrays with different precision, the result has the higher precision, whereas in Sage, the result has the lower precision! Exactly the opposite choice.

a_low = np.array([[1,2],[3,4]], dtype=float) a_high = np.array([[1,2],[3,4]], dtype=np.dtype('float128')) a_low.dtype a_high.dtype print "numpy type of sum", (a_low + a_high).dtype
dtype('float64') dtype('float128') numpy type of sum float128
b_low = matrix(RealField(64), [[1,2],[3,4]]) b_high = matrix(RealField(128), [[1,2],[3,4]]) b_low.parent() b_high.parent() print "Sage parent of sum", (b_low + b_high).parent()
Full MatrixSpace of 2 by 2 dense matrices over Real Field with 64 bits of precision Full MatrixSpace of 2 by 2 dense matrices over Real Field with 128 bits of precision Sage parent of sum Full MatrixSpace of 2 by 2 dense matrices over Real Field with 64 bits of precision

(For the record, Magma does the same thing as Sage.)

5. Methods of numpy arrays

a = np.arange(100).reshape(10,10) a
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34, 35, 36, 37, 38, 39], [40, 41, 42, 43, 44, 45, 46, 47, 48, 49], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
a.sum()
4950
a.sum(axis=0) # sums each column
array([450, 460, 470, 480, 490, 500, 510, 520, 530, 540])
a.sum(axis=1) # sums each row
array([ 45, 145, 245, 345, 445, 545, 645, 745, 845, 945])
a.prod() # of course it is 0
0
a.std()
28.866070047722118
a.std(axis=1)
array([ 2.87228132, 2.87228132, 2.87228132, 2.87228132, 2.87228132, 2.87228132, 2.87228132, 2.87228132, 2.87228132, 2.87228132])
a.mean(axis=1)
array([ 4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5])
a.var()
833.25
a.max()
99
a.min()
0

Challenge 5:

  • How does the speed of Numpy's np.arange(1000000).sum() compare to Python's sum(range(1000000))?

  • How does the speed of Numpy's standard deviation (so std) compare to that of pandas std on a Data frame with input range(1000000)?

%timeit np.arange(1000000).sum()
%timeit sum(range(1000000))
a = np.arange(1000000) a.std() %timeit a.std()
import pandas df = pandas.DataFrame(np.arange(1000000)) df.std() %timeit df.std()

Heh, look up at the actual OUTPUT of the above two .std() functions -- they are completely different after the decimal point.

Caveat emptor!

v = stats.TimeSeries(range(1000000)) # I wrote this in Cython once... %timeit v.standard_deviation()
︠50f74575-ce6f-4ea4-bc15-97e1c64c4fbai︠ %md ### 6. Indexing and slicing

6. Indexing and slicing

a = np.arange(10) a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a[3] = 10 a
array([ 0, 1, 2, 10, 4, 5, 6, 7, 8, 9])
a[2:5]
array([ 2, 10, 4])
a[2:5] = 10
a
array([ 0, 1, 10, 10, 10, 5, 6, 7, 8, 9])
a[2:5] = [100,200,300] a
array([ 0, 1, 100, 200, 300, 5, 6, 7, 8, 9])

I hope you start to feel the power...

a[2:7:2]
array([100, 300, 6])
a[2:7:2] = [-1, -2, -3] a
array([ 0, 1, -1, 200, -2, 5, -3, 7, 8, 9])
a = np.arange(100).reshape(10,10) a
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34, 35, 36, 37, 38, 39], [40, 41, 42, 43, 44, 45, 46, 47, 48, 49], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
a[1:3, :]
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
a[:, 1:3]
array([[11, 12], [11, 12], [21, 22], [31, 32], [41, 42], [51, 52], [61, 62], [71, 72], [81, 82], [91, 92]])
# set the first column to equal the last: a[:,0] = a[:,-1]
a
array([[19, 11, 12, 13, 14, 15, 16, 17, 18, 19], [19, 11, 12, 13, 14, 15, 16, 17, 18, 19], [29, 21, 22, 23, 24, 25, 26, 27, 28, 29], [39, 31, 32, 33, 34, 35, 36, 37, 38, 39], [49, 41, 42, 43, 44, 45, 46, 47, 48, 49], [59, 51, 52, 53, 54, 55, 56, 57, 58, 59], [69, 61, 62, 63, 64, 65, 66, 67, 68, 69], [79, 71, 72, 73, 74, 75, 76, 77, 78, 79], [89, 81, 82, 83, 84, 85, 86, 87, 88, 89], [99, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

All this amazing slicing and dicing, and getting references into nd arrays... works on nn-dimensional arrays of data.

a = np.zeros( (3,2,2) ) a
array([[[ 0., 0.], [ 0., 0.]], [[ 0., 0.], [ 0., 0.]], [[ 0., 0.], [ 0., 0.]]])
a[0] = [[1,2], [3,4]] a
array([[[ 1., 2.], [ 3., 4.]], [[ 0., 0.], [ 0., 0.]], [[ 0., 0.], [ 0., 0.]]])
a[1] = a[0] a
array([[[ 1., 2.], [ 3., 4.]], [[ 1., 2.], [ 3., 4.]], [[ 0., 0.], [ 0., 0.]]])
# what do you think will happen... a[0,0,0] = -20