| Download

Project: math480-2016

Path: lectures / 2016-05-18 / 2016-05-18.sagews

Views: ²¹⁵⁸

Math 480: Open Source Mathematical Software

2016-05-18

William Stein

Lectures 23: Numpy Arrays (vs Sage Matrices)

Today:

A podcast interview of me about Sage just appeared today. If you like podcasts, you might find it interesting.
Clarification of question 1 a-b: use text3d and get raw latex code that look ugly!
Start screencast
Basics of numpy (which you won't learn in your homework, so learn it NOW)

Basic Numpy Tutorial

%auto
import numpy as np
%default_mode python

1. Making numpy arrays

np.array(range(15))

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

v = np.arange(15)   # different way to make above
v

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

v.shape

(15,)

a = v.reshape(3, 5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

# Or we could do this:
np.array([[ 0,  1,  2,  3,  4], [ 5,  6,  7,  8,  9], [10, 11, 12, 13, 14]])

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

a.shape

(3, 5)

a.ndim

2

a.dtype

dtype('int64')

a.itemsize

8

a.size

15

type(a)

<type 'numpy.ndarray'>

Challenge 1:

Make a numpy ndarray a with a.shape==(4,)
Make a numpy ndarray a with a.shape==(2,3,4)
Make a numpy ndarray a with a.ndim==4
Make a numpy ndarray a with a.dtype not dtype('int64')
Make a numpy ndarray a with itemsize 0


︠ed4c9bec-6c4b-4ef6-afd7-34e751efb535︠

︠b8535290-e79f-4b7c-b350-67fa46cb26af︠

︠9f61ed87-dd77-4b58-a2d0-d3b72e645854︠

︠ab45cd1a-1658-4874-a564-fbece79df231i︠
%md
### 2. More about making arrays

2. More about making arrays

Explicitly specifying the data type at creation time:

c = np.array([ [1,2], [3,4] ], dtype=complex)
c

array([[ 1.+0.j,  2.+0.j],
       [ 3.+0.j,  4.+0.j]])

c.dot(c)

array([[  7.+0.j,  10.+0.j],
       [ 15.+0.j,  22.+0.j]])

np.zeros( (3,4) )

array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

np.zeros( (3,2,2) )

array([[[ 0.,  0.],
        [ 0.,  0.]],

       [[ 0.,  0.],
        [ 0.,  0.]],

       [[ 0.,  0.],
        [ 0.,  0.]]])

np.ones( (3,3) )

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

np.arange(0, 2, 0.3)    # points ever .3

array([ 0. ,  0.3,  0.6,  0.9,  1.2,  1.5,  1.8])

np.linspace(0, 2, 5)    # 5 points between 0 and 2

array([ 0. ,  0.5,  1. ,  1.5,  2. ])

︠e3ffadbd-faea-4447-a4fa-ad1b9b0283fai︠
%md The following is an example of a "**vectorized operation**", where we apply a function to an array, which applies that function componentwise.

The following is an example of a "vectorized operation", where we apply a function to an array, which applies that function componentwise.

v = np.linspace(0, 2, 5)
v
w = np.sin(v)
w

array([ 0. ,  0.5,  1. ,  1.5,  2. ])
array([ 0.        ,  0.47942554,  0.84147098,  0.99749499,  0.90929743])

np.sin(0.5)

0.47942553860420301

**Challenge 2: **

Make a numpy array with 10 equally spaced values between 0 and 1.
Use %timeit to see how long np.sin(...) takes when applied to a numpy array v with one million entries in it. Compare that to the time of doing np.array([np.sin(x) for x in v]) and np.array([math.sin(x) for x inv]). Do you see the value of vectorization?
Does np.sin(a) work if a is an array with shape (2,2)?


︠b9afd03d-4634-41b3-9873-a19b892e1fd9︠

︠1c0a31e1-f94d-476d-af87-e1008bd19506s︠

︠653858a3-b32f-41cc-90cb-939ec1fc0d0fi︠
%md
### 3. Numpy arrays "do not respect math"...

... in the same sense that some people [do not respect wood](https://youtu.be/17hnNOFaRH4)...

3. Numpy arrays "do not respect math"...

... in the same sense that some people do not respect wood...

print "Numpy:"
a = np.array([[1,2],[3,4]])
np.exp(a)
print "Sage:"
b = matrix(RDF,[[1,2],[3,4]])
exp(b)

Numpy:
array([[  2.71828183,   7.3890561 ],
       [ 20.08553692,  54.59815003]])
Sage:
[51.968956198705044  74.73656456700328]
[112.10484685050491 164.07380304920997]

print "Numpy:"
a * a
print "Sage:"
b * b

Numpy:
array([[ 1,  4],
       [ 9, 16]])
Sage:
[ 7.0 10.0]
[15.0 22.0]

This is just an arbitrary choice, made differently by mathematicians and engineers, so you have to be aware of it.

np.exp(a)

array([[  2.71828183,   7.3890561 ],
       [ 20.08553692,  54.59815003]])

b.apply_map(exp)

[ 2.718281828459045   7.38905609893065]
[20.085536923187668 54.598150033144236]

# and...
a.dot(a)

array([[ 7, 10],
       [15, 22]])

Challenge 3: Strangely, I don't know how to:

compute Sage's matrix exponential exp(b) using numpy
do numpy's componentwise matrix multiplication a*a using Sage!

(no answer here -- I really don't know how to do these things. If anybody figure it out, let me know.)


︠b2338570-820d-403c-9ccb-c4452a7fd470︠

︠996253de-56ef-477a-a9fa-1d97c8da5605︠

︠3cd57739-a860-4fa2-90f0-c90a5f8bf8f5i︠
%md
### 4. Binary Operations

4. Binary Operations

a = np.array([[1,2],[4,5]])
b = np.array([[3,5],[2,4]])
a + b

array([[4, 7],
       [6, 9]])

a * b

array([[ 3, 10],
       [ 8, 20]])

a.dot(b)

array([[ 7, 13],
       [22, 40]])

Another major philosophical difference betweeen Sage and Numpy is that if you do a binary operation involving numpy arrays with different precision, the result has the higher precision, whereas in Sage, the result has the lower precision! Exactly the opposite choice.

a_low = np.array([[1,2],[3,4]], dtype=float)
a_high = np.array([[1,2],[3,4]], dtype=np.dtype('float128'))
a_low.dtype
a_high.dtype
print "numpy type of sum", (a_low + a_high).dtype

dtype('float64')
dtype('float128')
numpy type of sum float128

b_low = matrix(RealField(64), [[1,2],[3,4]])
b_high = matrix(RealField(128), [[1,2],[3,4]])
b_low.parent()
b_high.parent()
print "Sage parent of sum", (b_low + b_high).parent()

Full MatrixSpace of 2 by 2 dense matrices over Real Field with 64 bits of precision
Full MatrixSpace of 2 by 2 dense matrices over Real Field with 128 bits of precision
Sage parent of sum Full MatrixSpace of 2 by 2 dense matrices over Real Field with 64 bits of precision

(For the record, Magma does the same thing as Sage.)

5. Methods of numpy arrays

a = np.arange(100).reshape(10,10)
a

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

a.sum()

4950

a.sum(axis=0) # sums each column

array([450, 460, 470, 480, 490, 500, 510, 520, 530, 540])

a.sum(axis=1) # sums each row

array([ 45, 145, 245, 345, 445, 545, 645, 745, 845, 945])

a.prod() # of course it is 0

0

a.std()

28.866070047722118

a.std(axis=1)

array([ 2.87228132,  2.87228132,  2.87228132,  2.87228132,  2.87228132,
        2.87228132,  2.87228132,  2.87228132,  2.87228132,  2.87228132])

a.mean(axis=1)

array([  4.5,  14.5,  24.5,  34.5,  44.5,  54.5,  64.5,  74.5,  84.5,  94.5])

a.var()

833.25

a.max()

99

a.min()

0

Challenge 5:

How does the speed of Numpy's np.arange(1000000).sum() compare to Python's sum(range(1000000))?
How does the speed of Numpy's standard deviation (so std) compare to that of pandas std on a Data frame with input range(1000000)?

%timeit np.arange(1000000).sum()

%timeit  sum(range(1000000))

a = np.arange(1000000)
a.std()
%timeit a.std()

import pandas
df = pandas.DataFrame(np.arange(1000000))
df.std()
%timeit df.std()

Heh, look up at the actual OUTPUT of the above two .std() functions -- they are completely different after the decimal point.

Caveat emptor!

v = stats.TimeSeries(range(1000000))  # I wrote this in Cython once...
%timeit v.standard_deviation()


︠50f74575-ce6f-4ea4-bc15-97e1c64c4fbai︠



%md
### 6. Indexing and slicing

6. Indexing and slicing

a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

a[3] = 10
a

array([ 0,  1,  2, 10,  4,  5,  6,  7,  8,  9])

a[2:5]

array([ 2, 10,  4])

a[2:5] = 10

array([ 0,  1, 10, 10, 10,  5,  6,  7,  8,  9])

a[2:5] = [100,200,300]
a

array([  0,   1, 100, 200, 300,   5,   6,   7,   8,   9])

I hope you start to feel the power...

a[2:7:2]

array([100, 300,   6])

a[2:7:2] = [-1, -2, -3]
a

array([  0,   1,  -1, 200,  -2,   5,  -3,   7,   8,   9])

a = np.arange(100).reshape(10,10)
a

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

a[1:3, :]

array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

a[:, 1:3]

array([[11, 12],
       [11, 12],
       [21, 22],
       [31, 32],
       [41, 42],
       [51, 52],
       [61, 62],
       [71, 72],
       [81, 82],
       [91, 92]])

# set the first column to equal the last:
a[:,0] = a[:,-1]

array([[19, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [19, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [29, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [39, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [49, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [59, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [69, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [79, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [89, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [99, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

All this amazing slicing and dicing, and getting references into nd arrays... works on $n$ -dimensional arrays of data.

a = np.zeros( (3,2,2) )
a

array([[[ 0.,  0.],
        [ 0.,  0.]],

       [[ 0.,  0.],
        [ 0.,  0.]],

       [[ 0.,  0.],
        [ 0.,  0.]]])

a[0] = [[1,2], [3,4]]
a

array([[[ 1.,  2.],
        [ 3.,  4.]],

       [[ 0.,  0.],
        [ 0.,  0.]],

       [[ 0.,  0.],
        [ 0.,  0.]]])

a[1] = a[0]
a

array([[[ 1.,  2.],
        [ 3.,  4.]],

       [[ 1.,  2.],
        [ 3.,  4.]],

       [[ 0.,  0.],
        [ 0.,  0.]]])

# what do you think will happen...
a[0,0,0] = -20