Authors: John Jeng, William A. Stein

# Math 480 - Grading for Homework 3:

Due Friday, April 22, 2016 by 6pm

%html
<font color="red"><h1>HOMEWORK TOTAL: 65 POINTS</h1></font>


# HOMEWORK TOTAL: 65 POINTS

• Go through your gradee's homework worksheet and assign a score to each question. Please state an individual questions assigned score in a cell below it, and leave a few words comment.
• State the total score you've assigned for the homework at the top of the worksheet. Put
TOTAL: X/65
in a cell on its own, where X is the total score for this homework.
• For visibility, I recommend any text you add be colored red. You're welcome to copy any of red html text below and paste it in, suitably edited, if you so desire.

## Problem 1: Standard deviation

a. Write a straightforward Python function (not using anything special from numpy or Sage) to compute the standard deviation of a list of numbers.

b. Compare the speed of your program when given as input v = range(1000), v = range(10000) and v = range(100000) with the speed of standard deviation on that input as implemented in: numpy, R, and stats.TimeSeries. Use %timeit.

c. Do a and b again, but for an implementation of mysd using Cython (you do not have to use any special features of Cython).

%html

<font color="red"><h2>10 Points Total</h2>
<h4>Part A (3 points total)</h4>
<p>Test that their function behaves the same way as the standard deviation function built into sage.</p>
<p>Award:
<ul>
<li>3 points if their function matches behavior to the examples below</li>
<li>1 point if they wrote something that looks close to correct</li>
</ul>
</p>
<h4>Part B (3 points total)</h4>
<p>Award:
<ul>
<li>3 points for correctly using %timeit</li>
<li>2 points if each instance has the same sort of problem.</li>
</ul>
</p>
<h4>Part C (3 points total)</h4>
<p>Award:
<ul>
<li>3 points for a correct implementation in Cython and using %timeit correctly</li>
<li>2 point for just a correct implementation in Cython or a partially correct implementation with correct %timeit usage</li>
<li>1 point for just using %timeit correctly</li>
</ul>
</p>
</font>



## 10 Points Total

#### Part A (3 points total)

Test that their function behaves the same way as the standard deviation function built into sage.

Award:

• 3 points if their function matches behavior to the examples below
• 1 point if they wrote something that looks close to correct

#### Part B (3 points total)

Award:

• 3 points for correctly using %timeit
• 2 points if each instance has the same sort of problem.

#### Part C (3 points total)

Award:

• 3 points for a correct implementation in Cython and using %timeit correctly
• 2 point for just a correct implementation in Cython or a partially correct implementation with correct %timeit usage
• 1 point for just using %timeit correctly

# Example Solution
def mysd(data):
# Size of data
size = len(data)

# Make sure that we're dealing with Sage Integers
data = map(Integer, data)

# Calculate the average
average = sum(data)/size

# Calculate the deviations from mean, squared
squared_deviations = sum(map(lambda x : (x - average)^2, data))

# Calculate the variance
variance = squared_deviations/(size - 1) # Note that we divide by size-1

# Return the standard deviation
return sqrt(variance)

# Some tests
print "These should all be sqrt(55/6)"
mysd(range(10))
mysd(range(1, 11))
mysd([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
std([1..10])
print "\nThese should be sqrt(7/2)"
std([1..6])
mysd([1..6])


These should all be sqrt(55/6) sqrt(55/6) sqrt(55/6) sqrt(55/6) sqrt(55/6) These should be sqrt(7/2) sqrt(7/2) sqrt(7/2)
# Example Solution

print "My Times"
%timeit mysd(range(1000))
%timeit mysd(range(10000))
%timeit mysd(range(100000))
print ""

print "Timings using stats.TimeSeries"
x = stats.TimeSeries(range(1000))
y = stats.TimeSeries(range(10000))
z = stats.TimeSeries(range(100000))
%timeit x.standard_deviation()
%timeit y.standard_deviation()
%timeit z.standard_deviation()
print ""

print "Timings using Numpy"
import numpy
numpy.std(range(1000), ddof=1)
%timeit numpy.std(range(1000), ddof=1)
%timeit numpy.std(range(10000), ddof=1)
%timeit numpy.std(range(100000), ddof=1)
print ""


My Times 125 loops, best of 3: 3.17 ms per loop 25 loops, best of 3: 35.6 ms per loop 5 loops, best of 3: 290 ms per loop Timings using stats.TimeSeries 625 loops, best of 3: 2.47 µs per loop 625 loops, best of 3: 24.1 µs per loop 625 loops, best of 3: 228 µs per loop Timings using Numpy 288.81943609574938 625 loops, best of 3: 133 µs per loop 625 loops, best of 3: 729 µs per loop 125 loops, best of 3: 7.62 ms per loop
%r
x <- c(1:1000)
start.time <- Sys.time()
sd(x, na.rm = FALSE)
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

y <- c(1:10000)
start.time <- Sys.time()
sd(y, na.rm = FALSE)
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

z <- c(1:100000)
start.time <- Sys.time()
sd(z, na.rm = FALSE)
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

[1] 288.8194 Time difference of 0.004317999 secs [1] 2886.896 Time difference of 0.005000114 secs [1] 28867.66 Time difference of 0.006437063 secs

## Problem 2: Which approach

Imagine you're an a tenure-track academic researcher. To do some key research, you need to implement code to compute a function $f(n)$, and use it to evaluate $\alpha = f(1) + f(2) + \cdots + f(100)$.

You can implement this code in two ways:

STRATEGY 1: Work fulltime for one month implementing and debugging a very fast algorithm in C, so that actually computing $\alpha$ only takes $1$ hour on your laptop.

STRATEGY 2: Spend about 3 hours writing a non-optimized Python program (that calls some other slow code), then let your code run on a cluster for 1 month (running the program costs \$2K in grant money).

Come up with a creative (but serious) argument in favor of each of the above strategies.

(How this will be graded: (a) you may lose points for mistakes in English spelling and grammar, (b) you must have put real thought into the problem.)

## 10 Points Total

#### Part A and B (5 points each)

Award points based on their thoughtfulness.

Award:

• 5 points for a well reasoned answer describing situations for each strategy.
• ~3 points if they wrote some reasons down
• ~1 point if they wrote something trivial

## Problem 3:

Given a rough order of magnitude estimate for how long each of the following should take:

1. Doing a Google search for math software (just the time that Google reports it having taken).
2. A server in Seattle pinging a server in California (both with optimal network connections).
3. A server in Seattle pinging a server in Japan (both with optimal network connections).
4. Adding together 1 million double precision floating point numbers using pure Python.
5. Adding together 1 million double precision floating point numbers using Cython (or C) with type declarations.
6. Computing the determinant of a 100x100 matrix whose entries are random single-digit integers (using Sage). No rounding errors.
7. 1 million separate writes of random numbers to an SSD disk. By "separate write" we mean that you flush the write to disk every time. You can lookup specs for number of IO operations per second for some common SSD's by searching on Google.

## 10 Points Total

Award:

• 1 point per part if their answer is within an order of magnitude of the example solution.
• .5 points if it's within two orders of magnitude.
• + 3 points for completion

#### Example Solution

1. 0.1 seconds
2. 10 ms
3. 100 ms
4. 100 ms
5. 25 ms
6. 100 ns
7. 2 sec

Problem 4: Techniques used by projects

Track down projects that use various tools that we discussed this week.

1. Give five examples of existing open source Python projects that use Cython. What do they use Cython for?

2. Give five examples of existing open source Python projects that use Numpy. What is something they use Numpy for?

3. Give three examples of existing open source Python projects that use Numba. How do they use numba?
%html
<font color="red"><h2>15 Points Total</h2>
<h4>Part 1, 2, and 3</h4>
<p>Award:
<ul>
<li>1 point per example</li>
<li>+ 2 points for completion</li>
</ul>
</p>
<p>
If their example is not found in the example solution, briefly try to find out if their example works. Do not take off points for repeats. For example if a project uses both Cython and Numpy, it may be used for part 1 and part 2.
</p>
</font>


## 15 Points Total

#### Part 1, 2, and 3

Award:

• 1 point per example
• + 2 points for completion

If their example is not found in the example solution, briefly try to find out if their example works. Do not take off points for repeats. For example if a project uses both Cython and Numpy, it may be used for part 1 and part 2.

#### Solution:

Part 1: Any from this list

Part 2: Any from this list

Part 3: Some examples below.

## Problem 5:

1. Write a simple naive function using a for loop to compute $s(n) = 1 + 2 + 3 + \cdots + n$, the sum of the first $n$ integers. Do not use the formula $s(n)=n(n+1)/2$. Use %timeit and/or %time to see how long your function takes when $n=10^5$ and $n=10^6$.
2. Ensure that the numbers are of type Python int in part 1 (rather than Sage's Integers). How much does that improve the timings?
3. Use Numba to try to speed up your Python function and time it (put %python at the top of any cell in which you use numba). Does Numba help?
4. Compile the exact same function using Cython and time it.
5. Add Cython type declarations, so assume that everything is cdef long, and compile it.
6. Do 1-5 above again, but using the formula $s(n)=n(n+1)/2$ instead of a for loop.

## 20 Points Total

#### Part 1, 2, 3, 4, 5 (10 points, 2 points per part)

Each of these parts requires its own implementation. If each part has a similar error, only deduct points for one of them.

Award:

• 2 points if they have a correct implementation and use %timeit
• 1.5 points if their implementation is nearly correct and use %timeit
• 1 point if they are missing one of the requirements

#### Part 6 (10 points, 2 points per copy of parts 1-5)

This is the same as parts 1-5 except as one part.

Award points identically to above.

# Sample Solution
def sum_basic(n):
sum = 0
for x in [1..n]: # Using this notation iterates over sage Integers
sum += x
return sum

sum_basic(10)

55
def sum_int(n):
sum = int(0)
for x in range(n + 1): # Using range iterates over python ints
sum += x
return sum

sum_int(10)

55
%python

import numba
@numba.jit
def sum_numba(n):
sum = int(0)
for x in range(n + 1):
sum += x
return sum

sum_numba(10)

55
%cython
import math
def sum_numba(n):
sum = int(0)
for x in range(n + 1):
sum += x
return sum


Defined math, sum_numba
Auto-generated code...
%cython
import math
def sum_numba(long n):      # Don't take off any points if they used 'cdef int' instead of 'cdef long'
cdef long sum = 0       # -0.5 if this isn't a long
cdef long x = 0         # Don't take off any points if they used 'cdef int' instead of 'cdef long'
for x in range(n + 1):
sum += x
return sum

Defined math, sum_numba
Auto-generated code...

### %timeit usage should be similar to above in problem 1

︠17f2692a-e3cc-422f-b5fd-cebaac87f28c︠