Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupport News AboutSign UpSign In
| Download
Project: math480-2016
Views: 2158

Math 480: Open Source Mathematical Software

2016-04-11

William Stein

Lectures 7: Speed (part 1 of 3)

Or... why slower is often faster.

Plan:

  • Monday: benchmarking, using numpy and other libraries to make code factors

  • Wednesday: Cython

  • Friday: Other performance optimization tools (e.g., numba, pypy, Psycho?, etc.?)

Announcements

  • Start the screencast!!!! Unless you know for a fact otherwise, I forgot.

  • TA office hours: Thursday

  • "Virtual" office hours: email [email protected], [email protected], or [email protected]

  • Type in a chat in any project -- the TA and I should see a notification and help you.

  • Homework 2 will be collected at (or after) about 6pm tonight. Make sure your work is there, since recent days were a disaster (not any more!)

  • Peer grading will happen.

  • Note: filenames ending in ~ -- when updating an assignment the old files have a ~. We updated today's assignment right after making it.

This morning I woke up and read on Hacker News

"Python is slow. Painfully slow. Slow as in "it stores the entire AST at runtime and lets you modify it while the program runs" slow. But it's also great at numerical computation with numpy. And it's enormously flexible (which is what makes it slow). And it's clean. It writes really well. For a quick mathematical processing program or website whose code needs to be legible, it's great." -- [email protected]

Response by somebody else:

"Python doesn't store the entire AST at runtime. You can hook into the import process and modify code before it's compiled to bytecode. You can modify the bytecode at runtime but not any that's on the stack. Python is also not all that slow, CPython is slow-ish but it's has effectively no optimizations at all. PyPy which has decent optimizations is actually fairly fast and getting faster with every release." -- DasIch

Timing your code

Here's some code. It is intentionally naive in various ways...

def sin_sum(n): """ Return the sum sin(1) + sin(2) + ... + sin(n-1) """ return float(sum(sin(m) for m in xrange(1,n)))
sin(1) + sin(2)
sin(2) + sin(1)
range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
xrange(10^9)
xrange(1000000000)
sin(0)
0.9092974268256817
float(sin(1 + 1))
0.9092974268256817
sin_sum?

Explain this code line by line:

  • define function of one input

  • Docstring

  • sin(m) is the sage symbolic sin function

  • (sin(m) for m in range(n)) is like a list comprehension but it is "lazy" so it doesn't waste memory -- you can iterative over it though.

  • sum(...) iterates over its input summing it up

  • float(...) converts its input to a float

Throw out a mathematical question: Is there a simple formula for 0<mnsin(m)\sum_{0< m\leq n} \sin(m) similar to the formula 0<mnm=n(n1)2\sum_{0< m\leq n} m = \frac{n(n-1)}{2}?

sin_sum(10000) # this might feel slow if you try it!
1.9395054106807317
  • Is it slow?

  • Or is the network (or SMC) just slow at sending the answer back?

  • If it is slow, why?

There are many ways to time execution of code:

  1. Very naive: use a stopwatch. Start the code running and start a stopwatch on your phone or whatever. Look at the wall clock.

  2. Python: use the time module, and prints

  3. Python: use the timeit module, and better prints

  4. In worksheets: %time and %timeit

  5. In Sage: cputime and walltime functions.

  6. Using %prun in a terminal (so create a terminal and type "sage"). This is an ipython feature.

And there are also many, many ways to speed up code...

# Let's time the function in all these ways. # 1. seems like about a second....? # 2. time module import time t = time.time() # number of seconds since "the epoch" (exercise: when was that?) sin_sum(10000) print "elapsed time", time.time() - t # nice thing -- you can put time.time()'s anywhere in your code -- sort of like print statements -- for debugging performance issues..
1.9395054106807317 elapsed time 5.03362798691
import timeit timeit.timeit('sin_sum(1000)', number=50)
Error in lines 2-2 Traceback (most recent call last): File "/projects/sage/sage-6.10/local/lib/python2.7/site-packages/smc_sagews/sage_server.py", line 904, in execute exec compile(block+'\n', '', 'single') in namespace, locals File "", line 1, in <module> File "/projects/sage/sage-6.10/local/lib/python/timeit.py", line 236, in timeit return Timer(stmt, setup, timer).timeit(number) File "/projects/sage/sage-6.10/local/lib/python/timeit.py", line 201, in timeit timing = self.inner(it, self.timer) File "<timeit-src>", line 6, in inner NameError: global name 'sin_sum' is not defined
timeit.timeit('2+3', number=10000) # I'm not sure how to make this work... -- let's skip to sage's time/timeit, which basicaly makes the buitin timeit easy.
0.0002300739288330078
%time sin_sum(10000)
Error in lines 1-1 Traceback (most recent call last): File "/projects/sage/sage-6.10/local/lib/python2.7/site-packages/smc_sagews/sage_server.py", line 904, in execute exec compile(block+'\n', '', 'single') in namespace, locals File "", line 1, in <module> File "/projects/sage/sage-6.10/local/lib/python2.7/site-packages/smc_sagews/sage_server.py", line 945, in execute_with_code_decorators code = code_decorator(code) TypeError: 'module' object is not callable
# the above doesn't work because we replace %time by doing "import time".... so do this: reset('time')
%time sin_sum(10000)
1.9395054106807317 CPU time: 4.98 s, Wall time: 5.12 s CPU time: 4.98 s, Wall time: 5.12 s
%timeit sin_sum(1000)
Error in lines 1-1 Traceback (most recent call last): File "/projects/sage/sage-6.10/local/lib/python2.7/site-packages/smc_sagews/sage_server.py", line 904, in execute exec compile(block+'\n', '', 'single') in namespace, locals File "", line 1, in <module> File "/projects/sage/sage-6.10/local/lib/python2.7/site-packages/smc_sagews/sage_server.py", line 945, in execute_with_code_decorators code = code_decorator(code) TypeError: 'module' object is not callable
reset('timeit') %timeit sin_sum(1000) # runs many times so give it smaller input!
5 loops, best of 3: 79.8 ms per loop
%timeit sin_sum(100)
125 loops, best of 3: 4.49 ms per loop

Use cputime and walltime

t = cputime() sin_sum(10000) print cputime(t)
1.9395054106807317 4.464
t = walltime() sin_sum(10000) print walltime(t)
1.9395054106807317 4.84630799294

Discuss the difference between cpu and wall time.

Using %prun in a terminal

  • +New --> Terminal.

  • type sage to start sage.

  • Define the function.

  • Use %prun sin_sum(10000)

Exercise: Try to make sin_sum faster and time the result in several ways. Ideas for how:

  • replace m by float(m)

  • replace sin by math.sin

math.sin(4)
-0.7568024953079282
5/(2*10^(-3))
2500
︠fc2ecc3c-f1cd-4b0c-82df-c0ad33e4e02a︠ ︠a9e64b2c-d93e-4873-a233-6fb9ec5f327a︠ def sin_sum(n): """ Return the sum sin(1) + sin(2) + ... + sin(n-1) """ return float(sum(sin(m) for m in xrange(1,n))) def sin_sum(n): """ Return the sum sin(1) + sin(2) + ... + sin(n-1) """ return float(sum(math.sin(m) for m in xrange(1,n)))