Introduction to numpy

In your homework, you performed your first bit of data analysis (congratulations!). You learned how to use lists and dictionaries, and you processed data using for loops and if statements.

  • How can we make this process more efficient?

  • Also, what do we do if we need to analyze 100 clouds? 10000 clouds? 1,000,000,000,000,000 clouds?


What are numpy and numpy arrays?

Numpy is:

  • an extension package to Python for multi-dimensional arrays

  • closer to the hardware (efficiency)

  • designed for scientific computation (convenience)

  • an example of what is also known as array-oriented computing: operations act on whole arrays at once instead of element by element
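To make "array-oriented computing" concrete, here is a small sketch that converts a few temperatures from Celsius to Fahrenheit twice: once with a Python loop over a list, once with a single expression on a numpy array. The temperature readings are made up for illustration. The last lines build a two-dimensional array to show the multi-dimensional side.

```python
import numpy as np

# Hypothetical temperature readings in degrees Celsius (illustration only).
temps_c = [12.5, 15.0, 9.8, 21.3]

# List approach: the formula is written for one element and looped over.
temps_f_loop = [t * 9 / 5 + 32 for t in temps_c]

# Array approach: the same formula applied to the whole array at once.
temps = np.array(temps_c)
temps_f = temps * 9 / 5 + 32
print(temps_f)
print(np.allclose(temps_f, temps_f_loop))  # True: same numbers either way

# A 2-D array: 2 rows and 3 columns.
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(grid.ndim, grid.shape)  # 2 (2, 3)
```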

In [1]:
import numpy as np
a = np.array([0, 1, 2, 3])
a
Out[1]:
array([0, 1, 2, 3])

For example, a numpy array may contain:

  • values of an experiment/simulation at discrete time steps

  • signals recorded by a measurement device, e.g. weather station data

  • pixels of an image, grey-level or color, e.g. a satellite image

  • 3-D data measured at different X-Y-Z positions, e.g. climate model data

  • ...
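Each of these kinds of data maps naturally onto an array with a particular number of dimensions. The sketch below uses placeholder arrays of zeros; all the sizes are hypothetical, picked only to show which axis means what.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_steps = 24               # e.g. hourly readings from one weather station
height, width = 480, 640   # e.g. image size in pixels
nz, ny, nx = 10, 90, 180   # e.g. vertical levels x latitude x longitude

time_series = np.zeros(n_steps)             # 1-D: one value per time step
grey_image = np.zeros((height, width))      # 2-D: one grey level per pixel
color_image = np.zeros((height, width, 3))  # 3-D: red, green, blue per pixel
volume = np.zeros((nz, ny, nx))             # 3-D: one value per grid point

for name, arr in [("time series", time_series), ("grey image", grey_image),
                  ("color image", color_image), ("3-D field", volume)]:
    print(f"{name:12} ndim={arr.ndim} shape={arr.shape}")
```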

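Why it is useful: a numpy array is a memory-efficient container that provides fast numerical operations. Its core routines are written in compiled C, so an operation on a whole array runs as a tight machine-level loop instead of going through the Python interpreter one element at a time.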
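One rough way to see the memory difference is to compare 10,000 integers stored in a list with the same integers stored in an array. A list holds pointers to separate Python integer objects, while an array stores the raw numbers back to back. The estimate below is approximate (Python caches small integers, so a few objects are counted that are actually shared), and the dtype is pinned to 64-bit integers so the array size does not depend on the platform.

```python
import sys
import numpy as np

n = 10_000
as_list = list(range(n))
as_array = np.arange(n, dtype=np.int64)  # 8 bytes per element, stored contiguously

# Rough size of the list: the list object itself (an array of pointers)
# plus every int object it points to.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)

# Size of the array's data buffer: n elements * 8 bytes each.
array_bytes = as_array.nbytes

print(f"list : ~{list_bytes:,} bytes")
print(f"array:  {array_bytes:,} bytes")
```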

In [2]:
L = range(10000)
In [3]:
%timeit [i**2 for i in L]
100 loops, best of 3: 2.87 ms per loop
In [4]:
a = np.arange(10000)
In [5]:
%timeit a**2
The slowest run took 18.82 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.62 µs per loop
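`%timeit` is an IPython magic, so it only works inside IPython or Jupyter. In a plain Python script, the standard-library `timeit` module gives a comparable measurement. The sketch below first checks that both approaches produce the same numbers, then times each one; the repeat count of 200 is an arbitrary choice, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

L = range(10000)
a = np.arange(10000)

# Sanity check: the list comprehension and the array expression agree.
assert [i**2 for i in L] == (a**2).tolist()

# timeit.timeit calls the function `number` times and returns the total
# elapsed time in seconds; divide by `number` to get the time per call.
number = 200
t_list = timeit.timeit(lambda: [i**2 for i in L], number=number) / number
t_array = timeit.timeit(lambda: a**2, number=number) / number

print(f"list comprehension: {t_list * 1e6:10.1f} µs per loop")
print(f"numpy array:        {t_array * 1e6:10.1f} µs per loop")
print(f"speed-up: about {t_list / t_array:.0f}x")
```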

Reference documentation

In [6]:
# Get some help on np.array without going to the web
np.array?
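The trailing `?` is IPython syntax and is not valid in a plain Python script. There, the same documentation is available through the built-in `help(np.array)` or directly as the function's docstring, as in this small sketch:

```python
import numpy as np

# The docstring that `np.array?` displays is stored on the function itself.
doc = np.array.__doc__

# Print only the first few lines; help(np.array) shows the whole thing.
print("\n".join(doc.splitlines()[:5]))
```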
  • Looking for something:
In [7]:
np.lookfor('create array')  # NumPy 1.x only; np.lookfor was removed in NumPy 2.0
Search results for 'create array'
---------------------------------
numpy.array
    Create an array.
numpy.memmap
    Create a memory-map to an array stored in a *binary* file on disk.
numpy.diagflat
    Create a two-dimensional array with the flattened input as a diagonal.
numpy.fromiter
    Create a new 1-dimensional array from an iterable object.
numpy.partition
    Return a partitioned copy of an array.
numpy.ctypeslib.as_array
    Create a numpy array from a ctypes array or a ctypes POINTER.
numpy.ma.diagflat
    Create a two-dimensional array with the flattened input as a diagonal.
numpy.ma.make_mask
    Create a boolean mask from an array.
numpy.ctypeslib.as_ctypes
    Create and return a ctypes object from a numpy array.  Actually
numpy.ma.mrecords.fromarrays
    Creates a mrecarray from a (flat) list of masked arrays.
numpy.lib.format.open_memmap
    Open a .npy file as a memory-mapped array.
numpy.ma.MaskedArray.__new__
    Create a new masked array from scratch.
numpy.lib.arrayterator.Arrayterator
    Buffered iterator for big arrays.
numpy.ma.mrecords.fromtextfile
    Creates a mrecarray from data stored in the file `filename`.
numpy.asarray
    Convert the input to an array.
numpy.ndarray
    ndarray(shape, dtype=float, buffer=None, offset=0,
numpy.recarray
    Construct an ndarray that allows field access using attributes.
numpy.chararray
    chararray(shape, itemsize=1, unicode=False, buffer=None, offset=0,
numpy.pad
    Pads an array.
numpy.asanyarray
    Convert the input to an ndarray, but pass ndarray subclasses through.
numpy.copy
    Return an array copy of the given object.
numpy.diag
    Extract a diagonal or construct a diagonal array.
numpy.load
    Load arrays or pickled objects from ``.npy``, ``.npz`` or pickled files.
numpy.sort
    Return a sorted copy of an array.
numpy.array_equiv
    Returns True if input arrays are shape consistent and all elements equal.
numpy.dtype
    Create a data type object.
numpy.choose
    Construct an array from an index array and a set of arrays to choose from.
numpy.nditer
    Efficient multi-dimensional iterator object to iterate over arrays.
numpy.swapaxes
    Interchange two axes of an array.
numpy.full_like
    Return a full array with the same shape and type as a given array.
numpy.ones_like
    Return an array of ones with the same shape and type as a given array.
numpy.empty_like
    Return a new array with the same shape and type as a given array.
numpy.zeros_like
    Return an array of zeros with the same shape and type as a given array.
numpy.asarray_chkfinite
    Convert the input to an array, checking for NaNs or Infs.
numpy.diag_indices
    Return the indices to access the main diagonal of an array.
numpy.chararray.tolist
    a.tolist()
numpy.ma.choose
    Use an index array to construct a new array from a set of choices.
numpy.savez_compressed
    Save several arrays into a single file in compressed ``.npz`` format.
numpy.matlib.rand
    Return a matrix of random values with given shape.
numpy.ma.empty_like
    Return a new array with the same shape and type as a given array.
numpy.ma.make_mask_none
    Return a boolean mask of the given shape, filled with False.
numpy.ma.mrecords.fromrecords
    Creates a MaskedRecords from a list of records.
numpy.around
    Evenly round to the given number of decimals.
numpy.source
    Print or write to a file the source code for a NumPy object.
numpy.diagonal
    Return specified diagonals.
numpy.einsum_path
    Evaluates the lowest cost contraction order for an einsum expression by
numpy.histogram2d
    Compute the bi-dimensional histogram of two data samples.
numpy.fft.ifft
    Compute the one-dimensional inverse discrete Fourier Transform.
numpy.fft.ifftn
    Compute the N-dimensional inverse discrete Fourier Transform.
numpy.busdaycalendar
    A business day calendar object that efficiently stores information
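On NumPy 2.0 and later, where `np.lookfor` no longer exists, a crude substitute is to scan the docstrings of the public names in the numpy namespace for a phrase. `search_docs` below is a hypothetical helper written for illustration, not part of numpy; unlike `np.lookfor`, it does not rank its results or look inside submodules.

```python
import warnings
import numpy as np

def search_docs(phrase, module=np):
    """Hypothetical helper: public names in `module` whose docstring
    contains `phrase` (case-insensitive). Not part of numpy."""
    phrase = phrase.lower()
    matches = []
    for name in dir(module):
        if name.startswith("_"):
            continue
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # some attributes warn when accessed
            try:
                obj = getattr(module, name)
            except Exception:
                continue
        doc = getattr(obj, "__doc__", None)
        if isinstance(doc, str) and phrase in doc.lower():
            matches.append(name)
    return matches

results = search_docs("create an array")
print(results)
```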
In [8]:
np.con*?
In [10]:
np.a*?
In [11]:
np.l*?
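The `np.con*?` syntax is also IPython-specific. In plain Python, a similar list can be built by matching the names returned by `dir(np)` against the same shell-style pattern with the standard-library `fnmatch` module. `matching_names` is a small helper defined here for illustration.

```python
import fnmatch
import numpy as np

def matching_names(pattern, module=np):
    """Public names in `module` that match a shell-style pattern such as 'con*'."""
    return [name for name in dir(module)
            if fnmatch.fnmatchcase(name, pattern) and not name.startswith("_")]

con_names = matching_names("con*")  # similar to np.con*?
print(con_names)
print(matching_names("l*"))         # similar to np.l*?
```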
In [12]:
import numpy as np

values = np.array([12, 321, 5, 236, 57, 2345, 6345, 7])
print(values)
[  12  321    5  236   57 2345 6345    7]
In [13]:
values**0.5
Out[13]:
array([  3.46410162,  17.91647287,   2.23606798,  15.3622915 ,
         7.54983444,  48.42520005,  79.65550828,   2.64575131])
In [14]:
np.sqrt(values)
Out[14]:
array([  3.46410162,  17.91647287,   2.23606798,  15.3622915 ,
         7.54983444,  48.42520005,  79.65550828,   2.64575131])
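The two cells above give the same result: raising to the power 0.5 and calling `np.sqrt` both compute the element-wise square root. This short sketch checks that with `np.allclose`, which compares floating-point arrays within a small tolerance, and shows that the result is a new float array while the integer input is left unchanged.

```python
import numpy as np

values = np.array([12, 321, 5, 236, 57, 2345, 6345, 7])

# Both forms compute the element-wise square root.
print(np.allclose(values**0.5, np.sqrt(values)))  # True

# The result is a new float64 array; `values` keeps its integer dtype.
print(np.sqrt(values).dtype, values.dtype)
```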
In [15]:
np.cos(values)
Out[15]:
array([ 0.84385396,  0.84855433,  0.28366219, -0.92846012,  0.89986683,
        0.19760673,  0.52578361,  0.75390225])
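`np.cos` is a numpy "universal function" (ufunc): it computes the cosine of every element, with the loop running in compiled code rather than in Python. The sketch below checks it against an explicit loop over `math.cos`. Like `math.cos`, it expects angles in radians, which is why the values above look scattered; degrees can be converted first with `np.deg2rad`.

```python
import math
import numpy as np

values = np.array([12, 321, 5, 236, 57, 2345, 6345, 7])

# Same numbers computed one element at a time in Python.
loop_result = [math.cos(v) for v in values]
print(isinstance(np.cos, np.ufunc))              # True
print(np.allclose(np.cos(values), loop_result))  # True

# Angles in degrees must be converted to radians first.
angles_deg = np.array([0, 90, 180])
print(np.cos(np.deg2rad(angles_deg)))
```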
In [16]:
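# Caution: powers this large do not fit in a fixed-width integer dtype and
# wrap around silently; only 12**12, 5**5 and 7**7 below are correct.
values**values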
Out[16]:
array([       8916100448256, -1434141394955693759,                 3125,
                          0,  4951460147845608313,  8129662023945473129,
       -3670746008300892663,               823543])
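The negative numbers and the 0 in Out[16] are not real results. `values` holds fixed-width integers (64-bit here), and powers such as 321**321 are far too large to fit, so numpy wraps around instead of raising an error. The sketch below pins the dtype to `np.int64` so it behaves the same on every platform, and compares the wrapped results with exact values computed from Python's arbitrary-precision integers (`dtype=object`, which is much slower).

```python
import numpy as np

# Same numbers as in the notebook; the dtype is pinned to 64-bit integers
# because the default integer width depends on the platform.
values = np.array([12, 321, 5, 236, 57, 2345, 6345, 7], dtype=np.int64)

# Fixed-width integer power: results above 2**63 - 1 wrap around silently.
wrapped = values ** values

# Object dtype holds Python ints, which never overflow (but are much slower).
exact = values.astype(object) ** values.astype(object)

int64_max = np.iinfo(np.int64).max
for v, w, e in zip(values.tolist(), wrapped.tolist(), exact.tolist()):
    fits = e <= int64_max
    print(f"{v:>4}**{v:<4} -> int64 result {w:>20}  ({'correct' if fits else 'overflowed'})")
```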
In [17]:
# Note: this builds on the overflowed values**values from the previous cell
values**values/values + values**0.5/values
Out[17]:
array([  7.43008371e+11,  -4.46773020e+15,   6.25447214e+02,
         6.50944555e-02,   8.68677219e+16,   3.46680683e+15,
        -5.78525770e+14,   1.17649378e+05])
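In this expression `**` binds more tightly than `/`, and `/` more tightly than `+`, so it is evaluated as `((values**values)/values) + ((values**0.5)/values)`. Because it reuses the overflowed `values**values`, every entry except those for 12, 5 and 7 inherits wrong numbers. The sketch below checks the precedence on a small, made-up float array where nothing overflows, and compares it with the algebraically simplified form x**(x-1) + x**(-0.5).

```python
import numpy as np

# Small, hypothetical float array: no risk of overflow here.
x = np.array([2.0, 3.0, 5.0])

implicit = x**x/x + x**0.5/x                # relies on operator precedence
explicit = ((x**x) / x) + ((x**0.5) / x)    # same thing, fully parenthesized
simplified = x**(x - 1) + x**(-0.5)         # x**x / x = x**(x-1), x**0.5 / x = x**(-0.5)

print(implicit)
print(np.array_equal(implicit, explicit))   # True: identical operations
print(np.allclose(implicit, simplified))    # True up to rounding
```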