Large amounts of data often need to be manipulated by addition, subtraction, multiplication, and division. To make programming simpler and computation more efficient, such problems are often vectorized. In other words, they are formulated in terms of vectors, matrices, tensors, and scalars (0th-order tensors), generally referred to as arrays. Operations commonly need to be applied element-wise to matrix and vector quantities. In a lower-level language like C, these operations would be written with explicit loops, such as a for loop. Higher-level tools, like the NumPy library for Python, let programmers apply an operation to whole arrays without explicit loops. Besides sparing the programmer from writing loops, NumPy relies on compiled libraries that use optimized CPU routines, which are more efficient than simple loops written in Python.
Broadcasting refers to how computations with arrays of different sizes, or shapes, are performed. Before defining precisely how broadcasting works in NumPy, consider a few examples.
Here are two arrays, a and b, each with shape (3,).
a*b multiplies corresponding elements of the two arrays and results in another array of shape (3,).
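The element-wise product described above might look like the following minimal sketch. The values of a and b are illustrative assumptions, chosen so that every element of b is the same number.

```python
import numpy as np

# Illustrative values (assumed): two arrays of shape (3,)
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])

# Element-wise product: [1*2, 2*2, 3*2]
product = a * b

print(a.shape, b.shape, product.shape)  # (3,) (3,) (3,)
print(product)                          # [2. 4. 6.]
```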
This result is fairly straightforward. Notice that every element of the array b is the same number. Here we can use NumPy's broadcasting feature to simplify the operation. Again array a has shape (3,), but b is now a scalar, or 0th-order tensor.
This is the same result as in the first case, but b did not need to be defined as an array with the same shape as a. b can be thought of as being "stretched" across the length of a. For this example the benefit seems trivial, but for larger and higher-order arrays this is a powerful tool.
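A minimal sketch of the scalar case, with assumed illustrative values, compares the broadcast product against an explicitly "stretched" copy of b.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])  # shape (3,), illustrative values
b = 2.0                        # a plain scalar

# Broadcasting: b behaves as if it were repeated for every element of a
scaled = a * b

# The same computation with b stretched by hand into shape (3,)
stretched = a * np.full(a.shape, b)

print(scaled)                             # [2. 4. 6.]
print(np.array_equal(scaled, stretched))  # True
```

NumPy does not actually build the stretched copy in memory, which is part of why broadcasting is efficient.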
The next example shows how to multiply each column of an array of shape (2,3) by a different value using broadcasting. Each element in the first column of a is multiplied by the first element of b, each element in the second column of a is multiplied by the second element of b, and each element in the third column of a is multiplied by the third element of b. This results in having the first column of a multiplied by 1, the second column multiplied by 2, and the third column multiplied by 3.
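The column-wise scaling just described can be sketched as follows. The entries of a are assumed for illustration; b holds the factors 1, 2, and 3 named in the text.

```python
import numpy as np

# Illustrative (2, 3) array; the values are an assumption
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([1, 2, 3])   # shape (3,): one factor per column

# b is broadcast across both rows of a
print(a * b)
# [[ 1  4  9]
#  [ 4 10 18]]
```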
Similarly, broadcasting can be used to add 1 to the first column of a, 2 to the second column of a, and 3 to the third column of a.
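Addition follows the same pattern. This sketch reuses the same assumed (2, 3) array and adds 1, 2, and 3 to its first, second, and third columns.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])   # illustrative values, shape (2, 3)
b = np.array([1, 2, 3])     # shape (3,): one offset per column

print(a + b)
# [[2 4 6]
#  [5 7 9]]
```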
In the following case broadcasting does not work.
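One failing case, assuming shapes (2, 3) and (2,) for illustration, can be reproduced with the sketch below. The try/except is only there so the script keeps running after the error.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
b = np.array([1, 2])        # shape (2,)

try:
    a * b
    broadcast_ok = True
except ValueError as err:
    # NumPy reports that the operands could not be broadcast together
    broadcast_ok = False
    print("Broadcasting failed:", err)
```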
To understand when broadcasting does and doesn't work, a formal definition is required.
The following is based on the official NumPy documentation.
Broadcasting describes how NumPy treats arrays of different sizes (matrices, vectors, and tensors) during an operation.
When operating on two arrays, NumPy compares their shapes starting with the last (trailing) dimension and working backward, where the shape of an n-dimensional array has the form (d1, d2, d3, ..., dn). Two dimensions are compatible for broadcasting when one of the following two conditions is met.
1. The two dimensions are equal.
2. One of the two dimensions is 1.
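The size of the resulting array along each dimension is the maximum of the sizes of the input arrays along that dimension. When one array has fewer dimensions than the other, its missing leading dimensions are treated as having size 1.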
Note: The shape of any given array, a, can be found by a.shape.
In the previous example, array a had shape (3,) and b was a scalar, which for broadcasting purposes behaves like an array of shape (1,).
a   (1d array): 3
b   (1d array): 1
a*b (1d array): 3
NumPy compares the last dimensions, 3 and 1. Because one of them is 1, the second condition for broadcasting is met, so b is "stretched" to match a and the operation proceeds.
Here are a couple more generic examples.
A      (3d array): 256 x 256 x 3
B      (1d array):             3
Result (3d array): 256 x 256 x 3
In the following case, both A and B have axes of length one that are expanded during broadcasting.
A      (4d array): 8 x 1 x 6 x 1
B      (3d array):     7 x 1 x 5
Result (4d array): 8 x 7 x 6 x 5
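The two shape tables above can be checked directly. This sketch uses arrays of ones, since only the shapes matter here. The names C and D for the second pair are introduced only for this example.

```python
import numpy as np

# First example: (256, 256, 3) with (3,)
A = np.ones((256, 256, 3))
B = np.ones(3)
print((A * B).shape)   # (256, 256, 3)

# Second example: (8, 1, 6, 1) with (7, 1, 5)
C = np.ones((8, 1, 6, 1))
D = np.ones((7, 1, 5))
print((C * D).shape)   # (8, 7, 6, 5)
```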
Now a couple of examples that do not work.
A (1d array): 3
B (1d array): 4
A * B will result in an error because their shapes are not compatible according to the two conditions stated earlier.
A (2d array):     2 x 1
B (3d array): 8 x 4 x 3
Here the last dimensions (1 and 3) are compatible, but the second-to-last dimensions (2 and 4) are not.
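Both incompatible pairs can be tested with a small helper. The function try_multiply is a hypothetical name introduced for this sketch. It returns the broadcast shape, or None when NumPy raises a ValueError.

```python
import numpy as np

def try_multiply(shape_a, shape_b):
    """Return the result shape of a broadcast multiply, or None on failure."""
    try:
        return (np.ones(shape_a) * np.ones(shape_b)).shape
    except ValueError:
        return None

print(try_multiply((3,), (4,)))         # None: 3 and 4 differ, neither is 1
print(try_multiply((2, 1), (8, 4, 3)))  # None: 2 and 4 differ, neither is 1
print(try_multiply((4, 1), (8, 4, 3)))  # (8, 4, 3): a compatible variant
```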
When using broadcasting, np.newaxis can be helpful. Used inside an indexing expression, the newaxis object inserts an axis of length one. For example, a 1-dimensional array with shape (x,) can be reshaped to (1,x) or (x,1) as desired, in effect making a row or column vector.
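A short sketch of both placements of np.newaxis, using an assumed array of four elements:

```python
import numpy as np

x = np.arange(4)          # shape (4,)

row = x[np.newaxis, :]    # new leading axis  -> shape (1, 4)
col = x[:, np.newaxis]    # new trailing axis -> shape (4, 1)

print(x.shape, row.shape, col.shape)  # (4,) (1, 4) (4, 1)
```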
The following example has an array, a, of shape (4,) and a second array, b, of shape (3,). According to the two conditions stated earlier, broadcasting is not valid for these two arrays.
To make broadcasting valid, np.newaxis can be used to create an axis of length 1 for array a, changing its shape to (4,1).
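The failure and the fix can be sketched together. The values in a and b are assumptions chosen to make the pattern in the result easy to read.

```python
import numpy as np

a = np.array([0, 10, 20, 30])   # shape (4,), illustrative values
b = np.array([1, 2, 3])         # shape (3,)

try:
    a + b                        # 4 and 3 differ and neither is 1
    direct_ok = True
except ValueError:
    direct_ok = False

# Shape (4, 1) against (3,) broadcasts to (4, 3)
result = a[:, np.newaxis] + b
print(direct_ok)      # False
print(result.shape)   # (4, 3)
print(result)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]
#  [31 32 33]]
```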
The result can be visualized with the following figure.
“EricsBroadcastingDoc - SciPy wiki dump,” Scipy: EricsBroadcastingDoc. [Online]. Available: http://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc. [Accessed: 02-Apr-2016].
This corresponds to an "outer" operation. A common outer operation is the outer product of two vectors. Using NumPy broadcasting, the outer product of a and b can easily be realized.
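As a sketch with assumed values, the broadcast product of a column-shaped a with b can be compared against NumPy's np.outer function.

```python
import numpy as np

a = np.array([0, 10, 20, 30])   # shape (4,), illustrative values
b = np.array([1, 2, 3])         # shape (3,)

# Outer product via broadcasting: (4, 1) * (3,) -> (4, 3)
outer_broadcast = a[:, np.newaxis] * b

# Reference result from the dedicated function
outer_reference = np.outer(a, b)

print(np.array_equal(outer_broadcast, outer_reference))  # True
```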
Let's explore two practical examples that showcase how useful broadcasting can be.
This is an example taken with permission from Eli Bendersky's website from a blog post explaining broadcasting in NumPy.
E. Bendersky, “Broadcasting arrays in Numpy - Eli Bendersky’s website.” [Online]. Available: http://eli.thegreenplace.net/2015/broadcasting-arrays-in-numpy/. [Accessed: 02-Apr-2016].
Say we have a large data set; each datum is a list of parameters. In NumPy terms, we have a 2-D array, where each row is a datum and the number of rows is the size of the data set. Suppose we want to apply some sort of scaling to all this data - every parameter gets its own scaling factor; in other words, every parameter is multiplied by some factor.
Just to have something tangible to think about, let's count calories in foods using a macro-nutrient breakdown. Roughly put, the caloric parts of food are made of fats (9 calories per gram), protein (4 calories per gram) and carbs (4 calories per gram). So if we list some foods (our data), and for each food list its macro-nutrient breakdown (parameters), we can then multiply each nutrient by its caloric value (apply scaling) to compute the caloric breakdown of each food item:
Here is the first table shown above that has quantities in terms of grams (g).
Next, a key for converting grams of fat, protein, and carbs to calories is created: 9 calories per gram of fat, 4 calories per gram of protein, and 4 calories per gram of carbs.
Now, even if the table has an arbitrarily large number of entries, converting grams of nutrient to calories is easy. The second table is generated with a single broadcast multiplication.
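The whole calculation might be sketched as below. The gram quantities are made-up illustrative values, not the data from the original tables. Only the conversion factors (9, 4, 4) come from the text.

```python
import numpy as np

# Hypothetical foods (rows) with grams of fat, protein, carbs (columns)
grams = np.array([[0.3,  2.5,  3.5],
                  [2.9, 27.5,  0.0],
                  [0.4,  1.3, 23.9],
                  [14.4, 6.0,  2.3]])

# Calories per gram of fat, protein, and carbs
cal_per_gram = np.array([9, 4, 4])

# Shape (4, 3) times shape (3,): each column gets its own factor
calories = grams * cal_per_gram

print(calories.shape)        # (4, 3)
print(calories.sum(axis=1))  # total calories per food
```

Adding more rows to grams requires no change to the multiplication, since the (3,) key is broadcast across however many rows exist.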
Consider an image represented as an array of RGB values of shape (d1,d2,3), where d1 x d2 is the size of the image in pixels, and for each pixel there are three numeric values Red, Green, and Blue (RGB).
Here the SciPy library is used to import a generic image and matplotlib is used to display the image.
The shape is (768,1024,3), so the size of the image is 768 x 1024. We can use broadcasting to apply a filter to the RGB values at each pixel. As a simple example, to keep the R values and zero out the G and B values, multiply the image array by the array [1,0,0], which has shape (3,).
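The red filter can be sketched without loading a file. Here a randomly generated array stands in for the image (an assumption made to keep the example self-contained), with the same shape as in the text.

```python
import numpy as np

# Synthetic stand-in for the image: random 8-bit RGB values
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(768, 1024, 3), dtype=np.uint8)

# Mask of shape (3,), broadcast over every pixel; uint8 keeps the image dtype
red_mask = np.array([1, 0, 0], dtype=np.uint8)
red_only = img * red_mask

print(red_only.shape)                # (768, 1024, 3)
print(red_only[..., 1:].max())       # 0: green and blue are zeroed
```

The filtered array could then be displayed with matplotlib's imshow in the same way as the original image.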
Various other high-level programming languages offer similar functionality. For Matlab/Octave users, an array times a scalar works the same way, but multiplying two arrays together behaves very differently. Functionality similar to NumPy's broadcasting can be achieved in Matlab/Octave with the bsxfun() or repmat() functions, but the result is not as straightforward or readable as NumPy's built-in broadcasting. (MATLAB R2016b and later also expand compatible dimensions implicitly for element-wise operators, which reduces the need for bsxfun().) The Matlab/Octave * operator performs matrix multiplication, like NumPy's .dot() or like * on the NumPy matrix class.
For example, vectors are always matrices, and may need transposition for scalar multiplication.
NumPy does not have R's way of repeating (recycling) vectors that are too short, so an operation on 1-D arrays of mismatched lengths fails in NumPy.
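A sketch of the difference, using assumed values: the direct sum fails in NumPy, and R-style recycling has to be requested explicitly, here with np.resize, which fills the new shape with repeated copies of its input.

```python
import numpy as np

long_vec = np.array([1, 2, 3, 4])   # shape (4,)
short_vec = np.array([1, 2])        # shape (2,)

try:
    long_vec + short_vec             # 4 and 2 differ and neither is 1
    recycled_implicitly = True
except ValueError:
    recycled_implicitly = False

# Explicit recycling: repeat short_vec until it has shape (4,)
recycled = long_vec + np.resize(short_vec, long_vec.shape)

print(recycled_implicitly)  # False
print(recycled)             # [2 4 4 6]
```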
The analogous expression works in R, which recycles the shorter vector: c(1, 2, 3, 4) + c(1, 2) evaluates to 2 4 4 6.
You can explore broadcasting by generating arrays of different shapes, performing an operation on them, and observing the result.
np.random.randint(N, size=(d1,d2,...,dn)) generates an array with shape (d1,d2,...,dn) filled with random integers from 0 up to, but not including, N.
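A starting point for such experiments might look like this. The shapes (3, 1) and (4,) and the upper bound 10 are arbitrary choices to be changed freely.

```python
import numpy as np

# Random integers in [0, 10); try other shapes to see what broadcasts
a = np.random.randint(10, size=(3, 1))
b = np.random.randint(10, size=(4,))

result = a + b

print("a shape:", a.shape)            # (3, 1)
print("b shape:", b.shape)            # (4,)
print("result shape:", result.shape)  # (3, 4)
print(result)
```

Swapping in an incompatible pair, such as (3,) and (4,), raises a ValueError instead of producing a result.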