Arrays with Numpy#
Numpy is a ubiquitously used Python library, especially in the scientific community. It provides fast multidimensional arrays of data such as integers or floating-point values. Many other Python libraries for higher-level functionality, such as plotting or fitting work with Numpy arrays.
Numpy is typically imported under the name np
:
import numpy as np
Creating arrays#
Overview#
Commonly, Numpy arrays are created with either a pattern, or from an existing Python array-like such as a list
:
a = np.array([1.5, 3.5, 4.5])
# Just typing the name of a variable will print its representation
a
array([1.5, 3.5, 4.5])
More examples:
print(np.zeros(4))
print(np.ones(3))
print(np.arange(5))
print(np.arange(1.5, 1.6, step=0.01))
print(np.linspace(100, 1000, num=7))
print(np.random.rand(4)) # uniform in [0,1)
[0. 0. 0. 0.]
[1. 1. 1.]
[0 1 2 3 4]
[1.5 1.51 1.52 1.53 1.54 1.55 1.56 1.57 1.58 1.59 1.6 ]
[ 100. 250. 400. 550. 700. 850. 1000.]
[0.01005484 0.96231747 0.78423106 0.90032438]
Multidimensional arrays are created either by specifying a shape
argument, or by using reshape
:
a = np.zeros(shape=(4, 3))
a
array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
a = np.array([1, 2, 3, 4, 5, 6]).reshape((2, 3))
a
array([[1, 2, 3],
[4, 5, 6]])
It is important to note that the last size in the shape
argument is the inner index, i.e., Numpy uses C-style order for arrays (unless specified otherwise, see the order
parameter, e.g., https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html).
Some functions do not have a shape
argument but accept multiple sizes instead:
np.random.rand(4, 3)
array([[0.5507634 , 0.33116666, 0.88744019],
[0.27438253, 0.79501575, 0.69155646],
[0.70213406, 0.7137517 , 0.00354103],
[0.44661292, 0.57864531, 0.31430209]])
Data types#
Numpy arrays support a variety of data-types (dtype
).
The most commonly used ones are integers and floating-point numbers in either double-precision or single-precision.
By default dtype
is inferred from the values used for initialization.
To control it, many functions support a dtype
keyword argument.
For explicit conversion astype()
can be used:
print(np.arange(5))
print(np.arange(5.0))
print(np.arange(5, dtype=np.float32))
print(np.arange(5).astype(np.float32))
[0 1 2 3 4]
[0. 1. 2. 3. 4.]
[0. 1. 2. 3. 4.]
[0. 1. 2. 3. 4.]
Note that the printed representation does not include the precision (32-bit vs. 64-bit).
The dtype
property provides this:
print(np.arange(5.0).dtype) # 64-bit since Python's float is 64 bit
print(np.arange(5, dtype=np.float32).dtype)
float64
float32
Slicing and views#
Basics#
Much of Numpy’s power comes from its convenient slicing-syntax, a generalization of slicing known from the Python builtin list
.
A recap of slicing for list
:
l = list(range(8)) # NOT a numpy array
print(l) # the full list
print(l[4]) # fifth item in the list
print(l[-2]) # second item from the back of the list
print(l[:2]) # range -> first two items
print(l[3:5]) # range -> items 3 to 5 (exclusive)
print(l[1:7:2]) # range -> items 1 to 7 (exclusive), with stride 2 ("every other")
[0, 1, 2, 3, 4, 5, 6, 7]
4
6
[0, 1]
[3, 4]
[1, 3, 5]
Exercise (1): Slicing a 1-D Numpy array
Experiment with accessing and slicing a 1-D Numpy array with indices and slices (using the begin:end
notation):
a = np.arange(8)
print(a)
print(a[3])
[0 1 2 3 4 5 6 7]
3
Solution:
Show code cell content
print(a[-2]) # second item from the back of the list
print(a[:2]) # range -> first two items
print(a[3:5]) # range -> items 3 to 5 (exclusive)
print(a[1:7:2]) # range -> items 1 to 7 (exclusive), with stride 2 ("every other")
6
[0 1]
[3 4]
[1 3 5]
For multidimensional arrays, we can specify indices and slices for each dimension separately. Consider a 2-D array:
a = np.arange(8).reshape((2, 4))
a
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
With just a single index, the outer dimension is sliced:
a[1]
array([4, 5, 6, 7])
Slicing again would return an individual element of the array:
a[1][1]
5
It is more convenient and common to use a different syntax though, separating indices or slices for each dimension by a comma:
a[1, 1]
5
Exercise (2): Multi-dimensional slicing with ranges
Combine range-based slicing (using begin:end
) with the ,
(comma) notation shown above:
Solution:
Show code cell content
a = np.arange(12).reshape((3, 4))
print(a[1:3, 1:3]) # range for both dimensions
print(a[:, 1]) # full range for first index, single index for second
print(a[:, 1:3]) # full range for first index, range for second
print(a[:-1, :-1]) # everything but last row and column
print(a[1:-1, 1:-1]) # remove one on all sides
print(a[::2, ::2]) # every other row and column ("stride 2")
[[ 5 6]
[ 9 10]]
[1 5 9]
[[ 1 2]
[ 5 6]
[ 9 10]]
[[0 1 2]
[4 5 6]]
[[5 6]]
[[ 0 2]
[ 8 10]]
To slice only the inner dimension we can specify the “full range” for the other dimension(s):
a[:, 1:3]
array([[ 1, 2],
[ 5, 6],
[ 9, 10]])
Modifying arrays and slices#
Consider a 2-D array, initialized with zeros:
a = np.zeros(12).reshape((3, 4))
a
array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
Numpy arrays support a large variety of mathematical operations. The simplest examples are operations between arrays and scalars:
a += 0.5
a
array([[0.5, 0.5, 0.5, 0.5],
[0.5, 0.5, 0.5, 0.5],
[0.5, 0.5, 0.5, 0.5]])
Numpy is broadcasting the scalar value to the shape of the array, i.e., it is (conceptually) duplicated and added to each element of the array. This is also supported for slices:
a[1] *= 10.0
a
array([[0.5, 0.5, 0.5, 0.5],
[5. , 5. , 5. , 5. ],
[0.5, 0.5, 0.5, 0.5]])
WARNING At this point, an important difference between slicing Python container and slicing Numpy arrays must be highlighted. Slicing a Numpy arrays returns a view, it does not copy the data. Consider:
lis = list(range(4))
subsection = lis[:2]
subsection[1] = 666
lis
[0, 1, 2, 3]
arr = np.array(range(4))
subsection = arr[:2]
subsection[1] = 666
arr
array([ 0, 666, 2, 3])
While this is convenient and crucial for efficient operation, it also represents a common pitfall to be aware of.
To obtain an actual copy of a slice, the copy()
method can be used.
Let use return to operations with the 2-D array a
:
a
array([[0.5, 0.5, 0.5, 0.5],
[5. , 5. , 5. , 5. ],
[0.5, 0.5, 0.5, 0.5]])
Operations between slices of matching shape are also supported:
a[0] + a[2] # returns a new 1-D array
array([1., 1., 1., 1.])
a[0] += a[2] # modify first row in-place
a
array([[1. , 1. , 1. , 1. ],
[5. , 5. , 5. , 5. ],
[0.5, 0.5, 0.5, 0.5]])
The same can be done for the inner dimension:
a[:, 0] += a[:, 1]
a
array([[ 2. , 1. , 1. , 1. ],
[10. , 5. , 5. , 5. ],
[ 1. , 0.5, 0.5, 0.5]])
We can also attempt to mix rows and columns. In this case, the number of rows and columns in the array is different, so we get an error:
a[:, 1] += a[1]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[27], line 1
----> 1 a[:, 1] += a[1]
ValueError: operands could not be broadcast together with shapes (3,) (4,) (3,)
This is actually a fortunate situation, because it indicates that something is (probably) wrong, instead of silently corrupting scientific data and results: If shapes happen to match, we may end up accidentally combining data that should not have been combined. The most common situation where this may occur is with square arrays:
xy = np.array([[1, 2], [3, 4]])
yx = xy.T # transpose
(
xy - yx
) # operation succeeds even though we are subtracting elements that were transposed!
array([[ 0, -1],
[ 1, 0]])
This may be intentional, but in many cases it may be a source of bugs when using Numpy. Higher-level libraries such as xarray building on top of Numpy have been developed as a remedy.
Exercise (3): Operations with slices of 3-D arrays
Create a 3-D array.
Try various operations and slicing as shown above, but with the 3-D array.
Solution:
Show code cell content
a3d = np.random.rand(3, 4, 5)
a3d[1:, 1:, 1:] *= 0.0
a3d
array([[[0.76493293, 0.26474991, 0.01633856, 0.70785937, 0.03169351],
[0.94956709, 0.12481299, 0.76470351, 0.21327206, 0.4260095 ],
[0.44957466, 0.69853004, 0.68411662, 0.94023961, 0.01971502],
[0.1560715 , 0.94028858, 0.19429756, 0.79213188, 0.41628459]],
[[0.61730426, 0.35090472, 0.59931171, 0.90178167, 0.53899403],
[0.42245438, 0. , 0. , 0. , 0. ],
[0.9287292 , 0. , 0. , 0. , 0. ],
[0.35458287, 0. , 0. , 0. , 0. ]],
[[0.32616147, 0.58284237, 0.64590523, 0.01093671, 0.32926251],
[0.69490124, 0. , 0. , 0. , 0. ],
[0.16999864, 0. , 0. , 0. , 0. ],
[0.16710029, 0. , 0. , 0. , 0. ]]])
Exercise (4): Using slicing to compute derivatives
Compute the second derivative (Laplace stencil, \((a_{i-1} - 2 a_i + a_{i+1})/h^2\)) using slicing.
Exclude the first and last point for simplicity.
There are (at least) two ways to solve this, one of them is simpler.
Note that Numpy has some built-in ways to compute derivatives, e.g., https://docs.scipy.org/doc/numpy/reference/generated/numpy.gradient.html, but here the goal is to practice the slicing notation, so we compute a derivative by hand.
h = 0.01
a = np.sin(np.arange(0, 10, step=h))
Solution:
Show code cell content
# Option 1: Using loops
num = len(a) - 2
result1 = np.zeros(num)
for i in range(num):
result1[i] = (a[i] - 2 * a[i + 1] + a[i + 2]) / h**2
# Option 2 (recommended): The "Numpy" way
result2 = (a[2:] - 2 * a[1:-1] + a[:-2]) / h**2
Optional advanced task: Use a 2-D array and compute (a) 1-D second derivative along the first axis, (b) along the second axis, and (c) the 2-D second derivative (2-D Laplace stencil).
Loading files#
Numpy can, e.g., load data from text files:
filename = "numpy-sample-data.txt"
a = np.loadtxt(filename)
a
array([[1., 2., 3.],
[4., 5., 6.],
[7., 8., 9.],
[1., 2., 3.]])
By default, this loads data as a 2-D array if there are multiple columns.
loadtxt has many options to control how data is loaded.
On example is the unpack
option, to unpack columns into separate variables:
col1, col2, col3 = np.loadtxt(filename, unpack=True)
print(col1)
[1. 4. 7. 1.]
Numpy also supports a native format for saving and loading Numpy arrays, numpy.save and numpy.load.
More functions#
We give a couple more examples from the extensive set of operations Numpy supports.
Transpose#
numpy.transpose is used to transpose arrays:
np.transpose(a)
array([[1., 4., 7., 1.],
[2., 5., 8., 2.],
[3., 6., 9., 3.]])
Note that this does not transpose the data in the array, but rather returns a transposed view, which could be used to modify the original array:
np.transpose(a)[0] = 0.0 # zero first row of transpose, i.e., first column of `a`
a
array([[0., 2., 3.],
[0., 5., 6.],
[0., 8., 9.],
[0., 2., 3.]])
Element-wise mathematical operations#
Numpy supports unary functions such as numpy.exp. These functions are simply applied to each element of an array:
np.exp(a)
array([[1.00000000e+00, 7.38905610e+00, 2.00855369e+01],
[1.00000000e+00, 1.48413159e+02, 4.03428793e+02],
[1.00000000e+00, 2.98095799e+03, 8.10308393e+03],
[1.00000000e+00, 7.38905610e+00, 2.00855369e+01]])
Note that in this case Numpy returns a new array, i.e., a
is not modified.
If this is desired, it can be achieved using the optional out
argument supported by many functions, including exp
.
Reduction operations#
numpy.sum is an example of a reduction operation. It supports summing all elements, or just selected axes:
print(np.sum(a))
print(np.sum(a, axis=0))
print(np.sum(a, axis=1))
38.0
[ 0. 17. 21.]
[ 5. 11. 17. 5.]
Note that axis 0 is the outermost dimension (leftmost in the shape
property), not to be confused with the data layout where the rightmost index is the innermost/fastest.
Exercise (5): Putting things together
Compute the (row) sum of the exponential of all but the last row of a
.
Solution:
Show code cell content
np.sum(np.exp(a[:-1]))
11666.358460234544