# Python Lesson 2

## 1 Lesson outline

- Working with pics.
- More about NumPy.
- Introduction to data representation with `matplotlib`.
- Exercises.

## 2 Working with pics

Import `NumPy` as in the previous lesson, the `pyplot` module from `matplotlib`, and the `Image` module from the Pillow (`PIL`) library.

```
import numpy as np
from matplotlib import pyplot as plt  # import the pyplot module from the matplotlib library
from PIL import Image
```

We read a `png` figure into an array by opening it with `Image.open` from PIL and converting it with `np.array`, and we display the image using `imshow`. The picture can be downloaded from this link.

```
imgarray = np.array(Image.open("./iberian-lynx.png"))  # read the image into an ndarray
imgplot = plt.imshow(imgarray)
```

Let’s examine the array shape

```
imgarray.shape
```

The `(M, N, 3)` values mean that this is an `M x N` pixel figure with RGB colors (a `png` with a transparency channel gives `(M, N, 4)` instead, the fourth channel being alpha). In the usual 8-bit `png` files the colors are given by three integer values per pixel (red, green, and blue channels) that vary from 0 to 255. Let’s now perform some basic manipulation of this array. We select each of the three RGB channels and transform it into a 2D array, which is displayed as a grayscale map:

```
# Split the image into its red, green and blue channels (each an M x N array)
red_imgarray, gr_imgarray, bl_imgarray = imgarray[:, :, 0], imgarray[:, :, 1], imgarray[:, :, 2]
plt.imshow(red_imgarray, cmap='gray')
plt.show()
plt.imshow(gr_imgarray, cmap='gray')
plt.show()
plt.imshow(bl_imgarray, cmap='gray')
plt.show()
```

You can switch to any other colormap, for example `hot`:

```
imgplot = plt.imshow(red_imgarray, cmap="hot")
```
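If the lynx picture is not at hand, the channel slicing above can be tried on a small synthetic array. The sketch below builds a hypothetical 4 x 6 RGB "image" (`fake_img`, an illustrative name) with `np.arange`, using the `uint8` type (integers 0-255) that 8-bit `png` images usually have once converted to an array, and splits it into channels in the same way.

```python
import numpy as np

# Hypothetical 4 x 6 RGB "image": shape (M, N, 3), dtype uint8 (values 0-255)
fake_img = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)

# Slicing the last axis gives one M x N array per color channel
red, green, blue = fake_img[:, :, 0], fake_img[:, :, 1], fake_img[:, :, 2]

print(fake_img.shape, fake_img.dtype)  # (4, 6, 3) uint8
print(red.shape)                       # (4, 6)
print(red[0, :3])                      # [0 3 6]: every third value belongs to the red channel
```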

## 3 More about NumPy

Apart from reading data from files or, as we will see in the next lesson, transforming native Python structures into NumPy ndarrays using `np.array`, NumPy provides a set of functions for creating arrays:

- `ones`: given the array dimensions, returns an array of that shape filled with the value 1.
- `ones_like`: given an array, returns an array with the same shape (and dtype) filled with the value 1.
- `zeros`: given the array dimensions, returns an array of that shape filled with the value 0.
- `zeros_like`: given an array, returns an array with the same shape filled with the value 0.
- `empty`: given the array dimensions, returns an array of that shape with uninitialized values (be careful, this is getting into the wild side: the contents are whatever happened to be in memory).
- `empty_like`: given an array, returns an array with the same shape and uninitialized values.
- `full`: given the array dimensions and a value, returns an array of that shape with all elements equal to that value.
- `full_like`: given an array and a value, returns an array with the same shape and all elements equal to that value.
- `eye`, `identity`: given a dimension `n`, return the `n x n` identity matrix (ones on the diagonal, zeros elsewhere).
- `arange`: given *start*, *stop* [and *step*] values, creates a 1D ndarray of evenly spaced values with *start* as its first element, *start + step* the second, *start + 2 step* the third, and so on, stopping before *stop* (which is not included).
- `linspace`: given *start*, *stop* [and *N*] values, creates a 1D ndarray of exactly *N* evenly spaced values with *start* as its first element and *stop* as the last one.
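A quick way to see the difference between these constructors is to call several of them side by side. The sketch below uses arbitrary shapes and values chosen only for illustration; note in particular that `arange` excludes *stop* while `linspace` includes it.

```python
import numpy as np

a = np.zeros((2, 3))        # 2 x 3 array filled with 0.0
b = np.ones_like(a)         # same shape (and dtype) as a, filled with 1.0
c = np.full((2, 2), 7.5)    # 2 x 2 array with every element equal to 7.5
i3 = np.eye(3)              # 3 x 3 identity matrix, same as np.identity(3)
e = np.empty((2, 2))        # uninitialized: contents are arbitrary until overwritten
r = np.arange(0, 10, 2.5)   # stop is excluded: 0, 2.5, 5, 7.5
s = np.linspace(0, 1, 5)    # stop is included: 0, 0.25, 0.5, 0.75, 1

print(a, b, c, i3, r, s, sep="\n\n")
```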

NumPy can store many types of data in arrays, each identified by its `dtype`. We are mainly interested in numerical data types, indicated by the prefix *float* (floating point numbers) or *int* (exact integer numbers) followed by the number of bits per element. The standard double-precision floating point type is *float64* (8 bytes per element), and the default integer type on most 64-bit platforms is *int64*. NumPy also supports complex values, for example *complex128*.
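The `dtype` and `itemsize` attributes let you check what NumPy chose for each array. The arrays below are arbitrary examples; the exact default integer type depends on the platform, so the sketch only checks that it is an integer type.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
n = np.array([1, 2, 3])
z = np.array([1 + 2j, 3 - 1j])
small = np.zeros(3, dtype=np.float32)   # a dtype can be requested explicitly

print(x.dtype, x.itemsize)          # float64 8 (bytes per element)
print(n.dtype)                      # platform default integer, usually int64
print(z.dtype)                      # complex128: two float64 values per element
print(small.dtype, small.itemsize)  # float32 4
print(n.astype(np.float64))         # astype converts to another dtype (returns a copy)
```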

One of the main advantages of NumPy is *vectorization*: the possibility of applying an operation to whole arrays at once, without writing explicit loops. For example, we define two 3 x 3 arrays of normally distributed random numbers. Both have mean value equal to 2; one has standard deviation equal to 1 and the other 1.5. We then perform some operations with them. Note that we will explain a better way of generating random numbers in Lesson 5.

```
array_a = np.random.normal(loc=2, scale=1, size=(3, 3))
array_b = np.random.normal(loc=2, scale=1.5, size=(3, 3))
print(array_a, "\n\n", array_b, "\n\n", 10.0/(array_a + array_b))  # element-wise division
print("\n\n")
print(array_a, "\n\n", array_b, "\n\n", np.sqrt(array_a**2 + array_b**2))  # element-wise square root
```
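To make the idea of vectorization concrete, the sketch below computes the same element-wise expression, `np.sqrt(a**2 + b**2)`, once with explicit Python loops and once in vectorized form, and checks that both agree. The arrays `a` and `b` are built with `np.arange` and `np.full` (instead of random numbers) only so that the result is reproducible.

```python
import numpy as np

a = np.arange(9, dtype=float).reshape(3, 3)
b = np.full((3, 3), 2.0)

# Explicit loops: one Python-level operation per element
loop_result = np.empty((3, 3))
for i in range(3):
    for j in range(3):
        loop_result[i, j] = np.sqrt(a[i, j]**2 + b[i, j]**2)

# Vectorized: the whole array is processed in a single expression
vec_result = np.sqrt(a**2 + b**2)

print(np.allclose(loop_result, vec_result))  # True
```

Besides being shorter, the vectorized version runs the loop in compiled code, which is usually much faster for large arrays.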

The function `np.sqrt` is an example of what is called a *universal function (ufunc)*, which performs element-wise operations on data arrays. You can find a list of such NumPy functions in https://docs.scipy.org/doc/numpy-1.14.0/reference/ufuncs.html. NumPy also provides the mathematical constants `np.pi` and `np.e`, and in Python the imaginary unit is written as `1j`.
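A few ufuncs applied to small example arrays, together with the constants and the imaginary unit, are shown below. Floating point round-off means that results such as `np.sin(np.pi)` are tiny numbers rather than exact zeros, so comparisons use `np.isclose`.

```python
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])
print(np.sin(angles))                 # element-wise; sin(pi) gives ~1e-16, not exactly 0
print(np.exp(np.array([0.0, 1.0])))   # element-wise: 1 and e
print(np.sqrt(np.array([4.0, 9.0])))  # [2. 3.]

# Complex values use the 1j suffix; Euler's identity says exp(i*pi) = -1
euler = np.exp(1j * np.pi)
print(euler)                  # close to -1, with a tiny imaginary round-off part
print(np.isclose(euler, -1))  # True
```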

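One needs to be very aware that, when working with NumPy arrays and other mutable data structures, assignment in Python does not copy the data. The behavior resembles the *pass by reference* strategy rather than the *pass by value* strategy of other programming languages: an assignment makes the left-hand side refer to the data on the right-hand side. In particular, slicing a NumPy array gives a *view* that shares its data with the original array. This looks completely different from what happens when we work with scalar data, where assigning a new value to a name simply rebinds that name. If we execute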

```
scalar_c = 8.5
scalar_c_2 = scalar_c
array_c = array_b[:2, :2]  # array_c is a view of the top-left 2 x 2 block of array_b
print("array_b = ", array_b, "\n\n", "array_c = ", array_c)
print("scalar_c = ", scalar_c, "\n\n", "scalar_c_2 = ", scalar_c_2)
print("\n\n")
array_c[:] = 100.0   # writes through the view into array_b
scalar_c_2 = 100.0   # rebinds scalar_c_2 only; scalar_c is unchanged
print("array_b = ", array_b, "\n\n", "array_c = ", array_c)
print("scalar_c = ", scalar_c, "\n\n", "scalar_c_2 = ", scalar_c_2)
```

we see that changing `array_c` also changes `array_b`: both share the same underlying data, `array_c` being a view of part of `array_b`. This design avoids copying data when working with large matrices. A related point is that you cannot assign values to elements of an array that has not been previously created (the function `np.zeros` is often used to preallocate it). If you want a copy of the original matrix, use the `copy` method:

```
array_d = array_a[:2, :2].copy()  # independent copy of the top-left 2 x 2 block
print(array_a, "\n\n", array_d)
print("\n\n")
array_d[:] = 1000.0   # modifies only the copy
print(array_a, "\n\n", array_d)
```
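Instead of printing arrays and comparing them by eye, one can ask NumPy directly whether two arrays share data with `np.shares_memory`, a function not otherwise used in this lesson. The sketch below, on a hypothetical 4 x 4 array `base`, contrasts a slice (a view), an explicit `copy`, and integer-array indexing (which, as discussed next, also produces a copy).

```python
import numpy as np

base = np.arange(16, dtype=float).reshape(4, 4)

view = base[:2, :2]            # basic slicing returns a view
copied = base[:2, :2].copy()   # independent copy
fancy = base[[0, 1], :]        # integer-array (fancy) indexing returns a copy

print(np.shares_memory(base, view))    # True
print(np.shares_memory(base, copied))  # False
print(np.shares_memory(base, fancy))   # False

view[0, 0] = -1.0     # also modifies base
copied[1, 1] = -2.0   # base is unaffected
print(base[0, 0], base[1, 1])          # -1.0 5.0
```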

NumPy also allows indexing with integer arrays, something called *fancy indexing*. In this case the resulting array is a copy and not a view of the original array, as the following example shows:

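```
array_e = np.empty((10, 10))
for value in range(10):
    array_e[:, value] = value   # column j is filled with the value j
print(array_e)
array_f = array_e[2:5, [-1, 5, 2, 3, 2]]  # rows 2-4 (slice) and a list of columns (fancy indexing)
print(array_f)
array_f[0, 0] = -1.0   # modifying the copy...
print(array_e)         # ...leaves array_e unchanged
```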

NumPy arrays can be transposed using the `transpose` method or the special `T` attribute:

```
print(array_a)
print(array_a.transpose())
print()
print(array_f)
print(array_f.T)
```

This is useful, for example, when computing with `np.dot` the product of the transpose of a matrix and the matrix itself:

```
print(np.dot(array_a.T, array_a))
print("")
print(np.dot(array_f.T, array_f))
```

However, to perform matrix multiplication it is preferable to use `np.matmul` or the equivalent `a @ b` notation.
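As a small check of the statements above, the sketch below multiplies the transpose of a hypothetical 3 x 2 matrix `A` by `A` itself with `@`, `np.matmul` and `np.dot`, which agree for 2D arrays, and confirms that the result is symmetric. It also recalls that `*` is element-wise, not a matrix product.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])     # shape (3, 2)

gram = A.T @ A                 # (2, 3) @ (3, 2) -> (2, 2)
print(gram)                    # [[35. 44.]
                               #  [44. 56.]]
print(np.array_equal(gram, np.matmul(A.T, A)))  # True
print(np.array_equal(gram, np.dot(A.T, A)))     # True for 2D arrays
print(np.array_equal(gram, gram.T))             # the product is symmetric: True

# A * A multiplies element by element and keeps the shape (3, 2)
print((A * A).shape)           # (3, 2)
```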

Two or more NumPy arrays can also be concatenated, building up a large array from smaller ones. This can be done with the `hstack` and `vstack` functions. To try them, we create arrays of random numbers drawn from a normal distribution with zero mean and unit standard deviation using the `np.random.randn` function:

```
arr_a = np.random.randn(2, 4)
arr_b = np.random.randn(2, 4)
arr_horizontal = np.hstack((arr_a, arr_b))  # shape (2, 8)
print(arr_horizontal)
arr_vertical = np.vstack([arr_a, arr_b])    # shape (4, 4)
print(arr_vertical)
```

Notice that in the `hstack` (`vstack`) case the arrays combined must have the same number of rows (columns). These two are convenience functions, wrappers around the more general function `concatenate`:

```
arr_v = np.concatenate([arr_a, arr_b])          # default axis=0: same as vstack
arr_h = np.concatenate([arr_a, arr_b], axis=1)  # same as hstack for 2D arrays
print(arr_h)
print(arr_v)
```
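The shape rule stated above can be checked with two hypothetical arrays, `x` and `y`, that share the number of columns but not the number of rows: stacking them vertically works, while stacking them horizontally raises a `ValueError`.

```python
import numpy as np

x = np.ones((2, 4))
y = np.zeros((3, 4))      # same number of columns, different number of rows

print(np.vstack([x, y]).shape)               # (5, 4)
print(np.concatenate([x, y], axis=0).shape)  # (5, 4): same as vstack

try:
    np.hstack([x, y])     # rows differ (2 vs 3), so horizontal stacking fails
except ValueError as err:
    print("hstack failed:", err)
```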

Data in an array can also be flattened, transforming the array into a vector (a one-dimensional array). This can be done with `ravel`, available both as the function `np.ravel` and as an array method, or with `flatten`, which is only available as an array method.

```
arr_c = np.random.randn(4, 4)
vec_c_0 = arr_c.ravel()    # Equivalent to np.ravel(arr_c)
vec_c_1 = arr_c.flatten()  # flatten exists only as an array method
if np.array_equal(vec_c_0, vec_c_1):  # Comparing two arrays.
    print(vec_c_0)
```

Note how we check whether the two vectors created are equal. The NumPy function `np.array_equal` checks if two arrays have identical shape and elements. You cannot check if two arrays are equal using the usual `==` operator in a conditional: it compares element by element and returns a Boolean array, whose truth value is ambiguous (try it). Both methods leave `arr_c` unchanged, but `ravel` returns, whenever possible, an `ndarray` vector with access to the original data, while `flatten` always copies the data and creates an independent object.

```
print(arr_c)
vec_c_0[0] = 1000.0  # ravel returned a view: arr_c changes
print(arr_c)
vec_c_1[0] = 10.0    # flatten returned a copy: arr_c does not change
print(arr_c)
```
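To see why `==` alone is not enough in an `if` statement, and to check which of `ravel` and `flatten` shares memory with the original array, the sketch below uses small hypothetical arrays `u`, `v` and `m`, plus `np.shares_memory` (a NumPy function not otherwise used in this lesson).

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([1.0, 2.0, 3.0])

print(u == v)                # element-wise comparison: a Boolean array
print(np.array_equal(u, v))  # a single True/False: same shape and elements
print((u == v).all())        # equivalent check when the shapes already match

try:
    if u == v:               # the truth value of a multi-element array is ambiguous
        print("never reached")
except ValueError as err:
    print("ValueError:", err)

m = np.arange(6).reshape(2, 3)
print(np.shares_memory(m, m.ravel()))    # True: m is contiguous, so ravel returns a view
print(np.shares_memory(m, m.flatten()))  # False: flatten always copies
```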

The comparison between arrays yields Boolean arrays

```
print(array_a, "\n\n", array_b, "\n\n", array_a > array_b)
```

You can use these Boolean arrays for indexing. In the following example we define a new matrix in which the positive elements are replaced by zero, so that only the negative elements keep non-zero values.

```
boolean = array_a > 0
print(boolean)
array_e = array_a.copy()
array_e[boolean] = 0   # set the positive elements to zero
print(array_a, "\n\n", array_e)
```

This is an example of *vectorized computation*, one of the greatest advantages of NumPy. For instance, if you want to create a new array with the same shape as `arr_c`, with 0 in the negative elements and 1 in the positive elements, you can easily do it in vectorized form, without loops (see Lesson 3):

```
arr_e = np.copy(arr_c)
arr_e[arr_c > 0] = 1   # positive elements -> 1
arr_e[arr_c < 0] = 0   # negative elements -> 0
print(arr_e)
```
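Since a Boolean array already holds `True`/`False` per element, the same 0/1 array can also be obtained by converting the mask itself with `astype`. The sketch below compares both approaches on a small hypothetical array `arr`; this is an alternative, not the method used in the lesson.

```python
import numpy as np

arr = np.array([[-1.5, 0.3], [2.0, -0.2]])

# Approach from the text: copy, then overwrite through Boolean masks
step = np.copy(arr)
step[arr > 0] = 1
step[arr < 0] = 0

# Alternative: convert the Boolean mask directly (True -> 1.0, False -> 0.0)
step_alt = (arr > 0).astype(float)

print(step)                            # [[0. 1.]
                                       #  [1. 0.]]
print(np.array_equal(step, step_alt))  # True
```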

Working with arrays, you can build complex conditions by combining simpler expressions with the logical operators `&` (*and*) and `|` (*or*); the keywords `and` and `or` do not work in this context. Each comparison must be enclosed in parentheses. For example:

```
arr_f = np.copy(arr_c)
bool_mask = (arr_c > 1) | (arr_c < -1)  # elements farther than 1 from zero
arr_f[bool_mask] = 2.0
print(arr_c, "\n", arr_f)
```
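The sketch below, on a hypothetical 1D array `arr`, shows the `~` operator (element-wise *not*), the equivalent function `np.logical_or`, and why the parentheses around each comparison matter: `&` and `|` bind more tightly than `>` and `<`.

```python
import numpy as np

arr = np.array([-2.5, -0.5, 0.0, 0.7, 1.8])

outside = (arr > 1) | (arr < -1)  # parentheses around each comparison are required
inside = ~outside                 # ~ negates a Boolean array element-wise
between = (arr > -1) & (arr < 1)

print(outside)                                                    # [ True False False False  True]
print(np.array_equal(inside, between))                            # True
print(np.array_equal(outside, np.logical_or(arr > 1, arr < -1)))  # True

# Without parentheses, | binds tighter than > and <, so the expression below is
# parsed as the chained comparison arr > (1 | arr) < -1 and raises an error.
try:
    _ = arr > 1 | arr < -1
except (TypeError, ValueError) as err:
    print(type(err).__name__, err)
```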

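Selecting data with Boolean arrays always creates a copy of the original data, even if the data are unchanged.

Be aware that Boolean selection can be error prone when the mask does not have the intended shape. Older NumPy versions did NOT fail when the Boolean array had the wrong length; recent versions raise an `IndexError`, but a mask with fewer dimensions than the array is still accepted (a 1D mask applied to a 2D array selects whole rows). We will learn a better way of doing this in Lessons 3 and 5, using the `np.where` function.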
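The difference between *selecting* with a mask (which copies) and *assigning* through a mask (which modifies the original) can be seen on a small hypothetical array `data`. The last part assumes a recent NumPy version, where a mask of the wrong length raises `IndexError`.

```python
import numpy as np

data = np.array([3.0, -1.0, 4.0, -1.5])
mask = data > 0

selected = data[mask]   # Boolean selection returns a new array (a copy)
selected[:] = 0.0       # changing the copy...
print(data)             # ...leaves data untouched

data[mask] = 0.0        # assigning through the mask modifies data in place
print(data)

# In recent NumPy versions a mask of the wrong length raises IndexError
try:
    data[np.array([True, False])]
except IndexError as err:
    print("IndexError:", err)
```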

*Exercise 2.1.* Create a 10 x 10 square array with Gaussian random values with zero mean and standard deviation equal to two. Replace the values that are within one standard deviation of the mean value by the integer value `1`, those between one and two standard deviations by the integer `2`, and those beyond two standard deviations by `3`.

*Exercise 2.2.* Once an image is read into an array, one can easily apply filters to the image using `NumPy`. In the case of the lynx cub image, copy the data into a new array and, using `NumPy`, set an upper limit, `uplim`, and a lower limit, `lowlim`, transforming all image array data lower than `lowlim` to `0` and those higher than `uplim` to `255`, and check the changes in the image.

## 4 Basic Data Plotting

As in the first lesson, we read one of the files with monthly temperature data and strip the year (first column) from the array.

```
metdata_orig = np.loadtxt(fname='files/TData/T_Alicante_EM.csv', delimiter=',', skiprows=1)
metdata = metdata_orig[:, 1:]  # drop the first column (year)
```

We can plot the array directly as a heat map

```
plt.imshow(metdata)
```

Using a different color map

```
plt.imshow(metdata, cmap="BrBG")
```

This is of limited utility. Let’s compute and plot the mean monthly temperatures

```
ave_monthly = np.mean(metdata, axis=0)  # average over years: one value per month
ave_monthly_plot = plt.plot(ave_monthly)
```

and the average annual temperatures

```
ave_annual = np.mean(metdata, axis=1)  # average over months: one value per year
ave_annual_plot = plt.plot(ave_annual)
```
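The `axis` argument is easy to mix up. The sketch below uses a hypothetical stand-in for `metdata` with 3 years (rows) and 12 months (columns): `axis=0` collapses the rows and leaves one value per month, while `axis=1` collapses the columns and leaves one value per year.

```python
import numpy as np

# Hypothetical stand-in for metdata: 3 years (rows) x 12 months (columns)
fake_met = np.arange(36, dtype=float).reshape(3, 12)

monthly = np.mean(fake_met, axis=0)  # collapse rows: one value per month
annual = np.mean(fake_met, axis=1)   # collapse columns: one value per year

print(monthly.shape)  # (12,)
print(annual.shape)   # (3,)
print(annual)         # means 5.5, 17.5 and 29.5
```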

In the same fashion we can also plot the maximum and minimum monthly temperatures

```
max_monthly = np.max(metdata, axis=0)
min_monthly = np.min(metdata, axis=0)
max_monthly_plot = plt.plot(max_monthly)
min_monthly_plot = plt.plot(min_monthly)
```

And the annual maximum and minimum temperatures

```
max_annual = np.max(metdata, axis=1)
min_annual = np.min(metdata, axis=1)
max_annual_plot = plt.plot(max_annual)
min_annual_plot = plt.plot(min_annual)
```

*Exercise 2.3.* Plot the monthly and annual difference between max and min temperatures as a function of the month (1-12) and the year (1961-2096), respectively. In this case try to combine the `plt.plot` and `plt.scatter` functions. *Hint: the plot function accepts the syntax* `plt.plot(x,y)`.

This is the most basic plotting in `pyplot`. You can improve the figure appearance as follows:

```
fig, ax = plt.subplots()
ax.plot(max_monthly, label="Max")
ax.plot(min_monthly, label="Min")
ax.set_title("Alicante Temperature Dataset")  # data read from T_Alicante_EM.csv
ax.set_xlabel("Month (0-11)")
ax.set_ylabel("Max and Min average T (ºC)")
ax.legend()
```

*Exercise 2.4.* Plot the standard deviation of the monthly and annual temperatures as a function of the month (1-12) and the year (1961-2096), respectively. *Hint: check the `std` function in NumPy.*

We can combine several plots in a multi-panel figure

```
fig, ax = plt.subplots(nrows=2, ncols=2)
fig.tight_layout(pad=3.0)
ax[0, 0].plot(max_monthly)
ax[0, 1].plot(min_monthly)
ax[1, 0].plot(max_annual)
ax[1, 1].plot(min_annual)
```
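Each panel returned by `plt.subplots` is an independent axes object, so titles, labels and explicit x values can be set per panel. The sketch below uses hypothetical stand-in data (`months`, `years`, `monthly_max`, `annual_max` are illustrative, not the Alicante values) in a 1 x 2 layout, and passes x values as in `plt.plot(x, y)`. It selects the non-interactive `Agg` backend so it runs as a plain script; omit that line in a notebook.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # non-interactive backend for scripts; omit in a notebook
from matplotlib import pyplot as plt

# Hypothetical stand-in data: 12 monthly values and 30 annual values
months = np.arange(1, 13)
years = np.arange(1961, 1991)
monthly_max = 20 + 8 * np.sin((months - 4) * np.pi / 6)
annual_max = 25 + 0.02 * (years - 1961)

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
ax[0].plot(months, monthly_max, marker="o")   # x values given explicitly
ax[0].set_title("Monthly maximum")
ax[0].set_xlabel("Month")
ax[1].plot(years, annual_max)
ax[1].set_title("Annual maximum")
ax[1].set_xlabel("Year")
for panel in ax:
    panel.set_ylabel("T (ºC)")
fig.tight_layout()
```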

*Exercise 2.5.* Prepare a plot with two panels (arranged as you wish) that depicts the annual dependence of the average Spring and Fall temperatures for the meteorological seasons: Spring (Mar, Apr, May) and Fall (Sep, Oct, Nov).

## 5 Exercises

- Exercise 2.1: Create a 10 x 10 square array with Gaussian random values with zero mean and standard deviation equal to two. Replace the values that are within one standard deviation of the mean value by the integer value `1`, those between one and two standard deviations by the integer `2`, and those beyond two standard deviations by `3`.
- Exercise 2.2: Once an image is read into an array, one can easily apply filters to the image using `NumPy`. In the case of the lynx cub image, copy the data into a new array and, using `NumPy`, set an upper limit, `uplim`, and a lower limit, `lowlim`, transforming all image array data lower than `lowlim` to `0` and those higher than `uplim` to `255`, and check the changes in the image.
- Exercise 2.3: Plot the monthly and annual difference between max and min temperatures as a function of the month (1-12) and the year (1961-2096), respectively. In this case try to combine the `plt.plot` and `plt.scatter` functions. *Hint: the plot function accepts the syntax* `plt.plot(x,y)`.
- Exercise 2.4: Plot the standard deviation of the monthly and annual temperatures as a function of the month (1-12) and the year (1961-2096), respectively. *Hint: check the `std` function in NumPy.*
- Exercise 2.5: Prepare a plot with two panels (arranged as you wish) that depicts the annual dependence of the average Spring and Fall temperatures for the meteorological seasons: Spring (Mar, Apr, May) and Fall (Sep, Oct, Nov).

Created: 2022-10-10 Mon 13:46