Python Lesson 5
Table of Contents
1. Lesson outline
- Input and Output: dealing with files in Python and with NumPy
- More on NumPy
- More on Graphics
- Optimization using
Scipy
2. Writing and Reading data from files
We first explain the basics of file reading and writing in native Python and then move into the more specialized NumPy functions, better for dealing with data arrays.
2.1. File access in native Python
Using the native Python function open (with an absolute or a relative trajectory with respect to the notebook active directory) and the method close allows to set a filehandle. By default files are opened in the r mode (readonly).
path = "~/.bashrc" fileh = open(path) # fileh is the filehandle
You can treat the filehandle as an iterator and build a loop. In this case the number of characters in each line is printed
for line in fileh:
print(len(line))
It is important to close the filehandle once finished with it to return resources to the processor.
fileh.close
The usual form to open files makes unnecessary the last step, and once the file is dealt with it is automatically closed
with open(path) as fileh:
bashrc_lines = [line.rstrip() for line in fileh]
We are reading each line, removing the trailing spaces and carriage return into a list using a list comprehension.
The complete syntax for open would be open(path, mode) and the major modes are:
- mode
r - Default mode. Read only.
- mode
w - Write only. Beware, it deletes an existing file and overwrites it.
- mode
x - Write only. It exits if the file exists.
- mode
a - Write only. Append to the file if it exists.
- mode
r+ - Read and Write.
- mode
b - add to the mode for binary files. E.g. “rb”, “ab”…
You can use the file keyword argument of the print statement to write to a file handle.
with open("test_write.txt", "w") as tw_handle:
[print("The square root of {0} is {1:.5f}".format(line, np.sqrt(line)), file=tw_handle) for line in range(10)]
The write and writelines methods can be applied to a filehandle and allow to write into it. The first one expect a single string, while the second expect an iterable of string.
If, for example, we want to read a file and save it stripping the first and last lines we can run
with open("test_write_2.txt", "w") as tw_handle:
tw_handle.writelines([line for line in open(path)][1:-2])
The format statement can also be used with this methos.
with open("test_write_3.txt", "w") as tw_handle:
tw_handle.writelines(["The square root of {0} is {1:.5f}\n".format(line, np.sqrt(line)) for line in range(10)])
The most commonly used native Python file methods are
- read([size])
- Read data from file, with optional argument of the number of bytes to read. The method returns the specified number of bytes from the file. Default is -1 which means the whole file.
- readlines([size])
- Return list of lines from file, with optional argument of the number of bytes to read.
- write(str)
- Write string to file.
- writelines(strings)
- Write list of lines to file.
- close()
- Close the filehandle.
- flush()
- Flush the internal buffered I/O to the disk.
- seek(pos)
- Move to position
posin the file. - tell()
- Tell the current position in the file.
- closed
Trueif the filehandle is closed.
| Exercise 5.1 |
2.2. File access with NumPy
We have already used the NumPy function loadtxt that, together with savetxt, allows to load and save data to text files. For example
array_a = np.random.randn(3,3)
np.savetxt("array_a_saved.txt", array_a, delimiter=":")
!cat array_a_saved.txt
array_b = np.loadtxt("array_a_saved.txt", delimiter=":")
np.array_equal(array_a, array_b)
These functions are specially handy for reading csv and tsv data files
But using the load and save functions you can also save an array in binary format -saving cpu, time, and precision at the cost of not being able to open/edit the file- with standard suffix npy.
array_a = np.random.randn(5,5)
np.save("array_a_saved", array_a)
!ls -l array_a_saved.*
array_b = np.load("array_a_saved.npy")
np.array_equal(array_a, array_b)
With the function savez you can also save multiple arrays in an npz file that is not a compressed file. If you want to compress the data use the function savez_compressed
np.savez("savez_example", a=array_a, b=array_b, c=array_a)
np.savez_compressed("savez_example_compressed", a=array_a, b=array_b, c=array_a)
!ls -l savez*
Once you read them they are loaded into a hash-like NumPy object with the array argument names as keys
arr_hash = np.load("savez_example_compressed.npz")
print(type(arr_hash))
print(arr_hash["a"])
To improve portability, and especially when you read several files in your code, and probably in different code sections, it is considered a good practice to define variables for your home and data folders and then use the variables to define the path when needed. This improves the readability of your code and helps its maintenance. A possible example, using the dataset provided at the beginning of the course is the following:
home_dir = "/home/curro/" data_dir = "files/TData/" ## ## metdata_orig = np.loadtxt(fname=home_dir + data_dir + 'T_Alicante_EM.csv', delimiter=',', skiprows=1)
2.3. Saving Python objects with pickle
Pickling is used to save Python objects into a filesystem; lists, dictionaries, class objects, and more can be saved as a pickle file. Strictly speaking, pickling is the serializing and de-serializing of python objects to a byte stream. The opposite is unpickling.
This methodology is called serialization, marshalling, or flattening in other programming languages. Pickling is most useful when performing routine tasks on massive datasets or when you want to store, after sometimes massive calculations, a given Python object.
Once you import the pickle library, to save and load objects one makes use of the pickle.dump and pickle.load combined with the right filehandles. You can make this with the following two functions
import pickle
def save_obj(obj, name ):
with open('./'+ name + '.pkl', 'wb') as f:
pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
def load_obj(name ):
with open('./' + name + '.pkl', 'rb') as f:
return pickle.load(f)
We can try saving a hash
test_hash = {0:"a", 1:"E",2:"I", 3: "J", 4: "H"}
save_obj(test_hash, "arr_hash_saving_test")
test_read = load_obj( "arr_hash_saving_test")
3. More on NumPy
The vectorization provided by NumPy is a great advantage when dealing
with large datasets and the computing times required can be an
order of magnitude less than the equivalent times for interpreted code
alternatives, sometimes more.
We illustrate this with a simple example, plotting the 3D function f(x,y) = np.exp(-1.5*xs ** 2 - 0.5* ys ** 2)*xs**3*ys.
We start defining the abscissa and ordinate grids
npoints = 200 # x_0, x_1 = -2, 2 points_x = np.linspace(x_0, x_1, npoints) # y_0, y_1 = -3, 3 points_y = np.linspace(y_0, y_1, npoints)
We can evaluate the function in these grids making use of for loops as in this function
def function_z(npoints_x, points_x, npoints_y, points_y):
z_2 = np.zeros((npoints_y, npoints_x))
for x_index, x_val in enumerate(points_x):
for y_index, y_val in enumerate(points_y):
z_2[y_index, x_index] = np.exp(-1.5*x_val ** 2 - 0.5* y_val ** 2)*x_val**3*y_val
return z_2
We can then obtain the required function evaluation and benchmark this option
z_2 = function_z(npoints, points_x, npoints, points_y) %timeit function_z(npoints, points_x, npoints, points_y)
Taking profit of NumPy tools for vectorization, we can use the
np.meshgrid function that builds arrays of abscissa and
ordinate values
xs, ys = np.meshgrid(points_x, points_y) # print(xs.shape, ys.shape) # z_1 = np.exp(-1.5*xs ** 2 - 0.5*ys ** 2)*xs**3*ys # %timeit np.exp(-1.5*xs ** 2 - 0.5*ys ** 2)*xs**3*ys
As the benchmark time clearly indicates, the second option is far more
effective than the first one, thanks to the use of NumPy optimized
features for calculations.
We now plot the figure using filled contours in the XY plane
plt.title("Image plot of $ x^2 y e^{-1.5 x^2 - 0.5 y^2}$ for a grid of values")
plt.contourf(points_x, points_y, z_2, 16, cmap=plt.cm.autumn); plt.colorbar()
You can plot z_1 and z_2 to check that both arrays are equal.
3.1. Random number generation
NumPy is very efficient in pseudorandom number generation as it also
applies vectorization techniques. In the course, we have already run
into the function np.random.randn, np.random.normal, and
np.random.randint. The first two generate arrays of random numbers
from a normal distribution (zero mean and unity standard deviation in
the first case, and given mean and standard deviation in the second
case) and the third one provide random integer values uniformly
distributed in a given range.
However, the preferred approach to
pseudorandom number generation is the use of default_rng, to get a
new instance of a Generator, and then call the generator methods to
obtain different distribution samples. You can find a quick reference
to random numbers in NumPy in this link. For normally distributed
standardized random data
# Define generator rng = np.random.default_rng() # # Normally distributed values normal_arr = rng.standard_normal((6,6)) normal_arr
These numbers are called pseudorandom as they are not true random
numbers. They are derived through a deterministic algorithm that makes
use of a seed value. To check or replicate your results you can fix
the seed value as an argument in the call to default_rng, as in the
next example. Using the rng generator you can select from a set of
different statistical distributions from which you can sample the data
rng = np.random.default_rng(12345) gauss_arr = rng.normal(loc = 1, scale = 2, size=(6,6)) beta_arr = rng.beta(1.0, 0.4, size=(6,6)) integer_arr = rng.integers(0, high = 6, size=(6,6)) print(gauss_arr) print(beta_arr) print(integer_arr)
This is a global random seed value, you can also create different pseudorandom number generators, isolated from each other
rng1 = np.random.default_rng(12345) rng2 = np.random.default_rng(123456789)
We can check how efficient NumPy is in pseudorandom number
generation comparing NumPy results with the results of the normalvariate
function in the random library. We can benchmark them
from random import normalvariate N = 1000000 # non-vectorized %timeit gaussian_samples = [normalvariate(0,2) for _ in range(N)] # vectorized %timeit gaussian_samples_vec = rng.normal(scale = 2, size = (N))
Notice that, depending on the system, the gain can be as large as one
order of magnitude, due to the lack of vectorization in the
normalvariate case. This approach is far slower for large
datasets. Notice also that in the list comprehension we have used the
conventional name for the loop variable in case the variable is not
used anywhere in the loop body: _.
Other available functions provided in np.random are
permutation- Return a random permutation of a sequence, or return a permuted range
shuffle- Randomly permute a sequence in-place
rand- Draw samples from a uniform distribution
randint- Draw random integers from a given low-to-high range
randn- Draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface)
binomial- Draw samples from a binomial distribution
normal- Draw samples from a normal (Gaussian) distribution
beta- Draw samples from a beta distribution
chisquare- Draw samples from a chi-square distribution
gamma- Draw samples from a gamma distribution
uniform- Draw samples from a uniform [0, 1) distribution
| Exercise 5.2 |
3.2. Boolean indexing and np.where
Another powerful NumPy feature, already presented in Lesson 2, is the possibility of Boolean indexing. This is specially adequate when combined with the NumPy function np.where, presented in Lesson 3, a vectorized version of the standard Python ternary expression. In a previous example we replaced positive and negative values of a random array by one and minus one, respectively. This can be very conveniently done using np.where
arr_rand = rng.standard_normal((6,6)) print(arr_rand) np.where(arr_rand>0,1,-1)
You could also just replace negative values by -1
arr_rand = rng.standard_normal((6,6)) print(arr_rand) np.where(arr_rand>0,arr_rand,-1)
Avoiding the use of scalars, if we have two arrays and, depending on
the sign of a third array, we would like to choose elements from one
or the other.
arr_rand_A = rng.standard_normal((4,4)) arr_rand_B = rng.standard_normal((4,4)) arr_rand_C = rng.standard_normal((4,4)) print(arr_rand_A) print(arr_rand_B) print(arr_rand_C > 0) np.where(arr_rand_C>0,arr_rand_A,arr_rand_B)
3.3. Useful NumPy functions
There are a set of statistical and mathematical functions available in
NumPy, we quickly outline the main ones, applying them to GOE matrices
(GOE stands for Gaussian Orthogonal Ensamble, which are real, normally distributed random, and symmetric matrices). We
first define the matrix as a matrix of random normally distributed
numbers with zero mean and unity variance, and then add it to its
transpose, to get a symmetric matrix
rng = np.random.default_rng() dimension = 20 GOE_arr = rng.standard_normal((dimension, dimension)) GOE_arr = (GOE_arr + GOE_arr.T)/2
We can now compute the mean and standard deviation of the full array or just the values for the main diagonal
GOE_arr.mean() GOE_arr.std() # # Extract the diagonal of the array np.diag(GOE_arr).mean() np.diag(GOE_arr).std()
We already know that the option axis allows for limiting the calculation to rows or columns (or a given index in a multi index array)
print(GOE_arr.mean(axis=0)) print(GOE_arr.std(axis=0)) print(GOE_arr.mean(axis=1)) print(GOE_arr.std(axis=1))
Other useful functions are cumsum and cumprod for cumulative addition and product
print(GOE_arr.cumsum()) print(GOE_arr.cumprod())
Finally, another useful NumPy methods are min (argmin) and max (argmax)
# min and argmin print(GOE_arr.min()) print(GOE_arr.argmin()) # max and argmax print(GOE_arr.max()) print(GOE_arr.argmax())
The indices are provided for a flattened array, if you need the row and column index values this can be obtained using the np.unravel_index function
print(np.unravel_index(GOE_arr.argmax(), GOE_arr.shape)) print(np.unravel_index(GOE_arr.argmin(), GOE_arr.shape))
You can also sort values, using the np.sort function or the sort
method. There is a basic difference between these two. The function
provides a sorted copy of the original array, while the method
performs an in-place sort, without data copying.
# Sliced array copy GOE_arr_copy = GOE_arr[:4,:4].copy() # Notice that array is sorted by row (axis = 1) by default sorted_array = np.sort(GOE_arr_copy) print(sorted_array) # True if there are at least one non-zero element print(np.any(sorted_array-GOE_arr_copy)) # In-place sorting GOE_arr_copy.sort() print(np.any(sorted_array-GOE_arr_copy))
Numpy also provides basic linear algebra functions, e.g. you can compute the trace of a matrix
print(np.trace(GOE_arr))
You can also calculate the product of two matrices with the dot or
matmul method or function. Both provide the same result in the case
of 2D array multiplication
dot_result = np.dot(GOE_arr,GOE_arr) matmul_result=np.matmul(GOE_arr,GOE_arr) # np.all(np.equal(dot_result,matmul_result))
There are two main differences between these two functions. On the
first hand multiplication by scalars is not allowed with matmul (you should use
* instead). On the second hand, when the array dimension is larger than two,
in the matmul case the operation is broadcasted together as if the
matrices were elements residing in the last two indexes, respecting
the signature (n,k),(k,m)->(n,m). In the dot case it is a sum
product over the last axis of first matrix and the second-to-last of
the second matrix
a = rng.standard_normal((2,3,4)) b = rng.standard_normal((2,4,2)) c = np.matmul(a,b) print(c) print(c.shape) d = np.dot(a,b) print(d) print(d.shape)
In the case of two vectors, matmul provides the inner product (without taking the complex conjugate)
print(np.matmul(GOE_arr[0,:], GOE_arr[0,:]))
In Python 3.5 the @ symbol works as an infix operator for matrix multiplication
print(np.any(np.matmul(GOE_arr, GOE_arr)-GOE_arr @ GOE_arr))
In np.linalg several functions of linear algebra are found.
det- Matrix determinant
eigvals- Eigenvalues of a square matrix
eig- Eigenvalues and eigenstates of a square matrix
eigvalsh- Eigenvalues of a symmetric or Hermitian square matrix
eigh- Eigenvalues and eigenstates of a symmetric or Hermitian matrix
inv- Inverse of a square matrix
pinv- Compute the Moore-Penrose pseudo-inverse of a matrix.
qr- Compute the QR decomposition.
svd- Compute the singular value decomposition (SVD).
solve- Solve the linear system
Ax = bforx, whereAis a square matrix. lstsq- Compute the least-squares solution to
Ax = b.
For example
print(np.linalg.trace(GOE_arr)) print(np.linalg.det(GOE_arr)) print(np.linalg.eigvalsh(GOE_arr))
| Exercise 5.3 |
4. More on Graphics
We are going to elaborate on a seemingly neverending topic. You can find the complete Matplotlib documentation in https://matplotlib.org/stable/index.html We will cover some basic aspects with examples. Now, let’s load the library and create some data to display and learn some useful techniques
import numpy as np import matplotlib.pyplot as plt # x_data = np.linspace(0,4*np.pi,500) y_data = np.cos(x_data**2)*np.cosh(x_data)
Let’s first create a plot with a single panel. The recommended way is using subplot without arguments, as follows
import numpy as np
import matplotlib.pyplot as plt
#
x_data = np.linspace(0,4*np.pi,500)
y_data = np.cos(x_data**2)/np.cosh(x_data/5)
fig, ax = plt.subplots()
ax.plot(x_data, y_data, label="Some data")
ax.set_title('Single plot', fontsize = 20) # Set plot title and fontsize
ax.set_xlabel("Angle $\\theta$ (rad)", fontsize = 16) # Set x axis label and fontsize
ax.set_ylabel("F(w) (a.u.)", fontsize = 16) # idem for y axis
ax.legend() # Display labels
We have included a legend, the x and y axis labels, and the plot
title. There are several ways of plotting various curves in the same
panel. The easiest one is to run one plot instance for each curve, as
in the example that follows
x_data = np.linspace(0,4*np.pi,500)
y1_data = np.cos(x_data**2)/np.cosh(x_data/5)
y2_data = np.cos(x_data**3-5*x_data**2)/np.cosh(x_data/2)
fig, ax = plt.subplots()
ax.plot(x_data, y1_data, label="Data 1")
ax.plot(x_data, y2_data, label="Data 2")
ax.set_title('Single plot, several curves', fontsize = 20) # Set plot title and fontsize
ax.set_xlabel("Angle $\\theta$ (rad)", fontsize = 16) # Set x axis label and fontsize
ax.set_ylabel("F(w) (a.u.)", fontsize = 16) # idem for y axis
ax.legend()
We will later see other ways of plotting several curves. We can customize the style of the line and include symbols in the points in the plot command using one of the set of Line 2D properties. For example
cor ~color~=/color/- Control line color. Possible color abbreviations:
{'b', 'g', 'r', 'c', 'm', 'y', 'k', 'w'}. You can also use colors from the xkcd color name survey with the prefixxkcd:or an RGB or RGBA (red, green, blue, alpha) tuple of float values or hex string. - ~alpha~=/float/
- Set the alpha value used for blending. This is not supported on all backends.
lsorlinestyle- Control line style. Possible options:
{'-', '--', '-.', ':', '', (offset, on-off-seq), ...}. lworlinewidthfloat- Control linewidth.
markermarkerstyle- Control marker style. For possible options check Matplotlib Markers.
markersizeormsfloat- Control marker size in points.
markeveryNone or int or (int, int) or slice or List[int] or float or (float, float)- Control markers display
In this example we use some of these parameters
x_data = np.linspace(0,4*np.pi,500)
y1_data = np.cos(x_data**2)/np.cosh(x_data/5)
y2_data = np.cos(x_data**3-5*x_data**2)/np.cosh(x_data/2)
fig, ax = plt.subplots()
ax.plot(x_data, y1_data, label="Data 1", c="b", ls="-.", alpha = 0.6, marker="o", markersize=4)
ax.plot(x_data, y2_data, label="Data 2", linestyle="-", color="xkcd:olive", lw=2.0, marker = "+", markevery=3)
ax.set_title('Single plot, several curves', fontsize = 20) # Set plot title and fontsize
ax.set_xlabel("Angle $\\theta$ (rad)", fontsize = 16) # Set x axis label and fontsize
ax.set_ylabel("F(w) (a.u.)", fontsize = 16) # idem for y axis
ax.legend()
As already mentioned in the exercises of Lesson 2, you can also depict data using the pyplot.scatter function and another useful tool is the pyplot.bar. Let’s try these two. We can try scatter with the previous function
x_data = np.linspace(0,4*np.pi,500)
y1_data = np.cos(x_data**2)/np.cosh(x_data/5)
fig, ax = plt.subplots()
ax.scatter(x_data, y1_data, s=10, label="Data 1", c="b", alpha = 0.6)
ax.set_title('Single scatter plot', fontsize = 20) # Set plot title and fontsize
ax.set_xlabel("Angle $\\theta$ (rad)", fontsize = 16) # Set x axis label and fontsize
ax.set_ylabel("$F(\theta)$ (a.u.)", fontsize = 16) # idem for y axis
ax.legend()
And we can plot the squared components of two GOE arrays eigenvectors as a bar plot. In this case we fix the figure width and height using the figsize option in subplots (default units are inches, you can change to cm using a conversion factor cm = 1/2.54).
x = np.diag(GOE_arr) # Using the diagonal GOE_arr values as an index
width = 0.11 # bars width
fig, ax = plt.subplots(figsize=(9,5))
rects1 = ax.bar(x - width/2, avec_GOE[:,0]**2, width, label="G.S.")
rects2 = ax.bar(x + width/2, avec_GOE[:,6]**2, width, label='6th exc. state')
# Add text for labels, title etc.
ax.set_ylabel('Squared Eigenvector Components')
ax.set_xlabel('GOE diagonal value')
ax.set_title('Diagonal value of the basis state in the GOE matrix')
ax.legend()
You can also include insets into the plot. A simple way is declaring a new axis in the figure, you can provide in unitless percentages of the figure size the position of the
new axis -(0,0 is bottom left)- and the width and height of the inset.
x_data = np.linspace(0,3*np.pi,700)
y_data = ((np.sin(6*x_data**2-x_data**0.5)+2)*np.exp(-x_data/2))**2
fig, ax = plt.subplots(figsize=(9,9))
ax.plot(x_data, y_data, label="$[sin{(6\\theta^2-\sqrt{\\theta})}+2]^2e^{-\\theta}$")
ax.set_xlabel("Angle $\\theta$ (rad)", fontsize = 20) # Set x axis label and fontsize
ax.set_ylabel("$F(\\theta)$ (a.u.)", fontsize = 20) # idem for y axis
ax.tick_params(axis='x', labelsize=16) # size of x axis tick labels
ax.tick_params(axis='y', labelsize=16) # idem for y axis
#
left, bottom, width, height = [0.55, 0.35, 0.25, 0.3]
ax_new = fig.add_axes([left, bottom, width, height])
ax_new.plot(x_data[:200], (np.sin(6*x_data[:200]**2-x_data[:200]**0.5)+2)**2 , color='green')
ax_new.tick_params(axis='x', labelsize=14) # size of x axis ticks
ax_new.tick_params(axis='y', labelsize=14) # idem for y axis
ax.legend(fontsize = 22, loc="upper right")
You can get a finer control using the inset_axes function
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
#
x_data = np.linspace(0,3*np.pi,700)
y_data = ((np.sin(6*x_data**2-x_data**0.5)+2)*np.exp(-x_data/2))**2
fig, ax = plt.subplots(figsize=(9,9))
ax.plot(x_data, y_data, label="$[sin{(6\\theta^2-\sqrt{\\theta})}+2]^2e^{-\\theta}$")
#ax.set_title('Single plot, several curves', fontsize = 20) # Set plot title and fontsize
ax.set_xlabel("Angle $\\theta$ (rad)", fontsize = 20) # Set x axis label and fontsize
ax.set_ylabel("$F(\\theta)$ (a.u.)", fontsize = 20) # idem for y axis
ax.tick_params(axis='x', labelsize=16) # size of x axis ticks
ax.tick_params(axis='y', labelsize=16) # idem for y axis
axins = inset_axes(ax, width="30%", height="40%", loc=1, borderpad = 2)
axins.plot(x_data, np.log(y_data))
axins.set_xlim(0,3) # Setting the x coordinate range in the inset
axins.set_ylim(-3,2) # Setting the y coordinate range in the inset
axins.set_ylabel("$\\log(F(\\theta))$ (a.u.)",fontsize = 12)
axins.tick_params(axis='x', labelsize=14) # size of x axis ticks
axins.tick_params(axis='y', labelsize=14) # idem for y axis
ax.legend(fontsize = 22, bbox_to_anchor=(0.3,0.43)) # another way of locating the legend
4.1. Subplots
If we need to add several subplots we can do it as we did in Lesson 2, but we can also make use of loops to automatize the task. Imagine that we need to plot the previous function np.cos(x_data**A)/np.cosh(x_data/B) for (A,B) = {(1,2),(1,5),(2,2),(2,5),(3,2),(3,5)}. We can use subplots as follows
x_data = np.linspace(0,2*np.pi,500)
#
A_parameter = [1,2,3]
A_index = list(range(0,3))
#
B_parameter = [2,5]
B_index = list(range(0,2))
#
fig, ax = plt.subplots(3,2,figsize=(9,7))
fig.suptitle("Title centered above subplots", fontsize=18)
#
for (A_value, A_i) in zip(A_parameter,A_index):
for (B_value, B_j) in zip(B_parameter, B_index):
#
y_data = np.cos(x_data**A_value)/np.cosh(x_data/B_value)
ax[A_i,B_j].plot(x_data, y_data, label="A = {0}, B = {1}".format(A_value, B_value))
ax[A_i,B_j].set_xlabel("Angle $\\theta$ (rad)", fontsize = 14) # Set x axis label and fontsize
ax[A_i,B_j].set_ylabel("F(w) (a.u.)", fontsize = 14) # idem for y axis
ax[A_i,B_j].legend()
#
fig.tight_layout(pad=3.0) # Control the extra padding around the figure border and between subplots. The pads are specified in fraction of fontsize.
You can also plot several lines in each subplot
x_data = np.linspace(0,2*np.pi,500)
#
A_parameter = [1,2,3]
A_index = list(range(0,3))
#
B_parameter = [2,5]
B_index = list(range(0,2))
#
fig, ax = plt.subplots(3,2,figsize=(9,7))
#
for (A_value, A_i) in zip(A_parameter,A_index):
for (B_value, B_j) in zip(B_parameter, B_index):
y_data = np.cos(x_data**A_value)/np.cosh(x_data/B_value)
y_data = np.vstack((y_data,np.sin(x_data**A_value)*np.tanh(x_data/B_value)))
ax[A_i,B_j].plot(x_data, y_data.T, label="A = {0}, B = {1}".format(A_value, B_value))
ax[A_i,B_j].set_xlabel("Angle $\\theta$ (rad)", fontsize = 14) # Set x axis label and fontsize
ax[A_i,B_j].set_ylabel("F(w) (a.u.)", fontsize = 14) # idem for y axis
ax[A_i,B_j].legend()
#
fig.tight_layout(pad=3.0)
fig.suptitle("Title centered above subplots", fontsize=18)
As all the panels share the same axis scaling you can only add ticks to the outer panels
x_data = np.linspace(0,2*np.pi,500)
#
A_parameter = [1,2,3]
A_index = list(range(0,3))
#
B_parameter = [1,2]
B_index = list(range(0,2))
#
fig, ax = plt.subplots(3,2,figsize=(9,7),sharex=True, sharey=True)
#
for (A_value, A_i) in zip(A_parameter,A_index):
for (B_value, B_j) in zip(B_parameter, B_index):
y_data = np.cos(x_data**A_value)/np.cosh(x_data/B_value)
ax[A_i,B_j].plot(x_data, y_data, label="A = {0}, B = {1}".format(A_value, B_value))
if A_i == 2:
ax[A_i,B_j].set_xlabel("Angle $\\theta$ (rad)", fontsize = 14) # Set x axis label and fontsize
if B_j == 0:
ax[A_i,B_j].set_ylabel("F(w) (a.u.)", fontsize = 14) # idem for y axis
ax[A_i,B_j].legend()
#
fig.tight_layout(pad=1)
fig.suptitle("Title centered above subplots", fontsize=18, y=1.020)
The possible values for arguments sharex and sharey are boolean values (False is the default value and True or all will share the given axes between all panels), and row or col (only panels in the same row or column share a common axis).
4.2. Histograms
To generate 1D histograms a data vector is needed. We create one with the eigenvalues of a GOE array. We can create the array using a randn, a convenience function for users porting Matlab code
np.random.seed(123454321) GOE_dim = 2000 GOE_arr_old = np.random.randn(GOE_dim, GOE_dim) GOE_arr_old = (GOE_arr_old + GOE_arr_old.T)/2 aval_GOE_old = np.linalg.eigvalsh(GOE_arr_old)
from numpy.random import default_rng rng = default_rng() GOE_arr = rng.standard_normal((GOE_dim, GOE_dim)) GOE_arr = (GOE_arr + GOE_arr.T)/2 aval_GOE = np.linalg.eigvalsh(GOE_arr)
We can then plot the histogram, taking into account that the hist
method returns as an output the number associated to each bin, the
bins, and a set of patches that allows to operate on the histogram. In
the left panel we display the absolute histogram while in the right
one we normalize by the total number of counts and display a
percentage.
n_bins = 51 fig, ax = plt.subplots(tight_layout=True) bins_result_old = ax.hist(aval_GOE_old, bins=n_bins) bins_result = ax.hist(aval_GOE, bins=n_bins)
You can combine a histogram with a plot. Let’s plot the value of the theoretical GOE level density (a semicircle). Let’s first compute it
sigma_nd = 2 # variance of non-diagonal GOE matrix elements A_value = 1/(4*sigma_nd) sqrd_a_value = GOE_dim/A_value
We now make use of the information about the bins in the output of hist.
n_bins = 71 fig, ax = plt.subplots(tight_layout=True) bins_result = ax.hist(aval_GOE, bins=n_bins, density=True) # GOE_density = np.sqrt(sqrd_a_value-bins_result[1][1:]**2)/(np.pi*sqrd_a_value/2) # ax.plot(bins_result[1][1:], GOE_density)
To plot a 2D histogram, one only needs two data vectors of the same length, corresponding to each axis of the histogram.
x = np.random.normal(3, 1, 200) y = np.random.normal(4, 0.1, 200)
It is not mandatory, but one can also define the plot bins
### Bin Edges xedges = [0, 1, 3, 5, 7, 9, 11] yedges = [0, 2, 3, 4, 6, 8]
The histogram is computed as
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges)) # Histogram does not follow Cartesian convention (see Matplotlib Notes), H = H.T
And it can be depicted using imshow or meshgrid
fig = plt.figure(figsize=(7, 3)) ax = fig.add_subplot(121, title='imshow: square bins') plt.imshow(H, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]]) ## ax = fig.add_subplot(122, title='pcolormesh: actual edges', aspect='equal') X, Y = np.meshgrid(xedges, yedges) ax.pcolormesh(X, Y, H) plt.show()
4.3. Saving your figures
You should save a script to easily recreate any of your figures, but you can also save them in a graphic format. The available formats depend on your Python distribution. To know them run
fig = plt.gcf() fig.canvas.get_supported_filetypes()
To save a figure you can issue the command
plt.savefig("figure_name.extension")
If you want to further modify the figure in a program as inkscape or xfig you can save the figure as svg file -vector graphics- but in this case, when you import the matplotlib library, you should run the following statement
from matplotlib import rcParams rcParams['svg.fonttype'] = 'none'
And now you can save the figure of your interest
rcParams['svg.fonttype'] = 'none'
#
x_data = np.linspace(0,2*np.pi,500)
#
A_parameter = [1,2,3]
A_index = list(range(0,3))
#
B_parameter = [1,2]
B_index = list(range(0,2))
#
fig, ax = plt.subplots(3,2,figsize=(10,7),sharex=True, sharey=True, linewidth=3.0)
#
for (A_value, A_i) in zip(A_parameter,A_index):
for (B_value, B_j) in zip(B_parameter, B_index):
y_data = np.cos(x_data**A_value)/np.cosh(x_data/B_value)
ax[A_i,B_j].plot(x_data, y_data, label="A = {0}, B = {1}".format(A_value, B_value))
if A_i == 2:
ax[A_i,B_j].set_xlabel("Angle $\\theta$ (rad)", fontsize = 14) # Set x axis label and fontsize
if B_j == 0:
ax[A_i,B_j].set_ylabel("F(w) (a.u.)", fontsize = 14) # idem for y axis
ax[A_i,B_j].legend()
#
fig.tight_layout()
fig.suptitle("Title centered above subplots", fontsize=18, y=1.020)
plt.savefig("test.svg")
plt.savefig("test.pdf",facecolor="r",edgecolor="g",bbox_inches="tight")
The savefig command accepts some options that are quite useful:
- dpi
- Figure resolution in dots-per-inch. Default value is 100.
- facecolor
- Color of the background. Default value
w, white. - edgecolor
- Color of the figure edge line. Default value
w, white. Notice that in this case the figurelinewidthoption needs to be set to a value different from the null default one. - bboxinches
- Portion of the figure saved. The value
trimattempts to trim empty white space around the figure.
Some of these options are used in the second savefig command in the previous example.
| Exercise 5.4 |
|---|
| Exercise 5.5 |
5. Optimizing and function fitting with SciPy.
The ScyPy library provides many user-friendly and efficient numerical routines, such as routines for numerical integration, interpolation, optimization, linear algebra, and statistics. In this course we will only very briefly explain how to use it to perform linear and non-ĺinear data fittings. You can find more information about this useful library in its documentation.
In a linear fit case you should import the Scipy module stats.
from scipy import stats
We consider that we have a data set that we suspect it follows the functional law f(N) = a*N**b. Let’s first prepare our dataset and add to it a normal error.
# Prepare Data assumint a power law f(N) = a*N**b a_theor = 5.05 b_theor = -1.25 N_values = np.array(range(500,10600,500)) f_values = a_theor*N_values**b_theor f_errors = np.array(list(map(lambda x:np.random.normal(x,np.abs(x)/20), f_values)))
The dependence is not linear but we can transform it into a linear dependence taking logarithm of both abscyssa and ordinate values
x_data = np.log10(N_values) y_data = np.log10(f_errors) result = stats.linregress(x_data, y=y_data) result
The obtained result provides the straight line slope and intercept values, the correlation coefficient (rvalue), the p-value of the fit, and the coefficient errors. We can depict the data plus errors and the best fit line
fig, ax = plt.subplots()
ax.plot(x_data, y_data, label="Reference values", linestyle="-", color="xkcd:olive", lw=1.0, marker = "o", markevery=1, markersize=2)
ax.plot(x_data, result.slope*x_data + result.intercept, label="log-log linear fit", linestyle="-.", color="xkcd:red", lw=1.0, alpha = 0.76)
ax.set_xlabel("$log(N)$", fontsize = 16) # Set x axis label and fontsize
ax.set_ylabel("$log[f(N)]$", fontsize = 16) # idem for y axis
ax.legend()
In case you cannot linearize your function dependence, as in the next example, you import the optimize module.
from scipy import optimize
We now prepare our dataset and add again a random normal error.
# Prepare Data assumint a function g(x) = a + b*x + c/x**d a_theor = 1.05 b_theor = 1.25 c_theor = -2.5 d_theor = 2 x_values = np.arange(0.5,12,0.25) f_values = a_theor+ b_theor * x_values + c_theor/(x_values**d_theor) f_errors = np.array(list(map(lambda x:np.random.normal(x,np.abs(x)/20), f_values)))
You now define the function that will be used to perform the nonlinear fit
def f_fit(x, a, b, c, d):
'''
a + b*x + c*x**-d
'''
return a + b*x + c/x**d
And the fit can be performed now as follows
params, pcov = optimize.curve_fit(f_fit,x_values,f_errors, p0=(0.5,0.7,-0.9,3)) #, method='trf') parerr = np.sqrt(np.diag(pcov)) params, parerr
The array pcov is the covariance matrix and therefore the parerr vector is the vector that contains the error of the obtained parameters. We can plot the original values and the resulting optimized function.
fig, ax = plt.subplots()
ax.plot(x_values, f_errors, label="Reference values", linestyle="-", color="xkcd:olive", lw=1.0, marker = "o", markevery=1, markersize=2)
ax.plot(x_values, f_fit(x_values, *params) , label="nonlinear fit", linestyle="-.", color="xkcd:red", lw=1.0, alpha = 0.76)
ax.set_xlabel("$x$", fontsize = 16) # Set x axis label and fontsize
ax.set_ylabel("$f(x)$", fontsize = 16) # idem for y axis
ax.legend()
Created: 2025-09-04 Thu 14:11