<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Algomath μse]]></title><description><![CDATA[I write posts about programming and statistics, sharing what I learn daily.]]></description><link>https://amm.zanotp.com</link><generator>RSS for Node</generator><lastBuildDate>Sun, 12 Apr 2026 05:48:37 GMT</lastBuildDate><atom:link href="https://amm.zanotp.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Quantum Principal Component Analysis and Self-Tomography]]></title><description><![CDATA[High-dimensional data presents significant analytical challenges, for example some algorithms suffer the curse of dimensionality (i.e. as the number of dimensions increases, the volume of the data space grows exponentially, making the computation exp...]]></description><link>https://amm.zanotp.com/qpca</link><guid isPermaLink="true">https://amm.zanotp.com/qpca</guid><category><![CDATA[qpca]]></category><category><![CDATA[Pca]]></category><category><![CDATA[quantum machine learning (QML)]]></category><category><![CDATA[QML]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Fri, 07 Feb 2025 21:43:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ISHD1ovpJ-k/upload/8a0043530ae3f4bcb5db825afea6fc3a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>High-dimensional data presents significant analytical challenges: for example, some algorithms suffer from the curse of dimensionality (i.e. as the number of dimensions increases, the volume of the data space grows exponentially, making the computation expensive or even infeasible), while the presence of many features might result in models becoming overly complex and learning noise instead of true patterns.</p>
<p>One of the best-known and most powerful techniques for addressing high dimensionality is Principal Component Analysis (PCA), a statistical technique used to simplify complex datasets by reducing their dimensionality while preserving the most important information. This blog post discusses PCA, focusing on the selection of the principal components, and then introduces a quantum circuit performing quantum Principal Component Analysis, a quantum algorithm providing an exponential speedup over PCA.</p>
<h2 id="heading-principal-component-analysis">Principal Component Analysis</h2>
<p>Principal Component Analysis (PCA), also known as the Karhunen-Loève transformation, the Hotelling transformation or the method of empirical orthogonal functions, aims to project \(p\)-dimensional vectors onto the so-called principal components, i.e. \(q\)-dimensional vectors, where \(q &lt; p\), while preserving as much of the original variance as possible.</p>
<p>There are several equivalent ways of deriving the principal components mathematically, and the following section shows that finding the projectors maximizing the variance is equivalent to minimizing the mean squared distance between the original vectors and their projections onto the principal components.</p>
<h3 id="heading-mathematics-of-principal-components">Mathematics of principal components</h3>
<p>Let \(X \in C^{n\times p}\) be a centered matrix and let \(\{x_i\}_{i=1}^n\) be \(p\)-dimensional vectors (i.e. the rows of \(X\)). The projection of \(x_i\) onto a line through the origin (for simplicity) spanned by a unit vector \(w\) is:</p>
<p>\[(x_i ^\dagger w) w\]</p><p>It’s relevant to note that the mean of the projections is zero, since the vectors \(x_i\) are centered:</p>
<p>\[\frac 1n \sum_i (x_i ^\dagger w) w=\left(\frac 1n \sum_i x_i\right)^\dagger w\, w = 0\]</p><p>Being a projection, the projected vectors are (in general) different from the original vectors, which means there’s some error. Such error is defined as:</p>
<p>\[\begin{aligned}\left\|{x_i}-\left({w} ^\dagger {x_i}\right) {w}\right\|^2= &amp; \left({x_i}-\left({w} ^\dagger {x_i}\right) {w}\right) ^\dagger\left({x_i}-\left({w} ^\dagger {x_i}\right) {w}\right) \\ = &amp; {x_i} ^\dagger {x_i}-{x_i} ^\dagger\left({w} ^\dagger {x_i}\right) {w} \\ &amp; -\left({w} ^\dagger {x_i}\right) {w} ^\dagger {x_i}+\left({w} ^\dagger {x_i}\right) {w} ^\dagger\left({w} ^\dagger {x_i}\right) {w} \\ = &amp; \left\|{x_i}\right\|^2-2\left({w} ^\dagger {x_i}\right)^2+\left({w} ^\dagger {x_i}\right)^2 {w} ^\dagger {w} \\ = &amp; {x_i} ^\dagger {x_i}-\left({w} ^\dagger {x_i}\right)^2\end{aligned}\]</p><p>Since \(w\) is a unit vector (\(w^\dagger w = 1\)), the mean squared error (MSE) is:</p>
<p>\[\text {MSE}=\frac 1n \sum_{i=1}^n {x_i} ^\dagger {x_i}-\left({w} ^\dagger {x_i}\right)^2\]</p><p>Considering that the first inner product doesn’t involve \(w\) and is therefore a constant, minimizing the MSE is then equivalent to:</p>
<p>\[\text{max}_w \frac 1n\sum_{i=1}^n (w^\dagger x_i)^2\]</p><p>Since the mean of a square is always equal to the square of the mean plus the variance, the function to be maximized is equivalent to:</p>
<p>\[\text{max}_w \frac 1n\sum_{i=1}^n (w^\dagger x_i)^2 = \text{max}_w \left(\frac 1n \sum_{i=1}^n x_i ^\dagger w\right)^2 + \text{var}(w^\dagger x_i)\]</p><p>However, since the mean of the projections is zero (see above), minimizing the residual sum of squares turns out to be equivalent to maximizing the variance of the projections.</p>
<p>This also holds if we project not just onto one vector but onto multiple principal components.</p>
<p>Accordingly, the variance \(\sigma^2\left({{w}}\right)\) is defined (in matrix form) as:</p>
<p>\[\begin{aligned} \sigma^2 \left({{w}}\right) &amp; =\frac{1}{n} \sum_i\left({x_i} ^\dagger {w}\right)^2 \\ &amp; =\frac{1}{n}({X w})^\dagger({X w}) \\ &amp; =\frac{1}{n} {w}^\dagger {X}^\dagger {X w} \\ &amp; ={w}^\dagger \frac{{X}^\dagger {X}}{n} {w} \\ &amp; ={w}^\dagger {V w}\end{aligned}\]</p><p>where \(V\) is the covariance matrix of \(X\).</p>
<p>Therefore the constrained maximization problem is:</p>
<p>\[\text{max}_w \space\sigma^2(w) \space \text{s.t.} \space w^\dagger w =1\]</p><p>Using the Lagrange multiplier \(\gamma\), the objective function becomes:</p>
<p>\[L(\gamma, w) = w^\dagger V w-\gamma(w^\dagger w-1)\]</p><p>The first order conditions are:</p>
<p>\[\begin{align} &amp; \frac {\partial L}{\partial w}\ = 2Vw-2\gamma w\\ &amp; \frac{\partial L}{\partial \gamma}\ = w^\dagger w-1\end{align}\]</p><p>Setting the derivatives to zero at optimum the system becomes:</p>
<p>\[\begin{align} &amp; Vw=\gamma w\\ &amp;w^\dagger w=1\end{align}\]</p><p>and from the top equation it is clear that the \(w\) maximizing the variance are the orthonormal eigenvectors of the covariance matrix associated with the largest \(q\) eigenvalues \(\gamma\).</p>
<p>It’s clear that if the data are approximately \(q\)-dimensional (i.e. \(p-q\) eigenvalues are close to 0), the residual will be small and the \(R^2\) (the fraction of the original variance retained by the projections), computed as:</p>
<p>\[R^2 = \frac{\sum_{i=1}^q \gamma_i}{\sum_{i=1}^p \gamma_i}\]</p><p>will be close to 1.</p>
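<p>To make this concrete, here is a minimal NumPy sketch (on synthetic, approximately 2-dimensional data, an illustrative choice) that computes the principal components as the top eigenvectors of the covariance matrix and evaluates the \(R^2\) defined above:</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(0)

# Synthetic, approximately 2-dimensional data: n = 500 samples in p = 5 dimensions
n, p = 500, 5
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, p)) + 0.05 * rng.normal(size=(n, p))
X = X - X.mean(axis=0)               # center the data

V = X.T @ X / n                      # covariance matrix
gamma, W = np.linalg.eigh(V)         # eigenvalues (ascending) and orthonormal eigenvectors
gamma, W = gamma[::-1], W[:, ::-1]   # sort in descending order

q = 2
projections = X @ W[:, :q]           # coordinates on the first q principal components

r_squared = gamma[:q].sum() / gamma.sum()
print(f"R^2 retained by the first {q} components: {r_squared:.4f}")
</code></pre>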
<h3 id="heading-complexity-analysis-of-pca">Complexity analysis of PCA</h3>
<p>Assuming \(X\in C^{n\times p}\), the cost of PCA is:</p>
<ul>
<li><p>computing \(V\) is \(\mathcal O(n\times p^2)\)</p>
</li>
<li><p>computing the eigenvalues and eigenvectors requires \(\mathcal O(p^3)\)</p>
</li>
</ul>
<p>Hence the overall complexity is \(\mathcal O(n\times p^2 + p^3)\), which is \(\mathcal O(p^3)\) when \(n = \mathcal O(p)\).</p>
<h2 id="heading-quantum-principal-component-analysis">Quantum Principal Component Analysis</h2>
<p>The idea behind Quantum Principal Component Analysis (qPCA) is to use quantum subroutines to perform PCA faster. In particular, Quantum Phase Estimation (QPE) is used to extract information about the eigenvalues and the eigenvectors of a density matrix representing the covariance matrix. The next section introduces QPE, while the following sections discuss qPCA.</p>
<h3 id="heading-quantum-phase-estimation">Quantum Phase Estimation</h3>
<p>Let \(U\) be a unitary operator and let \(\ket{u_k}\) and \(e^{i\lambda_k}\) be the \(k\)-th eigenvector and eigenvalue of \(U\). Consider also a generic state \(\ket \psi\), which can always be expanded as:</p>
<p>\[\ket \psi= \sum_{k=1}^nc_k\ket {u_k}\]</p><p>The goal of the QPE is to perform the following transformation:</p>
<p>\[\text{QPE}:\ket{0}^{\otimes n}\ket{\psi} \rightarrow \sum_{k=1}^nc_k\ket{\lambda_k }\ket{u_k}\]</p><p>where \(\ket {\lambda_k}\) is the quantum state \(\ket {j_1\dots j_n}\) corresponding to the \(n\)-digit binary fraction \(0.j_1\dots j_n\) representing the phase of the eigenvalue, i.e. \(\lambda_k = 2\pi \cdot 0.j_1\dots j_n\).</p>
<p>The circuit of the algorithm is the following:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738960092600/39527e43-e9d1-4870-92da-24ec82088e82.png" alt class="image--center mx-auto" /></p>
<p>First, each of the \(n\) qubits initialized to \(\ket 0\) is subjected to a Hadamard gate and controlled unitary operations acting on \(\ket u\), which perform the following transformation:</p>
<p>\[\ket 0^{\otimes n}\ket u \rightarrow \bigotimes_{k=1}^n \left(\frac{\ket 0 +e^{i2\pi0.j_1\dots j_k}\ket {1}}{\sqrt 2} \right) \otimes\ket u\]</p><p>In other words, the binary fraction representation of the eigenvalue phase is stored in the relative phase of each auxiliary qubit, with the digits shifted by one position from one qubit to the next.</p>
<p>The states of the \(n\) auxiliary qubits have exactly the same form as the output of the quantum Fourier transform; therefore, applying the inverse quantum Fourier transform and measuring the \(n\) ancilla qubits yields \(0.j_1\dots j_n\), from which \(\lambda_k\) is recovered.</p>
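<p>As a concrete illustration, the following PennyLane sketch (a toy example; the unitary and wire layout are illustrative choices) runs QPE on a single-qubit phase gate whose eigenphase is the exact 3-bit binary fraction \(0.011 = 0.375\):</p>
<pre><code class="lang-python">import pennylane as qml
import numpy as np

# Toy unitary: |1&gt; is an eigenvector with eigenvalue e^{2 pi i 0.375},
# and 0.375 = 0.011 is an exact 3-bit binary fraction
U = np.diag([1.0, np.exp(2j * np.pi * 0.375)])

estimation_wires = [0, 1, 2]
dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev)
def qpe():
    qml.PauliX(wires=3)  # prepare the eigenstate |1&gt; on the target wire
    qml.QuantumPhaseEstimation(U, target_wires=[3], estimation_wires=estimation_wires)
    return qml.probs(wires=estimation_wires)

# The most likely outcome, read as a binary fraction, is the estimated phase
phase = np.argmax(qpe()) / 2 ** len(estimation_wires)
print(phase)  # 0.375
</code></pre>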
<h3 id="heading-density-matrix-exponentiation">Density matrix exponentiation</h3>
<p>One can then imagine that, if we’re able to encode the covariance matrix as a quantum gate, we can use QPE to obtain information about its eigenvalues and eigenvectors. That’s indeed possible; however, the defining property of quantum gates is unitarity, i.e. for any quantum gate \(G\):</p>
<p>\[G^\dagger= G^{-1}\]</p><p>This is not generally true for covariance matrices; however, one can obtain a unitary operator from a covariance matrix via exponentiation. Assume the covariance matrix has been encoded in a density matrix \(\rho\) and assume one is presented with \(n\) copies of \(\rho\). The density matrix exponential:</p>
<p>\[e^{-i\rho t}\]</p><p>is unitary.</p>
<p>One method to perform such exponentiation up to \(n\)-th order in \(t\) is to repeat the following:</p>
<p>\[\text{Tr}_1 \left[e^{-iS\Delta t} (\rho \otimes \sigma) e^{iS\Delta t}\right] = \sigma -i\Delta t[\rho, \sigma]+\mathcal O(\Delta t^2)\]</p><p>where \(\sigma\) is any density matrix, \(S\) is the swap operator, \([A, B] = AB-BA\) is the commutator and \(\text{Tr}_1\) is the partial trace over the first system. It’s worth noting that since \(S\) is a sparse matrix, the exponentiation of \(S\) can be computed efficiently. Applying the above formula \(n\) times leads to:</p>
<p>\[e^{-i\rho n\Delta t} \sigma e^{i\rho n\Delta t}\]</p><p>which, coupled with the quantum matrix inversion technique of (<a target="_blank" href="https://arxiv.org/pdf/0811.3171">Harrow, Hassidim, Lloyd, 2009, “Quantum algorithm for linear systems of equations“</a>), allows one to efficiently construct the exponential of \(\rho\).</p>
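<p>The first-order identity above can be checked numerically. The following NumPy/SciPy sketch (a toy classical check, not an efficient quantum implementation) builds the swap operator, conjugates \(\rho \otimes \sigma\) by \(e^{-iS\Delta t}\), takes the partial trace over the first system and compares the result with \(\sigma -i\Delta t[\rho, \sigma]\):</p>
<pre><code class="lang-python">import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

def random_density_matrix(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho)

d = 2
rho, sigma = random_density_matrix(d), random_density_matrix(d)

# Swap operator S on the tensor product of two d-dimensional systems
S = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        S[i * d + j, j * d + i] = 1

dt = 1e-3
joint = expm(-1j * S * dt) @ np.kron(rho, sigma) @ expm(1j * S * dt)

# Partial trace over the first subsystem
out = joint.reshape(d, d, d, d).trace(axis1=0, axis2=2)

expected = sigma - 1j * dt * (rho @ sigma - sigma @ rho)
print(np.max(np.abs(out - expected)))  # O(dt^2)
</code></pre>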
<p>So, assuming a non-sparse positive \(X\) whose trace is 1, constructing:</p>
<p>\[e^{-iXt}\]</p><p>requires factoring \(X\) as:</p>
<p>\[X=A^\dagger A, \qquad A=\sum_i |a_i|\ket{e_i}\bra{\hat a_i}\]</p><p>where \(\hat a_i\) is the row \(a_i\) of \(A\) normalized to 1 and \(\{\ket {e_i}\}\) is an orthonormal basis. Assuming a qRAM (<a target="_blank" href="https://arxiv.org/pdf/0708.1879">Giovannetti, Lloyd, Maccone, 2008, “Quantum random access memory“</a>) that performs the following:</p>
<p>\[\ket i \ket 0 \ket 0 \rightarrow \ket i\ket {\hat a_i} \ket {|a_i|}\]</p><p>one can easily construct the state \(\ket \psi = \sum_i |a_i|\ket {e_i}\ket {\hat a_i}\), whose reduced density matrix, after tracing out the first register, is:</p>
<p>\[\text{Tr}_1\left[\ket\psi\bra\psi\right] = \sum_i |a_i|^2\ket {\hat a_i}\bra{\hat a_i} = X\]</p><p>So by using \(n=\mathcal O(t^2\epsilon^{-1})\) copies of \(X\) one can implement \(e^{-iXt}\) with accuracy \(\epsilon\) in \(\mathcal O(n\log d)\) time.</p>
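<p>A quick NumPy sanity check of this construction (with a random real matrix \(A\), an illustrative choice): tracing out the index register of \(\ket\psi\bra\psi\) indeed returns \(X = A^\dagger A\):</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(2)

# Random A, rescaled so that tr(A^T A) = sum_i |a_i|^2 = 1
A = rng.normal(size=(4, 4))
A = A / np.linalg.norm(A)  # Frobenius norm 1
X = A.T @ A

# |psi&gt; = sum_i |a_i| |e_i&gt;|a_hat_i&gt;; note |a_i| * a_hat_i is just row i of A
psi = A.reshape(-1)

# Partial trace of |psi&gt;&lt;psi| over the index register (first subsystem)
rho_full = np.outer(psi, psi)
reduced = rho_full.reshape(4, 4, 4, 4).trace(axis1=0, axis2=2)

print(np.allclose(reduced, X))  # True
</code></pre>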
<h3 id="heading-obtaining-the-principal-components-using-self-tomography">Obtaining the Principal Components using self-tomography</h3>
<p>Once the exponentiation of the covariance matrix is performed, one can use QPE to find the eigenvectors and eigenvalues of the density matrix using conditional application of:</p>
<p>\[e^{-iXt}\]</p><p>for varying times \(t\), using \(\ket \psi\) as the initial state, resulting in the following state:</p>
<p>\[\sum_ir_i\ket{\chi_i}\bra {\chi_i}\otimes\ket{\tilde r_i}\bra{\tilde r_i}\]</p><p>where \(\ket {\chi_i}\) are the eigenvectors of \(X\) and \(\tilde r_i\) are estimates of the corresponding eigenvalues \(r_i\).</p>
<p>Features of the \(i\)-th principal component are then extracted by measuring the expectation value of an observable \(M\) on the eigenvector \(\ket{\chi_i}\) with eigenvalue \(r_i\):</p>
<p>\[\bra{\chi_i}M \ket{\chi_i}\]</p><p>This process, called quantum self-tomography, reveals eigenvalues and eigenvectors in time \(\mathcal O(R\log d)\), where \(R\) is the rank of \(X\), resulting in an exponential speedup over classical PCA.</p>
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here.</strong></a></p>
<h2 id="heading-sources">Sources:</h2>
<ul>
<li><p><a target="_blank" href="https://arxiv.org/pdf/1307.0401">Lloyd, Mohseni, Rebentrost, “Quantum principal component analysis”, 2014</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2010.00831">He, Li, Liu, Wang, “A Low Complexity Quantum Principal Component Analysis Algorithm”, 2021</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2501.07891v1">Nghiem, “New Quantum Algorithm for Principal Component Analysis”, 2025</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/0811.3171">Harrow, Hassidim, Lloyd, 2009, “Quantum algorithm for linear systems of equations“</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/0708.1879">Giovannetti, Lloyd, Maccone, 2008, “Quantum random access memory“</a></p>
</li>
</ul>
</p>]]></content:encoded></item><item><title><![CDATA[Post-Variational Quantum Neural Networks]]></title><description><![CDATA[Variational quantum circuit are among the most promising methods for dealing with optimization problems, combinatorial optimization and quantum machine learning. However, despite their popularity, many of the ansatze upon which such circuit relies su...]]></description><link>https://amm.zanotp.com/pvqnn</link><guid isPermaLink="true">https://amm.zanotp.com/pvqnn</guid><category><![CDATA[quantum neural networks]]></category><category><![CDATA[quantum computing]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sun, 02 Feb 2025 23:54:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Gow8svoRZBA/upload/131f6fb153edfbe6d52e7787693477bb.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Variational quantum circuits are among the most promising methods for dealing with optimization problems, combinatorial optimization and quantum machine learning. However, despite their popularity, many of the ansatze upon which such circuits rely suffer from the well-documented barren plateau problem [3] as the quantum hardware noise or circuit depth increases. Moreover, the training landscape doesn’t in general correspond to any well-characterized optimization program, which makes it difficult to analyze. Because of these problems, multiple alternatives to variational quantum circuits have been studied: [7], for example, proposed using classical combinations of quantum states to solve linear systems with near-term quantum computers. The idea of using combinations of quantum states to systematically generate ansatze has proven a viable alternative to variational solutions that can circumvent the barren plateau problem, and has found application in quantum eigensolvers [4], semidefinite programming [6] and simulations [5].</p>
<p>This blog post explores an alternative to variational quantum models, called post-variational quantum models, and in particular post-variational quantum neural networks, a quantum machine learning model based on ensemble strategies which relies not on a single trainable circuit but on a classical combination of fixed circuits.</p>
<h2 id="heading-variational-circuits">Variational Circuits</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738512468827/067bdc3e-c1b1-4f19-bffd-872477aa2889.png" alt class="image--center mx-auto" /></p>
<p>Variational quantum circuits (see the picture above, coming from [1]) are hybrid methods comprising a large class of circuits operating on pure states; they are also known as quantum neural networks when applied to machine learning tasks.</p>
<p>Such circuits operate in the following manner:</p>
<ul>
<li><p>encoding data \(x\) into an \(n\)-qubit quantum state \(\rho(x) \in \mathcal M_{2^n\times 2^n}\)</p>
</li>
<li><p>a parametrized circuit \(U(\theta) \) (also called ansatz) is then applied to the encoded state, with parameters \(\theta \in R^{d}\), resulting in:</p>
</li>
</ul>
<p>$$\rho(x, \theta) = U(\theta)\rho(x)U(\theta)^\dagger$$</p><ul>
<li>an estimation of the results of such circuits is then constructed with an observable \(O\):</li>
</ul>
<p>$$E_\theta(x)= tr(O\rho(x, \theta))$$</p><ul>
<li>the parameters \(\theta\) are optimized using gradient-based optimization relying on the gradient of the variational quantum circuit (typically computed with the <a target="_blank" href="https://amm.zanotp.com/computing-gradients-of-quantum-circuits-using-parameter-shift-rule">parameter shift rule</a>); a minimal end-to-end sketch follows this list.</li>
</ul>
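<p>Here is a minimal PennyLane sketch of these four steps (a toy two-qubit circuit; the embedding, ansatz, observable and target are illustrative choices, not the setup of [1]):</p>
<pre><code class="lang-python">import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def variational_circuit(theta, x):
    # 1. encode the data x into a quantum state rho(x)
    qml.AngleEmbedding(x, wires=[0, 1])
    # 2. apply the parametrized ansatz U(theta)
    qml.RY(theta[0], wires=0)
    qml.RY(theta[1], wires=1)
    qml.CNOT(wires=[0, 1])
    # 3. estimate E_theta(x) = tr(O rho(x, theta)) with the observable O = Z on qubit 0
    return qml.expval(qml.PauliZ(0))

# 4. optimize theta with a gradient-based optimizer (parameter-shift gradients)
x = np.array([0.3, 1.2], requires_grad=False)
theta = np.array([0.1, 0.2], requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.1)

def cost(theta):
    return (variational_circuit(theta, x) - 1.0) ** 2  # toy target value

for _ in range(50):
    theta = opt.step(cost, theta)

print(theta, variational_circuit(theta, x))
</code></pre>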
<p>The main challenges of variational quantum circuits are the following:</p>
<ul>
<li><p>although some problem-inspired ansatze exist, defining problem-agnostic ansatze that are expressive enough to represent a useful function but don’t suffer from the barren plateau problem is a challenge and an open research direction</p>
</li>
<li><p>implementing continuous parameterized rotations on real hardware is limited by the precision of control electronics.</p>
</li>
</ul>
<h2 id="heading-post-variational-circuits">Post-Variational Circuits</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738513556949/eeff5ecc-d681-440d-8bb5-35f1c65bc055.png" alt class="image--center mx-auto" /></p>
<p>Because of the general difficulty and lack of training guarantees of variational algorithms, [1] has proposed a new approach, named “post-variational“. This approach (see the picture above, coming from [1]) replaces the parametrized quantum circuit of the variational approach with fixed circuits (parametrized only by the input data) and finds an optimal classical combination of the results of such circuits.</p>
<p>The idea is therefore to combine observable and ansatz in a single parametrized observable:</p>
<p>$$\mathcal D(\theta) = U(\theta)^\dagger OU (\theta)$$</p><p>Since any observable can be expressed as a linear combination of Hermitian matrices, one can express the observable \(\mathcal D(\theta)\) as a linear combination:</p>
<p>$$\mathcal D(\theta) = \sum_{i=1}^q a_i(\theta)\mathcal D_i$$</p><p>This comes from the fact that \(U(\theta)\) can be written as the product of unitary matrices:</p>
<p>\[U(\theta) = \prod_{i=1}^s U_i(\theta_i)\]</p><p>and because of Stone’s theorem, \(U(\theta)\) can be written as:</p>
<p>\[U(\theta) = \prod_{i=1}^s W_ie^{j\theta_iH_i}V_i\]</p><p>where \(j\) is the imaginary unit, \(W_i\) and \(V_i\) are fixed matrices and \(H_i\) are Hermitian matrices.</p>
<p>Thanks to the Baker–Campbell–Hausdorff identity one can represent \(U(\theta)^\dagger O U(\theta)\) as:</p>
<p>\[\prod_{i=1}^sV_i^\dagger\left(\sum_{k=0}^\infty \frac{[(j\theta_iH_i)^k, W_i^\dagger O W_i]} {k!} \right)V_i=\prod_{i=1}^s\sum_{k=0}^\infty\frac{\theta_i^k}{k!}V_i^\dagger[(jH_i)^k, W_i^\dagger O W_i]V_i\]</p><p>where \([X^{(k)}, Y] = [X, \dots, [X,[X, Y]]]\) denotes the \(k\)-fold nested commutator. Since \(jH_i\) is anti-Hermitian, \([(j\theta_iH_i)^k, W_i^\dagger O W_i]\) is Hermitian for all \(i\), which allows one to rewrite \(U(\theta)^\dagger O U(\theta)\) as a weighted polynomial sum of Hermitian matrices in \(\theta\), yielding:</p>
<p>\[\mathcal D(\theta) = U(\theta)^\dagger OU(\theta) = \sum_{i=1}^q a_i(\theta)\mathcal D_i\]</p><p>Moreover, since any Hermitian operator can be expressed in a basis of Pauli matrices:</p>
<p>\[H \in M_{2\times2}(C)^{\otimes n} \implies H \in \text{span}(\{X, Y, Z, I\}^{\otimes n})\]</p><p>then \(\mathcal D(\theta)\) belongs to the same space, and therefore at most \(4^n\) terms are necessary to express the optimal answer. Considering that a variational quantum circuit takes \(\mathcal O(poly(s))\) parameters to express the optimal solution, while the post-variational approach takes \(\mathcal O(4^n)\) terms, the variational approach has an expressivity advantage, coming from the fact that it is able to generate different observables at higher orders of \(\theta\), something that classical computers cannot achieve.</p>
<p>However, in order to get an approximate solution, one can restrict the number of Hermitian terms used in the post-variational approach to \(\mathcal O(poly(s))\), giving up some expressibility.</p>
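<p>The decomposition \(\mathcal D(\theta) = \sum_i a_i(\theta)\mathcal D_i\) can be made concrete with a small NumPy sketch (toy ansatz and observable, chosen for illustration) that expands \(U(\theta)^\dagger O U(\theta)\) in the Pauli basis via \(a_P = tr(P\mathcal D)/2^n\):</p>
<pre><code class="lang-python">import numpy as np
from functools import reduce
from itertools import product

pauli = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]]),
    "Z": np.diag([1.0 + 0j, -1.0]),
}

n, theta = 2, 0.7
# Toy ansatz U(theta) = exp(-i theta X) on qubit 0, identity on qubit 1
U = np.kron(np.cos(theta) * pauli["I"] - 1j * np.sin(theta) * pauli["X"], pauli["I"])
O = np.kron(pauli["Z"], pauli["I"])  # observable Z on qubit 0
D = U.conj().T @ O @ U

# Expand D(theta) = sum_P a_P(theta) P over the 4^n Pauli strings
for labels in product("IXYZ", repeat=n):
    P = reduce(np.kron, [pauli[l] for l in labels])
    a = (np.trace(P @ D) / 2 ** n).real
    if abs(a) &gt; 1e-12:
        print("".join(labels), round(a, 4))
</code></pre>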
<h3 id="heading-estimation-of-the-parameters-in-the-post-variational-setting">Estimation of the parameters in the post-variational setting</h3>
<p>The estimation in the post-variational setting is:</p>
<p>\[E_\theta = tr\left(\mathcal D(\theta) \rho(x)\right)= \sum_{i=1}^qa_i(\theta)tr\left(D_i\rho(x)\right)\]</p><p>One can consider \(\sum_{i=1}^qa_i(\theta)tr\left(D_i\rho(x)\right)\) as a function \(\mathcal H_\theta: R^q \rightarrow R\) s.t.:</p>
<p>\[E_\theta = \mathcal H_\theta\left(\left\{tr\left(D_i\rho(x)\right)\right\}_{i=1}^q\right)\]</p><p>and exploiting the universal approximation theorem, the function \(\mathcal H_\theta\) can be approximated by a neural network.</p>
<h3 id="heading-design-principles-of-post-variational-quantum-circuits">Design principles of post-variational quantum circuits</h3>
<p>So far, the only challenge of the post-variational design mentioned is the exponential number of possible circuits. However, the post-variational setting has another major challenge: the heuristic choice of fixed circuits and observables. The authors of [1] describe multiple strategies for choosing the observables \(\mathcal D_i\).</p>
<h4 id="heading-ansatz-expansion">Ansatz expansion</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738521125836/24503657-d53a-422b-b87a-66cd552e9333.png" alt class="image--center mx-auto" /></p>
<p>The first strategy outlined is to replace a problem-agnostic parametrized ansatz \(U(\theta)\) coming from a variational quantum circuit with an ensemble of fixed ansatze \(\{U_\alpha\}_{\alpha=1}^p\). The authors use truncated Taylor polynomial expansions of the variational parameters to generate fixed ansatze for the model and use <a target="_blank" href="https://amm.zanotp.com/computing-gradients-of-quantum-circuits-using-parameter-shift-rule">parameter-shift rules</a> to find derivatives of the trace-induced measurements of parameterized quantum circuits.</p>
<p>The full Taylor expansion of \(U^\dagger(\theta)OU(\theta)\) can therefore be expressed as a linear combination of \(U^\dagger(\theta')OU(\theta')\) where \(\theta' \in \{0,\pm\frac \pi 2\}^k\). For a truncation of order \(R\), the number of circuits required is:</p>
<p>\[\sum_{j=0}^R{k\choose j}2^j \in \mathcal O(2^Rk^R)\]</p><p>which scales fast if a deep ansatz is chosen. To reduce the number of circuits required, one can adopt pruning techniques.</p>
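<p>As a quick sanity check of this count, a few lines of Python (using \(k=16\) parameters, the ansatz size used in the example below):</p>
<pre><code class="lang-python">from math import comb

def n_circuits(k: int, R: int) -&gt; int:
    # number of shifted circuits: sum over j = 0..R of C(k, j) * 2^j
    return sum(comb(k, j) * 2 ** j for j in range(R + 1))

for R in range(1, 4):
    print(R, n_circuits(16, R))
# 1 33
# 2 513
# 3 4993
</code></pre>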
<h4 id="heading-observable-construction">Observable construction</h4>
<p>The observable construction strategy decomposes the parametrized observable \(\mathcal D(\theta)\) against a basis of quantum observables s.t.:</p>
<p>\[\mathcal D(\theta^*)\rightarrow \mathcal D(\alpha)= \sum_{P\in\{X; Y; Z; I\}^{\otimes n}}\alpha_P P\]</p><p>The real problem of this strategy is that it scales exponentially with the number of qubits in the system, therefore a heuristic selection is necessary. Considering all Pauli observables within a locality \(L\) is considered a good heuristic, since most physical Hamiltonians are local as well. If the target observable is \(L\)-local, one can exploit the classical shadows method [2] to reduce the number of measurements required while obtaining the same additive error term. In the case that the observables are the complete set of \(L\)-local Paulis, the number of observables required is:</p>
<p>\[\sum_{j=0}^L{n\choose j}3^j \in \mathcal O(3^Ln^L)\]</p><p>while if the classical shadow method is used, the number of random measurements of the circuit is:</p>
<p>\[\mathcal O(3^LL\log n)\]</p><h4 id="heading-hybrid-approach">Hybrid approach</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738522315813/350cc2d4-3d2a-4e76-8a4e-1ae7ffc96f51.png" alt class="image--center mx-auto" /></p>
<p>One might want to use ansatz circuits during the construction of observables, in order to increase the expressivity of the model. A strategy might therefore combine both the ansatz expansion strategy and the observable construction strategy.</p>
<p>The idea is that, instead of directly expanding \(U(\theta)\) in \(\mathcal D(\theta) = U^\dagger (\theta)OU(\theta)\), the ansatz is split into two unitaries:</p>
<p>\[U(\theta)=U_B(\theta)U_A(\theta)\]</p><p>and therefore:</p>
<p>\[\mathcal D(\theta) =U_A^\dagger(\theta)U_B^\dagger(\theta)OU_B(\theta)U_A(\theta)\]</p><p>Letting \(\mathcal D'(\theta) = U^\dagger_B(\theta) OU_B(\theta)\), it can be decomposed into a linear combination of Paulis using the observable construction strategy. On the other hand, the remaining ansatz \(U_A(\theta)\) can be expanded using the ansatz expansion method. Lastly, pruning techniques can be used to reduce the number of circuits.</p>
<h4 id="heading-numerically-comparing-post-variational-and-variational-quantum-neural-network">Numerically comparing Post-Variational and Variational Quantum Neural Network</h4>
<p>The following example demonstrates how to employ the post-variational quantum neural network on the classical machine learning task of image classification. The example comes from the <a target="_blank" href="https://pennylane.ai/qml/demos/tutorial_post-variational_quantum_neural_networks">Pennylane documentation</a>.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pennylane <span class="hljs-keyword">as</span> qml
<span class="hljs-keyword">from</span> pennylane <span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> jax
<span class="hljs-keyword">from</span> jax <span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> jnp
<span class="hljs-keyword">import</span> optax
<span class="hljs-keyword">from</span> itertools <span class="hljs-keyword">import</span> combinations
<span class="hljs-keyword">from</span> sklearn.datasets <span class="hljs-keyword">import</span> load_digits
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-keyword">from</span> sklearn.neural_network <span class="hljs-keyword">import</span> MLPClassifier
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> log_loss
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> matplotlib.colors
<span class="hljs-keyword">import</span> warnings
warnings.filterwarnings(<span class="hljs-string">"ignore"</span>)
np.random.seed(<span class="hljs-number">42</span>)

<span class="hljs-comment"># Load the digits dataset with features (X_digits) and labels (y_digits)</span>
X_digits, y_digits = load_digits(return_X_y=<span class="hljs-literal">True</span>)

<span class="hljs-comment"># Create a boolean mask to filter out only the samples where the label is 2 or 6</span>
filter_mask = np.isin(y_digits, [<span class="hljs-number">2</span>, <span class="hljs-number">6</span>])

<span class="hljs-comment"># Apply the filter mask to the features and labels to keep only the selected digits</span>
X_digits = X_digits[filter_mask]
y_digits = y_digits[filter_mask]

<span class="hljs-comment"># Split the filtered dataset into training and testing sets with 10% of data reserved for testing</span>
X_train, X_test, y_train, y_test = train_test_split(
    X_digits, y_digits, test_size=<span class="hljs-number">0.1</span>, random_state=<span class="hljs-number">42</span>
)

<span class="hljs-comment"># Normalize the pixel values in the training and testing data</span>
<span class="hljs-comment"># Convert each image from a 1D array to an 8x8 2D array, normalize pixel values, and scale them</span>
X_train = np.array([thing.reshape([<span class="hljs-number">8</span>, <span class="hljs-number">8</span>]) / <span class="hljs-number">16</span> * <span class="hljs-number">2</span> * np.pi <span class="hljs-keyword">for</span> thing <span class="hljs-keyword">in</span> X_train])
X_test = np.array([thing.reshape([<span class="hljs-number">8</span>, <span class="hljs-number">8</span>]) / <span class="hljs-number">16</span> * <span class="hljs-number">2</span> * np.pi <span class="hljs-keyword">for</span> thing <span class="hljs-keyword">in</span> X_test])

<span class="hljs-comment"># Adjust the labels to be centered around 0 and scaled to be in the range -1 to 1</span>
<span class="hljs-comment"># The original labels (2 and 6) are mapped to -1 and 1 respectively</span>
y_train = (y_train - <span class="hljs-number">4</span>) / <span class="hljs-number">2</span>
y_test = (y_test - <span class="hljs-number">4</span>) / <span class="hljs-number">2</span>
</code></pre>
<p>To visualize some of the digits:</p>
<pre><code class="lang-python">fig, axes = plt.subplots(nrows=<span class="hljs-number">2</span>, ncols=<span class="hljs-number">3</span>, layout=<span class="hljs-string">"constrained"</span>)
<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">2</span>):
    <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> range(<span class="hljs-number">3</span>):
      axes[i][j].matshow(X_train[<span class="hljs-number">2</span>*(<span class="hljs-number">2</span>*j+i)])
      axes[i][j].axis(<span class="hljs-string">'off'</span>)
fig.subplots_adjust(hspace=<span class="hljs-number">0.0</span>)
fig.tight_layout()
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738523659827/a6f57ed5-c67a-4f42-a5a0-543f5ab1e951.png" alt class="image--center mx-auto" /></p>
<p>Now it’s time to train the QML models:</p>
<ul>
<li><p>we first embed our data through a series of rotation gates</p>
</li>
<li><p>we then apply an ansatz of rotation gates with trainable weights</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">feature_map</span>(<span class="hljs-params">features</span>):</span>
    <span class="hljs-comment"># Apply Hadamard gates to all qubits to create an equal superposition state</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(features[<span class="hljs-number">0</span>])):
        qml.Hadamard(i)

    <span class="hljs-comment"># Apply angle embeddings based on the feature values</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(features)):
        <span class="hljs-comment"># For odd-indexed features, use Z-rotation in the angle embedding</span>
        <span class="hljs-keyword">if</span> i % <span class="hljs-number">2</span>:
            qml.AngleEmbedding(features=features[i], wires=range(<span class="hljs-number">8</span>), rotation=<span class="hljs-string">"Z"</span>)
        <span class="hljs-comment"># For even-indexed features, use X-rotation in the angle embedding</span>
        <span class="hljs-keyword">else</span>:
            qml.AngleEmbedding(features=features[i], wires=range(<span class="hljs-number">8</span>), rotation=<span class="hljs-string">"X"</span>)

<span class="hljs-comment"># Define the ansatz (quantum circuit ansatz) for parameterized quantum operations</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">ansatz</span>(<span class="hljs-params">params</span>):</span>
    <span class="hljs-comment"># Apply RY rotations with the first set of parameters</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">8</span>):
        qml.RY(params[i], wires=i)

    <span class="hljs-comment"># Apply CNOT gates with adjacent qubits (cyclically connected) to create entanglement</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">8</span>):
        qml.CNOT(wires=[(i - <span class="hljs-number">1</span>) % <span class="hljs-number">8</span>, (i) % <span class="hljs-number">8</span>])

    <span class="hljs-comment"># Apply RY rotations with the second set of parameters</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">8</span>):
        qml.RY(params[i + <span class="hljs-number">8</span>], wires=i)

    <span class="hljs-comment"># Apply CNOT gates with qubits in reverse order (cyclically connected)</span>
    <span class="hljs-comment"># to create additional entanglement</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">8</span>):
        qml.CNOT(wires=[(<span class="hljs-number">8</span> - <span class="hljs-number">2</span> - i) % <span class="hljs-number">8</span>, (<span class="hljs-number">8</span> - i - <span class="hljs-number">1</span>) % <span class="hljs-number">8</span>])
</code></pre>
<p>We first test the performance of a shallow variational algorithm on the digits dataset:</p>
<pre><code class="lang-python">dev = qml.device(<span class="hljs-string">"default.qubit"</span>, wires=<span class="hljs-number">8</span>)


<span class="hljs-meta">@qml.qnode(dev)</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">circuit</span>(<span class="hljs-params">params, features</span>):</span>
    feature_map(features)
    ansatz(params)
    <span class="hljs-keyword">return</span> qml.expval(qml.PauliZ(<span class="hljs-number">0</span>))


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">variational_classifier</span>(<span class="hljs-params">weights, bias, x</span>):</span>
    <span class="hljs-keyword">return</span> circuit(weights, x) + bias


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">square_loss</span>(<span class="hljs-params">labels, predictions</span>):</span>
    <span class="hljs-keyword">return</span> np.mean((labels - qml.math.stack(predictions)) ** <span class="hljs-number">2</span>)


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">accuracy</span>(<span class="hljs-params">labels, predictions</span>):</span>
    acc = sum([np.sign(l) == np.sign(p) <span class="hljs-keyword">for</span> l, p <span class="hljs-keyword">in</span> zip(labels, predictions)])
    acc = acc / len(labels)
    <span class="hljs-keyword">return</span> acc


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">cost</span>(<span class="hljs-params">params, X, Y</span>):</span>
    predictions = [variational_classifier(params[<span class="hljs-string">"weights"</span>], params[<span class="hljs-string">"bias"</span>], x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> X]
    <span class="hljs-keyword">return</span> square_loss(Y, predictions)


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">acc</span>(<span class="hljs-params">params, X, Y</span>):</span>
    predictions = [variational_classifier(params[<span class="hljs-string">"weights"</span>], params[<span class="hljs-string">"bias"</span>], x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> X]
    <span class="hljs-keyword">return</span> accuracy(Y, predictions)


np.random.seed(<span class="hljs-number">0</span>)
weights = <span class="hljs-number">0.01</span> * np.random.randn(<span class="hljs-number">16</span>)
bias = jnp.array(<span class="hljs-number">0.0</span>)
params = {<span class="hljs-string">"weights"</span>: weights, <span class="hljs-string">"bias"</span>: bias}
opt = optax.adam(<span class="hljs-number">0.05</span>)
batch_size = <span class="hljs-number">7</span>
num_batch = X_train.shape[<span class="hljs-number">0</span>] // batch_size
opt_state = opt.init(params)
X_batched = X_train.reshape([<span class="hljs-number">-1</span>, batch_size, <span class="hljs-number">8</span>, <span class="hljs-number">8</span>])
y_batched = y_train.reshape([<span class="hljs-number">-1</span>, batch_size])


<span class="hljs-meta">@jax.jit</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">update_step_jit</span>(<span class="hljs-params">i, args</span>):</span>
    params, opt_state, data, targets, batch_no = args
    _data = data[batch_no % num_batch]
    _targets = targets[batch_no % num_batch]
    _, grads = jax.value_and_grad(cost)(params, _data, _targets)
    updates, opt_state = opt.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    <span class="hljs-keyword">return</span> (params, opt_state, data, targets, batch_no + <span class="hljs-number">1</span>)


<span class="hljs-meta">@jax.jit</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">optimization_jit</span>(<span class="hljs-params">params, data, targets</span>):</span>
    opt_state = opt.init(params)
    args = (params, opt_state, data, targets, <span class="hljs-number">0</span>)
    (params, opt_state, _, _, _) = jax.lax.fori_loop(<span class="hljs-number">0</span>, <span class="hljs-number">200</span>, update_step_jit, args)
    <span class="hljs-keyword">return</span> params


params = optimization_jit(params, X_batched, y_batched)
var_train_acc = acc(params, X_train, y_train)
var_test_acc = acc(params, X_test, y_test)

print(<span class="hljs-string">"Training accuracy: "</span>, var_train_acc)
print(<span class="hljs-string">"Testing accuracy: "</span>, var_test_acc)

<span class="hljs-comment"># Training accuracy:  0.7484472049689441</span>
<span class="hljs-comment"># Testing accuracy:  0.6944444444444444</span>
</code></pre>
<p>The observable construction heuristic:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">local_pauli_group</span>(<span class="hljs-params">qubits: int, locality: int</span>):</span>
    <span class="hljs-keyword">assert</span> locality &lt;= qubits, <span class="hljs-string">f"Locality must not exceed the number of qubits."</span>
    <span class="hljs-keyword">return</span> list(generate_paulis(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-string">""</span>, qubits, locality))

<span class="hljs-comment"># This is a recursive generator function that constructs Pauli strings.</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_paulis</span>(<span class="hljs-params">identities: int, paulis: int, output: str, qubits: int, locality: int</span>):</span>
    <span class="hljs-comment"># Base case: if the output string's length matches the number of qubits, yield it.</span>
    <span class="hljs-keyword">if</span> len(output) == qubits:
        <span class="hljs-keyword">yield</span> output
    <span class="hljs-keyword">else</span>:
        <span class="hljs-comment"># Recursive case: add an "I" (identity) to the output string.</span>
        <span class="hljs-keyword">yield</span> <span class="hljs-keyword">from</span> generate_paulis(identities + <span class="hljs-number">1</span>, paulis, output + <span class="hljs-string">"I"</span>, qubits, locality)

        <span class="hljs-comment"># If the number of Pauli operators used is less than the locality, add "X", "Y", or "Z"</span>
        <span class="hljs-comment"># systematically builds all possible Pauli strings that conform to the specified locality.</span>
        <span class="hljs-keyword">if</span> paulis &lt; locality:
            <span class="hljs-keyword">yield</span> <span class="hljs-keyword">from</span> generate_paulis(identities, paulis + <span class="hljs-number">1</span>, output + <span class="hljs-string">"X"</span>, qubits, locality)
            <span class="hljs-keyword">yield</span> <span class="hljs-keyword">from</span> generate_paulis(identities, paulis + <span class="hljs-number">1</span>, output + <span class="hljs-string">"Y"</span>, qubits, locality)
            <span class="hljs-keyword">yield</span> <span class="hljs-keyword">from</span> generate_paulis(identities, paulis + <span class="hljs-number">1</span>, output + <span class="hljs-string">"Z"</span>, qubits, locality)
</code></pre>
<p>For each image sample, we measure the output of the quantum circuit using the \(k\)-local observables sequence, and perform logistic regression on these outputs:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Initialize lists to store training and testing accuracies for different localities.</span>
train_accuracies_O = []
test_accuracies_O = []

<span class="hljs-keyword">for</span> locality <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">4</span>):
    print(str(locality) + <span class="hljs-string">"-local: "</span>)

    <span class="hljs-comment"># Define a quantum device with 8 qubits using the default simulator.</span>
    dev = qml.device(<span class="hljs-string">"default.qubit"</span>, wires=<span class="hljs-number">8</span>)

    <span class="hljs-comment"># Define a quantum node (qnode) with the quantum circuit that will be executed on the device.</span>
<span class="hljs-meta">    @qml.qnode(dev)</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">circuit</span>(<span class="hljs-params">features</span>):</span>
        <span class="hljs-comment"># Generate all possible Pauli strings for the given locality.</span>
        measurements = local_pauli_group(<span class="hljs-number">8</span>, locality)

        <span class="hljs-comment"># Apply the feature map to encode classical data into quantum states.</span>
        feature_map(features)

        <span class="hljs-comment"># Measure the expectation values of the generated Pauli operators.</span>
        <span class="hljs-keyword">return</span> [qml.expval(qml.pauli.string_to_pauli_word(measurement)) <span class="hljs-keyword">for</span> measurement <span class="hljs-keyword">in</span> measurements]

    <span class="hljs-comment"># Vectorize the quantum circuit function to apply it to multiple data points in parallel.</span>
    vcircuit = jax.vmap(circuit)

    <span class="hljs-comment"># Transform the training and testing datasets by applying the quantum circuit.</span>
    new_X_train = np.asarray(vcircuit(jnp.array(X_train))).T
    new_X_test = np.asarray(vcircuit(jnp.array(X_test))).T

    <span class="hljs-comment"># Train a Multilayer Perceptron (MLP) classifier on the transformed training data.</span>
    clf = MLPClassifier(early_stopping=<span class="hljs-literal">True</span>).fit(new_X_train, y_train)

    <span class="hljs-comment"># Print the log loss for the training data.</span>
    print(<span class="hljs-string">"Training loss: "</span>, log_loss(y_train, clf.predict_proba(new_X_train)))

    <span class="hljs-comment"># Print the log loss for the testing data.</span>
    print(<span class="hljs-string">"Testing loss: "</span>, log_loss(y_test, clf.predict_proba(new_X_test)))

    <span class="hljs-comment"># Calculate and store the training accuracy.</span>
    acc = clf.score(new_X_train, y_train)
    train_accuracies_O.append(acc)
    print(<span class="hljs-string">"Training accuracy: "</span>, acc)

    <span class="hljs-comment"># Calculate and store the testing accuracy.</span>
    acc = clf.score(new_X_test, y_test)
    test_accuracies_O.append(acc)
    print(<span class="hljs-string">"Testing accuracy: "</span>, acc)
    print()

locality = (<span class="hljs-string">"1-local"</span>, <span class="hljs-string">"2-local"</span>, <span class="hljs-string">"3-local"</span>)
train_accuracies_O = [round(value, <span class="hljs-number">2</span>) <span class="hljs-keyword">for</span> value <span class="hljs-keyword">in</span> train_accuracies_O]
test_accuracies_O = [round(value, <span class="hljs-number">2</span>) <span class="hljs-keyword">for</span> value <span class="hljs-keyword">in</span> test_accuracies_O]
x = np.arange(<span class="hljs-number">3</span>)
width = <span class="hljs-number">0.25</span>

<span class="hljs-comment"># Create a bar plot to visualize the training and testing accuracies.</span>
fig, ax = plt.subplots(layout=<span class="hljs-string">"constrained"</span>)
<span class="hljs-comment"># Training accuracy bars:</span>
rects = ax.bar(x, train_accuracies_O, width, label=<span class="hljs-string">"Training"</span>, color=<span class="hljs-string">"#FF87EB"</span>)
<span class="hljs-comment"># Testing accuracy bars:</span>
rects = ax.bar(x + width, test_accuracies_O, width, label=<span class="hljs-string">"Testing"</span>, color=<span class="hljs-string">"#70CEFF"</span>)
ax.bar_label(rects, padding=<span class="hljs-number">3</span>)
ax.set_xlabel(<span class="hljs-string">"Locality"</span>)
ax.set_ylabel(<span class="hljs-string">"Accuracy"</span>)
ax.set_title(<span class="hljs-string">"Accuracy of different localities"</span>)
ax.set_xticks(x + width / <span class="hljs-number">2</span>, locality)
ax.legend(loc=<span class="hljs-string">"upper left"</span>)
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738524022642/fd73254e-3624-40bc-886e-c8229ded11c2.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-python"><span class="hljs-number">1</span>-local:
Training loss:  <span class="hljs-number">0.4592314401681531</span>
Testing loss:  <span class="hljs-number">0.5045886276497531</span>
Training accuracy:  <span class="hljs-number">0.8074534161490683</span>
Testing accuracy:  <span class="hljs-number">0.7222222222222222</span>

<span class="hljs-number">2</span>-local:
Training loss:  <span class="hljs-number">0.43242776810519556</span>
Testing loss:  <span class="hljs-number">0.5718358099121</span>
Training accuracy:  <span class="hljs-number">0.860248447204969</span>
Testing accuracy:  <span class="hljs-number">0.7222222222222222</span>

<span class="hljs-number">3</span>-local:
Training loss:  <span class="hljs-number">0.42526261814808347</span>
Testing loss:  <span class="hljs-number">0.574942133390183</span>
Training accuracy:  <span class="hljs-number">0.9316770186335404</span>
Testing accuracy:  <span class="hljs-number">0.75</span>
</code></pre>
<p>The ansatz expansion approach:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">deriv_params</span>(<span class="hljs-params">thetas: int, order: int</span>):</span>
    <span class="hljs-comment"># This function generates parameter shift values for calculating derivatives</span>
    <span class="hljs-comment"># of a quantum circuit.</span>
    <span class="hljs-comment"># 'thetas' is the number of parameters in the circuit.</span>
    <span class="hljs-comment"># 'order' determines the order of the derivative to calculate (1st order, 2nd order, etc.).</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_shifts</span>(<span class="hljs-params">thetas: int, order: int</span>):</span>
        <span class="hljs-comment"># Generate all possible combinations of parameters to shift for the given order.</span>
        shift_pos = list(combinations(np.arange(thetas), order))

        <span class="hljs-comment"># Initialize a 3D array to hold the shift values.</span>
        <span class="hljs-comment"># Shape: (number of combinations, 2^order, thetas)</span>
        params = np.zeros((len(shift_pos), <span class="hljs-number">2</span> ** order, thetas))

        <span class="hljs-comment"># Iterate over each combination of parameter shifts.</span>
        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(shift_pos)):
            <span class="hljs-comment"># Iterate over each possible binary shift pattern for the given order.</span>
            <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> range(<span class="hljs-number">2</span> ** order):
                <span class="hljs-comment"># Convert the index j to a binary string of length 'order'.</span>
                <span class="hljs-keyword">for</span> k, l <span class="hljs-keyword">in</span> enumerate(<span class="hljs-string">f"<span class="hljs-subst">{j:<span class="hljs-number">0</span>{order}</span>b}"</span>):
                    <span class="hljs-comment"># For each bit in the binary string:</span>
                    <span class="hljs-keyword">if</span> int(l) &gt; <span class="hljs-number">0</span>:
                        <span class="hljs-comment"># If the bit is 1, increment the corresponding parameter.</span>
                        params[i][j][shift_pos[i][k]] += <span class="hljs-number">1</span>
                    <span class="hljs-keyword">else</span>:
                        <span class="hljs-comment"># If the bit is 0, decrement the corresponding parameter.</span>
                        params[i][j][shift_pos[i][k]] -= <span class="hljs-number">1</span>

        <span class="hljs-comment"># Reshape the parameters array to collapse the first two dimensions.</span>
        params = np.reshape(params, (<span class="hljs-number">-1</span>, thetas))
        <span class="hljs-keyword">return</span> params

    <span class="hljs-comment"># Start with a list containing a zero-shift array for all parameters.</span>
    param_list = [np.zeros((<span class="hljs-number">1</span>, thetas))]

    <span class="hljs-comment"># Append the generated shift values for each order from 1 to the given order.</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, order + <span class="hljs-number">1</span>):
        param_list.append(generate_shifts(thetas, i))

    <span class="hljs-comment"># Concatenate all the shift arrays along the first axis to create the final parameter array.</span>
    params = np.concatenate(param_list, axis=<span class="hljs-number">0</span>)

    <span class="hljs-comment"># Scale the shift values by π/2.</span>
    params *= np.pi / <span class="hljs-number">2</span>

    <span class="hljs-keyword">return</span> params

n_wires = <span class="hljs-number">8</span>
dev = qml.device(<span class="hljs-string">"default.qubit"</span>, wires=n_wires)

<span class="hljs-meta">@jax.jit</span>
<span class="hljs-meta">@qml.qnode(dev, interface="jax")</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">circuit</span>(<span class="hljs-params">features, params, n_wires=<span class="hljs-number">8</span></span>):</span>
    feature_map(features)
    ansatz(params)
    <span class="hljs-keyword">return</span> qml.expval(qml.PauliZ(<span class="hljs-number">0</span>))
</code></pre>
<p>For each image sample, measure the outputs of each parameterised circuit for each feature, and feed the outputs into a multilayer perceptron:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Initialize lists to store training and testing accuracies for different derivative orders.</span>
train_accuracies_AE = []
test_accuracies_AE = []

<span class="hljs-comment"># Loop through different derivative orders (1st order, 2nd order, 3rd order).</span>
<span class="hljs-keyword">for</span> order <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">4</span>):
    print(<span class="hljs-string">"Order number: "</span> + str(order))

    <span class="hljs-comment"># Generate the parameter shifts required for the given derivative order.</span>
    to_measure = deriv_params(<span class="hljs-number">16</span>, order)

    <span class="hljs-comment"># Transform the training dataset by applying the quantum circuit with the</span>
    <span class="hljs-comment"># generated parameter shifts.</span>
    new_X_train = []
    <span class="hljs-keyword">for</span> thing <span class="hljs-keyword">in</span> X_train:
        result = circuit(thing, to_measure.T)
        new_X_train.append(result)

    <span class="hljs-comment"># Transform the testing dataset similarly.</span>
    new_X_test = []
    <span class="hljs-keyword">for</span> thing <span class="hljs-keyword">in</span> X_test:
        result = circuit(thing, to_measure.T)
        new_X_test.append(result)

    <span class="hljs-comment"># Train a Multilayer Perceptron (MLP) classifier on the transformed training data.</span>
    clf = MLPClassifier(early_stopping=<span class="hljs-literal">True</span>).fit(new_X_train, y_train)

    <span class="hljs-comment"># Print the log loss for the training data.</span>
    print(<span class="hljs-string">"Training loss: "</span>, log_loss(y_train, clf.predict_proba(new_X_train)))

    <span class="hljs-comment"># Print the log loss for the testing data.</span>
    print(<span class="hljs-string">"Testing loss: "</span>, log_loss(y_test, clf.predict_proba(new_X_test)))

    <span class="hljs-comment"># Calculate and store the training accuracy.</span>
    acc = clf.score(new_X_train, y_train)
    train_accuracies_AE.append(acc)
    print(<span class="hljs-string">"Training accuracy: "</span>, acc)

    <span class="hljs-comment"># Calculate and store the testing accuracy.</span>
    acc = clf.score(new_X_test, y_test)
    test_accuracies_AE.append(acc)
    print(<span class="hljs-string">"Testing accuracy: "</span>, acc)
    print()

locality = (<span class="hljs-string">"1-order"</span>, <span class="hljs-string">"2-order"</span>, <span class="hljs-string">"3-order"</span>)
train_accuracies_AE = [round(value, <span class="hljs-number">2</span>) <span class="hljs-keyword">for</span> value <span class="hljs-keyword">in</span> train_accuracies_AE]
test_accuracies_AE = [round(value, <span class="hljs-number">2</span>) <span class="hljs-keyword">for</span> value <span class="hljs-keyword">in</span> test_accuracies_AE]
x = np.arange(<span class="hljs-number">3</span>)
width = <span class="hljs-number">0.25</span>
fig, ax = plt.subplots(layout=<span class="hljs-string">"constrained"</span>)
rects = ax.bar(x, train_accuracies_AE, width, label=<span class="hljs-string">"Training"</span>, color=<span class="hljs-string">"#FF87EB"</span>)
ax.bar_label(rects, padding=<span class="hljs-number">3</span>)
rects = ax.bar(x + width, test_accuracies_AE, width, label=<span class="hljs-string">"Testing"</span>, color=<span class="hljs-string">"#70CEFF"</span>)
ax.bar_label(rects, padding=<span class="hljs-number">3</span>)
ax.set_xlabel(<span class="hljs-string">"Order"</span>)
ax.set_ylabel(<span class="hljs-string">"Accuracy"</span>)
ax.set_title(<span class="hljs-string">"Accuracy of different derivative orders"</span>)
ax.set_xticks(x + width / <span class="hljs-number">2</span>, locality)
ax.legend(loc=<span class="hljs-string">"upper left"</span>)
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738524176263/f17979db-ac09-4eb7-97a2-af007b24ceea.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-python">Order number: <span class="hljs-number">1</span>
Training loss:  <span class="hljs-number">0.6917395840118673</span>
Testing loss:  <span class="hljs-number">0.6898366117810784</span>
Training accuracy:  <span class="hljs-number">0.5093167701863354</span>
Testing accuracy:  <span class="hljs-number">0.5555555555555556</span>

Order number: <span class="hljs-number">2</span>
Training loss:  <span class="hljs-number">0.6326009058014004</span>
Testing loss:  <span class="hljs-number">0.6157803899808801</span>
Training accuracy:  <span class="hljs-number">0.7018633540372671</span>
Testing accuracy:  <span class="hljs-number">0.6666666666666666</span>

Order number: <span class="hljs-number">3</span>
Training loss:  <span class="hljs-number">0.5815839249054562</span>
Testing loss:  <span class="hljs-number">0.6016181640099203</span>
Training accuracy:  <span class="hljs-number">0.7142857142857143</span>
Testing accuracy:  <span class="hljs-number">0.6944444444444444</span>
</code></pre>
<p>Regarding the hybrid strategy:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Initialize matrices to store training and testing accuracies for different</span>
<span class="hljs-comment"># combinations of locality and order.</span>
train_accuracies = np.zeros([<span class="hljs-number">4</span>, <span class="hljs-number">4</span>])
test_accuracies = np.zeros([<span class="hljs-number">4</span>, <span class="hljs-number">4</span>])

<span class="hljs-comment"># Loop through different derivative orders (1st to 3rd) and localities (1-local to 3-local).</span>
<span class="hljs-keyword">for</span> order <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">4</span>):
    <span class="hljs-keyword">for</span> locality <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">4</span>):
        <span class="hljs-comment"># Skip invalid combinations where locality + order exceeds 3 or equals 0.</span>
        <span class="hljs-keyword">if</span> locality + order &gt; <span class="hljs-number">3</span> <span class="hljs-keyword">or</span> locality + order == <span class="hljs-number">0</span>:
            <span class="hljs-keyword">continue</span>
        print(<span class="hljs-string">"Locality: "</span> + str(locality) + <span class="hljs-string">" Order: "</span> + str(order))

        <span class="hljs-comment"># Define a quantum device with 8 qubits using the default simulator.</span>
        dev = qml.device(<span class="hljs-string">"default.qubit"</span>, wires=<span class="hljs-number">8</span>)

        <span class="hljs-comment"># Generate the parameter shifts required for the given derivative order and transpose them.</span>
        params = deriv_params(<span class="hljs-number">16</span>, order).T

        <span class="hljs-comment"># Define a quantum node (qnode) with the quantum circuit that will be executed on the device.</span>
<span class="hljs-meta">        @qml.qnode(dev)</span>
        <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">circuit</span>(<span class="hljs-params">features, params</span>):</span>
            <span class="hljs-comment"># Generate the Pauli group for the given locality.</span>
            measurements = local_pauli_group(<span class="hljs-number">8</span>, locality)
            feature_map(features)
            ansatz(params)
            <span class="hljs-comment"># Measure the expectation values of the generated Pauli operators.</span>
            <span class="hljs-keyword">return</span> [qml.expval(qml.pauli.string_to_pauli_word(measurement)) <span class="hljs-keyword">for</span> measurement <span class="hljs-keyword">in</span> measurements]

        <span class="hljs-comment"># Vectorize the quantum circuit function to apply it to multiple data points in parallel.</span>
        vcircuit = jax.vmap(circuit)

        <span class="hljs-comment"># Transform the training dataset by applying the quantum circuit with the</span>
        <span class="hljs-comment"># generated parameter shifts.</span>
        new_X_train = np.asarray(
            vcircuit(jnp.array(X_train), jnp.array([params <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(X_train))]))
        )
        <span class="hljs-comment"># Reorder the axes and reshape the transformed data for input into the classifier.</span>
        new_X_train = np.moveaxis(new_X_train, <span class="hljs-number">0</span>, <span class="hljs-number">-1</span>).reshape(
            <span class="hljs-number">-1</span>, len(local_pauli_group(<span class="hljs-number">8</span>, locality)) * len(deriv_params(<span class="hljs-number">16</span>, order))
        )

        <span class="hljs-comment"># Transform the testing dataset similarly.</span>
        new_X_test = np.asarray(
            vcircuit(jnp.array(X_test), jnp.array([params <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(X_test))]))
        )
        <span class="hljs-comment"># Reorder the axes and reshape the transformed data for input into the classifier.</span>
        new_X_test = np.moveaxis(new_X_test, <span class="hljs-number">0</span>, <span class="hljs-number">-1</span>).reshape(
            <span class="hljs-number">-1</span>, len(local_pauli_group(<span class="hljs-number">8</span>, locality)) * len(deriv_params(<span class="hljs-number">16</span>, order))
        )

        <span class="hljs-comment"># Train a Multilayer Perceptron (MLP) classifier on the transformed training data.</span>
        clf = MLPClassifier(early_stopping=<span class="hljs-literal">True</span>).fit(new_X_train, y_train)

        <span class="hljs-comment"># Calculate and store the training and testing accuracies.</span>
        train_accuracies[order][locality] = clf.score(new_X_train, y_train)
        test_accuracies[order][locality] = clf.score(new_X_test, y_test)

        print(<span class="hljs-string">"Training loss: "</span>, log_loss(y_train, clf.predict_proba(new_X_train)))
        print(<span class="hljs-string">"Testing loss: "</span>, log_loss(y_test, clf.predict_proba(new_X_test)))
        acc = clf.score(new_X_train, y_train)
        train_accuracies[locality][order] = acc
        print(<span class="hljs-string">"Training accuracy: "</span>, acc)
        acc = clf.score(new_X_test, y_test)
        test_accuracies[locality][order] = acc
        print(<span class="hljs-string">"Testing accuracy: "</span>, acc)
        print()

<span class="hljs-comment"># Locality: 1 Order: 1</span>
<span class="hljs-comment"># Training loss:  0.29433122335335293</span>
<span class="hljs-comment"># Testing loss:  0.48158001426002656</span>
<span class="hljs-comment"># Training accuracy:  0.8944099378881988</span>
<span class="hljs-comment"># Testing accuracy:  0.7777777777777778</span>

<span class="hljs-comment"># Locality: 2 Order: 1</span>
<span class="hljs-comment"># Training loss:  0.32784353109905134</span>
<span class="hljs-comment"># Testing loss:  0.571967578071357</span>
<span class="hljs-comment"># Training accuracy:  0.8664596273291926</span>
<span class="hljs-comment"># Testing accuracy:  0.75</span>

<span class="hljs-comment"># Locality: 1 Order: 2</span>
<span class="hljs-comment"># Training loss:  0.20260000718215349</span>
<span class="hljs-comment"># Testing loss:  0.5550612230165831</span>
<span class="hljs-comment"># Training accuracy:  0.9409937888198758</span>
<span class="hljs-comment"># Testing accuracy:  0.75</span>
</code></pre>
<p>Plotting all the post-variational strategies together:</p>
<pre><code class="lang-python"><span class="hljs-keyword">for</span> locality <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">4</span>):
    train_accuracies[locality][<span class="hljs-number">0</span>] = train_accuracies_O[locality - <span class="hljs-number">1</span>]
    test_accuracies[locality][<span class="hljs-number">0</span>] = test_accuracies_O[locality - <span class="hljs-number">1</span>]
<span class="hljs-keyword">for</span> order <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">4</span>):
    train_accuracies[<span class="hljs-number">0</span>][order] = train_accuracies_AE[order - <span class="hljs-number">1</span>]
    test_accuracies[<span class="hljs-number">0</span>][order] = test_accuracies_AE[order - <span class="hljs-number">1</span>]

train_accuracies[<span class="hljs-number">3</span>][<span class="hljs-number">3</span>] = var_train_acc
test_accuracies[<span class="hljs-number">3</span>][<span class="hljs-number">3</span>] = var_test_acc

cvals = [<span class="hljs-number">0</span>, <span class="hljs-number">0.5</span>, <span class="hljs-number">0.85</span>, <span class="hljs-number">0.95</span>, <span class="hljs-number">1</span>]
colors = [<span class="hljs-string">"black"</span>, <span class="hljs-string">"#C756B2"</span>, <span class="hljs-string">"#FF87EB"</span>, <span class="hljs-string">"#ACE3FF"</span>, <span class="hljs-string">"#D5F0FD"</span>]
norm = plt.Normalize(min(cvals), max(cvals))
tuples = list(zip(map(norm, cvals), colors))
cmap = matplotlib.colors.LinearSegmentedColormap.from_list(<span class="hljs-string">""</span>, tuples)


locality = [<span class="hljs-string">"top qubit\n Pauli-Z"</span>, <span class="hljs-string">"1-local"</span>, <span class="hljs-string">"2-local"</span>, <span class="hljs-string">"3-local"</span>]
order = [<span class="hljs-string">"0th Order"</span>, <span class="hljs-string">"1st Order"</span>, <span class="hljs-string">"2nd Order"</span>, <span class="hljs-string">"3rd Order"</span>]

fig, axes = plt.subplots(nrows=<span class="hljs-number">1</span>, ncols=<span class="hljs-number">2</span>, layout=<span class="hljs-string">"constrained"</span>)
im = axes[<span class="hljs-number">0</span>].imshow(train_accuracies, cmap=cmap, origin=<span class="hljs-string">"lower"</span>)

axes[<span class="hljs-number">0</span>].set_yticks(np.arange(len(locality)), labels=locality)
axes[<span class="hljs-number">0</span>].set_xticks(np.arange(len(order)), labels=order)
plt.setp(axes[<span class="hljs-number">0</span>].get_xticklabels(), rotation=<span class="hljs-number">45</span>, ha=<span class="hljs-string">"right"</span>, rotation_mode=<span class="hljs-string">"anchor"</span>)
<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(locality)):
    <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> range(len(order)):
        text = axes[<span class="hljs-number">0</span>].text(
            j, i, np.round(train_accuracies[i, j], <span class="hljs-number">2</span>), ha=<span class="hljs-string">"center"</span>, va=<span class="hljs-string">"center"</span>, color=<span class="hljs-string">"black"</span>
        )
axes[<span class="hljs-number">0</span>].text(<span class="hljs-number">3</span>, <span class="hljs-number">3</span>, <span class="hljs-string">'\n\n(VQA)'</span>, ha=<span class="hljs-string">"center"</span>, va=<span class="hljs-string">"center"</span>, color=<span class="hljs-string">"black"</span>)

axes[<span class="hljs-number">0</span>].set_title(<span class="hljs-string">"Training Accuracies"</span>)

locality = [<span class="hljs-string">"top qubit\n Pauli-Z"</span>, <span class="hljs-string">"1-local"</span>, <span class="hljs-string">"2-local"</span>, <span class="hljs-string">"3-local"</span>]
order = [<span class="hljs-string">"0th Order"</span>, <span class="hljs-string">"1st Order"</span>, <span class="hljs-string">"2nd Order"</span>, <span class="hljs-string">"3rd Order"</span>]

im = axes[<span class="hljs-number">1</span>].imshow(test_accuracies, cmap=cmap, origin=<span class="hljs-string">"lower"</span>)

axes[<span class="hljs-number">1</span>].set_yticks(np.arange(len(locality)), labels=locality)
axes[<span class="hljs-number">1</span>].set_xticks(np.arange(len(order)), labels=order)
plt.setp(axes[<span class="hljs-number">1</span>].get_xticklabels(), rotation=<span class="hljs-number">45</span>, ha=<span class="hljs-string">"right"</span>, rotation_mode=<span class="hljs-string">"anchor"</span>)
<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(locality)):
    <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> range(len(order)):
        text = axes[<span class="hljs-number">1</span>].text(
            j, i, np.round(test_accuracies[i, j], <span class="hljs-number">2</span>), ha=<span class="hljs-string">"center"</span>, va=<span class="hljs-string">"center"</span>, color=<span class="hljs-string">"black"</span>
        )
axes[<span class="hljs-number">1</span>].text(<span class="hljs-number">3</span>, <span class="hljs-number">3</span>, <span class="hljs-string">'\n\n(VQA)'</span>, ha=<span class="hljs-string">"center"</span>, va=<span class="hljs-string">"center"</span>, color=<span class="hljs-string">"black"</span>)

axes[<span class="hljs-number">1</span>].set_title(<span class="hljs-string">"Test Accuracies"</span>)
fig.tight_layout()
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738524282437/d2386546-e2f4-4b3a-9298-bb603623a186.png" alt class="image--center mx-auto" /></p>
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>If you have any questions or suggestions about what I covered in this article, please add them as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here.</strong></a></p>
<h2 id="heading-sources">Sources:</h2>
<ol>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2307.10560#page=13&amp;zoom=100,249,193">https://arxiv.org/pdf/2307.10560</a></p>
</li>
<li><p><a target="_blank" href="https://www.nature.com/articles/s41567-020-0932-7">https://www.nature.com/articles/s41567-020-0932-7</a></p>
</li>
<li><p><a target="_blank" href="https://www.nature.com/articles/s41467-018-07090-4">https://www.nature.com/articles/s41467-018-07090-4</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2009.11001">https://arxiv.org/pdf/2009.11001</a></p>
</li>
<li><p><a target="_blank" href="https://journals.aps.org/pra/abstract/10.1103/PhysRevA.104.042418">https://journals.aps.org/pra/abstract/10.1103/PhysRevA.104.042418</a></p>
</li>
<li><p><a target="_blank" href="https://journals.aps.org/pra/abstract/10.1103/PhysRevA.105.052445">https://journals.aps.org/pra/abstract/10.1103/PhysRevA.105.052445</a></p>
</li>
<li><p><a target="_blank" href="https://iopscience.iop.org/article/10.1088/1367-2630/ac325f">https://iopscience.iop.org/article/10.1088/1367-2630/ac325f</a></p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Learning Parametric Partial Differential Equations using Fourier Neural Operator]]></title><description><![CDATA[A variety of problems in applied science revolve around solving systems of parametrized partial differential equations. Such systems often exhibit complex and non-linear behaviour and mesh-based methods might therefore require an incredibly fine discr...]]></description><link>https://amm.zanotp.com/fno</link><guid isPermaLink="true">https://amm.zanotp.com/fno</guid><category><![CDATA[neural operators]]></category><category><![CDATA[parametric-pde]]></category><category><![CDATA[pde]]></category><category><![CDATA[scientific-computing]]></category><category><![CDATA[neural networks]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sun, 29 Dec 2024 01:17:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/GzvP-5L2M4A/upload/4c8bc0897ff0a19e6ca491fc998cb110.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A variety of problems in applied science revolve around solving systems of parametrized partial differential equations. Such systems often exhibit complex and non-linear behaviour, and mesh-based methods might therefore require an incredibly fine discretization to precisely capture the model. Traditional numerical solvers thus face a trade-off: coarse grids are fast but less accurate, while fine grids are accurate but slow, so such problems can pose a non-trivial challenge to them.</p>
<p>For this reason, researchers developed a whole new family of methods which learn the solution map of the family of equations directly from data, resulting in faster approximations of the solutions of such PDEs.</p>
<p>This post discusses a specific data-driven approach to approximate parametric PDEs using a specific neural architecture called Fourier Neural Operator (FNO) and concludes presenting a quantum-enhanced version of the FNO.</p>
<h2 id="heading-learning-neural-operators">Learning Neural Operators</h2>
<p>To properly understand the Fourier Neural Operator, a brief introduction to neural operators is necessary. The idea underlying neural operators is to learn mesh-free, infinite-dimensional operators with neural networks: such operators can transfer solutions between different meshes, don’t need to be retrained for each parameter and don’t require any a priori knowledge of the PDE.</p>
<p>Let \(D\) be a bounded domain in \(R^d\) and let \(U(D, R^{d_u})\) and \(\Lambda(D, R^{d_\lambda})\) be two separable Banach spaces of functions where \(R^{d_u}\) and \(R^{d_\lambda}\) are the codomains.</p>
<p>Assume the PDE one wishes to solve is:</p>
<p>$$\mathcal L(u, x, \lambda) = 0$$</p><p>where:</p>
<ul>
<li><p>\(\mathcal L\) is the differential operator</p>
</li>
<li><p>\(u \in U\) is the solution function</p>
</li>
<li><p>\(x\) is the spatial-temporal variable</p>
</li>
<li><p>\(\lambda \in \Lambda\) is the function parametrizing the PDE</p>
</li>
</ul>
<p>Moreover let \(G^\dagger: \Lambda \rightarrow U\) be a map which arises as the solution operator of the parametric PDE. Also let \(\{\lambda_j, u_j\}^n_{j=1}\) be (potentially noisy) observations s.t.</p>
<p>$$G^\dagger(\lambda_j) = u_j$$</p><p>The goal of the neural operator is to approximate \(G^\dagger\) with</p>
<p>$$G_\theta: \Lambda \times \Theta \rightarrow U$$</p><p>where \(\Theta\) is a finite-dimensional space.</p>
<p>Similarly to a finite-dimensional setting, one can now define a cost function</p>
<p>$$C : U \times U\rightarrow R$$</p><p>and seek a minimizer of the problem:</p>
<p>$$\min_{\theta \in \Theta}E_\lambda\left( C\left(G_\theta\left(\lambda\right), G^\dagger\left(\lambda\right)\right)\right)$$</p><p>Of course, learning a neural operator is quite different from learning the solution of a PDE with a fixed parameter \(\lambda\). The large majority of methods to approximate PDEs (including traditional methods and machine learning approaches) would prove impractical if the solution of the PDE is required for many different instances of the parameter \(\lambda\); that’s where the neural operator approach offers a computational advantage.</p>
<p>Moreover, since both \(\lambda_j\) and \(u_j\) are in general functions, to work numerically with the data \(\{\lambda_j, u_j\}^n_{j=1}\) we assume access to point-wise evaluations: let \(D_j = \{x_1, \dots, x_n\}\) be an n-point discretization of the domain \(D\) and assume the functions \(\lambda_j\) and \(u_j\) can be evaluated point-wise over \(D_j\).</p>
<h3 id="heading-defining-the-neural-operator">Defining the Neural Operator</h3>
<p>As proposed in [Li], the neural operator is an iterative architecture which updates the function \(v_j: D \rightarrow R^{d_v}\). The idea is to:</p>
<ul>
<li>represent the input \(\lambda \in \Lambda\) in a higher dimensional representation with the local transformation \(P\):</li>
</ul>
<p>$$v_0(x) = P(\lambda(x))$$</p><ul>
<li>then the function \(v_j\) is updated as follows:</li>
</ul>
<p>$$v_{t+1}(x):=\sigma\left(W v_t(x)+\left(\mathcal{K}(\lambda , \theta) v_t\right)(x)\right), \quad \forall x \in D$$</p><p>where:</p>
<ul>
<li><p>\(\sigma\) is a non-linear activation function</p>
</li>
<li><p>\(W: R^{d_v} \rightarrow R^{d_v}\) is the bias term applied on the spatial domain</p>
</li>
<li><p>\(K: \Lambda \times \Theta_k \rightarrow \mathcal L\left(U\left(D, R^{d_v} \right), U\left(D, R^{d_v} \right)\right)\) is the kernel integral transformation and is parametrized by \(\theta \in \Theta_k\)</p>
</li>
</ul>
<p>The kernel integral transformation moreover is the following:</p>
<p>$$\left(\mathcal{K}(\lambda , \theta) v_t\right)(x):=\int_D \kappa(x, y, \lambda(x), \lambda(y) , \theta) v_t(y) \mathrm{d} y, \quad \forall x \in D$$</p><p>where \(\kappa\) is a neural network parametrized by \(\theta \in \Theta_k\) and represents the kernel function. It’s worth noticing that while the kernel integral transformation is linear, the architecture can learn non-linear operators thanks to the non-linear activation functions, analogously to standard neural networks.</p>
<h3 id="heading-defining-the-fourier-neural-operator">Defining the Fourier Neural Operator</h3>
<p>Let \(\mathcal F\) be the Fourier transform of a function \(f: D \rightarrow R^{d_v}\) and let \(\mathcal F^{-1}\) be the inverse of the Fourier transform. By imposing (i.e. making \(k\) a convolutional operator):</p>
<p>$$k(x,y,\lambda(x), \lambda(y), \theta) = k(x-y, \theta)$$</p><p>the kernel integral transformation becomes:</p>
<p>$$\left(\mathcal{K}(\lambda, \theta) v_t\right)(x)=\mathcal{F}^{-1}\left(\mathcal{F}\left(\kappa_\theta\right) \cdot \mathcal{F}\left(v_t\right)\right)(x), \quad \forall x \in D$$</p><p>and if the parametrization \(k_\theta\) happens directly in Fourier space:</p>
<p>$$\left(\mathcal{K}(\lambda, \theta) v_t\right)(x)=\mathcal{F}^{-1}\left(R_\theta \cdot \mathcal{F}\left(v_t\right)\right)(x), \quad \forall x \in D$$</p><p>as shown in the following picture (taken from [3]):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735422870702/64fb36cd-1ae9-4eb3-acbd-14db2cd0066b.png" alt class="image--center mx-auto" /></p>
<p>If we enforce that \(k_\theta\) is periodic, it admits a Fourier series expansion, which can be truncated at a maximum number of modes \(m_{\text{max}}\) and therefore \(R_\theta\) can be parametrized with a \((m_{\text{max}} \times d_v \times d_v)\)-tensor.</p>
<p>Furthermore, once \(D \) is discretized in \(n\) points, \(v_t \in R^{n\times d_v}\). Moreover, since \(v_t\) convolves with a function with only \(m_{\text{max}}\) modes, we can truncate the highest modes in order to have \(\mathcal F(v_t) \in C^{m_{\text{max}} \times d_v}\).</p>
<p>Therefore, the multiplication for the weight tensor \(R_\theta \in C^{m_{\text{max}} \times d_v \times d_v}\) is:</p>
<p>$$\left(R_\theta\cdot\mathcal F v_t\right)_{m, l} = \sum_{j=1}^{d_v}R_{m, l, j}(\mathcal F v_t)_{m, j}, \quad m=1,\dots, m_\text{max}, \quad l = 1, \dots, d_v$$</p><h3 id="heading-invariance-to-discretization">Invariance to discretization</h3>
<p>It’s worth noticing that the Fourier layers are discretization-invariant, since they can learn from and evaluate functions which are discretized in an arbitrary way; this allows zero-shot super-resolution.</p>
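<p>To make the construction above concrete, the following is a minimal NumPy sketch of a single 1-D Fourier layer (the function name, the shapes and the choice of ReLU are my own illustrative assumptions, not a reference implementation):</p>
<pre><code class="lang-python">import numpy as np

def fourier_layer(v, R, W, m_max):
    # v: (n, d_v) samples of v_t on n grid points; R: (m_max, d_v, d_v) complex
    # spectral weights; W: (d_v, d_v) pointwise linear term.
    n = v.shape[0]
    v_hat = np.fft.rfft(v, axis=0)[:m_max]           # keep the lowest m_max modes
    # Mode-wise multiplication (R . F v)_{m,l} = sum_j R_{m,l,j} (F v)_{m,j}
    out_hat = np.einsum("mlj,mj-&gt;ml", R, v_hat)
    pad = np.zeros((n // 2 + 1 - m_max, v.shape[1]), dtype=complex)
    conv = np.fft.irfft(np.concatenate([out_hat, pad]), n=n, axis=0)
    return np.maximum(v @ W.T + conv, 0.0)           # sigma(W v_t + K v_t)

# The same layer evaluated on two different discretizations of [0, 1]:
d_v, m_max = 4, 8
rng = np.random.default_rng(0)
R = rng.normal(size=(m_max, d_v, d_v)) + 1j * rng.normal(size=(m_max, d_v, d_v))
W = rng.normal(size=(d_v, d_v))
for n in (64, 256):
    x = np.linspace(0, 1, n, endpoint=False)
    v = np.stack([np.sin(2 * np.pi * (k + 1) * x) for k in range(d_v)], axis=1)
    print(n, fourier_layer(v, R, W, m_max).shape)    # (64, 4) and (256, 4)
</code></pre>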
<h2 id="heading-accelerating-the-fourier-neural-operator-quantum-fourier-operator">Accelerating the Fourier Neural Operator: Quantum Fourier Operator</h2>
<p>Since the weight tensor contains \(m_{\text{max}}&lt; n\) modes and since the complexity of the mode-wise product is \(O(m_\text{max})\), the most relevant computational cost comes from the Fourier transform and its inverse. The Fourier transform costs \(O(n^2)\) in general, but since the model deals with truncated series the cost is actually \(O(nm_{\text{max}})\). Therefore, substituting the Fourier transform with the fast Fourier transform (FFT), assuming a uniform discretization, can provide a speedup, since the complexity of the FFT is \(O(n \log n)\).</p>
<p>Another direction is to exploit a quantum-enhanced method based on the Quantum Fourier Transform to obtain more efficient Fourier layers.</p>
<h3 id="heading-data-encoding-in-the-unary-basis">Data encoding in the unary basis</h3>
<p>The idea underlying Quantum Fourier Operator (QFO) is to substitute the Fourier layers defined above with a new layer exploiting quantum algorithms. Of course, to make this possible the matrix \(P(\lambda(x))\) (which I’ll refer to as \(A\) from now on) has to be encoded in a quantum state to serve as the input of the new Fourier layer. The idea is to encode the data according to amplitude-encoded states, choosing as basis \(\ket {e_i}\) the quantum states with Hamming weight 1:</p>
<p>$$\ket {e_i} = \ket {0\dots010\dots0}$$</p><p>Therefore given a generic \(R^{n\times m} \) matrix \(M\), its quantum encoding is:</p>
<p>$$\ket{M}=\frac 1{|M|}\sum_{i=1}^n \sum_{j=1}^m a_{i,j} \ket{e_i}\ket{e_j }$$</p><p>The circuit to load such a state was developed in [7]; assuming ideal connectivity, it has depth \(O(\log(m) + 2m \log(n))\).</p>
<h3 id="heading-unary-qft">Unary QFT</h3>
<p>Inspired by the butterfly-shaped diagram of the FFT, one can define a unitary which performs the quantum analogue of the FFT on the unary basis, whose matrix is:</p>
<p>$$F_n=\frac 1 {\sqrt n}\left(\begin{array}{ccccc}1 &amp; 1 &amp; 1 &amp; \cdots &amp; 1 \\ 1 &amp; \omega &amp; \omega^2 &amp; \cdots &amp; \omega^{(n-1)} \\ 1 &amp; \omega^2 &amp; \omega^4 &amp; \cdots &amp; \omega^{2(n-1)} \\ \vdots &amp; \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\ 1 &amp; \omega^{n-1} &amp; \omega^{2 n-2} &amp; \cdots &amp; \omega^{(n-1)^2}\end{array}\right)$$</p><p>where \(\omega^k = e^{i\frac{2\pi k}{n}}\).</p>
<p>Such a transformation can be implemented using phase gates and RBS gates as shown in the following picture (picture from [1]), and the depth of the resulting circuit is \(O(\log n)\):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735430630060/6768521c-ad76-4000-b12b-58aa074439d1.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-trainable-linear-transform-with-quantum-orthogonal-layers">Trainable linear transform with Quantum Orthogonal Layers</h3>
<p>It’s now necessary to define the quantum analogue of the learnable part of the classical Fourier layer and to perform some matrix multiplications. Quantum Orthogonal Layers (from [8]) are a natural choice, being parametrized, Hamming-weight-preserving transformations (a property that must be preserved, since the inverse unary QFT only works on the unary basis). Several circuits of this kind exist; the butterfly circuit (which has the same layout as the one used for the unary QFT and is represented in the following picture) is chosen, having \(O(n\log n) \) parametrized gates, where \(n \) is the dimension of the input vector.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735431323901/ffdea70a-1409-43e7-a8d2-2e791c939a40.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-quantum-fourier-layer">Quantum Fourier Layer</h3>
<p>Based on the above building blocks, one can define 3 quantum circuits to substitute the classical Fourier layer:</p>
<ul>
<li><p>the sequential circuit</p>
</li>
<li><p>the parallel circuit</p>
</li>
<li><p>the composite circuit</p>
</li>
</ul>
<p>The goal of all those circuits is to reproduce the result of the classical Fourier layer, which in quantum formalism is (assuming the input matrix \(A\) was normalized):</p>
<p>$$\ket y = \sum_i \ket{y_i}\ket{e_i} = \sum_i \sum_j y_{i j}\ket{e_i}\ket{e_j}$$</p><p>where:</p>
<p>$$y_{i,j} = IFT\left(\left[ w_{il}m_{il}, m_{ik}\right]\right)_j$$</p><p>with:</p>
<ul>
<li><p>\(w\) being an element of \(W\)</p>
</li>
<li><p>\(m\) being an element of \(A\)</p>
</li>
</ul>
<h4 id="heading-the-sequential-quantum-fourier-layer">The sequential Quantum Fourier Layer</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735432073899/8f9c0eb8-57c5-467d-a3d0-8913a26cfe3e.png" alt class="image--center mx-auto" /></p>
<p>The sequential circuit starts by encoding the input matrix \(A\), resulting in:</p>
<p>$$\ket{\psi_0}= \sum_i \sum_j a_{ij}\ket{e_i}\ket{e_j }$$</p><p>Then to \(\ket{\psi_0}\) the Unary-QFT is applied on the second register:</p>
<p>$$\ket{\psi_1}= \sum_i \ket{e_i}\text{QFT}(\sum_j a_{ij}\ket{e_j }) =\sum_i \ket{e_i}(\sum_j \hat a_{ij}\ket{e_j })$$</p><p>where \(\hat a_{ij}\) is the row-wise Fourier transform of \(A\).</p>
<p>After that, the trainable linear transform with quantum orthogonal layers, made of \(K\) matrix multiplications, has to be applied. Using the circuit depicted above (the butterfly circuit), in this sequential approach the \(K\) parametrized quantum circuits \(P_1, \dots, P_K\) are applied sequentially on the first register.</p>
<p>After this, the inverse unary QFT is applied, resulting in the desired state.</p>
<h4 id="heading-the-parallel-quantum-fourier-layer">The parallel Quantum Fourier Layer</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735433269557/f7a6f271-7023-420d-95b2-f6175e8874f9.png" alt class="image--center mx-auto" /></p>
<p>For the sequential QFL, the depth complexity of the learnable part is linear in the number of modes, which might eventually hinder learning because of the multiplicative noise model of NISQ machines. To reduce the depth complexity and to make the algorithm more noise-resistant, an interesting modification is to parallelise the butterfly circuits.</p>
<h4 id="heading-the-composite-quantum-fourier-layer">The composite Quantum Fourier Layer</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735433329466/7f82f848-d35a-43ad-bf40-bbebfce5e600.png" alt class="image--center mx-auto" /></p>
<p>The parallelized quantum circuit discussed in the above section, however, requires \(K\times (d_v + n)\) qubits, where \(K \) is the number of modes, which might end up being more than the available qubit resources.</p>
<p>One can instead replace the \(K\) parametrized circuits with a single parametrized circuit \(B\), as long as this new subcircuit is Hamming-weight preserving and is built as:</p>
<p>$$B = \bigotimes_i B_i$$</p><p>where \(B_i\) corresponds to the block-diagonal unitary for the subspace with Hamming weight \(i\).</p>
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>If you have any questions or suggestions about what I covered in this article, please add them as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here.</strong></a></p>
<h2 id="heading-sources">Sources:</h2>
<ol>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2306.15415">https://arxiv.org/pdf/2306.15415</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/2108.08481">https://arxiv.org/pdf/2108.08481</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/2010.08895">https://arxiv.org/pdf/2010.08895</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/1910.03193">https://arxiv.org/pdf/1910.03193</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2005.03180">https://arxiv.org/pdf/2005.03180</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2009.11992">https://arxiv.org/pdf/2009.11992</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2012.04145">https://arxiv.org/pdf/2012.04145</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2212.07389">https://arxiv.org/pdf/2212.07389</a></p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Solving linear systems of equations using quantum computers]]></title><description><![CDATA[Linear systems of equations lie at the heart of many scientific and engineering problems, from machine learning to optimization and physics simulations. Classical methods like Gaussian elimination or iterative methods are powerful but can be ineffici...]]></description><link>https://amm.zanotp.com/hhl</link><guid isPermaLink="true">https://amm.zanotp.com/hhl</guid><category><![CDATA[hhl]]></category><category><![CDATA[linear-equations]]></category><category><![CDATA[quantum computing]]></category><category><![CDATA[linear algebra ]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Mon, 30 Sep 2024 06:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/FPskb1X15wk/upload/a2049950b3959480bac7c38d415019fa.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Linear systems of equations lie at the heart of many scientific and engineering problems, from machine learning to optimization and physics simulations. Classical methods like Gaussian elimination or iterative methods are powerful but can be inefficient for large, complex systems.</p>
<p>In this blog post, I will explore one of the most famous quantum algorithms (called HHL), which offers a potential speedup in solving linear systems. I will delve into its complexity and underlying assumptions and describe two interesting applications.</p>
<h2 id="heading-quantum-linear-system-problem-versus-linear-system-problem">Quantum linear system problem versus linear system problem</h2>
<p>In order to better understand the limitations of quantum algorithms like HHL, it’s essential to distinguish between a Quantum Linear System Problem (QLSP) and a classical Linear System Problem (LSP).</p>
<p>A typical linear system problem (LSP) is represented as:</p>
<p>$$Ax=b$$</p><p>where:</p>
<ul>
<li><p>\(A\) is a matrix</p>
</li>
<li><p>\(b\) is a known vector</p>
</li>
<li><p>\(x\) is the unknown vector we aim to solve for</p>
</li>
</ul>
<p>On the other hand, a QLSP deals with a quantum state version of the same concept, represented as:</p>
<p>$$A\ket x = \ket b$$</p><p>where:</p>
<ul>
<li><p>\(A\) is still a matrix</p>
</li>
<li><p>\(\ket b\) is a known quantum state</p>
</li>
<li><p>\(\ket x\) is the unknown quantum state we wish to find</p>
</li>
</ul>
<p>Although both problems appear similar, the difference lies in how the information is represented and manipulated. In a classical system, the vector \(b\) is readily available, and solving for \(x\) gives a concrete solution that can be directly used. In contrast, in the quantum setting, \(\ket b\) is a quantum state, and the solution \(\ket x\) is also a quantum state. The main challenge here is that quantum states aren’t directly accessible (any measurement of \(\ket x\) collapses the state and only provides a probabilistic result), which means that extracting useful information from the quantum solution requires multiple measurements or sophisticated post-processing.</p>
<p>Understanding these differences is crucial when assessing the complexity and feasibility of quantum solvers such as HHL, particularly when applied to real-world problems where error correction and measurement limitations play a significant role.</p>
<h2 id="heading-hhl">HHL</h2>
<p>In this section, we introduce the Harrow-Hassidim-Lloyd (HHL) algorithm, one of the most interesting applications of the quantum phase estimation algorithm, which can be used to “solve” sparse linear systems, i.e. systems involving a matrix in which most of the elements are zero.</p>
<p>$$HHL: \ket b \rightarrow \ket {A^{-1}b}$$</p><p>In the next section the following assumptions will be true:</p>
<ul>
<li><p>\(A\) is a sparse and Hermitian matrix</p>
</li>
<li><p>the quantum state \(\ket b\) doesn’t have to be implemented from \(b\)</p>
</li>
<li><p>the problem requires to find \(\ket x\) instead of \(x\)</p>
</li>
</ul>
<p>and the next sections deal with:</p>
<ul>
<li><p>the Quantum Phase Estimation algorithm</p>
</li>
<li><p>the workflow of HHL</p>
</li>
<li><p>complexity analysis of the HHL algorithm</p>
</li>
<li><p>what happens to the quantum advantage when the above assumptions fail to hold</p>
</li>
<li><p>a brief discussion of a couple of noteworthy applications of HHL</p>
</li>
</ul>
<p>Please also note that many versions of the HHL algorithm have been proposed and this post only describes and deals with its simplest version.</p>
<h3 id="heading-background-quantum-phase-estimation">Background: Quantum Phase Estimation</h3>
<p>One of the most useful quantum subroutines, called Quantum Phase Estimation, aims to estimate the phase \(\phi\) of an eigenvalue \(e^{2i\pi\phi}\), associated with the corresponding eigenvector \(\ket \psi\), of a unitary operator \(U\).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727926937112/b226c02d-ddeb-41ee-ad7c-0f2b71b7e1c1.png" alt class="image--center mx-auto" /></p>
<p>The QPE algorithm, depicted in the circuit above, shares similarities with Shor’s algorithm, which can be seen as a specific application of QPE to integer factorization. The goal of QPE is to encode an estimate of the phase \(\phi\) in a binary representation like:</p>
<p>$$\phi = 0.\phi_1\phi_2\dots\phi_{n-1}\phi_n$$</p><p>QPE achieves this by phase-encoding the binary representation of \(\phi\) using controlled \(U\) gates, in order to get the following state:</p>
<p>$$\left(\bigotimes_{j=1}^n \frac 1{\sqrt 2}(\ket 0 + e^{2i\pi0.\phi_j\dots \phi_n}\ket 1)\right) \otimes \ket\psi$$</p><p>It then applies the inverse of the Quantum Fourier Transform to go from phase space to state space; before measuring, the result is:</p>
<p>$$\left(\bigotimes_{j=1}^n \ket {\phi_j}\right) \otimes \ket\psi = \ket {\hat \phi} \otimes \ket \psi$$</p><p>where \(\hat \phi\) is the estimation of \(\phi\).</p>
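<p>A tiny state-vector sketch of the idea in NumPy (the phase is chosen to be exactly representable with \(t=5\) bits, and the FFT plays the role of the inverse QFT; this is an illustration, not a circuit simulation):</p>
<pre><code class="lang-python">import numpy as np

t, phi = 5, 0.3125                      # phi = 0.01010 in binary, exact with 5 bits
# After the Hadamards and the controlled powers of U, the t-qubit register
# holds the amplitudes exp(2*pi*i*phi*k) / sqrt(2^t) on each basis state k.
k = np.arange(2 ** t)
amp = np.exp(2j * np.pi * phi * k) / np.sqrt(2 ** t)
# The inverse QFT maps this to a peak at the integer closest to phi * 2^t;
# with NumPy's sign convention, np.fft.fft implements exactly that map.
est = np.fft.fft(amp) / np.sqrt(2 ** t)
print(np.argmax(np.abs(est)) / 2 ** t)  # 0.3125, the recovered phase
</code></pre>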
<h3 id="heading-hhl-workflow">HHL workflow</h3>
<p><img src="https://www.researchgate.net/publication/358996216/figure/fig2/AS:1139147511205889@1648605339927/Quantum-circuit-of-the-HHL-algorithm.png" alt="Quantum circuit of the HHL algorithm | Download Scientific Diagram" class="image--center mx-auto" /></p>
<p>The above <a target="_blank" href="https://www.researchgate.net/figure/Quantum-circuit-of-the-HHL-algorithm_fig2_358996216">picture</a> depicts the circuit of the HHL algorithm. One may notice that the algorithm can be broken down into 3 parts mainly:</p>
<ul>
<li><p>a QPE</p>
</li>
<li><p>a controlled rotation</p>
</li>
<li><p>an inverse QPE</p>
</li>
</ul>
<p>Assuming the input state \(\ket b\) is already prepared, the first block is used to estimate the phases, determined by the eigenvalues \(\{\lambda_i\}\) of \(A\), of the unitary \(U = e^{i tA}\), and the approximate result is stored in the middle register.</p>
<p>At the end of the QPE, what we have is:</p>
<p>$$\ket 0 \otimes\left(\sum_i a_i \ket {u_i} \otimes \ket{\hat\lambda_i}\right)$$</p><p>where \(\sum_i a_i\ket {u_i}\) is \(\ket b\) expressed in terms of the eigenvectors \(\ket {u_i}\) of \(U\), and \(\hat \lambda_i\) is the binary approximation of the corresponding eigenvalue \(\lambda_i\).</p>
<p>Then a controlled rotation gate is applied, which corresponds to the following transformation:</p>
<p>$$\ket 0 \otimes\left( \sum_i a_i \ket {u_i} \otimes \ket{\hat\lambda_i}\right) \rightarrow \sum_i a_i\left(\sqrt{1-\left(\frac c{\hat\lambda_i}\right)^2}\ket 0 + \frac c{\hat\lambda_i} \ket 1\right)\otimes \ket {u_i} \otimes \ket{\hat\lambda_i}$$</p><p>where \(c\) is a normalization constant; note that the rotation angle depends on the eigenvalue register, so it stays inside the sum.</p>
<p>The last block, the inverse QPE, is used to go from the state above to:</p>
<p>$$\sum_i a_i\ket {q_i}\otimes \ket {u_i} \otimes \ket{\hat\lambda_i} \rightarrow \sum_i a_i\ket {q_i}\otimes \ket {u_i} \otimes \ket{0}$$</p><p>where \(\ket {q_i} \equiv \sqrt{1-\left(\frac c{\hat\lambda_i}\right)^2}\ket 0 + \frac c{\hat\lambda_i} \ket 1\), i.e. the inverse QPE uncomputes the eigenvalue register.</p>
<p>Notably, if the first qubit (the top register) is measured we have two cases:</p>
<ul>
<li>if the top qubit collapses into 1: the bottom register is left in the state:</li>
</ul>
<p>$$\propto\sum_i a_i \frac c{\hat\lambda_i}\ket {u_i}$$</p><p>which is proportional to \(\ket {A^{-1}b}\) because of the spectral decomposition of \(A\).</p>
<p>In fact \(A = \sum_i \lambda_i u_iu_i^\dagger\) and (by the properties of the spectral decomposition) \(A^{-1} = \sum_i \lambda_i^{-1} u_iu_i^\dagger\), hence \(A^{-1}b  = \sum_i a_i \lambda_i^{-1} u_i\), since the eigenvectors are orthonormal (\(u_i^\dagger u_j = \delta_{ij}\)). A quick numerical check of this step is given after the list below.</p>
<ul>
<li>if the top qubit collapses into 0, one may run the program again</li>
</ul>
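<p>Here is the promised check of the spectral-decomposition step, as a classical NumPy sketch with a random Hermitian \(A\) (no quantum circuit involved):</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (X + X.conj().T) / 2                 # random 4 x 4 Hermitian matrix
b = rng.normal(size=4)

lam, U = np.linalg.eigh(A)               # columns of U are the eigenvectors u_i
a = U.conj().T @ b                       # coefficients a_i of b in the eigenbasis
x_spec = U @ (a / lam)                   # sum_i a_i / lambda_i * u_i
print(np.allclose(x_spec, np.linalg.solve(A, b)))  # True
</code></pre>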
<h3 id="heading-complexity-analysis">Complexity analysis</h3>
<p>Let:</p>
<ul>
<li><p>\(k \) the condition number (the ratio of the largest and smallest absolute values of the eigenvalues of \(A\))</p>
</li>
<li><p>\(\epsilon \) the error from the output state \(\ket {A^{-1}b}\)</p>
</li>
<li><p>\(s\) the maximum number of non-zero elements in each row of the matrix \(A\)</p>
</li>
<li><p>\(N\) the size of the matrix</p>
</li>
</ul>
<p>Simulating \(e^{-iAt}\), which is required in the QPE step, can be done with error \(\epsilon\) in \(O(\log(N)s^2t\epsilon^{-1})\) if \(A\) is \(s\)-sparse. One may then perform \(O(k)\) Quantum Amplitude Amplification repetitions to amplify the probability of measuring \(1\), since \(c=O(\frac 1k)\) and, if \(\lambda \leq 1\), the probability of measuring \(1\) is \(\Omega(\frac 1{k^2})\).</p>
<p>Putting it all together, the computational complexity of the original HHL algorithm is:</p>
<p>$$O(\log(N)k^2s^2\epsilon^{-1})$$</p><p>however many improvements have been made and the computational complexity of the currently most efficient HHL algorithm is:</p>
<p>$$O\left(poly(\log(sk\epsilon^{-1}))sk\right )$$</p><p>and if we assume \(s = O(poly\left(\log(N)\right))\), the algorithm (focusing only on \(N\)) runs in:</p>
<p>$$O(poly\left(\log(N)\right))$$</p><p>which represents an exponential speedup in the matrix dimension compared to the best conjugate gradient method, whose complexity is:</p>
<p>$$O \left(Nsk\log\left(\frac 1\epsilon\right)\right)$$</p><p>However, this holds under very specific assumptions, and the next section deals with what happens if some of them are not met.</p>
<h3 id="heading-loss-of-quantum-advantage-and-near-term-feasibility-of-hhl">Loss of quantum advantage and near term feasibility of HHL</h3>
<p>The computational complexity above is based on the assumptions that:</p>
<ul>
<li><p>\(\ket b\) is already available</p>
</li>
<li><p>the cost of reading out \(\ket {A^{-1}b} \) is not accounted for</p>
</li>
</ul>
<p>Note that if this input/output overhead takes \(O(N)\), the exponential speedup is lost.</p>
<p>The computational cost of encoding \(b\) in \(\ket b\) is:</p>
<p>$$O(N)$$</p><p>if \(b\) is a simple bitstring and in general is:</p>
<p>$$O\left(2^N\right)$$</p><p>for a generic superposition, which results in the loss of the exponential speedup.</p>
<p>Moreover, also reading out the output solution state \(\ket {A^{-1}b}\) into a classical bitstring \(A^{-1}b\) requires \(O(N)\), offsetting the exponential acceleration.</p>
<h3 id="heading-hhl-in-solving-linear-differential-equations">HHL in solving linear differential equations</h3>
<p>One of the main applications of the HHL algorithm is solving linear differential equations. Quantum computers in fact can simulate quantum systems (which are described by a restricted type of linear differential equations), and using HHL it’s possible to solve general inhomogeneous sparse linear differential equations.</p>
<p>A first-order ordinary differential equation may be written as:</p>
<p>$$\frac {\partial x(t)}{\partial t}=A(t)x(t) + b(t)$$</p><p>where \(A(t)\) is an \(N\times N\) matrix we assume to be sparse and \(x(t)\) and \(b(t)\) are \(N\)-component vectors.</p>
<p>A similar system can be the output of a conversion process from any linear differential equation with higher-order derivatives or from the discretization of partial differential equations.</p>
<p>A bunch of different methods involving HHL can be used to solve the above DE; however, the workflow is roughly the same:</p>
<ul>
<li><p>discretize the differential equation and get a system of algebraic equation</p>
</li>
<li><p>use HHL to find the solution of the system</p>
</li>
</ul>
<p>In fact, one may apply a discretization scheme to the DE, for example the Euler method, to map the DE to a difference equation:</p>
<p>$$\frac{x_{i+1} - x_i}h= A(t_i)x_i + b(t_i)$$</p><p>and it is straightforward to see that this method results in the following linear system:</p>
<p>$$Ax=b$$</p><p>where \(x\) is the vector of blocks \(x_i\), and \(b\) also contains the value of \(x_0\).</p>
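<p>The following NumPy sketch makes the construction explicit for a toy \(2\times2\) system (the choice of \(A(t)\), the step size and the horizon are illustrative assumptions; classically one calls a linear solver where HHL would be used on a quantum computer):</p>
<pre><code class="lang-python">import numpy as np

N, steps, h = 2, 4, 0.1
A_t = lambda t: np.array([[0.0, 1.0], [-1.0, 0.0]])  # example system matrix
b_t = lambda t: np.zeros(N)                          # homogeneous for brevity
x0 = np.array([1.0, 0.0])

# The unknown vector stacks the blocks x_0, ..., x_steps; the row blocks encode
# x_0 = x0 and x_{i+1} - (I + h A(t_i)) x_i = h b(t_i) (forward Euler).
dim = N * (steps + 1)
M = np.eye(dim)
c = np.zeros(dim)
c[:N] = x0
for i in range(steps):
    r = N * (i + 1)
    M[r:r + N, r - N:r] = -(np.eye(N) + h * A_t(i * h))
    c[r:r + N] = h * b_t(i * h)

x_vec = np.linalg.solve(M, c)
print(x_vec.reshape(steps + 1, N))                   # the trajectory x_0 ... x_4
</code></pre>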
<p>To learn more about this please see <a target="_blank" href="https://arxiv.org/pdf/1010.2745">Berry, (2014), “High-order quantum algorithm for solving linear differential equations”</a>.</p>
<h3 id="heading-hhl-in-solving-least-square-curve-fitting">HHL in solving least-square curve fitting</h3>
<p>Another interesting application for HHL is least squares fitting. The goal in least squares fitting is to find a continuous function to approximate a discrete set of \(N\) points \(\{x_i, y_i\}\). The function has to be linear in the parameters \(\theta \) but can be non-linear in \(x\), e.g.:</p>
<p>$$f(\theta, x) = \sum_i \theta_if_i(x)$$</p><p>The optimal parameters can be found by minimizing an error function such as the mean squared error:</p>
<p>$$E = |y - f(\theta, x)|^2$$</p><p>which can be expressed in matrix form as:</p>
<p>$$E= |y- F\theta|^2$$</p><p>where \(F_{ij}=f_j(x_i)\). The best-fitting parameters can be found using the Moore–Penrose pseudoinverse as:</p>
<p>$$\theta^* = \left(F^\dagger F\right)^{-1}F^\dagger y$$</p><p>Finding the best \(\theta\) then involves 3 subroutines (a classical sketch of the pseudo-inverse step follows the list):</p>
<ul>
<li><p>performing the pseudo-inverse using the HHL algorithm and quantum matrix multiplication</p>
</li>
<li><p>an algorithm for estimating the fit quality</p>
</li>
<li><p>an algorithm for learning the fit-parameters \(\theta\)</p>
</li>
</ul>
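<p>As a purely classical reference point for the first subroutine, here is the pseudo-inverse step in NumPy on synthetic data (the basis functions and the noise level are illustrative assumptions):</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.05 * rng.normal(size=x.size)

# f(theta, x) = theta_1 + theta_2 x + theta_3 x^2: linear in theta, not in x.
F = np.stack([np.ones_like(x), x, x**2], axis=1)  # F_ij = f_j(x_i)
theta = np.linalg.solve(F.T @ F, F.T @ y)         # (F^dagger F)^-1 F^dagger y
print(theta)                                      # close to [1, 2, -3]
</code></pre>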
<p>To learn more about this please consider reading <a target="_blank" href="https://arxiv.org/pdf/1204.5242">Wiebe, Brown, LLoyd, (2012), “Quantum Data-Fitting“</a>.</p>
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>If you have any questions or suggestions about what I covered in this article, please add them as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here.</strong></a></p>
<hr />
<h2 id="heading-sources">Sources</h2>
<ul>
<li><p><a target="_blank" href="https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.103.150502">Harrow, Hassidim, Lloyd, (2009), “Quantum algorithm for linear systems of equations“</a></p>
</li>
<li><p><a target="_blank" href="https://epubs.siam.org/doi/10.1137/16M1087072">Childs, Kothari, Somma, (2017), “Quantum Algorithm for Systems of Linear Equations with Exponentially Improved Dependence on Precision“</a></p>
</li>
<li><p><a target="_blank" href="https://www.sciencedirect.com/science/article/pii/S037596012030462X">Duan, Yuan, Yu, Huang, Hsieh, (2020), “A survey on HHL algorithm: From theory to application in quantum machine learning”</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2404.19067">Zheng, Liu, Stein, Li, Mulmenstadt, Chen, Li, (2024), “An Early Investigation of the HHL Quantum Linear Solver for Scientific Applications”</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/1010.2745">Berry (2014), “High-order quantum algorithm for solving linear differential equations”</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/1204.5242">Wiebe, Brown, LLoyd (2012), “Quantum Data-Fitting“</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Computing Gradients of Quantum Circuits using Parameter Shift Rule]]></title><description><![CDATA[Variational Quantum Algorithms (VQAs) are regarded as one of the most promising approaches for leveraging near-term quantum devices as they combine the power of quantum circuits with classical optimization to solve problems in chemistry, material sci...]]></description><link>https://amm.zanotp.com/computing-gradients-of-quantum-circuits-using-parameter-shift-rule</link><guid isPermaLink="true">https://amm.zanotp.com/computing-gradients-of-quantum-circuits-using-parameter-shift-rule</guid><category><![CDATA[quantum computing]]></category><category><![CDATA[Quantum Machine Learning]]></category><category><![CDATA[optimization]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Thu, 19 Sep 2024 06:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/9aCD5kzPwa8/upload/158d7639e5daed3b0852c05f04fd7545.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Variational Quantum Algorithms (VQAs) are regarded as one of the most promising approaches for leveraging near-term quantum devices as they combine the power of quantum circuits with classical optimization to solve problems in chemistry, material science, machine learning, and beyond. A key challenge in the successful implementation of VQAs, such as the Variational Quantum Eigensolver (VQE) and Quantum Approximate Optimization Algorithm (QAOA), is the ability to efficiently optimize the parameters that govern quantum circuits. This is where the shift rule plays a critical role.</p>
<p>The shift rule provides an analytical and efficient method for computing gradients of parameterized quantum circuits with respect to their parameters. These gradients are essential for guiding the classical optimizer in the search for the optimal parameters that minimize the cost function. Without an accurate and efficient gradient computation method, the optimization would either be too slow or too noisy, leading to suboptimal results. In this context, the shift rule not only enhances the accuracy of the optimization but also enables its scalability, making it a foundational tool for advancing VQA methodologies in near-term quantum computing.</p>
<p>This blog post briefly discusses VQAs and focuses on one of the various shift-rule formulations.</p>
<h2 id="heading-variational-quantum-algorithms">Variational Quantum Algorithms</h2>
<p>Variational Quantum Algorithms (VQAs) are hybrid quantum-classical approaches that aim to solve optimization problems by minimizing a cost function over a parameterized quantum circuit. The quantum circuit, often referred to as a parameterized quantum circuit (PQC), depends on a set of tunable parameters \({\theta} = (\theta_1, \theta_2, \dots, \theta_n)\). The goal is to find the optimal values of these parameters that minimize a given cost function \( C(\theta)\), which is typically the expectation value of a problem-specific quantum observable.</p>
<p>Mathematically, the cost function can be expressed as:</p>
<p>$$C(\theta) = \bra {\psi(\theta)} H\ket{\psi(\theta)}$$</p><p>where</p>
<ul>
<li><p>\(\ket {\psi(\theta)}\)is the quantum state produced by the PQC,</p>
</li>
<li><p>\(H\) is the problem Hamiltonian</p>
</li>
</ul>
<p>Two of the most famous and important VQAs are the VQE and QAOA. The next paragraphs give an introduction to both.</p>
<h3 id="heading-variational-quantum-eigensolver">Variational Quantum Eigensolver</h3>
<p>The Variational Quantum Eigensolver (VQE) is a VQA that combines the variational principle from quantum mechanics with classical optimization techniques. VQE is particularly useful for problems in quantum chemistry and materials science, where finding the ground state energy of a Hamiltonian is a central task.</p>
<p>In a VQE setting, the cost function is represented by the energy expectation value, formulated as:</p>
<p>$$E(\theta)=\bra{\psi(\theta)}H\ket{\psi(\theta)}$$</p><p>According to the <strong>variational principle</strong>, this energy is always greater than or equal to the ground state energy \(E_0\)​, meaning:</p>
<p>$$E(\theta)\geq E_0 ​$$</p><p>Thus, the goal is to minimize \(E(\theta)\) by adjusting the parameters \(\theta\) to find the state \(\ket{\psi(\theta)}\) that approximates the ground state.</p>
<p>The workflow is the following (a minimal code sketch follows the list):</p>
<ul>
<li><p>define a parametrized unitary \(U(\theta)\) to prepare \(\ket{\psi(\theta)}\)</p>
</li>
<li><h5 id="heading-measure-the-expectation-value-of-the-systems-hamiltonian-h-computing-ethetabrapsithetahketpsitheta-or-the-gradient-of-the-expectation-value-with-shift-rule-or-with-numerical-methods">measure the expectation value of the system’s Hamiltonian \(H\) computing \( E(\theta)=\bra{\psi(\theta)}H\ket{\psi(\theta)}\) or the gradient of the expectation value (with shift rule or with numerical methods)</h5>
</li>
<li><p>provide the measured expectation value \(E(\theta)\) or its gradient to a classical optimization algorithm and iterate until the minimum energy is found.</p>
</li>
</ul>
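<p>A minimal PennyLane sketch of this loop (the two-qubit Hamiltonian and the ansatz are toy assumptions chosen for brevity, not a chemistry problem):</p>
<pre><code class="lang-python">import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)
H = qml.Hamiltonian([1.0, 0.5], [qml.PauliZ(0) @ qml.PauliZ(1), qml.PauliX(0)])

@qml.qnode(dev)
def energy(theta):
    # Parametrized state preparation U(theta) applied to the all-zero state
    qml.RY(theta[0], wires=0)
    qml.RY(theta[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(H)       # E(theta), the energy expectation value

theta = np.array([0.1, 0.2], requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.2)
for _ in range(100):           # the classical optimization loop over theta
    theta = opt.step(energy, theta)
print(energy(theta))           # approximates the ground-state energy
</code></pre>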
<h3 id="heading-quantum-approximate-optimization-algorithm">Quantum Approximate Optimization Algorithm</h3>
<p>The Quantum Approximate Optimization Algorithm (QAOA) is a hybrid quantum-classical algorithm designed to solve combinatorial optimization problems, such as the Max-Cut problem. QAOA is again a VQA so it uses a parameterized quantum circuit to encode the problem and iteratively improves the solution by tuning the parameters. It has been proposed as a promising algorithm for near-term quantum devices due to its robustness against noise and hardware limitations.</p>
<p>QAOA aims to optimize a classical objective function \(C(z)\), where \(z \in \{0,1\}^n\) represents the solution to the optimization problem encoded as a binary string. The goal is to find the string \(\hat z\) that optimizes (for Max-Cut, maximizes) the objective function.</p>
<p>The algorithm constructs a quantum state that approximates the solution by evolving under two Hamiltonians: a problem Hamiltonian \(H_C\)​ and a mixer Hamiltonian \(H_M\).</p>
<p>The former is meant to represent the cost function of the classical problem, for example for the max-cut problem the classical cost function is:</p>
<p>$$C(z)= \sum_{⟨i,j⟩}w_{ij}(1−z_iz_j)$$</p><p>where \(w_{ij}\) is the weight of the edge between vertices \(i\) and \(j\), \(\langle i,j⟩\) denotes the set of all pairs of vertices that are connected by an edge and \(z_i\) is the binary variable representing the partition of vertex \(i\).</p>
<p>The corresponding problem Hamiltonian is formulated in terms of quantum operators as:</p>
<p>$$H_C = \sum_{⟨i,j⟩}w_{ij}\frac {1-Z_iZ_j} 2$$</p><p>where \(Z_i\) is the Pauli-Z operator acting on qubit \(i\).</p>
<p>Coming to the mixer Hamiltonian, it should encourage transitions between different solutions by applying a mixing operator that typically consists of Pauli-X gates. The mixer Hamiltonian can be defined as:</p>
<p>$${H}_M = \sum_{i} {X}_i$$</p><p>where \(X_i\) is the Pauli-X operator acting on qubit \(i\), responsible for flipping the bit \(z_i\).</p>
<p>The QAOA workflow alternates between evolving under the problem and mixer Hamiltonians for a series of \(p\) layers, which results in the following quantum state:</p>
<p>$$\ket{ψ(γ,β)}=∏_{j=1}^p​e^{−iβ_j​H_M}​e^{-iγ_j​H_C}​H^{⊗n}\ket0^{⊗n}$$</p><p>where:</p>
<ul>
<li><p>\(\gamma\) are the parameters controlling the evolution under the problem Hamiltonian</p>
</li>
<li><p>\(\beta\) are the parameters controlling the evolution under the mixer Hamiltonian</p>
</li>
<li><p>\(H^{⊗n}\ket0^{⊗n}\) is the n-fold Hadamard applied on an n-qubit register.</p>
</li>
</ul>
<p>The goal of QAOA is to optimize the parameters \(\gamma\) and \(\beta\) such that the quantum state maximizes the expectation value of the objective function \(C(\gamma, \beta)\), which corresponds to the expectation value of the problem Hamiltonian:</p>
<p>$$C(\gamma, \beta)=\bra{ψ(γ,β)} H_C \ket{ψ(γ,β)}$$</p><p>And similarly to VQE, by iteratively updating \(\gamma\) and \(\beta\) using a classical optimizer, the algorithm converges to an approximate solution to the optimization problem.</p>
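<p>A minimal PennyLane sketch for Max-Cut on a triangle graph with unit weights (the graph, the depth \(p=1\) and the step size are illustrative assumptions; \(\gamma\) enters only up to a rescaling, which the optimizer absorbs):</p>
<pre><code class="lang-python">import pennylane as qml
from pennylane import numpy as np

edges = [(0, 1), (1, 2), (0, 2)]
dev = qml.device("default.qubit", wires=3)

# C(gamma, beta) = sum over edges of (1 - ZZ)/2 = 1.5 - 0.5 * (sum of ZZ terms)
H_zz = qml.Hamiltonian([0.5] * 3, [qml.PauliZ(i) @ qml.PauliZ(j) for i, j in edges])

@qml.qnode(dev)
def zz_part(params):
    gamma, beta = params
    for w in range(3):                  # uniform superposition over bitstrings
        qml.Hadamard(wires=w)
    for i, j in edges:                  # cost layer, a rescaled exp(-i gamma H_C)
        qml.IsingZZ(2 * gamma, wires=[i, j])
    for w in range(3):                  # mixer layer exp(-i beta H_M)
        qml.RX(2 * beta, wires=w)
    return qml.expval(H_zz)

def neg_expected_cut(params):           # maximizing C means minimizing -C
    return zz_part(params) - 1.5

params = np.array([0.5, 0.5], requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.1)
for _ in range(60):
    params = opt.step(neg_expected_cut, params)
print(1.5 - zz_part(params))            # expected cut value (the optimum is 2)
</code></pre>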
<h2 id="heading-gradient-computation-using-the-shift-rule">Gradient Computation Using the Shift Rule</h2>
<p>As we have seen in the above examples, VQAs sometimes require evaluating the gradient of the cost function rather than the function itself. The shift rule is useful because it avoids the approximations typically used in numerical differentiation (like finite differences).</p>
<p>The next sections describe one of the formulations proposed for the shift rule.</p>
<p>We are interested in the expectation value of an observable \(H\) (e.g., a Hamiltonian) with respect to the quantum state \(\ket{\psi(\theta)}\), which is the state prepared by applying \(U(\theta)\) to the initial state \(\ket0^{\otimes n}\). The expectation value is:</p>
<p>$$C(θ)=\bra{ψ(θ)}H\ket{ψ(θ)}=\bra0U^\dagger(θ)HU(θ)\ket0$$</p><p>The goal is to compute the derivative w.r.t. \(\theta\) of \(C(θ)\), so that this information can be supplied to a classical optimizer.</p>
<p>The derivative is</p>
<p>$$\frac{∂C(θ)}{∂θ_i}​=\frac{\partial \bra 0 U^\dagger(θ)\, H\, U(θ)\ket 0}{\partial \theta_i}= \bra 0 U^\dagger(θ)\, H \left(\frac{\partial U(θ)}{\partial \theta_i}\right)\ket 0+ \bra 0\left(\frac{\partial U(θ)}{\partial \theta_i}\right)^\dagger H\, U(θ)\ket 0$$</p><p>assuming the parameter \(\theta_i\) only affects a single gate.</p>
<h3 id="heading-parameter-shift-rule-for-gates-with-generators-with-two-distinct-eigenvalues">Parameter-shift rule for gates with generators with two distinct eigenvalues</h3>
<p>If we are given a parametrized quantum gate \(U \) with a parameter \(\theta\) of the form:</p>
<p>$$U(θ_k)=e^{−iθ_kG}$$</p><p>where \(G\) is a Hermitian operator, it’s trivial to prove that:</p>
<p>$$\frac{\partial U(θ_k)}{\partial \theta_k}=-iGU(\theta_k)$$</p><p>and substituting into the derivative of the circuit equation what we get is:</p>
<p>$$\frac{∂C(θ)}{∂θ_k}​= -i\bra\psi U^\dagger(\theta_k)\, HG\, U(\theta_k) \ket\psi + i \bra\psi U^\dagger(\theta_k)\, GH\, U(\theta_k) \ket\psi$$</p><p>Now suppose \(G\) has at most two distinct eigenvalues \(\pm r\).</p>
<p>Moreover, one may prove with some algebra that for any operators \(A\), \(B\) and Hermitian observable \(Q\):</p>
<p>$$\bra \psi A^\dagger Q B\ket \psi + \bra \psi B^\dagger Q A\ket \psi = \frac 12\left(\bra \psi (A+B)^\dagger Q (A+B)\ket \psi - \bra \psi (A-B)^\dagger Q (A-B)\ket \psi\right)$$</p><p>therefore, using \(A = I\) and \(B = -ir^{-1}G\) in our problem:</p>
<p>$$\frac{∂C(θ)}{∂θ_k}​= \frac r2\left(\bra \psi (I-ir^{-1}G)^\dagger H (I-ir^{-1}G)\ket \psi - \bra \psi (I+ir^{-1}G)^\dagger H(I+ir^{-1}G)\ket \psi\right)$$</p><p>It is also possible to show that for such special \(G\):</p>
<p>$$U\left(\frac \pi {4r}\right) = \frac{I-ir^{-1}G}{\sqrt{2}}$$</p><p>Hence the partial derivative of the cost function can be estimated by placing either the gate \(U(\frac \pi{4r})\) or \(U(-\frac \pi{4r})\) after the gate to be differentiated.</p>
<p>Moreover, since:</p>
<p>$$U(a)U(b) = U(a+b)$$</p><p>for a one-parameter gate generated by \(G\), by substitution this leads to the parameter shift rule:</p>
<p>$$\frac{∂C(θ)}{∂θ_k}​= r\left(\bra\psi U^\dagger(\theta_k +s)\, H\, U(\theta_k +s) \ket\psi - \bra\psi U^\dagger(\theta_k -s)\, H\, U(\theta_k -s) \ket\psi\right)$$</p><p>which is equivalent to:</p>
<p>$$\frac{∂C(θ)}{∂θ_k}​= r(C(\theta +s) - C(\theta-s))$$</p><p>where \(s = \frac{\pi}{4r}\) and the shift is applied to the \(k\)-th parameter only.</p>
<p>If the parameter \(\theta_k\) appears in more than a single gate in the circuit, the derivative is obtained using the product rule by shifting the parameter in each gate separately and summing the results.</p>
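<p>The rule is easy to verify numerically. The sketch below uses plain NumPy with an \(R_x\) rotation and the observable \(H = Z\), both chosen purely for illustration, and compares the shifted-circuit estimate with the analytic derivative:</p>
<pre><code class="lang-plaintext">import numpy as np

# for U(theta) = exp(-i theta X/2) the generator G = X/2 has
# eigenvalues +-1/2, hence r = 1/2 and s = pi / (4r) = pi / 2
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
r, s = 0.5, np.pi / 2

def U(theta):
    # exp(-i theta X / 2) written in closed form
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * X

def C(theta):
    # expectation value of Z in the state obtained by rotating |0>
    psi = U(theta) @ np.array([1, 0], dtype=complex)
    return (psi.conj() @ Z @ psi).real

theta = 0.7
shift_grad = r * (C(theta + s) - C(theta - s))
exact_grad = -np.sin(theta)  # analytic derivative of C(theta) = cos(theta)
print(shift_grad, exact_grad)  # the two numbers coincide
</code></pre>
<p>Since \(G = X/2\) has eigenvalues \(\pm\frac 12\), we have \(r = \frac 12\) and \(s = \frac \pi 2\), recovering the familiar \(\pm\frac \pi 2\) shifts.</p>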
<h3 id="heading-differentiation-of-general-gates-via-linear-combination-of-unitaries">Differentiation of general gates via linear combination of unitaries</h3>
<p>If \(U(\theta) \) doesn’t have the above-mentioned form, we may evaluate the partial derivative of the cost function using an ancilla qubit. The idea is to express the derivative of \(U(\theta)\) as a linear combination of unitary matrices \(A_j\), i.e. \(\partial_{\theta_k}U = \sum_j a_j A_j\).</p>
<p>The derivative then becomes:</p>
<p>$$\partial_{\theta_k} C(\theta) = \sum_j a_j (\bra \psi U^\dagger H A_j\ket \psi+ \bra \psi A_j^\dagger H U\ket \psi)$$</p><p>where \(a_j\) are real values.</p>
<p>The circuit must be initialized in the state:</p>
<p>$$\ket + \ket \psi$$</p><p>then the controlled \(U\) is applied to the \(\ket \psi\) register (conditioned on the ancilla being in the \(\ket 0\) state), followed by a controlled \(A_k\) on the same register (conditioned on the ancilla being in the \(\ket 1\) state), which results in:</p>
<p>$$\frac 1{\sqrt{2}}(\ket 0 U\ket\psi + \ket 1 A_k\ket\psi)$$</p><p>Once another Hadamard gate is applied on the ancilla, the resulting state is:</p>
<p>$$\frac 12 (\ket 0 [U+A_k]\ket \psi + \ket 1 [U-A_k]\ket \psi)$$</p><p>One can then estimate \(p_0\) and \(p_1\), respectively the probabilities of projecting onto \((U+A_k) \ket\psi\) and \((U-A_k) \ket\psi\) once the ancilla is measured.</p>
<p>With those probabilities the following cost functions can be defined:</p>
<p>$$C_0 = \frac 1{4p_0} \bra \psi (U+A_k)^\dagger H (U+A_k)\ket \psi$$</p><p>and similarly:</p>
<p>$$C_1 = \frac 1{4p_1} \bra \psi (U-A_k)^\dagger H (U-A_k)\ket \psi$$</p><p>and therefore:</p>
<p>$$\bra \psi U^\dagger H A_k\ket \psi + \bra \psi A_k^\dagger H U\ket \psi = 2(p_0 C_0 - p_1C_1)$$</p><p>which means that we can repeat all the steps done for gates with generators with two distinct eigenvalues.</p>
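<p>Again, the bookkeeping can be checked numerically. In the sketch below (plain NumPy; the random single-qubit unitaries and the observable \(H = Z\) are illustrative assumptions), the quantity \(2(p_0 C_0 - p_1C_1)\) reproduces the two cross terms exactly:</p>
<pre><code class="lang-plaintext">import numpy as np

rng = np.random.default_rng(0)

def random_unitary(n):
    # random unitary from the QR decomposition of a Gaussian matrix
    q, r = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q * (np.diag(r) / np.abs(np.diag(r)))

U, A = random_unitary(2), random_unitary(2)
H = np.array([[1, 0], [0, -1]], dtype=complex)  # observable, here Pauli Z
psi = np.array([1, 0], dtype=complex)

plus, minus = (U + A) @ psi, (U - A) @ psi
p0 = (plus.conj() @ plus).real / 4    # probability of ancilla outcome 0
p1 = (minus.conj() @ minus).real / 4  # probability of ancilla outcome 1
C0 = (plus.conj() @ H @ plus).real / (4 * p0)
C1 = (minus.conj() @ H @ minus).real / (4 * p1)

lhs = 2 * (p0 * C0 - p1 * C1)
rhs = (psi.conj() @ U.conj().T @ H @ A @ psi
       + psi.conj() @ A.conj().T @ H @ U @ psi).real
print(lhs, rhs)  # the two quantities coincide
</code></pre>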
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="https://amm.zanotp.com/contact"><strong>here</strong>.</a></p>
<h2 id="heading-sources">Sources:</h2>
<ul>
<li><p>Mitarai, Negoro, Kitagawa, Fujii, (2018), “Quantum Circuit Learning“, <a target="_blank" href="https://arxiv.org/pdf/1803.00745">https://arxiv.org/pdf/1803.00745</a></p>
</li>
<li><p>Schuld, Bergholm, Gogolin, Izaac, Killoran, (2018) “Evaluating analytic gradients on quantum hardware“, <a target="_blank" href="https://arxiv.org/pdf/1811.11184">https://arxiv.org/pdf/1811.11184</a></p>
</li>
<li><p>Kottmann, Killoran, “Evaluating analytic gradients of pulse programs on quantum computers”, <a target="_blank" href="https://arxiv.org/pdf/2309.16756">https://arxiv.org/pdf/2309.16756</a></p>
</li>
<li><p>Stein, Wiebe, Ding, Bo, Kowalski, Baker, Ang, Li, “EQC: Ensembled Quantum Computing for Variational Quantum Algorithms”, <a target="_blank" href="https://dl.acm.org/doi/pdf/10.1145/3470496.3527434">https://dl.acm.org/doi/pdf/10.1145/3470496.3527434</a></p>
</li>
<li><p>He, “Computing the gradients with respect to all parameters of a quantum neural network using a single circuit”, <a target="_blank" href="https://arxiv.org/pdf/2307.08167">https://arxiv.org/pdf/2307.08167</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Quantum kernels in solving differential equations]]></title><description><![CDATA[Machine learning is often guided by the balance between bias and variance, i.e. if a model is too simple, it struggles to capture the underlying relationships between inputs and outputs and, on the other hand, a model that's too complex might excel d...]]></description><link>https://amm.zanotp.com/qk-de</link><guid isPermaLink="true">https://amm.zanotp.com/qk-de</guid><category><![CDATA[Kernel]]></category><category><![CDATA[#differential-equations]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sun, 15 Sep 2024 06:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/eSnOeMaH6RI/upload/b80726db36b33e9e375ab50355db6185.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Machine learning is often guided by the balance between bias and variance, i.e. if a model is too simple, it struggles to capture the underlying relationships between inputs and outputs and, on the other hand, a model that's too complex might excel during training but falter when faced with new, unseen data.</p>
<p>Ideally, we aim for models that are both quick to train and sophisticated enough to discover meaningful patterns. One of the most intriguing ways to achieve this balance is through kernel methods, which allow for the training of simple linear models that maintain low bias and low variance by mapping the data into a higher-dimensional feature space, making them both efficient and effective. In this post, after a brief introduction, we'll discuss the fascinating world of quantum kernels (which, as the name suggests, are the quantum analogue of kernels) and explore how these powerful tools can be applied to solve differential equations.</p>
<h2 id="heading-kernel-methods">Kernel methods</h2>
<p>Kernel methods are a cornerstone of machine learning, providing a powerful way to handle non-linear relationships between output (\(y\)) and input data (\(x\)). The core idea is to transform the data into a higher-dimensional feature space where linear models can be applied effectively. This transformation is not done explicitly: instead, the so-called kernel trick allows one to compute inner products between data points as if they were in the higher-dimensional space, without ever needing to compute their actual coordinates in that space. This results in computational efficiency, even when working with very high-dimensional spaces.</p>
<p>We can use kernel methods in a regression context, i.e. to find a function \(f\) s.t.:</p>
<p>$$f(x_i) \approx y_i$$</p><p>where \(x_i\) are the input features and \(y_i\) are the corresponding output labels.</p>
<p>For example, one may assume \(f\) to be linear, i.e.</p>
<p>$$f(x) = a+x b$$</p><p>where \(a\) and \(b\) are just vectors that can be learned by minimizing an error function, for example the mean square error:</p>
<p>$$MSE = \frac 1n \sum_i (f(x_i)-y_i)^2$$</p><p>Of course, a linear relation between the input and the output doesn't necessarily hold, and while such a linear model is really simple, it can struggle with non-linear data.</p>
<p>To address such non-linear relationships, kernel methods extend the idea of linear models by mapping the input space \(x\) to a higher-dimensional feature space \(\phi(x)\) choosing \(\phi(\cdot)\) s.t. \(y\) is then linear in \(\phi(x)\) and linear models can be used.</p>
<p>However, the strength of kernel methods lies in the already mentioned kernel trick: instead of explicitly calculating the new feature space, kernel methods rely on so-called kernel functions:</p>
<p>$$K(x_i, x_j) = \langle\phi(x_i), \phi(x_j)\rangle$$</p><p>which is then used to define the kernelized learning task, which could potentially be something like:</p>
<p>$$f(x) = \sum_i a_i K(x, x_i) + b$$</p><p>which is linear in the newly mapped feature space. At that point an error function can be defined to find the parameters of the model.</p>
<p>As you can see, the essence of kernel methods lies in their ability to handle non-linear data using linear models in a higher-dimensional space, without explicitly mapping data to that space.</p>
<p>One last point: in functional analysis, Mercer's theorem guarantees that, if some conditions are met, a dot product in some Hilbert space corresponds to a kernel. This is of paramount importance, as one may define a kernel function from an inner product, as we will do later in the article to get a quantum kernel.</p>
<h3 id="heading-kernel-methods-in-python">Kernel methods in Python</h3>
<p>To make it easier for the reader to understand kernel methods, I prepared a simple python example where linear regression and a particular kernel method is applied on non-linear data:</p>
<pre><code class="lang-plaintext">import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Create non-linear data
np.random.seed(42)
X = np.sort(5 * np.random.rand(50, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])

# Fit Linear Regression
linear_reg = LinearRegression()
linear_reg.fit(X, y)
y_pred_linear = linear_reg.predict(X)

# Fit Support Vector Regression (SVR) with RBF Kernel
svr_rbf = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)
svr_rbf.fit(X, y)
y_pred_svr = svr_rbf.predict(X)

# Plot the results
plt.scatter(X, y, color='darkorange', label='Data')
plt.plot(X, y_pred_linear, color='cornflowerblue', label='Linear Regression', linewidth=2)
plt.plot(X, y_pred_svr, color='green', label='SVR with RBF Kernel', linewidth=2)
plt.legend()
plt.show()
</code></pre>
<p>The result of the above code is:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726937450018/39959eb2-6d78-40cd-9bf6-b6b71ce45739.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-quantum-kernel-methods">Quantum kernel methods</h2>
<p>Quantum kernel methods take the idea behind classical kernels to the next level, using quantum mechanics to tackle more complex, high-dimensional data. The basic idea is to encode information into quantum states and calculate similarities in the new space. This section discusses quantum kernel methods, starting from the encoding of the classical variables into quantum states and then discussing the evaluation of quantum kernel functions and how to take derivatives of quantum kernel functions.</p>
<h3 id="heading-encoding">Encoding</h3>
<p>We start introducing the concept of quantum kernel methods from that of a quantum kernel function. As already mentioned, a kernel function is a function \(k\) mapping two variables \(x_i, x_j \in A\) to the complex space:</p>
<p>$$k: A \times A\rightarrow \mathbb{C}$$</p><p>and a quantum kernel function is a kernel function that can be evaluated on a quantum computer. For example a valid quantum kernel can be:</p>
<p>$$k(x_i, x_j) \equiv \bra {\psi(x_i)}\ket {\psi(x_j)}$$</p><p>where \(\ket{\psi(x_i)}\) indicates a quantum state encoding the classical variable \(x_i\).</p>
<p>To define quantum kernel functions properly, it's essential to understand what encoding into quantum states entails and how to do it efficiently, which is equivalent to the crucial task of designing an efficient feature map. The idea is to use a parametrized quantum circuit \(U\) s.t.:</p>
<p>$$U(x_i)\ket 0 = \ket {\psi(x_i)}$$</p><p>A simple example of \(U\) is:</p>
<p>$$\bigotimes_j R_{p,j}[\phi(x_i)]$$</p><p>where \(R_{p,j}[\phi(x_i)]\) represents the rotation on qubit \(j\) of angle \(\phi(x_i)\) about the Pauli operator \(p\). The selection of the feature map is crucial, as it must be expressive enough to capture the problem's solution while remaining trainable.</p>
<h3 id="heading-evaluation">Evaluation</h3>
<p>After mapping the classical variables into the feature space, the next step is to efficiently implement the quantum kernel function in a quantum circuit. Given that a convenient quantum kernel is defined as:</p>
<p>$$k(x_i,x_j) \equiv |\bra {\psi(x_i)}\ket {\psi(x_j)}|^2$$</p><p>the question becomes: which quantum circuit can perform this computation?</p>
<p>Naively, one can exploit the fact that:</p>
<p>$$k(x_i,x_j) \equiv |\bra {\psi(x_i)}\ket {\psi(x_j)}|^2= \bra 0U^\dagger(x_j)U(x_i)\ket0\bra 0U^\dagger(x_i)U(x_j)\ket0$$</p><p>which can be implemented by applying \(U(x_j)\) followed by \(U^\dagger(x_i)\) on \(\ket 0 ^{\otimes n} \), where \(n\) is the number of qubits necessary to encode \(\ket {\psi(x)}\), followed by measurements. The kernel function value is the probability of remaining in the all-zero state, estimated as the fraction of shots in which \(\ket 0^{\otimes n}\) is observed.</p>
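<p>As a concrete sketch of this naïve overlap circuit, here is a PennyLane version (one possible library choice; the two-qubit register and the single-angle \(R_y\) feature map are illustrative assumptions):</p>
<pre><code class="lang-plaintext">import pennylane as qml

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

def encode(x):
    # toy feature map: one Pauli-Y rotation per qubit with angle phi(x) = x
    for j in range(n_qubits):
        qml.RY(x, wires=j)

@qml.qnode(dev)
def overlap_probs(x_i, x_j):
    encode(x_j)               # U(x_j)
    qml.adjoint(encode)(x_i)  # U^dagger(x_i)
    return qml.probs(wires=range(n_qubits))

def k(x_i, x_j):
    # probability of the all-zero outcome, i.e. the squared overlap
    return overlap_probs(x_i, x_j)[0]

print(k(0.3, 0.3))  # identical inputs give kernel value 1
print(k(0.3, 1.2))
</code></pre>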
<p>Alternatively, one may use the coherent SWAP test (which requires \(2n +1\) qubits). This involves preparing the \(\ket {\psi(x_i)}\) and \(\ket {\psi(x_j)}\) states on two separate registers and then performing a swap controlled by an ancilla in superposition. The advantage of this method is that it only requires measuring a single ancilla qubit; however, it needs more qubits than the naïve method above.</p>
<p>Last, it is possible to use two evaluations of the Hadamard test (which requires \(n+1\) qubits) which can be exploited to compute the real and the imaginary part of:</p>
<p>$$\bra 0U^\dagger(x_i)U(x_j)\ket0$$</p><p>which are then used to evaluate the kernel as:</p>
<p>$$Re( \bra 0U^\dagger(x_i)U(x_j)\ket0 )^2 + Im( \bra 0U^\dagger(x_i)U(x_j)\ket0 )^2$$</p><h3 id="heading-computing-derivatives">Computing derivatives</h3>
<p>Since this article involves solving differential equations, we need to be able to compute the derivatives of the kernel function on a quantum computer. This will be important for the definition of the loss function and for the optimization of the parameters of the solution.</p>
<p>Let’s first define:</p>
<p>$$\nabla_{p,q} k(x_i,x_j) \equiv \frac{\partial^{p+q} k(x_i,x_j)}{\partial^p x_i\partial^qx_j}$$</p><p>We can consider the kernel function formulation acting on a single register defined above (the one we called naïve):</p>
<p>$$k(x_i, x_j)=\bra 0U^\dagger(x_j)U(x_i)\ket0\bra 0U^\dagger(x_i)U(x_j)\ket0$$</p><p>and build the following quantum model:</p>
<p>$$f(x) = \bra 0U^\dagger(x)MU(x)\ket0$$</p><p>where \(M\) is a measurement operator and take its derivative, which, by parameter shifting rule (a method for estimating the gradients of a parameterized quantum circuit, which will be the focus of one of the upcoming articles) is:</p>
<p>$$\partial_xf=\sum_{i=1}^n\frac{f(x+\frac \pi 2_i) - f(x-\frac \pi 2_i)}2$$</p><p>where \(n \) is the number of gates depending on \(x\) in \(U(x)\) and \(f(x\pm\frac \pi 2_i)\) is the evaluation of \(f(x)\) where the i-th gate depending on \(x\) is shifted by \(\pm\frac \pi 2\). Higher order derivatives can be implemented iterating the parameter shifting rule.</p>
<p>Alternatively the computation of derivatives is also permitted by the Hadamard test. Starting once again from:</p>
<p>$$k(x_i, x_j)=\bra 0U^\dagger(x_j)U(x_i)\ket0\bra 0U^\dagger(x_i)U(x_j)\ket0$$</p><p>by product rule the first derivative is:</p>
<p>$$\bra 0U^\dagger(x_j)\partial_{x_i}U(x_i)\ket0\bra 0U^\dagger(x_i)U(x_j)\ket0 + \bra 0U^\dagger(x_j)U(x_i)\ket0\bra 0\partial_{x_i}U^\dagger(x_i)U(x_j)\ket0$$</p><p>and the first term can be computed via Swap test while the second one via one or more Hadamard tests.</p>
<h2 id="heading-solving-differential-equations">Solving differential equations</h2>
<p>This section deals with describing how quantum kernels can be used as solvers for differential equations. The first subsection introduces a regression method and the second subsection applies it to solve differential equation.</p>
<h3 id="heading-mixed-model-regression">Mixed model regression</h3>
<p>In the context of mixed model regression, the trial function (the functional form we give to the solution) is:</p>
<p>$$f_a(x) = b + \sum_{j=1}^na_j k(x, y_j)$$</p><p>where \(k(\cdot)\) is the quantum kernel function, \(\{y_j\}\) is a set of evaluation points, and \(a\) and \(b\) are tunable coefficients. It’s worth noticing that, since \(a\) is the parameter to be optimized and \(k(x, y_j)\) is independent of \(a\), one can compute each \(k(x, y_j)\) before starting the optimization procedure. Moreover, one of the advantages of mixed model regression is that its structure remains consistent across different problems, whereas in other classes of kernel regressions (e.g. support vector regression) the model form can vary.</p>
<h3 id="heading-differential-equations">Differential equations</h3>
<p>Consider, for the sake of simplicity, the following differential equation:</p>
<p>$$DE(f,x, \partial _xf)= \partial_x f - g(f, x) = 0$$</p><p>with \(f(x_0) = f_0\), where \(g\) is a smooth function; we want to use mixed model regression to solve this DE.</p>
<p>Starting from mixed model regression, a proper choice for the loss function is:</p>
<p>$$\mathcal L(a) = \sum_i \left [ DE\left(f_a,x_i, \partial _xf_a(x_i)\right) \right ]^2 + (f_a(x_0) - f_0)^2$$</p><p>where:</p>
<p>$$f_a(x) = b + \sum_{j=1}^na_j k(x, y_j)$$</p><p>Since the kernel and its derivatives are independent of \(a\), they can be evaluated only once and, using an appropriate optimization technique, it is possible to find the optimal weights \(a\). Moreover, in some cases this optimization problem is convex. The resulting function is then a suitable approximation to the solution of the differential equation.</p>
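<p>To see the whole pipeline at once, here is a small end-to-end sketch. It solves the toy ODE \(\partial_x f + f = 0\), \(f(0)=1\), and a classical Gaussian kernel stands in for the quantum kernel so the example runs without a quantum backend; the kernel width, the grids and the optimizer are all illustrative choices:</p>
<pre><code class="lang-plaintext">import numpy as np
from scipy.optimize import minimize

ys = np.linspace(0, 2, 12)  # evaluation points y_j of the trial function
xs = np.linspace(0, 2, 25)  # collocation points x_i for the loss

def k(x, y):
    # classical stand-in for the quantum kernel k(x, y)
    return np.exp(-4.0 * (x - y) ** 2)

def dk(x, y):
    # derivative of the kernel w.r.t. x
    return -8.0 * (x - y) * k(x, y)

# kernel matrices are computed once, before the optimization starts
K = k(xs[:, None], ys[None, :])
dK = dk(xs[:, None], ys[None, :])
K0 = k(0.0, ys)

def loss(params):
    a, b = params[:-1], params[-1]
    f = b + K @ a      # f_a(x_i)
    df = dK @ a        # derivative of f_a at x_i
    residual = df + f  # DE(f, x, f') = f' + f = 0
    boundary = (b + K0 @ a - 1.0) ** 2
    return np.sum(residual ** 2) + boundary

res = minimize(loss, np.zeros(len(ys) + 1), method="BFGS")
a, b = res.x[:-1], res.x[-1]
print(np.max(np.abs(b + K @ a - np.exp(-xs))))  # error vs exact exp(-x)
</code></pre>
<p>In the quantum setting, the entries of \(K\), \(dK\) and \(K_0\) would instead be estimated on the quantum device with the circuits described above, while the classical optimizer would remain unchanged.</p>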
<p>If the differential equation differs from the one in the above example, one may rely on the same framework, adjusting the optimization accordingly (i.e. by defining a suitable loss function and, in the case of systems of differential equations, by minimizing the sum of the individual loss functions).</p>
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="https://amm.zanotp.com/contact">here</a>.</p>
<h2 id="heading-sources">Sources</h2>
<ul>
<li><p><a target="_blank" href="https://journals.aps.org/pra/pdf/10.1103/PhysRevA.107.032428">Paine, Annie E., Vincent E. Elfving, and Oleksandr Kyriienko. "Quantum Kernel Methods for Solving Regression Problems and Differential Equations." <em>Physical Review</em>, 2023</a></p>
</li>
<li><p><a target="_blank" href="https://link.springer.com/article/10.1007/s42484-019-00007-4">Mengoni, Riccardo, and Alessandra Di Pierro. "Kernel Methods in Quantum Machine Learning." <em>Springer Nature</em>, 2019</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/2101.11020.">Schuld, Maria. "Supervised Quantum Machine Learning Models Are Kernel Methods." <em>arXiv</em>, 2021</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Quantum Boltzmann Machines]]></title><description><![CDATA[Quantum Boltzmann Machines (QBMs) are at the cutting edge of quantum machine learning, offering a novel extension of classical Boltzmann machines through the lens of quantum mechanics. These models take advantage of quantum principles to push the bou...]]></description><link>https://amm.zanotp.com/qbm</link><guid isPermaLink="true">https://amm.zanotp.com/qbm</guid><category><![CDATA[quantum boltzmann machines]]></category><category><![CDATA[quantum computing]]></category><category><![CDATA[Quantum Machine Learning]]></category><category><![CDATA[Boltzman Machine]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Wed, 07 Aug 2024 21:15:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/v99lFTVp_ws/upload/f290e5b97e99d9a78423817ef5713b9e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Quantum Boltzmann Machines (QBMs) are at the cutting edge of quantum machine learning, offering a novel extension of classical Boltzmann machines through the lens of quantum mechanics. These models take advantage of quantum principles to push the boundaries of what classical Boltzmann machines can achieve.</p>
<p>In this blog post, we'll explore the core concepts behind QBMs and investigate how they leverage the distinctive features of quantum computing to advance probabilistic modelling and learning and we'll examine how QBMs build on classical methods and the potential they hold for transforming data analysis and problem-solving in machine learning.</p>
<h2 id="heading-classical-boltzmann-machines">Classical Boltzmann Machines</h2>
<p>Before diving into Quantum Boltzmann Machines (QBMs), it's useful to understand classical Boltzmann Machines (BMs) since QBMs are inspired by and build upon the concepts of classical BMs.</p>
<p>The concept of Boltzmann Machines was introduced by Geoffrey Hinton and Terrence Sejnowski in the mid-1980s and the model is named after the physicist Ludwig Boltzmann, whose work on statistical mechanics inspired the probabilistic framework of the BM.</p>
<p>The impact of Boltzmann Machines has been broad and significant: for example, in image and speech recognition, Restricted Boltzmann Machines (RBMs, an evolution of BMs) have excelled at unsupervised feature learning, improving classification accuracy. In recommendation systems, they’ve been used to predict user preferences, and RBMs' applications extend also to natural language processing, for learning text representations, and robotics, for sensor fusion, combining data from multiple sources.</p>
<h3 id="heading-classical-architectures">Classical architectures</h3>
<p>Basically, a Boltzmann Machine consists of a collection of binary units, also known as neurons, organized into two layers: visible units and hidden units. The visible units represent the observed variables, while the hidden units capture the latent or hidden variables that explain the relationships in the data.</p>
<p><a target="_blank" href="https://www.andreaperlato.com/aipost/boltzmann-machine/"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722963181167/3fb29083-f436-4182-95ef-6e58e220f259.png" alt class="image--center mx-auto" /></a></p>
<p>Due to the exponential growth in the number of connections with an increase in nodes in a classical Boltzmann Machine (BM), the Restricted Boltzmann Machine (RBM) is often preferred. The RBM simplifies the architecture by restricting connections:</p>
<ul>
<li><p>Hidden nodes are not connected to each other</p>
</li>
<li><p>Visible nodes are also not connected to each other</p>
</li>
</ul>
<p>In RBMs, connections exist only between visible and hidden nodes, which makes the network more manageable and easier to train compared to the fully connected structure of classical BMs.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722963884893/6be35bf6-3f3d-4d4d-adb8-875fdd5c9182.png" alt class="image--center mx-auto" /></p>
<p>The objective of Boltzmann Machines is to model a probability distribution over a set of binary variables (units) in a way that captures the complex relationships and dependencies between these variables. This is achieved through the concept of energy.</p>
<p>The idea behind the energy based learning lies in the energy function, defined as</p>
<p>$$E(v, h) = - \sum_{i} b_i v_i - \sum_{j} c_j h_j - \sum_{i,j} W_{ij} v_i h_j$$</p><p>where:</p>
<ul>
<li><p>\(b_i\) is the bias of the i-th visible node</p>
</li>
<li><p>\(v_i\) is the i-th visible node</p>
</li>
<li><p>\(c_j\) is the bias of the j-th hidden node</p>
</li>
<li><p>\(h_j\) is the j-th hidden node</p>
</li>
<li><p>\(W_{ij}\) is the weight of the connection between visible i-th unit and j-th hidden unit</p>
</li>
</ul>
<p>To each configuration (\(v, \space h\)) a probability is assigned according to the following function:</p>
<p>$$P(v, h) = \frac{e^{-E(v, h)}}{\sum_{v, h} e^{-E(v, h)}}$$</p><p>and the goal is to minimize the difference between the distribution defined by the BM and the true data distribution, i.e. to maximize the likelihood of the training data under the model:</p>
<p>$$\mathcal{L} = \sum_{v} P_{\text{data}}(v) \log P_{\text{model}}(v)$$</p><h3 id="heading-training">Training</h3>
<p>In Boltzmann Machines, error adjustment cannot be achieved using a gradient descent process like in traditional neural networks, where weights are adjusted by backpropagating the error through the network. This is because BMs are undirected networks, meaning there is no distinction between input and output layers. As a result, BMs lack the concept of "backpropagation" since there is no directed flow of information to guide the adjustment of weights.</p>
<p>In fact, the algorithm typically used to train BMs is called Contrastive Divergence and is completely different in nature from backpropagation, since it approximates the gradient of the log-likelihood function by using a technique involving Gibbs sampling.</p>
<p>The algorithm is an iterative procedure made of the following two steps (a minimal sketch follows the subsections below):</p>
<ul>
<li><p>Perform Gibbs sampling to approximate the distribution of the hidden and visible units based on their conditional probabilities</p>
</li>
<li><p>Compute the gradient and update the weights.</p>
</li>
</ul>
<h4 id="heading-gibbs-sampling">Gibbs sampling</h4>
<p>Given a set of variables \(X=\{X_1, \dots X_n\}\), Gibbs sampling aims to sample from the joint distribution \(P(X)\), which can sometimes be challenging to do directly. Therefore Gibbs sampling uses the conditional distributions \(P(X_i|X_{/i})\), where \(X_{/i}\) denotes all the variables except \(X_{i}\).</p>
<p>Each variable is then updated in turn by sampling:</p>
<p>$$X_i^{(t+1)} \sim P(X_i \mid X_1^{(t)}, X_2^{(t)}, \ldots, X_{i-1}^{(t)}, X_{i+1}^{(t)}, \ldots, X_n^{(t)})$$</p><p>In the context of BMs this translates to sampling the hidden units as</p>
<p>$$P(h_j = 1 \mid v) = \sigma \left( \sum_{i} W_{ij} v_i + c_j \right)$$</p><p>and the visible units as</p>
<p>$$P(v_i = 1 \mid h) = \sigma \left( \sum_{j} W_{ij} h_j + b_i \right)$$</p><p>where \(\sigma(x) = \frac{1}{1 + \exp(-x)}\).</p>
<h4 id="heading-compute-the-gradient-and-update-the-weights">Compute the gradient and update the weights</h4>
<p>The gradient is then computed as:</p>
<p>$$\frac{\partial \mathcal{L}}{\partial W_{ij}} = \frac{\partial}{\partial W_{ij}} \left( \sum_{v} P_{\text{data}}(v) \log P_{\text{model}}(v) \right)$$</p><p>which, after some calculations results in:</p>
<p>$$\frac{\partial \mathcal{L}}{\partial W_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}$$</p><p>where:</p>
<ul>
<li><p>\(\langle v_i h_j \rangle_{\text{model}} = \sum_{v, h} P_{\text{model}}(v, h) v_i h_j\)</p>
</li>
<li><p>\(\langle v_i h_j \rangle_{\text{data}} = \sum_{v} P_{\text{data}}(v) v_i h_j\)</p>
</li>
</ul>
<p>And the weights are updated iteratively as:</p>
<p>$$W_{ij}^{t+1} = W_{ij}^{t}+\epsilon \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \right)$$</p><p>where \(\epsilon\) is the learning rate.</p>
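<p>For concreteness, here is a minimal NumPy sketch of one-step Contrastive Divergence (CD-1) on an RBM; the layer sizes, the learning rate and the random binary data are illustrative assumptions:</p>
<pre><code class="lang-plaintext">import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1

W = 0.01 * rng.normal(size=(n_visible, n_hidden))
b = np.zeros(n_visible)  # visible biases
c = np.zeros(n_hidden)   # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    # positive phase: sample hidden units given the data vector
    ph0 = sigmoid(v0 @ W + c)
    h0 = rng.binomial(1, ph0).astype(float)
    # negative phase: one step of Gibbs sampling
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = rng.binomial(1, pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # gradient estimate: (v h)_data - (v h)_model
    return np.outer(v0, ph0) - np.outer(v1, ph1), v0 - v1, ph0 - ph1

data = rng.binomial(1, 0.5, size=(100, n_visible)).astype(float)  # toy data
for epoch in range(20):
    for v in data:
        dW, db, dc = cd1_step(v)
        W += lr * dW
        b += lr * db
        c += lr * dc
</code></pre>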
<h1 id="heading-quantum-boltzmann-machines">Quantum Boltzmann machines</h1>
<p>The concept of leveraging quantum mechanics for machine learning tasks has seen significant advancements over the past decade, with the Quantum Boltzmann Machine (QBM) emerging as one of the results of these studies. In particular, Mohammad Amin, Evgeny Andriyash, Jason Rolfe, Bohdan Kulchytskyy, and Roger Melko, <a target="_blank" href="https://arxiv.org/pdf/1601.02036">arxiv:1601.02036</a> (2016) developed a quantum probabilistic model based on the Boltzmann distribution of a quantum Hamiltonian, which exploits quantum effects both in the model and in the training process.</p>
<p>The problem QBMs address is exactly the same as for BMs: finding the bias and weight parameters that best approximate a sample distribution by maximizing the log-likelihood, as defined above.</p>
<p><a target="_blank" href="https://arxiv.org/pdf/1601.02036"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723062369067/cca73e6d-f9a1-4409-9e87-575e8eb5bab3.png" alt class="image--center mx-auto" /></a></p>
<h3 id="heading-quantum-architecture">Quantum architecture</h3>
<p>As for classical Boltzmann Machines, our treatment of QBMs starts from the energy function:</p>
<p>$$E(\xi) = -\sum_{i} \xi_i \beta_i - \sum_{i,j} W_{ij} \xi_i \xi_j$$</p><p>where:</p>
<ul>
<li>\(\xi_i\) represents a node (hidden or visible), which is a \(2^N \times 2^N\) matrix defined as (\(I\) is the identity matrix, \(\sigma^z\) is the Pauli Z matrix, \(N\) is the number of nodes):</li>
</ul>
<p>$$\xi_i \equiv \overbrace{I \otimes \ldots \otimes I}^{i-1} \otimes \sigma_i^z \otimes \overbrace{I \otimes \ldots \otimes I}^{N-i}$$</p><ul>
<li>\(\beta\) represents the biases (for both hidden and visible nodes)</li>
</ul>
<p>Using the energy function we define the Boltzmann distribution as:</p>
<p>$$P(\xi) = \frac{e^{-E(\xi)}}{\sum e^{-E(\xi)}}$$</p><p>where the matrix exponentiation is defined through Taylor expansion:</p>
<p>$$e^{- E(\xi)}=\sum_{k=0}^{\infty} \frac{1}{k!}\left( - E(\xi) \right) ^k$$</p><p>Let also the partition function \(Z\) be:</p>
<p>$$Z=Tr[e^{-E(\xi)}]$$</p><p>then the density matrix is:</p>
<p>$$\rho = Z^{-1}e^{-E(\xi)}$$</p><p>which represents the Boltzmann probability of the \(2^N\) elements. Therefore to get the marginal probability distribution over the visible variables \(\ket v\) we just need to trace over the hidden variables, i.e.:</p>
<p>$$P_v = Tr[(\ket v \bra v \otimes I_h)\rho]$$</p><p>which is the analogous of:</p>
<p>$$P(v, h) = \frac{e^{-E(v, h)}}{\sum_{v, h} e^{-E(v, h)}}$$</p><p>At this point, if we include in the Hamiltonian a new element representing a transverse field defined as:</p>
<p>$$\nu_i \equiv \overbrace{I \otimes \ldots \otimes I}^{i-1} \otimes \sigma_i^x \otimes \overbrace{I \otimes \ldots \otimes I}^{N-i}$$</p><p>where \(\sigma_x\) is the Pauli X matrix, the resulting Hamiltonian is:</p>
<p>$$E(\xi, \nu) = -\sum_{i} \xi_i \beta_i - \sum_{i,j} W_{ij} \xi_i \xi_j -\sum_i \Gamma_i\nu_i$$</p><p>where \(\Gamma_i\) is a parameter.</p>
<p>This new Hamiltonian is special since every eigenstate of \(  E(\xi, \nu)\) is a superposition of the classical states \(\ket{v, h}\). Hence, using the density matrix defined above with this new Hamiltonian, each measurement in the \(\sigma^z\) basis results in a classical output in \( \{1, -1\}\), and the probability of each output is given by \(P_v\).</p>
<h3 id="heading-training-1">Training</h3>
<p>As in the classical formulation, the goal of QBMs is to find the \(W\) and \(b\) (from now on I'm referencing these parameters as \(\theta\)) s.t. \(P_v\) (what before was called \(P_{\text{model}}\)) is close to \(P_\text{data}\).</p>
<p>Again this is achieved by minimizing the negative log-likelihood \(\mathcal{L}\):</p>
<p>$$\mathcal{L}=-\sum_{\mathbf{v}} P_{\text {data }} \log \frac{\operatorname{Tr}\left[(\ket v \bra v \otimes I_h) e^{-E(\xi, \nu)}\right]}{\operatorname{Tr}\left[e^{-E(\xi, \nu)}\right]}$$</p><p>whose gradient is:</p>
<p>$$\partial_\theta \mathcal{L}=\sum_{\mathbf{v}} P_{\text {data }}\left(\frac{\operatorname{Tr}\left[(\ket v \bra v \otimes I_h) \partial_\theta e^{-E(\xi, \nu)}\right]}{\operatorname{Tr}\left[(\ket v \bra v \otimes I_h) e^{-E(\xi, \nu)}\right]}-\frac{\operatorname{Tr}\left[\partial_\theta e^{-E(\xi, \nu)}\right]}{\operatorname{Tr}\left[e^{-E(\xi, \nu)}\right]}\right)$$</p><p>Ideally we would use some sampling techniques to estimate efficiently the gradient, however, since \(E(\xi, \nu)\) and \(\partial_\theta E(\xi, \nu)\) don't commute, we don't have a trivial solution.</p>
<p>In fact one can prove that:</p>
<p>$$\frac{\operatorname{Tr}\left[\partial_\theta e^{-E(\xi, \nu)}\right]}{\operatorname{Tr}\left[e^{-E(\xi, \nu)}\right]}= - \operatorname{Tr}[\rho {\partial_\theta}{E(\xi, \nu)}]$$</p><p>and that:</p>
<p>$$\frac{\operatorname{Tr}\left[(\ket v \bra v \otimes I_h) \partial_\theta e^{-E(\xi, \nu)}\right]}{\operatorname{Tr}\left[(\ket v \bra v \otimes I_h) e^{-E(\xi, \nu)}\right]}=-\int_0^1 d t \frac{\operatorname{Tr}\left[(\ket v \bra v \otimes I_h) e^{-t E(\xi, \nu)} \partial_\theta E(\xi, \nu)e^{-(1-t) E(\xi, \nu)}\right]}{\operatorname{Tr}\left[(\ket v \bra v \otimes I_h)e^{-E(\xi, \nu)}\right]}$$</p><p>and it can be shown that, while the first term can be estimated efficiently, the second one cannot be efficiently estimated using sampling, which makes the computational cost of training QBMs impractical. Is this the end of the story? Actually no: introducing an upper bound on \(\mathcal{L} \) (which is a common practice in machine learning) results in the so-called bound-based Quantum Boltzmann Machines (BQBMs), which provide a workaround to the computational impracticability of QBMs, as discussed in the next section.</p>
<h4 id="heading-bound-based-quantum-boltzmann-machines">Bound-based Quantum Boltzmann Machines</h4>
<p>A very famous inequality, the Golden-Thompson inequality, states that for any Hermitian matrices \(A\) and \(B\) the following is true:</p>
<p>$$\operatorname{Tr}\left( e^Ae^B\right) \geq \operatorname{Tr}\left( e^{A+B}\right)$$</p><p>Therefore we know that:</p>
<p>$$P_v = \frac{\operatorname{Tr}[e^{\log{(\ket v \bra v \otimes I_h)}}e^{-E(\xi, \nu)}]}{\operatorname{Tr}[e^{-E(\xi, \nu)}]} \geq \frac{\operatorname{Tr}[e^{-H_\xi}]}{\operatorname{Tr}[e^{-E(\xi, \nu)}]}$$</p><p>where \(H_\xi \equiv -\log{(\ket v \bra v \otimes I_h)}+E(\xi, \nu)\) is a peculiar Hamiltonian, since it assigns an infinite energy penalty to any state whose visible qubit register differs from \(\ket v\). Mathematically, this means that the probability of the system being in any state other than \(\ket v\) is zero, because the Boltzmann factor approaches zero for infinite energy.</p>
<p>This means, in other words, that every qubit \(\xi_i\) is clamped to the corresponding classical value \(v_i\).</p>
<p>From the Golden-Thompson inequality we can also derive that:</p>
<p>$$\mathcal{L} \le \hat{\mathcal{L}} \equiv -\sum P_\text{data} \log \frac{\operatorname{Tr}[e^{-H_\xi}]}{\operatorname{Tr}[e^{-E(\xi, \nu)}]}$$</p><p>and we can now minimize \(\hat{\mathcal L}\), the upper bound of \({\mathcal L}\), using its gradient, obtaining the following rule to update the bias \(\beta_i\):</p>
<p>$$\beta_i^{t+1}= \beta_i^{t} + \epsilon \left(\sum P_\text{data} \frac{\operatorname{Tr}[e^{-H_\xi}\sigma_i^z]}{\operatorname{Tr}[e^{-H_\xi}]} - \operatorname{Tr}(\rho \sigma_i^z)\right)$$</p><p>and the weight \(W_{ij}\):</p>
<p>$$W_{ij}^{t+1}= W_{ij}^{t} + \epsilon \left(\sum P_\text{data} \frac{\operatorname{Tr}[e^{-H_\xi}\sigma_i^z\sigma_j^z]}{\operatorname{Tr}[e^{-H_\xi}]} - \operatorname{Tr}(\rho \sigma_i^z\sigma_j^z)\right)$$</p><p>One may also think about training the \(\Gamma_i\); however, this results in a vanishing \(\Gamma\), i.e. learning the transverse field is unfeasible in this upper-bound setting.</p>
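<p>As a quick numerical sanity check of the Golden-Thompson inequality underpinning this bound (a sketch with random Hermitian matrices; the dimension 4 is arbitrary):</p>
<pre><code class="lang-plaintext">import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

def random_hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

A, B = random_hermitian(4), random_hermitian(4)
lhs = np.trace(expm(A) @ expm(B)).real  # Tr(e^A e^B)
rhs = np.trace(expm(A + B)).real        # Tr(e^{A+B})
print(lhs, rhs, lhs - rhs)  # the difference is always non-negative
</code></pre>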
<h4 id="heading-semi-restricted-quantum-boltzmann-machines">Semi-restricted Quantum Boltzmann Machines</h4>
<p>Until now we have not posed any restrictions on the structure of the QBM; in particular, we assumed a fully connected architecture. Similarly to classical RBMs, semi-restricted Quantum Boltzmann Machines (srQBMs) are QBMs whose hidden layer has no lateral connectivity.</p>
<p><a target="_blank" href="https://arxiv.org/pdf/1601.02036"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723062329107/d1f43f04-3380-4315-a1c5-832325d0cf8b.png" alt class="image--center mx-auto" /></a></p>
<p>Note that, unlike classical RBMs, lateral connections are still allowed, but only among the visible units.</p>
<p>Such an architecture in fact allows us to apply contrastive divergence learning algorithms, since the clamped Hamiltonian is then:</p>
<p>$$H_\xi = - \sum_i \left(\Gamma_i \sigma_i^x + (b_i+ \sum_j W_{ij} v_j)\sigma_i^z \right)$$</p><p>as the hidden qubits are uncoupled during the parameters learning phase.</p>
<p>Based on the Hamiltonian one can show that expectations can be computed efficiently as:</p>
<p>$$\frac{\operatorname{Tr}[e^{-H_\xi}\sigma_i^z]}{\operatorname{Tr}[e^{-H_\xi}]}= \frac{b_i+ \sum_j W_{ij} v_j}{D_i}\tanh D_i, \qquad D_i \equiv \sqrt{\Gamma_i^2 + \left(b_i+ \sum_j W_{ij} v_j\right)^2}$$</p><p>which reduces to the classical RBM expression as \(\Gamma_i \rightarrow 0\).</p>
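<p>This closed form can be checked numerically on a single hidden qubit (a sketch; the values of \(\Gamma_i\) and of the effective bias \(b_i + \sum_j W_{ij} v_j\) are illustrative):</p>
<pre><code class="lang-plaintext">import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

Gamma, b_eff = 0.8, 0.5              # illustrative values
H = -(Gamma * sx + b_eff * sz)       # clamped Hamiltonian for one qubit
rho = expm(-H) / np.trace(expm(-H))  # Gibbs state

lhs = np.trace(rho @ sz).real        # clamped expectation of sigma_z
D = np.hypot(Gamma, b_eff)
rhs = (b_eff / D) * np.tanh(D)       # closed-form expression above
print(lhs, rhs)  # both print the same value
</code></pre>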
<h2 id="heading-conclusion">Conclusion</h2>
<p>While still experimental, Quantum Boltzmann Machines represent an exciting advancement in the field of quantum machine learning, and the journey towards fully realizing their potential is ongoing, requiring collaboration across disciplines and continuous innovation. While it is important to temper expectations with the recognition of current limitations, the foundational work being done is not just an academic exercise but a step towards unlocking new possibilities in science and industry.</p>
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here.</strong></a></p>
<hr />
<h2 id="heading-sources">Sources</h2>
<ul>
<li><p>Mohammad Amin, Evgeny Andriyash, Jason Rolfe, Bohdan Kulchytskyy, and Roger Melko, <a target="_blank" href="https://arxiv.org/pdf/1601.02036">arxiv:1601.02036</a> (2016)</p>
</li>
<li><p>G. E. Hinton, S. Osindero, Y-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18, 1527– 1554 (2006)</p>
</li>
<li><p>R. Salakhutdinov, G. E. Hinton, Deep Boltzmann machines, AISTATS 2009</p>
</li>
<li><p>Miguel Carreira-Perpinan, Geoffrey Hinton, On Contrastive Divergence Learning (2005)</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Quantum Finance: Option Pricing]]></title><description><![CDATA[Financial institutions deal daily with a spectrum of computationally intensive challenges. These include for example forecasting tasks, such as pricing and risk estimation, detecting anomalous transactions, analysing customer preferences and optimiza...]]></description><link>https://amm.zanotp.com/qf-option-pricing</link><guid isPermaLink="true">https://amm.zanotp.com/qf-option-pricing</guid><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sun, 28 Jul 2024 16:10:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/7fnNrzSly7Q/upload/d9571062de9260caa0f996bdd915ca4e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Financial institutions deal daily with a spectrum of computationally intensive challenges. These include for example forecasting tasks, such as pricing and risk estimation, detecting anomalous transactions, analysing customer preferences and optimization problems like portfolio selection, devising optimal trading strategies, and hedging. Even if the relentless advancement in mathematical finance and computational science, fueled by both the financial industry and the scientific community, has armed institutions with a sophisticated toolkit (stochastic modelling, advanced optimization algorithms, and machine learning models) to tackle these problems, the complexity and volume of financial data continue to push the limits of computing.</p>
<p>This has sparked a surge of interest in quantum computing within both academic and industrial circles, and this computing paradigm, with its fundamentally different approach to processing information, is believed to offer the potential to revolutionize problem-solving in finance in both the long and near term.</p>
<p>This field of research is often referred to as Quantum Finance, and this blog post marks the first in a series dedicated to the topic. In particular, this article discusses the speedup quantum computers can provide to an extremely important financial task: option pricing.</p>
<h2 id="heading-options">Options</h2>
<p>Options are financial derivatives that give the buyer the right, but not the obligation, to buy (call option) or sell (put option) an underlying asset at a predetermined price (called the strike price) within a specified time frame, after paying a premium for this right. The concept of options dates back to ancient Greece, where contracts resembling options were used for olive harvests by Thales of Miletus. However, the modern options market was established in 1973 with the creation of the Chicago Board Options Exchange, which standardized option contracts and brought greater transparency and accessibility to options trading.</p>
<p>Options play a pivotal role in financial markets for several reasons. They allow investors to hedge against potential losses in other investments, provide opportunities for speculation on the future direction of asset prices (buying a call (put) option represents a long (short) strategy), and offer ways to enhance portfolio returns through strategies like covered calls or protective puts.</p>
<p>There are various types of options, each with unique characteristics. The two primary categories are European and American options. European options can only be exercised at the expiration date, making their pricing models simpler and often less expensive. In contrast, American options can be exercised at any point up to and including the expiration date, offering greater flexibility but also more complex pricing models due to the additional considerations of when the option might be exercised. Options can also be categorized as vanilla or exotic. Vanilla options refer to the standard call and put options with straightforward payoff structures. These are the most common types of options traded. In contrast, exotic options, or non-vanilla options, have more complex features and payoff structures. Examples include:</p>
<ul>
<li><p><strong>Barrier Options:</strong> Options that are activated or extinguished if the underlying asset reaches a certain price</p>
</li>
<li><p><strong>Asian Options:</strong> Options where the payoff depends on the average price of the underlying asset over a specific period, rather than the price at expiration</p>
</li>
<li><p><strong>Lookback Options:</strong> Options that allow the holder to "look back" over time to determine the optimal exercise price.</p>
</li>
</ul>
<p>Beyond this complexity, accurate pricing of options is crucial for maintaining fair trading practices, effective risk management, and overall market stability. Inaccurate pricing can in fact introduce significant risks, as it may lead to arbitrage opportunities where traders exploit price discrepancies for guaranteed profits, potentially causing market imbalances. Additionally, mispricing can result in inadequate hedging strategies, exposing investors and financial institutions to unforeseen losses and undermining confidence in the financial system. Thus, precision in option pricing is essential to mitigate these risks and ensure the smooth functioning of financial markets.</p>
<h2 id="heading-classical-strategies-for-option-pricing">Classical strategies for Option Pricing</h2>
<p>Option pricing fundamentally depends on projecting the future value of the underlying asset, as the option's worth is derived from the underlying asset's price movements. To accurately price an option, it's essential to understand how the underlying security might evolve over time. Three primary factors influence the pricing of an option: the current price of the underlying asset, the time value of the option, and the implied volatility of the underlying asset:</p>
<ul>
<li><p><strong>The price of the underlying asset</strong> is the most critical factor affecting the option's premium. The premium reflects the right to buy or sell the underlying asset, and a higher-priced asset demands a higher premium compared to a lower-priced asset. This ensures that investors have sufficient incentive to purchase options on assets with varying price levels</p>
</li>
<li><p><strong>The time value</strong> of an option pertains to the duration between the purchase date and the option's expiration date. The longer the time until expiration, the higher the chance that the underlying asset will reach the strike price, making longer-term options more expensive than shorter-term ones with the same strike price</p>
</li>
<li><p><strong>Implied volatility</strong> is another crucial component in option pricing. As the perceived volatility of the underlying asset increases, so does the option's price. This is because higher volatility means a greater potential for significant price swings, offering more opportunities for profit to the option holder but also posing higher risk to the seller. Consequently, the seller demands a higher premium to compensate for this increased risk.</p>
</li>
</ul>
<h3 id="heading-black-scholes-merton-model">Black-Scholes-Merton model</h3>
<p>One of the most famous approaches to pricing involves taking advantage of the stochastic nature of financial markets and modelling their dynamics with a formula we can analytically solve. Given the appropriate assumptions, this approach results in the famous Black-Scholes-Merton model (a minimal Python sketch of the formula follows the list of symbols below):</p>
<p>$$C = S_0 N(d_1) - K e^{-rT} N(d_2)$$</p><p>where:</p>
<ul>
<li><p>\(C\) is the price of the European vanilla call option</p>
</li>
<li><p>\(S_0\) is the price of the underlying asset</p>
</li>
<li><p>\(K\) is the strike price of the option</p>
</li>
<li><p>\(r\) is the risk-free interest rate</p>
</li>
<li><p>\(T\) is the time to maturity of the option</p>
</li>
<li><p>\(N\) is the cumulative distribution function of the standard normal distribution</p>
</li>
<li><p>\(d_1 = \frac{\ln\left(\frac{S_0}{K}\right) + \left(r + \frac{\sigma^2}{2}\right)T}{\sigma \sqrt{T}}\)</p>
</li>
<li><p>\(d_2 = d_1 - \sigma \sqrt{T}\)</p>
</li>
</ul>
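<p>As a quick reference, here is a minimal Python sketch of the formula (the parameter values in the example call are purely illustrative):</p>
<pre><code class="lang-plaintext">import numpy as np
from scipy.stats import norm

def bsm_call(S0, K, r, sigma, T):
    # European vanilla call price under Black-Scholes-Merton
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

print(bsm_call(S0=100.0, K=105.0, r=0.02, sigma=0.2, T=1.0))
</code></pre>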
<p>The model however has some limitations: it assumes an arbitrage-free market, a constant \(r\) over time, a constant volatility of the underlying asset over time, stocks that pay no dividends, and a frictionless market (no taxes and no transaction costs), to name a few. Because of these drawbacks, there are alternative analytical approaches to option pricing that address some of the unrealistic assumptions.</p>
<h3 id="heading-monte-carlo-methods">Monte Carlo methods</h3>
<p>What I want to discuss a little further however are the Monte Carlo methods, as the quantum approach to option pricing discussed below is, in many ways, a quantum analogue of the Monte Carlo pricing method.</p>
<p>Basically, Monte Carlo methods involve simulating a large number of possible paths that the underlying asset price might take over the life of the option. These simulations use random sampling to model the stochastic processes that govern asset price movements and each simulated path is used to calculate the payoff of the option. The average of these payoffs is then discounted back to the present value to obtain the option price.</p>
<p>Therefore the Monte Carlo method consists of three steps:</p>
<ul>
<li><p><strong>Model specification</strong>: define the dynamics of the stochastic process governing the underlying asset's price dynamics</p>
</li>
<li><p><strong>Path simulation</strong>: generate a large number of random price paths for the underlying asset using the specified stochastic process. The idea is that each path is a possible future trajectory of the asset's price</p>
</li>
<li><p><strong>Payoff computation and discounting</strong>: compute the payoff of the option for each simulated path and discount the payoff to the present value using the risk-free interest rate. Then compute the average of the discounted payoffs to estimate the option price.</p>
</li>
</ul>
<p>The strength of the Monte Carlo approach lies in its versatility as a generic methodology, making it especially valuable when the Black-Scholes-Merton model is inapplicable, such as in the pricing of path-dependent options.</p>
<p>It is also possible to derive the convergence rate of the method which, thanks to the Central Limit Theorem, is \(O(\frac 1{\sqrt{N}})\), where \(N\) is the number of simulated paths. This means that to reduce the error by half, one must increase the number of simulations fourfold.</p>
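<p>The three steps fit in a few lines of Python. The sketch below (parameter values are illustrative) prices a European call under geometric Brownian motion and reports the \(O(\frac 1{\sqrt{N}})\) standard error:</p>
<pre><code class="lang-plaintext">import numpy as np

S0, K, r, sigma, T = 100.0, 105.0, 0.02, 0.2, 1.0  # illustrative values
N = 1_000_000                                      # number of paths

rng = np.random.default_rng(0)
Z = rng.standard_normal(N)
# steps 1-2: terminal prices under risk-neutral geometric Brownian motion
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
# step 3: payoff, discounting and averaging
payoffs = np.maximum(ST - K, 0.0)
price = np.exp(-r * T) * payoffs.mean()
stderr = np.exp(-r * T) * payoffs.std(ddof=1) / np.sqrt(N)
print(price, stderr)  # estimate and its O(1/sqrt(N)) standard error
</code></pre>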
<h2 id="heading-quantum-strategies-for-option-pricing">Quantum strategies for Option Pricing</h2>
<p>As the next section will discuss, the quantum analogue to the Monte Carlo methodology offers a convergence rate of \(O(\frac 1N)\), providing a quadratic improvement over classical methods. While this speedup may seem modest, it is particularly significant for hedge funds and institutional investors who price large portfolios of options, often overnight. Even a slight improvement in convergence rates can in fact translate into substantial computational time savings, amounting to many hours. The quantum analogue to Monte Carlo methods consists, similarly to the classical counterpart, of three steps:</p>
<ul>
<li><p>representing the probability distribution of the random variable identifying the option's underlying and any other source of uncertainty</p>
</li>
<li><p>build a circuit that computes the payoff based on the random variable</p>
</li>
<li><p>compute the expected value of the payoff and discount the result (this can be done classically and I am omitting this part not being particularly complicated).</p>
</li>
</ul>
<h3 id="heading-encoding-the-probability-distribution-in-a-quantum-register">Encoding the probability distribution in a quantum register</h3>
<p>The first step, loading the distribution of the random variable identifying the option's underlying and any other source of uncertainty (\(X\) from now on), requires a quantum circuit that, given the \(\{S_i\}\) asset prices and the corresponding probabilities \(\{p_i\} \) (assuming the state space is discretized into \(2^n\) states, where \(n\) is the number of qubits of the register), is able to create the following state:</p>
<p>$$\ket{\psi}_n = \sum_{i=0}^{2^n-1} \sqrt{p_i} \ket{S_i}_n$$</p><p>The efficiency of encoding a probability distribution into a quantum state depends on the nature of the distribution. It <a target="_blank" href="https://arxiv.org/abs/quant-ph/0208112">has been demonstrated</a> that log-concave probability distributions, such as the log-normal distribution assumed by the Black-Scholes-Merton model, can be efficiently encoded into a quantum state. To load states that do not share this property, it is possible to exploit the power of <a target="_blank" href="https://www.nature.com/articles/s41534-019-0223-2">quantum Generative Adversarial Networks</a> (qGAN), which are able to load a distribution in \(O(\text{poly}(n))\) gates rather than \(O(2^n)\) gates. While the details are beyond the scope of this article, it's worth noting that qGANs are hybrid quantum-classical algorithms. They consist of a classical neural network, known as the discriminator, and a variational quantum circuit, known as the quantum generator. The training process of a qGAN involves alternating optimization of the discriminator's parameters (\(\theta\)) and the generator's parameters (\(\gamma\)). After training, the output of the process is:</p>
<p>$$\ket{\psi(\gamma)}_n= \sum_{i=0}^{2^n-1} \sqrt{p_i(\gamma)}\ket{i}_n$$</p><p>where \(p_i(\gamma)\) approximates the underlying distribution of the training data.</p>
<h3 id="heading-computing-the-payoff">Computing the payoff</h3>
<p>Once the distribution is loaded, we need to compute the payoff function \(f\). Since an option's payoff is piecewise linear (depending on whether the option is exercised or not), we only need to consider functions of the form:</p>
<p>$$f(i) = f_0 + f_1 \, i$$</p><p>and by using controlled Y-rotations it is possible to efficiently create the following operator:</p>
<p>$$R: \ket{i}_n\ket{0} \rightarrow \ket{i}_n\otimes(cos[f(i)]\ket{0} + sin[f(i)] \ket{1})$$</p><p>which, applied on a register representing a previously encoded distribution, results in:</p>
<p>$$R\ket{\psi}_n\ket 0= \sum_{i=0}^{2^n-1} \sqrt{p_i}\ket{i}_n \otimes (\cos[\tilde f(i)]\ket{0} + \sin[\tilde f(i)]\ket{1})$$</p><p>where \(\tilde f(i) = 2c \frac{f(i) -f_{min}}{f_{max}-f_{min}} - c + \frac \pi 4\), with \(c \in[0, 1]\) and \(f_{min}\) (\(f_{max}\)) is \(\min_if(i) \) (\(\max_if(i)\)).</p>
<p>Consequently, the probability of measuring \(\ket{1}\) in the second register is:</p>
<p>$$P_1 = \sum_i p_i \, \sin^2(\tilde f(i))$$</p><p>which can be approximated (with a third-order truncation error) by:</p>
<p>$$P_1 \approx \sum_i p_i \left(2c \frac{f(i) -f_{min}}{f_{max}-f_{min}}-c + \frac 12\right) = 2c \frac{E[f(i)] -f_{min}}{f_{max}-f_{min}}-c + \frac 12$$</p><p>where all the values are known except \(E[f(i)]\). To compute the expected value of the payoff function, a quantum algorithm known as quantum amplitude estimation (QAE) is required.</p>
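<p>The linearization step can be seen numerically: the approximation rests on \(\sin^2(x + \frac \pi 4) \approx x + \frac 12\) for small \(x\), which is why the scaling constant \(c\) keeps the argument close to \(\frac \pi 4\). A two-line check:</p>
<pre><code class="lang-plaintext">import numpy as np

for x in [0.0, 0.05, 0.1, 0.25]:
    # exact value vs linear approximation x + 1/2
    print(x, np.sin(x + np.pi / 4) ** 2, x + 0.5)
</code></pre>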
<h3 id="heading-quantum-amplitude-estimation">Quantum amplitude estimation</h3>
<p>Quantum amplitude estimation is the algorithm responsible for providing the quadratic speedup compared to classical Monte Carlo methods. Basically, assuming an operator \(A\) s.t.</p>
<p>$$A\ket{0}_{n+1}= \sqrt{1-p}\ket{\psi_0}_n \otimes \ket{0}+ \sqrt{p}\ket{\psi_1}_n \otimes \ket{1}$$</p><p>where \(p\) is unknown. QAE aims to estimate \(p\), i.e. the probability of measuring \(\ket{1}\) in the second register.</p>
<p>The idea is to build the operator \(Q\) s.t.</p>
<p>$$Q = AS_0A^\dagger S_{\psi_0}$$</p><p>where:</p>
<ul>
<li><p>\(S_0= 1- 2 \ket{0}\bra{0}\)</p>
</li>
<li><p>\(S_{\psi_0}= 1- 2 \ket{\psi_0}\ket{0}\bra{{\psi_0}}\bra{0}\)</p>
</li>
</ul>
<p>and it can be shown that \(Q\) acts as a rotation by \(2\theta_p\) in the two-dimensional subspace spanned by \(\ket{\psi_0}\ket 0\) and \(\ket{\psi_1}\ket 1\), with \(\sin^2(\theta_p) = p\).</p>
<p><a target="_blank" href="https://arxiv.org/pdf/1905.02666"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722176862815/4f462fd3-6175-4cce-8afd-5431f091fd3f.png" alt class="image--center mx-auto" /></a></p>
<p>Therefore, applying quantum phase estimation (shown in the above picture) on \(m\) sampling qubits, i.e. applying an m-fold Hadamard gate, using the \(m\) qubits to control different powers of \(Q\), applying the inverse Quantum Fourier Transform and measuring the \(m\) qubits' state, results in an integer \(k\). There are also other formulations of QAE that are more suitable for the NISQ era.</p>
<p>Then \(p\) is estimated as:</p>
<p>$$\hat p = \sin^2\left(\frac{k\pi}{2^m}\right)$$</p><p>and such an estimate is s.t., with probability of at least \(\frac 8{\pi^2}\), the following holds:</p>
<p>$$|{p - \hat p}| \leq \frac \pi {2^m} + \left(\frac {\pi}{2^m}\right)^2$$</p><p>with a convergence rate of \(O(\frac 1N)\) in the number of samples \(N = 2^m\) (while the convergence rate of classical Monte Carlo methods is \(O(\frac 1{\sqrt{N}})\)).</p>
<p>Therefore, applying the operator \(A\) to the state prepared by the qGAN results in:</p>
<p>$$A \ket{\psi}_n\ket{0} = \sum_{i=0}^{2^n-1} \sqrt{1 -f(S_i)} \sqrt{p_i}\,\ket{S_i}_n\ket{0} + \sum_{i=0}^{2^n-1} \sqrt{f(S_i)} \sqrt{p_i}\,\ket{S_i}_n\ket{1}$$</p><p>and since the probability of measuring \(\ket 1\) (what we called \(p\) before) is</p>
<p>$$\sum p_i f(S_i) = E[f(S_i)]$$</p><p>it is possible to recover the undiscounted expected value of the option's payoff, which allows us to compute</p>
<p>$$P_1 \approx 2c \frac{E[f(i)] -f_{min}}{f_{max}-f_{min}}-c + \frac 12$$</p><h3 id="heading-pricing-vanilla-options">Pricing Vanilla Options</h3>
<p>Let us now review the entire process for a vanilla option. The payoff function \(f_c\) for a call option is:</p>
<p>$$f_c(S_T) = \max(S_T - K, 0)$$</p><p>while for a put the payoff function \(f_p\) is:</p>
<p>$$f_p(S_T) = \max(K-S_T, 0)$$</p><p>where:</p>
<ul>
<li><p>\(S_T\) is the price at expiration date</p>
</li>
<li><p>\(K\) is the strike price.</p>
</li>
</ul>
<p>We already discussed how to represent the linear part of the function, while to implement the \(\max(\cdot)\) operator it is necessary to implement a comparison circuit \(C\) between \(S_T\) and \(K\) (built from an ancillary register, Toffoli gates and CNOTs) performing the following transformation</p>
<p>$$C\ket\psi_n\ket 0 = \ket \phi_n = \sum_{i &lt; K} \sqrt{p_i}\ket i_n\otimes \ket 0 + \sum_{i \geq K} \sqrt{p_i}\ket i_n \otimes \ket 1$$</p><p>To represent the payoff function for use in QAE, another ancillary qubit is required and (for a vanilla call option) we set</p>
<p>$$\tilde f(i) = \begin{cases} g_0 \space \text{if} \space K&gt;i\\ g_0 + g(i)\space \text{if} \space K\leq i\end{cases}$$</p><p>where \(g(i)\) is a linear function and \(g_0\) is to be defined.</p>
<p>Doing so we are able to reconstruct the following state:</p>
<p>$$\begin{align} R\ket{\phi}_n\ket{0} &amp;= \sum_{i &lt; K} \sqrt{p_i} \ket{i}_n \otimes \ket{0} \otimes \left( \cos[g_0]\ket{0} + \sin[g_0]\ket{1} \right) \notag \\ &amp;\quad + \sum_{i \geq K} \sqrt{p_i} \ket{i}_n \otimes \ket{1} \otimes \left( \cos[g_0 + g(i)]\ket{0} + \sin[g_0 + g(i)]\ket{1} \right) \end{align}$$</p><p>Using QAE, the probability of measuring \(\ket 1\) in the last qubit is</p>
<p>$$\sum_{i &lt; K} p_i \space \sin^2(g_0) + \sum_{i \geq K} p_i \space \sin^2(g_0 + g(i))$$</p><p>At this point it is necessary to define both \(g_0\) and \(g(i)\). To ensure the following</p>
<p>$$\begin{cases}f(i) = i - K \\ \tilde f(i) = g_0+g(i) \end{cases}$$</p><p>we get \(g(i)=\frac{2c (i-K)}{2^n-1 - K}\) and \(g_0 = \frac \pi 4 - c\).</p>
<p>By substitution and approximating as above</p>
<p>$$\begin{align} P_1 &amp; \approx \sum_{i &lt; K} p_i (\frac 12 - c) + \sum_{i \geq K} p_i(\frac 12-c + \frac{2c (i-K)}{2^n-1 - K})\\ &amp;=\frac 12-c + \frac{2c}{2^n-1 - K} \sum_{i \geq K} p_i(i-K) \end{align}$$</p><p>which is exactly \(E[f(i)]\) up to a constant and a scaling factor.</p>
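<p>As a sanity check, this classical NumPy sketch (with a toy distribution, not market data) computes \(P_1\) from the derived expression and inverts the affine map to recover \(E[\max(i-K, 0)]\):</p>
<pre><code class="lang-python"># Compute P_1 from the derived formula and invert the affine map to
# recover the expected call payoff E[max(i - K, 0)]. Toy distribution.

import numpy as np

n, K, c = 4, 5, 0.05
i = np.arange(2**n)
rng = np.random.default_rng(1)
p = rng.random(2**n); p /= p.sum()

mask = i &gt;= K                           # states where the call pays off
scale = 2 * c / (2**n - 1 - K)
P1 = 0.5 - c + scale * np.sum(p[mask] * (i[mask] - K))

payoff_from_P1 = (P1 - 0.5 + c) / scale           # invert the affine map
expected_payoff = np.sum(p * np.maximum(i - K, 0))
print(payoff_from_P1, expected_payoff)            # the two values match
</code></pre>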
<h2 id="heading-conclusion">Conclusion</h2>
<p>While quantum computers and quantum finance are not yet fully realized, and the discussed quantum approaches to option pricing are not yet fully applicable, especially for complex probability distributions, it is possible to simulate these procedures using various software development kits (SDKs). For instance, Qiskit Finance offers comprehensive resources, including <a target="_blank" href="https://qiskit-community.github.io/qiskit-finance/tutorials/03_european_call_option_pricing.html">tutorials on pricing various types of options</a>, which I highly recommend.</p>
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here</strong>.</a></p>
<h2 id="heading-sources">Sources:</h2>
<ul>
<li><p><a target="_blank" href="https://www.journals.uchicago.edu/doi/abs/10.1086/260062">Fischer Blac</a><a target="_blank" href="http://amm.zanotp.com/contact">k an</a><a target="_blank" href="https://www.journals.uchicago.edu/doi/abs/10.1086/260062">d Myron Scholes,</a><a target="_blank" href="http://amm.zanotp.com/contact">“The</a><a target="_blank" href="https://www.journals.uchicago.edu/doi/abs/10.1086/260062">pricing of options and corporate liabilities”, Journal of Political Economy 81, 637–654 (1973)</a></p>
</li>
<li><p><a target="_blank" href="https://www.jstor.org/stable/3003143?casa_token=e1cWjOoCXGoAAAAA%3AMlrMg4q2YD-xsVAnojztdcdyzzRgp2TT22CjQo6-Q1AE3jd8QHmUV6qDUCGF9T91bMGW8ff4KaWw6w2XYVS4WvS5-RA2xBW9eOpWoJ5Qb61qUawDbtgMdw">Robert C. Merton, “Theory of rational option</a><a target="_blank" href="http://amm.zanotp.com/contact">pri</a><a target="_blank" href="https://www.jstor.org/stable/3003143?casa_token=e1cWjOoCXGoAAAAA%3AMlrMg4q2YD-xsVAnojztdcdyzzRgp2TT22CjQo6-Q1AE3jd8QHmUV6qDUCGF9T91bMGW8ff4KaWw6w2XYVS4WvS5-RA2xBW9eOpWoJ5Qb61qUawDbtgMdw">cing”, The Bell Journal of Economics and Management Science 4, 141–183 (1973)</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/quant-ph/9908083">Daniel S Abrams and Colin P Williams, “Fast quantum algorithms for numerical integrals and stochastic processes”, (1999)</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/quant-ph/0208112">Lov Grover and Terry Rudolph, “Creating superpositions that correspond to efficiently integrable probability distributions”</a></p>
</li>
<li><p><a target="_blank" href="https://www.nature.com/articles/s41534-019-0223-2">Christa Zoufal, Aurélien Lucchi, and Stefan Woerner, “Quantum generative adversarial networks for learning and loading random distributions”, npj Quantum Information 5, 1–9 (2019)</a></p>
</li>
<li><p><a target="_blank" href="https://www.ams.org/books/conm/305/">Gilles Brassard, Peter Hoyer, Michele Mosca, and Alain Tapp, “Quantum Amplitude Amplification and Estimation”, Contemporary Mathematics 305 (2002), 10.1090/conm/305/05215</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/1905.02666">Nikitas Stamatopoulos, Daniel J. Egger, Yue Sun, Christa Zoufal, Raban Iten, Ning Shen and Stefan Woerner, "Option Pricing using Quantum Computers" (2019)</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Physics informed neural networks for solving Partial Differential Equations]]></title><description><![CDATA[Introduction
Despite the grandiose name, Physics Informed Neural Networks (PINNs from now on) are simply neural networks trained to solve supervised learning tasks while adhering to any provided law of physics described by general nonlinear partial d...]]></description><link>https://amm.zanotp.com/pinn</link><guid isPermaLink="true">https://amm.zanotp.com/pinn</guid><category><![CDATA[pde]]></category><category><![CDATA[neural networks]]></category><category><![CDATA[Physics Informed Neural Network]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sun, 05 May 2024 17:05:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1714929260771/d10709e1-2ad7-4491-829c-225e81406872.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>Despite the grandiose name, Physics Informed Neural Networks (PINNs from now on) are simply neural networks trained to solve supervised learning tasks while adhering to any provided law of physics described by general nonlinear partial differential equations (PDEs from now on). The resulting neural network acts as a universal function approximator that inherently incorporates any underlying physical laws as prior knowledge, making them suitable for solving PDEs.</p>
<p>This not only signifies a stark departure from traditional numerical methods (such as finite difference, finite volume, finite elements, etc.) but also marks a shift in how we approach modeling and understanding physical systems. In fact, unlike conventional numerical techniques that depend on discretization and iterative solvers, PINNs offer a more comprehensive and data-centric approach and, by combining the capabilities of neural networks with the principles of physical laws, PINNs hold the potential to open up new pathways for exploration and innovation across various scientific fields.</p>
<p>In this blog post, my goal is to discuss all the essential components required to understand PINNs for solving PDEs. Therefore, the post is composed of 4 key parts:</p>
<ul>
<li><p>Firstly, I introduce PDEs and explain the necessity of relying on numerical methods;</p>
</li>
<li><p>Secondly, I provide a brief overview of neural networks;</p>
</li>
<li><p>Next, I delve into discussing PINNs;</p>
</li>
<li><p>Finally, I demonstrate how to implement PINNs using PyTorch.</p>
</li>
</ul>
<h2 id="heading-about-partial-differential-equations-pdes">About Partial Differential Equations (PDEs)</h2>
<p>PDEs serve as fundamental tools in describing physical phenomena and natural processes across various scientific domains, from physics and engineering to biology and finance. Unlike ordinary differential equations (ODEs), which involve only one independent variable, PDEs incorporate multiple independent variables, such as space and time. For example</p>
<p>$$\frac{\partial f}{\partial t}+\alpha \frac{\partial f}{\partial x}=0$$</p><p>is known as the advection equation, where:</p>
<ul>
<li><p>\(f(t,x)\) is a function of two independent variables \(x\) (space) and \(t\) (time);</p>
</li>
<li><p>\(\alpha\) is a constant;</p>
</li>
<li><p>\(\frac{\partial f}{\partial t}\) represents the rate of change of \(f\) with respect to time;</p>
</li>
<li><p>\(\frac{\partial f}{\partial x}\) represents the rate of change of \(f\) with respect to space.</p>
</li>
</ul>
<p>Physically, this equation describes how a quantity \(f\) evolves over time \(t\) as it is transported by a flow with constant speed \(\alpha\) in the \(x\)-direction. In other words, it describes how \(f\) moves along the \(x\)-axis with time, where the rate of change in time is proportional to the rate of change in space multiplied by the constant \(\alpha\).</p>
<p>A closed-form general solution to the advection equation can be derived and corresponds to:</p>
<p>$$f(t,x)=g(x-\alpha t)$$</p><p>for any differentiable function \(g(\cdot)\).</p>
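<p>A quick numerical check (plain NumPy, using a Gaussian bump as an arbitrary choice of \(g\)) confirms via finite differences that \(g(x - \alpha t)\) satisfies the advection equation:</p>
<pre><code class="lang-python"># Finite-difference check that f(t, x) = g(x - alpha*t) solves the
# advection equation; g is an arbitrary differentiable profile.

import numpy as np

alpha = 2.0
g = lambda s: np.exp(-s**2)             # Gaussian bump, any smooth g works

def f(t, x):
    return g(x - alpha * t)

t0, x0, h = 0.3, 0.7, 1e-5
df_dt = (f(t0 + h, x0) - f(t0 - h, x0)) / (2 * h)   # central differences
df_dx = (f(t0, x0 + h) - f(t0, x0 - h)) / (2 * h)
print(df_dt + alpha * df_dx)            # approximately 0
</code></pre>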
<p>Not every PDE, however, admits a closed-form solution, for several reasons:</p>
<ul>
<li><p>Complexity: many PDEs describe highly intricate physical phenomena with nonlinear behavior, making it difficult to find analytical solutions. Nonlinear PDEs, in particular, often lack closed-form solutions because of their intricate interdependence between variables;</p>
</li>
<li><p>Boundary conditions: the solution to a PDE often depends not only on the equation itself but also on the boundary and initial conditions. If these conditions are complex or not well-defined, finding a closed-form solution becomes exceedingly challenging;</p>
</li>
<li><p>Non-standard formulation: some PDEs might be in non-standard forms that don't lend themselves easily to analytical techniques. For example, PDEs with non-constant coefficients or with terms involving higher-order derivatives may not have straightforward analytical solutions.</p>
</li>
<li><p>Inherent nature of the problem: certain systems are inherently chaotic or exhibit behaviors that resist simple mathematical description. For such systems, closed-form solutions may not exist, or if they do, they might be highly unstable or impractical.</p>
</li>
</ul>
<p>For all the reasons mentioned, scientists usually depend on numerical methods to estimate the solution to PDEs (like PINNs, finite elements, finite volumes, finite differences, and spectral methods). In this post, I am only covering PINNs, while another post about the other methods is on its way.</p>
<h2 id="heading-about-neural-networks">About neural networks</h2>
<p>Once the problem that PINNs aim to solve is clear, we will now discuss some essential topics about neural networks. Since this is a very broad subject, I will only cover the most important aspects.</p>
<p>Neural networks are algorithms inspired by the workings of the human brain. In our brains, neurons process incoming data, such as visual information from our eyes, to recognize and understand our surroundings. Similarly, neural networks operate by receiving input data (input layer), processing it to identify patterns (hidden layer), and producing an output based on this analysis (output layer). Therefore, a neural network is typically represented as shown in the following picture:</p>
<p><img src="https://www.ibm.com/content/dam/connectedassets-adobe-cms/worldwide-content/cdp/cf/ul/g/3a/b8/ICLH_Diagram_Batch_01_03-DeepNeuralNetwork.png" alt="Source: engineersplanet.com" /></p>
<p>The basic unit of computation in a neural network is the neuron. It receives input from other nodes or an external source and calculates an output. Each input is linked with a weight (w, which the network learns), and each neuron also has a bias term (b), learned together with the weights, whose corresponding input is fixed to 1. The output from the neuron is the weighted sum of the inputs plus the bias, passed through the activation function (which introduces non-linearity into the output).</p>
<p><img src="https://www.gabormelli.com/RKB/images/thumb/3/31/artificial-neuron-model.png/600px-artificial-neuron-model.png" alt="Artificial Neuron - GM-RKB" /></p>
<p>That said, a neural network comprises multiple interconnected neurons. While various architectures are tailored for specific issues, we will now concentrate on basic neural networks, also referred to as Feedforward neural networks (FNN).</p>
<h3 id="heading-learning-in-neural-networks">Learning in neural networks</h3>
<p>What is learnable in a neural network are the weights and the biases, and the learning process is divided into two parts:</p>
<ul>
<li><p>feedforward propagation;</p>
</li>
<li><p>backward propagation.</p>
</li>
</ul>
<p>In fact, learning occurs by adjusting connection weights after processing each piece of data, depending on the error in the output compared to the expected result.</p>
<h3 id="heading-feedforward-propagation">Feedforward propagation</h3>
<p>Feedforward propagation is the foundational process in neural networks where input data is processed through the layers to produce an output. This process is crucial for making predictions or classifications based on the given input. In feedforward propagation (a minimal sketch follows this list):</p>
<ul>
<li><p>the data flows in a unidirectional manner from the input layer through the hidden layers (if any) to the output layer;</p>
</li>
<li><p>each neuron in a layer receives inputs from all neurons in the previous layer. The inputs are combined with weights and a bias term, and the result is passed through an activation function to produce the neuron's output;</p>
</li>
<li><p>this process is repeated for each layer until the output layer is reached.</p>
</li>
</ul>
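<p>Here is that sketch: a minimal NumPy feedforward pass through a network with a single hidden layer (the weights are random placeholders, not trained values):</p>
<pre><code class="lang-python"># One feedforward pass: weighted sum + bias, then a nonlinearity,
# layer by layer. Weights and biases are random placeholders.

import numpy as np

rng = np.random.default_rng(0)
x = rng.random(3)                            # input vector (3 features)

W1, b1 = rng.random((4, 3)), rng.random(4)   # hidden layer: 3 inputs, 4 units
W2, b2 = rng.random((1, 4)), rng.random(1)   # output layer: 4 inputs, 1 unit

h = np.tanh(W1 @ x + b1)                     # hidden activations
y = W2 @ h + b2                              # network output
print(y)
</code></pre>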
<h3 id="heading-backward-propagation">Backward propagation</h3>
<p>Backward propagation, also known as backpropagation, is the process by which a neural network learns from its mistakes and adjusts its parameters (weights and biases) to minimize the difference between its predictions and the true targets. In backward propagation (a one-step PyTorch sketch follows this list):</p>
<ul>
<li><p>after the output is generated through feedforward propagation, the network's performance is evaluated using a loss function, which measures the difference between the predicted output and the true target values;</p>
</li>
<li><p>the gradient of the loss function with respect to each parameter (weight and bias) in the network is computed using the chain rule of calculus. This gradient indicates the direction and magnitude of the change needed to minimize the loss function;</p>
</li>
<li><p>the gradients are then used to update the parameters of the network in the opposite direction of the gradient, a process known as gradient descent. This update step involves adjusting the parameters by a small amount proportional to the gradient and a learning rate hyperparameter.</p>
</li>
</ul>
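<p>And here is the corresponding one-step PyTorch sketch (with random placeholder data): a forward pass, the loss, <code>loss.backward()</code> to compute the gradients, and a gradient-descent update:</p>
<pre><code class="lang-python"># One explicit backpropagation step: forward pass, loss, gradients via
# the chain rule (autograd), then a gradient-descent parameter update.

import torch
from torch import nn

net = nn.Sequential(nn.Linear(3, 4), nn.Tanh(), nn.Linear(4, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(8, 3)                   # a small batch of inputs
y = torch.randn(8, 1)                   # the corresponding targets

optimizer.zero_grad()                   # reset accumulated gradients
loss = loss_fn(net(x), y)               # feedforward + loss
loss.backward()                         # backward propagation
optimizer.step()                        # parameter update
print(float(loss))
</code></pre>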
<h3 id="heading-neural-networks-as-universal-function-approximators">Neural networks as universal function approximators</h3>
<p>Pivotal to Physics-Informed Neural Networks (PINNs) is a crucial theoretical concept concerning neural networks: their capability as universal function approximators. This fundamental property implies that neural networks can effectively approximate any continuous function with remarkable precision, given a sufficient number of neurons and an appropriate network configuration. Considering that the goal of PINNs is to estimate the solution to a Partial Differential Equation (PDE), which essentially involves approximating a function, this characteristic holds immense significance for the success and efficacy of PINNs in their predictive tasks.</p>
<h3 id="heading-about-physics-informed-neural-networks-pinns">About physics informed neural networks (PINNs)</h3>
<p>Once the basics of deep learning are clear, we can delve deeper into understanding Physics Informed Neural Networks (PINNs).</p>
<p>Physics Informed Neural Networks (PINNs) serve as universal function approximators, with their neural network architecture representing solutions to specific Partial Differential Equations (PDEs). The core concept behind PINNs, as implied by their name, involves integrating prior knowledge about the system's dynamics into the cost function. This integration allows for penalizing any deviations from the governing PDEs by the network's solution.</p>
<p>Moreover, PINNs necessitate addressing the differences between the network's predictions and the actual data points within the cost function. This process is crucial for refining the network's accuracy and ensuring that it aligns closely with the observed data, thereby enhancing the model's predictive capabilities.</p>
<p>Therefore the loss function is:</p>
<p>$$\text{total loss} = \text{data loss + physics loss}$$</p><p>and (once a norm \(|\cdot|\) is chosen) becomes:</p>
<p>$$\text{total loss} = \frac1n\sum |y_i - \hat y(x_i|\theta)| + \frac \lambda m\sum |f(x_j, \hat y(x_j|\theta))|$$</p><p>where:</p>
<ul>
<li><p>\(\hat y(x| \theta) \) is our neural network;</p>
</li>
<li><p>\(x_i \text{ and } y_i\) are the data;</p>
</li>
<li><p>\(f(x, g)=0\) is the PDE;</p>
</li>
<li><p>\(\lambda\) is a hyperparameter.</p>
</li>
</ul>
<p>and is equivalent to the mean squared error (MSE) when the chosen norm is the squared L2 norm.</p>
<p>Once we have this, we can train our PINN as a regular neural network.</p>
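<p>As an illustration, here is a sketch of such a loss for the advection equation introduced earlier; the model, the data tensors and the weight <code>lam</code> are placeholders to be supplied by a training loop:</p>
<pre><code class="lang-python"># Sketch of a PINN loss for f_t + alpha * f_x = 0: data term on observed
# points plus PDE residual on collocation points. All arguments are
# placeholders; `model` maps an (N, 2) tensor of (t, x) pairs to (N, 1).

import torch
from torch import nn

def pinn_loss(model, t_data, x_data, y_data, t_col, x_col, alpha=1.0, lam=1.0):
    mse = nn.MSELoss()

    # data loss: network predictions vs. observations
    y_pred = model(torch.stack([t_data, x_data], dim=1))
    data_loss = mse(y_pred, y_data)

    # physics loss: PDE residual at the collocation points
    t = t_col.clone().requires_grad_(True)
    x = x_col.clone().requires_grad_(True)
    f = model(torch.stack([t, x], dim=1))
    f_t = torch.autograd.grad(f.sum(), t, create_graph=True)[0]
    f_x = torch.autograd.grad(f.sum(), x, create_graph=True)[0]
    residual = f_t + alpha * f_x
    physics_loss = mse(residual, torch.zeros_like(residual))

    return data_loss + lam * physics_loss
</code></pre>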
<h3 id="heading-a-comparison-between-pinns-and-classical-methods">A comparison between PINNs and classical methods</h3>
<p>Compared to traditional numerical simulation approaches, PINNs have the following desirable properties:</p>
<ul>
<li><p>PINNs are mesh-free, i.e. they can handle complex domains, with a potential computational advantage;</p>
</li>
<li><p>PINNs are well-suited for modeling complex and nonlinear systems, because of the universal approximation theorem.</p>
</li>
</ul>
<h2 id="heading-pytorch-implementation">PyTorch implementation</h2>
<p>This final section focuses on a PyTorch example to apply the theory in practice.</p>
<p>Although this blog post mainly discussed PDEs, I will first approximate an ODE using a PINN before moving on to a PDE.</p>
<h3 id="heading-approximating-an-ode-using-a-pinn-the-logistic-equation">Approximating an ODE using a PINN: the logistic equation</h3>
<p>The logistic equation is a differential equation used to model population growth in situations where resources are limited. It is often represented as:</p>
<p>$$\frac{dP}{dt} = rP\left(1 - \frac{P}{K}\right)$$</p><p>where:</p>
<ul>
<li><p>\(P\) represents the population size at time \(t\);</p>
</li>
<li><p>\(r\) is the intrinsic growth rate of the population;</p>
</li>
<li><p>\(K\) is the carrying capacity of the environment, representing the maximum population size that the environment can sustain.</p>
</li>
</ul>
<p>The analytical solution to the logistic equation is given by the logistic function:</p>
<p>$$P(t) = \frac{K}{1 + \left(\frac{K-P_0}{P_0}e^{-rt}\right)}$$</p><p>First of all, we need a neural network architecture. The following class (adapted, as its source comment indicates, from a PINN for the Burgers equation) illustrates the typical structure of a PINN: the network itself, the derivatives obtained via automatic differentiation, and the combined data/physics loss:</p>
<pre><code class="lang-python"><span class="hljs-comment"># code from https://github.com/EdgarAMO/PINN-Burgers/blob/main/burgers_LBFGS.py</span>

<span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> random <span class="hljs-keyword">import</span> uniform

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">PhysicsInformedNN</span>():</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, X_u, u, X_f</span>):</span>
        <span class="hljs-comment"># x &amp; t from boundary conditions:</span>
        self.x_u = torch.tensor(X_u[:, <span class="hljs-number">0</span>].reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>),
                                dtype=torch.float32,
                                requires_grad=<span class="hljs-literal">True</span>)
        self.t_u = torch.tensor(X_u[:, <span class="hljs-number">1</span>].reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>),
                                dtype=torch.float32,
                                requires_grad=<span class="hljs-literal">True</span>)

        <span class="hljs-comment"># x &amp; t from collocation points:</span>
        self.x_f = torch.tensor(X_f[:, <span class="hljs-number">0</span>].reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>),
                                dtype=torch.float32,
                                requires_grad=<span class="hljs-literal">True</span>)
        self.t_f = torch.tensor(X_f[:, <span class="hljs-number">1</span>].reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>),
                                dtype=torch.float32,
                                requires_grad=<span class="hljs-literal">True</span>)

        <span class="hljs-comment"># boundary solution:</span>
        self.u = torch.tensor(u, dtype=torch.float32)

        <span class="hljs-comment"># null vector to test against f:</span>
        self.null =  torch.zeros((self.x_f.shape[<span class="hljs-number">0</span>], <span class="hljs-number">1</span>))

        <span class="hljs-comment"># initialize net:</span>
        self.create_net()
        <span class="hljs-comment">#self.net.apply(self.init_weights)</span>

        <span class="hljs-comment"># this optimizer updates the weights and biases of the net:</span>
        self.optimizer = torch.optim.LBFGS(self.net.parameters(),
                                    lr=<span class="hljs-number">1</span>,
                                    max_iter=<span class="hljs-number">50000</span>,
                                    max_eval=<span class="hljs-number">50000</span>,
                                    history_size=<span class="hljs-number">50</span>,
                                    tolerance_grad=<span class="hljs-number">1e-05</span>,
                                    tolerance_change=<span class="hljs-number">0.5</span> * np.finfo(float).eps,
                                    line_search_fn=<span class="hljs-string">"strong_wolfe"</span>)

        <span class="hljs-comment"># typical MSE loss (this is a function):</span>
        self.loss = nn.MSELoss()

        <span class="hljs-comment"># loss :</span>
        self.ls = <span class="hljs-number">0</span>

        <span class="hljs-comment"># iteration number:</span>
        self.iter = <span class="hljs-number">0</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_net</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-string">""" net takes a batch of two inputs: (n, 2) --&gt; (n, 1) """</span>
        self.net = nn.Sequential(
            nn.Linear(<span class="hljs-number">2</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">1</span>))

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">init_weights</span>(<span class="hljs-params">self, m</span>):</span>
        <span class="hljs-keyword">if</span> type(m) == nn.Linear:
            torch.nn.init.xavier_normal_(m.weight, <span class="hljs-number">0.1</span>)
            m.bias.data.fill_(<span class="hljs-number">0.001</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">net_u</span>(<span class="hljs-params">self, x, t</span>):</span>
        u = self.net( torch.hstack((x, t)) )
        <span class="hljs-keyword">return</span> u

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">net_f</span>(<span class="hljs-params">self, x, t</span>):</span>
        u = self.net_u(x, t)

        u_t = torch.autograd.grad(
            u, t, 
            grad_outputs=torch.ones_like(u),
            retain_graph=<span class="hljs-literal">True</span>,
            create_graph=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]

        u_x = torch.autograd.grad(
            u, x, 
            grad_outputs=torch.ones_like(u),
            retain_graph=<span class="hljs-literal">True</span>,
            create_graph=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]

        u_xx = torch.autograd.grad(
            u_x, x, 
            grad_outputs=torch.ones_like(u_x),
            retain_graph=<span class="hljs-literal">True</span>,
            create_graph=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]

        f = u_t + (u * u_x) - (nu * u_xx)

        <span class="hljs-keyword">return</span> f

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">plot</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-string">""" plot the solution on new data """</span>

        <span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
        <span class="hljs-keyword">from</span> mpl_toolkits.axes_grid1 <span class="hljs-keyword">import</span> make_axes_locatable

        x = torch.linspace(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">200</span>)
        t = torch.linspace( <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">100</span>)

        <span class="hljs-comment"># x &amp; t grids:</span>
        X, T = torch.meshgrid(x, t)

        <span class="hljs-comment"># x &amp; t columns:</span>
        xcol = X.reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
        tcol = T.reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)

        <span class="hljs-comment"># one large column:</span>
        usol = self.net_u(xcol, tcol)

        <span class="hljs-comment"># reshape solution:</span>
        U = usol.reshape(x.numel(), t.numel())

        <span class="hljs-comment"># transform to numpy:</span>
        xnp = x.numpy()
        tnp = t.numpy()
        Unp = U.detach().numpy()

        <span class="hljs-comment"># plot:</span>
        fig = plt.figure(figsize=(<span class="hljs-number">9</span>, <span class="hljs-number">4.5</span>))
        ax = fig.add_subplot(<span class="hljs-number">111</span>)

        h = ax.imshow(Unp,
                      interpolation=<span class="hljs-string">'nearest'</span>,
                      cmap=<span class="hljs-string">'rainbow'</span>, 
                      extent=[tnp.min(), tnp.max(), xnp.min(), xnp.max()], 
                      origin=<span class="hljs-string">'lower'</span>, aspect=<span class="hljs-string">'auto'</span>)
        divider = make_axes_locatable(ax)
        cax = divider.append_axes(<span class="hljs-string">"right"</span>, size=<span class="hljs-string">"5%"</span>, pad=<span class="hljs-number">0.10</span>)
        cbar = fig.colorbar(h, cax=cax)
        cbar.ax.tick_params(labelsize=<span class="hljs-number">10</span>)
        plt.show()

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">closure</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-comment"># reset gradients to zero:</span>
        self.optimizer.zero_grad()

        <span class="hljs-comment"># u &amp; f predictions:</span>
        u_prediction = self.net_u(self.x_u, self.t_u)
        f_prediction = self.net_f(self.x_f, self.t_f)

        <span class="hljs-comment"># losses:</span>
        u_loss = self.loss(u_prediction, self.u)
        f_loss = self.loss(f_prediction, self.null)
        self.ls = u_loss + f_loss

        <span class="hljs-comment"># derivative with respect to net's weights:</span>
        self.ls.backward()

        <span class="hljs-comment"># increase iteration count:</span>
        self.iter += <span class="hljs-number">1</span>

        <span class="hljs-comment"># print report:</span>
        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> self.iter % <span class="hljs-number">100</span>:
            print(<span class="hljs-string">'Epoch: {0:}, Loss: {1:6.3f}'</span>.format(self.iter, self.ls))

        <span class="hljs-keyword">return</span> self.ls

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">train</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-string">""" training loop """</span>
        self.net.train()
        self.optimizer.step(self.closure)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span> :

    nu = <span class="hljs-number">0.01</span> / np.pi         <span class="hljs-comment"># constant in the diff. equation</span>
    N_u = <span class="hljs-number">100</span>                 <span class="hljs-comment"># number of data points in the boundaries</span>
    N_f = <span class="hljs-number">10000</span>               <span class="hljs-comment"># number of collocation points</span>

    <span class="hljs-comment"># X_u_train: a set of pairs (x, t) located at:</span>
        <span class="hljs-comment"># x =  1, t = [0,  1]</span>
        <span class="hljs-comment"># x = -1, t = [0,  1]</span>
        <span class="hljs-comment"># t =  0, x = [-1, 1]</span>
    x_upper = np.ones((N_u//<span class="hljs-number">4</span>, <span class="hljs-number">1</span>), dtype=float)
    x_lower = np.ones((N_u//<span class="hljs-number">4</span>, <span class="hljs-number">1</span>), dtype=float) * (<span class="hljs-number">-1</span>)
    t_zero = np.zeros((N_u//<span class="hljs-number">2</span>, <span class="hljs-number">1</span>), dtype=float)

    t_upper = np.random.rand(N_u//<span class="hljs-number">4</span>, <span class="hljs-number">1</span>)
    t_lower = np.random.rand(N_u//<span class="hljs-number">4</span>, <span class="hljs-number">1</span>)
    x_zero = (<span class="hljs-number">-1</span>) + np.random.rand(N_u//<span class="hljs-number">2</span>, <span class="hljs-number">1</span>) * (<span class="hljs-number">1</span> - (<span class="hljs-number">-1</span>))

    <span class="hljs-comment"># stack uppers, lowers and zeros:</span>
    X_upper = np.hstack( (x_upper, t_upper) )
    X_lower = np.hstack( (x_lower, t_lower) )
    X_zero = np.hstack( (x_zero, t_zero) )

    <span class="hljs-comment"># each one of these three arrays haS 2 columns, </span>
    <span class="hljs-comment"># now we stack them vertically, the resulting array will also have 2 </span>
    <span class="hljs-comment"># columns and 100 rows:</span>
    X_u_train = np.vstack( (X_upper, X_lower, X_zero) )

    <span class="hljs-comment"># shuffle X_u_train:</span>
    index = np.arange(<span class="hljs-number">0</span>, N_u)
    np.random.shuffle(index)
    X_u_train = X_u_train[index, :]

    <span class="hljs-comment"># make X_f_train:</span>
    X_f_train = np.zeros((N_f, <span class="hljs-number">2</span>), dtype=float)
    <span class="hljs-keyword">for</span> row <span class="hljs-keyword">in</span> range(N_f):
        x = uniform(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)  <span class="hljs-comment"># x range</span>
        t = uniform( <span class="hljs-number">0</span>, <span class="hljs-number">1</span>)  <span class="hljs-comment"># t range</span>

        X_f_train[row, <span class="hljs-number">0</span>] = x 
        X_f_train[row, <span class="hljs-number">1</span>] = t

    <span class="hljs-comment"># add the boundary points to the collocation points:</span>
    X_f_train = np.vstack( (X_f_train, X_u_train) )

    <span class="hljs-comment"># make u_train</span>
    u_upper =  np.zeros((N_u//<span class="hljs-number">4</span>, <span class="hljs-number">1</span>), dtype=float)
    u_lower =  np.zeros((N_u//<span class="hljs-number">4</span>, <span class="hljs-number">1</span>), dtype=float) 
    u_zero = -np.sin(np.pi * x_zero)  

    <span class="hljs-comment"># stack them in the same order as X_u_train was stacked:</span>
    u_train = np.vstack( (u_upper, u_lower, u_zero) )

    <span class="hljs-comment"># match indices with X_u_train</span>
    u_train = u_train[index, :]

    <span class="hljs-comment"># pass data sets to the PINN:</span>
    pinn = PhysicsInformedNN(X_u_train, u_train, X_f_train)

    pinn.train()
</code></pre>
<p>Then we can build our loss function and define our ODE:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Callable
<span class="hljs-keyword">import</span> argparse

<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">from</span> torch <span class="hljs-keyword">import</span> nn
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> torchopt

<span class="hljs-keyword">from</span> pinn <span class="hljs-keyword">import</span> make_forward_fn, LinearNN


R = <span class="hljs-number">1.0</span>  <span class="hljs-comment"># rate of maximum population growth parameterizing the equation</span>
X_BOUNDARY = <span class="hljs-number">0.0</span>  <span class="hljs-comment"># boundary condition coordinate</span>
F_BOUNDARY = <span class="hljs-number">0.5</span>  <span class="hljs-comment"># boundary condition value</span>


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">make_loss_fn</span>(<span class="hljs-params">f: Callable, dfdx: Callable</span>) -&gt; Callable:</span>
    <span class="hljs-string">"""Make a function loss evaluation function

    The loss is computed as sum of the interior MSE loss (the differential equation residual)
    and the MSE of the loss at the boundary

    Args:
        f (Callable): The functional forward pass of the model used a universal function approximator. This
            is a function with signature (x, params) where `x` is the input data and `params` the model
            parameters
        dfdx (Callable): The functional gradient calculation of the universal function approximator. This
            is a function with signature (x, params) where `x` is the input data and `params` the model
            parameters

    Returns:
        Callable: The loss function with signature (params, x) where `x` is the input data and `params` the model
            parameters. Notice that a simple call to `dloss = functorch.grad(loss_fn)` would give the gradient
            of the loss with respect to the model parameters needed by the optimizers
    """</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">loss_fn</span>(<span class="hljs-params">params: torch.Tensor, x: torch.Tensor</span>):</span>

        <span class="hljs-comment"># interior loss</span>
        f_value = f(x, params)
        interior = dfdx(x, params) - R * f_value * (<span class="hljs-number">1</span> - f_value)

        <span class="hljs-comment"># boundary loss</span>
        x0 = X_BOUNDARY
        f0 = F_BOUNDARY
        x_boundary = torch.tensor([x0])
        f_boundary = torch.tensor([f0])
        boundary = f(x_boundary, params) - f_boundary

        loss = nn.MSELoss()
        loss_value = loss(interior, torch.zeros_like(interior)) + loss(
            boundary, torch.zeros_like(boundary)
        )

        <span class="hljs-keyword">return</span> loss_value

    <span class="hljs-keyword">return</span> loss_fn


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:

    <span class="hljs-comment"># make it reproducible</span>
    torch.manual_seed(<span class="hljs-number">42</span>)

    <span class="hljs-comment"># parse input from user</span>
    parser = argparse.ArgumentParser()

    parser.add_argument(<span class="hljs-string">"-n"</span>, <span class="hljs-string">"--num-hidden"</span>, type=int, default=<span class="hljs-number">5</span>)
    parser.add_argument(<span class="hljs-string">"-d"</span>, <span class="hljs-string">"--dim-hidden"</span>, type=int, default=<span class="hljs-number">5</span>)
    parser.add_argument(<span class="hljs-string">"-b"</span>, <span class="hljs-string">"--batch-size"</span>, type=int, default=<span class="hljs-number">30</span>)
    parser.add_argument(<span class="hljs-string">"-lr"</span>, <span class="hljs-string">"--learning-rate"</span>, type=float, default=<span class="hljs-number">1e-1</span>)
    parser.add_argument(<span class="hljs-string">"-e"</span>, <span class="hljs-string">"--num-epochs"</span>, type=int, default=<span class="hljs-number">100</span>)

    args = parser.parse_args()

    <span class="hljs-comment"># configuration</span>
    num_hidden = args.num_hidden
    dim_hidden = args.dim_hidden
    batch_size = args.batch_size
    num_iter = args.num_epochs
    tolerance = <span class="hljs-number">1e-8</span>
    learning_rate = args.learning_rate
    domain = (<span class="hljs-number">-5.0</span>, <span class="hljs-number">5.0</span>)

    <span class="hljs-comment"># function versions of model forward, gradient and loss</span>
    model = LinearNN(num_layers=num_hidden, num_neurons=dim_hidden, num_inputs=<span class="hljs-number">1</span>)
    funcs = make_forward_fn(model, derivative_order=<span class="hljs-number">1</span>)

    f = funcs[<span class="hljs-number">0</span>]
    dfdx = funcs[<span class="hljs-number">1</span>]
    loss_fn = make_loss_fn(f, dfdx)

    <span class="hljs-comment"># choose optimizer with functional API using functorch</span>
    optimizer = torchopt.FuncOptimizer(torchopt.adam(lr=learning_rate))

    <span class="hljs-comment"># initial parameters randomly initialized</span>
    params = tuple(model.parameters())

    <span class="hljs-comment"># train the model</span>
    loss_evolution = []
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(num_iter):

        <span class="hljs-comment"># sample points in the domain randomly for each epoch</span>
        x = torch.FloatTensor(batch_size).uniform_(domain[<span class="hljs-number">0</span>], domain[<span class="hljs-number">1</span>])

        <span class="hljs-comment"># compute the loss with the current parameters</span>
        loss = loss_fn(params, x)

        <span class="hljs-comment"># update the parameters with functional optimizer</span>
        params = optimizer.step(loss, params)

        print(<span class="hljs-string">f"Iteration <span class="hljs-subst">{i}</span> with loss <span class="hljs-subst">{float(loss)}</span>"</span>)
        loss_evolution.append(float(loss))

    <span class="hljs-comment"># plot solution on the given domain</span>
    x_eval = torch.linspace(domain[<span class="hljs-number">0</span>], domain[<span class="hljs-number">1</span>], steps=<span class="hljs-number">100</span>).reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
    f_eval = f(x_eval, params)
    analytical_sol_fn = <span class="hljs-keyword">lambda</span> x: <span class="hljs-number">1.0</span> / (<span class="hljs-number">1.0</span> + (<span class="hljs-number">1.0</span>/F_BOUNDARY - <span class="hljs-number">1.0</span>) * np.exp(-R * x))
    x_eval_np = x_eval.detach().numpy()
    x_sample_np = torch.FloatTensor(batch_size).uniform_(domain[<span class="hljs-number">0</span>], domain[<span class="hljs-number">1</span>]).detach().numpy()

    fig, ax = plt.subplots()

    ax.scatter(x_sample_np, analytical_sol_fn(x_sample_np), color=<span class="hljs-string">"red"</span>, label=<span class="hljs-string">"Sample training points"</span>)
    ax.plot(x_eval_np, f_eval.detach().numpy(), label=<span class="hljs-string">"PINN final solution"</span>)
    ax.plot(
        x_eval_np,
        analytical_sol_fn(x_eval_np),
        label=<span class="hljs-string">f"Analytic solution"</span>,
        color=<span class="hljs-string">"green"</span>,
        alpha=<span class="hljs-number">0.75</span>,
    )
    ax.set(title=<span class="hljs-string">"Logistic equation solved with PINNs"</span>, xlabel=<span class="hljs-string">"t"</span>, ylabel=<span class="hljs-string">"f(t)"</span>)
    ax.legend()

    fig, ax = plt.subplots()
    ax.semilogy(loss_evolution)
    ax.set(title=<span class="hljs-string">"Loss evolution"</span>, xlabel=<span class="hljs-string">"# epochs"</span>, ylabel=<span class="hljs-string">"Loss"</span>)
    ax.legend()

    plt.show()
</code></pre>
<p>And this is the result:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1714926804988/e57e2fee-9ae1-401f-b70e-d59f3569cca5.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-approximating-a-pde-using-a-pinn-the-heat-equation">Approximating a PDE using a PINN: the heat equation</h3>
<p>The heat equation is a classical partial differential equation that describes the diffusion of heat (or equivalently, the distribution of temperature) in a given region over time. The one-dimensional form of the heat equation is given by:</p>
<p>$$\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}$$</p><p>Again, we implement the PINN. Note that the residual coded below, \(f = u_x - 2u_t - u\), actually corresponds to the first-order PDE \(\frac{\partial u}{\partial x} = 2\frac{\partial u}{\partial t} + u\) with initial condition \(u(x,0)=6e^{-3x}\); to target the heat equation instead, one would replace the residual with \(u_t - \alpha u_{xx}\):</p>
<pre><code class="lang-python"><span class="hljs-comment"># code from https://github.com/udemirezen/PINN-1/blob/main/solve_PDE_NN.ipynb</span>

<span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">from</span> torch.autograd <span class="hljs-keyword">import</span> Variable
device = torch.device(<span class="hljs-string">"cuda:0"</span> <span class="hljs-keyword">if</span> torch.cuda.is_available() <span class="hljs-keyword">else</span> <span class="hljs-string">"cpu"</span>)
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-comment"># We consider Net as our solution u_theta(x,t)</span>

<span class="hljs-string">"""
When forming the network, we have to keep in mind the number of inputs and outputs
In ur case: #inputs = 2 (x,t)
and #outputs = 1

You can add ass many hidden layers as you want with as many neurons.
More complex the network, the more prepared it is to find complex solutions, but it also requires more data.

Let us create this network:
min 5 hidden layer with 5 neurons each.
"""</span>

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Net</span>(<span class="hljs-params">nn.Module</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        super(Net, self).__init__()
        self.hidden_layer1 = nn.Linear(<span class="hljs-number">2</span>,<span class="hljs-number">5</span>)
        self.hidden_layer2 = nn.Linear(<span class="hljs-number">5</span>,<span class="hljs-number">5</span>)
        self.hidden_layer3 = nn.Linear(<span class="hljs-number">5</span>,<span class="hljs-number">5</span>)
        self.hidden_layer4 = nn.Linear(<span class="hljs-number">5</span>,<span class="hljs-number">5</span>)
        self.hidden_layer5 = nn.Linear(<span class="hljs-number">5</span>,<span class="hljs-number">5</span>)
        self.output_layer = nn.Linear(<span class="hljs-number">5</span>,<span class="hljs-number">1</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self, x,t</span>):</span>
        inputs = torch.cat([x,t],axis=<span class="hljs-number">1</span>) <span class="hljs-comment"># combined two arrays of 1 columns each to one array of 2 columns</span>
        layer1_out = torch.sigmoid(self.hidden_layer1(inputs))
        layer2_out = torch.sigmoid(self.hidden_layer2(layer1_out))
        layer3_out = torch.sigmoid(self.hidden_layer3(layer2_out))
        layer4_out = torch.sigmoid(self.hidden_layer4(layer3_out))
        layer5_out = torch.sigmoid(self.hidden_layer5(layer4_out))
        output = self.output_layer(layer5_out) <span class="hljs-comment">## For regression, no activation is used in output layer</span>
        <span class="hljs-keyword">return</span> output
<span class="hljs-comment">### (2) Model</span>
net = Net()
net = net.to(device)
mse_cost_function = torch.nn.MSELoss() <span class="hljs-comment"># Mean squared error</span>
optimizer = torch.optim.Adam(net.parameters())
<span class="hljs-comment">## PDE as loss function. Thus would use the network which we call as u_theta</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">f</span>(<span class="hljs-params">x,t, net</span>):</span>
    u = net(x,t) <span class="hljs-comment"># the dependent variable u is given by the network based on independent variables x,t</span>
    <span class="hljs-comment">## Based on our f = du/dx - 2du/dt - u, we need du/dx and du/dt</span>
    u_x = torch.autograd.grad(u.sum(), x, create_graph=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]
    u_t = torch.autograd.grad(u.sum(), t, create_graph=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]
    pde = u_x - <span class="hljs-number">2</span>*u_t - u
    <span class="hljs-keyword">return</span> pde
<span class="hljs-comment">## Data from Boundary Conditions</span>
<span class="hljs-comment"># u(x,0)=6e^(-3x)</span>
<span class="hljs-comment">## BC just gives us datapoints for training</span>

<span class="hljs-comment"># BC tells us that for any x in range[0,2] and time=0, the value of u is given by 6e^(-3x)</span>
<span class="hljs-comment"># Take say 500 random numbers of x</span>
x_bc = np.random.uniform(low=<span class="hljs-number">0.0</span>, high=<span class="hljs-number">2.0</span>, size=(<span class="hljs-number">500</span>,<span class="hljs-number">1</span>))
t_bc = np.zeros((<span class="hljs-number">500</span>,<span class="hljs-number">1</span>))
<span class="hljs-comment"># compute u based on BC</span>
u_bc = <span class="hljs-number">6</span>*np.exp(<span class="hljs-number">-3</span>*x_bc)
<span class="hljs-comment">### (3) Training / Fitting</span>
iterations = <span class="hljs-number">20000</span>
previous_validation_loss = <span class="hljs-number">99999999.0</span>
<span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(iterations):
    optimizer.zero_grad() <span class="hljs-comment"># to make the gradients zero</span>

    <span class="hljs-comment"># Loss based on boundary conditions</span>
    pt_x_bc = Variable(torch.from_numpy(x_bc).float(), requires_grad=<span class="hljs-literal">False</span>).to(device)
    pt_t_bc = Variable(torch.from_numpy(t_bc).float(), requires_grad=<span class="hljs-literal">False</span>).to(device)
    pt_u_bc = Variable(torch.from_numpy(u_bc).float(), requires_grad=<span class="hljs-literal">False</span>).to(device)

    net_bc_out = net(pt_x_bc, pt_t_bc) <span class="hljs-comment"># output of u(x,t)</span>
    mse_u = mse_cost_function(net_bc_out, pt_u_bc)

    <span class="hljs-comment"># Loss based on PDE</span>
    x_collocation = np.random.uniform(low=<span class="hljs-number">0.0</span>, high=<span class="hljs-number">2.0</span>, size=(<span class="hljs-number">500</span>,<span class="hljs-number">1</span>))
    t_collocation = np.random.uniform(low=<span class="hljs-number">0.0</span>, high=<span class="hljs-number">1.0</span>, size=(<span class="hljs-number">500</span>,<span class="hljs-number">1</span>))
    all_zeros = np.zeros((<span class="hljs-number">500</span>,<span class="hljs-number">1</span>))


    pt_x_collocation = Variable(torch.from_numpy(x_collocation).float(), requires_grad=<span class="hljs-literal">True</span>).to(device)
    pt_t_collocation = Variable(torch.from_numpy(t_collocation).float(), requires_grad=<span class="hljs-literal">True</span>).to(device)
    pt_all_zeros = Variable(torch.from_numpy(all_zeros).float(), requires_grad=<span class="hljs-literal">False</span>).to(device)

    f_out = f(pt_x_collocation, pt_t_collocation, net) <span class="hljs-comment"># output of f(x,t)</span>
    mse_f = mse_cost_function(f_out, pt_all_zeros)

    <span class="hljs-comment"># Combining the loss functions</span>
    loss = mse_u + mse_f


    loss.backward() <span class="hljs-comment"># This is for computing gradients using backward propagation</span>
    optimizer.step() <span class="hljs-comment"># This is equivalent to : theta_new = theta_old - alpha * derivative of J w.r.t theta</span>

    <span class="hljs-keyword">with</span> torch.autograd.no_grad():
        print(epoch,<span class="hljs-string">"Training Loss:"</span>,loss.data)
</code></pre>
<p>and then plot the result:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> mpl_toolkits.mplot3d <span class="hljs-keyword">import</span> Axes3D
Axes3D = Axes3D
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">from</span> matplotlib <span class="hljs-keyword">import</span> cm
<span class="hljs-keyword">from</span> matplotlib.ticker <span class="hljs-keyword">import</span> LinearLocator, FormatStrFormatter
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

fig = plt.figure()
ax = fig.add_subplot(<span class="hljs-number">111</span>, projection=<span class="hljs-string">'3d'</span>)

x=np.arange(<span class="hljs-number">0</span>,<span class="hljs-number">2</span>,<span class="hljs-number">0.02</span>)
t=np.arange(<span class="hljs-number">0</span>,<span class="hljs-number">1</span>,<span class="hljs-number">0.02</span>)
ms_x, ms_t = np.meshgrid(x, t)
<span class="hljs-comment">## Just because meshgrid is used, we need to do the following adjustment</span>
x = np.ravel(ms_x).reshape(<span class="hljs-number">-1</span>,<span class="hljs-number">1</span>)
t = np.ravel(ms_t).reshape(<span class="hljs-number">-1</span>,<span class="hljs-number">1</span>)

pt_x = Variable(torch.from_numpy(x).float(), requires_grad=<span class="hljs-literal">True</span>).to(device)
pt_t = Variable(torch.from_numpy(t).float(), requires_grad=<span class="hljs-literal">True</span>).to(device)
pt_u = net(pt_x,pt_t)
u=pt_u.data.cpu().numpy()
ms_u = u.reshape(ms_x.shape)

surf = ax.plot_surface(ms_x,ms_t,ms_u, cmap=cm.coolwarm,linewidth=<span class="hljs-number">0</span>, antialiased=<span class="hljs-literal">False</span>)

ax.zaxis.set_major_locator(LinearLocator(<span class="hljs-number">10</span>))
ax.zaxis.set_major_formatter(FormatStrFormatter(<span class="hljs-string">'%.02f'</span>))

fig.colorbar(surf, shrink=<span class="hljs-number">0.5</span>, aspect=<span class="hljs-number">5</span>)

plt.show()
</code></pre>
<p>which returns the following:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1714927875543/aced65ad-27c1-4b67-b561-d722a81bb408.png" alt class="image--center mx-auto" /></p>
<hr />
<p>And that's it for this article. Thanks for reading. If you have any suggestions for improvement or any further insights to share, please don't hesitate to reach out and leave a comment below. Your feedback is invaluable and greatly appreciated.</p>
<h2 id="heading-reference">Reference</h2>
<ul>
<li><p><a target="_blank" href="https://link.springer.com/content/pdf/10.1007/s10915-022-01939-z.pdf">https://link.springer.com/content/pdf/10.1007/s10915-022-01939-z.pdf</a></p>
</li>
<li><p><a target="_blank" href="https://acnpsearch.unibo.it/OpenURL?id=tisearch%3Ati-ex&amp;sid=google&amp;rft.auinit=S&amp;rft.aulast=Cuomo&amp;rft.atitle=Scientific+machine+learning+through+physics%E2%80%93informed+neural+networks%3A+Where+we+are+and+what%E2%80%99s+next&amp;rft.title=Journal+of+scientific+computing+%28Dordrecht.+Online%29&amp;rft.volume=92&amp;rft.issue=3&amp;rft.date=2022&amp;rft.spage=88&amp;rft.issn=1573-7691">https://acnpsearch.unibo.it/OpenURL?id=tisearch%3Ati-ex&amp;sid=google&amp;rft.auinit=S&amp;rft.aulast=Cuomo&amp;rft.atitle=Scientific+machine+learning+through+physics%E2%80%93informed+neural+networks%3A+Where+we+are+and+what%E2%80%99s+next&amp;rft.title=Journal+of+scientific+computing+%28Dordrecht.+Online%29&amp;rft.volume=92&amp;rft.issue=3&amp;rft.date=2022&amp;rft.spage=88&amp;rft.issn=1573-7691</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Quantum Blockchain]]></title><description><![CDATA[In the landscape of technological innovation, two disruptive forces stand out: quantum computing and blockchain. While each has made significant strides independently, their convergence holds the promise of revolutionizing cryptography and reshaping ...]]></description><link>https://amm.zanotp.com/quantum-blockchain</link><guid isPermaLink="true">https://amm.zanotp.com/quantum-blockchain</guid><category><![CDATA[quantum-blockchain]]></category><category><![CDATA[quantum-money]]></category><category><![CDATA[quantum computing]]></category><category><![CDATA[Blockchain]]></category><category><![CDATA[Blockchain technology]]></category><category><![CDATA[Quantum]]></category><category><![CDATA[blockchain security]]></category><category><![CDATA[Quantum Cryptography]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sun, 10 Mar 2024 21:23:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/qDG7XKJLKbs/upload/bc9d0dfed6abfa3457e9e876e362fb87.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the landscape of technological innovation, two disruptive forces stand out: quantum computing and blockchain. While each has made significant strides independently, their convergence holds the promise of revolutionizing cryptography and reshaping the foundations of digital trust. At the heart of this synergy lies the concept of quantum blockchain, a novel blockchain model infused with quantum cryptographic principles.</p>
<p>Blockchain technology, epitomized by cryptocurrencies like Bitcoin and Ethereum, has redefined trust in digital transactions. Its decentralized ledger system offers immutable records, resistant to tampering and censorship, transforming industries beyond finance. Meanwhile, quantum computing, leveraging quantum mechanics, offers exponential computational power, poised to tackle problems deemed infeasible by classical computers.</p>
<p>While the two technologies seem unrelated, a profound connection exists between quantum computing and blockchain, and this blog post introduces you to quantum blockchain after a brief digression on fundamental concepts of quantum computing and quantum cryptography.</p>
<h2 id="heading-quantum-computing">Quantum computing</h2>
<p>As you may know, all the information a computer stores and processes is just interminable strings of 0s and 1s, the so-called bits. Quantum computing is a completely different computational paradigm, relying on quantum bits (also called qubits), which can exist in a superposition of states, representing both 0 and 1 simultaneously.</p>
<p>While this may seem a logical contradiction, according to the postulates of quantum mechanics the state of a system is described as a linear combination of all possible states until it is measured, at which point the state collapses to a definite value. Thanks to superposition, a register of n qubits can encode a combination of 2^n basis states, exponentially more than the single state an n-bit classical register holds at any moment. Furthermore, qubits can exhibit another peculiar quantum behavior called entanglement, where the state of one qubit becomes correlated with the state of another qubit. The third ingredient that makes a quantum computer faster than a classical one is quantum interference, which occurs when the probability amplitudes of different quantum states interfere constructively or destructively, amplifying some outcomes and suppressing others. In quantum computing, this interference allows information to be manipulated and processed in a highly efficient manner.</p>
<p>These phenomena enable quantum computers to outperform classical ones on certain types of problems, particularly those that require extensive exploration of solution spaces. Superposition, entanglement, and quantum interference thus collectively contribute to the computational power and speed of quantum computers, offering the potential for revolutionary advancements in many fields of science and technology.</p>
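<p>To make superposition concrete, here is a minimal classical simulation in plain numpy (a sketch only, not a real quantum runtime): a qubit prepared in the state |0&gt; is sent through a Hadamard gate, and repeated measurements then return 0 and 1 with equal probability:</p>
<pre><code class="lang-python">import numpy as np

ket0 = np.array([1.0, 0.0])  # basis state |0&gt;

# Hadamard gate: maps |0&gt; to the equal superposition (|0&gt; + |1&gt;) / sqrt(2)
H = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)

state = H @ ket0
probs = np.abs(state) ** 2  # Born rule: |amplitude|^2

rng = np.random.default_rng(0)
samples = rng.choice([0, 1], size=10_000, p=probs)
print(probs)                                # [0.5 0.5]
print(np.bincount(samples) / len(samples))  # empirically close to [0.5 0.5]
</code></pre>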
<h2 id="heading-quantum-cryptography">Quantum cryptography</h2>
<p>One of the most fascinating and important applications of quantum technologies is quantum cryptography, a subset of quantum information science that aims to use the principles of quantum mechanics to secure communication channels in a fundamentally different way than classical cryptographic methods.</p>
<p>In fact, traditional cryptographic techniques rely on mathematical complexity, such as factorization or discrete logarithm problems, for securing data transmission. The idea is to pick a problem that would take a classical computer ages to solve and use that difficulty as the basis for encryption. Obviously, ever more powerful computers threaten the security of these classical encryption methods. Moreover, quantum computers will be able to solve these mathematical problems efficiently using algorithms like Shor's algorithm, rendering traditional encryption schemes obsolete.</p>
<p>Quantum cryptography, on the other hand, offers a solution that is fundamentally secure, regardless of the computational power of the adversary. By exploiting the properties of quantum mechanics, such as the superposition and entanglement of quantum states, quantum cryptography provides a means for two parties to communicate with absolute secrecy. Quantum Key Distribution (QKD), one of the most prominent applications of quantum cryptography, allows two parties to share a secret cryptographic key with the assurance that any attempt to intercept the key will be detected. This is achieved through the use of quantum states to encode the key information, making it impossible for an eavesdropper to gain knowledge of the key without disturbing the quantum states and revealing their presence.</p>
<p><img src="https://www.drishtiias.com/images/uploads/1645696513_Quantum_Key_Distribution_Work_Drishti_IAS_English.png" alt="Quantum Key Distribution Technology | 24 Feb 2022" /></p>
<p>As such, quantum cryptography offers a level of security that is unparalleled by classical cryptographic methods, making it an essential tool for ensuring the confidentiality and integrity of sensitive information in the digital age.</p>
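<p>To make the mechanics of QKD less abstract, here is a minimal classical simulation of the sifting step of the BB84 protocol (a sketch only: a real implementation runs over a quantum channel and adds error-rate estimation to detect eavesdroppers; all names are illustrative). Alice encodes random bits in randomly chosen bases, Bob measures in his own random bases, and the two keep only the positions where their bases happen to match:</p>
<pre><code class="lang-python">import random

random.seed(42)
n = 32

# Alice picks random bits and random encoding bases (0 = rectilinear, 1 = diagonal)
alice_bits  = [random.randint(0, 1) for _ in range(n)]
alice_bases = [random.randint(0, 1) for _ in range(n)]

# Bob measures each qubit in a randomly chosen basis;
# a wrong basis yields a uniformly random outcome
bob_bases = [random.randint(0, 1) for _ in range(n)]
bob_bits = [
    bit if a_basis == b_basis else random.randint(0, 1)
    for bit, a_basis, b_basis in zip(alice_bits, alice_bases, bob_bases)
]

# sifting: the bases are compared publicly, and only matching positions are kept
key_alice = [b for b, a, bb in zip(alice_bits, alice_bases, bob_bases) if a == bb]
key_bob   = [b for b, a, bb in zip(bob_bits, alice_bases, bob_bases) if a == bb]

assert key_alice == key_bob  # the sifted keys coincide
print(key_alice)
</code></pre>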
<h2 id="heading-quantum-money">Quantum money</h2>
<p>Before diving into quantum blockchain, I want to discuss another intriguing application of quantum technologies, somewhat related to blockchain and cryptocurrencies: quantum money.</p>
<p>The concept of quantum money traces back to the early days of quantum information theory and cryptography, with theoretical proposals emerging in the 1970s and gaining momentum in subsequent decades. One of the pioneering works in this field was proposed by physicist Stephen Wiesner and published in a <a target="_blank" href="http://users.cms.caltech.edu/~vidick/teaching/120_qcrypto/wiesner.pdf">scientific journal</a> in 1983.</p>
<p>Wiesner's idea involved using quantum states to encode information on banknotes, making them effectively unforgeable due to the inherent properties of quantum mechanics. Specifically, Wiesner proposed a scheme where each banknote would contain a unique quantum state, which could not be precisely duplicated or measured without disturbing its state. This would make counterfeiting quantum money practically impossible, as any attempt to copy or measure the quantum state would inevitably alter it, thus revealing the counterfeit attempt.</p>
<p><img src="https://www.nist.gov/sites/default/files/styles/480_x_480_limit/public/images/public_affairs/colloquia/011711_lr.jpg?itok=LWulsDZE" alt="photo by NIST" class="image--center mx-auto" /></p>
<p>Despite the theoretical appeal of Wiesner's proposal, the practical implementation of quantum money remains a significant challenge. Generating and manipulating quantum states with the precision and reliability required for quantum money presents formidable technical hurdles. Additionally, quantum systems are inherently fragile and susceptible to environmental noise, which could compromise the security of quantum money schemes.</p>
<p>In essence, similarly to cryptocurrencies, quantum money seeks to provide an unforgeable form of currency by exploiting the fundamental principles of quantum mechanics.</p>
<h2 id="heading-quantum-blockchain">Quantum blockchain</h2>
<p>As you may already know, a blockchain functions as an immutable ledger where data is stored in the form of transactions, interconnected through a Merkle tree, and organized into blocks linked by hash functions. This network operates in a decentralized manner, with each node retaining a copy of the growing chain of blocks. Consensus protocols determine the addition of new blocks and establish agreement on the block sequence. Typically, the blockchain process begins with users broadcasting transactions, which are then verified and organized into a new block according to specific consensus rules, such as proof-of-work or proof-of-stake. Participants, often referred to as "miners" in systems like Bitcoin, compete to create the next block, with the successful miner being rewarded. The longest chain of blocks is considered definitive, providing a basis for consensus.</p>
<p>One of the main features of blockchain is that if any block within the chain is altered, it invalidates all subsequent blocks. Consequently, nodes in the blockchain network reject the tampered version and continue to work on the version supported by the majority.</p>
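<p>As a concrete illustration of this property, consider the following toy Python chain (a minimal sketch: a real blockchain adds Merkle trees, consensus, and signatures). Each block stores the hash of its predecessor, so altering one block breaks the link of every block that follows it:</p>
<pre><code class="lang-python">import hashlib
import json

def block_hash(block: dict) -&gt; str:
    # hash of the block's canonical JSON encoding
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_chain(transactions: list) -&gt; list:
    chain, prev = [], "0" * 64  # dummy hash for the genesis predecessor
    for i, tx in enumerate(transactions):
        block = {"index": i, "tx": tx, "prev_hash": prev}
        prev = block_hash(block)
        chain.append(block)
    return chain

def first_broken_link(chain: list):
    prev = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev:
            return block["index"]  # this block and all its successors are invalid
        prev = block_hash(block)
    return None

chain = make_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"])
print(first_broken_link(chain))  # None: the chain is consistent

chain[1]["tx"] = "bob pays mallory 2"  # tamper with block 1
print(first_broken_link(chain))  # 2: every block after the tampered one fails to link
</code></pre>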
<p>Moreover, access control in blockchains relies on public-key cryptography: users safeguard private keys like passwords and use public keys as account identifiers. Transactions are authenticated with signatures generated from private keys, which network nodes verify against the corresponding public keys. Once you are familiar with the above information, you are ready to explore quantum blockchains.</p>
<p>Quantum blockchain typically refers to a variety of protocols, including classical blockchains with quantum-resistant cryptography, hybrid blockchains leveraging Quantum Key Distribution networks (just hybrid blockchains from now on), and fully quantum blockchains operating in the realm of quantum computing.</p>
<p>Hybrid blockchains aim to tackle the fact that public-key cryptography is not quantum resistant (we already mentioned Shor’s algorithm), therefore substituting public-key cryptography with the already mentioned Quantum Key Distribution.</p>
<p>Quantum blockchains, on the other hand, are more varied and replace some core component of a classical blockchain with a quantum counterpart. For example, <a target="_blank" href="https://arxiv.org/pdf/1804.05979.pdf">Rajan, D., &amp; Visser, M. (2019)</a>, whose quantum blockchain is regarded as a pioneering theoretical work, replaces the functionality of time-stamped blocks, and of the hash functions linking them, with a temporally entangled state. This offers a fairly interesting advantage: the sensitivity towards tampering is significantly amplified, since tampering with a single block destroys the full local copy of the blockchain (due to entanglement), whereas on a classical blockchain only the blocks following the compromised one are invalidated, which leaves it open to vulnerabilities. Let's now dive a little deeper into the formulation of both the blockchain and the network as proposed in <a target="_blank" href="https://arxiv.org/pdf/1804.05979.pdf">Rajan, D., &amp; Visser, M. (2019)</a>.</p>
<h3 id="heading-blockchain">Blockchain</h3>
<p>This subsection explores the implementation of a quantum version of a block and a blockchain, utilizing temporally entangled states (a concept in quantum mechanics where the quantum states of multiple particles become correlated over time, rather than in space).</p>
<p>Entanglement, essentially the inseparability of distinct states, forms the basis for capturing the chain-like structure. Consequently, the blockchain can be viewed as an entangled quantum state, with a block's timestamp emerging from the immediate absorption of the first qubit of a block.</p>
<p>Constructing the blockchain from a series of entangled states involves amalgamating the blocks into a specialized entangled state known as a Greenberger–Horne–Zeilinger (GHZ) state.</p>
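<p>Numerically, a GHZ state is easy to write down. The sketch below (plain numpy, purely illustrative and in no way a quantum-blockchain implementation) builds the three-qubit GHZ state and samples measurement outcomes: only all-zeros or all-ones ever occur, reflecting the perfect correlation the construction exploits:</p>
<pre><code class="lang-python">import numpy as np

n = 3                      # number of qubits
ghz = np.zeros(2 ** n)
ghz[0] = 1 / np.sqrt(2)    # amplitude of |000&gt;
ghz[-1] = 1 / np.sqrt(2)   # amplitude of |111&gt;

probs = np.abs(ghz) ** 2   # Born rule: measurement probabilities
rng = np.random.default_rng(0)
outcomes = rng.choice(2 ** n, size=10, p=probs)
print([format(o, f"0{n}b") for o in outcomes])  # only '000' or '111' ever appear
</code></pre>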
<h3 id="heading-network">Network</h3>
<p>After establishing the blockchain, additional components are necessary for a functional blockchain system, notably a protocol for disseminating the blockchain's state to all network nodes. Since the blockchain's state is quantum in nature, a quantum channel must replace the classical one, with digital signatures implemented through Quantum Key Distribution (QKD) protocols.</p>
<p>Similar to classical blockchain systems, each node in a quantum blockchain setup must possess a copy of the blockchain, and new blocks must undergo verification before integration into each node's blockchain.</p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>Quantum blockchain is still an area of research, and it is the author’s opinion that, given the rise of classical blockchains and the realistic development of a global quantum network, quantum blockchain can potentially open the door to a new research frontier in quantum information science as well as new business possibilities.</p>
<p>Thanks for reading. This article does not aim to be exhaustive and is no more than an introduction to quantum blockchain. To go further, there are resources online, and the source section below is a good starting point.</p>
<p>Sources:</p>
<ul>
<li><p><a target="_blank" href="http://users.cms.caltech.edu/~vidick/teaching/120_qcrypto/wiesner.pdf">Weisner, S. (1983) Conjugate Coding. ACM SIGACT News, 15, 78-88</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/1804.05979.pdf">Rajan, D., &amp; Visser, M. (2019)</a></p>
</li>
<li><p><a target="_blank" href="https://www.nature.com/articles/s41534-018-0086-y">Ringbauer, M.; Costa, F.; Goggin, M.E.; White, A.G.; Fedrizzi, A. Multi-time quantum correlations with no spatial analog’. NPJ Quantum Inf. 2018, 4 , 37.</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Blockchain and randomness]]></title><description><![CDATA[Getting random numbers on the blockchain used to be a headache for those who wanted to use truly random numbers in a dapp or protocol, and the lotteries that used these pseudo-random numbers were easily hacked by fast and malicious agents. However, t...]]></description><link>https://amm.zanotp.com/blockchain-and-randomness</link><guid isPermaLink="true">https://amm.zanotp.com/blockchain-and-randomness</guid><category><![CDATA[Blockchain]]></category><category><![CDATA[Blockchain technology]]></category><category><![CDATA[blockchain security]]></category><category><![CDATA[randomness]]></category><category><![CDATA[Blockchain development]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Wed, 23 Aug 2023 12:17:29 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/T9rKvI3N0NM/upload/28c0ced23ce653a91b9b9bde743215c0.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Getting random numbers on the blockchain used to be a headache for those who wanted to use truly random numbers in a dapp or protocol, and the lotteries that used these pseudo-random numbers were easily hacked by fast and malicious agents. However, the idea of a blockchain whose dapps can function without random numbers was (and is) out of the question, and obtaining random numbers has become easy and relatively secure.</p>
<p>The reason for this difficulty is that on-chain computations must be deterministic in order to be replayed in a decentralized manner, and any data that could serve as a random source is also available to an attacker.</p>
<p>In this article, I'll review the solutions that blockchain engineers developed in the past to address this problem, discuss their weaknesses, and conclude with the simplest and most commonly used method currently available.</p>
<h2 id="heading-pseudo-randomness-from-unknowable-at-the-time-of-transacting-information">Pseudo-randomness from unknowable at the time of transacting information</h2>
<p>One of the first sources of entropy that blockchain engineers used was the block timestamp, a global variable that represents the timestamp of the current block in which the contract is executed. This timestamp is a Unix timestamp that indicates the number of seconds that have elapsed since January 1, 1970 (UTC) and provides information about when the block was mined.</p>
<p>The problem with block timestamps is that miners have the ability to influence them as long as the timestamp doesn't precede that of the parent block. Although timestamps are usually quite accurate, there is a potential problem if a miner benefits from inaccurate timestamps. In such cases, the miner could use his mining power to create blocks with incorrect timestamps and thus manipulate the results of the random function to his advantage.</p>
<p>For example, imagine a lottery in which a random winner is selected from a set of participants by a function that uses the timestamp of a block as the source of randomness: a miner may enter the lottery and then adjust the timestamp value to increase his chances of winning.</p>
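<p>The attack is easy to reproduce in a classical simulation. In the sketch below (plain Python; sha256 stands in for the chain's hash function and all names are illustrative), a miner scans the range of timestamps it could plausibly report, looking for one that makes it the winner of a timestamp-seeded lottery:</p>
<pre><code class="lang-python">import hashlib

def pick_winner(timestamp: int, num_players: int) -&gt; int:
    # "random" winner index derived from the block timestamp
    digest = hashlib.sha256(str(timestamp).encode()).digest()
    return int.from_bytes(digest, "big") % num_players

num_players, miner_index = 10, 7
honest_time = 1_692_000_000

# the miner may report any timestamp within some tolerated drift
for ts in range(honest_time, honest_time + 900):
    if pick_winner(ts, num_players) == miner_index:
        print(f"report timestamp {ts} and the miner wins")
        break
</code></pre>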
<p>While these attacks may sound anachronistic, they are not beyond the realm of possibility. In fact, Feathercoin was the victim of a time-warp attack in 2013. In it, a group of miners exploited a vulnerability in Feathercoin's mining algorithm that allowed them to manipulate the timestamps of blocks, resulting in the rapid creation of new blocks. The attack undeniably caused significant damage to Feathercoin's value and reputation.</p>
<p>Still, one might think that using the block hash, or other block information generally unknown at the time of the transaction, as a source of entropy is a good idea. Such implementations share a major problem, however: they rely on information that is public within the network, and these quantities can be read, and to some extent manipulated, by an attacker who is also a miner, allowing him to increase his probability of winning the lottery with an attack similar to the time-warp attack.</p>
<p>Even using a sophisticated combination of all information unknown at the time of the transaction is not a good idea: it makes the attack much more difficult, but does not make the protocol as secure as other methods do.</p>
<h2 id="heading-randomness-from-off-chain-data-oracles-and-apis">Randomness from off-chain data: oracles and APIs</h2>
<p>I hope you have been convinced that using on-chain information is not a good practice when security is a crucial feature. What can we do to get an unpredictable random number for our lottery?</p>
<p>We can turn our attention to off-chain data, i.e. use data provided by an API or an oracle. For example, if an API provides the temperature in a particular city, we can take its reading modulo the number of participants and use the result to pick a winner. The temperature changes frequently, and if the API's answer is updated frequently, the likelihood of a malicious agent guessing the number is very low.</p>
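<p>For instance (a toy sketch: <code>fetch_temperature</code> is a hypothetical stand-in for a real API call):</p>
<pre><code class="lang-python">def fetch_temperature() -&gt; float:
    # hypothetical oracle call; a real dapp would query an off-chain data feed
    return 21.37

num_players = 8
winner_index = round(fetch_temperature() * 100) % num_players
print(winner_index)  # 2137 % 8 == 1
</code></pre>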
<p>Although this is a better solution than using on-chain data, it is not the best available because we centralize our random source and the smart contract is useless if the API is corrupted.</p>
<p>Moreover, no one would fully trust the lottery contract: since the API could be programmed to always return the same set of values, the protocol is no longer trustless.</p>
<p>Despite these drawbacks, oracles and APIs have been widely used to obtain off-chain data, and are sometimes still used. It's worth noting that combining the results of different APIs and oracles can produce nearly unpredictable output, which can be a good compromise for small dapps or protocols that don't rely entirely on randomness. The reputation of the data provider also matters here.</p>
<p>The most important attack on APIs and oracles is so-called oracle manipulation, in which vulnerabilities in a blockchain oracle are exploited to make it report inaccurate information about events outside the chain. This attack is often part of a broader attack on a protocol, as malicious actors can cause a protocol’s smart contracts to execute based on false input or in a way that is advantageous to them.</p>
<h2 id="heading-verifiable-random-functions-vrfs">Verifiable random functions (VRFs)</h2>
<p>Steering clear of intricate mathematics, Verifiable Random Functions (VRFs) can be described as public-key pseudorandom functions. Put simply, these functions produce outputs that appear pseudorandom for a given seed and mimic the behavior of truly random outputs (if you want to dig deeper, read <a target="_blank" href="https://amm.zanotp.com/an-introduction-to-prngs-with-python-and-r">this</a> article). The real power of VRFs is their ability to prove the correctness of their output. The possessor of the secret key is the only one able to compute the output of the function (i.e., the random output) along with a corresponding proof, for any input value. Conversely, anyone else who has the proof and the corresponding public key can verify that the output was computed correctly; however, this information is not sufficient to derive the secret key.</p>
<p>One of the most commonly used VRFs is the Chainlink VRF, which relies on a decentralized oracle network (i.e., a set of oracles that receive data from multiple reliable sources) to enhance existing blockchains by providing verified off-chain data.</p>
<p>Chainlink VRF enables the generation of random numbers within smart contracts, enabling blockchain developers to create improved user experiences by incorporating unpredictable outcomes into their blockchain-powered applications. In addition, Chainlink VRF is immune to tampering, whether done by node operators, users, or malicious entities.</p>
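<p>As a rough illustration of this interface (a sketch only, not a secure VRF: real constructions such as the ECVRF used by Chainlink come with uniqueness proofs and a precise security definition), the snippet below mimics the sign-then-hash idea with an Ed25519 keypair from the <code>cryptography</code> package. Only the key holder can produce the output, but anyone holding the proof and the public key can verify it:</p>
<pre><code class="lang-python">import hashlib

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

secret_key = Ed25519PrivateKey.generate()
public_key = secret_key.public_key()

seed = b"lottery round 42"

# prover: the "proof" is the signature; the pseudorandom output is its hash
proof = secret_key.sign(seed)
output = hashlib.sha256(proof).hexdigest()

# verifier: checks the proof against the public key, then recomputes the output
public_key.verify(proof, seed)  # raises InvalidSignature if the proof is forged
assert output == hashlib.sha256(proof).hexdigest()
print(output[:16])
</code></pre>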
<h3 id="heading-tohttpentitiesto-go-further"><a target="_blank" href="http://entities.To">To</a> go further</h3>
<p>To be an outstanding blockchain developer it's not necessary to know everything about VRFs; however, for the curious ones I suggest <a target="_blank" href="https://dash.harvard.edu/bitstream/handle/1/5028196/Vadhan_VerifRandomFunction.pdf">Micali, Rabin, Vadhan (1999)</a> and the <a target="_blank" href="https://docs.chain.link/vrf/v2/introduction">Chainlink VRF docs</a>.</p>
<hr />
<p>And that's it for this article.</p>
<p>Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here</strong></a>.</p>
]]></content:encoded></item><item><title><![CDATA[Testing smart contracts: unit tests and invariant tests]]></title><description><![CDATA[Testing plays a vital role in ensuring the security, functionality, and reliability of smart contracts and being able to write some goods test can save not only a lot of time but also a lot of money. In this article, we will discuss two types of test...]]></description><link>https://amm.zanotp.com/testing-smart-contracts</link><guid isPermaLink="true">https://amm.zanotp.com/testing-smart-contracts</guid><category><![CDATA[Smart Contracts]]></category><category><![CDATA[Testing]]></category><category><![CDATA[Security]]></category><category><![CDATA[foundry]]></category><category><![CDATA[Solidity]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Thu, 03 Aug 2023 08:35:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/FnA5pAzqhMM/upload/64dc6ede535c99c4686ca6f1df72f553.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Testing plays a vital role in ensuring the security, functionality, and reliability of smart contracts and being able to write some goods test can save not only a lot of time but also a lot of money. In this article, we will discuss two types of testing methodologies: unit tests and invariant tests.</p>
<p>Note that I assume a basic knowledge of the Solidity language and Foundry framework, however, even someone without this knowledge should be able to follow along.</p>
<h2 id="heading-blockchain-101-smart-contracts">Blockchain 101: smart contracts</h2>
<p>In simple words, smart contracts are like digital agreements that automatically execute and enforce themselves when certain conditions are met. We can liken smart contracts to vending machines: once they receive the right inputs, they automatically execute the agreement.</p>
<p>For example, to decentralize a lottery, we would write a function that receives the amount paid by a player, and a function that, once a particular condition is met (e.g. the number of players reaches 10, or one day has passed), generates a pseudo-random number between 0 and the number of players and pays the pot to the selected winner.</p>
<p>These digital agreements can be used for various purposes, such as transferring money, buying and selling assets, or even voting in elections. Since smart contracts run on the blockchain, they are tamper-resistant and transparent (or at least they should be). Nevertheless, not all blockchain developers pay attention to the contract doing what it is supposed to do, and in fact, the number of hacked or tampered smart contracts is surprisingly high.</p>
<p>To prevent this, it is important to develop a comprehensive testing strategy that includes both unit tests and invariant tests.</p>
<h2 id="heading-set-up">Set up</h2>
<p>First of all, we need to set up the foundry environment:</p>
<pre><code class="lang-solidity">forge init
</code></pre>
<p>Then we need a contract to test. The following lines of code implement a lottery like the one described above. Note that for simplicity the winner should be the first player (<code>players[0]</code>) who joins the lottery (pseudo-random numbers on the blockchain are a big theme for an upcoming article) and that the lottery ends once there are at least 5 participants and the owner of the lottery calls the function <code>endTheLottery</code>.</p>
<pre><code class="lang-solidity"><span class="hljs-comment">// SPDX-License-Identifier: MIT</span>
<span class="hljs-meta"><span class="hljs-keyword">pragma</span> <span class="hljs-keyword">solidity</span> ^0.8.19;</span>

<span class="hljs-class"><span class="hljs-keyword">contract</span> <span class="hljs-title">Lottery</span> </span>{
    <span class="hljs-function"><span class="hljs-keyword">error</span> <span class="hljs-title">Lottery__notEnoughEthSent</span>(<span class="hljs-params"><span class="hljs-keyword">uint256</span> amount</span>)</span>;
    <span class="hljs-function"><span class="hljs-keyword">error</span> <span class="hljs-title">Lottery__notTheOwner</span>(<span class="hljs-params"><span class="hljs-keyword">address</span> sender</span>)</span>;
    <span class="hljs-function"><span class="hljs-keyword">error</span> <span class="hljs-title">Lottery__notEnoughtPlayers</span>(<span class="hljs-params"></span>)</span>;
    <span class="hljs-function"><span class="hljs-keyword">error</span> <span class="hljs-title">Lottery__invalidTransaction</span>(<span class="hljs-params"></span>)</span>;

    <span class="hljs-keyword">uint256</span> <span class="hljs-keyword">immutable</span> i_lotteryPriceInEth;
    <span class="hljs-keyword">address</span> owner;
    <span class="hljs-keyword">address</span>[] players;
    <span class="hljs-keyword">address</span> winner;

    <span class="hljs-function"><span class="hljs-keyword">modifier</span> <span class="hljs-title">onlyOwner</span>(<span class="hljs-params"></span>) </span>{
        <span class="hljs-keyword">if</span> (<span class="hljs-built_in">msg</span>.<span class="hljs-built_in">sender</span> <span class="hljs-operator">!</span><span class="hljs-operator">=</span> owner) <span class="hljs-keyword">revert</span> Lottery__notTheOwner(<span class="hljs-built_in">msg</span>.<span class="hljs-built_in">sender</span>);
        <span class="hljs-keyword">_</span>;
    }

    <span class="hljs-function"><span class="hljs-keyword">modifier</span> <span class="hljs-title">moreThanFivePlayers</span>(<span class="hljs-params"></span>) </span>{
        <span class="hljs-keyword">if</span> (players.<span class="hljs-built_in">length</span> <span class="hljs-operator">&lt;</span> <span class="hljs-number">5</span>) <span class="hljs-keyword">revert</span> Lottery__notEnoughtPlayers();
        <span class="hljs-keyword">_</span>;
    }

    <span class="hljs-function"><span class="hljs-keyword">constructor</span>(<span class="hljs-params"><span class="hljs-keyword">uint256</span> lotteryPriceInEth</span>) </span>{
        i_lotteryPriceInEth <span class="hljs-operator">=</span> lotteryPriceInEth;
        owner <span class="hljs-operator">=</span> <span class="hljs-built_in">msg</span>.<span class="hljs-built_in">sender</span>;
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">joinLottery</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> <span class="hljs-title"><span class="hljs-keyword">payable</span></span> </span>{
        <span class="hljs-keyword">if</span> (<span class="hljs-built_in">msg</span>.<span class="hljs-built_in">value</span> <span class="hljs-operator">&lt;</span> i_lotteryPriceInEth)
            <span class="hljs-keyword">revert</span> Lottery__notEnoughEthSent(<span class="hljs-built_in">msg</span>.<span class="hljs-built_in">value</span>);
        players.<span class="hljs-built_in">push</span>(<span class="hljs-built_in">msg</span>.<span class="hljs-built_in">sender</span>);
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">endTheLottery</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> <span class="hljs-title">onlyOwner</span> <span class="hljs-title">moreThanFivePlayers</span> </span>{
        <span class="hljs-keyword">if</span> (players.<span class="hljs-built_in">length</span> <span class="hljs-operator">%</span> <span class="hljs-number">2</span> <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-number">0</span>) {
            winner <span class="hljs-operator">=</span> players[<span class="hljs-number">1</span>];
        }
        <span class="hljs-keyword">if</span> (players.<span class="hljs-built_in">length</span> <span class="hljs-operator">%</span> <span class="hljs-number">2</span> <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-number">1</span>) {
            winner <span class="hljs-operator">=</span> players[<span class="hljs-number">0</span>];
        }
        (<span class="hljs-keyword">bool</span> success, <span class="hljs-keyword">bytes</span> <span class="hljs-keyword">memory</span> data) <span class="hljs-operator">=</span> <span class="hljs-keyword">payable</span>(winner).<span class="hljs-built_in">call</span>{
            <span class="hljs-built_in">value</span>: <span class="hljs-keyword">address</span>(<span class="hljs-built_in">this</span>).<span class="hljs-built_in">balance</span>
        }(<span class="hljs-string">""</span>);
        <span class="hljs-keyword">if</span> (<span class="hljs-operator">!</span>success) <span class="hljs-keyword">revert</span> Lottery__invalidTransaction();
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">transferOwnership</span>(<span class="hljs-params"><span class="hljs-keyword">address</span> newOwner</span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        owner <span class="hljs-operator">=</span> newOwner;
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getNumberOfPlayer</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> <span class="hljs-title"><span class="hljs-keyword">view</span></span> <span class="hljs-title"><span class="hljs-keyword">returns</span></span> (<span class="hljs-params"><span class="hljs-keyword">uint256</span></span>) </span>{
        <span class="hljs-keyword">return</span> players.<span class="hljs-built_in">length</span>;
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getPlayer</span>(<span class="hljs-params"><span class="hljs-keyword">uint256</span> index</span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> <span class="hljs-title"><span class="hljs-keyword">view</span></span> <span class="hljs-title"><span class="hljs-keyword">returns</span></span> (<span class="hljs-params"><span class="hljs-keyword">address</span></span>) </span>{
        <span class="hljs-keyword">return</span> players[index];
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getWinner</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> <span class="hljs-title"><span class="hljs-keyword">view</span></span> <span class="hljs-title"><span class="hljs-keyword">returns</span></span> (<span class="hljs-params"><span class="hljs-keyword">address</span></span>) </span>{
        <span class="hljs-keyword">return</span> winner;
    }
}
</code></pre>
<p>In this first version of the contract, a couple of things are not correct and by testing we should be able to spot them.</p>
<h2 id="heading-unit-tests">Unit tests</h2>
<p>Unit tests are deterministic tests, i.e. they produce deterministic results, are easy to debug, and are used to assert particular behaviors of the contract. Before we write any unit test we need a deployer script like the following one:</p>
<pre><code class="lang-solidity"><span class="hljs-comment">// SPDX-License-Identifier: MIT</span>
<span class="hljs-meta"><span class="hljs-keyword">pragma</span> <span class="hljs-keyword">solidity</span> ^0.8.19;</span>

<span class="hljs-keyword">import</span> {<span class="hljs-title">Script</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"../lib/forge-std/src/Script.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">Lottery</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"src/Lottery.sol"</span>;

<span class="hljs-class"><span class="hljs-keyword">contract</span> <span class="hljs-title">DeployLottery</span> <span class="hljs-keyword">is</span> <span class="hljs-title">Script</span> </span>{
    Lottery lottery;

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">run</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> <span class="hljs-title"><span class="hljs-keyword">returns</span></span> (<span class="hljs-params">Lottery</span>) </span>{
        vm.startBroadcast();
        lottery <span class="hljs-operator">=</span> <span class="hljs-keyword">new</span> Lottery(<span class="hljs-number">1</span> <span class="hljs-literal">ether</span>);
        vm.stopBroadcast();
        <span class="hljs-keyword">return</span> lottery;
    }
}
</code></pre>
<p>Suppose now we want to check that the modifier <code>onlyOwner</code> is doing its job (which is to prevent addresses other than the owner from calling the <code>endTheLottery</code> function). What we need to do is:</p>
<ul>
<li><p>deploy the contract;</p>
</li>
<li><p>transfer the ownership of the contract calling the <code>transferOwnership</code> function;</p>
</li>
<li><p>prank an address (different from the owner) and try to call the <code>endTheLottery</code> function;</p>
</li>
<li><p>assert that the contract throws the <code>Lottery__notTheOwner</code> error.</p>
</li>
</ul>
<pre><code class="lang-solidity"><span class="hljs-comment">// SPDX-License-Identifier: MIT</span>
<span class="hljs-meta"><span class="hljs-keyword">pragma</span> <span class="hljs-keyword">solidity</span> ^0.8.19;</span>

<span class="hljs-keyword">import</span> {<span class="hljs-title">Test</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"lib/forge-std/src/Test.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">Lottery</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"src/Lottery.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">DeployLottery</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"script/DeployLottery.s.sol"</span>;

<span class="hljs-class"><span class="hljs-keyword">contract</span> <span class="hljs-title">LotteryTest</span> <span class="hljs-keyword">is</span> <span class="hljs-title">Test</span> </span>{
    <span class="hljs-keyword">address</span> player0 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Alice"</span>);
    <span class="hljs-keyword">address</span> player1 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Bob"</span>);
    <span class="hljs-keyword">address</span> player2 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Carl"</span>);
    <span class="hljs-keyword">address</span> player3 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"David"</span>);
    <span class="hljs-keyword">address</span> player4 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Eleonor"</span>);
    <span class="hljs-keyword">address</span> owner <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Owner"</span>);
    <span class="hljs-keyword">uint256</span> <span class="hljs-keyword">public</span> <span class="hljs-keyword">constant</span> BALANCE <span class="hljs-operator">=</span> <span class="hljs-number">100</span> <span class="hljs-literal">ether</span>;

    Lottery lottery;

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">setUp</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        DeployLottery deployer <span class="hljs-operator">=</span> <span class="hljs-keyword">new</span> DeployLottery();
        vm.deal(player0, BALANCE);
        vm.deal(player1, BALANCE);
        vm.deal(player2, BALANCE);
        vm.deal(player3, BALANCE);
        vm.deal(player4, BALANCE);
        vm.deal(owner, BALANCE);
        lottery <span class="hljs-operator">=</span> deployer.run();
        lottery.transferOwnership(owner);
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">testOnlyOwnerCanEndTheLottery</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        vm.expectRevert(
            <span class="hljs-built_in">abi</span>.<span class="hljs-built_in">encodeWithSelector</span>(
                Lottery.Lottery__notTheOwner.<span class="hljs-built_in">selector</span>,
                player0
            )
        );

        vm.startPrank(player0); <span class="hljs-comment">// not the owner</span>
        lottery.endTheLottery();
        vm.stopPrank();
    }
}
</code></pre>
<p>Since <code>player0</code> is the caller of <code>endTheLottery</code>, the contract throws the <code>Lottery__notTheOwner</code> error, as expected:</p>
<pre><code class="lang-bash">Running 1 <span class="hljs-built_in">test</span> <span class="hljs-keyword">for</span> <span class="hljs-built_in">test</span>/LotteryTest.t.sol:LotteryTest
[PASS] testOnlyOwnerCanEndTheLottery() (gas: 13825)
Test result: ok. 1 passed; 0 failed; 0 skipped; finished <span class="hljs-keyword">in</span> 1.22ms
</code></pre>
<p>Another classical use of unit tests is for asserting a particular relation between two variables. For example, let's assert that the number of players is five in the following script:</p>
<pre><code class="lang-solidity"><span class="hljs-comment">// SPDX-License-Identifier: MIT</span>
<span class="hljs-meta"><span class="hljs-keyword">pragma</span> <span class="hljs-keyword">solidity</span> ^0.8.19;</span>

<span class="hljs-keyword">import</span> {<span class="hljs-title">Test</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"lib/forge-std/src/Test.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">Lottery</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"src/Lottery.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">DeployLottery</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"script/DeployLottery.s.sol"</span>;

<span class="hljs-class"><span class="hljs-keyword">contract</span> <span class="hljs-title">LotteryTest</span> <span class="hljs-keyword">is</span> <span class="hljs-title">Test</span> </span>{
    <span class="hljs-keyword">address</span> player0 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Alice"</span>);
    <span class="hljs-keyword">address</span> player1 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Bob"</span>);
    <span class="hljs-keyword">address</span> player2 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Carl"</span>);
    <span class="hljs-keyword">address</span> player3 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"David"</span>);
    <span class="hljs-keyword">address</span> player4 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Eleanor"</span>);
    <span class="hljs-keyword">address</span> owner <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Owner"</span>);
    <span class="hljs-keyword">uint256</span> <span class="hljs-keyword">public</span> <span class="hljs-keyword">constant</span> BALANCE <span class="hljs-operator">=</span> <span class="hljs-number">100</span> <span class="hljs-literal">ether</span>;

    Lottery lottery;

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">setUp</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        DeployLottery deployer <span class="hljs-operator">=</span> <span class="hljs-keyword">new</span> DeployLottery();
        vm.deal(player0, BALANCE);
        vm.deal(player1, BALANCE);
        vm.deal(player2, BALANCE);
        vm.deal(player3, BALANCE);
        vm.deal(player4, BALANCE);
        vm.deal(owner, BALANCE);
        lottery <span class="hljs-operator">=</span> deployer.run();
        lottery.transferOwnership(owner);
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">testOnlyOwnerCanEndTheLottery</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        vm.expectRevert(
            <span class="hljs-built_in">abi</span>.<span class="hljs-built_in">encodeWithSelector</span>(
                Lottery.Lottery__notTheOwner.<span class="hljs-built_in">selector</span>,
                player0
            )
        );

        vm.startPrank(player0); <span class="hljs-comment">// not the owner</span>
        lottery.endTheLottery();
        vm.stopPrank();
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">testAssertNumberOfPlayers</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        vm.startPrank(player0);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player1);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player2);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player3);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player4);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();

        <span class="hljs-keyword">uint256</span> expectedNumberOfPlayers <span class="hljs-operator">=</span> <span class="hljs-number">5</span>;
        <span class="hljs-keyword">uint256</span> numberOfPlayers <span class="hljs-operator">=</span> lottery.getNumberOfPlayer();

        assertEq(expectedNumberOfPlayers, numberOfPlayers);
    }
}
</code></pre>
<p>As expected, <code>assertEq(expectedNumberOfPlayers, numberOfPlayers);</code> holds and the test passes:</p>
<pre><code class="lang-bash">Running 2 tests <span class="hljs-keyword">for</span> <span class="hljs-built_in">test</span>/LotteryTest.t.sol:LotteryTest
[PASS] testAssertNumberOfPlayers() (gas: 192928)
[PASS] testOnlyOwnerCanEndTheLottery() (gas: 13880)
Test result: ok. 2 passed; 0 failed; 0 skipped; finished <span class="hljs-keyword">in</span> 2.34ms
</code></pre>
<p>Note that these are only two simple cases and we haven't tested edge cases (for example, the contract misbehaves after the first lottery concludes, as the <code>players</code> array is never reset).</p>
<p>As we saw, unit tests are particularly powerful when the contract is quite simple. If the contract has some complex functions or inherits from other contracts, we may want to conduct a different type of test: the invariant test.</p>
<h2 id="heading-invariant-tests">Invariant tests</h2>
<p>Invariant tests are a form of stochastic testing, meaning the results may vary across test runs (unless the same seed is set). In other words, performing an invariant test means supplying random data to the contract's functions, trying to identify some unexpected behavior.</p>
<p>Those readers who <em>actually</em> read the contract may have noticed that the function <code>endTheLottery</code> does something undesired. In fact, if the length of <code>players</code> is an odd number (<code>%</code> is the modulus operator), the contract behaves correctly (remember that for simplicity we want the first player to join the lottery to be the winner), but if the number is even the winner is <code>players[1]</code> (i.e. the second one who joined the lottery).</p>
<p>It appears that the victory of <code>players[0]</code> should be an invariant property of the contract. Since many contracts have at least one invariant property, and testing these properties with unit tests may be difficult or impossible (especially for complex contracts), knowing how to perform invariant tests is a <em>conditio sine qua non</em> for a proficient blockchain engineer.</p>
<p>Note that there are two types of invariant tests:</p>
<ul>
<li><p>stateless invariant tests: tests where each run is independent of the others;</p>
</li>
<li><p>stateful invariant tests: tests where the state of each run is affected by all the previous runs.</p>
</li>
</ul>
<p>We can in fact find the undesired behaviour of <code>endTheLottery</code> just by performing the following test:</p>
<pre><code class="lang-solidity"><span class="hljs-comment">// SPDX-License-Identifier: MIT</span>
<span class="hljs-meta"><span class="hljs-keyword">pragma</span> <span class="hljs-keyword">solidity</span> ^0.8.19;</span>

<span class="hljs-keyword">import</span> {<span class="hljs-title">Test</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"lib/forge-std/src/Test.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">Lottery</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"src/Lottery.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">DeployLottery</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"script/DeployLottery.s.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">StdInvariant</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"lib/forge-std/src/StdInvariant.sol"</span>;

<span class="hljs-class"><span class="hljs-keyword">contract</span> <span class="hljs-title">LotteryTest</span> <span class="hljs-keyword">is</span> <span class="hljs-title">Test</span> </span>{
    <span class="hljs-keyword">address</span> player0 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Alice"</span>);
    <span class="hljs-keyword">address</span> player1 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Bob"</span>);
    <span class="hljs-keyword">address</span> player2 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Carl"</span>);
    <span class="hljs-keyword">address</span> player3 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"David"</span>);
    <span class="hljs-keyword">address</span> player4 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Eleonor"</span>);
    <span class="hljs-keyword">address</span> owner <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Owner"</span>);
    <span class="hljs-keyword">uint256</span> <span class="hljs-keyword">public</span> <span class="hljs-keyword">constant</span> BALANCE <span class="hljs-operator">=</span> <span class="hljs-number">100</span> <span class="hljs-literal">ether</span>;

    Lottery lottery;

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">setUp</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        DeployLottery deployer <span class="hljs-operator">=</span> <span class="hljs-keyword">new</span> DeployLottery();
        vm.deal(player0, BALANCE);
        vm.deal(player1, BALANCE);
        vm.deal(player2, BALANCE);
        vm.deal(player3, BALANCE);
        vm.deal(player4, BALANCE);
        vm.deal(owner, BALANCE);
        lottery <span class="hljs-operator">=</span> deployer.run();
        lottery.transferOwnership(owner);
        targetContract(<span class="hljs-keyword">address</span>(lottery));
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">testOnlyOwnerCanEndTheLottery</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        vm.expectRevert(
            <span class="hljs-built_in">abi</span>.<span class="hljs-built_in">encodeWithSelector</span>(
                Lottery.Lottery__notTheOwner.<span class="hljs-built_in">selector</span>,
                player0
            )
        );

        vm.startPrank(player0); <span class="hljs-comment">// not the owner</span>
        lottery.endTheLottery();
        vm.stopPrank();
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">testAssertNumberOfPlayers</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        vm.startPrank(player0);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player1);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player2);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player3);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player4);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();

        <span class="hljs-keyword">uint256</span> expectedNumberOfPlayers <span class="hljs-operator">=</span> <span class="hljs-number">5</span>;
        <span class="hljs-keyword">uint256</span> numberOfPlayers <span class="hljs-operator">=</span> lottery.getNumberOfPlayer();

        assertEq(expectedNumberOfPlayers, numberOfPlayers);
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">testFuzz_WinnerIsAlwaysPlayers0</span>(<span class="hljs-params"><span class="hljs-keyword">uint96</span> numPlayers</span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        vm.startPrank(player0);
        <span class="hljs-keyword">for</span> (<span class="hljs-keyword">uint256</span> i <span class="hljs-operator">=</span> <span class="hljs-number">5</span>; i <span class="hljs-operator">&lt;</span> numPlayers; i<span class="hljs-operator">+</span><span class="hljs-operator">+</span>) {
            lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        }
        vm.stopPrank();

        vm.startPrank(owner);
        lottery.endTheLottery();
        vm.stopPrank();

        <span class="hljs-keyword">address</span> expectedWinner <span class="hljs-operator">=</span> lottery.getPlayer(<span class="hljs-number">0</span>);
        assertEq(
            lottery.getWinner(),
            expectedWinner
        );
    }
}
</code></pre>
<p>The test fails (as expected) and returns the following logs, notifying us that there is at least one situation in which <code>endTheLottery</code> behaves unexpectedly:</p>
<pre><code class="lang-solidity">Test result: FAILED. 2 passed; <span class="hljs-number">1</span> failed; <span class="hljs-number">0</span> skipped; finished in <span class="hljs-number">4</span>.96ms

Failing tests:
Encountered <span class="hljs-number">1</span> failing test in test<span class="hljs-operator">/</span>LotteryTest.t.sol:LotteryTest
[FAIL. Reason: Lottery__notEnoughtPlayers() Counterexample: <span class="hljs-keyword">calldata</span><span class="hljs-operator">=</span><span class="hljs-number">0x515cecbc0000000000000000000000000000000000000000000000000000000000000000</span>, args<span class="hljs-operator">=</span>[<span class="hljs-number">0</span>]] testFuzz_WinnerIsAlwaysPlayers0(<span class="hljs-keyword">uint96</span>) (runs: <span class="hljs-number">0</span>, μ: <span class="hljs-number">0</span>, <span class="hljs-operator">~</span>: <span class="hljs-number">0</span>)

Encountered a total of <span class="hljs-number">1</span> failing tests, <span class="hljs-number">2</span> tests succeeded
</code></pre>
<p>The counterexample calldata encodes the argument <code>0</code>: with <code>numPlayers = 0</code> the loop body never executes, nobody joins the lottery, and <code>endTheLottery</code> reverts with <code>Lottery__notEnoughtPlayers</code>. The fuzzer has thus surfaced an unhandled edge case. Once the input is bounded away from small values (for example with <code>vm.assume(numPlayers &gt; 10)</code>), runs where <code>numPlayers</code> is odd leave an even number of players in the array (the loop adds <code>numPlayers - 5</code> of them), so the first if statement in <code>endTheLottery</code> fires and the winner is <code>players[1]</code> instead of <code>players[0]</code>, making the assertion fail.</p>
<h2 id="heading-to-go-further">To go further</h2>
<p>To learn more about testing Solidity contract with the Foundry framework and discover advanced testing techniques consult the <a target="_blank" href="https://book.getfoundry.sh/forge/tests">Foundry docs</a>.</p>
<hr />
<p>And that's it for this article.</p>
<p>Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here</strong></a>.</p>
]]></content:encoded></item><item><title><![CDATA[Numerical methods for ODEs]]></title><description><![CDATA[In mathematics, an ordinary differential equation (ODE) is a type of differential equation whose definition and analysis rely exclusively on a single independent variable. The solution of an ODE is no different from the solution of any other differen...]]></description><link>https://amm.zanotp.com/odes</link><guid isPermaLink="true">https://amm.zanotp.com/odes</guid><category><![CDATA[Mathematics]]></category><category><![CDATA[ode]]></category><category><![CDATA[Python]]></category><category><![CDATA[#numerical-methods]]></category><category><![CDATA[#differential-equations]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Mon, 17 Jul 2023 10:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/pv5SUbgRRIU/upload/6f10c166f8816bc01dae545fe9906cd2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In mathematics, an ordinary differential equation (ODE) is a type of differential equation whose definition and analysis rely exclusively on a single independent variable. The solution of an ODE is no different from the solution of any other differential equation, as the solutions are one or more functions that satisfy the equation.</p>
<p>Let’s take a look at a simple differential equation</p>
<p>$$\frac{dy}{dx}=ky$$</p><p>where \(k \in R\).</p>
<p>The solutions of the above equation are the functions whose derivative is proportional, with constant factor \(k\), to the function itself.</p>
<p>Bringing back some calculus, consider the function \(y = ce^{kx}\), where \(c\) is a real constant: the derivative of \(y\) with respect to \(x\) is \(\frac{dy}{dx} = ky\). Consequently, the family of solutions is \(y = ce^{kx}\), with \(c\) ranging over the real numbers.</p>
<p>It is common to add an initial condition that gives the value of the unknown function at a particular point in the domain. For example:</p>
<p>$$\begin{equation} \begin{cases} \frac{dy}{dx}=2y\\ y(0)=2 \end{cases} \end{equation}$$</p><p>It is straightforward to prove that the solution of the above system is \(y = 2e^{2x}\).</p>
<p>Unfortunately, not every ODE can be solved explicitly, so numerical methods come to the rescue by providing an approximation to the solution.</p>
<p>It is worth noting that these numerical methods are not only useful for solving first-order ODEs but are equally valuable for addressing higher-order ODEs as well (i.e. ODEs involving higher-order derivatives), since a higher-order ODE can often be transformed into a system of first-order ODEs, as the example below shows.</p>
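<p>For example, introducing \(v = y'\) turns the second-order equation \(y'' = -y\) into the first-order system</p>
<p>$$\begin{equation} \begin{cases} y' = v\\ v' = -y \end{cases} \end{equation}$$</p><p>to which any of the methods below can be applied componentwise.</p>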
<p>In this article, I introduce two among the multitude of available numerical methods.</p>
<h2 id="heading-euler-method">Euler method</h2>
<p>The Euler method offers a simple approach by breaking down the continuous ODE into discrete steps. The idea is to update the function's value based on its derivative at each step, effectively simulating the behavior of the ODE over a range of points. In fact, from any point \(p\) on a curve, we can approximate nearby points on the curve by moving a short distance along the line tangent to the curve at \(p\).</p>
<p>Let</p>
<p>$$\begin{equation} \begin{cases} \frac{dy}{dx}=f(x, y(x))\\ y(x_0)=y_0 \end{cases} \end{equation}$$</p><p>be our initial system.</p>
<p>Replacing the derivative with its discrete (finite-difference) version and rearranging we get</p>
<p>$$\begin{equation} \begin{cases} y(x+h)=y(x) + h f(x, y(x))\\ y(x_0)=y_0 \end{cases} \end{equation}$$</p><p>which yields the following recursive scheme</p>
<p>$$y_{n+1}=y_n+hf(x_n,y_n)$$</p><p>Graphically:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1692255385324/bd351877-69c0-4ee5-b46b-bfe39aff4b9f.png" alt class="image--center mx-auto" /></p>
<p>Using the above equation, we can now compute \(y(x_n)\) \(\forall \space x_n\) with the following steps:</p>
<ol>
<li><p>store \(y(x_0)=y_0\);</p>
</li>
<li><p>compute \(y(x_1)=y_0+hf(x_0, y_0)\);</p>
</li>
<li><p>store \(y(x_1)\);</p>
</li>
<li><p>compute \(y(x_2)=y_1+hf(x_1, y_1)\);</p>
</li>
<li><p>store \(y(x_2)\);</p>
</li>
</ol>
<p>and so on.<br />We now want to approximate the solution of the initial system</p>
<p>$$\begin{equation} \begin{cases} \frac{dy}{dx}=2y\\ y(0)=2 \end{cases} \end{equation}$$</p><p>and visualize the exact solution and the approximation:</p>
<pre><code class="lang-solidity"><span class="hljs-keyword">import</span> <span class="hljs-title">numpy</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">np</span>
<span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">matplotlib</span>.<span class="hljs-title">pyplot</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">plt</span>

# <span class="hljs-title">define</span> <span class="hljs-title">params</span>, <span class="hljs-title">ode</span> <span class="hljs-title">and</span> <span class="hljs-title">inital</span> <span class="hljs-title">condition</span>
<span class="hljs-title">k</span> <span class="hljs-operator">=</span> 2
<span class="hljs-title">f</span> <span class="hljs-operator">=</span> <span class="hljs-title">lambda</span> <span class="hljs-title">y</span>, <span class="hljs-title">x</span>: <span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">np</span>.<span class="hljs-title">exp</span>(<span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">y</span>)
<span class="hljs-title">h</span> <span class="hljs-operator">=</span> 0.1
<span class="hljs-title">x</span> <span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">arange</span>(0, 1 <span class="hljs-operator">+</span> <span class="hljs-title">h</span>, <span class="hljs-title">h</span>)
<span class="hljs-title">x_</span><span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">arange</span>(0, 1, 0.0001)
<span class="hljs-title">y0</span> <span class="hljs-operator">=</span> 2 

# <span class="hljs-title">initialize</span> <span class="hljs-title">the</span> <span class="hljs-title">y</span> <span class="hljs-title">vector</span>
<span class="hljs-title">y</span> <span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">zeros</span>(<span class="hljs-title">len</span>(<span class="hljs-title">x</span>))
<span class="hljs-title">y</span>[0] <span class="hljs-operator">=</span> <span class="hljs-title">y0</span>

# <span class="hljs-title">populate</span> <span class="hljs-title">the</span> <span class="hljs-title">y</span> <span class="hljs-title">vector</span>
<span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">i</span> <span class="hljs-title">in</span> <span class="hljs-title">range</span>(0, <span class="hljs-title">len</span>(<span class="hljs-title">x</span>) <span class="hljs-operator">-</span> 1):
    <span class="hljs-title">y</span>[<span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1] <span class="hljs-operator">=</span> <span class="hljs-title">y</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">*</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>], <span class="hljs-title">y</span>[<span class="hljs-title">i</span>])

# <span class="hljs-title">plot</span> <span class="hljs-title">the</span> <span class="hljs-title">results</span>
<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x</span>, <span class="hljs-title">y</span>, <span class="hljs-string">'bo--'</span>, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'Approximated solution'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x_</span>, <span class="hljs-title">np</span>.<span class="hljs-title">exp</span>(<span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">x_</span>)<span class="hljs-operator">+</span>1, <span class="hljs-string">'g'</span>, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'Exact solution'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">xlabel</span>(<span class="hljs-string">'x'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">ylabel</span>(<span class="hljs-string">'y'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">grid</span>()
<span class="hljs-title">plt</span>.<span class="hljs-title">legend</span>(<span class="hljs-title">loc</span><span class="hljs-operator">=</span><span class="hljs-string">'lower right'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">show</span>()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1692262422447/d1a5e9ca-89f7-41ed-b129-80bdd87702ca.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-runge-kutta-methods">Runge-Kutta methods</h2>
<p>The Euler method is often not accurate enough. One remedy is to use more than one slope evaluation in the interval \([x_n, x_{n+1}]\), as the Runge-Kutta methods do. The number of evaluations in the interval \([x_n, x_{n+1}]\) determines the order of the method (up to order four; beyond that, more evaluations than the order are required).</p>
<h3 id="heading-second-order-runge-kutta-method-rk2"><strong>Second-Order Runge-Kutta Method (RK2)</strong></h3>
<p>Starting with the Runge-Kutta method of order 2, we need the following second-order Taylor expansion:</p>
<p>$$y(x+h)=y(x)+h\frac{dy}{dx}(x) + \frac {h^2}2\frac{d^2 y}{dx^2}(x)+\epsilon$$</p><p>where \(\epsilon\) is the truncation error.</p>
<p>We can obtain \(\frac{d^2 y}{dx^2}(x)\) by differentiating the ODE \(\frac{dy}{dx}(x)=f(x, y(x))\):</p>
<p>$$\frac{d^2 y}{dx^2}(x)=\frac{\partial }{\partial x}f(x, y)+\frac{\partial}{\partial y}f(x, y)\,f(x, y)$$</p><p>and the Taylor expansion hence becomes</p>
<p>$$y(x+h)=y(x)+hf(x,y) + \frac {h^2}2\left(\frac{\partial }{\partial x}f(x, y)+\frac{\partial}{\partial y}f(x, y)\,f(x, y) \right)+\epsilon$$</p><p>After some manipulation (the bracketed term is absorbed into the first-order Taylor expansion of \(f(x+h, y+hf(x,y))\)), we obtain</p>
<p>$$y(x+h)=y(x)+\frac h2f(x,y) + \frac {h}2f(x+h,y+hf(x,y))+\epsilon$$</p><p>which corresponds to the following recursive scheme:</p>
<p>$$y_{n+1}=y_n+\frac h2(s_1+s_2)$$</p><p>with</p>
<p>$$s_1=f(x_n, y_n)$$</p><p>$$s_2 = f(x_n+h, y_n+h\,{s_1})$$</p><p>Note that \(s_1\) and \(s_2\) correspond to two different estimates of the slope of the solution (at the two ends of the step) and the method is nothing more than the average between the two.</p>
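<p>As a quick sanity check on the example system \(y' = 2y\), \(y(0)=2\) with \(h = 0.1\): \(s_1 = f(0, 2) = 4\), \(s_2 = f(0.1, 2 + 0.1 \cdot 4) = 4.8\), so \(y_1 = 2 + 0.05(4 + 4.8) = 2.44\), already close to the exact value \(2e^{0.2} \approx 2.4428\).</p>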
<p>Again using the above equation, we can compute \(y(x_n)\) \(\forall \space x_n\) with the following steps:</p>
<ol>
<li><p>store \(y(x_0)=y_0\);</p>
</li>
<li><p>compute \(s_1 \) and \(s_2\);</p>
</li>
<li><p>compute \(y(x_1)=y_0+\frac h2(s_1+s_2)\);</p>
</li>
<li><p>store \(y(x_1)\);</p>
</li>
<li><p>update \(s_1 \) and \(s_2\);</p>
</li>
<li><p>compute \(y(x_2)=y_1+\frac h2(s_1+s_2)\);</p>
</li>
<li><p>store \(y(x_2)\);</p>
</li>
</ol>
<p>and so on.</p>
<p>Again, we want to approximate the solution of the initial system</p>
<p>$$\begin{equation} \begin{cases} \frac{dy}{dx}=2y\\ y(0)=2 \end{cases} \end{equation}$$</p><pre><code class="lang-python">import numpy as np
import matplotlib.pyplot as plt

# define params, the ODE and the initial condition
k = 2
f = lambda x, y: k*y
h = 0.1
x = np.arange(0, 1 + h, h)
x_ = np.arange(0, 1, 0.0001)
y0 = 2

# initialize the y vector
y = np.zeros(len(x))
y[0] = y0

# populate the y vector with the RK2 scheme
for i in range(0, len(x) - 1):
    s1 = f(x[i], y[i])
    s2 = f(x[i] + h, y[i] + h*s1)
    y[i + 1] = y[i] + h/2 * (s1 + s2)

# plot the results
plt.plot(x, y, 'bo--', label='Approximated solution')
plt.plot(x_, 2*np.exp(k*x_), 'g', label='Exact solution')
plt.xlabel('x')
plt.ylabel('y')
plt.grid()
plt.legend(loc='lower right')
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1692262848070/183f3202-2a9b-4d47-82d2-c441fb027f32.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-fourth-order-runge-kutta-method-rk4"><strong>Fourth-Order Runge-Kutta Method (RK4)</strong></h3>
<p>Repeating what we did for the Runge-Kutta method of order 2, but using a fourth-order Taylor expansion, we obtain the following recursive scheme:</p>
<p>$$y_{n+1}=y_n+\frac h3(\frac {s_1}2 + s_2 + s_3 + \frac{s_4}2)$$</p><p>with</p>
<p>$$s_1 = f(x_n, y_n)$$</p><p>$$s_2 = f(x_n+\frac h2, y_n+\frac h2{s_1})$$</p><p>$$s_3 = f(x_n+\frac h2, y_n+\frac h2{s_2})$$</p><p>$$s_4 = f(x_n+h, y_n+h{s_3})$$</p><p>The steps used to approximate the system</p>
<p>$$\begin{equation} \begin{cases} \frac{dy}{dx}=2y\\ y(0)=2 \end{cases} \end{equation}$$</p><p>are analogous to the ones used for the second-order Runge-Kutta method.</p>
<pre><code class="lang-solidity"><span class="hljs-keyword">import</span> <span class="hljs-title">numpy</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">np</span>
<span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">matplotlib</span>.<span class="hljs-title">pyplot</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">plt</span>


# <span class="hljs-title">define</span> <span class="hljs-title">params</span>, <span class="hljs-title">the</span> <span class="hljs-title">ode</span> <span class="hljs-title">and</span> <span class="hljs-title">the</span> <span class="hljs-title">inital</span> <span class="hljs-title">consition</span>
<span class="hljs-title">k</span> <span class="hljs-operator">=</span> 2
<span class="hljs-title">f</span> <span class="hljs-operator">=</span> <span class="hljs-title">lambda</span> <span class="hljs-title">y</span>, <span class="hljs-title">x</span>: <span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">np</span>.<span class="hljs-title">exp</span>(<span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">y</span>)
<span class="hljs-title">h</span> <span class="hljs-operator">=</span> .1
<span class="hljs-title">x</span> <span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">arange</span>(0, 1 <span class="hljs-operator">+</span> <span class="hljs-title">h</span>, <span class="hljs-title">h</span>)
<span class="hljs-title">x_</span><span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">arange</span>(0, 1, 0.0001)
<span class="hljs-title">y0</span> <span class="hljs-operator">=</span> 2 

# <span class="hljs-title">initialize</span> <span class="hljs-title">thw</span> <span class="hljs-title">y</span> <span class="hljs-title">vector</span>
<span class="hljs-title">y</span> <span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">zeros</span>(<span class="hljs-title">len</span>(<span class="hljs-title">x</span>))
<span class="hljs-title">y</span>[0] <span class="hljs-operator">=</span> <span class="hljs-title">y0</span>

# <span class="hljs-title">populate</span> <span class="hljs-title">the</span> <span class="hljs-title">y</span> <span class="hljs-title">vector</span>
<span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">i</span> <span class="hljs-title">in</span> <span class="hljs-title">range</span>(0, <span class="hljs-title">len</span>(<span class="hljs-title">x</span>) <span class="hljs-operator">-</span> 1):
    <span class="hljs-title">s1</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>], <span class="hljs-title">y</span>[<span class="hljs-title">i</span>])
    <span class="hljs-title">s2</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>2, <span class="hljs-title">y</span>[<span class="hljs-title">i</span>]<span class="hljs-operator">+</span>  <span class="hljs-title">h</span><span class="hljs-operator">/</span>2<span class="hljs-operator">*</span><span class="hljs-title">s1</span>)
    <span class="hljs-title">s3</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>2, <span class="hljs-title">y</span>[<span class="hljs-title">i</span>]<span class="hljs-operator">+</span>  <span class="hljs-title">h</span><span class="hljs-operator">/</span>2<span class="hljs-operator">*</span><span class="hljs-title">s2</span>)
    <span class="hljs-title">s4</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span>, <span class="hljs-title">y</span>[<span class="hljs-title">i</span>]<span class="hljs-operator">+</span>  <span class="hljs-title">h</span><span class="hljs-operator">*</span><span class="hljs-title">s3</span>)
    <span class="hljs-title">y</span>[<span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1] <span class="hljs-operator">=</span> <span class="hljs-title">y</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>3 <span class="hljs-operator">*</span> (<span class="hljs-title">s1</span><span class="hljs-operator">/</span>2<span class="hljs-operator">+</span><span class="hljs-title">s2</span><span class="hljs-operator">+</span><span class="hljs-title">s3</span><span class="hljs-operator">+</span><span class="hljs-title">s4</span><span class="hljs-operator">/</span>2)

# <span class="hljs-title">plot</span> <span class="hljs-title">the</span> <span class="hljs-title">results</span>
<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x</span>, <span class="hljs-title">y</span>, <span class="hljs-string">'bo--'</span>, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'Approximated solution'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x_</span>, <span class="hljs-title">np</span>.<span class="hljs-title">exp</span>(<span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">x_</span>)<span class="hljs-operator">+</span>1, <span class="hljs-string">'g'</span>, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'Exact solution'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">xlabel</span>(<span class="hljs-string">'x'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">ylabel</span>(<span class="hljs-string">'y'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">grid</span>()
<span class="hljs-title">plt</span>.<span class="hljs-title">legend</span>(<span class="hljs-title">loc</span><span class="hljs-operator">=</span><span class="hljs-string">'lower right'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">show</span>()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1692262889037/49aae4f4-3074-4f00-858f-a0b3a651ff92.png" alt class="image--center mx-auto" /></p>
<p>There are also higher-order Runge-Kutta methods, but they are relatively inefficient (beyond fourth order, the number of required function evaluations grows faster than the order), so I won't cover them in this article.</p>
<h2 id="heading-comparison-between-the-three-methods">Comparison between the three methods</h2>
<ol>
<li><p><strong>Euler Method:</strong></p>
<ul>
<li><p>Accuracy: the Euler method is a first-order method, which means that it can accumulate a significant error over many steps or for stiff ODEs.</p>
</li>
<li><p>Computational Complexity: the Euler method involves a single evaluation of the derivative function per step.</p>
</li>
</ul>
</li>
<li><p><strong>Second-Order Runge-Kutta Method (RK2):</strong></p>
<ul>
<li><p>Accuracy: RK2 is a second-order method: it offers better accuracy than the Euler method and is less prone to accumulating a relevant error over many steps.</p>
</li>
<li><p>Computational Complexity: RK2 requires two evaluations of the derivative function per step (one at the beginning and one at the end of the step).</p>
</li>
</ul>
</li>
<li><p><strong>Fourth-Order Runge-Kutta Method (RK4):</strong></p>
<ul>
<li><p>Accuracy: RK4 is a fourth-order method, which implies that it's significantly more accurate than both Euler and RK2 methods, making it suitable for many practical applications.</p>
</li>
<li><p>Computational Complexity: RK4 involves four evaluations of the derivative function per step, along with weighted combinations of these evaluations.</p>
<p>  Despite the higher computational cost compared to Euler and RK2, RK4 remains a popular choice due to its reliability and accuracy.</p>
</li>
</ul>
</li>
</ol>
<p>Graphically:</p>
<pre><code class="lang-solidity"><span class="hljs-keyword">import</span> <span class="hljs-title">numpy</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">np</span>
<span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">matplotlib</span>.<span class="hljs-title">pyplot</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">plt</span>

# <span class="hljs-title">define</span> <span class="hljs-title">params</span>, <span class="hljs-title">the</span> <span class="hljs-title">ode</span> <span class="hljs-title">and</span> <span class="hljs-title">the</span> <span class="hljs-title">inital</span> <span class="hljs-title">consition</span>
<span class="hljs-title">k</span> <span class="hljs-operator">=</span> 2
<span class="hljs-title">f</span> <span class="hljs-operator">=</span> <span class="hljs-title">lambda</span> <span class="hljs-title">y</span>, <span class="hljs-title">x</span>: <span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">np</span>.<span class="hljs-title">exp</span>(<span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">y</span>)
<span class="hljs-title">h</span> <span class="hljs-operator">=</span> 0.1
<span class="hljs-title">x</span> <span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">arange</span>(0, 1 <span class="hljs-operator">+</span> <span class="hljs-title">h</span>, <span class="hljs-title">h</span>)
<span class="hljs-title">x_</span><span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">arange</span>(0, 1, 0.0001)
<span class="hljs-title">y0</span> <span class="hljs-operator">=</span> 2 

# <span class="hljs-title">initialize</span> <span class="hljs-title">thw</span> <span class="hljs-title">y</span> <span class="hljs-title">vector</span>
<span class="hljs-title">y</span> <span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">zeros</span>(<span class="hljs-title">len</span>(<span class="hljs-title">x</span>))
<span class="hljs-title">y</span>[0] <span class="hljs-operator">=</span> <span class="hljs-title">y0</span>

# <span class="hljs-title">euler</span> <span class="hljs-title">method</span>
<span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">i</span> <span class="hljs-title">in</span> <span class="hljs-title">range</span>(0, <span class="hljs-title">len</span>(<span class="hljs-title">x</span>) <span class="hljs-operator">-</span> 1):
    <span class="hljs-title">y</span>[<span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1] <span class="hljs-operator">=</span> <span class="hljs-title">y</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">*</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>], <span class="hljs-title">y</span>[<span class="hljs-title">i</span>])
<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x</span>, <span class="hljs-title">y</span>, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'Euler'</span>, <span class="hljs-title">linestyle</span><span class="hljs-operator">=</span><span class="hljs-string">'--'</span>)


# <span class="hljs-title">rk2</span> <span class="hljs-title">method</span>
<span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">i</span> <span class="hljs-title">in</span> <span class="hljs-title">range</span>(0, <span class="hljs-title">len</span>(<span class="hljs-title">x</span>) <span class="hljs-operator">-</span> 1):
    <span class="hljs-title">s1</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>], <span class="hljs-title">y</span>[<span class="hljs-title">i</span>])
    <span class="hljs-title">s2</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>2, <span class="hljs-title">y</span>[<span class="hljs-title">i</span>]<span class="hljs-operator">+</span>  <span class="hljs-title">h</span><span class="hljs-operator">/</span>2<span class="hljs-operator">*</span><span class="hljs-title">s1</span>)
    <span class="hljs-title">y</span>[<span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1] <span class="hljs-operator">=</span> <span class="hljs-title">y</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>2 <span class="hljs-operator">*</span> (<span class="hljs-title">s1</span><span class="hljs-operator">+</span><span class="hljs-title">s2</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x</span>, <span class="hljs-title">y</span>, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'RK2'</span>, <span class="hljs-title">linestyle</span><span class="hljs-operator">=</span><span class="hljs-string">'--'</span>)


# <span class="hljs-title">rk4</span> <span class="hljs-title">method</span>
<span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">i</span> <span class="hljs-title">in</span> <span class="hljs-title">range</span>(0, <span class="hljs-title">len</span>(<span class="hljs-title">x</span>) <span class="hljs-operator">-</span> 1):
    <span class="hljs-title">s1</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>], <span class="hljs-title">y</span>[<span class="hljs-title">i</span>])
    <span class="hljs-title">s2</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>2, <span class="hljs-title">y</span>[<span class="hljs-title">i</span>]<span class="hljs-operator">+</span>  <span class="hljs-title">h</span><span class="hljs-operator">/</span>2<span class="hljs-operator">*</span><span class="hljs-title">s1</span>)
    <span class="hljs-title">s3</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>2, <span class="hljs-title">y</span>[<span class="hljs-title">i</span>]<span class="hljs-operator">+</span>  <span class="hljs-title">h</span><span class="hljs-operator">/</span>2<span class="hljs-operator">*</span><span class="hljs-title">s2</span>)
    <span class="hljs-title">s4</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span>, <span class="hljs-title">y</span>[<span class="hljs-title">i</span>]<span class="hljs-operator">+</span>  <span class="hljs-title">h</span><span class="hljs-operator">*</span><span class="hljs-title">s3</span>)
    <span class="hljs-title">y</span>[<span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1] <span class="hljs-operator">=</span> <span class="hljs-title">y</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>3 <span class="hljs-operator">*</span> (<span class="hljs-title">s1</span><span class="hljs-operator">/</span>2<span class="hljs-operator">+</span><span class="hljs-title">s2</span><span class="hljs-operator">+</span><span class="hljs-title">s3</span><span class="hljs-operator">+</span><span class="hljs-title">s4</span><span class="hljs-operator">/</span>2)
<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x</span>, <span class="hljs-title">y</span>, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'RK4'</span>, <span class="hljs-title">linestyle</span><span class="hljs-operator">=</span><span class="hljs-string">'--'</span>)


<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x_</span>, <span class="hljs-title">np</span>.<span class="hljs-title">exp</span>(<span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">x_</span>)<span class="hljs-operator">+</span>1, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'Exact solution'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">xlabel</span>(<span class="hljs-string">'x'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">ylabel</span>(<span class="hljs-string">'y'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">grid</span>()
<span class="hljs-title">plt</span>.<span class="hljs-title">legend</span>(<span class="hljs-title">loc</span><span class="hljs-operator">=</span><span class="hljs-string">'lower right'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">show</span>()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1692263238201/19df3f10-e26a-41b0-b3bb-fe766025ea89.png" alt class="image--center mx-auto" /></p>
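<p>To make the accuracy claims above concrete, the following quick check (a minimal sketch on the same test problem) measures each method's maximum absolute error against the exact solution \(2e^{2x}\):</p>
<pre><code class="lang-python">import numpy as np

# problem setup: y' = 2y, y(0) = 2, whose exact solution is y = 2*exp(2x)
k, h, y0 = 2, 0.1, 2.0
f = lambda x, y: k*y
x = np.arange(0, 1 + h, h)
exact = y0*np.exp(k*x)

def euler_step(x_n, y_n):
    return y_n + h*f(x_n, y_n)

def rk2_step(x_n, y_n):
    s1 = f(x_n, y_n)
    s2 = f(x_n + h, y_n + h*s1)
    return y_n + h/2*(s1 + s2)

def rk4_step(x_n, y_n):
    s1 = f(x_n, y_n)
    s2 = f(x_n + h/2, y_n + h/2*s1)
    s3 = f(x_n + h/2, y_n + h/2*s2)
    s4 = f(x_n + h, y_n + h*s3)
    return y_n + h/3*(s1/2 + s2 + s3 + s4/2)

# march each scheme over [0, 1] and report the worst deviation from the exact solution
for name, step in [('Euler', euler_step), ('RK2', rk2_step), ('RK4', rk4_step)]:
    y = np.zeros(len(x))
    y[0] = y0
    for i in range(len(x) - 1):
        y[i + 1] = step(x[i], y[i])
    print(name, 'max abs error:', np.max(np.abs(y - exact)))
</code></pre>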
<hr />
<p>And that's it for this article.</p>
<p>Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here</strong></a>.</p>
]]></content:encoded></item><item><title><![CDATA[Simulated annealing in Python]]></title><description><![CDATA[Optimization is a crucial aspect of many fields, as it helps us find the best possible solution to a problem. In statistics, for example, it’s common to maximize the likelihood function or minimize the norm of residuals, in microeconomics optimizatio...]]></description><link>https://amm.zanotp.com/simulated-annealing-in-python</link><guid isPermaLink="true">https://amm.zanotp.com/simulated-annealing-in-python</guid><category><![CDATA[Python]]></category><category><![CDATA[optimization]]></category><category><![CDATA[programming]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sat, 18 Mar 2023 13:06:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/OPpCbAAKWv8/upload/641b311c5b6bbd3ccecf7edd8da5fd2c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Optimization is a crucial aspect of many fields, as it helps us find the best possible solution to a problem. In statistics, for example, it’s common to maximize the likelihood function or minimize the norm of residuals, in microeconomics optimization is used to study the behaviour of economic agents, who are assumed to maximize their utility subject to various constraints.</p>
<p>There are many different types of optimization problems, including linear programming, nonlinear programming, convex optimization, and integer programming, to name a few. Each type of optimization problem requires a different approach and a different set of algorithms to solve it.</p>
<p>In this post, I will talk about <strong>simulated annealing</strong>, a well-known algorithm that can still feel exotic to the uninitiated. For the sake of simplicity, I'll talk about minimization problems, since seeking the maximum of a function \(f\) equals seeking the minimum of the function \(-f\).</p>
<h1 id="heading-simulated-annealing">Simulated annealing</h1>
<p>Simulated annealing is an iterative method for solving unconstrained and bound-constrained optimization problems. The algorithm borrows inspiration from the physical process of heating a material and then slowly lowering the <strong>temperature</strong>.</p>
<p>At each iteration of the simulated annealing algorithm, a new point \(x_i\) is randomly generated (if you don't know how computers deal with randomness, see <a target="_blank" href="https://amm.zanotp.com/an-introduction-to-prngs-with-python-and-r">this article</a>). As we'll see in a minute, the distance of the new point \(x\_i\) from the current point \(x\_{i-1}\) is proportional to the temperature and based on a certain probability distribution. The algorithm accepts every new point \(x_i \) such that \(f(x\_i) \leq f(x\_{i-1})\), where \(f\) is the objective function (i.e. the function to be minimized), but it also accepts, with a <strong>certain probability</strong>, points \(x_i \) such that \(f(x\_i) \geq f(x\_{i-1})\). This property is significant: it prevents the algorithm from being trapped in <em>local minima</em>.</p>
<h3 id="heading-simulated-annealing-with-python">Simulated annealing with Python</h3>
<p>First of all, we need to load some packages:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> math
<span class="hljs-keyword">import</span> random <span class="hljs-keyword">as</span> rd
</code></pre>
<p>We now define the parameters we need:</p>
<ul>
<li><p>an objective function \(f\);</p>
</li>
<li><p>a domain (where the algorithm should look for a solution);</p>
</li>
<li><p>an initial temperature;</p>
</li>
<li><p>an initial point (which is usually selected randomly);</p>
</li>
<li><p>a step size;</p>
</li>
<li><p>a maximum number of iterations.</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-comment"># 1) the objective function</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">f</span>(<span class="hljs-params">x</span>):</span>
    <span class="hljs-keyword">return</span> x**<span class="hljs-number">3</span> - <span class="hljs-number">8</span>

<span class="hljs-comment"># 2) the domain</span>
domain = [<span class="hljs-number">-10.</span>, <span class="hljs-number">10.</span>]

<span class="hljs-comment"># 3) initial temperature</span>
start_temp = <span class="hljs-number">100</span>

<span class="hljs-comment"># 4) starting value</span>
x_0 = rd.uniform(domain[<span class="hljs-number">0</span>], domain[<span class="hljs-number">1</span>])

<span class="hljs-comment"># 5) the step size</span>
step_size = <span class="hljs-number">2</span>

<span class="hljs-comment"># 6) maximum number of iterations</span>
max_iter = <span class="hljs-number">1000</span>
iteration = <span class="hljs-number">0</span>
</code></pre>
<p>First of all, we evaluate \(x_0\) and assign \(x_0\) and \(y_0 \) to <code>x_best</code> and <code>y_best</code> (the best values so far) and <code>x_curr</code> and <code>y_curr</code> (the current solution).</p>
<pre><code class="lang-python">y_0 = f(x_0)

x_curr, y_curr = x_0, y_0
x_best, y_best = x_0, y_0
</code></pre>
<p>The first step of the algorithm is to generate a new candidate solution \(x_1 \) from the current solution \(x_0\) and evaluate \(f(x_1)\). We also count an iteration (this step is crucial, otherwise the algorithm would run forever).</p>
<pre><code class="lang-python">x_1 = x_curr + step_size * rd.uniform(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
y_1 = f(x_1)

iteration += <span class="hljs-number">1</span>
</code></pre>
<p>Since we are looking for a <em>minimum</em>, if <code>y_1</code> is smaller than <code>y_best</code>, we assign <code>y_1</code> and <code>x_1</code> to <code>y_best</code> and <code>x_best</code>. We then calculate the difference between <code>y_1</code> and <code>y_curr</code>.</p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> y_1 &lt; y_best:
    x_best, y_best = x_1, y_1

diff = y_1 - y_curr
</code></pre>
<p>Here comes the most exciting part: we update the temperature (using a fast annealing schedule) and use this value to calculate the <em>Metropolis criterion</em>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676404727190/499ebf45-dc0b-4839-aadf-cc34b0bd53d0.png" alt class="image--center mx-auto" /></p>
<p>where \(\Delta y\) is <code>diff</code> and \(t\) is <code>temp</code>. This quantity represents the probability of accepting the transition from \(x\_i\) to \(x\_{i+1}\) and is what allows the algorithm to escape <em>local minima</em>.</p>
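<p>Written out, the acceptance probability is</p>
<p>$$P(\text{accept}) = \begin{cases} 1 \quad \text{if } \Delta y \leq 0\\ e^{-\Delta y / t} \quad \text{otherwise} \end{cases}$$</p>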
<pre><code class="lang-python">temp = start_temp / (iteration + <span class="hljs-number">1.</span>)
metropolis = math.exp(-diff / temp)

<span class="hljs-keyword">if</span> diff &lt;= <span class="hljs-number">0</span> <span class="hljs-keyword">or</span> rd.random() &lt; metropolis:
    x_curr, y_curr = x_1, y_1
</code></pre>
<p>And this is the last step of an iteration. After that, the algorithm generates \(x_2\) and \(y_2\), evaluates them to update <code>x_best</code>, <code>y_best</code>, <code>x_curr</code> and <code>y_curr</code>, and repeats itself until <code>iteration == max_iter</code>.</p>
<h3 id="heading-making-a-function-for-simulated-annealing">Making a function for simulated annealing</h3>
<p>Since the algorithm simply repeats these steps, we may want to wrap it up in a function.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> math
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> random <span class="hljs-keyword">as</span> rd

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">simulated_annealing</span>(<span class="hljs-params">f, domain, step_size, start_temp, max_iter = <span class="hljs-number">1000</span></span>):</span>

    x_0 = rd.uniform(domain[<span class="hljs-number">0</span>], domain[<span class="hljs-number">1</span>])
    y_0 = f(x_0)
    x_curr, y_curr = x_0, y_0
    x_best, y_best = x_0, y_0

    <span class="hljs-keyword">for</span> n <span class="hljs-keyword">in</span> range(max_iter):
        x_i = x_curr + step_size * rd.uniform(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
        y_i = f(x_i)

        <span class="hljs-keyword">if</span> y_i &lt; y_best:
            x_best, y_best = x_i, y_i

        diff = y_i - y_curr

        temp = start_temp/ float(n + <span class="hljs-number">1</span>)
        metropolis = math.exp(-diff / temp)

        <span class="hljs-keyword">if</span> diff &lt;= <span class="hljs-number">0</span> <span class="hljs-keyword">or</span> rd.random() &lt; metropolis:
            x_curr, y_curr = x_i, y_i

    <span class="hljs-keyword">return</span> [y_best, x_best]
</code></pre>
<p>Note that we don't have to count the iterations since we are using a for loop.</p>
<p>If we test the function, we see that for well-chosen parameters the algorithm finds the minimum with good accuracy.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fun</span>(<span class="hljs-params">x</span>):</span>
    <span class="hljs-keyword">return</span> x**<span class="hljs-number">2</span> + np.sin(x**<span class="hljs-number">4</span>)

simulated_annealing(f = fun, domain = [<span class="hljs-number">-3</span>, <span class="hljs-number">3</span>], step_size = <span class="hljs-number">1</span>, start_temp = <span class="hljs-number">100</span>, max_iter = <span class="hljs-number">1000</span>)

<span class="hljs-comment">#&gt; [9.915548806706291e-08, -0.00031488962865622305]</span>
</code></pre>
<p>Finally, we plot what we got (the blue line is the real minimum while the red one is our result):</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> matplotlib <span class="hljs-keyword">import</span> pyplot <span class="hljs-keyword">as</span> plt

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fun</span>(<span class="hljs-params">x</span>):</span>
    <span class="hljs-keyword">return</span> x**<span class="hljs-number">2</span> + np.sin(x**<span class="hljs-number">4</span>)

x = np.linspace(<span class="hljs-number">-3</span>, <span class="hljs-number">3</span>, <span class="hljs-number">1000</span>)
y = fun(x)

plt.plot(x, y)

plt.axvline(x = <span class="hljs-number">0</span>, color = <span class="hljs-string">"blue"</span>, label = <span class="hljs-string">"real minimum"</span>)
plt.axvline(x = <span class="hljs-number">9.915548806706291e-08</span>, color = <span class="hljs-string">"red"</span>, label = <span class="hljs-string">"approximate minimum"</span>)
plt.legend(bbox_to_anchor = (<span class="hljs-number">1.0</span>, <span class="hljs-number">1</span>), loc = <span class="hljs-string">"upper left"</span>)

plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676015990766/bd4504c6-d319-49c5-9d69-2a3c3e991205.png" alt class="image--center mx-auto" /></p>
<p>In the picture, the approximate minimum overlaps the real minimum (they are too close to tell apart) and only the approximate minimum is visible.</p>
<h3 id="heading-beyond-2d">Beyond 2D</h3>
<p>Of course, the algorithm also works in more than one dimension, but the function needs some adjustment. In particular, we have to define a domain for \(y\):</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">simulated_annealing_3d</span>(<span class="hljs-params">f, domain_x, domain_y, step_size, start_temp, max_iter = <span class="hljs-number">1000</span></span>):</span>

    x_0 = rd.uniform(domain_x[<span class="hljs-number">0</span>], domain_x[<span class="hljs-number">1</span>])
    y_0 = rd.uniform(domain_y[<span class="hljs-number">0</span>], domain_y[<span class="hljs-number">1</span>])
    z_0 = f(x_0, y_0)
    x_curr, y_curr, z_curr = x_0, y_0, z_0
    x_best, y_best, z_best = x_0, y_0, z_0

    <span class="hljs-keyword">for</span> n <span class="hljs-keyword">in</span> range(max_iter):
        x_i = x_curr + step_size * rd.uniform(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
        y_i = y_curr + step_size * rd.uniform(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
        z_i = f(x_i, y_i)

        <span class="hljs-keyword">if</span> z_i &lt; z_best:
            x_best, y_best, z_best = x_i, y_i, z_i

        diff = z_i - z_curr

        temp = start_temp / (n + <span class="hljs-number">1</span>)       
        metropolis = math.exp(-diff / temp)

        <span class="hljs-keyword">if</span> diff &lt;= <span class="hljs-number">0</span> <span class="hljs-keyword">or</span> rd.random() &lt; metropolis:
            x_curr, y_curr, z_curr = x_i, y_i, z_i

    <span class="hljs-keyword">return</span> [z_best, y_best, x_best]
</code></pre>
<p>Let's test the function:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fun_3d</span>(<span class="hljs-params">x, y</span>):</span>
    <span class="hljs-keyword">return</span> (x-y)**<span class="hljs-number">2</span> + (x+y)**<span class="hljs-number">2</span>

simulated_annealing_3d(f = fun_3d, domain_x = [<span class="hljs-number">-5</span>, <span class="hljs-number">5</span>], domain_y = [<span class="hljs-number">-5</span>, <span class="hljs-number">5</span>], step_size = <span class="hljs-number">1</span>, start_temp = <span class="hljs-number">1000</span>, max_iter = <span class="hljs-number">10000</span>)

<span class="hljs-comment">#&gt; [0.0007833147844967029, 0.018319454959260906, 0.007486986192318135]</span>
</code></pre>
<p>If we plot the result:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> matplotlib <span class="hljs-keyword">import</span> pyplot <span class="hljs-keyword">as</span> plt

x = np.linspace(<span class="hljs-number">-.1</span>, <span class="hljs-number">.1</span>, <span class="hljs-number">20</span>)
y = np.linspace(<span class="hljs-number">-.1</span>, <span class="hljs-number">.1</span>, <span class="hljs-number">20</span>)
X, Y = np.meshgrid(x, y)
Z = fun_3d(X,Y)

a = np.repeat(<span class="hljs-number">0</span>, <span class="hljs-number">50</span>)
b = np.repeat(<span class="hljs-number">0</span>, <span class="hljs-number">50</span>)
c = np.arange(<span class="hljs-number">0</span>, <span class="hljs-number">.05</span>, <span class="hljs-number">.001</span>)

a_ = np.repeat(0.007486986192318135, 50)  # x of the approximated minimum found above
b_ = np.repeat(0.018319454959260906, 50)  # y of the approximated minimum found above
c_ = np.arange(<span class="hljs-number">0</span>, <span class="hljs-number">.05</span>, <span class="hljs-number">.001</span>)

fig = plt.figure(figsize=(<span class="hljs-number">4</span>,<span class="hljs-number">4</span>))
ax = fig.add_subplot(<span class="hljs-number">111</span>, projection=<span class="hljs-string">'3d'</span>)
ax.plot_wireframe(X, Y, Z, color = <span class="hljs-string">"red"</span>, linewidth = <span class="hljs-number">.3</span>)
ax.plot(a, b, c, color = <span class="hljs-string">"blue"</span>, label = <span class="hljs-string">"real minimum"</span>)
ax.plot(a_, b_, c_, color = <span class="hljs-string">"green"</span>, label = <span class="hljs-string">"approximated minimum"</span>)


ax.set_xlabel(<span class="hljs-string">"x"</span>)
ax.set_ylabel(<span class="hljs-string">"y"</span>)
ax.set_zlabel(<span class="hljs-string">"z"</span>)
plt.legend(bbox_to_anchor = (<span class="hljs-number">1.0</span>, <span class="hljs-number">1</span>), loc = <span class="hljs-string">"upper left"</span>) 

plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676034097906/c5af8256-e82c-4c6a-9126-5498ba3fbbc7.png" alt class="image--center mx-auto" /></p>
<p>Zooming in, we can appreciate the approximation error.</p>
<hr />
<p>And that's it for this article.</p>
<p>Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here</strong></a>.</p>
]]></content:encoded></item><item><title><![CDATA[Efficiently computing eigenvalues and eigenvectors in Python]]></title><description><![CDATA[Let \(M\) be an \(n \times n\) matrix. A scalar \(\lambda \) is an eigenvalue of \(M\) if there is a non-zero vector \(x\) (called eigenvector) s.t.:
$$M x = \lambda x$$Eigenvalues and eigenvectors are crucial in many fields of science. For example, ...]]></description><link>https://amm.zanotp.com/eigen-py</link><guid isPermaLink="true">https://amm.zanotp.com/eigen-py</guid><category><![CDATA[Python]]></category><category><![CDATA[linear algebra ]]></category><category><![CDATA[eigenvalues]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Mon, 20 Feb 2023 11:00:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/xAPS28sng4w/upload/f9a9e9bb6648195ebc538d9c27c1779f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Let \(M\) be an \(n \times n\) matrix. A scalar \(\lambda \) is an <em>eigenvalue</em> of \(M\) if there is a non-zero vector \(x\) (called <em>eigenvector</em>) s.t.:</p>
<p>$$M x = \lambda x$$</p><p>Eigenvalues and eigenvectors are crucial in many fields of science. For example, consider a discrete-time and discrete states Markov chain, whose <em>transition matrix</em> \(M\) is defined as follows:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676541307648/742b137d-3abf-402b-a139-242ad1ddf0da.png" alt class="image--center mx-auto" /></p>
<p>Let the <em>initial state vector</em> \(x_1\) be:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676456064591/4291e494-8276-49b0-9964-a4057b7d56f3.png" alt class="image--center mx-auto" /></p>
<p>We know that from \(M\) and \(x\_1\) we can compute all the successive states:</p>
<p>$$x\_2 = M x\_1$$</p><p>$$x\_3 = M x\_2$$</p><p>and in general</p>
<p>$$x\_k = M x\_{k-1}$$</p><p>We may want to find a vector \(x\) s.t.</p>
<p>$$Mx = x$$</p><p>Vectors with this property are known as <em>steady-state vectors</em>. It can be demonstrated that finding <em>steady-state vectors</em> amounts to finding the eigenvectors \(x\) with eigenvalue 1.</p>
<p>For example, the steady-state vector for the matrix \(M\) is:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676456085121/30e2a302-cb30-484d-b751-086abf0bee34.png" alt class="image--center mx-auto" /></p>
<p>and one can easily show that</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676541411713/3d24f86c-d90a-4b2a-bd45-7ff6b06a86c1.png" alt class="image--center mx-auto" /></p>
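<p>As a quick illustration (using an assumed \(2 \times 2\) column-stochastic matrix, not necessarily the one pictured above), the steady-state vector can be recovered in numpy by selecting the eigenvector associated with the eigenvalue 1:</p>
<pre><code class="lang-python">import numpy as np

# assumed example transition matrix (each column sums to 1)
M = np.array([[0.7, 0.2],
              [0.3, 0.8]])

# eigendecomposition: the columns of vecs are the eigenvectors
vals, vecs = np.linalg.eig(M)

# pick the eigenvector whose eigenvalue is (numerically) closest to 1
v = vecs[:, np.argmin(np.abs(vals - 1))]

# rescale the entries to sum to 1, giving a probability vector
steady_state = v / v.sum()
print(steady_state)  # approximately [0.4 0.6]
</code></pre>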
<p>Finding eigenvalues and eigenvectors is not always easy to do by hand, and there are some algorithms to compute them. Unfortunately, this calculation may be expensive, especially with large matrices, and the result may be inaccurate due to approximations.</p>
<p>However, some algorithms perform better than others, and I want to discuss some of them in this article.</p>
<h2 id="heading-solving-characteristic-equation">Solving characteristic equation</h2>
<p>We can rewrite \(M x = \lambda x\) as</p>
<p>$$M x-\lambda x = 0$$</p><p>$$( M-\lambda I)x = 0$$</p><p>This system has a non-trivial solution (i.e. \(x \neq 0\)) only if \(det(M-\lambda I) = 0\), which is known as the <em>characteristic equation</em>.</p>
<p>Expanding \(det(M-\lambda I) =0\) we obtain a polynomial of degree \(n\), whose roots are the eigenvalues of \(M\). Computing eigenvectors from eigenvalues is trivial: for each eigenvalue \(\lambda\), we just need to find the null space of the matrix \(M-\lambda I\).</p>
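<p>For illustration, this route can be sketched in numpy (a didactic sketch only: <code>np.poly</code> is itself built on an eigenvalue computation under the hood):</p>
<pre><code class="lang-python">import numpy as np

M = np.array([[1.0, 3.0],
              [2.0, 1.0]])

# coefficients of the characteristic polynomial det(M - lambda*I), highest degree first
coeffs = np.poly(M)

# the roots of the characteristic polynomial are the eigenvalues
print(np.roots(coeffs))  # approximately 3.449 and -1.449
</code></pre>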
<p>This is how we compute eigenvalues and eigenvectors by hand, but following this approach on a computer leads to some problems:</p>
<ul>
<li><p>it depends on the computation of the determinant, which is a time-consuming process (due to the symbolic nature of the computation);</p>
</li>
<li><p>there is no general formula for solving polynomial equations of degree higher than 4; even though root-finding techniques exist, like Newton's method, it's tough to find all the roots reliably.</p>
</li>
</ul>
<p>Therefore we need a different approach.</p>
<h2 id="heading-iterative-methods">Iterative methods</h2>
<p>Unfortunately, there is no simple algorithm to directly compute eigenvalues and eigenvectors for general matrices (there are special cases of matrices where it's possible, but I won't cover them in this article).</p>
<p>However, there are iterative algorithms that produce sequences that <em>converge</em> to eigenvectors or eigenvalues. There are several variations of these methods, I'll just cover two of them: the <em>power method</em> and the <em>QR algorithm</em>.</p>
<h3 id="heading-the-power-method">The power method</h3>
<p>This method applies to matrices that have a <em>dominant eigenvalue</em> \(\lambda\_d\) <em>(i.e. an eigenvalue that is larger in absolute value than the other eigenvalues).</em></p>
<p>Let \(M\) be an \(n \times n\) matrix, the power method approximates a dominant eigenvector in the following steps:</p>
<p>$$x\_1 = Mx\_0$$</p><p>$$x\_2 = Mx\_1$$</p><p>$$x\_k = Mx\_{k-1}$$</p><p>And the more steps we take (i.e. the bigger \(k\) is), the more accurate our approximation will be. This is expressed in the following formula</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676561958168/a8407e60-0fd3-438f-9167-7915f9615a89.png" alt class="image--center mx-auto" /></p>
<p>Once we have an approximation of the dominant eigenvector \(x\_d\), we find the corresponding dominant eigenvalue \(\lambda\_d\) with the Rayleigh quotient</p>
<p>$$\frac{(Mx)\cdot x}{x \cdot x} = \frac{(\lambda\_d x)\cdot x}{x \cdot x} = \frac{\lambda\_d (x \cdot x)}{x \cdot x} = \lambda\_d$$</p><p>Once we have \(\lambda\_d\), we use the observation that if \(\lambda\) is an eigenvalue of \(M\), then \(\lambda - \beta\) is an eigenvalue of \(M-\beta I\) for any scalar \(\beta\), while the eigenvectors stay the same. We can then apply the power method to \(M - \lambda\_d I\) to compute a second eigenvalue. Repeating this process allows us to compute all of the eigenvalues.</p>
<p>In Python this is:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">power_method</span>(<span class="hljs-params">M, n_iter = <span class="hljs-number">100</span></span>):</span>
    n = M.shape[<span class="hljs-number">0</span>]
    x_d = np.repeat(<span class="hljs-number">.5</span>, n)
    lambda_d = n

    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n_iter):
        x_0 = x_d
        x_d = np.matmul(M, x_0)
    lambda_d = np.matmul(np.matmul(M, x_d), x_d) / np.matmul(x_d, x_d)

    h = np.zeros((n, n), int)
    np.fill_diagonal(h, lambda_d)
    N = M - h 
    x_1 = np.array([<span class="hljs-number">1</span>, <span class="hljs-number">0</span>])
    lambda_1 = n

    <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> range(n_iter):
        x_0 = x_1
        x_1 = np.matmul(N, x_0)
    lambda_1 = np.matmul(np.matmul(M, x_1), x_1) / np.matmul(x_1, x_1)


    <span class="hljs-keyword">return</span> [[x_d, lambda_d], [x_1, lambda_1]]
</code></pre>
<p>The function above works only for \(2 \times 2\) matrices, but it can easily be extended to \(n \times n\) matrices. We now test the function:</p>
<pre><code class="lang-python">Matr = np.array([[<span class="hljs-number">1</span>, <span class="hljs-number">3</span>], [<span class="hljs-number">2</span>, <span class="hljs-number">1</span>]])

power_method(Matr)
<span class="hljs-comment">#&gt; [[array([1, 0.81649658]), 3.449489742783178],</span>
<span class="hljs-comment">#&gt; [array([-1.22474487, 1]), -1.449489742783178]]</span>
</code></pre>
<p>We can even prove that those values represent a good approximation by checking the equation</p>
<p>$$Mx=\lambda x$$</p><p>Since this is an approximation, the <code>==</code> operator is not suited, so we define the <code>is_close</code> function instead.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">is_close</span>(<span class="hljs-params">x, y</span>):</span>
    <span class="hljs-keyword">if</span> all(abs(x-y) &lt; <span class="hljs-number">1e-5</span>):
        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

Matr = np.array([[<span class="hljs-number">.7</span>, <span class="hljs-number">.2</span>], [<span class="hljs-number">.3</span>, <span class="hljs-number">.8</span>]])

sol = power_method(Matr)
lambda_a = sol[<span class="hljs-number">0</span>][<span class="hljs-number">1</span>]
lambda_b = sol[<span class="hljs-number">1</span>][<span class="hljs-number">1</span>]

x_a = sol[<span class="hljs-number">0</span>][<span class="hljs-number">0</span>]
x_b = sol[<span class="hljs-number">1</span>][<span class="hljs-number">0</span>]

print(is_close(np.matmul(Matr, x_a), lambda_a * x_a))
<span class="hljs-comment">#&gt; True</span>

print(is_close(np.matmul(Matr, x_b), lambda_b * x_b))
<span class="hljs-comment">#&gt; True</span>
</code></pre>
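<p>(NumPy ships an equivalent helper, <code>np.allclose</code>, which compares arrays using relative and absolute tolerances.)</p>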
<p>Above we defined the algorithm as follows</p>
<p>$$x\_k = Mx\_{k-1}$$</p><p>We can notice that if</p>
<p>$$x\_{k-1} = Mx\_{k-2}$$</p><p>then we can substitute</p>
<p>$$x\_k = M^2 x\_{k-2}$$</p><p>By induction, we can prove that</p>
<p>$$x\_k = M^k x\_0$$</p><p>We now use this formula to update the Python function above. The new function is the following:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">power_method_2</span>(<span class="hljs-params">M, n_iter = <span class="hljs-number">100</span></span>):</span>
    n = M.shape[<span class="hljs-number">0</span>]           
    x_d = np.array([<span class="hljs-number">1</span>, <span class="hljs-number">0</span>])

    M_k = np.linalg.matrix_power(M, n_iter) 
    M_k = M_k / np.max(M_k)
    x_d = np.matmul(M_k, x_d)
    x_d = x_d / np.max(x_d)     

    lambda_d = np.matmul(np.matmul(M, x_d), x_d) / np.matmul(x_d, x_d)

    D = np.zeros((n, n), float)
    np.fill_diagonal(D, lambda_d)
    N = M - D
    x_nd = np.array([<span class="hljs-number">1</span>,<span class="hljs-number">0</span>])

    N_k = np.linalg.matrix_power(N, n_iter) 
    N_k= N_k / np.max(N_k)
    x_nd = np.matmul(N_k, x_nd)
    x_nd = x_nd/np.max(x_nd)  

    lambda_nd = np.matmul(np.matmul(N, x_nd), x_nd) / np.matmul(x_nd, x_nd)
    lambda_nd = lambda_nd + lambda_d 

    <span class="hljs-keyword">return</span> [[x_d, lambda_d], [x_nd, lambda_nd]]
</code></pre>
<p>Again we test the function:</p>
<pre><code class="lang-python">Matr = np.array([[<span class="hljs-number">.7</span>, <span class="hljs-number">.2</span>], [<span class="hljs-number">.3</span>, <span class="hljs-number">.8</span>]])

sol_2 = power_method_2(Matr)
lambda_a = sol_2[<span class="hljs-number">0</span>][<span class="hljs-number">1</span>]
lambda_b = sol_2[<span class="hljs-number">1</span>][<span class="hljs-number">1</span>]

x_a = sol_2[<span class="hljs-number">0</span>][<span class="hljs-number">0</span>]
x_b = sol_2[<span class="hljs-number">1</span>][<span class="hljs-number">0</span>]

print(is_close(np.matmul(Matr, x_a), lambda_a * x_a))
<span class="hljs-comment">#&gt; True</span>

print(is_close(np.matmul(Matr, x_b), lambda_b * x_b))
<span class="hljs-comment">#&gt; True</span>
</code></pre>
<p>Now that we are sure both functions work correctly, we can compare their performance.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> timeit

%timeit power_method(Matr)
<span class="hljs-comment">#&gt; 558 µs ± 32.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)</span>

%timeit power_method_2(Matr)
<span class="hljs-comment">#&gt; 144 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)</span>
</code></pre>
<p>And we have a winner: the second function is roughly four times faster than the first one. This makes sense: <code>np.linalg.matrix_power</code> computes \(M^k\) by repeated squaring, so the work happens in a handful of optimized matrix products instead of \(k\) matrix-vector products inside a Python loop.</p>
<h3 id="heading-the-qr-algorithm">The QR algorithm</h3>
<p>One of the best methods for approximating the eigenvalues and the eigenvectors of a matrix applies the <em>QR factorization</em> and for this reason is known as the <em>QR algorithm</em>.</p>
<p>Let \(M\) be an \(n\times n\) matrix; first of all, we need to factor it as</p>
<p>$$M = Q\_0R\_0$$</p><p>then we set</p>
<p>$$M\_1 = R\_0Q\_0$$</p><p>We then factor \(M\_1 = Q\_1R\_1\) and define \(M\_2 = R\_1Q\_1\) and so on.</p>
<p>It can be proven that \(M\) is similar to \(M\_1, M\_2, \dots, M\_k\), which means \(M\) and \(M\_1, M\_2, \dots, M\_k\) have the same eigenvalues.</p>
<p>It can also be shown that, under suitable conditions, the matrices \(M\_k\) converge to a triangular matrix \(T\) whose diagonal elements are the eigenvalues of \(M\).</p>
<p>In Python this is:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">QR_argo</span>(<span class="hljs-params">M, n_iter = <span class="hljs-number">100</span></span>):</span>
    n = M.shape[<span class="hljs-number">1</span>]
    Q_k = np.linalg.qr(M)[<span class="hljs-number">0</span>]
    R_k = np.linalg.qr(M)[<span class="hljs-number">1</span>]
    e_values = []

    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n_iter):
        M_k = np.matmul(R_k, Q_k)
        Q_k = np.linalg.qr(M_k)[<span class="hljs-number">0</span>]
        R_k = np.linalg.qr(M_k)[<span class="hljs-number">1</span>]

    <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> range(M_k.shape[<span class="hljs-number">1</span>]):
        e_values.append(M_k[j, j])

    <span class="hljs-keyword">return</span> e_values
</code></pre>
<p>We can now test the function and compare it to the power method.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">is_close</span>(<span class="hljs-params">x, y</span>):</span>
    <span class="hljs-keyword">if</span> abs(x-y) &lt; <span class="hljs-number">1e-5</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

Matr = np.array([[<span class="hljs-number">1</span>, <span class="hljs-number">3</span>], [<span class="hljs-number">2</span>, <span class="hljs-number">1</span>]])

pow_lambda_a = power_method(Matr)[<span class="hljs-number">0</span>][<span class="hljs-number">1</span>]
pow_lambda_b = power_method(Matr)[<span class="hljs-number">1</span>][<span class="hljs-number">1</span>]
QR_lambda_a = QR_argo(Matr)[<span class="hljs-number">0</span>]
QR_lambda_b = QR_argo(Matr)[<span class="hljs-number">1</span>]

is_close(QR_lambda_a, pow_lambda_a)
<span class="hljs-comment">#&gt; True</span>

is_close(QR_lambda_b, pow_lambda_b)
<span class="hljs-comment">#&gt; True</span>
</code></pre>
<p>Once we have the eigenvalues \(\lambda\_i\), computing the eigenvectors is easy: they are the <em>non-trivial</em> solutions of</p>
<p>$$(M-\lambda\_i I) x=0$$</p>
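<p>In code, one way to do this (a minimal sketch; the helper name is mine) is to take the right-singular vector associated with the smallest singular value of \(M-\lambda\_i I\), which spans its numerical null space:</p>
<pre><code class="lang-python">def eigvec_from_eigval(M, lam):
    # the eigenvector spans the null space of M - lam*I; numerically, the
    # right-singular vector of the smallest singular value approximates it
    A = M - lam * np.eye(M.shape[0])
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

M = np.array([[1., 3.], [2., 1.]])
v = eigvec_from_eigval(M, 3.449489742783178)
print(is_close(np.matmul(M, v)[0] / v[0], 3.449489742783178))
#&gt; True
</code></pre>
<hr />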
<p>And that's it for this article.</p>
<p>Thanks for reading.</p>
<p>For any questions or suggestions related to what I covered in this article, please add them as a comment. In case of more specific inquiries, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here</strong></a>.</p>
]]></content:encoded></item><item><title><![CDATA[An introduction to PRNGs with Python and R]]></title><description><![CDATA[Life's most important questions are, for the most part, nothing but probability problems.
Pierre-Simon de Laplace

Introduction
Imagine this scenario: you and your brother want to go to the cinema. Two movies are played: Interstellar (the one you wan...]]></description><link>https://amm.zanotp.com/an-introduction-to-prngs-with-python-and-r</link><guid isPermaLink="true">https://amm.zanotp.com/an-introduction-to-prngs-with-python-and-r</guid><category><![CDATA[statistics]]></category><category><![CDATA[Cryptography]]></category><category><![CDATA[randomness]]></category><category><![CDATA[random numbers]]></category><category><![CDATA[prngs]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sun, 29 Jan 2023 23:00:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/aaSTQ-wY5DQ/upload/2a831240ef9f9415c1cb90dc01525477.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>Life's most important questions are, for the most part, nothing but probability problems.</p>
<p>Pierre-Simon de Laplace</p>
</blockquote>
<h1 id="heading-introduction">Introduction</h1>
<p>Imagine this scenario: you and your brother want to go to the cinema. Two movies are showing: Interstellar (the one you want to see) and A Clockwork Orange (the one your brother wants to see).</p>
<p>The classic solution to this problem is flipping a coin, but since we are not unimaginative people (or we don't have a coin) we may want to find a more elegant solution.</p>
<p>Thus let's write a program in Python and R that decides what to see. The program generates a number between 0 and 1: if this number is below 0.5 we watch Interstellar, otherwise A Clockwork Orange is chosen.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> random <span class="hljs-keyword">as</span> rd

x = rd.uniform(<span class="hljs-number">0</span>, <span class="hljs-number">1</span>)

<span class="hljs-keyword">if</span> x &lt; <span class="hljs-number">.5</span>:
    print(<span class="hljs-string">"Interstellar"</span>)
<span class="hljs-keyword">else</span>:
    print(<span class="hljs-string">"A Clockwork Orange"</span>)
</code></pre>
<p>We now do the same in R:</p>
<pre><code class="lang-r">x &lt;- runif(<span class="hljs-number">1</span>)

ifelse(x &lt; <span class="hljs-number">.5</span>, <span class="hljs-string">"Interstellar"</span>, <span class="hljs-string">"A Clockwork Orange"</span>)
</code></pre>
<p>Fair enough, but there is something paradoxical in the previous examples: a computer, a perfectly <em>deterministic machine</em>, is creating something <em>randomly</em>.</p>
<p>In this article, I want to introduce you to <strong>pseudorandom number generators</strong> and their application.</p>
<h2 id="heading-determinism-versus-randomness">Determinism versus randomness</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1674834068845/a7d1ce92-4841-42cc-9cc3-18101b50b441.jpeg" alt class="image--center mx-auto" /></p>
<p>Above I wrote "computer, a perfectly <em>deterministic machine</em>", but what does it mean to be deterministic?</p>
<p>In brief, computers are deterministic because they follow a set of instructions, or a program, in a predictable manner, i.e. given some inputs they always return the same output. The paradox lies in the fact that in the above example, <code>x = rd.uniform(0, 1)</code> and <code>x &lt;- runif(1)</code> return a different value every time the line is executed.</p>
<p>Are <code>x = rd.uniform(0, 1)</code> and <code>x &lt;- runif(1)</code> exceptions to the deterministic property of computers?</p>
<p>The answer is no, and in a minute I'll explain the reasons behind that.</p>
<h2 id="heading-what-is-randomness">What is randomness</h2>
<p>Before diving into PRNGs we need to define <strong>randomness</strong>. We usually call random a sequence of numbers with the following traits:</p>
<ul>
<li><p><strong>lack of pattern</strong>: a random sequence should not have any discernible structure;</p>
</li>
<li><p><strong>independence</strong>: the numbers in a random sequence should not be affected by one another;</p>
</li>
<li><p><strong>unpredictability</strong>: a random sequence of numbers should not be predictable or reconstructible.</p>
</li>
</ul>
<p>It's important to notice that randomness is a complex concept and it's hard to quantify precisely. Therefore it's common to use statistical tests to evaluate the randomness of a sequence of numbers, but this is beyond the scope of this article.</p>
<p>Random number generators are mathematical algorithms or mechanical devices that produce a sequence that follows the above properties.</p>
<p>As you may suppose, there are two types of random number generators:</p>
<ul>
<li><p><strong>true random number generators</strong> (TRNGs from now on)</p>
</li>
<li><p><strong>pseudorandom number generators</strong> (PRNGs from now on)</p>
</li>
</ul>
<p>In this article, I'll just cover PRNGs but be aware that TRNGs exist and have important applications in many fields such as gaming, gambling and cryptography.</p>
<h1 id="heading-pseudorandom-number-generators-prngs">Pseudorandom number generators (PRNGs)</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1674834995478/6742e995-0396-4aa0-827b-2c3363249970.jpeg" alt class="image--center mx-auto" /></p>
<p>As the name suggests, pseudorandom number generators are a type of software used to generate a sequence of numbers that <em>mimic</em> the properties of truly random numbers. The algorithm takes an initial input (the <strong>seed</strong>) from which it produces the sequence. The <strong>seed</strong> is what <em>determines</em> the sequence of numbers: for example, if we set the seed to <code>1234</code>, <code>x</code> remains the same no matter how many times we run the following lines of code. In Python this is:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> random <span class="hljs-keyword">as</span> rd

rd.seed(<span class="hljs-number">1234</span>)

x = rd.uniform(<span class="hljs-number">0</span>, <span class="hljs-number">1</span>)

print(x)
</code></pre>
<p>The same is true for the following R code:</p>
<pre><code class="lang-r">set.seed(<span class="hljs-number">1234</span>)

x &lt;- runif(<span class="hljs-number">1</span>)

print(x)
</code></pre>
<h2 id="heading-properties">Properties</h2>
<p>The quality of a PRNG is judged by its properties. The most important ones are:</p>
<ul>
<li><p><strong>periodicity</strong>: PRNGs will generate a sequence of numbers that repeats itself after a certain number of iterations, known as the <em>period</em>. A PRNG with a long period is more desirable than one with a shorter period;</p>
</li>
<li><p><strong>uniformity</strong>: PRNGs generate numbers that are distributed uniformly across the range of possible values (a quick empirical check is sketched after this list);</p>
</li>
<li><p><strong>independence</strong>: the numbers generated by a PRNG should be independent of one another;</p>
</li>
<li><p><strong>randomness</strong>: the numbers generated by a PRNG should not have any discernible patterns;</p>
</li>
<li><p><strong>seed-ability</strong>: PRNGs should accept a seed so that a sequence can be reproduced, while different seeds produce different sequences.</p>
</li>
</ul>
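<p>As a concrete example, here is a crude empirical check of the uniformity property (just a sketch built on Python's generator; a serious evaluation would use proper statistical tests):</p>
<pre><code class="lang-python">import random as rd

rd.seed(1234)
draws = [rd.uniform(0, 1) for _ in range(100_000)]

# count the share of draws falling in each of ten equal-width bins;
# a uniform generator should put roughly 10% in each bin
counts = [0] * 10
for x in draws:
    counts[min(int(x * 10), 9)] += 1

print([round(c / len(draws), 3) for c in counts])
</code></pre>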
<h2 id="heading-two-prngs-algorithms">Two PRNGs algorithms</h2>
<p>In this section, I want to present two well-known PRNG algorithms to show in practice what PRNGs look like: the <strong>middle square algorithm</strong> and the <strong>linear congruential generators</strong>.</p>
<h3 id="heading-middle-square-algorithm">Middle square algorithm</h3>
<p>Proposed by von Neumann, the middle square algorithm takes a <strong>seed</strong>, squares it and fetches the middle digits of the result as the random number. Let's discuss an example and then implement it in Python and R.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>seed</td><td>square</td><td>random number</td></tr>
</thead>
<tbody>
<tr>
<td>12</td><td>0<strong>14</strong>4</td><td>14</td></tr>
<tr>
<td>33</td><td>1<strong>08</strong>9</td><td>08</td></tr>
<tr>
<td>24</td><td>0<strong>57</strong>6</td><td>57</td></tr>
<tr>
<td>66</td><td>4<strong>35</strong>6</td><td>35</td></tr>
</tbody>
</table>
</div><p>Usually, the algorithm is repeated more than once, i.e. the random number becomes the new seed, which is then squared, and its middle digits become the next random number, and so on.</p>
<p>Here is an implementation in Python:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">middle_square_algo</span>(<span class="hljs-params">seed</span>):</span>

    <span class="hljs-comment"># first of all we square the seed</span>
    square = str(np.square(seed))

    <span class="hljs-comment"># then we need to take the mid-term, we have two possibilities</span>
    <span class="hljs-comment"># the square may have an even number of digits:</span>
    <span class="hljs-keyword">if</span> len(square) % <span class="hljs-number">2</span> == <span class="hljs-number">0</span>:
        half_str = int(len(square) / <span class="hljs-number">2</span>)

    <span class="hljs-comment"># the number has an odd number of digits:</span>
    <span class="hljs-keyword">else</span>:
        half_str = int(len(square) / <span class="hljs-number">2</span> - <span class="hljs-number">.5</span>)


    mid = square[half_str - <span class="hljs-number">1</span> : half_str + <span class="hljs-number">1</span>]
    <span class="hljs-keyword">return</span> int(mid)

<span class="hljs-comment"># finally the testing:</span>

print(middle_square_algo(<span class="hljs-number">12</span>))

<span class="hljs-comment">#&gt; 14</span>
</code></pre>
<p>And here is the R code:</p>
<pre><code class="lang-r">middle_square_algo &lt;- <span class="hljs-keyword">function</span>(seed){

  <span class="hljs-comment"># first of all we square the seed</span>
  square &lt;- seed^<span class="hljs-number">2</span>

  <span class="hljs-comment"># we now need to get the number of digits of square</span>
  len &lt;- nchar(square)

  <span class="hljs-comment"># we have two possible scenarios</span>
  <span class="hljs-comment"># len is even:</span>
  <span class="hljs-keyword">if</span>(len %% <span class="hljs-number">2</span> == <span class="hljs-number">0</span>){

    half_square &lt;- len / <span class="hljs-number">2</span>

  <span class="hljs-comment"># len is odd:  </span>
  } <span class="hljs-keyword">else</span>{

    half_square &lt;- len / 2 - .5

  }
  square &lt;- as.character(square)
  mid &lt;- substr(square, half_square, half_square + <span class="hljs-number">1</span>)

  <span class="hljs-keyword">return</span>(as.double(mid))
}

<span class="hljs-comment"># finally the testing:</span>
print(middle_square_algo(<span class="hljs-number">33</span>))

<span class="hljs-comment">#&gt; 8</span>
</code></pre>
<p>Assuming now that we want to iterate the algorithm more than once, the Python code is:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">middle_square_algo_deep</span>(<span class="hljs-params">seed, deep</span>):</span>

    <span class="hljs-comment"># we just need to repeat what we did before but more than one time</span>

    <span class="hljs-keyword">for</span> rep <span class="hljs-keyword">in</span> range(deep):
        seed = int(middle_square_algo(seed))
    <span class="hljs-keyword">return</span> seed

<span class="hljs-comment"># finally the testing:      </span>
middle_square_algo_deep(<span class="hljs-number">33</span>, <span class="hljs-number">3</span>)

<span class="hljs-comment">#&gt; 9</span>
</code></pre>
<p>And similarly, the R code is:</p>
<pre><code class="lang-r">middle_square_algo_deep &lt;- <span class="hljs-keyword">function</span>(seed, deep=<span class="hljs-number">2</span>){

  <span class="hljs-comment"># we just need to repeat what we did before but more than one time</span>

  <span class="hljs-keyword">for</span>( rep <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:deep){
    seed &lt;- middle_square_algo(seed)
  }

  <span class="hljs-keyword">return</span>(seed)
}

<span class="hljs-comment"># finally the testing:</span>
middle_square_algo_deep(<span class="hljs-number">33</span>, <span class="hljs-number">3</span>)

<span class="hljs-comment">#&gt; 9</span>
</code></pre>
<p>The most important weakness of this algorithm is that it needs an appropriate starting seed: some seeds produce sequences with a very short period.</p>
<p>For example, the seed <code>50</code> has the shortest possible period (1), as shown in the following lines of code:</p>
<pre><code class="lang-r">middle_square_algo_deep(<span class="hljs-number">50</span>, <span class="hljs-number">1</span>)
<span class="hljs-comment">#&gt; 50</span>

middle_square_algo_deep(<span class="hljs-number">50</span>, <span class="hljs-number">2</span>)
<span class="hljs-comment">#&gt; 50</span>

middle_square_algo_deep(<span class="hljs-number">50</span>, <span class="hljs-number">3</span>)
<span class="hljs-comment">#&gt; 50</span>

middle_square_algo_deep(<span class="hljs-number">50</span>, <span class="hljs-number">4</span>)
<span class="hljs-comment">#&gt; 50</span>
</code></pre>
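<p>We can package this check into a small helper. The sketch below (the function name is mine) reuses the Python <code>middle_square_algo</code> defined above and walks the orbit of a seed until a value repeats, returning the length of the cycle:</p>
<pre><code class="lang-python">def middle_square_period(seed, max_iter = 1000):
    # follow the orbit of the middle-square map until a state repeats,
    # then return the length of the cycle we fell into
    seen = {}
    value = seed
    for step in range(max_iter):
        if value in seen:
            return step - seen[value]
        seen[value] = step
        value = middle_square_algo(value)
    return None  # no repetition found within max_iter steps

print(middle_square_period(50))
#&gt; 1
</code></pre>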
<h3 id="heading-linear-congruential-generators">Linear congruential generators</h3>
<p>The linear congruential generators (LCGs) are a family of PRNGs and are probably the most used approach to generating pseudorandom numbers. The algorithms are defined by a linear congruential equation like the following:</p>
<p>$$x_{n+1} = ax_n + b \space \space mod(y)$$</p><p>where \(a\), \(b\) and \(y\) are positive integers and we also need a <strong>seed</strong> \(x\_0\).</p>
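<p>To make the recurrence concrete, here is a minimal LCG sketch in Python (the parameters in the test call are toy values chosen only for illustration, not production-quality constants):</p>
<pre><code class="lang-python">def lcg(seed, a, b, y, length):
    # x_{n+1} = (a * x_n + b) mod y
    xs = [seed]
    for _ in range(length - 1):
        xs.append((a * xs[-1] + b) % y)
    return xs

print(lcg(7, a=5, b=3, y=16, length=10))
#&gt; [7, 6, 1, 8, 11, 10, 5, 12, 15, 14]
</code></pre>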
<p>Let's now consider (and then implement) a generalization of this recurrence: the <strong>Lagged Fibonacci Generator</strong> (LFG).</p>
<p>$$x\_n = a\_1 x\_{n-i} + a\_2 x\_{n-j} + b \space \space mod(y)$$</p><p>We just need to provide the LFG with the initial values \(x\_1\) to \(x\_{max(i, j)}\) and it will generate a pseudorandom sequence of numbers from \(x\_{max(i, j)+1}\) onwards.</p>
<p>Let me work through an example to clear your mind. Let the following equation be our LFG:</p>
<p>$$x\_n = x\_{n-3} + x\_{n-5} \space \space mod(10)$$</p><p>and let's say we want to generate a sequence of random numbers between 0 and 9 from the initial seed [4, 2, 9, 5, 5].</p>
<p>The sequence starts from \(x\_6\): since \(max(i, j) = 5\), the recurrence needs the five previous values, so the terms before \(x\_6\) are exactly the seed.</p>
<p>Thus the sequence is:</p>
<p>$$x\_6 = x\_3 + x\_1 \space \space mod(10) = 9 + 4 \space \space mod(10) = 3$$</p><p>$$x\_7 = x\_4 + x\_2 \space \space mod(10) = 5 + 2 \space \space mod(10) = 7$$</p><p>$$x\_8 = x\_5 + x\_3 \space \space mod(10) = 5 + 9 \space \space mod(10) = 4$$</p><p>and so on.</p>
<p>We now implement the LFG in Python and R. In Python the algorithm is something like this:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">lagged_fib_gen</span>(<span class="hljs-params">seed, i, j, mod, length, a_1 = <span class="hljs-number">1</span>, a_2 = <span class="hljs-number">1</span>, c = <span class="hljs-number">0</span></span>):</span>
    l_f = seed

        <span class="hljs-comment"># we suppose that i &lt; j</span>

    <span class="hljs-keyword">for</span> rep <span class="hljs-keyword">in</span> range(max([i, j]) + <span class="hljs-number">1</span>, length + <span class="hljs-number">1</span>):

        x = (a_1 * l_f[rep - i - <span class="hljs-number">1</span>] + a_2 * l_f[rep - j - <span class="hljs-number">1</span>]) % <span class="hljs-number">10</span>
        l_f.append(x)

    <span class="hljs-keyword">return</span> l_f

<span class="hljs-comment"># finally the testing:</span>
lagged_fib_gen([<span class="hljs-number">4</span>, <span class="hljs-number">2</span>, <span class="hljs-number">9</span>, <span class="hljs-number">5</span>, <span class="hljs-number">5</span>], <span class="hljs-number">3</span>, <span class="hljs-number">5</span>, <span class="hljs-number">10</span>, <span class="hljs-number">10</span>)

<span class="hljs-comment">#&gt; [4, 2, 9, 5, 5, 3, 7, 4, 8, 2]</span>
</code></pre>
<p>In R the algorithm is:</p>
<pre><code class="lang-r">lagged_fib_gen &lt;- <span class="hljs-keyword">function</span>(seed, i, j, mod, length, a_1 = <span class="hljs-number">1</span>, a_2 = <span class="hljs-number">1</span>, c = <span class="hljs-number">0</span>){

  l_f &lt;- seed

  <span class="hljs-keyword">for</span>(rep <span class="hljs-keyword">in</span> (max(c(i, j))+<span class="hljs-number">1</span>):length){

    x &lt;- (a_1 * l_f[rep - i] + a_2 * l_f[rep - j]) %% mod
    l_f[rep] &lt;- x
  }
  <span class="hljs-keyword">return</span>(l_f)
}

<span class="hljs-comment"># finally the testing:</span>
lagged_fib_gen(c(<span class="hljs-number">4</span>, <span class="hljs-number">2</span>, <span class="hljs-number">9</span>, <span class="hljs-number">5</span>, <span class="hljs-number">5</span>), <span class="hljs-number">3</span>, <span class="hljs-number">5</span>, <span class="hljs-number">10</span>, <span class="hljs-number">10</span>)

<span class="hljs-comment">#&gt; 4 2 9 5 5 3 7 4 8 2</span>
</code></pre>
<p>As for the <strong>middle square algorithm</strong>, the quality of LCGs depends on the chosen parameters.</p>
<h1 id="heading-applications-of-prngs">Applications of PRNGs</h1>
<p>Now we know what PRNGs are, but what are they used for? Well, they have many applications; some examples include:</p>
<ul>
<li><p>cryptography: random numbers are used to generate encryption keys (the PRNGs used in cryptography are much more complex than the two I showed before);</p>
</li>
<li><p>modelling: many scientific simulations use random numbers to represent uncertainty;</p>
</li>
<li><p>gaming: random numbers are used to make games less predictable and complex (e.g. biomes generation in Minecraft);</p>
</li>
<li><p>randomized algorithms: some algorithms use randomness to solve problems more efficiently (e.g. the famous Randomized Hill Climbing algorithm).</p>
</li>
</ul>
<h1 id="heading-to-go-further">To go further</h1>
<p>As you may imagine, the world of PRNGs is quite vast and complex and has applications in almost every field of science. This article doesn't aim to be exhaustive on the topic and is no more than a gentle introduction to PRNGs. To go further there are many resources online, but <a target="_blank" href="https://seriouscomputerist.atariverse.com/media/pdf/book/Art%20of%20Computer%20Programming%20-%20Volume%202%20(Seminumerical%20Algorithms).pdf">The Art of Computer Programming - Seminumerical Algorithms</a> by D. Knuth and <a target="_blank" href="https://cran.r-project.org/web/packages/randtoolbox/vignettes/fullpres.pdf">this</a> CRAN vignette are great starting points.</p>
<p>Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact">here</a>.</p>
]]></content:encoded></item><item><title><![CDATA[print("hello world")]]></title><description><![CDATA[Hello world, and welcome to this blog. This article is just an introduction to Algomath μse (is pronounced as "muse"), but first let me introduce myself: my name is Pietro Zanotta and currently I am an economics student in Switzerland (to read more s...]]></description><link>https://amm.zanotp.com/hello-world</link><guid isPermaLink="true">https://amm.zanotp.com/hello-world</guid><category><![CDATA[Blogging]]></category><category><![CDATA[print("hello word")]]></category><category><![CDATA[blog]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Fri, 27 Jan 2023 10:47:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/9NpzkH9lb0o/upload/c8400fa0473de5adab52c9f7fb3e679d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello world, and welcome to this blog. This article is just an introduction to Algomath μse (pronounced "muse"), but first let me introduce myself: my name is Pietro Zanotta and I am currently an economics student in Switzerland (to read more see <a target="_blank" href="https://amm.zanotp.com/about">here</a>). My main interests are programming and statistics and this blog is my chance to tell the world what I'm learning in my free time.</p>
<p>Let me take a step back and explain why I decided to start this journey in blogging. In January 2023 I wrote an article about web scraping in R (you can find it <a target="_blank" href="https://statsandr.com/blog/web-scraping-in-r/">here</a>) and I discovered how useful sharing knowledge is for deeply understanding a topic: writing requires not only a thorough comprehension of the subject but also solid summarizing skills. Therefore, I decided to create my own blog and here we are.</p>
<p>Embark on this captivating journey with me, as we explore the boundless horizons of science. Together, we'll uncover the extraordinary in the ordinary and ignite our curiosity to new heights.</p>
]]></content:encoded></item></channel></rss>