<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Algomath μse]]></title><description><![CDATA[I write posts about programming and statistics, sharing what I learn daily.]]></description><link>https://amm.zanotp.com</link><generator>RSS for Node</generator><lastBuildDate>Sun, 12 Apr 2026 05:48:37 GMT</lastBuildDate><atom:link href="https://amm.zanotp.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Quantum Principal Component Analysis and Self-Tomography]]></title><description><![CDATA[High-dimensional data presents significant analytical challenges, for example some algorithms suffer the curse of dimensionality (i.e. as the number of dimensions increases, the volume of the data space grows exponentially, making the computation exp...]]></description><link>https://amm.zanotp.com/qpca</link><guid isPermaLink="true">https://amm.zanotp.com/qpca</guid><category><![CDATA[qpca]]></category><category><![CDATA[Pca]]></category><category><![CDATA[quantum machine learning (QML)]]></category><category><![CDATA[QML]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Fri, 07 Feb 2025 21:43:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ISHD1ovpJ-k/upload/8a0043530ae3f4bcb5db825afea6fc3a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>High-dimensional data presents significant analytical challenges: for example, some algorithms suffer from the curse of dimensionality (i.e. as the number of dimensions increases, the volume of the data space grows exponentially, making the computation expensive or even infeasible), while the presence of many features might result in models becoming overly complex and learning noise instead of true patterns.</p>
<p>One of the best-known and most powerful techniques for addressing high dimensionality is Principal Component Analysis (PCA), a statistical technique used to simplify complex datasets by reducing their dimensionality while preserving the most important information. This blog post discusses PCA, focusing on the selection of the principal components, and then introduces a quantum circuit performing quantum Principal Component Analysis, a quantum algorithm providing an exponential speedup over PCA.</p>
<h2 id="heading-principal-component-analysis">Principal Component Analysis</h2>
<p>Principal Component Analysis (PCA), also known as the Karhunen-Loève transformation, the Hotelling transformation or the method of empirical orthogonal functions, aims to project \(p\)-dimensional vectors onto the so-called principal components, i.e. \(q\)-dimensional vectors, where \(q &lt; p\), while preserving as much of the original variance as possible.</p>
<p>There are several equivalent ways of deriving the principal components mathematically, and the following section shows that finding the projectors maximizing the variance is equivalent to minimizing the mean squared distance between the original vectors and their projections onto the principal components.</p>
<h3 id="heading-mathematics-of-principal-components">Mathematics of principal components</h3>
<p>Let \(X \in C^{n\times p}\) be a centered matrix and let \(\{x_i\}_{i=1}^n\) be \(p\)-dimensional vectors (i.e. the rows of \(X\)). The projection of \(x_i\) onto a line through the origin (for simplicity) spanned by a unit vector \(w\) is:</p>
<p>\[(x_i ^\dagger w) w\]</p><p>It’s relevant to note that the mean of the projections is zero, since the vectors \(x_i\) are centered:</p>
<p>\[\frac 1n \sum_i (x_i ^\dagger w) w=\left(\frac 1n \sum_i x_i\right)^\dagger w\, w = 0\]</p><p>Being a projection, the projected vectors are (in general) different from the original vectors, which means there’s some error. Such error is defined as:</p>
<p>\[\begin{aligned}\left\|{x_i}-\left({w} ^\dagger {x_i}\right) {w}\right\|^2= &amp; \left({x_i}-\left({w} ^\dagger {x_i}\right) {w}\right) ^\dagger\left({x_i}-\left({w} ^\dagger {x_i}\right) {w}\right) \\ = &amp; {x_i} ^\dagger {x_i}-{x_i} ^\dagger\left({w} ^\dagger {x_i}\right) {w} \\ &amp; -\left({w} ^\dagger {x_i}\right) {w} ^\dagger {x_i}+\left({w} ^\dagger {x_i}\right) {w} ^\dagger\left({w} ^\dagger {x_i}\right) {w} \\ = &amp; \left\|{x_i}\right\|^2-2\left({w} ^\dagger {x_i}\right)^2+\left({w} ^\dagger {x_i}\right)^2 {w} ^\dagger {w} \\ = &amp; {x_i} ^\dagger {x_i}-\left({w} ^\dagger {x_i}\right)^2\end{aligned}\]</p><p>Since \(w\) is a unit vector (\(w^\dagger w = 1\)), the mean squared error (MSE) is:</p>
<p>\[\text {MSE}=\frac 1n \sum_{i=1}^n {x_i} ^\dagger {x_i}-\left({w} ^\dagger {x_i}\right)^2\]</p><p>Considering that the first inner product doesn’t involve \(w\) and is therefore a constant, minimizing the MSE is then equivalent to:</p>
<p>\[\text{max}_w \frac 1n\sum_{i=1}^n (w^\dagger x_i)^2\]</p><p>Since the mean of a square is always equal to the square of the mean plus the variance, the function to be maximized is equivalent to:</p>
<p>\[\text{max}_w \frac 1n\sum_{i=1}^n (w^\dagger x_i)^2 = \text{max}_w \left(\frac 1n \sum_{i=1}^n x_i ^\dagger w\right)^2 + \text{var}(w^\dagger x_i)\]</p><p>However, since the mean of the projections is zero (see above), minimizing the residual sum of squares turns out to be equivalent to maximizing the variance of the projections.</p>
<p>This also holds if we project not just onto one vector but onto multiple principal components.</p>
<p>Accordingly, the variance \(\sigma^2\left({{w}}\right)\) is defined (in matrix form) as:</p>
<p>\[\begin{aligned} \sigma^2 \left({{w}}\right) &amp; =\frac{1}{n} \sum_i\left({x_i} ^\dagger {w}\right)^2 \\ &amp; =\frac{1}{n}({X w})^\dagger({X w}) \\ &amp; =\frac{1}{n} {w}^\dagger {X}^\dagger {X w} \\ &amp; ={w}^\dagger \frac{{X}^\dagger {X}}{n} {w} \\ &amp; ={w}^\dagger {V w}\end{aligned}\]</p><p>where \(V\) is the covariance matrix of \(X\).</p>
<p>Therefore the constrained maximization problem is:</p>
<p>\[\text{max}_w \space\sigma^2(w) \space \text{s.t.} \space w^\dagger w =1\]</p><p>Using the Lagrange multiplier \(\gamma\), the objective function becomes:</p>
<p>\[L(\gamma, w) = w^\dagger V w-\gamma(w^\dagger w-1)\]</p><p>The first order conditions are:</p>
<p>\[\begin{align} &amp; \frac {\partial L}{\partial w}\ = 2Vw-2\gamma w\\ &amp; \frac{\partial L}{\partial \gamma}\ = w^\dagger w-1\end{align}\]</p><p>Setting the derivatives to zero at optimum the system becomes:</p>
<p>\[\begin{align} &amp; Vw=\gamma w\\ &amp;w^\dagger w=1\end{align}\]</p><p>and from the top equation it is clear that the \(w\) maximizing the variance are the orthonormal eigenvectors of the covariance matrix associated with the largest \(q\) eigenvalues \(\gamma\).</p>
<p>It’s clear that if the data are approximately \(q\)-dimensional (i.e. \(p-q\) eigenvalues are close to 0), the residual will be small and the \(R^2\) (the fraction of the original variance retained by the projections), computed as:</p>
<p>\[R^2 = \frac{\sum_{i=1}^q \gamma_i}{\sum_{i=1}^p \gamma_i}\]</p><p>will be close to 1.</p>
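<p>To make this concrete, here is a minimal NumPy sketch (on synthetic, approximately 2-dimensional data, an illustrative choice) that computes the principal components as the top eigenvectors of the covariance matrix and evaluates the \(R^2\) defined above:</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(0)

# Synthetic, approximately 2-dimensional data: n = 500 samples in p = 5 dimensions
n, p = 500, 5
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, p)) + 0.05 * rng.normal(size=(n, p))
X = X - X.mean(axis=0)               # center the data

V = X.T @ X / n                      # covariance matrix
gamma, W = np.linalg.eigh(V)         # eigenvalues (ascending) and orthonormal eigenvectors
gamma, W = gamma[::-1], W[:, ::-1]   # sort in descending order

q = 2
projections = X @ W[:, :q]           # coordinates on the first q principal components

r_squared = gamma[:q].sum() / gamma.sum()
print(f"R^2 retained by the first {q} components: {r_squared:.4f}")
</code></pre>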
<h3 id="heading-complexity-analysis-of-pca">Complexity analysis of PCA</h3>
<p>Assuming \(X\in C^{n\times p}\), the cost of PCA is:</p>
<ul>
<li><p>computing \(V\) is \(\mathcal O(n\times p^2)\)</p>
</li>
<li><p>computing the eigenvalues and eigenvectors requires \(\mathcal O(p^3)\)</p>
</li>
</ul>
<p>Hence the overall complexity is \(\mathcal O(n\times p^2 + p^3)\), which is \(\mathcal O(p^3)\) when \(n = \mathcal O(p)\).</p>
<h2 id="heading-quantum-principal-component-analysis">Quantum Principal Component Analysis</h2>
<p>The idea behind Quantum Principal Component Analysis (qPCA) is to use quantum subroutines to perform PCA faster. In particular, Quantum Phase Estimation (QPE) is used to extract information about the eigenvalues and the eigenvectors of a density matrix representing the covariance matrix. The next section introduces QPE, while the following sections discuss qPCA.</p>
<h3 id="heading-quantum-phase-estimation">Quantum Phase Estimation</h3>
<p>Let \(U\) be a unitary operator and let \(\ket{u_k}\) and \(e^{i\lambda_k}\) be the \(k\)-th eigenvector and eigenvalue of \(U\). Consider also a generic state \(\ket \psi\), which can always be expanded as:</p>
<p>\[\ket \psi= \sum_{k=1}^nc_k\ket {u_k}\]</p><p>The goal of the QPE is to perform the following transformation:</p>
<p>\[\text{QPE}:\ket{0}^{\otimes n}\ket{\psi} \rightarrow \sum_{k=1}^nc_k\ket{\lambda_k }\ket{u_k}\]</p><p>where \(\ket {\lambda_k}\) is the quantum state \(\ket {j_1\dots j_n}\) corresponding to the \(n\)-digit binary fraction \(0.j_1\dots j_n\) representing the phase of the eigenvalue, i.e. \(\lambda_k = 2\pi \cdot 0.j_1\dots j_n\).</p>
<p>The circuit of the algorithm is the following:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738960092600/39527e43-e9d1-4870-92da-24ec82088e82.png" alt class="image--center mx-auto" /></p>
<p>First, each of the \(n\) qubits initialized to \(\ket 0\) is subjected to a Hadamard gate and controlled unitary operations acting on \(\ket u\), which perform the following transformation:</p>
<p>\[\ket 0^{\otimes n}\ket u \rightarrow \bigotimes_{k=1}^n \left(\frac{\ket 0 +e^{i2\pi0.j_1\dots j_k}\ket {1}}{\sqrt 2} \right) \otimes\ket u\]</p><p>In other words, the binary fraction representation of the eigenvalue phase is stored in the relative phase of each auxiliary qubit, with the digits shifted by one position from one qubit to the next.</p>
<p>The states of the \(n\) auxiliary qubits have exactly the same form as the output of the quantum Fourier transform; therefore, applying the inverse quantum Fourier transform and measuring the \(n\) ancilla qubits yields \(0.j_1\dots j_n\), from which \(\lambda_k\) is recovered.</p>
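<p>As a concrete illustration, the following PennyLane sketch (a toy example; the unitary and wire layout are illustrative choices) runs QPE on a single-qubit phase gate whose eigenphase is the exact 3-bit binary fraction \(0.011 = 0.375\):</p>
<pre><code class="lang-python">import pennylane as qml
import numpy as np

# Toy unitary: |1&gt; is an eigenvector with eigenvalue e^{2 pi i 0.375},
# and 0.375 = 0.011 is an exact 3-bit binary fraction
U = np.diag([1.0, np.exp(2j * np.pi * 0.375)])

estimation_wires = [0, 1, 2]
dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev)
def qpe():
    qml.PauliX(wires=3)  # prepare the eigenstate |1&gt; on the target wire
    qml.QuantumPhaseEstimation(U, target_wires=[3], estimation_wires=estimation_wires)
    return qml.probs(wires=estimation_wires)

# The most likely outcome, read as a binary fraction, is the estimated phase
phase = np.argmax(qpe()) / 2 ** len(estimation_wires)
print(phase)  # 0.375
</code></pre>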
<h3 id="heading-density-matrix-exponentiation">Density matrix exponentiation</h3>
<p>One can then imagine that, if we’re able to encode the covariance matrix as a quantum gate, we can use QPE to obtain information about its eigenvalues and eigenvectors. That’s indeed possible; however, the defining property of quantum gates is unitarity, i.e. for any quantum gate \(G\):</p>
<p>\[G^\dagger= G^{-1}\]</p><p>This is not generally true for covariance matrices; however, one can obtain a unitary operator from a covariance matrix via exponentiation. Assume the covariance matrix has been encoded in a density matrix \(\rho\) and assume one is presented with \(n\) copies of \(\rho\). The density matrix exponential:</p>
<p>\[e^{-i\rho t}\]</p><p>is unitary.</p>
<p>One method to perform such exponentiation up to \(n\)-th order in \(t\) is to repeat the following:</p>
<p>\[\text{Tr}_1 \left[e^{-iS\Delta t} (\rho \otimes \sigma) e^{iS\Delta t}\right] = \sigma -i\Delta t[\rho, \sigma]+\mathcal O(\Delta t^2)\]</p><p>where \(\sigma\) is any density matrix, \(S\) is the swap operator, \([A, B] = AB-BA\) is the commutator and \(\text{Tr}_1\) is the partial trace over the first system. It’s worth noting that since \(S\) is a sparse matrix, the exponentiation of \(S\) can be computed efficiently. Applying the above formula \(n\) times leads to:</p>
<p>\[e^{-i\rho n\Delta t} \sigma e^{i\rho n\Delta t}\]</p><p>which, coupled with the quantum matrix inversion technique of (<a target="_blank" href="https://arxiv.org/pdf/0811.3171">Harrow, Hassidim, Lloyd, 2009, “Quantum algorithm for linear systems of equations“</a>), allows one to efficiently construct the exponential of \(\rho\).</p>
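<p>The first-order identity above can be checked numerically. The following NumPy/SciPy sketch (a toy classical check, not an efficient quantum implementation) builds the swap operator, conjugates \(\rho \otimes \sigma\) by \(e^{-iS\Delta t}\), takes the partial trace over the first system and compares the result with \(\sigma -i\Delta t[\rho, \sigma]\):</p>
<pre><code class="lang-python">import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

def random_density_matrix(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho)

d = 2
rho, sigma = random_density_matrix(d), random_density_matrix(d)

# Swap operator S on the tensor product of two d-dimensional systems
S = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        S[i * d + j, j * d + i] = 1

dt = 1e-3
joint = expm(-1j * S * dt) @ np.kron(rho, sigma) @ expm(1j * S * dt)

# Partial trace over the first subsystem
out = joint.reshape(d, d, d, d).trace(axis1=0, axis2=2)

expected = sigma - 1j * dt * (rho @ sigma - sigma @ rho)
print(np.max(np.abs(out - expected)))  # O(dt^2)
</code></pre>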
<p>So, assuming a non-sparse positive \(X\) whose trace is 1, constructing:</p>
<p>\[e^{-iXt}\]</p><p>requires factoring \(X\) as:</p>
<p>\[X=A^\dagger A, \qquad A=\sum_i |a_i|\ket{e_i}\bra{\hat a_i}\]</p><p>where \(\hat a_i\) is the row \(a_i\) of \(A\) normalized to 1 and \(\{\ket {e_i}\}\) is an orthonormal basis. Assuming a qRAM (<a target="_blank" href="https://arxiv.org/pdf/0708.1879">Giovannetti, Lloyd, Maccone, 2008, “Quantum random access memory“</a>) that performs the following:</p>
<p>\[\ket i \ket 0 \ket 0 \rightarrow \ket i\ket {\hat a_i} \ket {|a_i|}\]</p><p>one can easily construct the state \(\ket \psi = \sum_i |a_i|\ket {e_i}\ket {\hat a_i}\), whose reduced density matrix, after tracing out the first register, is:</p>
<p>\[\text{Tr}_1\left[\ket\psi\bra\psi\right] = \sum_i |a_i|^2\ket {\hat a_i}\bra{\hat a_i} = X\]</p><p>So by using \(n=\mathcal O(t^2\epsilon^{-1})\) copies of \(X\) one can implement \(e^{-iXt}\) with accuracy \(\epsilon\) in \(\mathcal O(n\log d)\) time.</p>
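<p>A quick NumPy sanity check of this construction (with a random real matrix \(A\), an illustrative choice): tracing out the index register of \(\ket\psi\bra\psi\) indeed returns \(X = A^\dagger A\):</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(2)

# Random A, rescaled so that tr(A^T A) = sum_i |a_i|^2 = 1
A = rng.normal(size=(4, 4))
A = A / np.linalg.norm(A)  # Frobenius norm 1
X = A.T @ A

# |psi&gt; = sum_i |a_i| |e_i&gt;|a_hat_i&gt;; note |a_i| * a_hat_i is just row i of A
psi = A.reshape(-1)

# Partial trace of |psi&gt;&lt;psi| over the index register (first subsystem)
rho_full = np.outer(psi, psi)
reduced = rho_full.reshape(4, 4, 4, 4).trace(axis1=0, axis2=2)

print(np.allclose(reduced, X))  # True
</code></pre>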
<h3 id="heading-obtaining-the-principal-components-using-self-tomography">Obtaining the Principal Components using self-tomography</h3>
<p>Once the exponentiation of the covariance matrix is performed, one can use QPE to find the eigenvectors and eigenvalues of the density matrix using conditional application of:</p>
<p>\[e^{-iXt}\]</p><p>for varying times \(t\), using \(\ket \psi\) as the initial state, resulting in the following state:</p>
<p>\[\sum_ir_i\ket{\chi_i}\bra {\chi_i}\otimes\ket{\tilde r_i}\bra{\tilde r_i}\]</p><p>where \(\ket {\chi_i}\) are the eigenvectors of \(X\) and \(\tilde r_i\) are estimates of the corresponding eigenvalues \(r_i\).</p>
<p>Features of the \(i\)-th principal component are then extracted by measuring the expectation value of an observable \(M\) on the eigenvector \(\ket{\chi_i}\) with eigenvalue \(r_i\):</p>
<p>\[\bra{\chi_i}M \ket{\chi_i}\]</p><p>This process, called quantum self-tomography, reveals eigenvalues and eigenvectors in time \(\mathcal O(R\log d)\), where \(R\) is the rank of \(X\), resulting in an exponential speedup over classical PCA.</p>
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here.</strong></a></p>
<h2 id="heading-sources">Sources:</h2>
<ul>
<li><p><a target="_blank" href="https://arxiv.org/pdf/1307.0401">Lloyd, Mohseni, Rebentrost, “Quantum principal component analysis”, 2014</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2010.00831">He, Li, Liu, Wang, “A Low Complexity Quantum Principal Component Analysis Algorithm”, 2021</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2501.07891v1">Nghiem, “New Quantum Algorithm for Principal Component Analysis”, 2025</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/0811.3171">Harrow, Hassidim, Lloyd, 2009, “Quantum algorithm for linear systems of equations“</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/0708.1879">Giovannetti, Lloyd, Maccone, 2008, “Quantum random access memory“</a></p>
</li>
</ul>
</p>]]></content:encoded></item><item><title><![CDATA[Post-Variational Quantum Neural Networks]]></title><description><![CDATA[Variational quantum circuit are among the most promising methods for dealing with optimization problems, combinatorial optimization and quantum machine learning. However, despite their popularity, many of the ansatze upon which such circuit relies su...]]></description><link>https://amm.zanotp.com/pvqnn</link><guid isPermaLink="true">https://amm.zanotp.com/pvqnn</guid><category><![CDATA[quantum neural networks]]></category><category><![CDATA[quantum computing]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sun, 02 Feb 2025 23:54:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Gow8svoRZBA/upload/131f6fb153edfbe6d52e7787693477bb.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Variational quantum circuits are among the most promising methods for dealing with optimization problems, combinatorial optimization and quantum machine learning. However, despite their popularity, many of the ansatze upon which such circuits rely suffer from the well-documented barren plateau problem [3] as the quantum hardware noise or circuit depth increases. Moreover, the training landscape doesn’t in general correspond to any well-characterized optimization program, which makes it difficult to analyze. Because of these problems, multiple alternatives to variational quantum circuits have been studied: [7], for example, proposed using classical combinations of quantum states to solve linear systems with near-term quantum computers. The idea of using combinations of quantum states to systematically generate ansatze has proven a viable alternative to variational solutions that can circumvent the barren plateau problem, and has found application in quantum eigensolvers [4], semidefinite programming [6] and simulations [5].</p>
<p>This blog post explores an alternative to variational quantum models, called post-variational quantum models, and in particular post-variational quantum neural networks, a quantum machine learning model based on ensemble strategies which relies not on a single trainable circuit but on a classical combination of fixed circuits.</p>
<h2 id="heading-variational-circuits">Variational Circuits</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738512468827/067bdc3e-c1b1-4f19-bffd-872477aa2889.png" alt class="image--center mx-auto" /></p>
<p>Variational quantum circuits (see the picture above, coming from [1]) are hybrid methods comprising a large class of circuits operating on pure states; they are also known as quantum neural networks when applied to machine learning tasks.</p>
<p>Such circuits operate in the following manner:</p>
<ul>
<li><p>encoding data \(x\) into an \(n\)-qubit quantum state \(\rho(x) \in \mathcal M_{2^n\times 2^n}\)</p>
</li>
<li><p>a parametrized circuit \(U(\theta) \) (also called ansatz) is then applied to the encoded state, with parameters \(\theta \in R^{d}\), resulting in:</p>
</li>
</ul>
<p>$$\rho(x, \theta) = U(\theta)\rho(x)U(\theta)^\dagger$$</p><ul>
<li>an estimation of the results of such circuits is then constructed with an observable \(O\):</li>
</ul>
<p>$$E_\theta(x)= tr(O\rho(x, \theta))$$</p><ul>
<li>the parameters \(\theta\) are optimized using gradient-based optimization relying on the gradient of the variational quantum circuit (typically computed with the <a target="_blank" href="https://amm.zanotp.com/computing-gradients-of-quantum-circuits-using-parameter-shift-rule">parameter shift rule</a>); a minimal end-to-end sketch follows this list.</li>
</ul>
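<p>Here is a minimal PennyLane sketch of these four steps (a toy two-qubit circuit; the embedding, ansatz, observable and target are illustrative choices, not the setup of [1]):</p>
<pre><code class="lang-python">import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def variational_circuit(theta, x):
    # 1. encode the data x into a quantum state rho(x)
    qml.AngleEmbedding(x, wires=[0, 1])
    # 2. apply the parametrized ansatz U(theta)
    qml.RY(theta[0], wires=0)
    qml.RY(theta[1], wires=1)
    qml.CNOT(wires=[0, 1])
    # 3. estimate E_theta(x) = tr(O rho(x, theta)) with the observable O = Z on qubit 0
    return qml.expval(qml.PauliZ(0))

# 4. optimize theta with a gradient-based optimizer (parameter-shift gradients)
x = np.array([0.3, 1.2], requires_grad=False)
theta = np.array([0.1, 0.2], requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.1)

def cost(theta):
    return (variational_circuit(theta, x) - 1.0) ** 2  # toy target value

for _ in range(50):
    theta = opt.step(cost, theta)

print(theta, variational_circuit(theta, x))
</code></pre>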
<p>The main challenges of variational quantum circuits are the following:</p>
<ul>
<li><p>although some problem-inspired ansatze exist, defining problem-agnostic ansatze that are expressive enough to represent a useful function but don’t suffer from the barren plateau problem is a challenge and an open research direction</p>
</li>
<li><p>implementing continuous parameterized rotations on real hardware is limited by the precision of control electronics.</p>
</li>
</ul>
<h2 id="heading-post-variational-circuits">Post-Variational Circuits</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738513556949/eeff5ecc-d681-440d-8bb5-35f1c65bc055.png" alt class="image--center mx-auto" /></p>
<p>Because of the general difficulty and lack of training guarantees of variational algorithms, [1] has proposed a new approach, named “post-variational“. This approach (see the picture above, coming from [1]) replaces the parametrized quantum circuit of the variational approach with fixed circuits (parametrized only by the input data) and finds an optimal classical combination of the results of such circuits.</p>
<p>The idea is therefore to combine observable and ansatz in a single parametrized observable:</p>
<p>$$\mathcal D(\theta) = U(\theta)^\dagger OU (\theta)$$</p><p>Since any observable can be expressed as a linear combination of Hermitian matrices, one can express the observable \(\mathcal D(\theta)\) as a linear combination:</p>
<p>$$\mathcal D(\theta) = \sum_{i=1}^q a_i(\theta)\mathcal D_i$$</p><p>This comes from the fact that \(U(\theta)\) can be written as the product of unitary matrices:</p>
<p>\[U(\theta) = \prod_{i=1}^s U_i(\theta_i)\]</p><p>and because of Stone’s theorem, \(U(\theta)\) can be written as:</p>
<p>\[U(\theta) = \prod_{i=1}^s W_ie^{j\theta_iH_i}V_i\]</p><p>where \(j\) is the imaginary unit, \(W_i\) and \(V_i\) are fixed matrices and \(H_i\) are Hermitian matrices.</p>
<p>Thanks to the Baker–Campbell–Hausdorff identity one can represent \(U(\theta)^\dagger O U(\theta)\) as:</p>
<p>\[\prod_{i=1}^sV_i^\dagger\left(\sum_{k=0}^\infty \frac{[(j\theta_iH_i)^k, W_i^\dagger O W_i]} {k!} \right)V_i=\prod_{i=1}^s\sum_{k=0}^\infty\frac{\theta_i^k}{k!}V_i^\dagger[(jH_i)^k, W_i^\dagger O W_i]V_i\]</p><p>where \([X^{(k)}, Y] = [X, \dots, [X,[X, Y]]]\) denotes the \(k\)-fold nested commutator. Since \(jH_i\) is anti-Hermitian, \([(j\theta_iH_i)^k, W_i^\dagger O W_i]\) is Hermitian for all \(i\), which allows one to rewrite \(U(\theta)^\dagger O U(\theta)\) as a weighted polynomial sum of Hermitian matrices in \(\theta\), yielding:</p>
<p>\[\mathcal D(\theta) = U(\theta)^\dagger OU(\theta) = \sum_{i=1}^q a_i(\theta)\mathcal D_i\]</p><p>Moreover, since any Hermitian operator can be expressed in a basis of Pauli matrices:</p>
<p>\[H \in M_{2\times2}(C)^{\otimes n} \implies H \in \text{span}(\{X, Y, Z, I\}^{\otimes n})\]</p><p>then \(\mathcal D(\theta)\) belongs to the same space, and therefore at most \(4^n\) terms are necessary to express the optimal answer. Considering that a variational quantum circuit takes \(\mathcal O(poly(s))\) parameters to express the optimal solution, while the post-variational approach takes \(\mathcal O(4^n)\) terms, the variational approach has an expressivity advantage, coming from the fact that it is able to generate different observables at higher orders of \(\theta\), something that classical computers cannot achieve.</p>
<p>However, in order to get an approximate solution, one can restrict the number of Hermitian terms used in the post-variational approach to \(\mathcal O(poly(s))\), giving up some expressibility.</p>
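<p>The decomposition \(\mathcal D(\theta) = \sum_i a_i(\theta)\mathcal D_i\) can be made concrete with a small NumPy sketch (toy ansatz and observable, chosen for illustration) that expands \(U(\theta)^\dagger O U(\theta)\) in the Pauli basis via \(a_P = tr(P\mathcal D)/2^n\):</p>
<pre><code class="lang-python">import numpy as np
from functools import reduce
from itertools import product

pauli = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]]),
    "Z": np.diag([1.0 + 0j, -1.0]),
}

n, theta = 2, 0.7
# Toy ansatz U(theta) = exp(-i theta X) on qubit 0, identity on qubit 1
U = np.kron(np.cos(theta) * pauli["I"] - 1j * np.sin(theta) * pauli["X"], pauli["I"])
O = np.kron(pauli["Z"], pauli["I"])  # observable Z on qubit 0
D = U.conj().T @ O @ U

# Expand D(theta) = sum_P a_P(theta) P over the 4^n Pauli strings
for labels in product("IXYZ", repeat=n):
    P = reduce(np.kron, [pauli[l] for l in labels])
    a = (np.trace(P @ D) / 2 ** n).real
    if abs(a) &gt; 1e-12:
        print("".join(labels), round(a, 4))
</code></pre>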
<h3 id="heading-estimation-of-the-parameters-in-the-post-variational-setting">Estimation of the parameters in the post-variational setting</h3>
<p>The estimation in the post-variational setting is:</p>
<p>\[E_\theta = tr\left(\mathcal D(\theta) \rho(x)\right)= \sum_{i=1}^qa_i(\theta)tr\left(D_i\rho(x)\right)\]</p><p>One can consider \(\sum_{i=1}^qa_i(\theta)tr\left(D_i\rho(x)\right)\) as a function \(\mathcal H_\theta: R^q \rightarrow R\) s.t.:</p>
<p>\[E_\theta = \mathcal H_\theta\left(\left\{tr\left(D_i\rho(x)\right)\right\}_{i=1}^q\right)\]</p><p>and exploiting the universal approximation theorem, the function \(\mathcal H_\theta\) can be approximated by a neural network.</p>
<h3 id="heading-design-principles-of-post-variational-quantum-circuits">Design principles of post-variational quantum circuits</h3>
<p>So far, the only challenge of the post-variational design mentioned is the exponential number of possible circuits. However, the post-variational setting has another major challenge: the heuristic choice of fixed circuits and observables. The authors of [1] describe multiple strategies for choosing the observables \(\mathcal D_i\).</p>
<h4 id="heading-ansatz-expansion">Ansatz expansion</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738521125836/24503657-d53a-422b-b87a-66cd552e9333.png" alt class="image--center mx-auto" /></p>
<p>The first strategy outlined is to replace a problem-agnostic parametrized ansatz \(U(\theta)\) coming from a variational quantum circuit with an ensemble of fixed ansatze \(\{U_\alpha\}_{\alpha=1}^p\). The authors use truncated Taylor polynomial expansions of the variational parameters to generate fixed ansatze for the model and use <a target="_blank" href="https://amm.zanotp.com/computing-gradients-of-quantum-circuits-using-parameter-shift-rule">parameter-shift rules</a> to find derivatives of the trace-induced measurements of parameterized quantum circuits.</p>
<p>The full Taylor expansion of \(U^\dagger(\theta)OU(\theta)\) can therefore be expressed as a linear combination of \(U^\dagger(\theta')OU(\theta')\) where \(\theta' \in \{0,\pm\frac \pi 2\}^k\). For a truncation of order \(R\), the number of circuits required is:</p>
<p>\[\sum_{j=0}^R{k\choose j}2^j \in \mathcal O(2^Rk^R)\]</p><p>which scales fast if a deep ansatz is chosen. To reduce the number of circuits required, one can adopt pruning techniques.</p>
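<p>As a quick sanity check of this count, a few lines of Python (using \(k=16\) parameters, the ansatz size used in the example below):</p>
<pre><code class="lang-python">from math import comb

def n_circuits(k: int, R: int) -&gt; int:
    # number of shifted circuits: sum over j = 0..R of C(k, j) * 2^j
    return sum(comb(k, j) * 2 ** j for j in range(R + 1))

for R in range(1, 4):
    print(R, n_circuits(16, R))
# 1 33
# 2 513
# 3 4993
</code></pre>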
<h4 id="heading-observable-construction">Observable construction</h4>
<p>The observable construction strategy decomposes the parametrized observable \(\mathcal D(\theta)\) against a basis of quantum observables s.t.:</p>
<p>\[\mathcal D(\theta^*)\rightarrow \mathcal D(\alpha)= \sum_{P\in\{X; Y; Z; I\}^{\otimes n}}\alpha_P P\]</p><p>The real problem of this strategy is that it scales exponentially with the number of qubits in the system, therefore a heuristic selection is necessary. Considering all Pauli observables within a locality \(L\) is considered a good heuristic, since most physical Hamiltonians are local as well. If the target observable is \(L\)-local, one can exploit the classical shadows method [2] to reduce the number of measurements required while obtaining the same additive error term. In the case that the observables are the complete set of \(L\)-local Paulis, the number of observables required is:</p>
<p>\[\sum_{j=0}^L{n\choose j}3^j \in \mathcal O(3^Ln^L)\]</p><p>while if the classical shadow method is used, the number of random measurements of the circuit is:</p>
<p>\[\mathcal O(3^LL\log n)\]</p><h4 id="heading-hybrid-approach">Hybrid approach</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738522315813/350cc2d4-3d2a-4e76-8a4e-1ae7ffc96f51.png" alt class="image--center mx-auto" /></p>
<p>One might want to use ansatz circuits during the construction of observables, in order to increase the expressivity of the model. A strategy might therefore combine both the ansatz expansion strategy and the observable construction strategy.</p>
<p>The idea is that, instead of directly expanding \(U(\theta)\) in \(\mathcal D(\theta) = U^\dagger (\theta)OU(\theta)\), the ansatz is split into two unitaries:</p>
<p>\[U(\theta)=U_B(\theta)U_A(\theta)\]</p><p>and therefore:</p>
<p>\[\mathcal D(\theta) =U_A^\dagger(\theta)U_B^\dagger(\theta)OU_B(\theta)U_A(\theta)\]</p><p>Letting \(\mathcal D'(\theta) = U^\dagger_B(\theta) OU_B(\theta)\), it can be decomposed into a linear combination of Paulis using the observable construction strategy. On the other hand, the remaining ansatz \(U_A(\theta)\) can be expanded using the ansatz expansion method. Lastly, pruning techniques can be used to reduce the number of circuits.</p>
<h4 id="heading-numerically-comparing-post-variational-and-variational-quantum-neural-network">Numerically comparing Post-Variational and Variational Quantum Neural Network</h4>
<p>The following example demonstrates how to employ the post-variational quantum neural network on the classical machine learning task of image classification. The example comes from the <a target="_blank" href="https://pennylane.ai/qml/demos/tutorial_post-variational_quantum_neural_networks">Pennylane documentation</a>.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pennylane <span class="hljs-keyword">as</span> qml
<span class="hljs-keyword">from</span> pennylane <span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> jax
<span class="hljs-keyword">from</span> jax <span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> jnp
<span class="hljs-keyword">import</span> optax
<span class="hljs-keyword">from</span> itertools <span class="hljs-keyword">import</span> combinations
<span class="hljs-keyword">from</span> sklearn.datasets <span class="hljs-keyword">import</span> load_digits
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-keyword">from</span> sklearn.neural_network <span class="hljs-keyword">import</span> MLPClassifier
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> log_loss
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> matplotlib.colors
<span class="hljs-keyword">import</span> warnings
warnings.filterwarnings(<span class="hljs-string">"ignore"</span>)
np.random.seed(<span class="hljs-number">42</span>)

<span class="hljs-comment"># Load the digits dataset with features (X_digits) and labels (y_digits)</span>
X_digits, y_digits = load_digits(return_X_y=<span class="hljs-literal">True</span>)

<span class="hljs-comment"># Create a boolean mask to filter out only the samples where the label is 2 or 6</span>
filter_mask = np.isin(y_digits, [<span class="hljs-number">2</span>, <span class="hljs-number">6</span>])

<span class="hljs-comment"># Apply the filter mask to the features and labels to keep only the selected digits</span>
X_digits = X_digits[filter_mask]
y_digits = y_digits[filter_mask]

<span class="hljs-comment"># Split the filtered dataset into training and testing sets with 10% of data reserved for testing</span>
X_train, X_test, y_train, y_test = train_test_split(
    X_digits, y_digits, test_size=<span class="hljs-number">0.1</span>, random_state=<span class="hljs-number">42</span>
)

<span class="hljs-comment"># Normalize the pixel values in the training and testing data</span>
<span class="hljs-comment"># Convert each image from a 1D array to an 8x8 2D array, normalize pixel values, and scale them</span>
X_train = np.array([thing.reshape([<span class="hljs-number">8</span>, <span class="hljs-number">8</span>]) / <span class="hljs-number">16</span> * <span class="hljs-number">2</span> * np.pi <span class="hljs-keyword">for</span> thing <span class="hljs-keyword">in</span> X_train])
X_test = np.array([thing.reshape([<span class="hljs-number">8</span>, <span class="hljs-number">8</span>]) / <span class="hljs-number">16</span> * <span class="hljs-number">2</span> * np.pi <span class="hljs-keyword">for</span> thing <span class="hljs-keyword">in</span> X_test])

<span class="hljs-comment"># Adjust the labels to be centered around 0 and scaled to be in the range -1 to 1</span>
<span class="hljs-comment"># The original labels (2 and 6) are mapped to -1 and 1 respectively</span>
y_train = (y_train - <span class="hljs-number">4</span>) / <span class="hljs-number">2</span>
y_test = (y_test - <span class="hljs-number">4</span>) / <span class="hljs-number">2</span>
</code></pre>
<p>To visualize some of the digits:</p>
<pre><code class="lang-python">fig, axes = plt.subplots(nrows=<span class="hljs-number">2</span>, ncols=<span class="hljs-number">3</span>, layout=<span class="hljs-string">"constrained"</span>)
<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">2</span>):
    <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> range(<span class="hljs-number">3</span>):
      axes[i][j].matshow(X_train[<span class="hljs-number">2</span>*(<span class="hljs-number">2</span>*j+i)])
      axes[i][j].axis(<span class="hljs-string">'off'</span>)
fig.subplots_adjust(hspace=<span class="hljs-number">0.0</span>)
fig.tight_layout()
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738523659827/a6f57ed5-c67a-4f42-a5a0-543f5ab1e951.png" alt class="image--center mx-auto" /></p>
<p>Now it’s time to train the QML models:</p>
<ul>
<li><p>we first embed our data through a series of rotation gates</p>
</li>
<li><p>we then apply an ansatz of rotation gates with trainable weights</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">feature_map</span>(<span class="hljs-params">features</span>):</span>
    <span class="hljs-comment"># Apply Hadamard gates to all qubits to create an equal superposition state</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(features[<span class="hljs-number">0</span>])):
        qml.Hadamard(i)

    <span class="hljs-comment"># Apply angle embeddings based on the feature values</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(features)):
        <span class="hljs-comment"># For odd-indexed features, use Z-rotation in the angle embedding</span>
        <span class="hljs-keyword">if</span> i % <span class="hljs-number">2</span>:
            qml.AngleEmbedding(features=features[i], wires=range(<span class="hljs-number">8</span>), rotation=<span class="hljs-string">"Z"</span>)
        <span class="hljs-comment"># For even-indexed features, use X-rotation in the angle embedding</span>
        <span class="hljs-keyword">else</span>:
            qml.AngleEmbedding(features=features[i], wires=range(<span class="hljs-number">8</span>), rotation=<span class="hljs-string">"X"</span>)

<span class="hljs-comment"># Define the ansatz (quantum circuit ansatz) for parameterized quantum operations</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">ansatz</span>(<span class="hljs-params">params</span>):</span>
    <span class="hljs-comment"># Apply RY rotations with the first set of parameters</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">8</span>):
        qml.RY(params[i], wires=i)

    <span class="hljs-comment"># Apply CNOT gates with adjacent qubits (cyclically connected) to create entanglement</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">8</span>):
        qml.CNOT(wires=[(i - <span class="hljs-number">1</span>) % <span class="hljs-number">8</span>, (i) % <span class="hljs-number">8</span>])

    <span class="hljs-comment"># Apply RY rotations with the second set of parameters</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">8</span>):
        qml.RY(params[i + <span class="hljs-number">8</span>], wires=i)

    <span class="hljs-comment"># Apply CNOT gates with qubits in reverse order (cyclically connected)</span>
    <span class="hljs-comment"># to create additional entanglement</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">8</span>):
        qml.CNOT(wires=[(<span class="hljs-number">8</span> - <span class="hljs-number">2</span> - i) % <span class="hljs-number">8</span>, (<span class="hljs-number">8</span> - i - <span class="hljs-number">1</span>) % <span class="hljs-number">8</span>])
</code></pre>
<p>We first test the performance of a shallow variational algorithm on the digits dataset:</p>
<pre><code class="lang-python">dev = qml.device(<span class="hljs-string">"default.qubit"</span>, wires=<span class="hljs-number">8</span>)


<span class="hljs-meta">@qml.qnode(dev)</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">circuit</span>(<span class="hljs-params">params, features</span>):</span>
    feature_map(features)
    ansatz(params)
    <span class="hljs-keyword">return</span> qml.expval(qml.PauliZ(<span class="hljs-number">0</span>))


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">variational_classifier</span>(<span class="hljs-params">weights, bias, x</span>):</span>
    <span class="hljs-keyword">return</span> circuit(weights, x) + bias


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">square_loss</span>(<span class="hljs-params">labels, predictions</span>):</span>
    <span class="hljs-keyword">return</span> np.mean((labels - qml.math.stack(predictions)) ** <span class="hljs-number">2</span>)


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">accuracy</span>(<span class="hljs-params">labels, predictions</span>):</span>
    acc = sum([np.sign(l) == np.sign(p) <span class="hljs-keyword">for</span> l, p <span class="hljs-keyword">in</span> zip(labels, predictions)])
    acc = acc / len(labels)
    <span class="hljs-keyword">return</span> acc


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">cost</span>(<span class="hljs-params">params, X, Y</span>):</span>
    predictions = [variational_classifier(params[<span class="hljs-string">"weights"</span>], params[<span class="hljs-string">"bias"</span>], x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> X]
    <span class="hljs-keyword">return</span> square_loss(Y, predictions)


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">acc</span>(<span class="hljs-params">params, X, Y</span>):</span>
    predictions = [variational_classifier(params[<span class="hljs-string">"weights"</span>], params[<span class="hljs-string">"bias"</span>], x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> X]
    <span class="hljs-keyword">return</span> accuracy(Y, predictions)


np.random.seed(<span class="hljs-number">0</span>)
weights = <span class="hljs-number">0.01</span> * np.random.randn(<span class="hljs-number">16</span>)
bias = jnp.array(<span class="hljs-number">0.0</span>)
params = {<span class="hljs-string">"weights"</span>: weights, <span class="hljs-string">"bias"</span>: bias}
opt = optax.adam(<span class="hljs-number">0.05</span>)
batch_size = <span class="hljs-number">7</span>
num_batch = X_train.shape[<span class="hljs-number">0</span>] // batch_size
opt_state = opt.init(params)
X_batched = X_train.reshape([<span class="hljs-number">-1</span>, batch_size, <span class="hljs-number">8</span>, <span class="hljs-number">8</span>])
y_batched = y_train.reshape([<span class="hljs-number">-1</span>, batch_size])


<span class="hljs-meta">@jax.jit</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">update_step_jit</span>(<span class="hljs-params">i, args</span>):</span>
    params, opt_state, data, targets, batch_no = args
    _data = data[batch_no % num_batch]
    _targets = targets[batch_no % num_batch]
    _, grads = jax.value_and_grad(cost)(params, _data, _targets)
    updates, opt_state = opt.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    <span class="hljs-keyword">return</span> (params, opt_state, data, targets, batch_no + <span class="hljs-number">1</span>)


<span class="hljs-meta">@jax.jit</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">optimization_jit</span>(<span class="hljs-params">params, data, targets</span>):</span>
    opt_state = opt.init(params)
    args = (params, opt_state, data, targets, <span class="hljs-number">0</span>)
    (params, opt_state, _, _, _) = jax.lax.fori_loop(<span class="hljs-number">0</span>, <span class="hljs-number">200</span>, update_step_jit, args)
    <span class="hljs-keyword">return</span> params


params = optimization_jit(params, X_batched, y_batched)
var_train_acc = acc(params, X_train, y_train)
var_test_acc = acc(params, X_test, y_test)

print(<span class="hljs-string">"Training accuracy: "</span>, var_train_acc)
print(<span class="hljs-string">"Testing accuracy: "</span>, var_test_acc)

<span class="hljs-comment"># Training accuracy:  0.7484472049689441</span>
<span class="hljs-comment"># Testing accuracy:  0.6944444444444444</span>
</code></pre>
<p>The observable construction heuristic:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">local_pauli_group</span>(<span class="hljs-params">qubits: int, locality: int</span>):</span>
    <span class="hljs-keyword">assert</span> locality &lt;= qubits, <span class="hljs-string">f"Locality must not exceed the number of qubits."</span>
    <span class="hljs-keyword">return</span> list(generate_paulis(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-string">""</span>, qubits, locality))

<span class="hljs-comment"># This is a recursive generator function that constructs Pauli strings.</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_paulis</span>(<span class="hljs-params">identities: int, paulis: int, output: str, qubits: int, locality: int</span>):</span>
    <span class="hljs-comment"># Base case: if the output string's length matches the number of qubits, yield it.</span>
    <span class="hljs-keyword">if</span> len(output) == qubits:
        <span class="hljs-keyword">yield</span> output
    <span class="hljs-keyword">else</span>:
        <span class="hljs-comment"># Recursive case: add an "I" (identity) to the output string.</span>
        <span class="hljs-keyword">yield</span> <span class="hljs-keyword">from</span> generate_paulis(identities + <span class="hljs-number">1</span>, paulis, output + <span class="hljs-string">"I"</span>, qubits, locality)

        <span class="hljs-comment"># If the number of Pauli operators used is less than the locality, add "X", "Y", or "Z"</span>
        <span class="hljs-comment"># systematically builds all possible Pauli strings that conform to the specified locality.</span>
        <span class="hljs-keyword">if</span> paulis &lt; locality:
            <span class="hljs-keyword">yield</span> <span class="hljs-keyword">from</span> generate_paulis(identities, paulis + <span class="hljs-number">1</span>, output + <span class="hljs-string">"X"</span>, qubits, locality)
            <span class="hljs-keyword">yield</span> <span class="hljs-keyword">from</span> generate_paulis(identities, paulis + <span class="hljs-number">1</span>, output + <span class="hljs-string">"Y"</span>, qubits, locality)
            <span class="hljs-keyword">yield</span> <span class="hljs-keyword">from</span> generate_paulis(identities, paulis + <span class="hljs-number">1</span>, output + <span class="hljs-string">"Z"</span>, qubits, locality)
</code></pre>
<p>For each image sample, we measure the output of the quantum circuit using the \(k\)-local observables sequence, and perform logistic regression on these outputs:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Initialize lists to store training and testing accuracies for different localities.</span>
train_accuracies_O = []
test_accuracies_O = []

<span class="hljs-keyword">for</span> locality <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">4</span>):
    print(str(locality) + <span class="hljs-string">"-local: "</span>)

    <span class="hljs-comment"># Define a quantum device with 8 qubits using the default simulator.</span>
    dev = qml.device(<span class="hljs-string">"default.qubit"</span>, wires=<span class="hljs-number">8</span>)

    <span class="hljs-comment"># Define a quantum node (qnode) with the quantum circuit that will be executed on the device.</span>
<span class="hljs-meta">    @qml.qnode(dev)</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">circuit</span>(<span class="hljs-params">features</span>):</span>
        <span class="hljs-comment"># Generate all possible Pauli strings for the given locality.</span>
        measurements = local_pauli_group(<span class="hljs-number">8</span>, locality)

        <span class="hljs-comment"># Apply the feature map to encode classical data into quantum states.</span>
        feature_map(features)

        <span class="hljs-comment"># Measure the expectation values of the generated Pauli operators.</span>
        <span class="hljs-keyword">return</span> [qml.expval(qml.pauli.string_to_pauli_word(measurement)) <span class="hljs-keyword">for</span> measurement <span class="hljs-keyword">in</span> measurements]

    <span class="hljs-comment"># Vectorize the quantum circuit function to apply it to multiple data points in parallel.</span>
    vcircuit = jax.vmap(circuit)

    <span class="hljs-comment"># Transform the training and testing datasets by applying the quantum circuit.</span>
    new_X_train = np.asarray(vcircuit(jnp.array(X_train))).T
    new_X_test = np.asarray(vcircuit(jnp.array(X_test))).T

    <span class="hljs-comment"># Train a Multilayer Perceptron (MLP) classifier on the transformed training data.</span>
    clf = MLPClassifier(early_stopping=<span class="hljs-literal">True</span>).fit(new_X_train, y_train)

    <span class="hljs-comment"># Print the log loss for the training data.</span>
    print(<span class="hljs-string">"Training loss: "</span>, log_loss(y_train, clf.predict_proba(new_X_train)))

    <span class="hljs-comment"># Print the log loss for the testing data.</span>
    print(<span class="hljs-string">"Testing loss: "</span>, log_loss(y_test, clf.predict_proba(new_X_test)))

    <span class="hljs-comment"># Calculate and store the training accuracy.</span>
    acc = clf.score(new_X_train, y_train)
    train_accuracies_O.append(acc)
    print(<span class="hljs-string">"Training accuracy: "</span>, acc)

    <span class="hljs-comment"># Calculate and store the testing accuracy.</span>
    acc = clf.score(new_X_test, y_test)
    test_accuracies_O.append(acc)
    print(<span class="hljs-string">"Testing accuracy: "</span>, acc)
    print()

locality = (<span class="hljs-string">"1-local"</span>, <span class="hljs-string">"2-local"</span>, <span class="hljs-string">"3-local"</span>)
train_accuracies_O = [round(value, <span class="hljs-number">2</span>) <span class="hljs-keyword">for</span> value <span class="hljs-keyword">in</span> train_accuracies_O]
test_accuracies_O = [round(value, <span class="hljs-number">2</span>) <span class="hljs-keyword">for</span> value <span class="hljs-keyword">in</span> test_accuracies_O]
x = np.arange(<span class="hljs-number">3</span>)
width = <span class="hljs-number">0.25</span>

<span class="hljs-comment"># Create a bar plot to visualize the training and testing accuracies.</span>
fig, ax = plt.subplots(layout=<span class="hljs-string">"constrained"</span>)
<span class="hljs-comment"># Training accuracy bars:</span>
rects = ax.bar(x, train_accuracies_O, width, label=<span class="hljs-string">"Training"</span>, color=<span class="hljs-string">"#FF87EB"</span>)
<span class="hljs-comment"># Testing accuracy bars:</span>
rects = ax.bar(x + width, test_accuracies_O, width, label=<span class="hljs-string">"Testing"</span>, color=<span class="hljs-string">"#70CEFF"</span>)
ax.bar_label(rects, padding=<span class="hljs-number">3</span>)
ax.set_xlabel(<span class="hljs-string">"Locality"</span>)
ax.set_ylabel(<span class="hljs-string">"Accuracy"</span>)
ax.set_title(<span class="hljs-string">"Accuracy of different localities"</span>)
ax.set_xticks(x + width / <span class="hljs-number">2</span>, locality)
ax.legend(loc=<span class="hljs-string">"upper left"</span>)
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738524022642/fd73254e-3624-40bc-886e-c8229ded11c2.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-python"><span class="hljs-number">1</span>-local:
Training loss:  <span class="hljs-number">0.4592314401681531</span>
Testing loss:  <span class="hljs-number">0.5045886276497531</span>
Training accuracy:  <span class="hljs-number">0.8074534161490683</span>
Testing accuracy:  <span class="hljs-number">0.7222222222222222</span>

<span class="hljs-number">2</span>-local:
Training loss:  <span class="hljs-number">0.43242776810519556</span>
Testing loss:  <span class="hljs-number">0.5718358099121</span>
Training accuracy:  <span class="hljs-number">0.860248447204969</span>
Testing accuracy:  <span class="hljs-number">0.7222222222222222</span>

<span class="hljs-number">3</span>-local:
Training loss:  <span class="hljs-number">0.42526261814808347</span>
Testing loss:  <span class="hljs-number">0.574942133390183</span>
Training accuracy:  <span class="hljs-number">0.9316770186335404</span>
Testing accuracy:  <span class="hljs-number">0.75</span>
</code></pre>
<p>The ansatz expansion approach:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">deriv_params</span>(<span class="hljs-params">thetas: int, order: int</span>):</span>
    <span class="hljs-comment"># This function generates parameter shift values for calculating derivatives</span>
    <span class="hljs-comment"># of a quantum circuit.</span>
    <span class="hljs-comment"># 'thetas' is the number of parameters in the circuit.</span>
    <span class="hljs-comment"># 'order' determines the order of the derivative to calculate (1st order, 2nd order, etc.).</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_shifts</span>(<span class="hljs-params">thetas: int, order: int</span>):</span>
        <span class="hljs-comment"># Generate all possible combinations of parameters to shift for the given order.</span>
        shift_pos = list(combinations(np.arange(thetas), order))

        <span class="hljs-comment"># Initialize a 3D array to hold the shift values.</span>
        <span class="hljs-comment"># Shape: (number of combinations, 2^order, thetas)</span>
        params = np.zeros((len(shift_pos), <span class="hljs-number">2</span> ** order, thetas))

        <span class="hljs-comment"># Iterate over each combination of parameter shifts.</span>
        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(shift_pos)):
            <span class="hljs-comment"># Iterate over each possible binary shift pattern for the given order.</span>
            <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> range(<span class="hljs-number">2</span> ** order):
                <span class="hljs-comment"># Convert the index j to a binary string of length 'order'.</span>
                <span class="hljs-keyword">for</span> k, l <span class="hljs-keyword">in</span> enumerate(<span class="hljs-string">f"<span class="hljs-subst">{j:<span class="hljs-number">0</span>{order}</span>b}"</span>):
                    <span class="hljs-comment"># For each bit in the binary string:</span>
                    <span class="hljs-keyword">if</span> int(l) &gt; <span class="hljs-number">0</span>:
                        <span class="hljs-comment"># If the bit is 1, increment the corresponding parameter.</span>
                        params[i][j][shift_pos[i][k]] += <span class="hljs-number">1</span>
                    <span class="hljs-keyword">else</span>:
                        <span class="hljs-comment"># If the bit is 0, decrement the corresponding parameter.</span>
                        params[i][j][shift_pos[i][k]] -= <span class="hljs-number">1</span>

        <span class="hljs-comment"># Reshape the parameters array to collapse the first two dimensions.</span>
        params = np.reshape(params, (<span class="hljs-number">-1</span>, thetas))
        <span class="hljs-keyword">return</span> params

    <span class="hljs-comment"># Start with a list containing a zero-shift array for all parameters.</span>
    param_list = [np.zeros((<span class="hljs-number">1</span>, thetas))]

    <span class="hljs-comment"># Append the generated shift values for each order from 1 to the given order.</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, order + <span class="hljs-number">1</span>):
        param_list.append(generate_shifts(thetas, i))

    <span class="hljs-comment"># Concatenate all the shift arrays along the first axis to create the final parameter array.</span>
    params = np.concatenate(param_list, axis=<span class="hljs-number">0</span>)

    <span class="hljs-comment"># Scale the shift values by π/2.</span>
    params *= np.pi / <span class="hljs-number">2</span>

    <span class="hljs-keyword">return</span> params

n_wires = <span class="hljs-number">8</span>
dev = qml.device(<span class="hljs-string">"default.qubit"</span>, wires=n_wires)

<span class="hljs-meta">@jax.jit</span>
<span class="hljs-meta">@qml.qnode(dev, interface="jax")</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">circuit</span>(<span class="hljs-params">features, params, n_wires=<span class="hljs-number">8</span></span>):</span>
    feature_map(features)
    ansatz(params)
    <span class="hljs-keyword">return</span> qml.expval(qml.PauliZ(<span class="hljs-number">0</span>))
</code></pre>
<p>For each image sample, measure the outputs of each parameterised circuit for each feature, and feed the outputs into a multilayer perceptron:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Initialize lists to store training and testing accuracies for different derivative orders.</span>
train_accuracies_AE = []
test_accuracies_AE = []

<span class="hljs-comment"># Loop through different derivative orders (1st order, 2nd order, 3rd order).</span>
<span class="hljs-keyword">for</span> order <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">4</span>):
    print(<span class="hljs-string">"Order number: "</span> + str(order))

    <span class="hljs-comment"># Generate the parameter shifts required for the given derivative order.</span>
    to_measure = deriv_params(<span class="hljs-number">16</span>, order)

    <span class="hljs-comment"># Transform the training dataset by applying the quantum circuit with the</span>
    <span class="hljs-comment"># generated parameter shifts.</span>
    new_X_train = []
    <span class="hljs-keyword">for</span> thing <span class="hljs-keyword">in</span> X_train:
        result = circuit(thing, to_measure.T)
        new_X_train.append(result)

    <span class="hljs-comment"># Transform the testing dataset similarly.</span>
    new_X_test = []
    <span class="hljs-keyword">for</span> thing <span class="hljs-keyword">in</span> X_test:
        result = circuit(thing, to_measure.T)
        new_X_test.append(result)

    <span class="hljs-comment"># Train a Multilayer Perceptron (MLP) classifier on the transformed training data.</span>
    clf = MLPClassifier(early_stopping=<span class="hljs-literal">True</span>).fit(new_X_train, y_train)

    <span class="hljs-comment"># Print the log loss for the training data.</span>
    print(<span class="hljs-string">"Training loss: "</span>, log_loss(y_train, clf.predict_proba(new_X_train)))

    <span class="hljs-comment"># Print the log loss for the testing data.</span>
    print(<span class="hljs-string">"Testing loss: "</span>, log_loss(y_test, clf.predict_proba(new_X_test)))

    <span class="hljs-comment"># Calculate and store the training accuracy.</span>
    acc = clf.score(new_X_train, y_train)
    train_accuracies_AE.append(acc)
    print(<span class="hljs-string">"Training accuracy: "</span>, acc)

    <span class="hljs-comment"># Calculate and store the testing accuracy.</span>
    acc = clf.score(new_X_test, y_test)
    test_accuracies_AE.append(acc)
    print(<span class="hljs-string">"Testing accuracy: "</span>, acc)
    print()

locality = (<span class="hljs-string">"1-order"</span>, <span class="hljs-string">"2-order"</span>, <span class="hljs-string">"3-order"</span>)
train_accuracies_AE = [round(value, <span class="hljs-number">2</span>) <span class="hljs-keyword">for</span> value <span class="hljs-keyword">in</span> train_accuracies_AE]
test_accuracies_AE = [round(value, <span class="hljs-number">2</span>) <span class="hljs-keyword">for</span> value <span class="hljs-keyword">in</span> test_accuracies_AE]
x = np.arange(<span class="hljs-number">3</span>)
width = <span class="hljs-number">0.25</span>
fig, ax = plt.subplots(layout=<span class="hljs-string">"constrained"</span>)
rects = ax.bar(x, train_accuracies_AE, width, label=<span class="hljs-string">"Training"</span>, color=<span class="hljs-string">"#FF87EB"</span>)
ax.bar_label(rects, padding=<span class="hljs-number">3</span>)
rects = ax.bar(x + width, test_accuracies_AE, width, label=<span class="hljs-string">"Testing"</span>, color=<span class="hljs-string">"#70CEFF"</span>)
ax.bar_label(rects, padding=<span class="hljs-number">3</span>)
ax.set_xlabel(<span class="hljs-string">"Order"</span>)
ax.set_ylabel(<span class="hljs-string">"Accuracy"</span>)
ax.set_title(<span class="hljs-string">"Accuracy of different derivative orders"</span>)
ax.set_xticks(x + width / <span class="hljs-number">2</span>, locality)
ax.legend(loc=<span class="hljs-string">"upper left"</span>)
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738524176263/f17979db-ac09-4eb7-97a2-af007b24ceea.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-python">Order number: <span class="hljs-number">1</span>
Training loss:  <span class="hljs-number">0.6917395840118673</span>
Testing loss:  <span class="hljs-number">0.6898366117810784</span>
Training accuracy:  <span class="hljs-number">0.5093167701863354</span>
Testing accuracy:  <span class="hljs-number">0.5555555555555556</span>

Order number: <span class="hljs-number">2</span>
Training loss:  <span class="hljs-number">0.6326009058014004</span>
Testing loss:  <span class="hljs-number">0.6157803899808801</span>
Training accuracy:  <span class="hljs-number">0.7018633540372671</span>
Testing accuracy:  <span class="hljs-number">0.6666666666666666</span>

Order number: <span class="hljs-number">3</span>
Training loss:  <span class="hljs-number">0.5815839249054562</span>
Testing loss:  <span class="hljs-number">0.6016181640099203</span>
Training accuracy:  <span class="hljs-number">0.7142857142857143</span>
Testing accuracy:  <span class="hljs-number">0.6944444444444444</span>
</code></pre>
<p>Regarding the hybrid strategy:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Initialize matrices to store training and testing accuracies for different</span>
<span class="hljs-comment"># combinations of locality and order.</span>
train_accuracies = np.zeros([<span class="hljs-number">4</span>, <span class="hljs-number">4</span>])
test_accuracies = np.zeros([<span class="hljs-number">4</span>, <span class="hljs-number">4</span>])

<span class="hljs-comment"># Loop through different derivative orders (1st to 3rd) and localities (1-local to 3-local).</span>
<span class="hljs-keyword">for</span> order <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">4</span>):
    <span class="hljs-keyword">for</span> locality <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">4</span>):
        <span class="hljs-comment"># Skip invalid combinations where locality + order exceeds 3 or equals 0.</span>
        <span class="hljs-keyword">if</span> locality + order &gt; <span class="hljs-number">3</span> <span class="hljs-keyword">or</span> locality + order == <span class="hljs-number">0</span>:
            <span class="hljs-keyword">continue</span>
        print(<span class="hljs-string">"Locality: "</span> + str(locality) + <span class="hljs-string">" Order: "</span> + str(order))

        <span class="hljs-comment"># Define a quantum device with 8 qubits using the default simulator.</span>
        dev = qml.device(<span class="hljs-string">"default.qubit"</span>, wires=<span class="hljs-number">8</span>)

        <span class="hljs-comment"># Generate the parameter shifts required for the given derivative order and transpose them.</span>
        params = deriv_params(<span class="hljs-number">16</span>, order).T

        <span class="hljs-comment"># Define a quantum node (qnode) with the quantum circuit that will be executed on the device.</span>
<span class="hljs-meta">        @qml.qnode(dev)</span>
        <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">circuit</span>(<span class="hljs-params">features, params</span>):</span>
            <span class="hljs-comment"># Generate the Pauli group for the given locality.</span>
            measurements = local_pauli_group(<span class="hljs-number">8</span>, locality)
            feature_map(features)
            ansatz(params)
            <span class="hljs-comment"># Measure the expectation values of the generated Pauli operators.</span>
            <span class="hljs-keyword">return</span> [qml.expval(qml.pauli.string_to_pauli_word(measurement)) <span class="hljs-keyword">for</span> measurement <span class="hljs-keyword">in</span> measurements]

        <span class="hljs-comment"># Vectorize the quantum circuit function to apply it to multiple data points in parallel.</span>
        vcircuit = jax.vmap(circuit)

        <span class="hljs-comment"># Transform the training dataset by applying the quantum circuit with the</span>
        <span class="hljs-comment"># generated parameter shifts.</span>
        new_X_train = np.asarray(
            vcircuit(jnp.array(X_train), jnp.array([params <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(X_train))]))
        )
        <span class="hljs-comment"># Reorder the axes and reshape the transformed data for input into the classifier.</span>
        new_X_train = np.moveaxis(new_X_train, <span class="hljs-number">0</span>, <span class="hljs-number">-1</span>).reshape(
            <span class="hljs-number">-1</span>, len(local_pauli_group(<span class="hljs-number">8</span>, locality)) * len(deriv_params(<span class="hljs-number">16</span>, order))
        )

        <span class="hljs-comment"># Transform the testing dataset similarly.</span>
        new_X_test = np.asarray(
            vcircuit(jnp.array(X_test), jnp.array([params <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(X_test))]))
        )
        <span class="hljs-comment"># Reorder the axes and reshape the transformed data for input into the classifier.</span>
        new_X_test = np.moveaxis(new_X_test, <span class="hljs-number">0</span>, <span class="hljs-number">-1</span>).reshape(
            <span class="hljs-number">-1</span>, len(local_pauli_group(<span class="hljs-number">8</span>, locality)) * len(deriv_params(<span class="hljs-number">16</span>, order))
        )

        <span class="hljs-comment"># Train a Multilayer Perceptron (MLP) classifier on the transformed training data.</span>
        clf = MLPClassifier(early_stopping=<span class="hljs-literal">True</span>).fit(new_X_train, y_train)

        <span class="hljs-comment"># Calculate and store the training and testing accuracies.</span>
        train_accuracies[order][locality] = clf.score(new_X_train, y_train)
        test_accuracies[order][locality] = clf.score(new_X_test, y_test)

        print(<span class="hljs-string">"Training loss: "</span>, log_loss(y_train, clf.predict_proba(new_X_train)))
        print(<span class="hljs-string">"Testing loss: "</span>, log_loss(y_test, clf.predict_proba(new_X_test)))
        acc = clf.score(new_X_train, y_train)
        train_accuracies[locality][order] = acc
        print(<span class="hljs-string">"Training accuracy: "</span>, acc)
        acc = clf.score(new_X_test, y_test)
        test_accuracies[locality][order] = acc
        print(<span class="hljs-string">"Testing accuracy: "</span>, acc)
        print()

<span class="hljs-comment"># Locality: 1 Order: 1</span>
<span class="hljs-comment"># Training loss:  0.29433122335335293</span>
<span class="hljs-comment"># Testing loss:  0.48158001426002656</span>
<span class="hljs-comment"># Training accuracy:  0.8944099378881988</span>
<span class="hljs-comment"># Testing accuracy:  0.7777777777777778</span>

<span class="hljs-comment"># Locality: 2 Order: 1</span>
<span class="hljs-comment"># Training loss:  0.32784353109905134</span>
<span class="hljs-comment"># Testing loss:  0.571967578071357</span>
<span class="hljs-comment"># Training accuracy:  0.8664596273291926</span>
<span class="hljs-comment"># Testing accuracy:  0.75</span>

<span class="hljs-comment"># Locality: 1 Order: 2</span>
<span class="hljs-comment"># Training loss:  0.20260000718215349</span>
<span class="hljs-comment"># Testing loss:  0.5550612230165831</span>
<span class="hljs-comment"># Training accuracy:  0.9409937888198758</span>
<span class="hljs-comment"># Testing accuracy:  0.75</span>
</code></pre>
<p>Plotting all the post-variational strategies together:</p>
<pre><code class="lang-python"><span class="hljs-keyword">for</span> locality <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">4</span>):
    train_accuracies[locality][<span class="hljs-number">0</span>] = train_accuracies_O[locality - <span class="hljs-number">1</span>]
    test_accuracies[locality][<span class="hljs-number">0</span>] = test_accuracies_O[locality - <span class="hljs-number">1</span>]
<span class="hljs-keyword">for</span> order <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">4</span>):
    train_accuracies[<span class="hljs-number">0</span>][order] = train_accuracies_AE[order - <span class="hljs-number">1</span>]
    test_accuracies[<span class="hljs-number">0</span>][order] = test_accuracies_AE[order - <span class="hljs-number">1</span>]

train_accuracies[<span class="hljs-number">3</span>][<span class="hljs-number">3</span>] = var_train_acc
test_accuracies[<span class="hljs-number">3</span>][<span class="hljs-number">3</span>] = var_test_acc

cvals = [<span class="hljs-number">0</span>, <span class="hljs-number">0.5</span>, <span class="hljs-number">0.85</span>, <span class="hljs-number">0.95</span>, <span class="hljs-number">1</span>]
colors = [<span class="hljs-string">"black"</span>, <span class="hljs-string">"#C756B2"</span>, <span class="hljs-string">"#FF87EB"</span>, <span class="hljs-string">"#ACE3FF"</span>, <span class="hljs-string">"#D5F0FD"</span>]
norm = plt.Normalize(min(cvals), max(cvals))
tuples = list(zip(map(norm, cvals), colors))
cmap = matplotlib.colors.LinearSegmentedColormap.from_list(<span class="hljs-string">""</span>, tuples)


locality = [<span class="hljs-string">"top qubit\n Pauli-Z"</span>, <span class="hljs-string">"1-local"</span>, <span class="hljs-string">"2-local"</span>, <span class="hljs-string">"3-local"</span>]
order = [<span class="hljs-string">"0th Order"</span>, <span class="hljs-string">"1st Order"</span>, <span class="hljs-string">"2nd Order"</span>, <span class="hljs-string">"3rd Order"</span>]

fig, axes = plt.subplots(nrows=<span class="hljs-number">1</span>, ncols=<span class="hljs-number">2</span>, layout=<span class="hljs-string">"constrained"</span>)
im = axes[<span class="hljs-number">0</span>].imshow(train_accuracies, cmap=cmap, origin=<span class="hljs-string">"lower"</span>)

axes[<span class="hljs-number">0</span>].set_yticks(np.arange(len(locality)), labels=locality)
axes[<span class="hljs-number">0</span>].set_xticks(np.arange(len(order)), labels=order)
plt.setp(axes[<span class="hljs-number">0</span>].get_xticklabels(), rotation=<span class="hljs-number">45</span>, ha=<span class="hljs-string">"right"</span>, rotation_mode=<span class="hljs-string">"anchor"</span>)
<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(locality)):
    <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> range(len(order)):
        text = axes[<span class="hljs-number">0</span>].text(
            j, i, np.round(train_accuracies[i, j], <span class="hljs-number">2</span>), ha=<span class="hljs-string">"center"</span>, va=<span class="hljs-string">"center"</span>, color=<span class="hljs-string">"black"</span>
        )
axes[<span class="hljs-number">0</span>].text(<span class="hljs-number">3</span>, <span class="hljs-number">3</span>, <span class="hljs-string">'\n\n(VQA)'</span>, ha=<span class="hljs-string">"center"</span>, va=<span class="hljs-string">"center"</span>, color=<span class="hljs-string">"black"</span>)

axes[<span class="hljs-number">0</span>].set_title(<span class="hljs-string">"Training Accuracies"</span>)

locality = [<span class="hljs-string">"top qubit\n Pauli-Z"</span>, <span class="hljs-string">"1-local"</span>, <span class="hljs-string">"2-local"</span>, <span class="hljs-string">"3-local"</span>]
order = [<span class="hljs-string">"0th Order"</span>, <span class="hljs-string">"1st Order"</span>, <span class="hljs-string">"2nd Order"</span>, <span class="hljs-string">"3rd Order"</span>]

im = axes[<span class="hljs-number">1</span>].imshow(test_accuracies, cmap=cmap, origin=<span class="hljs-string">"lower"</span>)

axes[<span class="hljs-number">1</span>].set_yticks(np.arange(len(locality)), labels=locality)
axes[<span class="hljs-number">1</span>].set_xticks(np.arange(len(order)), labels=order)
plt.setp(axes[<span class="hljs-number">1</span>].get_xticklabels(), rotation=<span class="hljs-number">45</span>, ha=<span class="hljs-string">"right"</span>, rotation_mode=<span class="hljs-string">"anchor"</span>)
<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(locality)):
    <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> range(len(order)):
        text = axes[<span class="hljs-number">1</span>].text(
            j, i, np.round(test_accuracies[i, j], <span class="hljs-number">2</span>), ha=<span class="hljs-string">"center"</span>, va=<span class="hljs-string">"center"</span>, color=<span class="hljs-string">"black"</span>
        )
axes[<span class="hljs-number">1</span>].text(<span class="hljs-number">3</span>, <span class="hljs-number">3</span>, <span class="hljs-string">'\n\n(VQA)'</span>, ha=<span class="hljs-string">"center"</span>, va=<span class="hljs-string">"center"</span>, color=<span class="hljs-string">"black"</span>)

axes[<span class="hljs-number">1</span>].set_title(<span class="hljs-string">"Test Accuracies"</span>)
fig.tight_layout()
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738524282437/d2386546-e2f4-4b3a-9298-bb603623a186.png" alt class="image--center mx-auto" /></p>
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>If you have any questions or suggestions about what I covered in this article, please add them as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here.</strong></a></p>
<h2 id="heading-sources">Sources:</h2>
<ol>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2307.10560#page=13&amp;zoom=100,249,193">https://arxiv.org/pdf/2307.10560</a></p>
</li>
<li><p><a target="_blank" href="https://www.nature.com/articles/s41567-020-0932-7">https://www.nature.com/articles/s41567-020-0932-7</a></p>
</li>
<li><p><a target="_blank" href="https://www.nature.com/articles/s41467-018-07090-4">https://www.nature.com/articles/s41467-018-07090-4</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2009.11001">https://arxiv.org/pdf/2009.11001</a></p>
</li>
<li><p><a target="_blank" href="https://journals.aps.org/pra/abstract/10.1103/PhysRevA.104.042418">https://journals.aps.org/pra/abstract/10.1103/PhysRevA.104.042418</a></p>
</li>
<li><p><a target="_blank" href="https://journals.aps.org/pra/abstract/10.1103/PhysRevA.105.052445">https://journals.aps.org/pra/abstract/10.1103/PhysRevA.105.052445</a></p>
</li>
<li><p><a target="_blank" href="https://iopscience.iop.org/article/10.1088/1367-2630/ac325f">https://iopscience.iop.org/article/10.1088/1367-2630/ac325f</a></p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Learning Parametric Partial Differential Equations using Fourier Neural Operator]]></title><description><![CDATA[A variety of problems in applied science revolve around solving systems of parametrized partial differential equations. Such systems often exhibit complex and non-linear behaviour and mesh-based methods might therefore require an incredibly fine discr...]]></description><link>https://amm.zanotp.com/fno</link><guid isPermaLink="true">https://amm.zanotp.com/fno</guid><category><![CDATA[neural operators]]></category><category><![CDATA[parametric-pde]]></category><category><![CDATA[pde]]></category><category><![CDATA[scientific-computing]]></category><category><![CDATA[neural networks]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sun, 29 Dec 2024 01:17:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/GzvP-5L2M4A/upload/4c8bc0897ff0a19e6ca491fc998cb110.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A variety of problems in applied science revolve around solving systems of parametrized partial differential equations. Such systems often exhibit complex and non-linear behaviour, and mesh-based methods might therefore require an incredibly fine discretization to precisely capture the model. Traditional numerical solvers thus face a trade-off: coarse grids are fast but less accurate, while fine grids are accurate but slow, so such problems can pose a non-trivial challenge to them.</p>
<p>For this reason, researchers developed a whole new family of methods which learn the solution map of the family of equations directly from data, resulting in faster approximations of the solutions of such PDEs.</p>
<p>This post discusses a specific data-driven approach to approximate parametric PDEs using a specific neural architecture called Fourier Neural Operator (FNO) and concludes presenting a quantum-enhanced version of the FNO.</p>
<h2 id="heading-learning-neural-operators">Learning Neural Operators</h2>
<p>To properly understand the Fourier Neural Operator, a brief introduction to neural operators is necessary. The idea underlying neural operators is to learn mesh-free, infinite-dimensional operators with neural networks: such operators can transfer solutions between different meshes, don’t need to be retrained for each parameter and don’t require any a priori knowledge of the PDE.</p>
<p>Let \(D\) be a bounded domain in \(R^d\) and let \(U(D, R^{d_u})\) and \(\Lambda(D, R^{d_\lambda})\) be two separable Banach spaces of functions where \(R^{d_u}\) and \(R^{d_\lambda}\) are the codomains.</p>
<p>Assume the PDE one wishes to solve is:</p>
<p>$$\mathcal L(u, x, \lambda) = 0$$</p><p>where:</p>
<ul>
<li><p>\(\mathcal L\) is the differential operator</p>
</li>
<li><p>\(u \in U\) is the solution function</p>
</li>
<li><p>\(x\) is the spatial-temporal variable</p>
</li>
<li><p>\(\lambda \in \Lambda\) is the function parametrizing the PDE</p>
</li>
</ul>
<p>Moreover let \(G^\dagger: \Lambda \rightarrow U\) be a map which arises as the solution operator of the parametric PDE. Also let \(\{\lambda_j, u_j\}^n_{j=1}\) be (potentially noisy) observations s.t.</p>
<p>$$G^\dagger(\lambda_j) = u_j$$</p><p>The goal of the neural operator is to approximate \(G^\dagger\) with</p>
<p>$$G_\theta: \Lambda \times \Theta \rightarrow U$$</p><p>where \(\Theta\) is a finite-dimensional space.</p>
<p>Similarly to a finite-dimensional setting, one can now define a cost function</p>
<p>$$C : U \times U\rightarrow R$$</p><p>and seek a minimizer of the problem:</p>
<p>$$\min_{\theta \in \Theta}E_\lambda\left( C\left(G_\theta\left(\lambda\right), G^\dagger\left(\lambda\right)\right)\right)$$</p><p>Of course, learning a neural operator is quite different from learning the solution of a PDE with a fixed parameter \(\lambda\). The large majority of methods to approximate PDEs (including traditional methods and machine learning approaches) would prove impractical if the solution of the PDE is required for many different instances of the parameter \(\lambda\); that’s where the neural operator approach offers a computational advantage.</p>
<p>Moreover, since both \(\lambda_j\) and \(u_j\) are in general functions, to work numerically with the data \(\{\lambda_j, u_j\}^n_{j=1}\) we assume access to point-wise evaluations: let \(D_j = \{x_1, \dots, x_n\}\) be an n-point discretization of the domain \(D\) and assume the functions \(\lambda_j\) and \(u_j\) can be evaluated point-wise over \(D_j\).</p>
<h3 id="heading-defining-the-neural-operator">Defining the Neural Operator</h3>
<p>As proposed in [Li], the neural operator is an iterative architecture which updates the function \(v_j: D \rightarrow R^{d_v}\). The idea is to:</p>
<ul>
<li>represent the input \(\lambda \in \Lambda\) in a higher dimensional representation with the local transformation \(P\):</li>
</ul>
<p>$$v_0(x) = P(\lambda(x))$$</p><ul>
<li>then the function \(v_j\) is updated as follows:</li>
</ul>
<p>$$v_{t+1}(x):=\sigma\left(W v_t(x)+\left(\mathcal{K}(\lambda , \theta) v_t\right)(x)\right), \quad \forall x \in D$$</p><p>where:</p>
<ul>
<li><p>\(\sigma\) is a non-linear activation function</p>
</li>
<li><p>\(W: R^{d_v} \rightarrow R^{d_v}\) is the bias term applied on the spatial domain</p>
</li>
<li><p>\(K: \Lambda \times \Theta_k \rightarrow \mathcal L\left(U\left(D, R^{d_v} \right), U\left(D, R^{d_v} \right)\right)\) is the kernel integral transformation and is parametrized by \(\theta \in \Theta_k\)</p>
</li>
</ul>
<p>The kernel integral transformation moreover is the following:</p>
<p>$$\left(\mathcal{K}(\lambda , \theta) v_t\right)(x):=\int_D \kappa(x, y, \lambda(x), \lambda(y) , \theta) v_t(y) \mathrm{d} y, \quad \forall x \in D$$</p><p>where \(\kappa\) is a neural network parametrized by \(\theta \in \Theta_k\) and represents the kernel function. It’s worth noticing that while the kernel integral transformation is linear, the architecture can learn non-linear operators thanks to the non-linear activation functions, analogously to standard neural networks.</p>
<h3 id="heading-defining-the-fourier-neural-operator">Defining the Fourier Neural Operator</h3>
<p>Let \(\mathcal F\) be the Fourier transform of a function \(f: D \rightarrow R^{d_v}\) and let \(\mathcal F^{-1}\) be the inverse of the Fourier transform. By imposing (i.e. making \(k\) a convolutional operator):</p>
<p>$$k(x,y,\lambda(x), \lambda(y), \theta) = k(x-y, \theta)$$</p><p>the kernel integral transformation becomes:</p>
<p>$$\left(\mathcal{K}(\lambda, \theta) v_t\right)(x)=\mathcal{F}^{-1}\left(\mathcal{F}\left(\kappa_\theta\right) \cdot \mathcal{F}\left(v_t\right)\right)(x), \quad \forall x \in D$$</p><p>and if the parametrization \(k_\theta\) happens directly in Fourier space:</p>
<p>$$\left(\mathcal{K}(\lambda, \theta) v_t\right)(x)=\mathcal{F}^{-1}\left(R_\theta \cdot \mathcal{F}\left(v_t\right)\right)(x), \quad \forall x \in D$$</p><p>as shown in the following picture (taken from [3]):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735422870702/64fb36cd-1ae9-4eb3-acbd-14db2cd0066b.png" alt class="image--center mx-auto" /></p>
<p>If we enforce that \(k_\theta\) is periodic, it admits a Fourier series expansion, which can be truncated at a maximum number of modes \(m_{\text{max}}\) and therefore \(R_\theta\) can be parametrized with a \((m_{\text{max}} \times d_v \times d_v)\)-tensor.</p>
<p>Furthermore, once \(D \) is discretized in \(n\) points, \(v_t \in R^{n\times d_v}\). Moreover, since \(v_t\) convolves with a function with only \(m_{\text{max}}\) modes, we can truncate the highest modes in order to have \(\mathcal F(v_t) \in C^{m_{\text{max}} \times d_v}\).</p>
<p>Therefore, the multiplication for the weight tensor \(R_\theta \in C^{m_{\text{max}} \times d_v \times d_v}\) is:</p>
<p>$$\left(R_\theta\cdot\mathcal F v_t\right)_{m, l} = \sum_{j=1}^{d_v}R_{m, l, j}(\mathcal F v_t)_{m, j}, \quad m=1,\dots, m_\text{max}, \quad l = 1, \dots, d_v$$</p><h3 id="heading-invariance-to-discretization">Invariance to discretization</h3>
<p>It’s worth noticing that the Fourier layers are discretization-invariant, since they can learn from and evaluate functions which are discretized in an arbitrary way; this allows zero-shot super-resolution.</p>
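<p>To make the construction above concrete, the following is a minimal NumPy sketch of a single 1-D Fourier layer (the function name, the shapes and the choice of ReLU are my own illustrative assumptions, not a reference implementation):</p>
<pre><code class="lang-python">import numpy as np

def fourier_layer(v, R, W, m_max):
    # v: (n, d_v) samples of v_t on n grid points; R: (m_max, d_v, d_v) complex
    # spectral weights; W: (d_v, d_v) pointwise linear term.
    n = v.shape[0]
    v_hat = np.fft.rfft(v, axis=0)[:m_max]           # keep the lowest m_max modes
    # Mode-wise multiplication (R . F v)_{m,l} = sum_j R_{m,l,j} (F v)_{m,j}
    out_hat = np.einsum("mlj,mj-&gt;ml", R, v_hat)
    pad = np.zeros((n // 2 + 1 - m_max, v.shape[1]), dtype=complex)
    conv = np.fft.irfft(np.concatenate([out_hat, pad]), n=n, axis=0)
    return np.maximum(v @ W.T + conv, 0.0)           # sigma(W v_t + K v_t)

# The same layer evaluated on two different discretizations of [0, 1]:
d_v, m_max = 4, 8
rng = np.random.default_rng(0)
R = rng.normal(size=(m_max, d_v, d_v)) + 1j * rng.normal(size=(m_max, d_v, d_v))
W = rng.normal(size=(d_v, d_v))
for n in (64, 256):
    x = np.linspace(0, 1, n, endpoint=False)
    v = np.stack([np.sin(2 * np.pi * (k + 1) * x) for k in range(d_v)], axis=1)
    print(n, fourier_layer(v, R, W, m_max).shape)    # (64, 4) and (256, 4)
</code></pre>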
<h2 id="heading-accelerating-the-fourier-neural-operator-quantum-fourier-operator">Accelerating the Fourier Neural Operator: Quantum Fourier Operator</h2>
<p>Since the weight tensor contains \(m_{\text{max}}&lt; n\) modes and since the complexity of the mode-wise product is \(O(m_\text{max})\), the most relevant computational cost comes from the Fourier transform and its inverse. The Fourier transform costs \(O(n^2)\) in general, but since the model deals with truncated series the cost is actually \(O(nm_{\text{max}})\). Therefore, substituting the Fourier transform with the fast Fourier transform (FFT), assuming a uniform discretization, can provide a speedup, since the complexity of the FFT is \(O(n \log n)\).</p>
<p>Another direction is to exploit a quantum-enhanced method based on the Quantum Fourier Transform to obtain more efficient Fourier layers.</p>
<h3 id="heading-data-encoding-in-the-unary-basis">Data encoding in the unary basis</h3>
<p>The idea underlying Quantum Fourier Operator (QFO) is to substitute the Fourier layers defined above with a new layer exploiting quantum algorithms. Of course, to make this possible the matrix \(P(\lambda(x))\) (which I’ll refer to as \(A\) from now on) has to be encoded in a quantum state to serve as the input of the new Fourier layer. The idea is to encode the data according to amplitude-encoded states, choosing as basis \(\ket {e_i}\) the quantum states with Hamming weight 1:</p>
<p>$$\ket {e_i} = \ket {0\dots010\dots0}$$</p><p>Therefore given a generic \(R^{n\times m} \) matrix \(M\), its quantum encoding is:</p>
<p>$$\ket{M}=\frac 1{|M|}\sum_{i=1}^n \sum_{j=1}^m a_{i,j} \ket{e_i}\ket{e_j }$$</p><p>The circuit to load such a state was developed in [7]; assuming ideal connectivity, it has depth \(O(\log(m) + 2m \log(n))\).</p>
<h3 id="heading-unary-qft">Unary QFT</h3>
<p>Inspired by the butterfly-shaped diagram of the FFT, one can define a unitary which performs the quantum analogue of the FFT on the unary basis, whose matrix is:</p>
<p>$$F_n=\frac 1 {\sqrt n}\left(\begin{array}{ccccc}1 &amp; 1 &amp; 1 &amp; \cdots &amp; 1 \\ 1 &amp; \omega &amp; \omega^2 &amp; \cdots &amp; \omega^{(n-1)} \\ 1 &amp; \omega^2 &amp; \omega^4 &amp; \cdots &amp; \omega^{2(n-1)} \\ \vdots &amp; \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\ 1 &amp; \omega^{n-1} &amp; \omega^{2 n-2} &amp; \cdots &amp; \omega^{(n-1)^2}\end{array}\right)$$</p><p>where \(\omega^k = e^{i\frac{2\pi k}{n}}\).</p>
<p>Such a transformation can be implemented using phase gates and RBS gates as shown in the following picture (picture from [1]), and the depth of the resulting circuit is \(O(\log n)\):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735430630060/6768521c-ad76-4000-b12b-58aa074439d1.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-trainable-linear-transform-with-quantum-orthogonal-layers">Trainable linear transform with Quantum Orthogonal Layers</h3>
<p>It’s now necessary to define the quantum analogue of the learnable part of the classical Fourier layer and to perform some matrix multiplications. Quantum Orthogonal Layers (from [8]) are a natural choice, being parametrized, Hamming-weight-preserving transformations (a property that must be preserved, since the inverse unary QFT only works on the unary basis). Several circuits of this kind exist; the butterfly circuit (which has the same layout as the one used for the unary QFT and is represented in the following picture) is chosen, having \(O(n\log n) \) parametrized gates, where \(n \) is the dimension of the input vector.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735431323901/ffdea70a-1409-43e7-a8d2-2e791c939a40.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-quantum-fourier-layer">Quantum Fourier Layer</h3>
<p>Based on the above building blocks, one can define 3 quantum circuits to substitute the classical Fourier layer:</p>
<ul>
<li><p>the sequential circuit</p>
</li>
<li><p>the parallel circuit</p>
</li>
<li><p>the composite circuit</p>
</li>
</ul>
<p>The goal of all those circuits is to reproduce the result of the classical Fourier layer, which in quantum formalism is (assuming the input matrix \(A\) was normalized):</p>
<p>$$\ket y = \sum_i \ket{y_i}\ket{e_i} = \sum_i \sum_j y_{i j}\ket{e_i}\ket{e_j}$$</p><p>where:</p>
<p>$$y_{i,j} = IFT\left(\left[ w_{il}m_{il}, m_{ik}\right]\right)_j$$</p><p>with:</p>
<ul>
<li><p>\(w\) being an element of \(W\)</p>
</li>
<li><p>\(m\) being an element of \(A\)</p>
</li>
</ul>
<h4 id="heading-the-sequential-quantum-fourier-layer">The sequential Quantum Fourier Layer</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735432073899/8f9c0eb8-57c5-467d-a3d0-8913a26cfe3e.png" alt class="image--center mx-auto" /></p>
<p>The sequential circuit starts by encoding the input matrix \(A\), resulting in:</p>
<p>$$\ket{\psi_0}= \sum_i \sum_j a_{ij}\ket{e_i}\ket{e_j }$$</p><p>Then to \(\ket{\psi_0}\) the Unary-QFT is applied on the second register:</p>
<p>$$\ket{\psi_1}= \sum_i \ket{e_i}\text{QFT}(\sum_j a_{ij}\ket{e_j }) =\sum_i \ket{e_i}(\sum_j \hat a_{ij}\ket{e_j })$$</p><p>where \(\hat a_{ij}\) is the row-wise Fourier transform of \(A\).</p>
<p>After that, the trainable linear transform with quantum orthogonal layers, made of \(K\) matrix multiplications, has to be applied. Using the circuit depicted above (the butterfly circuit), in this sequential approach the \(K\) parametrized quantum circuits \(P_1, \dots, P_K\) are applied sequentially on the first register.</p>
<p>After this, the inverse unary QFT is applied, resulting in the desired state.</p>
<h4 id="heading-the-parallel-quantum-fourier-layer">The parallel Quantum Fourier Layer</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735433269557/f7a6f271-7023-420d-95b2-f6175e8874f9.png" alt class="image--center mx-auto" /></p>
<p>For the sequential QFL, the depth complexity of the learnable part is linear in the number of modes, which might eventually hinder learning because of the multiplicative noise model of NISQ machines. To reduce the depth complexity and to make the algorithm more noise-resistant, an interesting modification is to parallelise the butterfly circuits.</p>
<h4 id="heading-the-composite-quantum-fourier-layer">The composite Quantum Fourier Layer</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735433329466/7f82f848-d35a-43ad-bf40-bbebfce5e600.png" alt class="image--center mx-auto" /></p>
<p>The parallelized quantum circuit discussed in the above section, however, requires \(K\times (d_v + n)\) qubits, where \(K \) is the number of modes, which might end up being more than the available qubit resources.</p>
<p>One can instead replace the \(K\) parametrized circuits with a single parametrized circuit \(B\), as long as this new subcircuit is Hamming-weight preserving and is built as:</p>
<p>$$B = \bigotimes_i B_i$$</p><p>where \(B_i\) corresponds to the block-diagonal unitary for the subspace with Hamming weight \(i\).</p>
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>If you have any questions or suggestions about what I covered in this article, please add them as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here.</strong></a></p>
<h2 id="heading-sources">Sources:</h2>
<ol>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2306.15415">https://arxiv.org/pdf/2306.15415</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/2108.08481">https://arxiv.org/pdf/2108.08481</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/2010.08895">https://arxiv.org/pdf/2010.08895</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/1910.03193">https://arxiv.org/pdf/1910.03193</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2005.03180">https://arxiv.org/pdf/2005.03180</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2009.11992">https://arxiv.org/pdf/2009.11992</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2012.04145">https://arxiv.org/pdf/2012.04145</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2212.07389">https://arxiv.org/pdf/2212.07389</a></p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Solving linear systems of equations using quantum computers]]></title><description><![CDATA[Linear systems of equations lie at the heart of many scientific and engineering problems, from machine learning to optimization and physics simulations. Classical methods like Gaussian elimination or iterative methods are powerful but can be ineffici...]]></description><link>https://amm.zanotp.com/hhl</link><guid isPermaLink="true">https://amm.zanotp.com/hhl</guid><category><![CDATA[hhl]]></category><category><![CDATA[linear-equations]]></category><category><![CDATA[quantum computing]]></category><category><![CDATA[linear algebra ]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Mon, 30 Sep 2024 06:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/FPskb1X15wk/upload/a2049950b3959480bac7c38d415019fa.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Linear systems of equations lie at the heart of many scientific and engineering problems, from machine learning to optimization and physics simulations. Classical methods like Gaussian elimination or iterative methods are powerful but can be inefficient for large, complex systems.</p>
<p>In this blog post, I will explore one of the most famous quantum algorithms (called HHL), which offers a potential speedup in solving linear systems. I will delve into its complexity and underlying assumptions and describe two interesting applications.</p>
<h2 id="heading-quantum-linear-system-problem-versus-linear-system-problem">Quantum linear system problem versus linear system problem</h2>
<p>In order to better understand the limitations of quantum algorithms like HHL, it’s essential to distinguish between a Quantum Linear System Problem (QLSP) and a classical Linear System Problem (LSP).</p>
<p>A typical linear system problem (LSP) is represented as:</p>
<p>$$Ax=b$$</p><p>where:</p>
<ul>
<li><p>\(A\) is a matrix</p>
</li>
<li><p>\(b\) is a known vector</p>
</li>
<li><p>\(x\) is the unknown vector we aim to solve for</p>
</li>
</ul>
<p>On the other hand, a QLSP deals with a quantum state version of the same concept, represented as:</p>
<p>$$A\ket x = \ket b$$</p><p>where:</p>
<ul>
<li><p>\(A\) is still a matrix</p>
</li>
<li><p>\(\ket b\) is a known quantum state</p>
</li>
<li><p>\(\ket x\) is the unknown quantum state we wish to find</p>
</li>
</ul>
<p>Although both problems appear similar, the difference lies in how the information is represented and manipulated. In a classical system, the vector \(b\) is readily available, and solving for \(x\) gives a concrete solution that can be directly used. In contrast, in the quantum setting, \(\ket b\) is a quantum state, and the solution \(\ket x\) is also a quantum state. The main challenge here is that quantum states aren’t directly accessible (any measurement of \(\ket x\) collapses the state and only provides a probabilistic result), which means that extracting useful information from the quantum solution requires multiple measurements or sophisticated post-processing.</p>
<p>Understanding these differences is crucial when assessing the complexity and feasibility of quantum solvers such as HHL, particularly when applied to real-world problems where error correction and measurement limitations play a significant role.</p>
<h2 id="heading-hhl">HHL</h2>
<p>In this section, we introduce the Harrow-Hassidim-Lloyd (HHL) algorithm, one of the most interesting applications of the quantum phase estimation algorithm, which can be used to “solve” sparse linear systems, i.e. systems involving a matrix in which most of the elements are zero.</p>
<p>$$HHL: \ket b \rightarrow \ket {A^{-1}b}$$</p><p>In the next section the following assumptions will be true:</p>
<ul>
<li><p>\(A\) is a sparse and Hermitian matrix</p>
</li>
<li><p>the quantum state \(\ket b\) doesn’t have to be implemented from \(b\)</p>
</li>
<li><p>the problem requires to find \(\ket x\) instead of \(x\)</p>
</li>
</ul>
<p>and the next sections deal with:</p>
<ul>
<li><p>the Quantum Phase Estimation algorithm</p>
</li>
<li><p>the workflow of HHL</p>
</li>
<li><p>complexity analysis of the HHL algorithm</p>
</li>
<li><p>what happens to the quantum advantage when the above assumptions fail to hold</p>
</li>
<li><p>a brief discussion of a couple of noteworthy applications of HHL</p>
</li>
</ul>
<p>Please also note that many versions of the HHL algorithm have been proposed and this post only describes and deals with its simplest version.</p>
<h3 id="heading-background-quantum-phase-estimation">Background: Quantum Phase Estimation</h3>
<p>One of the most useful quantum subroutines, called Quantum Phase Estimation, aims to estimate the phase \(\phi\) of an eigenvalue \(e^{2i\pi\phi}\), associated with the corresponding eigenvector \(\ket \psi\), of a unitary operator \(U\).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727926937112/b226c02d-ddeb-41ee-ad7c-0f2b71b7e1c1.png" alt class="image--center mx-auto" /></p>
<p>The QPE algorithm, depicted in the circuit above, shares similarities with Shor’s algorithm, which can be seen as a specific application of QPE to integer factorization. The goal of QPE is to encode an estimate of the phase \(\phi\) in a binary representation like:</p>
<p>$$\phi = 0.\phi_1\phi_2\dots\phi_{n-1}\phi_n$$</p><p>QPE achieves this by phase-encoding the binary representation of \(\phi\) using controlled \(U\) gates, in order to get the following state:</p>
<p>$$\left(\bigotimes_{j=1}^n \frac 1{\sqrt 2}(\ket 0 + e^{2i\pi0.\phi_j\dots \phi_n}\ket 1)\right) \otimes \ket\psi$$</p><p>It then applies the inverse of the Quantum Fourier Transform to go from phase space to state space; before measuring, the result is:</p>
<p>$$\left(\bigotimes_{j=1}^n \ket {\phi_j}\right) \otimes \ket\psi = \ket {\hat \phi} \otimes \ket \psi$$</p><p>where \(\hat \phi\) is the estimation of \(\phi\).</p>
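<p>A tiny state-vector sketch of the idea in NumPy (the phase is chosen to be exactly representable with \(t=5\) bits, and the FFT plays the role of the inverse QFT; this is an illustration, not a circuit simulation):</p>
<pre><code class="lang-python">import numpy as np

t, phi = 5, 0.3125                      # phi = 0.01010 in binary, exact with 5 bits
# After the Hadamards and the controlled powers of U, the t-qubit register
# holds the amplitudes exp(2*pi*i*phi*k) / sqrt(2^t) on each basis state k.
k = np.arange(2 ** t)
amp = np.exp(2j * np.pi * phi * k) / np.sqrt(2 ** t)
# The inverse QFT maps this to a peak at the integer closest to phi * 2^t;
# with NumPy's sign convention, np.fft.fft implements exactly that map.
est = np.fft.fft(amp) / np.sqrt(2 ** t)
print(np.argmax(np.abs(est)) / 2 ** t)  # 0.3125, the recovered phase
</code></pre>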
<h3 id="heading-hhl-workflow">HHL workflow</h3>
<p><img src="https://www.researchgate.net/publication/358996216/figure/fig2/AS:1139147511205889@1648605339927/Quantum-circuit-of-the-HHL-algorithm.png" alt="Quantum circuit of the HHL algorithm | Download Scientific Diagram" class="image--center mx-auto" /></p>
<p>The above <a target="_blank" href="https://www.researchgate.net/figure/Quantum-circuit-of-the-HHL-algorithm_fig2_358996216">picture</a> depicts the circuit of the HHL algorithm. One may notice that the algorithm can be broken down into 3 parts mainly:</p>
<ul>
<li><p>a QPE</p>
</li>
<li><p>a controlled rotation</p>
</li>
<li><p>an inverse QPE</p>
</li>
</ul>
<p>Assuming the input state \(\ket b\) is already prepared, the first block is used to estimate the phases, determined by the eigenvalues \(\{\lambda_i\}\) of \(A\), of the unitary \(U = e^{i tA}\), and the approximate result is stored in the middle register.</p>
<p>At the end of the QPE, what we have is:</p>
<p>$$\ket 0 \otimes\left(\sum_i a_i \ket {u_i} \otimes \ket{\hat\lambda_i}\right)$$</p><p>where \(\sum_i a_i\ket {u_i}\) is \(\ket b\) expressed in terms of the eigenvectors \(\ket {u_i}\) of \(U\), and \(\hat \lambda_i\) is the binary approximation of the corresponding eigenvalue \(\lambda_i\).</p>
<p>Then a controlled rotation gate is applied, which corresponds to the following transformation:</p>
<p>$$\ket 0 \otimes\left( \sum_i a_i \ket {u_i} \otimes \ket{\hat\lambda_i}\right) \rightarrow \sum_i a_i\left(\sqrt{1-\left(\frac c{\hat\lambda_i}\right)^2}\ket 0 + \frac c{\hat\lambda_i} \ket 1\right)\otimes \ket {u_i} \otimes \ket{\hat\lambda_i}$$</p><p>where \(c\) is a normalization constant; note that the rotation angle depends on the eigenvalue register, so it stays inside the sum.</p>
<p>The last block, the inverse QPE, is used to go from the state above to:</p>
<p>$$\sum_i a_i\ket {q_i}\otimes \ket {u_i} \otimes \ket{\hat\lambda_i} \rightarrow \sum_i a_i\ket {q_i}\otimes \ket {u_i} \otimes \ket{0}$$</p><p>where \(\ket {q_i} \equiv \sqrt{1-\left(\frac c{\hat\lambda_i}\right)^2}\ket 0 + \frac c{\hat\lambda_i} \ket 1\), i.e. the inverse QPE uncomputes the eigenvalue register.</p>
<p>Notably, if the first qubit (the top register) is measured we have two cases:</p>
<ul>
<li>if the top qubit collapses into 1: the bottom register is left in the state:</li>
</ul>
<p>$$\propto\sum_i a_i \frac c{\hat\lambda_i}\ket {u_i}$$</p><p>which is proportional to \(\ket {A^{-1}b}\) because of the spectral decomposition of \(A\).</p>
<p>In fact \(A = \sum_i \lambda_i u_iu_i^\dagger\) and (by the properties of the spectral decomposition) \(A^{-1} = \sum_i \lambda_i^{-1} u_iu_i^\dagger\), hence \(A^{-1}b  = \sum_i a_i \lambda_i^{-1} u_i\), since the eigenvectors are orthonormal (\(u_i^\dagger u_j = \delta_{ij}\)). A quick numerical check of this step is given after the list below.</p>
<ul>
<li>if the top qubit collapses into 0, one may run the program again</li>
</ul>
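<p>Here is the promised check of the spectral-decomposition step, as a classical NumPy sketch with a random Hermitian \(A\) (no quantum circuit involved):</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (X + X.conj().T) / 2                 # random 4 x 4 Hermitian matrix
b = rng.normal(size=4)

lam, U = np.linalg.eigh(A)               # columns of U are the eigenvectors u_i
a = U.conj().T @ b                       # coefficients a_i of b in the eigenbasis
x_spec = U @ (a / lam)                   # sum_i a_i / lambda_i * u_i
print(np.allclose(x_spec, np.linalg.solve(A, b)))  # True
</code></pre>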
<h3 id="heading-complexity-analysis">Complexity analysis</h3>
<p>Let:</p>
<ul>
<li><p>\(k \) the condition number (the ratio of the largest and smallest absolute values of the eigenvalues of \(A\))</p>
</li>
<li><p>\(\epsilon \) the error from the output state \(\ket {A^{-1}b}\)</p>
</li>
<li><p>\(s\) the maximum number of non-zero elements in each row of the matrix \(A\)</p>
</li>
<li><p>\(N\) the size of the matrix</p>
</li>
</ul>
<p>Simulating \(e^{-iAt}\), which is required in the QPE step, can be done with error \(\epsilon\) in \(O(\log(N)s^2t\epsilon^{-1})\) if \(A\) is \(s\)-sparse. One may then perform \(O(k)\) Quantum Amplitude Amplification repetitions to amplify the probability of measuring \(1\), since \(c=O(\frac 1k)\) and, if \(\lambda \leq 1\), the probability of measuring \(1\) is \(\Omega(\frac 1{k^2})\).</p>
<p>Putting it all together, the computational complexity of the original HHL algorithm is:</p>
<p>$$O(\log(N)k^2s^2\epsilon^{-1})$$</p><p>however many improvements have been made and the computational complexity of the currently most efficient HHL algorithm is:</p>
<p>$$O\left(poly(\log(sk\epsilon^{-1}))sk\right )$$</p><p>and if we assume \(s = O(poly\left(\log(N)\right))\), the algorithm (focusing only on \(N\)) runs in:</p>
<p>$$O(poly\left(\log(N)\right))$$</p><p>which represents an exponential speedup in the matrix dimension compared to the best conjugate gradient method, whose complexity is:</p>
<p>$$O \left(Nsk\log\left(\frac 1\epsilon\right)\right)$$</p><p>However, this holds under very specific assumptions, and the next section deals with what happens if some of them are not met.</p>
<h3 id="heading-loss-of-quantum-advantage-and-near-term-feasibility-of-hhl">Loss of quantum advantage and near term feasibility of HHL</h3>
<p>The computational complexity above is based on the assumptions that:</p>
<ul>
<li><p>\(\ket b\) is already available</p>
</li>
<li><p>the cost of reading out \(\ket {A^{-1}b} \) is not accounted for</p>
</li>
</ul>
<p>Note that if this input/output overhead takes \(O(N)\), the exponential speedup is lost.</p>
<p>The computational cost of encoding \(b\) in \(\ket b\) is:</p>
<p>$$O(N)$$</p><p>if \(b\) is a simple bitstring and in general is:</p>
<p>$$O\left(2^N\right)$$</p><p>for a generic superposition, which results in the loss of the exponential speedup.</p>
<p>Moreover, also reading out the output solution state \(\ket {A^{-1}b}\) into a classical bitstring \(A^{-1}b\) requires \(O(N)\), offsetting the exponential acceleration.</p>
<h3 id="heading-hhl-in-solving-linear-differential-equations">HHL in solving linear differential equations</h3>
<p>One of the main applications of the HHL algorithm is solving linear differential equations. Quantum computers in fact can simulate quantum systems (which are described by a restricted type of linear differential equations), and using HHL it’s possible to solve general inhomogeneous sparse linear differential equations.</p>
<p>A first-order ordinary differential equation may be written as:</p>
<p>$$\frac {\partial x(t)}{\partial t}=A(t)x(t) + b(t)$$</p><p>where \(A(t)\) is an \(N\times N\) matrix we assume to be sparse and \(x(t)\) and \(b(t)\) are \(N\)-component vectors.</p>
<p>A similar system can be the output of a conversion process from any linear differential equation with higher-order derivatives or from the discretization of partial differential equations.</p>
<p>A bunch of different methods involving HHL can be used to solve the above DE; however, the workflow is roughly the same:</p>
<ul>
<li><p>discretize the differential equation and get a system of algebraic equation</p>
</li>
<li><p>use HHL to find the solution of the system</p>
</li>
</ul>
<p>In fact, one may apply a discretization scheme to the DE, for example the Euler method, to map the DE to a difference equation:</p>
<p>$$\frac{x_{i+1} - x_i}h= A(t_i)x_i + b(t_i)$$</p><p>and it is straightforward to see that this method results in the following linear system:</p>
<p>$$Ax=b$$</p><p>where \(x\) is the vector of blocks \(x_i\), and \(b\) also contains the value of \(x_0\).</p>
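<p>The following NumPy sketch makes the construction explicit for a toy \(2\times2\) system (the choice of \(A(t)\), the step size and the horizon are illustrative assumptions; classically one calls a linear solver where HHL would be used on a quantum computer):</p>
<pre><code class="lang-python">import numpy as np

N, steps, h = 2, 4, 0.1
A_t = lambda t: np.array([[0.0, 1.0], [-1.0, 0.0]])  # example system matrix
b_t = lambda t: np.zeros(N)                          # homogeneous for brevity
x0 = np.array([1.0, 0.0])

# The unknown vector stacks the blocks x_0, ..., x_steps; the row blocks encode
# x_0 = x0 and x_{i+1} - (I + h A(t_i)) x_i = h b(t_i) (forward Euler).
dim = N * (steps + 1)
M = np.eye(dim)
c = np.zeros(dim)
c[:N] = x0
for i in range(steps):
    r = N * (i + 1)
    M[r:r + N, r - N:r] = -(np.eye(N) + h * A_t(i * h))
    c[r:r + N] = h * b_t(i * h)

x_vec = np.linalg.solve(M, c)
print(x_vec.reshape(steps + 1, N))                   # the trajectory x_0 ... x_4
</code></pre>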
<p>To learn more about this please see <a target="_blank" href="https://arxiv.org/pdf/1010.2745">Berry, (2014), “High-order quantum algorithm for solving linear differential equations”</a>.</p>
<h3 id="heading-hhl-in-solving-least-square-curve-fitting">HHL in solving least-square curve fitting</h3>
<p>Another interesting application for HHL is least squares fitting. The goal in least squares fitting is to find a continuous function to approximate a discrete set of \(N\) points \(\{x_i, y_i\}\). The function has to be linear in the parameters \(\theta \) but can be non-linear in \(x\), e.g.:</p>
<p>$$f(\theta, x) = \sum_i \theta_if_i(x)$$</p><p>The optimal parameters can be found by minimizing an error function such as the mean squared error:</p>
<p>$$E = |y - f(\theta, x)|^2$$</p><p>which can be expressed in matrix form as:</p>
<p>$$E= |y- F\theta|^2$$</p><p>where \(F_{ij}=f_j(x_i)\). The best-fitting parameters can be found using the Moore–Penrose pseudoinverse as:</p>
<p>$$\theta^* = \left(F^\dagger F\right)^{-1}F^\dagger y$$</p><p>Finding the best \(\theta\) then involves 3 subroutines (a classical sketch of the pseudo-inverse step follows the list):</p>
<ul>
<li><p>performing the pseudo-inverse using the HHL algorithm and quantum matrix multiplication</p>
</li>
<li><p>an algorithm for estimating the fit quality</p>
</li>
<li><p>an algorithm for learning the fit-parameters \(\theta\)</p>
</li>
</ul>
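<p>As a purely classical reference point for the first subroutine, here is the pseudo-inverse step in NumPy on synthetic data (the basis functions and the noise level are illustrative assumptions):</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.05 * rng.normal(size=x.size)

# f(theta, x) = theta_1 + theta_2 x + theta_3 x^2: linear in theta, not in x.
F = np.stack([np.ones_like(x), x, x**2], axis=1)  # F_ij = f_j(x_i)
theta = np.linalg.solve(F.T @ F, F.T @ y)         # (F^dagger F)^-1 F^dagger y
print(theta)                                      # close to [1, 2, -3]
</code></pre>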
<p>To learn more about this please consider reading <a target="_blank" href="https://arxiv.org/pdf/1204.5242">Wiebe, Brown, LLoyd, (2012), “Quantum Data-Fitting“</a>.</p>
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>If you have any questions or suggestions about what I covered in this article, please add them as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here.</strong></a></p>
<hr />
<h2 id="heading-sources">Sources</h2>
<ul>
<li><p><a target="_blank" href="https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.103.150502">Harrow, Hassidim, Lloyd, (2009), “Quantum algorithm for linear systems of equations“</a></p>
</li>
<li><p><a target="_blank" href="https://epubs.siam.org/doi/10.1137/16M1087072">Childs, Kothari, Somma, (2017), “Quantum Algorithm for Systems of Linear Equations with Exponentially Improved Dependence on Precision“</a></p>
</li>
<li><p><a target="_blank" href="https://www.sciencedirect.com/science/article/pii/S037596012030462X">Duan, Yuan, Yu, Huang, Hsieh, (2020), “A survey on HHL algorithm: From theory to application in quantum machine learning”</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/2404.19067">Zheng, Liu, Stein, Li, Mulmenstadt, Chen, Li, (2024), “An Early Investigation of the HHL Quantum Linear Solver for Scientific Applications”</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/1010.2745">Berry (2014), “High-order quantum algorithm for solving linear differential equations”</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/1204.5242">Wiebe, Brown, LLoyd (2012), “Quantum Data-Fitting“</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Computing Gradients of Quantum Circuits using Parameter Shift Rule]]></title><description><![CDATA[Variational Quantum Algorithms (VQAs) are regarded as one of the most promising approaches for leveraging near-term quantum devices as they combine the power of quantum circuits with classical optimization to solve problems in chemistry, material sci...]]></description><link>https://amm.zanotp.com/computing-gradients-of-quantum-circuits-using-parameter-shift-rule</link><guid isPermaLink="true">https://amm.zanotp.com/computing-gradients-of-quantum-circuits-using-parameter-shift-rule</guid><category><![CDATA[quantum computing]]></category><category><![CDATA[Quantum Machine Learning]]></category><category><![CDATA[optimization]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Thu, 19 Sep 2024 06:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/9aCD5kzPwa8/upload/158d7639e5daed3b0852c05f04fd7545.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Variational Quantum Algorithms (VQAs) are regarded as one of the most promising approaches for leveraging near-term quantum devices as they combine the power of quantum circuits with classical optimization to solve problems in chemistry, material science, machine learning, and beyond. A key challenge in the successful implementation of VQAs, such as the Variational Quantum Eigensolver (VQE) and Quantum Approximate Optimization Algorithm (QAOA), is the ability to efficiently optimize the parameters that govern quantum circuits. This is where the shift rule plays a critical role.</p>
<p>The shift rule provides an analytical and efficient method for computing gradients of parameterized quantum circuits with respect to their parameters. These gradients are essential for guiding the classical optimizer in the search for the optimal parameters that minimize the cost function. Without an accurate and efficient gradient computation method, the optimization would either be too slow or too noisy, leading to suboptimal results. In this context, the shift rule not only enhances the accuracy of the optimization but also enables its scalability, making it a foundational tool for advancing VQA methodologies in near-term quantum computing.</p>
<p>This blog post briefly discusses VQAs and focuses on one of the various shift-rule formulations.</p>
<h2 id="heading-variational-quantum-algorithms">Variational Quantum Algorithms</h2>
<p>Variational Quantum Algorithms (VQAs) are hybrid quantum-classical approaches that aim to solve optimization problems by minimizing a cost function over a parameterized quantum circuit. The quantum circuit, often referred to as a parameterized quantum circuit (PQC), depends on a set of tunable parameters \({\theta} = (\theta_1, \theta_2, \dots, \theta_n)\). The goal is to find the optimal values of these parameters that minimize a given cost function \( C(\theta)\), which is typically the expectation value of a problem-specific quantum observable.</p>
<p>Mathematically, the cost function can be expressed as:</p>
<p>$$C(\theta) = \bra {\psi(\theta)} H\ket{\psi(\theta)}$$</p><p>where</p>
<ul>
<li><p>\(\ket {\psi(\theta)}\)is the quantum state produced by the PQC,</p>
</li>
<li><p>\(H\) is the problem Hamiltonian</p>
</li>
</ul>
<p>Two of the most famous and important VQAs are the VQE and QAOA. The next paragraphs give an introduction to both.</p>
<h3 id="heading-variational-quantum-eigensolver">Variational Quantum Eigensolver</h3>
<p>The Variational Quantum Eigensolver (VQE) is a VQA that combines the variational principle from quantum mechanics with classical optimization techniques. VQE is particularly useful for problems in quantum chemistry and materials science, where finding the ground state energy of a Hamiltonian is a central task.</p>
<p>In a VQE setting, the cost function is represented by the energy expectation value, formulated as:</p>
<p>$$E(\theta)=\bra{\psi(\theta)}H\ket{\psi(\theta)}$$</p><p>According to the <strong>variational principle</strong>, this energy is always greater than or equal to the ground state energy \(E_0\)​, meaning:</p>
<p>$$E(\theta)\geq E_0 ​$$</p><p>Thus, the goal is to minimize \(E(\theta)\) by adjusting the parameters \(\theta\) to find the state \(\ket{\psi(\theta)}\) that approximates the ground state.</p>
<p>The workflow is the following (a minimal code sketch follows the list):</p>
<ul>
<li><p>define a parametrized unitary \(U(\theta)\) to prepare \(\ket{\psi(\theta)}\)</p>
</li>
<li><h5 id="heading-measure-the-expectation-value-of-the-systems-hamiltonian-h-computing-ethetabrapsithetahketpsitheta-or-the-gradient-of-the-expectation-value-with-shift-rule-or-with-numerical-methods">measure the expectation value of the system’s Hamiltonian \(H\) computing \( E(\theta)=\bra{\psi(\theta)}H\ket{\psi(\theta)}\) or the gradient of the expectation value (with shift rule or with numerical methods)</h5>
</li>
<li><p>provide the measured expectation value \(E(\theta)\) or its gradient to a classical optimization algorithm and iterate until the minimum energy is found.</p>
</li>
</ul>
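<p>A minimal PennyLane sketch of this loop (the two-qubit Hamiltonian and the ansatz are toy assumptions chosen for brevity, not a chemistry problem):</p>
<pre><code class="lang-python">import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)
H = qml.Hamiltonian([1.0, 0.5], [qml.PauliZ(0) @ qml.PauliZ(1), qml.PauliX(0)])

@qml.qnode(dev)
def energy(theta):
    # Parametrized state preparation U(theta) applied to the all-zero state
    qml.RY(theta[0], wires=0)
    qml.RY(theta[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(H)       # E(theta), the energy expectation value

theta = np.array([0.1, 0.2], requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.2)
for _ in range(100):           # the classical optimization loop over theta
    theta = opt.step(energy, theta)
print(energy(theta))           # approximates the ground-state energy
</code></pre>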
<h3 id="heading-quantum-approximate-optimization-algorithm">Quantum Approximate Optimization Algorithm</h3>
<p>The Quantum Approximate Optimization Algorithm (QAOA) is a hybrid quantum-classical algorithm designed to solve combinatorial optimization problems, such as the Max-Cut problem. QAOA is again a VQA so it uses a parameterized quantum circuit to encode the problem and iteratively improves the solution by tuning the parameters. It has been proposed as a promising algorithm for near-term quantum devices due to its robustness against noise and hardware limitations.</p>
<p>QAOA aims to optimize a classical objective function \(C(z)\), where \(z \in \{0,1\}^n\) represents the solution to the optimization problem encoded as a binary string. The goal is to find the string \(\hat z\) that optimizes (for Max-Cut, maximizes) the objective function.</p>
<p>The algorithm constructs a quantum state that approximates the solution by evolving under two Hamiltonians: a problem Hamiltonian \(H_C\)​ and a mixer Hamiltonian \(H_M\).</p>
<p>The former is meant to represent the cost function of the classical problem, for example for the max-cut problem the classical cost function is:</p>
<p>$$C(z)= \sum_{⟨i,j⟩}w_{ij}(1−z_iz_j)$$</p><p>where \(w_{ij}\) is the weight of the edge between vertices \(i\) and \(j\), \(\langle i,j⟩\) denotes the set of all pairs of vertices that are connected by an edge and \(z_i\) is the binary variable representing the partition of vertex \(i\).</p>
<p>The corresponding problem Hamiltonian is formulated in terms of quantum operators as:</p>
<p>$$H_C = \sum_{⟨i,j⟩}w_{ij}\frac {1-Z_iZ_j} 2$$</p><p>where \(Z_i\) is the Pauli-Z operator acting on qubit \(i\).</p>
<p>Coming to the mixer Hamiltonian, it should encourage transitions between different solutions by applying a mixing operator that typically consists of Pauli-X gates. The mixer Hamiltonian can be defined as:</p>
<p>$${H}_M = \sum_{i} {X}_i$$</p><p>where \(X_i\) is the Pauli-X operator acting on qubit \(i\), responsible for flipping the bit \(z_i\).</p>
<p>The QAOA workflow alternates between evolving under the problem and mixer Hamiltonians for a series of \(p\) layers, which results in the following quantum state:</p>
<p>$$\ket{ψ(γ,β)}=∏_{j=1}^p​e^{−iβ_j​H_M}​e^{-iγ_j​H_C}​H^{⊗n}\ket0^{⊗n}$$</p><p>where:</p>
<ul>
<li><p>\(\gamma\) are the parameters controlling the evolution under the problem Hamiltonian</p>
</li>
<li><p>\(\beta\) are the parameters controlling the evolution under the mixer Hamiltonian</p>
</li>
<li><p>\(H^{⊗n}\ket0^{⊗n}\) is the n-fold Hadamard applied on an n-qubit register.</p>
</li>
</ul>
<p>The goal of QAOA is to optimize the parameters \(\gamma\) and \(\beta\) such that the quantum state maximizes the expectation value of the objective function \(C(\gamma, \beta)\), which corresponds to the expectation value of the problem Hamiltonian:</p>
<p>$$C(\gamma, \beta)=\bra{ψ(γ,β)} H_C \ket{ψ(γ,β)}$$</p><p>And similarly to VQE, by iteratively updating \(\gamma\) and \(\beta\) using a classical optimizer, the algorithm converges to an approximate solution to the optimization problem.</p>
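<p>A minimal PennyLane sketch for Max-Cut on a triangle graph with unit weights (the graph, the depth \(p=1\) and the step size are illustrative assumptions; \(\gamma\) enters only up to a rescaling, which the optimizer absorbs):</p>
<pre><code class="lang-python">import pennylane as qml
from pennylane import numpy as np

edges = [(0, 1), (1, 2), (0, 2)]
dev = qml.device("default.qubit", wires=3)

# C(gamma, beta) = sum over edges of (1 - ZZ)/2 = 1.5 - 0.5 * (sum of ZZ terms)
H_zz = qml.Hamiltonian([0.5] * 3, [qml.PauliZ(i) @ qml.PauliZ(j) for i, j in edges])

@qml.qnode(dev)
def zz_part(params):
    gamma, beta = params
    for w in range(3):                  # uniform superposition over bitstrings
        qml.Hadamard(wires=w)
    for i, j in edges:                  # cost layer, a rescaled exp(-i gamma H_C)
        qml.IsingZZ(2 * gamma, wires=[i, j])
    for w in range(3):                  # mixer layer exp(-i beta H_M)
        qml.RX(2 * beta, wires=w)
    return qml.expval(H_zz)

def neg_expected_cut(params):           # maximizing C means minimizing -C
    return zz_part(params) - 1.5

params = np.array([0.5, 0.5], requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.1)
for _ in range(60):
    params = opt.step(neg_expected_cut, params)
print(1.5 - zz_part(params))            # expected cut value (the optimum is 2)
</code></pre>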
<h2 id="heading-gradient-computation-using-the-shift-rule">Gradient Computation Using the Shift Rule</h2>
<p>As we have seen in the above examples, VQAs sometimes require evaluating the gradient of the cost function rather than the function itself. The shift rule is useful because it avoids the approximations typically used in numerical differentiation (like finite differences).</p>
<p>The next sections describe one of the formulations proposed for the shift rule.</p>
<p>We are interested in the expectation value of an observable \(H\) (e.g., a Hamiltonian) with respect to the quantum state \(\ket{\psi(\theta)}\), which is the state prepared by applying \(U(\theta)\) to the initial state \(\ket0^{\otimes n}\). The expectation value is:</p>
<p>$$C(θ)=\bra{ψ(θ)}H\ket{ψ(θ)}=\bra0U^\dagger(θ)HU(θ)\ket0$$</p><p>The goal is to compute the derivative w.r.t. \(\theta\) of \(C(θ)\), so that this information can be supplied to a classical optimizer.</p>
<p>The derivative is</p>
<p>$$\frac{∂C(θ)}{∂θ_i}​=\frac{\partial \bra 0 U^\dagger(θ)\, H\, U(θ)\ket 0}{\partial \theta_i}= \bra 0 U^\dagger(θ)\, H \left(\frac{\partial U(θ)}{\partial \theta_i}\right)\ket 0+ \bra 0\left(\frac{\partial U(θ)}{\partial \theta_i}\right)^\dagger H\, U(θ)\ket 0$$</p><p>assuming the parameter \(\theta_i\) only affects a single gate.</p>
<h3 id="heading-parameter-shift-rule-for-gates-with-generators-with-two-distinct-eigenvalues">Parameter-shift rule for gates with generators with two distinct eigenvalues</h3>
<p>If we are given a parametrized quantum gate \(U \) with a parameter \(\theta\) of the form:</p>
<p>$$U(θ_k)=e^{−iθ_kG}$$</p><p>where \(G\) is a Hermitian operator, it’s trivial to prove that:</p>
<p>$$\frac{\partial U(θ_k)}{\partial \theta_k}=-iGU(\theta_k)$$</p><p>and substituting into the derivative of the circuit equation what we get is:</p>
<p>$$\frac{∂C(θ)}{∂θ_k}​= -i\bra\psi U^\dagger(\theta_k)\, HG\, U(\theta_k) \ket\psi + i \bra\psi U^\dagger(\theta_k)\, GH\, U(\theta_k) \ket\psi$$</p><p>Now suppose \(G\) has at most two distinct eigenvalues \(\pm r\).</p>
<p>Moreover, one may prove with some algebra that for any operators \(A\), \(B\) and Hermitian observable \(Q\):</p>
<p>$$\bra \psi A^\dagger Q B\ket \psi + \bra \psi B^\dagger Q A\ket \psi = \frac 12\left(\bra \psi (A+B)^\dagger Q (A+B)\ket \psi - \bra \psi (A-B)^\dagger Q (A-B)\ket \psi\right)$$</p><p>therefore, using \(A = I\) and \(B = -ir^{-1}G\) in our problem:</p>
<p>$$\frac{∂C(θ)}{∂θ_k}​= \frac r2\left(\bra \psi (I-ir^{-1}G)^\dagger H (I-ir^{-1}G)\ket \psi - \bra \psi (I+ir^{-1}G)^\dagger H(I+ir^{-1}G)\ket \psi\right)$$</p><p>It is also possible to show that for such special \(G\):</p>
<p>$$U\left(\frac \pi {4r}\right) = \frac{I-ir^{-1}G}{\sqrt{2}}$$</p><p>Hence the partial derivative of the cost function can be estimated by placing either the gate \(U(\frac \pi{4r})\) or \(U(-\frac \pi{4r})\) after the gate to be differentiated.</p>
<p>Moreover, since:</p>
<p>$$U(a)U(b) = U(a+b)$$</p><p>for a one-parameter gate generated by \(G\), by substitution this leads to the parameter shift rule:</p>
<p>$$\frac{∂C(θ)}{∂θ_k}​= r\left(\bra\psi U^\dagger(\theta_k +s)\, H\, U(\theta_k +s) \ket\psi - \bra\psi U^\dagger(\theta_k -s)\, H\, U(\theta_k -s) \ket\psi\right)$$</p><p>which is equivalent to:</p>
<p>$$\frac{∂C(θ)}{∂θ_k}​= r(C(\theta +s) - C(\theta-s))$$</p><p>where \(s = \frac{\pi}{4r}\) and the shift is applied to the \(k\)-th parameter only.</p>
<p>If the parameter \(\theta_k\) appears in more than a single gate in the circuit, the derivative is obtained using the product rule by shifting the parameter in each gate separately and summing the results.</p>
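<p>The rule is easy to verify numerically. The sketch below uses plain NumPy with an \(R_x\) rotation and the observable \(H = Z\), both chosen purely for illustration, and compares the shifted-circuit estimate with the analytic derivative:</p>
<pre><code class="lang-plaintext">import numpy as np

# for U(theta) = exp(-i theta X/2) the generator G = X/2 has
# eigenvalues +-1/2, hence r = 1/2 and s = pi / (4r) = pi / 2
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
r, s = 0.5, np.pi / 2

def U(theta):
    # exp(-i theta X / 2) written in closed form
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * X

def C(theta):
    # expectation value of Z in the state obtained by rotating |0>
    psi = U(theta) @ np.array([1, 0], dtype=complex)
    return (psi.conj() @ Z @ psi).real

theta = 0.7
shift_grad = r * (C(theta + s) - C(theta - s))
exact_grad = -np.sin(theta)  # analytic derivative of C(theta) = cos(theta)
print(shift_grad, exact_grad)  # the two numbers coincide
</code></pre>
<p>Since \(G = X/2\) has eigenvalues \(\pm\frac 12\), we have \(r = \frac 12\) and \(s = \frac \pi 2\), recovering the familiar \(\pm\frac \pi 2\) shifts.</p>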
<h3 id="heading-differentiation-of-general-gates-via-linear-combination-of-unitaries">Differentiation of general gates via linear combination of unitaries</h3>
<p>If \(U(\theta) \) doesn’t have the above-mentioned form, we may evaluate the partial derivative of the cost function using an ancilla qubit. The idea is to express the derivative of \(U(\theta)\) as a linear combination of unitary matrices \(A_j\), i.e. \(\partial_{\theta_k}U = \sum_j a_j A_j\).</p>
<p>The derivative then becomes:</p>
<p>$$\partial_{\theta_k} C(\theta) = \sum_j a_j (\bra \psi U^\dagger H A_j\ket \psi+ \bra \psi A_j^\dagger H U\ket \psi)$$</p><p>where \(a_j\) are real values.</p>
<p>The circuit must be initialized in the state:</p>
<p>$$\ket + \ket \psi$$</p><p>then the controlled \(U\) is applied to the \(\ket \psi\) register (conditioned on the ancilla being in the \(\ket 0\) state), followed by a controlled \(A_k\) on the same register (conditioned on the ancilla being in the \(\ket 1\) state), which results in:</p>
<p>$$\frac 1{\sqrt{2}}(\ket 0 U\ket\psi + \ket 1 A_k\ket\psi)$$</p><p>Once another Hadamard gate is applied on the ancilla, the resulting state is:</p>
<p>$$\frac 12 (\ket 0 [U+A_k]\ket \psi + \ket 1 [U-A_k]\ket \psi)$$</p><p>One can then estimate \(p_0\) and \(p_1\), respectively the probabilities of projecting onto \((U+A_k) \ket\psi\) and \((U-A_k) \ket\psi\) once the ancilla is measured.</p>
<p>With those probabilities the following cost functions can be defined:</p>
<p>$$C_0 = \frac 1{4p_0} \bra \psi (U+A_k)^\dagger H (U+A_k)\ket \psi$$</p><p>and similarly:</p>
<p>$$C_1 = \frac 1{4p_1} \bra \psi (U-A_k)^\dagger H (U-A_k)\ket \psi$$</p><p>and therefore:</p>
<p>$$\bra \psi U^\dagger H A_k\ket \psi + \bra \psi A_k^\dagger H U\ket \psi = 2(p_0 C_0 - p_1C_1)$$</p><p>which means that we can repeat all the steps done for gates with generators with two distinct eigenvalues.</p>
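<p>Again, the bookkeeping can be checked numerically. In the sketch below (plain NumPy; the random single-qubit unitaries and the observable \(H = Z\) are illustrative assumptions), the quantity \(2(p_0 C_0 - p_1C_1)\) reproduces the two cross terms exactly:</p>
<pre><code class="lang-plaintext">import numpy as np

rng = np.random.default_rng(0)

def random_unitary(n):
    # random unitary from the QR decomposition of a Gaussian matrix
    q, r = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q * (np.diag(r) / np.abs(np.diag(r)))

U, A = random_unitary(2), random_unitary(2)
H = np.array([[1, 0], [0, -1]], dtype=complex)  # observable, here Pauli Z
psi = np.array([1, 0], dtype=complex)

plus, minus = (U + A) @ psi, (U - A) @ psi
p0 = (plus.conj() @ plus).real / 4    # probability of ancilla outcome 0
p1 = (minus.conj() @ minus).real / 4  # probability of ancilla outcome 1
C0 = (plus.conj() @ H @ plus).real / (4 * p0)
C1 = (minus.conj() @ H @ minus).real / (4 * p1)

lhs = 2 * (p0 * C0 - p1 * C1)
rhs = (psi.conj() @ U.conj().T @ H @ A @ psi
       + psi.conj() @ A.conj().T @ H @ U @ psi).real
print(lhs, rhs)  # the two quantities coincide
</code></pre>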
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="https://amm.zanotp.com/contact"><strong>here</strong>.</a></p>
<h2 id="heading-sources">Sources:</h2>
<ul>
<li><p>Mitarai, Negoro, Kitagawa, Fujii, (2018), “Quantum Circuit Learning“, <a target="_blank" href="https://arxiv.org/pdf/1803.00745">https://arxiv.org/pdf/1803.00745</a></p>
</li>
<li><p>Schuld, Bergholm, Gogolin, Izaac, Killoran, (2018) “Evaluating analytic gradients on quantum hardware“, <a target="_blank" href="https://arxiv.org/pdf/1811.11184">https://arxiv.org/pdf/1811.11184</a></p>
</li>
<li><p>Kottmann, Killoran, “Evaluating analytic gradients of pulse programs on quantum computers”, <a target="_blank" href="https://arxiv.org/pdf/2309.16756">https://arxiv.org/pdf/2309.16756</a></p>
</li>
<li><p>Stein, Wiebe, Ding, Bo, Kowalski, Baker, Ang, Li, “EQC: Ensembled Quantum Computing for Variational Quantum Algorithms”, <a target="_blank" href="https://dl.acm.org/doi/pdf/10.1145/3470496.3527434">https://dl.acm.org/doi/pdf/10.1145/3470496.3527434</a></p>
</li>
<li><p>He, “Computing the gradients with respect to all parameters of a quantum neural network using a single circuit”, <a target="_blank" href="https://arxiv.org/pdf/2307.08167">https://arxiv.org/pdf/2307.08167</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Quantum kernels in solving differential equations]]></title><description><![CDATA[Machine learning is often guided by the balance between bias and variance, i.e. if a model is too simple, it struggles to capture the underlying relationships between inputs and outputs and, on the other hand, a model that's too complex might excel d...]]></description><link>https://amm.zanotp.com/qk-de</link><guid isPermaLink="true">https://amm.zanotp.com/qk-de</guid><category><![CDATA[Kernel]]></category><category><![CDATA[#differential-equations]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sun, 15 Sep 2024 06:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/eSnOeMaH6RI/upload/b80726db36b33e9e375ab50355db6185.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Machine learning is often guided by the balance between bias and variance, i.e. if a model is too simple, it struggles to capture the underlying relationships between inputs and outputs and, on the other hand, a model that's too complex might excel during training but falter when faced with new, unseen data.</p>
<p>Ideally, we aim for models that are both quick to train and sophisticated enough to discover meaningful patterns. One of the most intriguing ways to achieve this balance is through kernel methods, which allow for the training of simple linear models that maintain low bias and low variance by mapping the data into a higher-dimensional feature space, making them both efficient and effective. In this post, after a brief introduction, we'll discuss the fascinating world of quantum kernels (which, as the name suggests, are the quantum analogue of kernels) and explore how these powerful tools can be applied to solve differential equations.</p>
<h2 id="heading-kernel-methods">Kernel methods</h2>
<p>Kernel methods are a cornerstone of machine learning, providing a powerful way to handle non-linear relationships between output (\(y\)) and input data (\(x\)). The core idea is to transform the data into a higher-dimensional feature space where linear models can be applied effectively. This transformation is not done explicitly: instead, the so-called kernel trick allows one to compute inner products between data points as if they were in the higher-dimensional space, without ever needing to compute their actual coordinates in that space. This results in computational efficiency, even when working with very high-dimensional spaces.</p>
<p>We can use kernel methods in a regression context, i.e. to find a function \(f\) s.t.:</p>
<p>$$f(x_i) \approx y_i$$</p><p>where \(x_i\) are the input features and \(y_i\) are the corresponding output labels.</p>
<p>For example, one may assume \(f\) to be linear, i.e.</p>
<p>$$f(x) = a+x b$$</p><p>where \(a\) and \(b\) are just vectors that can be learned by minimizing an error function, for example the mean square error:</p>
<p>$$MSE = \frac 1n \sum_i (f(x_i)-y_i)^2$$</p><p>Of course, a linear relation between the input and the output doesn't necessarily hold, and while such a linear model is really simple, it can struggle with non-linear data.</p>
<p>To address such non-linear relationships, kernel methods extend the idea of linear models by mapping the input space \(x\) to a higher-dimensional feature space \(\phi(x)\) choosing \(\phi(\cdot)\) s.t. \(y\) is then linear in \(\phi(x)\) and linear models can be used.</p>
<p>However, the strength of kernel methods lies in the already mentioned kernel trick: instead of explicitly calculating the new feature space, kernel methods rely on so-called kernel functions:</p>
<p>$$K(x_i, x_j) = \langle\phi(x_i), \phi(x_j)\rangle$$</p><p>which is then used to define the kernelized learning task, which could potentially be something like:</p>
<p>$$f(x) = \sum_i a_i K(x, x_i) + b$$</p><p>which is linear in the newly mapped feature space. At that point an error function can be defined to find the parameters of the model.</p>
<p>As you can see, the essence of kernel methods lies in their ability to handle non-linear data using linear models in a higher-dimensional space, without explicitly mapping data to that space.</p>
<p>One last point: in functional analysis, Mercer's theorem guarantees that, if some conditions are met, a dot product in some Hilbert space corresponds to a kernel. This is of paramount importance, as one may define a kernel function from an inner product, as we will do later in the article to get a quantum kernel.</p>
<h3 id="heading-kernel-methods-in-python">Kernel methods in Python</h3>
<p>To make it easier for the reader to understand kernel methods, I prepared a simple python example where linear regression and a particular kernel method is applied on non-linear data:</p>
<pre><code class="lang-plaintext">import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Create non-linear data
np.random.seed(42)
X = np.sort(5 * np.random.rand(50, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])

# Fit Linear Regression
linear_reg = LinearRegression()
linear_reg.fit(X, y)
y_pred_linear = linear_reg.predict(X)

# Fit Support Vector Regression (SVR) with RBF Kernel
svr_rbf = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)
svr_rbf.fit(X, y)
y_pred_svr = svr_rbf.predict(X)

# Plot the results
plt.scatter(X, y, color='darkorange', label='Data')
plt.plot(X, y_pred_linear, color='cornflowerblue', label='Linear Regression', linewidth=2)
plt.plot(X, y_pred_svr, color='green', label='SVR with RBF Kernel', linewidth=2)
plt.legend()
plt.show()
</code></pre>
<p>The result of the above code is:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726937450018/39959eb2-6d78-40cd-9bf6-b6b71ce45739.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-quantum-kernel-methods">Quantum kernel methods</h2>
<p>Quantum kernel methods take the idea behind classical kernels to the next level, using quantum mechanics to tackle more complex, high-dimensional data. The basic idea is to encode information into quantum states and calculate similarities in the new space. This section discusses quantum kernel methods, starting from the encoding of the classical variables into quantum states and then discussing the evaluation of quantum kernel functions and how to take derivatives of quantum kernel functions.</p>
<h3 id="heading-encoding">Encoding</h3>
<p>We start introducing the concept of quantum kernel methods from that of a quantum kernel function. As already mentioned, a kernel function is a function \(k\) mapping two variables \(x_i, x_j \in A\) to the complex space:</p>
<p>$$k: A \times A\rightarrow \mathbb{C}$$</p><p>and a quantum kernel function is a kernel function that can be evaluated on a quantum computer. For example a valid quantum kernel can be:</p>
<p>$$k(x_i, x_j) \equiv \bra {\psi(x_i)}\ket {\psi(x_j)}$$</p><p>where \(\ket{\psi(x_i)}\) indicates a quantum state encoding the classical variable \(x_i\).</p>
<p>To define quantum kernel functions properly, it's essential to understand what encoding into quantum states entails and how to do it efficiently, which is equivalent to the crucial task of designing an efficient feature map. The idea is to use a parametrized quantum circuit \(U\) s.t.:</p>
<p>$$U(x_i)\ket 0 = \ket {\psi(x_i)}$$</p><p>A simple example of \(U\) is:</p>
<p>$$\bigotimes_j R_{p,j}[\phi(x_i)]$$</p><p>where \(R_{p,j}[\phi(x_i)]\) represents the rotation on qubit \(j\) of angle \(\phi(x_i)\) about the Pauli operator \(p\). The selection of the feature map is crucial, as it must be expressive enough to capture the problem's solution while remaining trainable.</p>
<h3 id="heading-evaluation">Evaluation</h3>
<p>After mapping the classical variables into the feature space, the next step is to efficiently implement the quantum kernel function in a quantum circuit. Given that a convenient quantum kernel is defined as:</p>
<p>$$k(x_i,x_j) \equiv |\bra {\psi(x_i)}\ket {\psi(x_j)}|^2$$</p><p>the question becomes: which quantum circuit can perform this computation?</p>
<p>Naively, one can exploit the fact that:</p>
<p>$$k(x_i,x_j) \equiv |\bra {\psi(x_i)}\ket {\psi(x_j)}|^2= \bra 0U^\dagger(x_j)U(x_i)\ket0\bra 0U^\dagger(x_i)U(x_j)\ket0$$</p><p>which can be implemented by applying \(U(x_j)\) followed by \(U^\dagger(x_i)\) on \(\ket 0 ^{\otimes n} \), where \(n\) is the number of qubits necessary to encode \(\ket {\psi(x)}\), followed by measurements. The kernel function value is the probability of remaining in the all-zero state, estimated as the fraction of shots in which \(\ket 0^{\otimes n}\) is observed.</p>
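<p>As a concrete sketch of this naïve overlap circuit, here is a PennyLane version (one possible library choice; the two-qubit register and the single-angle \(R_y\) feature map are illustrative assumptions):</p>
<pre><code class="lang-plaintext">import pennylane as qml

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

def encode(x):
    # toy feature map: one Pauli-Y rotation per qubit with angle phi(x) = x
    for j in range(n_qubits):
        qml.RY(x, wires=j)

@qml.qnode(dev)
def overlap_probs(x_i, x_j):
    encode(x_j)               # U(x_j)
    qml.adjoint(encode)(x_i)  # U^dagger(x_i)
    return qml.probs(wires=range(n_qubits))

def k(x_i, x_j):
    # probability of the all-zero outcome, i.e. the squared overlap
    return overlap_probs(x_i, x_j)[0]

print(k(0.3, 0.3))  # identical inputs give kernel value 1
print(k(0.3, 1.2))
</code></pre>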
<p>Alternatively, one may use the coherent SWAP test (which requires \(2n +1\) qubits). This involves preparing the \(\ket {\psi(x_i)}\) and \(\ket {\psi(x_j)}\) states on two separate registers and then performing a swap controlled by an ancilla in superposition. The advantage of this method is that it only requires measuring a single ancilla qubit; however, it needs more qubits than the naïve method above.</p>
<p>Last, it is possible to use two evaluations of the Hadamard test (which requires \(n+1\) qubits) which can be exploited to compute the real and the imaginary part of:</p>
<p>$$\bra 0U^\dagger(x_i)U(x_j)\ket0$$</p><p>which are then used to evaluate the kernel as:</p>
<p>$$Re( \bra 0U^\dagger(x_i)U(x_j)\ket0 )^2 + Im( \bra 0U^\dagger(x_i)U(x_j)\ket0 )^2$$</p><h3 id="heading-computing-derivatives">Computing derivatives</h3>
<p>Since this article involves solving differential equations, we need to be able to compute the derivatives of the kernel function on a quantum computer. This will be important for the definition of the loss function and for the optimization of the parameters of the solution.</p>
<p>Let’s first define:</p>
<p>$$\nabla_{p,q} k(x_i,x_j) \equiv \frac{\partial^{p+q} k(x_i,x_j)}{\partial^p x_i\partial^qx_j}$$</p><p>We can consider the kernel function formulation acting on a single register defined above (the one we called naïve):</p>
<p>$$k(x_i, x_j)=\bra 0U^\dagger(x_j)U(x_i)\ket0\bra 0U^\dagger(x_i)U(x_j)\ket0$$</p><p>and build the following quantum model:</p>
<p>$$f(x) = \bra 0U^\dagger(x)MU(x)\ket0$$</p><p>where \(M\) is a measurement operator and take its derivative, which, by parameter shifting rule (a method for estimating the gradients of a parameterized quantum circuit, which will be the focus of one of the upcoming articles) is:</p>
<p>$$\partial_xf=\sum_{i=1}^n\frac{f(x+\frac \pi 2_i) - f(x-\frac \pi 2_i)}2$$</p><p>where \(n \) is the number of gates depending on \(x\) in \(U(x)\) and \(f(x\pm\frac \pi 2_i)\) is the evaluation of \(f(x)\) where the i-th gate depending on \(x\) is shifted by \(\pm\frac \pi 2\). Higher order derivatives can be implemented iterating the parameter shifting rule.</p>
<p>Alternatively the computation of derivatives is also permitted by the Hadamard test. Starting once again from:</p>
<p>$$k(x_i, x_j)=\bra 0U^\dagger(x_j)U(x_i)\ket0\bra 0U^\dagger(x_i)U(x_j)\ket0$$</p><p>by product rule the first derivative is:</p>
<p>$$\bra 0U^\dagger(x_j)\partial_{x_i}U(x_i)\ket0\bra 0U^\dagger(x_i)U(x_j)\ket0 + \bra 0U^\dagger(x_j)U(x_i)\ket0\bra 0\partial_{x_i}U^\dagger(x_i)U(x_j)\ket0$$</p><p>and the first term can be computed via Swap test while the second one via one or more Hadamard tests.</p>
<h2 id="heading-solving-differential-equations">Solving differential equations</h2>
<p>This section deals with describing how quantum kernels can be used as solvers for differential equations. The first subsection introduces a regression method and the second subsection applies it to solve differential equation.</p>
<h3 id="heading-mixed-model-regression">Mixed model regression</h3>
<p>In the context of mixed model regression, the trial function (the functional form we give to the solution) is:</p>
<p>$$f_a(x) = b + \sum_{j=1}^na_j k(x, y_j)$$</p><p>where \(k(\cdot)\) is the quantum kernel function, \(\{y_j\}\) is a set of evaluation points, and \(a\) and \(b\) are tunable coefficients. It’s worth noticing that, since \(a\) is the parameter to be optimized and \(k(x, y_j)\) is independent of \(a\), one can compute each \(k(x, y_j)\) before starting the optimization procedure. Moreover, one of the advantages of mixed model regression is that its structure remains consistent across different problems, whereas in other classes of kernel regressions (e.g. support vector regression) the model form can vary.</p>
<h3 id="heading-differential-equations">Differential equations</h3>
<p>Consider, for the sake of simplicity, the following differential equation:</p>
<p>$$DE(f,x, \partial _xf)= \partial_x f - g(f, x) = 0$$</p><p>with \(f(x_0) = f_0\), where \(g\) is a smooth function; we want to use mixed model regression to solve this DE.</p>
<p>Starting from mixed model regression, a proper choice for the loss function is:</p>
<p>$$\mathcal L(a) = \sum_i \left [ DE\left(f_a,x_i, \partial _xf_a(x_i)\right) \right ]^2 + (f_a(x_0) - f_0)^2$$</p><p>where:</p>
<p>$$f_a(x) = b + \sum_{j=1}^na_j k(x, y_j)$$</p><p>Since the kernel and its derivatives are independent of \(a\), they can be evaluated only once and, using an appropriate optimization technique, it is possible to find the optimal weights \(a\). Moreover, in some cases this optimization problem is convex. The resulting function is then a suitable approximation to the solution of the differential equation.</p>
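<p>To see the whole pipeline at once, here is a small end-to-end sketch. It solves the toy ODE \(\partial_x f + f = 0\), \(f(0)=1\), and a classical Gaussian kernel stands in for the quantum kernel so the example runs without a quantum backend; the kernel width, the grids and the optimizer are all illustrative choices:</p>
<pre><code class="lang-plaintext">import numpy as np
from scipy.optimize import minimize

ys = np.linspace(0, 2, 12)  # evaluation points y_j of the trial function
xs = np.linspace(0, 2, 25)  # collocation points x_i for the loss

def k(x, y):
    # classical stand-in for the quantum kernel k(x, y)
    return np.exp(-4.0 * (x - y) ** 2)

def dk(x, y):
    # derivative of the kernel w.r.t. x
    return -8.0 * (x - y) * k(x, y)

# kernel matrices are computed once, before the optimization starts
K = k(xs[:, None], ys[None, :])
dK = dk(xs[:, None], ys[None, :])
K0 = k(0.0, ys)

def loss(params):
    a, b = params[:-1], params[-1]
    f = b + K @ a      # f_a(x_i)
    df = dK @ a        # derivative of f_a at x_i
    residual = df + f  # DE(f, x, f') = f' + f = 0
    boundary = (b + K0 @ a - 1.0) ** 2
    return np.sum(residual ** 2) + boundary

res = minimize(loss, np.zeros(len(ys) + 1), method="BFGS")
a, b = res.x[:-1], res.x[-1]
print(np.max(np.abs(b + K @ a - np.exp(-xs))))  # error vs exact exp(-x)
</code></pre>
<p>In the quantum setting, the entries of \(K\), \(dK\) and \(K_0\) would instead be estimated on the quantum device with the circuits described above, while the classical optimizer would remain unchanged.</p>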
<p>If the differential equation differs from the one in the above example, one may rely on the same framework, adjusting the optimization accordingly (i.e. by defining a suitable loss function and, in the case of systems of differential equations, by minimizing the sum of the individual loss functions).</p>
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="https://amm.zanotp.com/contact">here</a>.</p>
<h2 id="heading-sources">Sources</h2>
<ul>
<li><p><a target="_blank" href="https://journals.aps.org/pra/pdf/10.1103/PhysRevA.107.032428">Paine, Annie E., Vincent E. Elfving, and Oleksandr Kyriienko. "Quantum Kernel Methods for Solving Regression Problems and Differential Equations." <em>Physical Review</em>, 2023</a></p>
</li>
<li><p><a target="_blank" href="https://link.springer.com/article/10.1007/s42484-019-00007-4">Mengoni, Riccardo, and Alessandra Di Pierro. "Kernel Methods in Quantum Machine Learning." <em>Springer Nature</em>, 2019</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/2101.11020.">Schuld, Maria. "Supervised Quantum Machine Learning Models Are Kernel Methods." <em>arXiv</em>, 2021</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Quantum Boltzmann Machines]]></title><description><![CDATA[Quantum Boltzmann Machines (QBMs) are at the cutting edge of quantum machine learning, offering a novel extension of classical Boltzmann machines through the lens of quantum mechanics. These models take advantage of quantum principles to push the bou...]]></description><link>https://amm.zanotp.com/qbm</link><guid isPermaLink="true">https://amm.zanotp.com/qbm</guid><category><![CDATA[quantum boltzmann machines]]></category><category><![CDATA[quantum computing]]></category><category><![CDATA[Quantum Machine Learning]]></category><category><![CDATA[Boltzman Machine]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Wed, 07 Aug 2024 21:15:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/v99lFTVp_ws/upload/f290e5b97e99d9a78423817ef5713b9e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Quantum Boltzmann Machines (QBMs) are at the cutting edge of quantum machine learning, offering a novel extension of classical Boltzmann machines through the lens of quantum mechanics. These models take advantage of quantum principles to push the boundaries of what classical Boltzmann machines can achieve.</p>
<p>In this blog post, we'll explore the core concepts behind QBMs and investigate how they leverage the distinctive features of quantum computing to advance probabilistic modelling and learning and we'll examine how QBMs build on classical methods and the potential they hold for transforming data analysis and problem-solving in machine learning.</p>
<h2 id="heading-classical-boltzmann-machines">Classical Boltzmann Machines</h2>
<p>Before diving into Quantum Boltzmann Machines (QBMs), it's useful to understand classical Boltzmann Machines (BMs) since QBMs are inspired by and build upon the concepts of classical BMs.</p>
<p>The concept of Boltzmann Machines was introduced by Geoffrey Hinton and Terrence Sejnowski in the mid-1980s and the model is named after the physicist Ludwig Boltzmann, whose work on statistical mechanics inspired the probabilistic framework of the BM.</p>
<p>The impact of Boltzmann Machines has been broad and significant: for example, in image and speech recognition, Restricted Boltzmann Machines (RBMs, an evolution of BMs) have excelled at unsupervised feature learning, improving classification accuracy. In recommendation systems, they’ve been used to predict user preferences, and RBMs' applications extend also to natural language processing, for learning text representations, and robotics, for sensor fusion, combining data from multiple sources.</p>
<h3 id="heading-classical-architectures">Classical architectures</h3>
<p>Basically, a Boltzmann Machine consists of a collection of binary units, also known as neurons, organized into two layers: visible units and hidden units. The visible units represent the observed variables, while the hidden units capture the latent or hidden variables that explain the relationships in the data.</p>
<p><a target="_blank" href="https://www.andreaperlato.com/aipost/boltzmann-machine/"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722963181167/3fb29083-f436-4182-95ef-6e58e220f259.png" alt class="image--center mx-auto" /></a></p>
<p>Due to the exponential growth in the number of connections with an increase in nodes in a classical Boltzmann Machine (BM), the Restricted Boltzmann Machine (RBM) is often preferred. The RBM simplifies the architecture by restricting connections:</p>
<ul>
<li><p>Hidden nodes are not connected to each other</p>
</li>
<li><p>Visible nodes are also not connected to each other</p>
</li>
</ul>
<p>In RBMs, connections exist only between visible and hidden nodes, which makes the network more manageable and easier to train compared to the fully connected structure of classical BMs.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722963884893/6be35bf6-3f3d-4d4d-adb8-875fdd5c9182.png" alt class="image--center mx-auto" /></p>
<p>The objective of Boltzmann Machines is to model a probability distribution over a set of binary variables (units) in a way that captures the complex relationships and dependencies between these variables. This is achieved through the concept of energy.</p>
<p>The idea behind the energy based learning lies in the energy function, defined as</p>
<p>$$E(v, h) = - \sum_{i} b_i v_i - \sum_{j} c_j h_j - \sum_{i,j} W_{ij} v_i h_j$$</p><p>where:</p>
<ul>
<li><p>\(b_i\) is the bias of the i-th visible node</p>
</li>
<li><p>\(v_i\) is the i-th visible node</p>
</li>
<li><p>\(c_j\) is the bias of the j-th hidden node</p>
</li>
<li><p>\(h_j\) is the j-th hidden node</p>
</li>
<li><p>\(W_{ij}\) is the weight of the connection between visible i-th unit and j-th hidden unit</p>
</li>
</ul>
<p>To each configuration (\(v, \space h\)) a probability is assigned according to the following function:</p>
<p>$$P(v, h) = \frac{e^{-E(v, h)}}{\sum_{v, h} e^{-E(v, h)}}$$</p><p>and the goal is to minimize the difference between the distribution defined by the BM and the true data distribution, i.e. to maximize the likelihood of the training data under the model:</p>
<p>$$\mathcal{L} = \sum_{v} P_{\text{data}}(v) \log P_{\text{model}}(v)$$</p><h3 id="heading-training">Training</h3>
<p>In Boltzmann Machines, error adjustment cannot be achieved using a gradient descent process like in traditional neural networks, where weights are adjusted by backpropagating the error through the network. This is because BMs are undirected networks, meaning there is no distinction between input and output layers. As a result, BMs lack the concept of "backpropagation" since there is no directed flow of information to guide the adjustment of weights.</p>
<p>In fact, the algorithm typically used to train BMs is called Contrastive Divergence and is completely different in nature from backpropagation, since it approximates the gradient of the log-likelihood function by using a technique involving Gibbs sampling.</p>
<p>The algorithm is an iterative procedure made of the following two steps (a minimal sketch follows the subsections below):</p>
<ul>
<li><p>Perform Gibbs sampling to approximate the distribution of the hidden and visible units based on their conditional probabilities</p>
</li>
<li><p>Compute the gradient and update the weights.</p>
</li>
</ul>
<h4 id="heading-gibbs-sampling">Gibbs sampling</h4>
<p>Given a set of variables \(X=\{X_1, \dots X_n\}\), Gibbs sampling aims to sample from the joint distribution \(P(X)\), which can sometimes be challenging to do directly. Therefore Gibbs sampling uses the conditional distributions \(P(X_i|X_{/i})\), where \(X_{/i}\) denotes all the variables except \(X_{i}\).</p>
<p>Each variable is then updated in turn by sampling:</p>
<p>$$X_i^{(t+1)} \sim P(X_i \mid X_1^{(t)}, X_2^{(t)}, \ldots, X_{i-1}^{(t)}, X_{i+1}^{(t)}, \ldots, X_n^{(t)})$$</p><p>In the context of BMs this translates to sampling the hidden units as</p>
<p>$$P(h_j = 1 \mid v) = \sigma \left( \sum_{i} W_{ij} v_i + c_j \right)$$</p><p>and the visible units as</p>
<p>$$P(v_i = 1 \mid h) = \sigma \left( \sum_{j} W_{ij} h_j + b_i \right)$$</p><p>where \(\sigma(x) = \frac{1}{1 + \exp(-x)}\).</p>
<h4 id="heading-compute-the-gradient-and-update-the-weights">Compute the gradient and update the weights</h4>
<p>The gradient is then computed as:</p>
<p>$$\frac{\partial \mathcal{L}}{\partial W_{ij}} = \frac{\partial}{\partial W_{ij}} \left( \sum_{v} P_{\text{data}}(v) \log P_{\text{model}}(v) \right)$$</p><p>which, after some calculations results in:</p>
<p>$$\frac{\partial \mathcal{L}}{\partial W_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}$$</p><p>where:</p>
<ul>
<li><p>\(\langle v_i h_j \rangle_{\text{model}} = \sum_{v, h} P_{\text{model}}(v, h) v_i h_j\)</p>
</li>
<li><p>\(\langle v_i h_j \rangle_{\text{data}} = \sum_{v} P_{\text{data}}(v) v_i h_j\)</p>
</li>
</ul>
<p>And the weights are updated iteratively as:</p>
<p>$$W_{ij}^{t+1} = W_{ij}^{t}+\epsilon \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \right)$$</p><p>where \(\epsilon\) is the learning rate.</p>
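<p>For concreteness, here is a minimal NumPy sketch of one-step Contrastive Divergence (CD-1) on an RBM; the layer sizes, the learning rate and the random binary data are illustrative assumptions:</p>
<pre><code class="lang-plaintext">import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1

W = 0.01 * rng.normal(size=(n_visible, n_hidden))
b = np.zeros(n_visible)  # visible biases
c = np.zeros(n_hidden)   # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    # positive phase: sample hidden units given the data vector
    ph0 = sigmoid(v0 @ W + c)
    h0 = rng.binomial(1, ph0).astype(float)
    # negative phase: one step of Gibbs sampling
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = rng.binomial(1, pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # gradient estimate: (v h)_data - (v h)_model
    return np.outer(v0, ph0) - np.outer(v1, ph1), v0 - v1, ph0 - ph1

data = rng.binomial(1, 0.5, size=(100, n_visible)).astype(float)  # toy data
for epoch in range(20):
    for v in data:
        dW, db, dc = cd1_step(v)
        W += lr * dW
        b += lr * db
        c += lr * dc
</code></pre>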
<h1 id="heading-quantum-boltzmann-machines">Quantum Boltzmann machines</h1>
<p>The concept of leveraging quantum mechanics for machine learning tasks has seen significant advancements over the past decade, with the Quantum Boltzmann Machine (QBM) emerging as one of the results of these studies. In particular, Mohammad Amin, Evgeny Andriyash, Jason Rolfe, Bohdan Kulchytskyy, and Roger Melko, <a target="_blank" href="https://arxiv.org/pdf/1601.02036">arxiv:1601.02036</a> (2016) developed a quantum probabilistic model based on the Boltzmann distribution of a quantum Hamiltonian, which exploits quantum effects both in the model and in the training process.</p>
<p>The problem QBMs address is exactly the same as for BMs: finding the bias and weight parameters that best approximate a sample distribution by maximizing the log-likelihood, as defined above.</p>
<p><a target="_blank" href="https://arxiv.org/pdf/1601.02036"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723062369067/cca73e6d-f9a1-4409-9e87-575e8eb5bab3.png" alt class="image--center mx-auto" /></a></p>
<h3 id="heading-quantum-architecture">Quantum architecture</h3>
<p>As for classical Boltzmann Machines, our treatment of QBMs starts from the energy function:</p>
<p>$$E(\xi) = -\sum_{i} \xi_i \beta_i - \sum_{i,j} W_{ij} \xi_i \xi_j$$</p><p>where:</p>
<ul>
<li>\(\xi_i\) represents a node (hidden or visible), which is a \(2^N \times 2^N\) matrix defined as (\(I\) is the identity matrix, \(\sigma^z\) is the Pauli Z matrix, \(N\) is the number of nodes):</li>
</ul>
<p>$$\xi_i \equiv \overbrace{I \otimes \ldots \otimes I}^{i-1} \otimes \sigma_i^z \otimes \overbrace{I \otimes \ldots \otimes I}^{N-i}$$</p><ul>
<li>\(\beta\) represents the biases (for both hidden and visible nodes)</li>
</ul>
<p>Using the energy function we define the Boltzmann distribution as:</p>
<p>$$P(\xi) = \frac{e^{-E(\xi)}}{\sum e^{-E(\xi)}}$$</p><p>where the matrix exponentiation is defined through Taylor expansion:</p>
<p>$$e^{- E(\xi)}=\sum_{k=0}^{\infty} \frac{1}{k!}\left( - E(\xi) \right) ^k$$</p><p>Let also the partition function \(Z\) be:</p>
<p>$$Z=Tr[e^{-E(\xi)}]$$</p><p>then the density matrix is:</p>
<p>$$\rho = Z^{-1}e^{-E(\xi)}$$</p><p>which represents the Boltzmann probability of the \(2^N\) elements. Therefore to get the marginal probability distribution over the visible variables \(\ket v\) we just need to trace over the hidden variables, i.e.:</p>
<p>$$P_v = Tr[(\ket v \bra v \otimes I_h)\rho]$$</p><p>which is the analogous of:</p>
<p>$$P(v, h) = \frac{e^{-E(v, h)}}{\sum_{v, h} e^{-E(v, h)}}$$</p><p>At this point, if we include in the Hamiltonian a new element representing a transverse field defined as:</p>
<p>$$\nu_i \equiv \overbrace{I \otimes \ldots \otimes I}^{i-1} \otimes \sigma_i^x \otimes \overbrace{I \otimes \ldots \otimes I}^{N-i}$$</p><p>where \(\sigma_x\) is the Pauli X matrix, the resulting Hamiltonian is:</p>
<p>$$E(\xi, \nu) = -\sum_{i} \xi_i \beta_i - \sum_{i,j} W_{ij} \xi_i \xi_j -\sum_i \Gamma_i\nu_i$$</p><p>where \(\Gamma_i\) is a parameter.</p>
<p>This new Hamiltonian is special since every eigenstate of \(  E(\xi, \nu)\) is a superposition of the classical states \(\ket{v, h}\). Hence, using the density matrix defined above with this new Hamiltonian, each measurement in the \(\sigma^z\) basis results in a classical output in \( \{1, -1\}\), and the probability of each output is given by \(P_v\).</p>
<h3 id="heading-training-1">Training</h3>
<p>As in the classical formulation, the goal of QBMs is to find the \(W\) and \(b\) (from now on I'm referencing these parameters as \(\theta\)) s.t. \(P_v\) (what before was called \(P_{\text{model}}\)) is close to \(P_\text{data}\).</p>
<p>Again this is achieved by minimizing the negative log-likelihood \(\mathcal{L}\):</p>
<p>$$\mathcal{L}=-\sum_{\mathbf{v}} P_{\text {data }} \log \frac{\operatorname{Tr}\left[(\ket v \bra v \otimes I_h) e^{-E(\xi, \nu)}\right]}{\operatorname{Tr}\left[e^{-E(\xi, \nu)}\right]}$$</p><p>whose gradient is:</p>
<p>$$\partial_\theta \mathcal{L}=\sum_{\mathbf{v}} P_{\text {data }}\left(\frac{\operatorname{Tr}\left[(\ket v \bra v \otimes I_h) \partial_\theta e^{-E(\xi, \nu)}\right]}{\operatorname{Tr}\left[(\ket v \bra v \otimes I_h) e^{-E(\xi, \nu)}\right]}-\frac{\operatorname{Tr}\left[\partial_\theta e^{-E(\xi, \nu)}\right]}{\operatorname{Tr}\left[e^{-E(\xi, \nu)}\right]}\right)$$</p><p>Ideally we would use some sampling techniques to estimate efficiently the gradient, however, since \(E(\xi, \nu)\) and \(\partial_\theta E(\xi, \nu)\) don't commute, we don't have a trivial solution.</p>
<p>In fact one can prove that:</p>
<p>$$\frac{\operatorname{Tr}\left[\partial_\theta e^{-E(\xi, \nu)}\right]}{\operatorname{Tr}\left[e^{-E(\xi, \nu)}\right]}= - \operatorname{Tr}[\rho {\partial_\theta}{E(\xi, \nu)}]$$</p><p>and that:</p>
<p>$$\frac{\operatorname{Tr}\left[(\ket v \bra v \otimes I_h) \partial_\theta e^{-E(\xi, \nu)}\right]}{\operatorname{Tr}\left[(\ket v \bra v \otimes I_h) e^{-E(\xi, \nu)}\right]}=-\int_0^1 d t \frac{\operatorname{Tr}\left[(\ket v \bra v \otimes I_h) e^{-t E(\xi, \nu)} \partial_\theta E(\xi, \nu)e^{-(1-t) E(\xi, \nu)}\right]}{\operatorname{Tr}\left[(\ket v \bra v \otimes I_h)e^{-E(\xi, \nu)}\right]}$$</p><p>and it can be shown that, while the first term can be estimated efficiently, the second one cannot be efficiently estimated using sampling, which makes the computational cost of training QBMs impractical. Is this the end of the story? Actually no: introducing an upper bound on \(\mathcal{L} \) (which is a common practice in machine learning) results in the so-called bound-based Quantum Boltzmann Machines (BQBMs), which provide a workaround to the computational impracticability of QBMs, as discussed in the next section.</p>
<h4 id="heading-bound-based-quantum-boltzmann-machines">Bound-based Quantum Boltzmann Machines</h4>
<p>A very famous inequality, the Golden-Thompson inequality, states that for any Hermitian matrices \(A\) and \(B\) the following is true:</p>
<p>$$\operatorname{Tr}\left( e^Ae^B\right) \geq \operatorname{Tr}\left( e^{A+B}\right)$$</p><p>Therefore we know that:</p>
<p>$$P_v = \frac{\operatorname{Tr}[e^{\log{(\ket v \bra v \otimes I_h)}}e^{-E(\xi, \nu)}]}{\operatorname{Tr}[e^{-E(\xi, \nu)}]} \geq \frac{\operatorname{Tr}[e^{-H_\xi}]}{\operatorname{Tr}[e^{-E(\xi, \nu)}]}$$</p><p>where \(H_\xi \equiv -\log{(\ket v \bra v \otimes I_h)}+E(\xi, \nu)\) is a peculiar Hamiltonian, since it assigns an infinite energy penalty to any state whose visible qubit register differs from \(\ket v\). Mathematically, this means that the probability of the system being in any state other than \(\ket v\) is zero, because the Boltzmann factor approaches zero for infinite energy.</p>
<p>This means, in other words, that every qubit \(\xi_i\) is clamped to the corresponding classical value \(v_i\).</p>
<p>From the Golden-Thompson inequality we can also derive that:</p>
<p>$$\mathcal{L} \le \hat{\mathcal{L}} \equiv -\sum P_\text{data} \log \frac{\operatorname{Tr}[e^{-H_\xi}]}{\operatorname{Tr}[e^{-E(\xi, \nu)}]}$$</p><p>and we can now minimize \(\hat{\mathcal L}\), the upper bound of \({\mathcal L}\), using its gradient, obtaining the following rule to update the bias \(\beta_i\):</p>
<p>$$\beta_i^{t+1}= \beta_i^{t} + \epsilon \left(\sum P_\text{data} \frac{\operatorname{Tr}[e^{-H_\xi}\sigma_i^z]}{\operatorname{Tr}[e^{-H_\xi}]} - \operatorname{Tr}(\rho \sigma_i^z)\right)$$</p><p>and the weight \(W_{ij}\):</p>
<p>$$W_{ij}^{t+1}= W_{ij}^{t} + \epsilon \left(\sum P_\text{data} \frac{\operatorname{Tr}[e^{-H_\xi}\sigma_i^z\sigma_j^z]}{\operatorname{Tr}[e^{-H_\xi}]} - \operatorname{Tr}(\rho \sigma_i^z\sigma_j^z)\right)$$</p><p>One may also think about training the \(\Gamma_i\); however, this results in a vanishing \(\Gamma\), i.e. learning the transverse field is unfeasible in this upper-bound setting.</p>
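<p>As a quick numerical sanity check of the Golden-Thompson inequality underpinning this bound (a sketch with random Hermitian matrices; the dimension 4 is arbitrary):</p>
<pre><code class="lang-plaintext">import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

def random_hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

A, B = random_hermitian(4), random_hermitian(4)
lhs = np.trace(expm(A) @ expm(B)).real  # Tr(e^A e^B)
rhs = np.trace(expm(A + B)).real        # Tr(e^{A+B})
print(lhs, rhs, lhs - rhs)  # the difference is always non-negative
</code></pre>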
<h4 id="heading-semi-restricted-quantum-boltzmann-machines">Semi-restricted Quantum Boltzmann Machines</h4>
<p>Until now we have not posed any restrictions on the structure of the QBM; in particular, we assumed a fully connected architecture. Similarly to classical RBMs, semi-restricted Quantum Boltzmann Machines (srQBMs) are QBMs whose hidden layer has no lateral connectivity.</p>
<p><a target="_blank" href="https://arxiv.org/pdf/1601.02036"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723062329107/d1f43f04-3380-4315-a1c5-832325d0cf8b.png" alt class="image--center mx-auto" /></a></p>
<p>Note that, unlike classical RBMs, lateral connections are still allowed, but only among the visible units.</p>
<p>Such an architecture in fact allows us to apply contrastive divergence learning algorithms, since the clamped Hamiltonian is then:</p>
<p>$$H_\xi = - \sum_i \left(\Gamma_i \sigma_i^x + (b_i+ \sum_j W_{ij} v_j)\sigma_i^z \right)$$</p><p>as the hidden qubits are uncoupled during the parameters learning phase.</p>
<p>Based on the Hamiltonian one can show that expectations can be computed efficiently as:</p>
<p>$$\frac{\operatorname{Tr}[e^{-H_\xi}\sigma_i^z]}{\operatorname{Tr}[e^{-H_\xi}]}= \frac{b_i+ \sum_j W_{ij} v_j}{D_i}\tanh D_i, \qquad D_i \equiv \sqrt{\Gamma_i^2 + \left(b_i+ \sum_j W_{ij} v_j\right)^2}$$</p><p>which reduces to the classical RBM expression as \(\Gamma_i \rightarrow 0\).</p>
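<p>This closed form can be checked numerically on a single hidden qubit (a sketch; the values of \(\Gamma_i\) and of the effective bias \(b_i + \sum_j W_{ij} v_j\) are illustrative):</p>
<pre><code class="lang-plaintext">import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

Gamma, b_eff = 0.8, 0.5              # illustrative values
H = -(Gamma * sx + b_eff * sz)       # clamped Hamiltonian for one qubit
rho = expm(-H) / np.trace(expm(-H))  # Gibbs state

lhs = np.trace(rho @ sz).real        # clamped expectation of sigma_z
D = np.hypot(Gamma, b_eff)
rhs = (b_eff / D) * np.tanh(D)       # closed-form expression above
print(lhs, rhs)  # both print the same value
</code></pre>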
<h2 id="heading-conclusion">Conclusion</h2>
<p>While still experimental, Quantum Boltzmann Machines represent an exciting advancement in the field of quantum machine learning, and the journey towards fully realizing their potential is ongoing, requiring collaboration across disciplines and continuous innovation. While it is important to temper expectations with the recognition of current limitations, the foundational work being done is not just an academic exercise but a step towards unlocking new possibilities in science and industry.</p>
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here.</strong></a></p>
<hr />
<h2 id="heading-sources">Sources</h2>
<ul>
<li><p>Mohammad Amin, Evgeny Andriyash, Jason Rolfe, Bohdan Kulchytskyy, and Roger Melko, <a target="_blank" href="https://arxiv.org/pdf/1601.02036">arxiv:1601.02036</a> (2016)</p>
</li>
<li><p>G. E. Hinton, S. Osindero, Y-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18, 1527– 1554 (2006)</p>
</li>
<li><p>R. Salakhutdinov, G. E. Hinton, Deep Boltzmann machines, AISTATS 2009</p>
</li>
<li><p>Miguel Carreira-Perpinan, Geoffrey Hinton, On Contrastive Divergence Learning (2005)</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Quantum Finance: Option Pricing]]></title><description><![CDATA[Financial institutions deal daily with a spectrum of computationally intensive challenges. These include for example forecasting tasks, such as pricing and risk estimation, detecting anomalous transactions, analysing customer preferences and optimiza...]]></description><link>https://amm.zanotp.com/qf-option-pricing</link><guid isPermaLink="true">https://amm.zanotp.com/qf-option-pricing</guid><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sun, 28 Jul 2024 16:10:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/7fnNrzSly7Q/upload/d9571062de9260caa0f996bdd915ca4e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Financial institutions deal daily with a spectrum of computationally intensive challenges. These include for example forecasting tasks, such as pricing and risk estimation, detecting anomalous transactions, analysing customer preferences and optimization problems like portfolio selection, devising optimal trading strategies, and hedging. Even if the relentless advancement in mathematical finance and computational science, fueled by both the financial industry and the scientific community, has armed institutions with a sophisticated toolkit (stochastic modelling, advanced optimization algorithms, and machine learning models) to tackle these problems, the complexity and volume of financial data continue to push the limits of computing.</p>
<p>This has sparked a surge of interest in quantum computing within both academic and industrial circles, and this computing paradigm, with its fundamentally different approach to processing information, is believed to offer the potential to revolutionize problem-solving in finance in both the long and near term.</p>
<p>This field of research is often referred to as Quantum Finance, and this blog post marks the first in a series dedicated to the topic. In particular, this article discusses the speedup quantum computers can provide to an extremely important financial task: option pricing.</p>
<h2 id="heading-options">Options</h2>
<p>Options are financial derivatives that give the buyer the right, but not the obligation, to buy (call option) or sell (put option) an underlying asset at a predetermined price (called the strike price) within a specified time frame, after paying a premium for this right. The concept of options dates back to ancient Greece, where contracts resembling options were used for olive harvests by Thales of Miletus. However, the modern options market was established in 1973 with the creation of the Chicago Board Options Exchange, which standardized option contracts and brought greater transparency and accessibility to options trading.</p>
<p>Options play a pivotal role in financial markets for several reasons. They allow investors to hedge against potential losses in other investments, provide opportunities for speculation on the future direction of asset prices (buying a call (put) option represents a long (short) strategy), and offer ways to enhance portfolio returns through strategies like covered calls or protective puts.</p>
<p>There are various types of options, each with unique characteristics. The two primary categories are European and American options. European options can only be exercised at the expiration date, making their pricing models simpler and often less expensive. In contrast, American options can be exercised at any point up to and including the expiration date, offering greater flexibility but also more complex pricing models due to the additional considerations of when the option might be exercised. Options can also be categorized as vanilla or exotic. Vanilla options refer to the standard call and put options with straightforward payoff structures. These are the most common types of options traded. In contrast, exotic options, or non-vanilla options, have more complex features and payoff structures. Examples include:</p>
<ul>
<li><p><strong>Barrier Options:</strong> Options that are activated or extinguished if the underlying asset reaches a certain price</p>
</li>
<li><p><strong>Asian Options:</strong> Options where the payoff depends on the average price of the underlying asset over a specific period, rather than the price at expiration</p>
</li>
<li><p><strong>Lookback Options:</strong> Options that allow the holder to "look back" over time to determine the optimal exercise price.</p>
</li>
</ul>
<p>Beyond this complexity, accurate pricing of options is crucial for maintaining fair trading practices, effective risk management, and overall market stability. Inaccurate pricing can in fact introduce significant risks, as it may lead to arbitrage opportunities where traders exploit price discrepancies for guaranteed profits, potentially causing market imbalances. Additionally, mispricing can result in inadequate hedging strategies, exposing investors and financial institutions to unforeseen losses and undermining confidence in the financial system. Thus, precision in option pricing is essential to mitigate these risks and ensure the smooth functioning of financial markets.</p>
<h2 id="heading-classical-strategies-for-option-pricing">Classical strategies for Option Pricing</h2>
<p>Option pricing fundamentally depends on projecting the future value of the underlying asset, as the option's worth is derived from the underlying asset's price movements. To accurately price an option, it's essential to understand how the underlying security might evolve over time. Three primary factors influence the pricing of an option: the current price of the underlying asset, the time value of the option, and the implied volatility of the underlying asset:</p>
<ul>
<li><p><strong>The price of the underlying asset</strong> is the most critical factor affecting the option's premium. The premium reflects the right to buy or sell the underlying asset, and a higher-priced asset demands a higher premium compared to a lower-priced asset. This ensures that investors have sufficient incentive to purchase options on assets with varying price levels</p>
</li>
<li><p><strong>The time value</strong> of an option pertains to the duration between the purchase date and the option's expiration date. The longer the time until expiration, the higher the chance that the underlying asset will reach the strike price, making longer-term options more expensive than shorter-term ones with the same strike price</p>
</li>
<li><p><strong>Implied volatility</strong> is another crucial component in option pricing. As the perceived volatility of the underlying asset increases, so does the option's price. This is because higher volatility means a greater potential for significant price swings, offering more opportunities for profit to the option holder but also posing higher risk to the seller. Consequently, the seller demands a higher premium to compensate for this increased risk.</p>
</li>
</ul>
<h3 id="heading-black-scholes-merton-model">Black-Scholes-Merton model</h3>
<p>One of the most famous approaches to pricing involves taking advantage of the stochastic nature of financial markets and modelling their dynamics with a formula we can analytically solve. Given the appropriate assumptions, this approach results in the famous Black-Scholes-Merton model (a minimal Python sketch of the formula follows the list of symbols below):</p>
<p>$$C = S_0 N(d_1) - K e^{-rT} N(d_2)$$</p><p>where:</p>
<ul>
<li><p>\(C\) is the price of the European vanilla call option</p>
</li>
<li><p>\(S_0\) is the price of the underlying asset</p>
</li>
<li><p>\(K\) is the strike price of the option</p>
</li>
<li><p>\(r\) is the risk-free interest rate</p>
</li>
<li><p>\(T\) is the time to maturity of the option</p>
</li>
<li><p>\(N\) is the cumulative distribution function of the standard normal distribution</p>
</li>
<li><p>\(d_1 = \frac{\ln\left(\frac{S_0}{K}\right) + \left(r + \frac{\sigma^2}{2}\right)T}{\sigma \sqrt{T}}\)</p>
</li>
<li><p>\(d_2 = d_1 - \sigma \sqrt{T}\)</p>
</li>
</ul>
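<p>As a quick reference, here is a minimal Python sketch of the formula (the parameter values in the example call are purely illustrative):</p>
<pre><code class="lang-plaintext">import numpy as np
from scipy.stats import norm

def bsm_call(S0, K, r, sigma, T):
    # European vanilla call price under Black-Scholes-Merton
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

print(bsm_call(S0=100.0, K=105.0, r=0.02, sigma=0.2, T=1.0))
</code></pre>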
<p>The model however has some limitations: it assumes an arbitrage-free market, a constant \(r\) over time, a constant volatility of the underlying asset over time, stocks that pay no dividends, and a frictionless market (no taxes and no transaction costs), to name a few. Because of these drawbacks, there are alternative analytical approaches to option pricing that address some of the unrealistic assumptions.</p>
<h3 id="heading-monte-carlo-methods">Monte Carlo methods</h3>
<p>What I want to discuss a little further however are the Monte Carlo methods, as the quantum approach to option pricing discussed below is, in many ways, a quantum analogue of the Monte Carlo pricing method.</p>
<p>Basically, Monte Carlo methods involve simulating a large number of possible paths that the underlying asset price might take over the life of the option. These simulations use random sampling to model the stochastic processes that govern asset price movements and each simulated path is used to calculate the payoff of the option. The average of these payoffs is then discounted back to the present value to obtain the option price.</p>
<p>Therefore the Monte Carlo method consists of three steps:</p>
<ul>
<li><p><strong>Model specification</strong>: define the dynamics of the stochastic process governing the underlying asset's price dynamics</p>
</li>
<li><p><strong>Path simulation</strong>: generate a large number of random price paths for the underlying asset using the specified stochastic process. The idea is that each path is a possible future trajectory of the asset's price</p>
</li>
<li><p><strong>Payoff computation and discounting</strong>: compute the payoff of the option for each simulated path and discount the payoff to the present value using the risk-free interest rate. Then compute the average of the discounted payoffs to estimate the option price.</p>
</li>
</ul>
<p>The strength of the Monte Carlo approach lies in its versatility as a generic methodology, making it especially valuable when the Black-Scholes-Merton model is inapplicable, such as in the pricing of path-dependent options.</p>
<p>It is also possible to derive the convergence rate of the method which, thanks to the Central Limit Theorem, is \(O(\frac 1{\sqrt{N}})\), where \(N\) is the number of simulated paths. This means that to reduce the error by half, one must increase the number of simulations fourfold.</p>
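<p>The three steps fit in a few lines of Python. The sketch below (parameter values are illustrative) prices a European call under geometric Brownian motion and reports the \(O(\frac 1{\sqrt{N}})\) standard error:</p>
<pre><code class="lang-plaintext">import numpy as np

S0, K, r, sigma, T = 100.0, 105.0, 0.02, 0.2, 1.0  # illustrative values
N = 1_000_000                                      # number of paths

rng = np.random.default_rng(0)
Z = rng.standard_normal(N)
# steps 1-2: terminal prices under risk-neutral geometric Brownian motion
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
# step 3: payoff, discounting and averaging
payoffs = np.maximum(ST - K, 0.0)
price = np.exp(-r * T) * payoffs.mean()
stderr = np.exp(-r * T) * payoffs.std(ddof=1) / np.sqrt(N)
print(price, stderr)  # estimate and its O(1/sqrt(N)) standard error
</code></pre>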
<h2 id="heading-quantum-strategies-for-option-pricing">Quantum strategies for Option Pricing</h2>
<p>As the next section will discuss, the quantum analogue to the Monte Carlo methodology offers a convergence rate of \(O(\frac 1N)\), providing a quadratic improvement over classical methods. While this speedup may seem modest, it is particularly significant for hedge funds and institutional investors who price large portfolios of options, often overnight. Even a slight improvement in convergence rates can in fact translate into substantial computational time savings, amounting to many hours. The quantum analogue to Monte Carlo methods consists, similarly to the classical counterpart, of three steps:</p>
<ul>
<li><p>representing the probability distribution of the random variable identifying the option's underlying and any other source of uncertainty</p>
</li>
<li><p>build a circuit that computes the payoff based on the random variable</p>
</li>
<li><p>compute the expected value of the payoff and discount the result (this can be done classically and I am omitting this part not being particularly complicated).</p>
</li>
</ul>
<h3 id="heading-encoding-the-probability-distribution-in-a-quantum-register">Encoding the probability distribution in a quantum register</h3>
<p>The first step, loading the distribution of the random variable identifying the option's underlying and any other source of uncertainty (\(X\) from now on), requires a quantum circuit that, given the \(\{S_i\}\) asset prices and the corresponding probabilities \(\{p_i\} \) (assuming the state space is discretized into \(2^n\) states, where \(n\) is the number of qubits of the register), is able to create the following state:</p>
<p>$$\ket{\psi}_n = \sum_{i=0}^{2^n-1} \sqrt{p_i} \ket{S_i}_n$$</p><p>The efficiency of encoding a probability distribution into a quantum state depends on the nature of the distribution. It <a target="_blank" href="https://arxiv.org/abs/quant-ph/0208112">has been demonstrated</a> that log-concave probability distributions, such as the log-normal distribution assumed by the Black-Scholes-Merton model, can be efficiently encoded into a quantum state. To load states that do not share this property, it is possible to exploit the power of <a target="_blank" href="https://www.nature.com/articles/s41534-019-0223-2">quantum Generative Adversarial Networks</a> (qGAN), which are able to load a distribution in \(O(\text{poly}(n))\) gates rather than \(O(2^n)\) gates. While the details are beyond the scope of this article, it's worth noting that qGANs are hybrid quantum-classical algorithms. They consist of a classical neural network, known as the discriminator, and a variational quantum circuit, known as the quantum generator. The training process of a qGAN involves alternating optimization of the discriminator's parameters (\(\theta\)) and the generator's parameters (\(\gamma\)). After training, the output of the process is:</p>
<p>$$\ket{\psi(\gamma)}_n= \sum_{i=0}^{2^n-1} \sqrt{p_i(\gamma)}\ket{i}_n$$</p><p>where \(p_i(\gamma)\) approximates the underlying distribution of the training data.</p>
<h3 id="heading-computing-the-payoff">Computing the payoff</h3>
<p>Once the distribution is loaded, we need to compute the payoff function \(f\). Since an option's payoff is piecewise linear (depending on whether the option is exercised or not), we only need to consider functions of the form:</p>
<p>$$f(i) = f_0 + f_1 \, i$$</p><p>and by using controlled Y-rotations it is possible to efficiently create the following operator:</p>
<p>$$R: \ket{i}_n\ket{0} \rightarrow \ket{i}_n\otimes(cos[f(i)]\ket{0} + sin[f(i)] \ket{1})$$</p><p>which, applied on a register representing a previously encoded distribution, results in:</p>
<p>$$R\ket{\psi}_n\ket 0= \sum_{i=0}^{2^n-1} \sqrt{p_i}\ket{i}_n \otimes (\cos[\tilde f(i)]\ket{0} + \sin[\tilde f(i)]\ket{1})$$</p><p>where \(\tilde f(i) = 2c \frac{f(i) -f_{min}}{f_{max}-f_{min}} - c + \frac \pi 4\), with \(c \in[0, 1]\) and \(f_{min}\) (\(f_{max}\)) is \(\min_if(i) \) (\(\max_if(i)\)).</p>
<p>Consequently, the probability of measuring \(\ket{1}\) in the second register is:</p>
<p>$$P_1 = \sum_i p_i \, \sin^2(\tilde f(i))$$</p><p>which can be approximated (with a third-order truncation error) by:</p>
<p>$$P_1 \approx \sum_i p_i \left(2c \frac{f(i) -f_{min}}{f_{max}-f_{min}}-c + \frac 12\right) = 2c \frac{E[f(i)] -f_{min}}{f_{max}-f_{min}}-c + \frac 12$$</p><p>where all the values are known except \(E[f(i)]\). To compute the expected value of the payoff function, a quantum algorithm known as quantum amplitude estimation (QAE) is required.</p>
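<p>The linearization step can be seen numerically: the approximation rests on \(\sin^2(x + \frac \pi 4) \approx x + \frac 12\) for small \(x\), which is why the scaling constant \(c\) keeps the argument close to \(\frac \pi 4\). A two-line check:</p>
<pre><code class="lang-plaintext">import numpy as np

for x in [0.0, 0.05, 0.1, 0.25]:
    # exact value vs linear approximation x + 1/2
    print(x, np.sin(x + np.pi / 4) ** 2, x + 0.5)
</code></pre>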
<h3 id="heading-quantum-amplitude-estimation">Quantum amplitude estimation</h3>
<p>Quantum amplitude estimation is the algorithm responsible for providing the quadratic speedup compared to classical Monte Carlo methods. Basically, assuming an operator \(A\) s.t.</p>
<p>$$A\ket{0}_{n+1}= \sqrt{1-p}\ket{\psi_0}_n \otimes \ket{0}+ \sqrt{p}\ket{\psi_1}_n \otimes \ket{1}$$</p><p>where \(p\) is unknown. QAE aims to estimate \(p\), i.e. the probability of measuring \(\ket{1}\) in the second register.</p>
<p>The idea is to build the operator \(Q\) s.t.</p>
<p>$$Q = AS_0A^\dagger S_{\psi_0}$$</p><p>where:</p>
<ul>
<li><p>\(S_0= 1- 2 \ket{0}\bra{0}\)</p>
</li>
<li><p>\(S_{\psi_0}= 1- 2 \ket{\psi_0}\ket{0}\bra{{\psi_0}}\bra{0}\)</p>
</li>
</ul>
<p>and it can be shown that \(Q\) acts as a rotation by \(2\theta_p\) in the two-dimensional subspace spanned by \(\ket{\psi_0}\ket 0\) and \(\ket{\psi_1}\ket 1\), with \(\sin^2(\theta_p) = p\).</p>
<p><a target="_blank" href="https://arxiv.org/pdf/1905.02666"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722176862815/4f462fd3-6175-4cce-8afd-5431f091fd3f.png" alt class="image--center mx-auto" /></a></p>
<p>Therefore, applying quantum phase estimation (shown in the above picture) on \(m\) sampling qubits, i.e. applying an m-fold Hadamard gate, using the \(m\) qubits to control different powers of \(Q\), applying the inverse Quantum Fourier Transform and measuring the \(m\) qubits' state, results in an integer \(k\). There are also other formulations of QAE that are more suitable for the NISQ era.</p>
<p>Then \(p\) is estimated as:</p>
<p>$$\hat p = \sin^2\left(\frac{k\pi}{2^m}\right)$$</p><p>and such an estimate is s.t., with probability of at least \(\frac 8{\pi^2}\), the following holds:</p>
<p>$$|{p - \hat p}| \leq \frac \pi {2^m} + \left(\frac {\pi}{2^m}\right)^2$$</p><p>with a convergence rate of \(O(\frac 1N)\) in the number of samples \(N = 2^m\) (while the convergence rate of classical Monte Carlo methods is \(O(\frac 1{\sqrt{N}})\)).</p>
<p>Therefore, applying the operator \(A\) to the state prepared by the qGAN results in:</p>
<p>$$A \ket{\psi}_n\ket{0} = \sum_{i=0}^{2^n-1} \sqrt{1 -f(S_i)} \sqrt{p_i}\,\ket{S_i}_n\ket{0} + \sum_{i=0}^{2^n-1} \sqrt{f(S_i)} \sqrt{p_i}\,\ket{S_i}_n\ket{1}$$</p><p>and since the probability of measuring \(\ket 1\) (what we called \(p\) before) is</p>
<p>$$\sum p_i f(S_i) = E[f(S_i)]$$</p><p>it is possible to recover the undiscounted expected value of the option's payoff, which allows us to compute</p>
<p>$$P_1 \approx 2c \frac{E[f(i)] -f_{min}}{f_{max}-f_{min}}-c + \frac 12$$</p><h3 id="heading-pricing-vanilla-options">Pricing Vanilla Options</h3>
<p>Let us now review the entire process for a vanilla option. The payoff function \(f_c\) for a call option is:</p>
<p>$$f_c(S_T) = \max(S_T - K, 0)$$</p><p>while for a put the payoff function \(f_p\) is:</p>
<p>$$f_p(S_T) = \max(K-S_T, 0)$$</p><p>where:</p>
<ul>
<li><p>\(S_T\) is the price at expiration date</p>
</li>
<li><p>\(K\) is the strike price.</p>
</li>
</ul>
<p>We already discussed how to represent the linear part of the function, while to implement the \(\max(\cdot)\) operator it is necessary to implement a comparison circuit \(C\) between \(S_T\) and \(K\) (built from an ancillary register, Toffoli gates and CNOTs) performing the following transformation</p>
<p>$$C\ket\psi_n\ket 0 = \ket \phi_n = \sum_{i &lt; K} \sqrt{p_i}\ket i_n\otimes \ket 0 + \sum_{i \geq K} \sqrt{p_i}\ket i_n \otimes \ket 1$$</p><p>To represent the payoff function for use in QAE, another ancillary qubit is required and (for a vanilla call option) we set</p>
<p>$$\tilde f(i) = \begin{cases} g_0 \space \text{if} \space K&gt;i\\ g_0 + g(i)\space \text{if} \space K\leq i\end{cases}$$</p><p>where \(g(i)\) is a linear function and \(g_0\) is to be defined.</p>
<p>Doing so we are able to reconstruct the following state:</p>
<p>$$\begin{align} R\ket{\phi}_n\ket{0} &amp;= \sum_{i &lt; K} \sqrt{p_i} \ket{i}_n \otimes \ket{0} \otimes \left( \cos[g_0]\ket{0} + \sin[g_0]\ket{1} \right) \notag \\ &amp;\quad + \sum_{i \geq K} \sqrt{p_i} \ket{i}_n \otimes \ket{1} \otimes \left( \cos[g_0 + g(i)]\ket{0} + \sin[g_0 + g(i)]\ket{1} \right) \end{align}$$</p><p>Using QAE, the probability of measuring \(\ket 1\) in the last qubit is</p>
<p>$$\sum_{i &lt; K} p_i \space \sin^2(g_0) + \sum_{i \geq K} p_i \space \sin^2(g_0 + g(i))$$</p><p>At this point it is necessary to define both \(g_0\) and \(g(i)\). To ensure the following</p>
<p>$$\begin{cases}f(i) = i - K \\ \tilde f(i) = g_0+g(i) \end{cases}$$</p><p>we get \(g(i)=\frac{2c (i-K)}{2^n-1 - K}\) and \(g_0 = \frac \pi 4 - c\).</p>
<p>By substitution and approximating as above</p>
<p>$$\begin{align} P_1 &amp; \approx \sum_{i &lt; K} p_i (\frac 12 - c) + \sum_{i \geq K} p_i(\frac 12-c + \frac{2c (i-K)}{2^n-1 - K})\\ &amp;=\frac 12-c + \frac{2c}{2^n-1 - K} \sum_{i \geq K} p_i(i-K) \end{align}$$</p><p>which is exactly \(E[f(i)]\) up to a constant and a scaling factor.</p>
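<p>As a sanity check, this classical NumPy sketch (with a toy distribution, not market data) computes \(P_1\) from the derived expression and inverts the affine map to recover \(E[\max(i-K, 0)]\):</p>
<pre><code class="lang-python"># Compute P_1 from the derived formula and invert the affine map to
# recover the expected call payoff E[max(i - K, 0)]. Toy distribution.

import numpy as np

n, K, c = 4, 5, 0.05
i = np.arange(2**n)
rng = np.random.default_rng(1)
p = rng.random(2**n); p /= p.sum()

mask = i &gt;= K                           # states where the call pays off
scale = 2 * c / (2**n - 1 - K)
P1 = 0.5 - c + scale * np.sum(p[mask] * (i[mask] - K))

payoff_from_P1 = (P1 - 0.5 + c) / scale           # invert the affine map
expected_payoff = np.sum(p * np.maximum(i - K, 0))
print(payoff_from_P1, expected_payoff)            # the two values match
</code></pre>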
<h2 id="heading-conclusion">Conclusion</h2>
<p>While quantum computers and quantum finance are not yet fully realized, and the discussed quantum approaches to option pricing are not yet fully applicable, especially for complex probability distributions, it is possible to simulate these procedures using various software development kits (SDKs). For instance, Qiskit Finance offers comprehensive resources, including <a target="_blank" href="https://qiskit-community.github.io/qiskit-finance/tutorials/03_european_call_option_pricing.html">tutorials on pricing various types of options</a>, which I highly recommend.</p>
<hr />
<p>And that's it for this article. Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here</strong>.</a></p>
<h2 id="heading-sources">Sources:</h2>
<ul>
<li><p><a target="_blank" href="https://www.journals.uchicago.edu/doi/abs/10.1086/260062">Fischer Blac</a><a target="_blank" href="http://amm.zanotp.com/contact">k an</a><a target="_blank" href="https://www.journals.uchicago.edu/doi/abs/10.1086/260062">d Myron Scholes,</a><a target="_blank" href="http://amm.zanotp.com/contact">“The</a><a target="_blank" href="https://www.journals.uchicago.edu/doi/abs/10.1086/260062">pricing of options and corporate liabilities”, Journal of Political Economy 81, 637–654 (1973)</a></p>
</li>
<li><p><a target="_blank" href="https://www.jstor.org/stable/3003143?casa_token=e1cWjOoCXGoAAAAA%3AMlrMg4q2YD-xsVAnojztdcdyzzRgp2TT22CjQo6-Q1AE3jd8QHmUV6qDUCGF9T91bMGW8ff4KaWw6w2XYVS4WvS5-RA2xBW9eOpWoJ5Qb61qUawDbtgMdw">Robert C. Merton, “Theory of rational option</a><a target="_blank" href="http://amm.zanotp.com/contact">pri</a><a target="_blank" href="https://www.jstor.org/stable/3003143?casa_token=e1cWjOoCXGoAAAAA%3AMlrMg4q2YD-xsVAnojztdcdyzzRgp2TT22CjQo6-Q1AE3jd8QHmUV6qDUCGF9T91bMGW8ff4KaWw6w2XYVS4WvS5-RA2xBW9eOpWoJ5Qb61qUawDbtgMdw">cing”, The Bell Journal of Economics and Management Science 4, 141–183 (1973)</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/quant-ph/9908083">Daniel S Abrams and Colin P Williams, “Fast quantum algorithms for numerical integrals and stochastic processes”, (1999)</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/quant-ph/0208112">Lov Grover and Terry Rudolph, “Creating superpositions that correspond to efficiently integrable probability distributions”</a></p>
</li>
<li><p><a target="_blank" href="https://www.nature.com/articles/s41534-019-0223-2">Christa Zoufal, Aurélien Lucchi, and Stefan Woerner, “Quantum generative adversarial networks for learning and loading random distributions”, npj Quantum Information 5, 1–9 (2019)</a></p>
</li>
<li><p><a target="_blank" href="https://www.ams.org/books/conm/305/">Gilles Brassard, Peter Hoyer, Michele Mosca, and Alain Tapp, “Quantum Amplitude Amplification and Estimation”, Contemporary Mathematics 305 (2002), 10.1090/conm/305/05215</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/1905.02666">Nikitas Stamatopoulos, Daniel J. Egger, Yue Sun, Christa Zoufal, Raban Iten, Ning Shen and Stefan Woerner, "Option Pricing using Quantum Computers" (2019)</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Physics informed neural networks for solving Partial Differential Equations]]></title><description><![CDATA[Introduction
Despite the grandiose name, Physics Informed Neural Networks (PINNs from now on) are simply neural networks trained to solve supervised learning tasks while adhering to any provided law of physics described by general nonlinear partial d...]]></description><link>https://amm.zanotp.com/pinn</link><guid isPermaLink="true">https://amm.zanotp.com/pinn</guid><category><![CDATA[pde]]></category><category><![CDATA[neural networks]]></category><category><![CDATA[Physics Informed Neural Network]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sun, 05 May 2024 17:05:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1714929260771/d10709e1-2ad7-4491-829c-225e81406872.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>Despite the grandiose name, Physics Informed Neural Networks (PINNs from now on) are simply neural networks trained to solve supervised learning tasks while adhering to any provided law of physics described by general nonlinear partial differential equations (PDEs from now on). The resulting neural network acts as a universal function approximator that inherently incorporates any underlying physical laws as prior knowledge, making them suitable for solving PDEs.</p>
<p>This not only signifies a stark departure from traditional numerical methods (such as finite difference, finite volume, finite elements, etc.) but also marks a shift in how we approach modeling and understanding physical systems. In fact, unlike conventional numerical techniques that depend on discretization and iterative solvers, PINNs offer a more comprehensive and data-centric approach and, by combining the capabilities of neural networks with the principles of physical laws, PINNs hold the potential to open up new pathways for exploration and innovation across various scientific fields.</p>
<p>In this blog post, my goal is to discuss all the essential components required to understand PINNs for solving PDEs. Therefore, the post is composed of 4 key parts:</p>
<ul>
<li><p>Firstly, I introduce PDEs and explain the necessity of relying on numerical methods;</p>
</li>
<li><p>Secondly, I provide a brief overview of neural networks;</p>
</li>
<li><p>Next, I delve into discussing PINNs;</p>
</li>
<li><p>Finally, I demonstrate how to implement PINNs using PyTorch.</p>
</li>
</ul>
<h2 id="heading-about-partial-differential-equations-pdes">About Partial Differential Equations (PDEs)</h2>
<p>PDEs serve as fundamental tools in describing physical phenomena and natural processes across various scientific domains, from physics and engineering to biology and finance. Unlike ordinary differential equations (ODEs), which involve only one independent variable, PDEs incorporate multiple independent variables, such as space and time. For example</p>
<p>$$\frac{\partial f}{\partial t}+\alpha \frac{\partial f}{\partial x}=0$$</p><p>is known as the advection equation, where:</p>
<ul>
<li><p>\(f(t,x)\) is a function of two independent variables \(x\) (space) and \(t\) (time);</p>
</li>
<li><p>\(\alpha\) is a constant;</p>
</li>
<li><p>\(\frac{\partial f}{\partial t}\) represents the rate of change of \(f\) with respect to time;</p>
</li>
<li><p>\(\frac{\partial f}{\partial x}\) represents the rate of change of \(f\) with respect to space.</p>
</li>
</ul>
<p>Physically, this equation describes how a quantity \(f\) evolves over time \(t\) as it is transported by a flow with constant speed \(\alpha\) in the \(x\)-direction. In other words, it describes how \(f\) moves along the \(x\)-axis with time, where the rate of change in time is proportional to the rate of change in space multiplied by the constant \(\alpha\).</p>
<p>A closed-form general solution to the advection equation can be derived and corresponds to:</p>
<p>$$f(t,x)=g(x-\alpha t)$$</p><p>for any differentiable function \(g(\cdot)\).</p>
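<p>A quick numerical check (plain NumPy, using a Gaussian bump as an arbitrary choice of \(g\)) confirms via finite differences that \(g(x - \alpha t)\) satisfies the advection equation:</p>
<pre><code class="lang-python"># Finite-difference check that f(t, x) = g(x - alpha*t) solves the
# advection equation; g is an arbitrary differentiable profile.

import numpy as np

alpha = 2.0
g = lambda s: np.exp(-s**2)             # Gaussian bump, any smooth g works

def f(t, x):
    return g(x - alpha * t)

t0, x0, h = 0.3, 0.7, 1e-5
df_dt = (f(t0 + h, x0) - f(t0 - h, x0)) / (2 * h)   # central differences
df_dx = (f(t0, x0 + h) - f(t0, x0 - h)) / (2 * h)
print(df_dt + alpha * df_dx)            # approximately 0
</code></pre>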
<p>Not every PDE, however, admits a closed-form solution, for several reasons:</p>
<ul>
<li><p>Complexity: many PDEs describe highly intricate physical phenomena with nonlinear behavior, making it difficult to find analytical solutions. Nonlinear PDEs, in particular, often lack closed-form solutions because of their intricate interdependence between variables;</p>
</li>
<li><p>Boundary conditions: the solution to a PDE often depends not only on the equation itself but also on the boundary and initial conditions. If these conditions are complex or not well-defined, finding a closed-form solution becomes exceedingly challenging;</p>
</li>
<li><p>Non-standard formulation: some PDEs might be in non-standard forms that don't lend themselves easily to analytical techniques. For example, PDEs with non-constant coefficients or with terms involving higher-order derivatives may not have straightforward analytical solutions.</p>
</li>
<li><p>Inherent nature of the problem: certain systems are inherently chaotic or exhibit behaviors that resist simple mathematical description. For such systems, closed-form solutions may not exist, or if they do, they might be highly unstable or impractical.</p>
</li>
</ul>
<p>For all the reasons mentioned, scientists usually depend on numerical methods to estimate the solution to PDEs (like PINNs, finite elements, finite volumes, finite differences, and spectral methods). In this post, I am only covering PINNs, while another post about the other methods is on its way.</p>
<h2 id="heading-about-neural-networks">About neural networks</h2>
<p>Once the problem that PINNs aim to solve is clear, we will now discuss some essential topics about neural networks. Since this is a very broad subject, I will only cover the most important aspects.</p>
<p>Neural networks are algorithms inspired by the workings of the human brain. In our brains, neurons process incoming data, such as visual information from our eyes, to recognize and understand our surroundings. Similarly, neural networks operate by receiving input data (input layer), processing it to identify patterns (hidden layer), and producing an output based on this analysis (output layer). Therefore, a neural network is typically represented as shown in the following picture:</p>
<p><img src="https://www.ibm.com/content/dam/connectedassets-adobe-cms/worldwide-content/cdp/cf/ul/g/3a/b8/ICLH_Diagram_Batch_01_03-DeepNeuralNetwork.png" alt="Source: engineersplanet.com" /></p>
<p>The basic unit of computation in a neural network is the neuron. It receives input from other nodes or an external source and calculates an output. Each input is linked with a weight (w, which the network learns), and each neuron also has a bias term (b), learned together with the weights, whose corresponding input is fixed to 1. The output from the neuron is the weighted sum of the inputs plus the bias, passed through the activation function (which introduces non-linearity into the output).</p>
<p><img src="https://www.gabormelli.com/RKB/images/thumb/3/31/artificial-neuron-model.png/600px-artificial-neuron-model.png" alt="Artificial Neuron - GM-RKB" /></p>
<p>That said, a neural network comprises multiple interconnected neurons. While various architectures are tailored for specific issues, we will now concentrate on basic neural networks, also referred to as Feedforward neural networks (FNN).</p>
<h3 id="heading-learning-in-neural-networks">Learning in neural networks</h3>
<p>What is learnable in a neural network are the weights and the biases, and the learning process is divided into two parts:</p>
<ul>
<li><p>feedforward propagation;</p>
</li>
<li><p>backward propagation.</p>
</li>
</ul>
<p>In fact, learning occurs by adjusting connection weights after processing each piece of data, depending on the error in the output compared to the expected result.</p>
<h3 id="heading-feedforward-propagation">Feedforward propagation</h3>
<p>Feedforward propagation is the foundational process in neural networks where input data is processed through the layers to produce an output. This process is crucial for making predictions or classifications based on the given input. In feedforward propagation (a minimal sketch follows this list):</p>
<ul>
<li><p>the data flows in a unidirectional manner from the input layer through the hidden layers (if any) to the output layer;</p>
</li>
<li><p>each neuron in a layer receives inputs from all neurons in the previous layer. The inputs are combined with weights and a bias term, and the result is passed through an activation function to produce the neuron's output;</p>
</li>
<li><p>this process is repeated for each layer until the output layer is reached.</p>
</li>
</ul>
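<p>Here is that sketch: a minimal NumPy feedforward pass through a network with a single hidden layer (the weights are random placeholders, not trained values):</p>
<pre><code class="lang-python"># One feedforward pass: weighted sum + bias, then a nonlinearity,
# layer by layer. Weights and biases are random placeholders.

import numpy as np

rng = np.random.default_rng(0)
x = rng.random(3)                            # input vector (3 features)

W1, b1 = rng.random((4, 3)), rng.random(4)   # hidden layer: 3 inputs, 4 units
W2, b2 = rng.random((1, 4)), rng.random(1)   # output layer: 4 inputs, 1 unit

h = np.tanh(W1 @ x + b1)                     # hidden activations
y = W2 @ h + b2                              # network output
print(y)
</code></pre>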
<h3 id="heading-backward-propagation">Backward propagation</h3>
<p>Backward propagation, also known as backpropagation, is the process by which a neural network learns from its mistakes and adjusts its parameters (weights and biases) to minimize the difference between its predictions and the true targets. In backward propagation (a one-step PyTorch sketch follows this list):</p>
<ul>
<li><p>after the output is generated through feedforward propagation, the network's performance is evaluated using a loss function, which measures the difference between the predicted output and the true target values;</p>
</li>
<li><p>the gradient of the loss function with respect to each parameter (weight and bias) in the network is computed using the chain rule of calculus. This gradient indicates the direction and magnitude of the change needed to minimize the loss function;</p>
</li>
<li><p>the gradients are then used to update the parameters of the network in the opposite direction of the gradient, a process known as gradient descent. This update step involves adjusting the parameters by a small amount proportional to the gradient and a learning rate hyperparameter.</p>
</li>
</ul>
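<p>And here is the corresponding one-step PyTorch sketch (with random placeholder data): a forward pass, the loss, <code>loss.backward()</code> to compute the gradients, and a gradient-descent update:</p>
<pre><code class="lang-python"># One explicit backpropagation step: forward pass, loss, gradients via
# the chain rule (autograd), then a gradient-descent parameter update.

import torch
from torch import nn

net = nn.Sequential(nn.Linear(3, 4), nn.Tanh(), nn.Linear(4, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(8, 3)                   # a small batch of inputs
y = torch.randn(8, 1)                   # the corresponding targets

optimizer.zero_grad()                   # reset accumulated gradients
loss = loss_fn(net(x), y)               # feedforward + loss
loss.backward()                         # backward propagation
optimizer.step()                        # parameter update
print(float(loss))
</code></pre>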
<h3 id="heading-neural-networks-as-universal-function-approximators">Neural networks as universal function approximators</h3>
<p>Pivotal to Physics-Informed Neural Networks (PINNs) is a crucial theoretical concept concerning neural networks: their capability as universal function approximators. This fundamental property implies that neural networks can effectively approximate any continuous function with remarkable precision, given a sufficient number of neurons and an appropriate network configuration. Considering that the goal of PINNs is to estimate the solution to a Partial Differential Equation (PDE), which essentially involves approximating a function, this characteristic holds immense significance for the success and efficacy of PINNs in their predictive tasks.</p>
<h3 id="heading-about-physics-informed-neural-networks-pinns">About physics informed neural networks (PINNs)</h3>
<p>Once the basics of deep learning are clear, we can delve deeper into understanding Physics Informed Neural Networks (PINNs).</p>
<p>Physics Informed Neural Networks (PINNs) serve as universal function approximators, with their neural network architecture representing solutions to specific Partial Differential Equations (PDEs). The core concept behind PINNs, as implied by their name, involves integrating prior knowledge about the system's dynamics into the cost function. This integration allows for penalizing any deviations from the governing PDEs by the network's solution.</p>
<p>Moreover, PINNs necessitate addressing the differences between the network's predictions and the actual data points within the cost function. This process is crucial for refining the network's accuracy and ensuring that it aligns closely with the observed data, thereby enhancing the model's predictive capabilities.</p>
<p>Therefore the loss function is:</p>
<p>$$\text{total loss} = \text{data loss + physics loss}$$</p><p>and (once a norm \(|\cdot|\) is chosen) becomes:</p>
<p>$$\text{total loss} = \frac1n\sum |y_i - \hat y(x_i|\theta)| + \frac \lambda m\sum |f(x_j, \hat y(x_j|\theta))|$$</p><p>where:</p>
<ul>
<li><p>\(\hat y(x| \theta) \) is our neural network;</p>
</li>
<li><p>\(x_i \text{ and } y_i\) are the data;</p>
</li>
<li><p>\(f(x, g)=0\) is the PDE;</p>
</li>
<li><p>\(\lambda\) is a hyperparameter.</p>
</li>
</ul>
<p>and is equivalent to the mean squared error (MSE) when the chosen norm is the squared L2 norm.</p>
<p>Once we have this, we can train our PINN as a regular neural network.</p>
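<p>As an illustration, here is a sketch of such a loss for the advection equation introduced earlier; the model, the data tensors and the weight <code>lam</code> are placeholders to be supplied by a training loop:</p>
<pre><code class="lang-python"># Sketch of a PINN loss for f_t + alpha * f_x = 0: data term on observed
# points plus PDE residual on collocation points. All arguments are
# placeholders; `model` maps an (N, 2) tensor of (t, x) pairs to (N, 1).

import torch
from torch import nn

def pinn_loss(model, t_data, x_data, y_data, t_col, x_col, alpha=1.0, lam=1.0):
    mse = nn.MSELoss()

    # data loss: network predictions vs. observations
    y_pred = model(torch.stack([t_data, x_data], dim=1))
    data_loss = mse(y_pred, y_data)

    # physics loss: PDE residual at the collocation points
    t = t_col.clone().requires_grad_(True)
    x = x_col.clone().requires_grad_(True)
    f = model(torch.stack([t, x], dim=1))
    f_t = torch.autograd.grad(f.sum(), t, create_graph=True)[0]
    f_x = torch.autograd.grad(f.sum(), x, create_graph=True)[0]
    residual = f_t + alpha * f_x
    physics_loss = mse(residual, torch.zeros_like(residual))

    return data_loss + lam * physics_loss
</code></pre>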
<h3 id="heading-a-comparison-between-pinns-and-classical-methods">A comparison between PINNs and classical methods</h3>
<p>Compared to traditional numerical simulation approaches, PINNs have the following desirable properties:</p>
<ul>
<li><p>PINNs are mesh-free, i.e. they can handle complex domains, with a potential computational advantage;</p>
</li>
<li><p>PINNs are well-suited for modeling complex and nonlinear systems, because of the universal approximation theorem.</p>
</li>
</ul>
<h2 id="heading-pytorch-implementation">PyTorch implementation</h2>
<p>This final section focuses on a PyTorch example to apply the theory in practice.</p>
<p>Although this blog post mainly discussed PDEs, I will first approximate an ODE using a PINN before moving on to a PDE.</p>
<h3 id="heading-approximating-an-ode-using-a-pinn-the-logistic-equation">Approximating an ODE using a PINN: the logistic equation</h3>
<p>The logistic equation is a differential equation used to model population growth in situations where resources are limited. It is often represented as:</p>
<p>$$\frac{dP}{dt} = rP\left(1 - \frac{P}{K}\right)$$</p><p>where:</p>
<ul>
<li><p>\(P\) represents the population size at time \(t\);</p>
</li>
<li><p>\(r\) is the intrinsic growth rate of the population;</p>
</li>
<li><p>\(K\) is the carrying capacity of the environment, representing the maximum population size that the environment can sustain.</p>
</li>
</ul>
<p>The analytical solution to the logistic equation is given by the logistic function:</p>
<p>$$P(t) = \frac{K}{1 + \left(\frac{K-P_0}{P_0}e^{-rt}\right)}$$</p><p>First of all, we need a neural network architecture. The following class (adapted, as its source comment indicates, from a PINN for the Burgers equation) illustrates the typical structure of a PINN: the network itself, the derivatives obtained via automatic differentiation, and the combined data/physics loss:</p>
<pre><code class="lang-python"><span class="hljs-comment"># code from https://github.com/EdgarAMO/PINN-Burgers/blob/main/burgers_LBFGS.py</span>

<span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> random <span class="hljs-keyword">import</span> uniform

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">PhysicsInformedNN</span>():</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, X_u, u, X_f</span>):</span>
        <span class="hljs-comment"># x &amp; t from boundary conditions:</span>
        self.x_u = torch.tensor(X_u[:, <span class="hljs-number">0</span>].reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>),
                                dtype=torch.float32,
                                requires_grad=<span class="hljs-literal">True</span>)
        self.t_u = torch.tensor(X_u[:, <span class="hljs-number">1</span>].reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>),
                                dtype=torch.float32,
                                requires_grad=<span class="hljs-literal">True</span>)

        <span class="hljs-comment"># x &amp; t from collocation points:</span>
        self.x_f = torch.tensor(X_f[:, <span class="hljs-number">0</span>].reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>),
                                dtype=torch.float32,
                                requires_grad=<span class="hljs-literal">True</span>)
        self.t_f = torch.tensor(X_f[:, <span class="hljs-number">1</span>].reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>),
                                dtype=torch.float32,
                                requires_grad=<span class="hljs-literal">True</span>)

        <span class="hljs-comment"># boundary solution:</span>
        self.u = torch.tensor(u, dtype=torch.float32)

        <span class="hljs-comment"># null vector to test against f:</span>
        self.null =  torch.zeros((self.x_f.shape[<span class="hljs-number">0</span>], <span class="hljs-number">1</span>))

        <span class="hljs-comment"># initialize net:</span>
        self.create_net()
        <span class="hljs-comment">#self.net.apply(self.init_weights)</span>

        <span class="hljs-comment"># this optimizer updates the weights and biases of the net:</span>
        self.optimizer = torch.optim.LBFGS(self.net.parameters(),
                                    lr=<span class="hljs-number">1</span>,
                                    max_iter=<span class="hljs-number">50000</span>,
                                    max_eval=<span class="hljs-number">50000</span>,
                                    history_size=<span class="hljs-number">50</span>,
                                    tolerance_grad=<span class="hljs-number">1e-05</span>,
                                    tolerance_change=<span class="hljs-number">0.5</span> * np.finfo(float).eps,
                                    line_search_fn=<span class="hljs-string">"strong_wolfe"</span>)

        <span class="hljs-comment"># typical MSE loss (this is a function):</span>
        self.loss = nn.MSELoss()

        <span class="hljs-comment"># loss :</span>
        self.ls = <span class="hljs-number">0</span>

        <span class="hljs-comment"># iteration number:</span>
        self.iter = <span class="hljs-number">0</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_net</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-string">""" net takes a batch of two inputs: (n, 2) --&gt; (n, 1) """</span>
        self.net = nn.Sequential(
            nn.Linear(<span class="hljs-number">2</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">20</span>), nn.Tanh(),
            nn.Linear(<span class="hljs-number">20</span>, <span class="hljs-number">1</span>))

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">init_weights</span>(<span class="hljs-params">self, m</span>):</span>
        <span class="hljs-keyword">if</span> type(m) == nn.Linear:
            torch.nn.init.xavier_normal_(m.weight, <span class="hljs-number">0.1</span>)
            m.bias.data.fill_(<span class="hljs-number">0.001</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">net_u</span>(<span class="hljs-params">self, x, t</span>):</span>
        u = self.net( torch.hstack((x, t)) )
        <span class="hljs-keyword">return</span> u

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">net_f</span>(<span class="hljs-params">self, x, t</span>):</span>
        u = self.net_u(x, t)

        u_t = torch.autograd.grad(
            u, t, 
            grad_outputs=torch.ones_like(u),
            retain_graph=<span class="hljs-literal">True</span>,
            create_graph=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]

        u_x = torch.autograd.grad(
            u, x, 
            grad_outputs=torch.ones_like(u),
            retain_graph=<span class="hljs-literal">True</span>,
            create_graph=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]

        u_xx = torch.autograd.grad(
            u_x, x, 
            grad_outputs=torch.ones_like(u_x),
            retain_graph=<span class="hljs-literal">True</span>,
            create_graph=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]

        f = u_t + (u * u_x) - (nu * u_xx)

        <span class="hljs-keyword">return</span> f

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">plot</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-string">""" plot the solution on new data """</span>

        <span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
        <span class="hljs-keyword">from</span> mpl_toolkits.axes_grid1 <span class="hljs-keyword">import</span> make_axes_locatable

        x = torch.linspace(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">200</span>)
        t = torch.linspace( <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">100</span>)

        <span class="hljs-comment"># x &amp; t grids:</span>
        X, T = torch.meshgrid(x, t)

        <span class="hljs-comment"># x &amp; t columns:</span>
        xcol = X.reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
        tcol = T.reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)

        <span class="hljs-comment"># one large column:</span>
        usol = self.net_u(xcol, tcol)

        <span class="hljs-comment"># reshape solution:</span>
        U = usol.reshape(x.numel(), t.numel())

        <span class="hljs-comment"># transform to numpy:</span>
        xnp = x.numpy()
        tnp = t.numpy()
        Unp = U.detach().numpy()

        <span class="hljs-comment"># plot:</span>
        fig = plt.figure(figsize=(<span class="hljs-number">9</span>, <span class="hljs-number">4.5</span>))
        ax = fig.add_subplot(<span class="hljs-number">111</span>)

        h = ax.imshow(Unp,
                      interpolation=<span class="hljs-string">'nearest'</span>,
                      cmap=<span class="hljs-string">'rainbow'</span>, 
                      extent=[tnp.min(), tnp.max(), xnp.min(), xnp.max()], 
                      origin=<span class="hljs-string">'lower'</span>, aspect=<span class="hljs-string">'auto'</span>)
        divider = make_axes_locatable(ax)
        cax = divider.append_axes(<span class="hljs-string">"right"</span>, size=<span class="hljs-string">"5%"</span>, pad=<span class="hljs-number">0.10</span>)
        cbar = fig.colorbar(h, cax=cax)
        cbar.ax.tick_params(labelsize=<span class="hljs-number">10</span>)
        plt.show()

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">closure</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-comment"># reset gradients to zero:</span>
        self.optimizer.zero_grad()

        <span class="hljs-comment"># u &amp; f predictions:</span>
        u_prediction = self.net_u(self.x_u, self.t_u)
        f_prediction = self.net_f(self.x_f, self.t_f)

        <span class="hljs-comment"># losses:</span>
        u_loss = self.loss(u_prediction, self.u)
        f_loss = self.loss(f_prediction, self.null)
        self.ls = u_loss + f_loss

        <span class="hljs-comment"># derivative with respect to net's weights:</span>
        self.ls.backward()

        <span class="hljs-comment"># increase iteration count:</span>
        self.iter += <span class="hljs-number">1</span>

        <span class="hljs-comment"># print report:</span>
        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> self.iter % <span class="hljs-number">100</span>:
            print(<span class="hljs-string">'Epoch: {0:}, Loss: {1:6.3f}'</span>.format(self.iter, self.ls))

        <span class="hljs-keyword">return</span> self.ls

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">train</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-string">""" training loop """</span>
        self.net.train()
        self.optimizer.step(self.closure)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span> :

    nu = <span class="hljs-number">0.01</span> / np.pi         <span class="hljs-comment"># constant in the diff. equation</span>
    N_u = <span class="hljs-number">100</span>                 <span class="hljs-comment"># number of data points in the boundaries</span>
    N_f = <span class="hljs-number">10000</span>               <span class="hljs-comment"># number of collocation points</span>

    <span class="hljs-comment"># X_u_train: a set of pairs (x, t) located at:</span>
        <span class="hljs-comment"># x =  1, t = [0,  1]</span>
        <span class="hljs-comment"># x = -1, t = [0,  1]</span>
        <span class="hljs-comment"># t =  0, x = [-1, 1]</span>
    x_upper = np.ones((N_u//<span class="hljs-number">4</span>, <span class="hljs-number">1</span>), dtype=float)
    x_lower = np.ones((N_u//<span class="hljs-number">4</span>, <span class="hljs-number">1</span>), dtype=float) * (<span class="hljs-number">-1</span>)
    t_zero = np.zeros((N_u//<span class="hljs-number">2</span>, <span class="hljs-number">1</span>), dtype=float)

    t_upper = np.random.rand(N_u//<span class="hljs-number">4</span>, <span class="hljs-number">1</span>)
    t_lower = np.random.rand(N_u//<span class="hljs-number">4</span>, <span class="hljs-number">1</span>)
    x_zero = (<span class="hljs-number">-1</span>) + np.random.rand(N_u//<span class="hljs-number">2</span>, <span class="hljs-number">1</span>) * (<span class="hljs-number">1</span> - (<span class="hljs-number">-1</span>))

    <span class="hljs-comment"># stack uppers, lowers and zeros:</span>
    X_upper = np.hstack( (x_upper, t_upper) )
    X_lower = np.hstack( (x_lower, t_lower) )
    X_zero = np.hstack( (x_zero, t_zero) )

    <span class="hljs-comment"># each one of these three arrays haS 2 columns, </span>
    <span class="hljs-comment"># now we stack them vertically, the resulting array will also have 2 </span>
    <span class="hljs-comment"># columns and 100 rows:</span>
    X_u_train = np.vstack( (X_upper, X_lower, X_zero) )

    <span class="hljs-comment"># shuffle X_u_train:</span>
    index = np.arange(<span class="hljs-number">0</span>, N_u)
    np.random.shuffle(index)
    X_u_train = X_u_train[index, :]

    <span class="hljs-comment"># make X_f_train:</span>
    X_f_train = np.zeros((N_f, <span class="hljs-number">2</span>), dtype=float)
    <span class="hljs-keyword">for</span> row <span class="hljs-keyword">in</span> range(N_f):
        x = uniform(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)  <span class="hljs-comment"># x range</span>
        t = uniform( <span class="hljs-number">0</span>, <span class="hljs-number">1</span>)  <span class="hljs-comment"># t range</span>

        X_f_train[row, <span class="hljs-number">0</span>] = x 
        X_f_train[row, <span class="hljs-number">1</span>] = t

    <span class="hljs-comment"># add the boundary points to the collocation points:</span>
    X_f_train = np.vstack( (X_f_train, X_u_train) )

    <span class="hljs-comment"># make u_train</span>
    u_upper =  np.zeros((N_u//<span class="hljs-number">4</span>, <span class="hljs-number">1</span>), dtype=float)
    u_lower =  np.zeros((N_u//<span class="hljs-number">4</span>, <span class="hljs-number">1</span>), dtype=float) 
    u_zero = -np.sin(np.pi * x_zero)  

    <span class="hljs-comment"># stack them in the same order as X_u_train was stacked:</span>
    u_train = np.vstack( (u_upper, u_lower, u_zero) )

    <span class="hljs-comment"># match indices with X_u_train</span>
    u_train = u_train[index, :]

    <span class="hljs-comment"># pass data sets to the PINN:</span>
    pinn = PhysicsInformedNN(X_u_train, u_train, X_f_train)

    pinn.train()
</code></pre>
<p>Then we can build our loss function and define our ODE:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Callable
<span class="hljs-keyword">import</span> argparse

<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">from</span> torch <span class="hljs-keyword">import</span> nn
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> torchopt

<span class="hljs-keyword">from</span> pinn <span class="hljs-keyword">import</span> make_forward_fn, LinearNN


R = <span class="hljs-number">1.0</span>  <span class="hljs-comment"># rate of maximum population growth parameterizing the equation</span>
X_BOUNDARY = <span class="hljs-number">0.0</span>  <span class="hljs-comment"># boundary condition coordinate</span>
F_BOUNDARY = <span class="hljs-number">0.5</span>  <span class="hljs-comment"># boundary condition value</span>


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">make_loss_fn</span>(<span class="hljs-params">f: Callable, dfdx: Callable</span>) -&gt; Callable:</span>
    <span class="hljs-string">"""Make a function loss evaluation function

    The loss is computed as sum of the interior MSE loss (the differential equation residual)
    and the MSE of the loss at the boundary

    Args:
        f (Callable): The functional forward pass of the model used a universal function approximator. This
            is a function with signature (x, params) where `x` is the input data and `params` the model
            parameters
        dfdx (Callable): The functional gradient calculation of the universal function approximator. This
            is a function with signature (x, params) where `x` is the input data and `params` the model
            parameters

    Returns:
        Callable: The loss function with signature (params, x) where `x` is the input data and `params` the model
            parameters. Notice that a simple call to `dloss = functorch.grad(loss_fn)` would give the gradient
            of the loss with respect to the model parameters needed by the optimizers
    """</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">loss_fn</span>(<span class="hljs-params">params: torch.Tensor, x: torch.Tensor</span>):</span>

        <span class="hljs-comment"># interior loss</span>
        f_value = f(x, params)
        interior = dfdx(x, params) - R * f_value * (<span class="hljs-number">1</span> - f_value)

        <span class="hljs-comment"># boundary loss</span>
        x0 = X_BOUNDARY
        f0 = F_BOUNDARY
        x_boundary = torch.tensor([x0])
        f_boundary = torch.tensor([f0])
        boundary = f(x_boundary, params) - f_boundary

        loss = nn.MSELoss()
        loss_value = loss(interior, torch.zeros_like(interior)) + loss(
            boundary, torch.zeros_like(boundary)
        )

        <span class="hljs-keyword">return</span> loss_value

    <span class="hljs-keyword">return</span> loss_fn


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:

    <span class="hljs-comment"># make it reproducible</span>
    torch.manual_seed(<span class="hljs-number">42</span>)

    <span class="hljs-comment"># parse input from user</span>
    parser = argparse.ArgumentParser()

    parser.add_argument(<span class="hljs-string">"-n"</span>, <span class="hljs-string">"--num-hidden"</span>, type=int, default=<span class="hljs-number">5</span>)
    parser.add_argument(<span class="hljs-string">"-d"</span>, <span class="hljs-string">"--dim-hidden"</span>, type=int, default=<span class="hljs-number">5</span>)
    parser.add_argument(<span class="hljs-string">"-b"</span>, <span class="hljs-string">"--batch-size"</span>, type=int, default=<span class="hljs-number">30</span>)
    parser.add_argument(<span class="hljs-string">"-lr"</span>, <span class="hljs-string">"--learning-rate"</span>, type=float, default=<span class="hljs-number">1e-1</span>)
    parser.add_argument(<span class="hljs-string">"-e"</span>, <span class="hljs-string">"--num-epochs"</span>, type=int, default=<span class="hljs-number">100</span>)

    args = parser.parse_args()

    <span class="hljs-comment"># configuration</span>
    num_hidden = args.num_hidden
    dim_hidden = args.dim_hidden
    batch_size = args.batch_size
    num_iter = args.num_epochs
    tolerance = <span class="hljs-number">1e-8</span>
    learning_rate = args.learning_rate
    domain = (<span class="hljs-number">-5.0</span>, <span class="hljs-number">5.0</span>)

    <span class="hljs-comment"># function versions of model forward, gradient and loss</span>
    model = LinearNN(num_layers=num_hidden, num_neurons=dim_hidden, num_inputs=<span class="hljs-number">1</span>)
    funcs = make_forward_fn(model, derivative_order=<span class="hljs-number">1</span>)

    f = funcs[<span class="hljs-number">0</span>]
    dfdx = funcs[<span class="hljs-number">1</span>]
    loss_fn = make_loss_fn(f, dfdx)

    <span class="hljs-comment"># choose optimizer with functional API using functorch</span>
    optimizer = torchopt.FuncOptimizer(torchopt.adam(lr=learning_rate))

    <span class="hljs-comment"># initial parameters randomly initialized</span>
    params = tuple(model.parameters())

    <span class="hljs-comment"># train the model</span>
    loss_evolution = []
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(num_iter):

        <span class="hljs-comment"># sample points in the domain randomly for each epoch</span>
        x = torch.FloatTensor(batch_size).uniform_(domain[<span class="hljs-number">0</span>], domain[<span class="hljs-number">1</span>])

        <span class="hljs-comment"># compute the loss with the current parameters</span>
        loss = loss_fn(params, x)

        <span class="hljs-comment"># update the parameters with functional optimizer</span>
        params = optimizer.step(loss, params)

        print(<span class="hljs-string">f"Iteration <span class="hljs-subst">{i}</span> with loss <span class="hljs-subst">{float(loss)}</span>"</span>)
        loss_evolution.append(float(loss))

    <span class="hljs-comment"># plot solution on the given domain</span>
    x_eval = torch.linspace(domain[<span class="hljs-number">0</span>], domain[<span class="hljs-number">1</span>], steps=<span class="hljs-number">100</span>).reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
    f_eval = f(x_eval, params)
    analytical_sol_fn = <span class="hljs-keyword">lambda</span> x: <span class="hljs-number">1.0</span> / (<span class="hljs-number">1.0</span> + (<span class="hljs-number">1.0</span>/F_BOUNDARY - <span class="hljs-number">1.0</span>) * np.exp(-R * x))
    x_eval_np = x_eval.detach().numpy()
    x_sample_np = torch.FloatTensor(batch_size).uniform_(domain[<span class="hljs-number">0</span>], domain[<span class="hljs-number">1</span>]).detach().numpy()

    fig, ax = plt.subplots()

    ax.scatter(x_sample_np, analytical_sol_fn(x_sample_np), color=<span class="hljs-string">"red"</span>, label=<span class="hljs-string">"Sample training points"</span>)
    ax.plot(x_eval_np, f_eval.detach().numpy(), label=<span class="hljs-string">"PINN final solution"</span>)
    ax.plot(
        x_eval_np,
        analytical_sol_fn(x_eval_np),
        label=<span class="hljs-string">f"Analytic solution"</span>,
        color=<span class="hljs-string">"green"</span>,
        alpha=<span class="hljs-number">0.75</span>,
    )
    ax.set(title=<span class="hljs-string">"Logistic equation solved with PINNs"</span>, xlabel=<span class="hljs-string">"t"</span>, ylabel=<span class="hljs-string">"f(t)"</span>)
    ax.legend()

    fig, ax = plt.subplots()
    ax.semilogy(loss_evolution)
    ax.set(title=<span class="hljs-string">"Loss evolution"</span>, xlabel=<span class="hljs-string">"# epochs"</span>, ylabel=<span class="hljs-string">"Loss"</span>)
    ax.legend()

    plt.show()
</code></pre>
<p>And this is the result:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1714926804988/e57e2fee-9ae1-401f-b70e-d59f3569cca5.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-approximating-a-pde-using-a-pinn-the-heat-equation">Approximating a PDE using a PINN: the heat equation</h3>
<p>The heat equation is a classical partial differential equation that describes the diffusion of heat (or equivalently, the distribution of temperature) in a given region over time. The one-dimensional form of the heat equation is given by:</p>
<p>$$\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}$$</p><p>Again, we implement the PINN. Note that the residual coded below, \(f = u_x - 2u_t - u\), actually corresponds to the first-order PDE \(\frac{\partial u}{\partial x} = 2\frac{\partial u}{\partial t} + u\) with initial condition \(u(x,0)=6e^{-3x}\); to target the heat equation instead, one would replace the residual with \(u_t - \alpha u_{xx}\):</p>
<pre><code class="lang-python"><span class="hljs-comment"># code from https://github.com/udemirezen/PINN-1/blob/main/solve_PDE_NN.ipynb</span>

<span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">from</span> torch.autograd <span class="hljs-keyword">import</span> Variable
device = torch.device(<span class="hljs-string">"cuda:0"</span> <span class="hljs-keyword">if</span> torch.cuda.is_available() <span class="hljs-keyword">else</span> <span class="hljs-string">"cpu"</span>)
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-comment"># We consider Net as our solution u_theta(x,t)</span>

<span class="hljs-string">"""
When forming the network, we have to keep in mind the number of inputs and outputs
In ur case: #inputs = 2 (x,t)
and #outputs = 1

You can add ass many hidden layers as you want with as many neurons.
More complex the network, the more prepared it is to find complex solutions, but it also requires more data.

Let us create this network:
min 5 hidden layer with 5 neurons each.
"""</span>

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Net</span>(<span class="hljs-params">nn.Module</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        super(Net, self).__init__()
        self.hidden_layer1 = nn.Linear(<span class="hljs-number">2</span>,<span class="hljs-number">5</span>)
        self.hidden_layer2 = nn.Linear(<span class="hljs-number">5</span>,<span class="hljs-number">5</span>)
        self.hidden_layer3 = nn.Linear(<span class="hljs-number">5</span>,<span class="hljs-number">5</span>)
        self.hidden_layer4 = nn.Linear(<span class="hljs-number">5</span>,<span class="hljs-number">5</span>)
        self.hidden_layer5 = nn.Linear(<span class="hljs-number">5</span>,<span class="hljs-number">5</span>)
        self.output_layer = nn.Linear(<span class="hljs-number">5</span>,<span class="hljs-number">1</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self, x,t</span>):</span>
        inputs = torch.cat([x,t],axis=<span class="hljs-number">1</span>) <span class="hljs-comment"># combined two arrays of 1 columns each to one array of 2 columns</span>
        layer1_out = torch.sigmoid(self.hidden_layer1(inputs))
        layer2_out = torch.sigmoid(self.hidden_layer2(layer1_out))
        layer3_out = torch.sigmoid(self.hidden_layer3(layer2_out))
        layer4_out = torch.sigmoid(self.hidden_layer4(layer3_out))
        layer5_out = torch.sigmoid(self.hidden_layer5(layer4_out))
        output = self.output_layer(layer5_out) <span class="hljs-comment">## For regression, no activation is used in output layer</span>
        <span class="hljs-keyword">return</span> output
<span class="hljs-comment">### (2) Model</span>
net = Net()
net = net.to(device)
mse_cost_function = torch.nn.MSELoss() <span class="hljs-comment"># Mean squared error</span>
optimizer = torch.optim.Adam(net.parameters())
<span class="hljs-comment">## PDE as loss function. Thus would use the network which we call as u_theta</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">f</span>(<span class="hljs-params">x,t, net</span>):</span>
    u = net(x,t) <span class="hljs-comment"># the dependent variable u is given by the network based on independent variables x,t</span>
    <span class="hljs-comment">## Based on our f = du/dx - 2du/dt - u, we need du/dx and du/dt</span>
    u_x = torch.autograd.grad(u.sum(), x, create_graph=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]
    u_t = torch.autograd.grad(u.sum(), t, create_graph=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]
    pde = u_x - <span class="hljs-number">2</span>*u_t - u
    <span class="hljs-keyword">return</span> pde
<span class="hljs-comment">## Data from Boundary Conditions</span>
<span class="hljs-comment"># u(x,0)=6e^(-3x)</span>
<span class="hljs-comment">## BC just gives us datapoints for training</span>

<span class="hljs-comment"># BC tells us that for any x in range[0,2] and time=0, the value of u is given by 6e^(-3x)</span>
<span class="hljs-comment"># Take say 500 random numbers of x</span>
x_bc = np.random.uniform(low=<span class="hljs-number">0.0</span>, high=<span class="hljs-number">2.0</span>, size=(<span class="hljs-number">500</span>,<span class="hljs-number">1</span>))
t_bc = np.zeros((<span class="hljs-number">500</span>,<span class="hljs-number">1</span>))
<span class="hljs-comment"># compute u based on BC</span>
u_bc = <span class="hljs-number">6</span>*np.exp(<span class="hljs-number">-3</span>*x_bc)
<span class="hljs-comment">### (3) Training / Fitting</span>
iterations = <span class="hljs-number">20000</span>
previous_validation_loss = <span class="hljs-number">99999999.0</span>
<span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(iterations):
    optimizer.zero_grad() <span class="hljs-comment"># to make the gradients zero</span>

    <span class="hljs-comment"># Loss based on boundary conditions</span>
    pt_x_bc = Variable(torch.from_numpy(x_bc).float(), requires_grad=<span class="hljs-literal">False</span>).to(device)
    pt_t_bc = Variable(torch.from_numpy(t_bc).float(), requires_grad=<span class="hljs-literal">False</span>).to(device)
    pt_u_bc = Variable(torch.from_numpy(u_bc).float(), requires_grad=<span class="hljs-literal">False</span>).to(device)

    net_bc_out = net(pt_x_bc, pt_t_bc) <span class="hljs-comment"># output of u(x,t)</span>
    mse_u = mse_cost_function(net_bc_out, pt_u_bc)

    <span class="hljs-comment"># Loss based on PDE</span>
    x_collocation = np.random.uniform(low=<span class="hljs-number">0.0</span>, high=<span class="hljs-number">2.0</span>, size=(<span class="hljs-number">500</span>,<span class="hljs-number">1</span>))
    t_collocation = np.random.uniform(low=<span class="hljs-number">0.0</span>, high=<span class="hljs-number">1.0</span>, size=(<span class="hljs-number">500</span>,<span class="hljs-number">1</span>))
    all_zeros = np.zeros((<span class="hljs-number">500</span>,<span class="hljs-number">1</span>))


    pt_x_collocation = Variable(torch.from_numpy(x_collocation).float(), requires_grad=<span class="hljs-literal">True</span>).to(device)
    pt_t_collocation = Variable(torch.from_numpy(t_collocation).float(), requires_grad=<span class="hljs-literal">True</span>).to(device)
    pt_all_zeros = Variable(torch.from_numpy(all_zeros).float(), requires_grad=<span class="hljs-literal">False</span>).to(device)

    f_out = f(pt_x_collocation, pt_t_collocation, net) <span class="hljs-comment"># output of f(x,t)</span>
    mse_f = mse_cost_function(f_out, pt_all_zeros)

    <span class="hljs-comment"># Combining the loss functions</span>
    loss = mse_u + mse_f


    loss.backward() <span class="hljs-comment"># This is for computing gradients using backward propagation</span>
    optimizer.step() <span class="hljs-comment"># This is equivalent to : theta_new = theta_old - alpha * derivative of J w.r.t theta</span>

    <span class="hljs-keyword">with</span> torch.autograd.no_grad():
        print(epoch,<span class="hljs-string">"Training Loss:"</span>,loss.data)
</code></pre>
<p>and then plot the result:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> mpl_toolkits.mplot3d <span class="hljs-keyword">import</span> Axes3D
Axes3D = Axes3D
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">from</span> matplotlib <span class="hljs-keyword">import</span> cm
<span class="hljs-keyword">from</span> matplotlib.ticker <span class="hljs-keyword">import</span> LinearLocator, FormatStrFormatter
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

fig = plt.figure()
ax = fig.add_subplot(<span class="hljs-number">111</span>, projection=<span class="hljs-string">'3d'</span>)

x=np.arange(<span class="hljs-number">0</span>,<span class="hljs-number">2</span>,<span class="hljs-number">0.02</span>)
t=np.arange(<span class="hljs-number">0</span>,<span class="hljs-number">1</span>,<span class="hljs-number">0.02</span>)
ms_x, ms_t = np.meshgrid(x, t)
<span class="hljs-comment">## Just because meshgrid is used, we need to do the following adjustment</span>
x = np.ravel(ms_x).reshape(<span class="hljs-number">-1</span>,<span class="hljs-number">1</span>)
t = np.ravel(ms_t).reshape(<span class="hljs-number">-1</span>,<span class="hljs-number">1</span>)

pt_x = Variable(torch.from_numpy(x).float(), requires_grad=<span class="hljs-literal">True</span>).to(device)
pt_t = Variable(torch.from_numpy(t).float(), requires_grad=<span class="hljs-literal">True</span>).to(device)
pt_u = net(pt_x,pt_t)
u=pt_u.data.cpu().numpy()
ms_u = u.reshape(ms_x.shape)

surf = ax.plot_surface(ms_x,ms_t,ms_u, cmap=cm.coolwarm,linewidth=<span class="hljs-number">0</span>, antialiased=<span class="hljs-literal">False</span>)

ax.zaxis.set_major_locator(LinearLocator(<span class="hljs-number">10</span>))
ax.zaxis.set_major_formatter(FormatStrFormatter(<span class="hljs-string">'%.02f'</span>))

fig.colorbar(surf, shrink=<span class="hljs-number">0.5</span>, aspect=<span class="hljs-number">5</span>)

plt.show()
</code></pre>
<p>which returns the following:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1714927875543/aced65ad-27c1-4b67-b561-d722a81bb408.png" alt class="image--center mx-auto" /></p>
<hr />
<p>And that's it for this article. Thanks for reading. If you have any suggestions for improvement or any further insights to share, please don't hesitate to reach out and leave a comment below. Your feedback is invaluable and greatly appreciated.</p>
<h2 id="heading-reference">Reference</h2>
<ul>
<li><p><a target="_blank" href="https://link.springer.com/content/pdf/10.1007/s10915-022-01939-z.pdf">https://link.springer.com/content/pdf/10.1007/s10915-022-01939-z.pdf</a></p>
</li>
<li><p><a target="_blank" href="https://acnpsearch.unibo.it/OpenURL?id=tisearch%3Ati-ex&amp;sid=google&amp;rft.auinit=S&amp;rft.aulast=Cuomo&amp;rft.atitle=Scientific+machine+learning+through+physics%E2%80%93informed+neural+networks%3A+Where+we+are+and+what%E2%80%99s+next&amp;rft.title=Journal+of+scientific+computing+%28Dordrecht.+Online%29&amp;rft.volume=92&amp;rft.issue=3&amp;rft.date=2022&amp;rft.spage=88&amp;rft.issn=1573-7691">https://acnpsearch.unibo.it/OpenURL?id=tisearch%3Ati-ex&amp;sid=google&amp;rft.auinit=S&amp;rft.aulast=Cuomo&amp;rft.atitle=Scientific+machine+learning+through+physics%E2%80%93informed+neural+networks%3A+Where+we+are+and+what%E2%80%99s+next&amp;rft.title=Journal+of+scientific+computing+%28Dordrecht.+Online%29&amp;rft.volume=92&amp;rft.issue=3&amp;rft.date=2022&amp;rft.spage=88&amp;rft.issn=1573-7691</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Quantum Blockchain]]></title><description><![CDATA[In the landscape of technological innovation, two disruptive forces stand out: quantum computing and blockchain. While each has made significant strides independently, their convergence holds the promise of revolutionizing cryptography and reshaping ...]]></description><link>https://amm.zanotp.com/quantum-blockchain</link><guid isPermaLink="true">https://amm.zanotp.com/quantum-blockchain</guid><category><![CDATA[quantum-blockchain]]></category><category><![CDATA[quantum-money]]></category><category><![CDATA[quantum computing]]></category><category><![CDATA[Blockchain]]></category><category><![CDATA[Blockchain technology]]></category><category><![CDATA[Quantum]]></category><category><![CDATA[blockchain security]]></category><category><![CDATA[Quantum Cryptography]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sun, 10 Mar 2024 21:23:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/qDG7XKJLKbs/upload/bc9d0dfed6abfa3457e9e876e362fb87.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the landscape of technological innovation, two disruptive forces stand out: quantum computing and blockchain. While each has made significant strides independently, their convergence holds the promise of revolutionizing cryptography and reshaping the foundations of digital trust. At the heart of this synergy lies the concept of quantum blockchain, a novel blockchain model infused with quantum cryptographic principles.</p>
<p>Blockchain technology, epitomized by cryptocurrencies like Bitcoin and Ethereum, has redefined trust in digital transactions. Its decentralized ledger system offers immutable records, resistant to tampering and censorship, transforming industries beyond finance. Meanwhile, quantum computing, leveraging quantum mechanics, offers exponential computational power, poised to tackle problems deemed infeasible by classical computers.</p>
<p>While the two technologies seem unrelated, a profound connection exists between quantum computing and blockchain, and this blog post introduces you to quantum blockchain after a brief digression on fundamental concepts of quantum computing and quantum cryptography.</p>
<h2 id="heading-quantum-computing">Quantum computing</h2>
<p>As you may know, all the information a computer stores and processes is just interminable strings of 0s and 1s, the so-called bits. Quantum computing is a completely different computational paradigm, relying on quantum bits (also called qubits), which can exist in a superposition of states, representing both 0 and 1 simultaneously.</p>
<p>While this may seem a logical contradiction, according to the postulates of quantum mechanics the state of a system is described as a linear combination of all possible states until it is measured, at which point the state collapses to a definite value. Thanks to superposition, a register of n qubits can encode a combination of 2^n basis states, exponentially more than the single state an n-bit classical register holds at any moment. Furthermore, qubits can exhibit another peculiar quantum behavior called entanglement, where the state of one qubit becomes correlated with the state of another qubit. The third ingredient that makes a quantum computer faster than a classical one is quantum interference, which occurs when the probability amplitudes of different quantum states interfere constructively or destructively, amplifying some outcomes and suppressing others. In quantum computing, this interference allows information to be manipulated and processed in a highly efficient manner.</p>
<p>These phenomena enable quantum computers to outperform classical ones on certain types of problems, particularly those that require extensive exploration of solution spaces. Superposition, entanglement, and quantum interference thus collectively contribute to the computational power and speed of quantum computers, offering the potential for revolutionary advancements in many fields of science and technology.</p>
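<p>To make superposition concrete, here is a minimal classical simulation in plain numpy (a sketch only, not a real quantum runtime): a qubit prepared in the state |0&gt; is sent through a Hadamard gate, and repeated measurements then return 0 and 1 with equal probability:</p>
<pre><code class="lang-python">import numpy as np

ket0 = np.array([1.0, 0.0])  # basis state |0&gt;

# Hadamard gate: maps |0&gt; to the equal superposition (|0&gt; + |1&gt;) / sqrt(2)
H = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)

state = H @ ket0
probs = np.abs(state) ** 2  # Born rule: |amplitude|^2

rng = np.random.default_rng(0)
samples = rng.choice([0, 1], size=10_000, p=probs)
print(probs)                                # [0.5 0.5]
print(np.bincount(samples) / len(samples))  # empirically close to [0.5 0.5]
</code></pre>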
<h2 id="heading-quantum-cryptography">Quantum cryptography</h2>
<p>One of the most fascinating and important applications of quantum technologies is quantum cryptography, a subset of quantum information science that aims to use the principles of quantum mechanics to secure communication channels in a fundamentally different way than classical cryptographic methods.</p>
<p>In fact, traditional cryptographic techniques rely on mathematical complexity, such as factorization or discrete logarithm problems, for securing data transmission. The idea is to pick a problem that would take a classical computer ages to solve and use that difficulty as the basis for encryption. Obviously, ever more powerful computers threaten the security of these classical encryption methods. Moreover, quantum computers will be able to solve these mathematical problems efficiently using algorithms like Shor's algorithm, rendering traditional encryption schemes obsolete.</p>
<p>Quantum cryptography, on the other hand, offers a solution that is fundamentally secure, regardless of the computational power of the adversary. By exploiting the properties of quantum mechanics, such as the superposition and entanglement of quantum states, quantum cryptography provides a means for two parties to communicate with absolute secrecy. Quantum Key Distribution (QKD), one of the most prominent applications of quantum cryptography, allows two parties to share a secret cryptographic key with the assurance that any attempt to intercept the key will be detected. This is achieved through the use of quantum states to encode the key information, making it impossible for an eavesdropper to gain knowledge of the key without disturbing the quantum states and revealing their presence.</p>
<p><img src="https://www.drishtiias.com/images/uploads/1645696513_Quantum_Key_Distribution_Work_Drishti_IAS_English.png" alt="Quantum Key Distribution Technology | 24 Feb 2022" /></p>
<p>As such, quantum cryptography offers a level of security that is unparalleled by classical cryptographic methods, making it an essential tool for ensuring the confidentiality and integrity of sensitive information in the digital age.</p>
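<p>To make the mechanics of QKD less abstract, here is a minimal classical simulation of the sifting step of the BB84 protocol (a sketch only: a real implementation runs over a quantum channel and adds error-rate estimation to detect eavesdroppers; all names are illustrative). Alice encodes random bits in randomly chosen bases, Bob measures in his own random bases, and the two keep only the positions where their bases happen to match:</p>
<pre><code class="lang-python">import random

random.seed(42)
n = 32

# Alice picks random bits and random encoding bases (0 = rectilinear, 1 = diagonal)
alice_bits  = [random.randint(0, 1) for _ in range(n)]
alice_bases = [random.randint(0, 1) for _ in range(n)]

# Bob measures each qubit in a randomly chosen basis;
# a wrong basis yields a uniformly random outcome
bob_bases = [random.randint(0, 1) for _ in range(n)]
bob_bits = [
    bit if a_basis == b_basis else random.randint(0, 1)
    for bit, a_basis, b_basis in zip(alice_bits, alice_bases, bob_bases)
]

# sifting: the bases are compared publicly, and only matching positions are kept
key_alice = [b for b, a, bb in zip(alice_bits, alice_bases, bob_bases) if a == bb]
key_bob   = [b for b, a, bb in zip(bob_bits, alice_bases, bob_bases) if a == bb]

assert key_alice == key_bob  # the sifted keys coincide
print(key_alice)
</code></pre>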
<h2 id="heading-quantum-money">Quantum money</h2>
<p>Before diving into quantum blockchain, I want to discuss another intriguing application of quantum technologies, somewhat related to blockchain and cryptocurrencies: quantum money.</p>
<p>The concept of quantum money traces back to the early days of quantum information theory and cryptography, with theoretical proposals emerging in the 1970s and gaining momentum in subsequent decades. One of the pioneering works in this field was proposed by physicist Stephen Wiesner and published in a <a target="_blank" href="http://users.cms.caltech.edu/~vidick/teaching/120_qcrypto/wiesner.pdf">scientific journal</a> in 1983.</p>
<p>Wiesner's idea involved using quantum states to encode information on banknotes, making them effectively unforgeable due to the inherent properties of quantum mechanics. Specifically, Wiesner proposed a scheme where each banknote would contain a unique quantum state, which could not be precisely duplicated or measured without disturbing its state. This would make counterfeiting quantum money practically impossible, as any attempt to copy or measure the quantum state would inevitably alter it, thus revealing the counterfeit attempt.</p>
<p><img src="https://www.nist.gov/sites/default/files/styles/480_x_480_limit/public/images/public_affairs/colloquia/011711_lr.jpg?itok=LWulsDZE" alt="photo by NIST" class="image--center mx-auto" /></p>
<p>Despite the theoretical appeal of Wiesner's proposal, the practical implementation of quantum money remains a significant challenge. Generating and manipulating quantum states with the precision and reliability required for quantum money presents formidable technical hurdles. Additionally, quantum systems are inherently fragile and susceptible to environmental noise, which could compromise the security of quantum money schemes.</p>
<p>In essence, similarly to cryptocurrencies, quantum money seeks to provide an unforgeable form of currency by exploiting the fundamental principles of quantum mechanics.</p>
<h2 id="heading-quantum-blockchain">Quantum blockchain</h2>
<p>As you may already know, a blockchain functions as an immutable ledger where data is stored in the form of transactions, interconnected through a Merkle tree, and organized into blocks linked by hash functions. This network operates in a decentralized manner, with each node retaining a copy of the growing chain of blocks. Consensus protocols determine the addition of new blocks and establish agreement on the block sequence. Typically, the blockchain process begins with users broadcasting transactions, which are then verified and organized into a new block according to specific consensus rules, such as proof-of-work or proof-of-stake. Participants, often referred to as "miners" in systems like Bitcoin, compete to create the next block, with the successful miner being rewarded. The longest chain of blocks is considered definitive, providing a basis for consensus.</p>
<p>One of the main features of blockchain is that if any block within the chain is altered, it invalidates all subsequent blocks. Consequently, nodes in the blockchain network reject the tampered version and continue to work on the version supported by the majority.</p>
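<p>As a concrete illustration of this property, consider the following toy Python chain (a minimal sketch: a real blockchain adds Merkle trees, consensus, and signatures). Each block stores the hash of its predecessor, so altering one block breaks the link of every block that follows it:</p>
<pre><code class="lang-python">import hashlib
import json

def block_hash(block: dict) -&gt; str:
    # hash of the block's canonical JSON encoding
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_chain(transactions: list) -&gt; list:
    chain, prev = [], "0" * 64  # dummy hash for the genesis predecessor
    for i, tx in enumerate(transactions):
        block = {"index": i, "tx": tx, "prev_hash": prev}
        prev = block_hash(block)
        chain.append(block)
    return chain

def first_broken_link(chain: list):
    prev = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev:
            return block["index"]  # this block and all its successors are invalid
        prev = block_hash(block)
    return None

chain = make_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"])
print(first_broken_link(chain))  # None: the chain is consistent

chain[1]["tx"] = "bob pays mallory 2"  # tamper with block 1
print(first_broken_link(chain))  # 2: every block after the tampered one fails to link
</code></pre>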
<p>Moreover, access control in blockchains relies on public-key cryptography: users safeguard private keys like passwords and use public keys as account identifiers. Transactions are authenticated with signatures generated from private keys, which network nodes verify against the corresponding public keys. Once you are familiar with the above information, you are ready to explore quantum blockchains.</p>
<p>Quantum blockchain typically refers to a variety of protocols, including classical blockchains with quantum-resistant cryptography, hybrid blockchains leveraging Quantum Key Distribution networks (just hybrid blockchains from now on), and fully quantum blockchains operating in the realm of quantum computing.</p>
<p>Hybrid blockchains aim to tackle the fact that public-key cryptography is not quantum resistant (we already mentioned Shor’s algorithm), therefore substituting public-key cryptography with the already mentioned Quantum Key Distribution.</p>
<p>Quantum blockchains, on the other hand, are more varied and replace some core component of a classical blockchain with a quantum counterpart. For example, <a target="_blank" href="https://arxiv.org/pdf/1804.05979.pdf">Rajan, D., &amp; Visser, M. (2019)</a>, whose quantum blockchain is regarded as a pioneering theoretical work, replaces the functionality of time-stamped blocks, and of the hash functions linking them, with a temporally entangled state. This offers a fairly interesting advantage: the sensitivity towards tampering is significantly amplified, since tampering with a single block destroys the full local copy of the blockchain (due to entanglement), whereas on a classical blockchain only the blocks following the compromised one are invalidated, which leaves it open to vulnerabilities. Let's now dive a little deeper into the formulation of both the blockchain and the network as proposed in <a target="_blank" href="https://arxiv.org/pdf/1804.05979.pdf">Rajan, D., &amp; Visser, M. (2019)</a>.</p>
<h3 id="heading-blockchain">Blockchain</h3>
<p>This subsection explores the implementation of a quantum version of a block and a blockchain, utilizing temporally entangled states (a concept in quantum mechanics where the quantum states of multiple particles become correlated over time, rather than in space).</p>
<p>Entanglement, essentially the inseparability of distinct states, forms the basis for capturing the chain-like structure. Consequently, the blockchain can be viewed as an entangled quantum state, with a block's timestamp emerging from the immediate absorption of the first qubit of a block.</p>
<p>Constructing the blockchain from a series of entangled states involves amalgamating the blocks into a specialized entangled state known as a Greenberger–Horne–Zeilinger (GHZ) state.</p>
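<p>Numerically, a GHZ state is easy to write down. The sketch below (plain numpy, purely illustrative and in no way a quantum-blockchain implementation) builds the three-qubit GHZ state and samples measurement outcomes: only all-zeros or all-ones ever occur, reflecting the perfect correlation the construction exploits:</p>
<pre><code class="lang-python">import numpy as np

n = 3                      # number of qubits
ghz = np.zeros(2 ** n)
ghz[0] = 1 / np.sqrt(2)    # amplitude of |000&gt;
ghz[-1] = 1 / np.sqrt(2)   # amplitude of |111&gt;

probs = np.abs(ghz) ** 2   # Born rule: measurement probabilities
rng = np.random.default_rng(0)
outcomes = rng.choice(2 ** n, size=10, p=probs)
print([format(o, f"0{n}b") for o in outcomes])  # only '000' or '111' ever appear
</code></pre>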
<h3 id="heading-network">Network</h3>
<p>After establishing the blockchain, additional components are necessary for a functional blockchain system, notably a protocol for disseminating the blockchain's state to all network nodes. Since the blockchain's state is quantum in nature, a quantum channel must replace the classical one, with digital signatures implemented through Quantum Key Distribution (QKD) protocols.</p>
<p>Similar to classical blockchain systems, each node in a quantum blockchain setup must possess a copy of the blockchain, and new blocks must undergo verification before integration into each node's blockchain.</p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>Quantum blockchain is still an area of research, and it is the author’s opinion that, given the rise of classical blockchains and the realistic development of a global quantum network, quantum blockchain can potentially open the door to a new research frontier in quantum information science as well as new business possibilities.</p>
<p>Thanks for reading. This article does not aim to be exhaustive and is no more than an introduction to quantum blockchain. To go further, there are resources online, and the source section below is a good starting point.</p>
<p>Sources:</p>
<ul>
<li><p><a target="_blank" href="http://users.cms.caltech.edu/~vidick/teaching/120_qcrypto/wiesner.pdf">Weisner, S. (1983) Conjugate Coding. ACM SIGACT News, 15, 78-88</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/pdf/1804.05979.pdf">Rajan, D., &amp; Visser, M. (2019)</a></p>
</li>
<li><p><a target="_blank" href="https://www.nature.com/articles/s41534-018-0086-y">Ringbauer, M.; Costa, F.; Goggin, M.E.; White, A.G.; Fedrizzi, A. Multi-time quantum correlations with no spatial analog’. NPJ Quantum Inf. 2018, 4 , 37.</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Blockchain and randomness]]></title><description><![CDATA[Getting random numbers on the blockchain used to be a headache for those who wanted to use truly random numbers in a dapp or protocol, and the lotteries that used these pseudo-random numbers were easily hacked by fast and malicious agents. However, t...]]></description><link>https://amm.zanotp.com/blockchain-and-randomness</link><guid isPermaLink="true">https://amm.zanotp.com/blockchain-and-randomness</guid><category><![CDATA[Blockchain]]></category><category><![CDATA[Blockchain technology]]></category><category><![CDATA[blockchain security]]></category><category><![CDATA[randomness]]></category><category><![CDATA[Blockchain development]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Wed, 23 Aug 2023 12:17:29 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/T9rKvI3N0NM/upload/28c0ced23ce653a91b9b9bde743215c0.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Getting random numbers on the blockchain used to be a headache for those who wanted to use truly random numbers in a dapp or protocol, and the lotteries that used these pseudo-random numbers were easily hacked by fast and malicious agents. However, the idea of a blockchain whose dapps can function without random numbers was (and is) out of the question, and obtaining random numbers has become easy and relatively secure.</p>
<p>The reason for this difficulty is that on-chain computations must be deterministic in order to be replayed in a decentralized manner, and any data that could serve as a random source is also available to an attacker.</p>
<p>In this article, I'll review the solutions that blockchain engineers developed in the past to address this problem, discuss their weaknesses, and conclude with the simplest and most commonly used method currently available.</p>
<h2 id="heading-pseudo-randomness-from-unknowable-at-the-time-of-transacting-information">Pseudo-randomness from unknowable at the time of transacting information</h2>
<p>One of the first sources of entropy that blockchain engineers used was the block timestamp, a global variable that represents the timestamp of the current block in which the contract is executed. This timestamp is a Unix timestamp that indicates the number of seconds that have elapsed since January 1, 1970 (UTC) and provides information about when the block was mined.</p>
<p>The problem with block timestamps is that miners have the ability to influence them as long as the timestamp doesn't precede that of the parent block. Although timestamps are usually quite accurate, there is a potential problem if a miner benefits from inaccurate timestamps. In such cases, the miner could use his mining power to create blocks with incorrect timestamps and thus manipulate the results of the random function to his advantage.</p>
<p>For example, imagine a lottery in which a random winner is selected from a set of participants by a function that uses the timestamp of a block as the source of randomness: a miner may enter the lottery and then adjust the timestamp value to increase his chances of winning.</p>
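<p>The attack is easy to reproduce in a classical simulation. In the sketch below (plain Python; sha256 stands in for the chain's hash function and all names are illustrative), a miner scans the range of timestamps it could plausibly report, looking for one that makes it the winner of a timestamp-seeded lottery:</p>
<pre><code class="lang-python">import hashlib

def pick_winner(timestamp: int, num_players: int) -&gt; int:
    # "random" winner index derived from the block timestamp
    digest = hashlib.sha256(str(timestamp).encode()).digest()
    return int.from_bytes(digest, "big") % num_players

num_players, miner_index = 10, 7
honest_time = 1_692_000_000

# the miner may report any timestamp within some tolerated drift
for ts in range(honest_time, honest_time + 900):
    if pick_winner(ts, num_players) == miner_index:
        print(f"report timestamp {ts} and the miner wins")
        break
</code></pre>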
<p>While these attacks may sound anachronistic, they are not beyond the realm of possibility. In fact, Feathercoin was the victim of a time-warp attack in 2013. In it, a group of miners exploited a vulnerability in Feathercoin's mining algorithm that allowed them to manipulate the timestamps of blocks, resulting in the rapid creation of new blocks. The attack undeniably caused significant damage to Feathercoin's value and reputation.</p>
<p>Still, one might think that using the block hash, or other block information generally unknown at the time of the transaction, as a source of entropy is a good idea. Such implementations share a major problem, however: they rely on information that is public within the network, and these quantities can be read, and to some extent manipulated, by an attacker who is also a miner, allowing him to increase his probability of winning the lottery with an attack similar to the time-warp attack.</p>
<p>Even using a sophisticated combination of all information unknown at the time of the transaction is not a good idea: it makes the attack much more difficult, but does not make the protocol as secure as other methods do.</p>
<h2 id="heading-randomness-from-off-chain-data-oracles-and-apis">Randomness from off-chain data: oracles and APIs</h2>
<p>I hope you have been convinced that using on-chain information is not a good practice when security is a crucial feature. What can we do to get an unpredictable random number for our lottery?</p>
<p>We can turn our attention to off-chain data, i.e. use data provided by an API or an oracle. For example, if an API provides the temperature in a particular city, we can take its reading modulo the number of participants and use the result to pick a winner. The temperature changes frequently, and if the API's answer is updated frequently, the likelihood of a malicious agent guessing the number is very low.</p>
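<p>For instance (a toy sketch: <code>fetch_temperature</code> is a hypothetical stand-in for a real API call):</p>
<pre><code class="lang-python">def fetch_temperature() -&gt; float:
    # hypothetical oracle call; a real dapp would query an off-chain data feed
    return 21.37

num_players = 8
winner_index = round(fetch_temperature() * 100) % num_players
print(winner_index)  # 2137 % 8 == 1
</code></pre>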
<p>Although this is a better solution than using on-chain data, it is not the best available because we centralize our random source and the smart contract is useless if the API is corrupted.</p>
<p>Moreover, no one would fully trust the lottery contract: since the API could be programmed to always return the same set of values, the protocol is no longer trustless.</p>
<p>Despite these drawbacks, oracles and APIs have been widely used to obtain off-chain data, and are sometimes still used. It's worth noting that combining the results of different APIs and oracles can produce nearly unpredictable output, which can be a good compromise for small dapps or protocols that don't rely entirely on randomness. The reputation of the data provider also matters here.</p>
<p>The most important attack on APIs and oracles is so-called oracle manipulation, in which vulnerabilities in a blockchain oracle are exploited to make it report inaccurate information about events outside the chain. This attack is often part of a broader attack on a protocol, as malicious actors can cause a protocol’s smart contracts to execute based on false input or in a way that is advantageous to them.</p>
<h2 id="heading-verifiable-random-functions-vrfs">Verifiable random functions (VRFs)</h2>
<p>Steering clear of intricate mathematics, Verifiable Random Functions (VRFs) can be described as public-key pseudorandom functions. Put simply, these functions produce outputs that appear pseudorandom for a given seed and mimic the behavior of truly random outputs (if you want to dig deeper, read <a target="_blank" href="https://amm.zanotp.com/an-introduction-to-prngs-with-python-and-r">this</a> article). The real power of VRFs is their ability to prove the correctness of their output. The possessor of the secret key is the only one able to compute the output of the function (i.e., the random output) along with a corresponding proof, for any input value. Conversely, anyone else who has the proof and the corresponding public key can verify that the output was computed correctly; however, this information is not sufficient to derive the secret key.</p>
<p>One of the most commonly used VRFs is the Chainlink VRF, which relies on a decentralized oracle network (i.e., a set of oracles that receive data from multiple reliable sources) to enhance existing blockchains by providing verified off-chain data.</p>
<p>Chainlink VRF enables the generation of random numbers within smart contracts, enabling blockchain developers to create improved user experiences by incorporating unpredictable outcomes into their blockchain-powered applications. In addition, Chainlink VRF is immune to tampering, whether done by node operators, users, or malicious entities.</p>
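<p>As a rough illustration of this interface (a sketch only, not a secure VRF: real constructions such as the ECVRF used by Chainlink come with uniqueness proofs and a precise security definition), the snippet below mimics the sign-then-hash idea with an Ed25519 keypair from the <code>cryptography</code> package. Only the key holder can produce the output, but anyone holding the proof and the public key can verify it:</p>
<pre><code class="lang-python">import hashlib

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

secret_key = Ed25519PrivateKey.generate()
public_key = secret_key.public_key()

seed = b"lottery round 42"

# prover: the "proof" is the signature; the pseudorandom output is its hash
proof = secret_key.sign(seed)
output = hashlib.sha256(proof).hexdigest()

# verifier: checks the proof against the public key, then recomputes the output
public_key.verify(proof, seed)  # raises InvalidSignature if the proof is forged
assert output == hashlib.sha256(proof).hexdigest()
print(output[:16])
</code></pre>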
<h3 id="heading-tohttpentitiesto-go-further"><a target="_blank" href="http://entities.To">To</a> go further</h3>
<p>To be an outstanding blockchain developer it's not necessary to know everything about VRFs; however, for the curious ones I suggest <a target="_blank" href="https://dash.harvard.edu/bitstream/handle/1/5028196/Vadhan_VerifRandomFunction.pdf">Micali, Rabin, Vadhan (1999)</a> and the <a target="_blank" href="https://docs.chain.link/vrf/v2/introduction">Chainlink VRF docs</a>.</p>
<hr />
<p>And that's it for this article.</p>
<p>Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here</strong></a>.</p>
]]></content:encoded></item><item><title><![CDATA[Testing smart contracts: unit tests and invariant tests]]></title><description><![CDATA[Testing plays a vital role in ensuring the security, functionality, and reliability of smart contracts and being able to write some goods test can save not only a lot of time but also a lot of money. In this article, we will discuss two types of test...]]></description><link>https://amm.zanotp.com/testing-smart-contracts</link><guid isPermaLink="true">https://amm.zanotp.com/testing-smart-contracts</guid><category><![CDATA[Smart Contracts]]></category><category><![CDATA[Testing]]></category><category><![CDATA[Security]]></category><category><![CDATA[foundry]]></category><category><![CDATA[Solidity]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Thu, 03 Aug 2023 08:35:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/FnA5pAzqhMM/upload/64dc6ede535c99c4686ca6f1df72f553.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Testing plays a vital role in ensuring the security, functionality, and reliability of smart contracts and being able to write some goods test can save not only a lot of time but also a lot of money. In this article, we will discuss two types of testing methodologies: unit tests and invariant tests.</p>
<p>Note that I assume a basic knowledge of the Solidity language and Foundry framework, however, even someone without this knowledge should be able to follow along.</p>
<h2 id="heading-blockchain-101-smart-contracts">Blockchain 101: smart contracts</h2>
<p>In simple words, smart contracts are like digital agreements that automatically execute and enforce themselves when certain conditions are met. We can liken smart contracts to vending machines: once they receive the right inputs, they automatically execute the agreement.</p>
<p>For example, to decentralize a lottery, we would write a function that receives the amount paid by a player, and a function that, once a particular condition is met (e.g. the number of players reaches 10, or one day has passed), generates a pseudo-random number between 0 and the number of players and pays the pot to the selected winner.</p>
<p>These digital agreements can be used for various purposes, such as transferring money, buying and selling assets, or even voting in elections. Since smart contracts run on the blockchain, they are tamper-resistant and transparent (or at least they should be). Nevertheless, not all blockchain developers pay attention to the contract doing what it is supposed to do, and in fact, the number of hacked or tampered smart contracts is surprisingly high.</p>
<p>To prevent this, it is important to develop a comprehensive testing strategy that includes both unit tests and invariant tests.</p>
<h2 id="heading-set-up">Set up</h2>
<p>First of all, we need to set up the foundry environment:</p>
<pre><code class="lang-solidity">forge init
</code></pre>
<p>Then we need a contract to test. The following lines of code implement a lottery like the one described above. Note that for simplicity the winner should be the first player (<code>players[0]</code>) who joins the lottery (pseudo-random numbers on the blockchain are a big theme for an upcoming article) and that the lottery ends once there are at least 5 participants and the owner of the lottery calls the function <code>endTheLottery</code>.</p>
<pre><code class="lang-solidity"><span class="hljs-comment">// SPDX-License-Identifier: MIT</span>
<span class="hljs-meta"><span class="hljs-keyword">pragma</span> <span class="hljs-keyword">solidity</span> ^0.8.19;</span>

<span class="hljs-class"><span class="hljs-keyword">contract</span> <span class="hljs-title">Lottery</span> </span>{
    <span class="hljs-function"><span class="hljs-keyword">error</span> <span class="hljs-title">Lottery__notEnoughEthSent</span>(<span class="hljs-params"><span class="hljs-keyword">uint256</span> amount</span>)</span>;
    <span class="hljs-function"><span class="hljs-keyword">error</span> <span class="hljs-title">Lottery__notTheOwner</span>(<span class="hljs-params"><span class="hljs-keyword">address</span> sender</span>)</span>;
    <span class="hljs-function"><span class="hljs-keyword">error</span> <span class="hljs-title">Lottery__notEnoughtPlayers</span>(<span class="hljs-params"></span>)</span>;
    <span class="hljs-function"><span class="hljs-keyword">error</span> <span class="hljs-title">Lottery__invalidTransaction</span>(<span class="hljs-params"></span>)</span>;

    <span class="hljs-keyword">uint256</span> <span class="hljs-keyword">immutable</span> i_lotteryPriceInEth;
    <span class="hljs-keyword">address</span> owner;
    <span class="hljs-keyword">address</span>[] players;
    <span class="hljs-keyword">address</span> winner;

    <span class="hljs-function"><span class="hljs-keyword">modifier</span> <span class="hljs-title">onlyOwner</span>(<span class="hljs-params"></span>) </span>{
        <span class="hljs-keyword">if</span> (<span class="hljs-built_in">msg</span>.<span class="hljs-built_in">sender</span> <span class="hljs-operator">!</span><span class="hljs-operator">=</span> owner) <span class="hljs-keyword">revert</span> Lottery__notTheOwner(<span class="hljs-built_in">msg</span>.<span class="hljs-built_in">sender</span>);
        <span class="hljs-keyword">_</span>;
    }

    <span class="hljs-function"><span class="hljs-keyword">modifier</span> <span class="hljs-title">moreThanFivePlayers</span>(<span class="hljs-params"></span>) </span>{
        <span class="hljs-keyword">if</span> (players.<span class="hljs-built_in">length</span> <span class="hljs-operator">&lt;</span> <span class="hljs-number">5</span>) <span class="hljs-keyword">revert</span> Lottery__notEnoughtPlayers();
        <span class="hljs-keyword">_</span>;
    }

    <span class="hljs-function"><span class="hljs-keyword">constructor</span>(<span class="hljs-params"><span class="hljs-keyword">uint256</span> lotteryPriceInEth</span>) </span>{
        i_lotteryPriceInEth <span class="hljs-operator">=</span> lotteryPriceInEth;
        owner <span class="hljs-operator">=</span> <span class="hljs-built_in">msg</span>.<span class="hljs-built_in">sender</span>;
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">joinLottery</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> <span class="hljs-title"><span class="hljs-keyword">payable</span></span> </span>{
        <span class="hljs-keyword">if</span> (<span class="hljs-built_in">msg</span>.<span class="hljs-built_in">value</span> <span class="hljs-operator">&lt;</span> i_lotteryPriceInEth)
            <span class="hljs-keyword">revert</span> Lottery__notEnoughEthSent(<span class="hljs-built_in">msg</span>.<span class="hljs-built_in">value</span>);
        players.<span class="hljs-built_in">push</span>(<span class="hljs-built_in">msg</span>.<span class="hljs-built_in">sender</span>);
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">endTheLottery</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> <span class="hljs-title">onlyOwner</span> <span class="hljs-title">moreThanFivePlayers</span> </span>{
        <span class="hljs-keyword">if</span> (players.<span class="hljs-built_in">length</span> <span class="hljs-operator">%</span> <span class="hljs-number">2</span> <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-number">0</span>) {
            winner <span class="hljs-operator">=</span> players[<span class="hljs-number">1</span>];
        }
        <span class="hljs-keyword">if</span> (players.<span class="hljs-built_in">length</span> <span class="hljs-operator">%</span> <span class="hljs-number">2</span> <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-number">1</span>) {
            winner <span class="hljs-operator">=</span> players[<span class="hljs-number">0</span>];
        }
        (<span class="hljs-keyword">bool</span> success, <span class="hljs-keyword">bytes</span> <span class="hljs-keyword">memory</span> data) <span class="hljs-operator">=</span> <span class="hljs-keyword">payable</span>(winner).<span class="hljs-built_in">call</span>{
            <span class="hljs-built_in">value</span>: <span class="hljs-keyword">address</span>(<span class="hljs-built_in">this</span>).<span class="hljs-built_in">balance</span>
        }(<span class="hljs-string">""</span>);
        <span class="hljs-keyword">if</span> (<span class="hljs-operator">!</span>success) <span class="hljs-keyword">revert</span> Lottery__invalidTransaction();
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">transferOwnership</span>(<span class="hljs-params"><span class="hljs-keyword">address</span> newOwner</span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        owner <span class="hljs-operator">=</span> newOwner;
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getNumberOfPlayer</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> <span class="hljs-title"><span class="hljs-keyword">view</span></span> <span class="hljs-title"><span class="hljs-keyword">returns</span></span> (<span class="hljs-params"><span class="hljs-keyword">uint256</span></span>) </span>{
        <span class="hljs-keyword">return</span> players.<span class="hljs-built_in">length</span>;
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getPlayer</span>(<span class="hljs-params"><span class="hljs-keyword">uint256</span> index</span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> <span class="hljs-title"><span class="hljs-keyword">view</span></span> <span class="hljs-title"><span class="hljs-keyword">returns</span></span> (<span class="hljs-params"><span class="hljs-keyword">address</span></span>) </span>{
        <span class="hljs-keyword">return</span> players[index];
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getWinner</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> <span class="hljs-title"><span class="hljs-keyword">view</span></span> <span class="hljs-title"><span class="hljs-keyword">returns</span></span> (<span class="hljs-params"><span class="hljs-keyword">address</span></span>) </span>{
        <span class="hljs-keyword">return</span> winner;
    }
}
</code></pre>
<p>In this first version of the contract, a couple of things are not correct and by testing we should be able to spot them.</p>
<h2 id="heading-unit-tests">Unit tests</h2>
<p>Unit tests are deterministic tests, i.e. they produce deterministic results, are easy to debug, and are used to assert particular behaviors of the contract. Before we write any unit test we need a deployer script like the following one:</p>
<pre><code class="lang-solidity"><span class="hljs-comment">// SPDX-License-Identifier: MIT</span>
<span class="hljs-meta"><span class="hljs-keyword">pragma</span> <span class="hljs-keyword">solidity</span> ^0.8.19;</span>

<span class="hljs-keyword">import</span> {<span class="hljs-title">Script</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"../lib/forge-std/src/Script.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">Lottery</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"src/Lottery.sol"</span>;

<span class="hljs-class"><span class="hljs-keyword">contract</span> <span class="hljs-title">DeployLottery</span> <span class="hljs-keyword">is</span> <span class="hljs-title">Script</span> </span>{
    Lottery lottery;

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">run</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> <span class="hljs-title"><span class="hljs-keyword">returns</span></span> (<span class="hljs-params">Lottery</span>) </span>{
        vm.startBroadcast();
        lottery <span class="hljs-operator">=</span> <span class="hljs-keyword">new</span> Lottery(<span class="hljs-number">1</span> <span class="hljs-literal">ether</span>);
        vm.stopBroadcast();
        <span class="hljs-keyword">return</span> lottery;
    }
}
</code></pre>
<p>Suppose now we want to check that the modifier <code>onlyOwner</code> is doing its job (which is to prevent addresses other than the owner from calling the <code>endTheLottery</code> function). What we need to do is:</p>
<ul>
<li><p>deploy the contract;</p>
</li>
<li><p>transfer the ownership of the contract calling the <code>transferOwnership</code> function;</p>
</li>
<li><p>prank an address (different from the owner) and try to call the <code>endTheLottery</code> function;</p>
</li>
<li><p>assert that the contract throws the <code>Lottery__notTheOwner</code> error.</p>
</li>
</ul>
<pre><code class="lang-solidity"><span class="hljs-comment">// SPDX-License-Identifier: MIT</span>
<span class="hljs-meta"><span class="hljs-keyword">pragma</span> <span class="hljs-keyword">solidity</span> ^0.8.19;</span>

<span class="hljs-keyword">import</span> {<span class="hljs-title">Test</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"lib/forge-std/src/Test.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">Lottery</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"src/Lottery.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">DeployLottery</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"script/DeployLottery.s.sol"</span>;

<span class="hljs-class"><span class="hljs-keyword">contract</span> <span class="hljs-title">LotteryTest</span> <span class="hljs-keyword">is</span> <span class="hljs-title">Test</span> </span>{
    <span class="hljs-keyword">address</span> player0 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Alice"</span>);
    <span class="hljs-keyword">address</span> player1 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Bob"</span>);
    <span class="hljs-keyword">address</span> player2 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Carl"</span>);
    <span class="hljs-keyword">address</span> player3 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"David"</span>);
    <span class="hljs-keyword">address</span> player4 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Eleonor"</span>);
    <span class="hljs-keyword">address</span> owner <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Owner"</span>);
    <span class="hljs-keyword">uint256</span> <span class="hljs-keyword">public</span> <span class="hljs-keyword">constant</span> BALANCE <span class="hljs-operator">=</span> <span class="hljs-number">100</span> <span class="hljs-literal">ether</span>;

    Lottery lottery;

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">setUp</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        DeployLottery deployer <span class="hljs-operator">=</span> <span class="hljs-keyword">new</span> DeployLottery();
        vm.deal(player0, BALANCE);
        vm.deal(player1, BALANCE);
        vm.deal(player2, BALANCE);
        vm.deal(player3, BALANCE);
        vm.deal(player4, BALANCE);
        vm.deal(owner, BALANCE);
        lottery <span class="hljs-operator">=</span> deployer.run();
        lottery.transferOwnership(owner);
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">testOnlyOwnerCanEndTheLottery</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        vm.expectRevert(
            <span class="hljs-built_in">abi</span>.<span class="hljs-built_in">encodeWithSelector</span>(
                Lottery.Lottery__notTheOwner.<span class="hljs-built_in">selector</span>,
                player0
            )
        );

        vm.startPrank(player0); <span class="hljs-comment">// not the owner</span>
        lottery.endTheLottery();
        vm.stopPrank();
    }
}
</code></pre>
<p>Since <code>player0</code> is the caller of <code>endTheLottery</code>, the contract throws the <code>Lottery__notTheOwner</code> error, as expected:</p>
<pre><code class="lang-bash">Running 1 <span class="hljs-built_in">test</span> <span class="hljs-keyword">for</span> <span class="hljs-built_in">test</span>/LotteryTest.t.sol:LotteryTest
[PASS] testOnlyOwnerCanEndTheLottery() (gas: 13825)
Test result: ok. 1 passed; 0 failed; 0 skipped; finished <span class="hljs-keyword">in</span> 1.22ms
</code></pre>
<p>Another classical use of unit tests is for asserting a particular relation between two variables. For example, let's assert that the number of players is five in the following script:</p>
<pre><code class="lang-solidity"><span class="hljs-comment">// SPDX-License-Identifier: MIT</span>
<span class="hljs-meta"><span class="hljs-keyword">pragma</span> <span class="hljs-keyword">solidity</span> ^0.8.19;</span>

<span class="hljs-keyword">import</span> {<span class="hljs-title">Test</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"lib/forge-std/src/Test.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">Lottery</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"src/Lottery.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">DeployLottery</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"script/DeployLottery.s.sol"</span>;

<span class="hljs-class"><span class="hljs-keyword">contract</span> <span class="hljs-title">LotteryTest</span> <span class="hljs-keyword">is</span> <span class="hljs-title">Test</span> </span>{
    <span class="hljs-keyword">address</span> player0 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Alice"</span>);
    <span class="hljs-keyword">address</span> player1 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Bob"</span>);
    <span class="hljs-keyword">address</span> player2 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Carl"</span>);
    <span class="hljs-keyword">address</span> player3 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"David"</span>);
    <span class="hljs-keyword">address</span> player4 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Eleanor"</span>);
    <span class="hljs-keyword">address</span> owner <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Owner"</span>);
    <span class="hljs-keyword">uint256</span> <span class="hljs-keyword">public</span> <span class="hljs-keyword">constant</span> BALANCE <span class="hljs-operator">=</span> <span class="hljs-number">100</span> <span class="hljs-literal">ether</span>;

    Lottery lottery;

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">setUp</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        DeployLottery deployer <span class="hljs-operator">=</span> <span class="hljs-keyword">new</span> DeployLottery();
        vm.deal(player0, BALANCE);
        vm.deal(player1, BALANCE);
        vm.deal(player2, BALANCE);
        vm.deal(player3, BALANCE);
        vm.deal(player4, BALANCE);
        vm.deal(owner, BALANCE);
        lottery <span class="hljs-operator">=</span> deployer.run();
        lottery.transferOwnership(owner);
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">testOnlyOwnerCanEndTheLottery</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        vm.expectRevert(
            <span class="hljs-built_in">abi</span>.<span class="hljs-built_in">encodeWithSelector</span>(
                Lottery.Lottery__notTheOwner.<span class="hljs-built_in">selector</span>,
                player0
            )
        );

        vm.startPrank(player0); <span class="hljs-comment">// not the owner</span>
        lottery.endTheLottery();
        vm.stopPrank();
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">testAssertNumberOfPlayers</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        vm.startPrank(player0);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player1);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player2);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player3);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player4);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();

        <span class="hljs-keyword">uint256</span> expectedNumberOfPlayers <span class="hljs-operator">=</span> <span class="hljs-number">5</span>;
        <span class="hljs-keyword">uint256</span> numberOfPlayers <span class="hljs-operator">=</span> lottery.getNumberOfPlayer();

        assertEq(expectedNumberOfPlayers, numberOfPlayers);
    }
}
</code></pre>
<p>As expected, <code>assertEq(expectedNumberOfPlayers, numberOfPlayers);</code> holds and the test passes:</p>
<pre><code class="lang-bash">Running 2 tests <span class="hljs-keyword">for</span> <span class="hljs-built_in">test</span>/LotteryTest.t.sol:LotteryTest
[PASS] testAssertNumberOfPlayers() (gas: 192928)
[PASS] testOnlyOwnerCanEndTheLottery() (gas: 13880)
Test result: ok. 2 passed; 0 failed; 0 skipped; finished <span class="hljs-keyword">in</span> 2.34ms
</code></pre>
<p>Note that these are only two simple cases and we haven't tested edge cases (for example, the contract misbehaves after the first lottery concludes, as the <code>players</code> array is never reset).</p>
<p>As we saw, unit tests are particularly powerful when the contract is quite simple. If the contract has some complex functions or inherits from other contracts, we may want to conduct a different type of test: the invariant test.</p>
<h2 id="heading-invariant-tests">Invariant tests</h2>
<p>Invariant tests are a form of stochastic testing, meaning the results may vary across test runs (unless the same seed is set). In other words, performing an invariant test means supplying random data to the contract's functions, trying to identify some unexpected behavior.</p>
<p>Those readers who <em>actually</em> read the contract may have noticed that the function <code>endTheLottery</code> does something undesired. In fact, if the length of <code>players</code> is an odd number (<code>%</code> is the modulus operator), the contract behaves correctly (remember that for simplicity we want the first player to join the lottery to be the winner), but if the number is even the winner is <code>players[1]</code> (i.e. the second one who joined the lottery).</p>
<p>It appears that the victory of <code>players[0]</code> should be an invariant property of the contract. Since many contracts have at least one invariant property, and testing these properties with unit tests may be difficult or impossible (especially for complex contracts), knowing how to perform invariant tests is a <em>conditio sine qua non</em> for a proficient blockchain engineer.</p>
<p>Note that there are two types of invariant tests:</p>
<ul>
<li><p>stateless invariant tests: tests where each run is independent of the others;</p>
</li>
<li><p>stateful invariant tests: tests where the state of each run is affected by all the previous runs.</p>
</li>
</ul>
<p>We can in fact find the undesired behaviour of <code>endTheLottery</code> just by performing the following test:</p>
<pre><code class="lang-solidity"><span class="hljs-comment">// SPDX-License-Identifier: MIT</span>
<span class="hljs-meta"><span class="hljs-keyword">pragma</span> <span class="hljs-keyword">solidity</span> ^0.8.19;</span>

<span class="hljs-keyword">import</span> {<span class="hljs-title">Test</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"lib/forge-std/src/Test.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">Lottery</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"src/Lottery.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">DeployLottery</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"script/DeployLottery.s.sol"</span>;
<span class="hljs-keyword">import</span> {<span class="hljs-title">StdInvariant</span>} <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-string">"lib/forge-std/src/StdInvariant.sol"</span>;

<span class="hljs-class"><span class="hljs-keyword">contract</span> <span class="hljs-title">LotteryTest</span> <span class="hljs-keyword">is</span> <span class="hljs-title">Test</span> </span>{
    <span class="hljs-keyword">address</span> player0 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Alice"</span>);
    <span class="hljs-keyword">address</span> player1 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Bob"</span>);
    <span class="hljs-keyword">address</span> player2 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Carl"</span>);
    <span class="hljs-keyword">address</span> player3 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"David"</span>);
    <span class="hljs-keyword">address</span> player4 <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Eleonor"</span>);
    <span class="hljs-keyword">address</span> owner <span class="hljs-operator">=</span> makeAddr(<span class="hljs-string">"Owner"</span>);
    <span class="hljs-keyword">uint256</span> <span class="hljs-keyword">public</span> <span class="hljs-keyword">constant</span> BALANCE <span class="hljs-operator">=</span> <span class="hljs-number">100</span> <span class="hljs-literal">ether</span>;

    Lottery lottery;

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">setUp</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        DeployLottery deployer <span class="hljs-operator">=</span> <span class="hljs-keyword">new</span> DeployLottery();
        vm.deal(player0, BALANCE);
        vm.deal(player1, BALANCE);
        vm.deal(player2, BALANCE);
        vm.deal(player3, BALANCE);
        vm.deal(player4, BALANCE);
        vm.deal(owner, BALANCE);
        lottery <span class="hljs-operator">=</span> deployer.run();
        lottery.transferOwnership(owner);
        targetContract(<span class="hljs-keyword">address</span>(lottery));
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">testOnlyOwnerCanEndTheLottery</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        vm.expectRevert(
            <span class="hljs-built_in">abi</span>.<span class="hljs-built_in">encodeWithSelector</span>(
                Lottery.Lottery__notTheOwner.<span class="hljs-built_in">selector</span>,
                player0
            )
        );

        vm.startPrank(player0); <span class="hljs-comment">// not the owner</span>
        lottery.endTheLottery();
        vm.stopPrank();
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">testAssertNumberOfPlayers</span>(<span class="hljs-params"></span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        vm.startPrank(player0);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player1);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player2);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player3);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();
        vm.startPrank(player4);
        lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        vm.stopPrank();

        <span class="hljs-keyword">uint256</span> expectedNumberOfPlayers <span class="hljs-operator">=</span> <span class="hljs-number">5</span>;
        <span class="hljs-keyword">uint256</span> numberOfPlayers <span class="hljs-operator">=</span> lottery.getNumberOfPlayer();

        assertEq(expectedNumberOfPlayers, numberOfPlayers);
    }

    <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">testFuzz_WinnerIsAlwaysPlayers0</span>(<span class="hljs-params"><span class="hljs-keyword">uint96</span> numPlayers</span>) <span class="hljs-title"><span class="hljs-keyword">public</span></span> </span>{
        vm.startPrank(player0);
        <span class="hljs-keyword">for</span> (<span class="hljs-keyword">uint256</span> i <span class="hljs-operator">=</span> <span class="hljs-number">5</span>; i <span class="hljs-operator">&lt;</span> numPlayers; i<span class="hljs-operator">+</span><span class="hljs-operator">+</span>) {
            lottery.joinLottery{<span class="hljs-built_in">value</span>: <span class="hljs-number">1</span> <span class="hljs-literal">ether</span>}();
        }
        vm.stopPrank();

        vm.startPrank(owner);
        lottery.endTheLottery();
        vm.stopPrank();

        <span class="hljs-keyword">address</span> expectedWinner <span class="hljs-operator">=</span> lottery.getPlayer(<span class="hljs-number">0</span>);
        assertEq(
            lottery.getWinner(),
            expectedWinner
        );
    }
}
</code></pre>
<p>The test fails (as expected) and returns the following logs, notifying us that there is at least one situation in which <code>endTheLottery</code> behaves unexpectedly:</p>
<pre><code class="lang-solidity">Test result: FAILED. 2 passed; <span class="hljs-number">1</span> failed; <span class="hljs-number">0</span> skipped; finished in <span class="hljs-number">4</span>.96ms

Failing tests:
Encountered <span class="hljs-number">1</span> failing test in test<span class="hljs-operator">/</span>LotteryTest.t.sol:LotteryTest
[FAIL. Reason: Lottery__notEnoughtPlayers() Counterexample: <span class="hljs-keyword">calldata</span><span class="hljs-operator">=</span><span class="hljs-number">0x515cecbc0000000000000000000000000000000000000000000000000000000000000000</span>, args<span class="hljs-operator">=</span>[<span class="hljs-number">0</span>]] testFuzz_WinnerIsAlwaysPlayers0(<span class="hljs-keyword">uint96</span>) (runs: <span class="hljs-number">0</span>, μ: <span class="hljs-number">0</span>, <span class="hljs-operator">~</span>: <span class="hljs-number">0</span>)

Encountered a total of <span class="hljs-number">1</span> failing tests, <span class="hljs-number">2</span> tests succeeded
</code></pre>
<p>The counterexample calldata encodes the argument <code>0</code>: with <code>numPlayers = 0</code> the loop body never executes, nobody joins the lottery, and <code>endTheLottery</code> reverts with <code>Lottery__notEnoughtPlayers</code>. The fuzzer has thus surfaced an unhandled edge case. Once the input is bounded away from small values (for example with <code>vm.assume(numPlayers &gt; 10)</code>), runs where <code>numPlayers</code> is odd leave an even number of players in the array (the loop adds <code>numPlayers - 5</code> of them), so the first if statement in <code>endTheLottery</code> fires and the winner is <code>players[1]</code> instead of <code>players[0]</code>, making the assertion fail.</p>
<h2 id="heading-to-go-further">To go further</h2>
<p>To learn more about testing Solidity contract with the Foundry framework and discover advanced testing techniques consult the <a target="_blank" href="https://book.getfoundry.sh/forge/tests">Foundry docs</a>.</p>
<hr />
<p>And that's it for this article.</p>
<p>Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here</strong></a>.</p>
]]></content:encoded></item><item><title><![CDATA[Numerical methods for ODEs]]></title><description><![CDATA[In mathematics, an ordinary differential equation (ODE) is a type of differential equation whose definition and analysis rely exclusively on a single independent variable. The solution of an ODE is no different from the solution of any other differen...]]></description><link>https://amm.zanotp.com/odes</link><guid isPermaLink="true">https://amm.zanotp.com/odes</guid><category><![CDATA[Mathematics]]></category><category><![CDATA[ode]]></category><category><![CDATA[Python]]></category><category><![CDATA[#numerical-methods]]></category><category><![CDATA[#differential-equations]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Mon, 17 Jul 2023 10:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/pv5SUbgRRIU/upload/6f10c166f8816bc01dae545fe9906cd2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In mathematics, an ordinary differential equation (ODE) is a type of differential equation whose definition and analysis rely exclusively on a single independent variable. The solution of an ODE is no different from the solution of any other differential equation, as the solutions are one or more functions that satisfy the equation.</p>
<p>Let’s take a look at a simple differential equation</p>
<p>$$\frac{dy}{dx}=ky$$</p><p>where \(k \in R\).</p>
<p>The solutions of the above equation are the functions whose derivative is proportional, with constant factor \(k\), to the function itself.</p>
<p>Bringing back some calculus, consider the function \(y = ce^{kx}\), where \(c\) is a real constant: the derivative of \(y\) with respect to \(x\) is \(\frac{dy}{dx} = ky\). Consequently, the family of solutions is \(y = ce^{kx}\), with \(c\) ranging over the real numbers.</p>
<p>It is common to add an initial condition that gives the value of the unknown function at a particular point in the domain. For example:</p>
<p>$$\begin{equation} \begin{cases} \frac{dy}{dx}=2y\\ y(0)=2 \end{cases} \end{equation}$$</p><p>It is straightforward to prove that the solution of the above system is \(y = 2e^{2x}\).</p>
<p>Unfortunately, not every ODE can be solved explicitly, so numerical methods come to the rescue by providing an approximation to the solution.</p>
<p>It is worth noting that these numerical methods are not only useful for solving first-order ODEs but are equally valuable for addressing higher-order ODEs as well (i.e. ODEs involving higher-order derivatives), since a higher-order ODE can often be transformed into a system of first-order ODEs, as the example below shows.</p>
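<p>For example, introducing \(v = y'\) turns the second-order equation \(y'' = -y\) into the first-order system</p>
<p>$$\begin{equation} \begin{cases} y' = v\\ v' = -y \end{cases} \end{equation}$$</p><p>to which any of the methods below can be applied componentwise.</p>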
<p>In this article, I introduce two among the multitude of available numerical methods.</p>
<h2 id="heading-euler-method">Euler method</h2>
<p>The Euler method offers a simple approach by breaking down the continuous ODE into discrete steps. The idea is to update the function's value based on its derivative at each step, effectively simulating the behavior of the ODE over a range of points. In fact, from any point \(p\) on a curve, we can approximate nearby points on the curve by moving a short distance along the line tangent to the curve at \(p\).</p>
<p>Let</p>
<p>$$\begin{equation} \begin{cases} \frac{dy}{dx}=f(x, y(x))\\ y(x_0)=y_0 \end{cases} \end{equation}$$</p><p>be our initial system.</p>
<p>Replacing the derivative with its discrete (finite-difference) version and rearranging we get</p>
<p>$$\begin{equation} \begin{cases} y(x+h)=y(x) + h f(x, y(x))\\ y(x_0)=y_0 \end{cases} \end{equation}$$</p><p>which yields the following recursive scheme</p>
<p>$$y_{n+1}=y_n+hf(x_n,y_n)$$</p><p>Graphically:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1692255385324/bd351877-69c0-4ee5-b46b-bfe39aff4b9f.png" alt class="image--center mx-auto" /></p>
<p>Using the above equation, we can now compute \(y(x_n)\) \(\forall \space x_n\) with the following steps:</p>
<ol>
<li><p>store \(y(x_0)=y_0\);</p>
</li>
<li><p>compute \(y(x_1)=y_0+hf(x_0, y_0)\);</p>
</li>
<li><p>store \(y(x_1)\);</p>
</li>
<li><p>compute \(y(x_2)=y_1+hf(x_1, y_1)\);</p>
</li>
<li><p>store \(y(x_2)\);</p>
</li>
</ol>
<p>and so on.<br />We now want to approximate the solution of the initial system</p>
<p>$$\begin{equation} \begin{cases} \frac{dy}{dx}=2y\\ y(0)=2 \end{cases} \end{equation}$$</p><p>and visualize the exact solution and the approximation:</p>
<pre><code class="lang-solidity"><span class="hljs-keyword">import</span> <span class="hljs-title">numpy</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">np</span>
<span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">matplotlib</span>.<span class="hljs-title">pyplot</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">plt</span>

# <span class="hljs-title">define</span> <span class="hljs-title">params</span>, <span class="hljs-title">ode</span> <span class="hljs-title">and</span> <span class="hljs-title">inital</span> <span class="hljs-title">condition</span>
<span class="hljs-title">k</span> <span class="hljs-operator">=</span> 2
<span class="hljs-title">f</span> <span class="hljs-operator">=</span> <span class="hljs-title">lambda</span> <span class="hljs-title">y</span>, <span class="hljs-title">x</span>: <span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">np</span>.<span class="hljs-title">exp</span>(<span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">y</span>)
<span class="hljs-title">h</span> <span class="hljs-operator">=</span> 0.1
<span class="hljs-title">x</span> <span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">arange</span>(0, 1 <span class="hljs-operator">+</span> <span class="hljs-title">h</span>, <span class="hljs-title">h</span>)
<span class="hljs-title">x_</span><span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">arange</span>(0, 1, 0.0001)
<span class="hljs-title">y0</span> <span class="hljs-operator">=</span> 2 

# <span class="hljs-title">initialize</span> <span class="hljs-title">the</span> <span class="hljs-title">y</span> <span class="hljs-title">vector</span>
<span class="hljs-title">y</span> <span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">zeros</span>(<span class="hljs-title">len</span>(<span class="hljs-title">x</span>))
<span class="hljs-title">y</span>[0] <span class="hljs-operator">=</span> <span class="hljs-title">y0</span>

# <span class="hljs-title">populate</span> <span class="hljs-title">the</span> <span class="hljs-title">y</span> <span class="hljs-title">vector</span>
<span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">i</span> <span class="hljs-title">in</span> <span class="hljs-title">range</span>(0, <span class="hljs-title">len</span>(<span class="hljs-title">x</span>) <span class="hljs-operator">-</span> 1):
    <span class="hljs-title">y</span>[<span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1] <span class="hljs-operator">=</span> <span class="hljs-title">y</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">*</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>], <span class="hljs-title">y</span>[<span class="hljs-title">i</span>])

# <span class="hljs-title">plot</span> <span class="hljs-title">the</span> <span class="hljs-title">results</span>
<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x</span>, <span class="hljs-title">y</span>, <span class="hljs-string">'bo--'</span>, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'Approximated solution'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x_</span>, <span class="hljs-title">np</span>.<span class="hljs-title">exp</span>(<span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">x_</span>)<span class="hljs-operator">+</span>1, <span class="hljs-string">'g'</span>, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'Exact solution'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">xlabel</span>(<span class="hljs-string">'x'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">ylabel</span>(<span class="hljs-string">'y'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">grid</span>()
<span class="hljs-title">plt</span>.<span class="hljs-title">legend</span>(<span class="hljs-title">loc</span><span class="hljs-operator">=</span><span class="hljs-string">'lower right'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">show</span>()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1692262422447/d1a5e9ca-89f7-41ed-b129-80bdd87702ca.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-runge-kutta-methods">Runge-Kutta methods</h2>
<p>The Euler method is often not accurate enough. One remedy is to use more than one slope evaluation in the interval \([x_n, x_{n+1}]\), as the Runge-Kutta methods do. The number of evaluations in the interval \([x_n, x_{n+1}]\) determines the order of the method (up to order four; beyond that, more evaluations than the order are required).</p>
<h3 id="heading-second-order-runge-kutta-method-rk2"><strong>Second-Order Runge-Kutta Method (RK2)</strong></h3>
<p>Starting with the Runge-Kutta method of order 2, we need the following second-order Taylor expansion:</p>
<p>$$y(x+h)=y(x)+h\frac{dy}{dx}(x) + \frac {h^2}2\frac{d^2 y}{dx^2}(x)+\epsilon$$</p><p>where \(\epsilon\) is the truncation error.</p>
<p>We can obtain \(\frac{d^2 y}{dx^2}(x)\) by differentiating the ODE \(\frac{dy}{dx}(x)=f(x, y(x))\):</p>
<p>$$\frac{d^2 y}{dx^2}(x)=\frac{\partial }{\partial x}f(x, y)+\frac{\partial}{\partial y}f(x, y)\,f(x, y)$$</p><p>and the Taylor expansion hence becomes</p>
<p>$$y(x+h)=y(x)+hf(x,y) + \frac {h^2}2\left(\frac{\partial }{\partial x}f(x, y)+\frac{\partial}{\partial y}f(x, y)\,f(x, y) \right)+\epsilon$$</p><p>After some manipulation (the bracketed term is absorbed into the first-order Taylor expansion of \(f(x+h, y+hf(x,y))\)), we obtain</p>
<p>$$y(x+h)=y(x)+\frac h2f(x,y) + \frac {h}2f(x+h,y+hf(x,y))+\epsilon$$</p><p>which corresponds to the following recursive scheme:</p>
<p>$$y_{n+1}=y_n+\frac h2(s_1+s_2)$$</p><p>with</p>
<p>$$s_1=f(x_n, y_n)$$</p><p>$$s_2 = f(x_n+h, y_n+h\,{s_1})$$</p><p>Note that \(s_1\) and \(s_2\) correspond to two different estimates of the slope of the solution (at the two ends of the step) and the method is nothing more than the average between the two.</p>
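<p>As a quick sanity check on the example system \(y' = 2y\), \(y(0)=2\) with \(h = 0.1\): \(s_1 = f(0, 2) = 4\), \(s_2 = f(0.1, 2 + 0.1 \cdot 4) = 4.8\), so \(y_1 = 2 + 0.05(4 + 4.8) = 2.44\), already close to the exact value \(2e^{0.2} \approx 2.4428\).</p>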
<p>Again using the above equation, we can compute \(y(x_n)\) \(\forall \space x_n\) with the following steps:</p>
<ol>
<li><p>store \(y(x_0)=y_0\);</p>
</li>
<li><p>compute \(s_1 \) and \(s_2\);</p>
</li>
<li><p>compute \(y(x_1)=y_0+\frac h2(s_1+s_2)\);</p>
</li>
<li><p>store \(y(x_1)\);</p>
</li>
<li><p>update \(s_1 \) and \(s_2\);</p>
</li>
<li><p>compute \(y(x_2)=y_1+\frac h2(s_1+s_2)\);</p>
</li>
<li><p>store \(y(x_2)\);</p>
</li>
</ol>
<p>and so on.</p>
<p>Again, we want to approximate the solution of the initial system</p>
<p>$$\begin{equation} \begin{cases} \frac{dy}{dx}=2y\\ y(0)=2 \end{cases} \end{equation}$$</p><pre><code class="lang-python">import numpy as np
import matplotlib.pyplot as plt

# define params, the ODE and the initial condition
k = 2
f = lambda x, y: k*y
h = 0.1
x = np.arange(0, 1 + h, h)
x_ = np.arange(0, 1, 0.0001)
y0 = 2

# initialize the y vector
y = np.zeros(len(x))
y[0] = y0

# populate the y vector with the RK2 scheme
for i in range(0, len(x) - 1):
    s1 = f(x[i], y[i])
    s2 = f(x[i] + h, y[i] + h*s1)
    y[i + 1] = y[i] + h/2 * (s1 + s2)

# plot the results
plt.plot(x, y, 'bo--', label='Approximated solution')
plt.plot(x_, 2*np.exp(k*x_), 'g', label='Exact solution')
plt.xlabel('x')
plt.ylabel('y')
plt.grid()
plt.legend(loc='lower right')
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1692262848070/183f3202-2a9b-4d47-82d2-c441fb027f32.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-fourth-order-runge-kutta-method-rk4"><strong>Fourth-Order Runge-Kutta Method (RK4)</strong></h3>
<p>Repeating what we did for the Runge-Kutta method of order 2, but using a fourth-order Taylor expansion, we obtain the following recursive scheme:</p>
<p>$$y_{n+1}=y_n+\frac h3(\frac {s_1}2 + s_2 + s_3 + \frac{s_4}2)$$</p><p>with</p>
<p>$$s_1 = f(x_n, y_n)$$</p><p>$$s_2 = f(x_n+\frac h2, y_n+\frac h2{s_1})$$</p><p>$$s_3 = f(x_n+\frac h2, y_n+\frac h2{s_2})$$</p><p>$$s_4 = f(x_n+h, y_n+h{s_3})$$</p><p>The steps used to approximate the system</p>
<p>$$\begin{equation} \begin{cases} \frac{dy}{dx}=2y\\ y(0)=2 \end{cases} \end{equation}$$</p><p>are analogous to the ones used for the second-order Runge-Kutta method.</p>
<pre><code class="lang-solidity"><span class="hljs-keyword">import</span> <span class="hljs-title">numpy</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">np</span>
<span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">matplotlib</span>.<span class="hljs-title">pyplot</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">plt</span>


# <span class="hljs-title">define</span> <span class="hljs-title">params</span>, <span class="hljs-title">the</span> <span class="hljs-title">ode</span> <span class="hljs-title">and</span> <span class="hljs-title">the</span> <span class="hljs-title">inital</span> <span class="hljs-title">consition</span>
<span class="hljs-title">k</span> <span class="hljs-operator">=</span> 2
<span class="hljs-title">f</span> <span class="hljs-operator">=</span> <span class="hljs-title">lambda</span> <span class="hljs-title">y</span>, <span class="hljs-title">x</span>: <span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">np</span>.<span class="hljs-title">exp</span>(<span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">y</span>)
<span class="hljs-title">h</span> <span class="hljs-operator">=</span> .1
<span class="hljs-title">x</span> <span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">arange</span>(0, 1 <span class="hljs-operator">+</span> <span class="hljs-title">h</span>, <span class="hljs-title">h</span>)
<span class="hljs-title">x_</span><span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">arange</span>(0, 1, 0.0001)
<span class="hljs-title">y0</span> <span class="hljs-operator">=</span> 2 

# <span class="hljs-title">initialize</span> <span class="hljs-title">thw</span> <span class="hljs-title">y</span> <span class="hljs-title">vector</span>
<span class="hljs-title">y</span> <span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">zeros</span>(<span class="hljs-title">len</span>(<span class="hljs-title">x</span>))
<span class="hljs-title">y</span>[0] <span class="hljs-operator">=</span> <span class="hljs-title">y0</span>

# <span class="hljs-title">populate</span> <span class="hljs-title">the</span> <span class="hljs-title">y</span> <span class="hljs-title">vector</span>
<span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">i</span> <span class="hljs-title">in</span> <span class="hljs-title">range</span>(0, <span class="hljs-title">len</span>(<span class="hljs-title">x</span>) <span class="hljs-operator">-</span> 1):
    <span class="hljs-title">s1</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>], <span class="hljs-title">y</span>[<span class="hljs-title">i</span>])
    <span class="hljs-title">s2</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>2, <span class="hljs-title">y</span>[<span class="hljs-title">i</span>]<span class="hljs-operator">+</span>  <span class="hljs-title">h</span><span class="hljs-operator">/</span>2<span class="hljs-operator">*</span><span class="hljs-title">s1</span>)
    <span class="hljs-title">s3</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>2, <span class="hljs-title">y</span>[<span class="hljs-title">i</span>]<span class="hljs-operator">+</span>  <span class="hljs-title">h</span><span class="hljs-operator">/</span>2<span class="hljs-operator">*</span><span class="hljs-title">s2</span>)
    <span class="hljs-title">s4</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span>, <span class="hljs-title">y</span>[<span class="hljs-title">i</span>]<span class="hljs-operator">+</span>  <span class="hljs-title">h</span><span class="hljs-operator">*</span><span class="hljs-title">s3</span>)
    <span class="hljs-title">y</span>[<span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1] <span class="hljs-operator">=</span> <span class="hljs-title">y</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>3 <span class="hljs-operator">*</span> (<span class="hljs-title">s1</span><span class="hljs-operator">/</span>2<span class="hljs-operator">+</span><span class="hljs-title">s2</span><span class="hljs-operator">+</span><span class="hljs-title">s3</span><span class="hljs-operator">+</span><span class="hljs-title">s4</span><span class="hljs-operator">/</span>2)

# <span class="hljs-title">plot</span> <span class="hljs-title">the</span> <span class="hljs-title">results</span>
<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x</span>, <span class="hljs-title">y</span>, <span class="hljs-string">'bo--'</span>, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'Approximated solution'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x_</span>, <span class="hljs-title">np</span>.<span class="hljs-title">exp</span>(<span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">x_</span>)<span class="hljs-operator">+</span>1, <span class="hljs-string">'g'</span>, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'Exact solution'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">xlabel</span>(<span class="hljs-string">'x'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">ylabel</span>(<span class="hljs-string">'y'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">grid</span>()
<span class="hljs-title">plt</span>.<span class="hljs-title">legend</span>(<span class="hljs-title">loc</span><span class="hljs-operator">=</span><span class="hljs-string">'lower right'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">show</span>()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1692262889037/49aae4f4-3074-4f00-858f-a0b3a651ff92.png" alt class="image--center mx-auto" /></p>
<p>There are also higher-order Runge-Kutta methods, but they are relatively inefficient (beyond fourth order, the number of required function evaluations grows faster than the order), so I won't cover them in this article.</p>
<h2 id="heading-comparison-between-the-three-methods">Comparison between the three methods</h2>
<ol>
<li><p><strong>Euler Method:</strong></p>
<ul>
<li><p>Accuracy: the Euler method is a first-order method, which means that it can accumulate a significant error over many steps or for stiff ODEs.</p>
</li>
<li><p>Computational Complexity: the Euler method involves a single evaluation of the derivative function per step.</p>
</li>
</ul>
</li>
<li><p><strong>Second-Order Runge-Kutta Method (RK2):</strong></p>
<ul>
<li><p>Accuracy: RK2 is a second-order method: it offers better accuracy than the Euler method and is less prone to accumulating a relevant error over many steps.</p>
</li>
<li><p>Computational Complexity: RK2 requires two evaluations of the derivative function per step (one at the beginning and one at the end of the step).</p>
</li>
</ul>
</li>
<li><p><strong>Fourth-Order Runge-Kutta Method (RK4):</strong></p>
<ul>
<li><p>Accuracy: RK4 is a fourth-order method, which implies that it's significantly more accurate than both Euler and RK2 methods, making it suitable for many practical applications.</p>
</li>
<li><p>Computational Complexity: RK4 involves four evaluations of the derivative function per step, along with weighted combinations of these evaluations.</p>
<p>  Despite the higher computational cost compared to Euler and RK2, RK4 remains a popular choice due to its reliability and accuracy.</p>
</li>
</ul>
</li>
</ol>
<p>Graphically:</p>
<pre><code class="lang-solidity"><span class="hljs-keyword">import</span> <span class="hljs-title">numpy</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">np</span>
<span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">matplotlib</span>.<span class="hljs-title">pyplot</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">plt</span>

# <span class="hljs-title">define</span> <span class="hljs-title">params</span>, <span class="hljs-title">the</span> <span class="hljs-title">ode</span> <span class="hljs-title">and</span> <span class="hljs-title">the</span> <span class="hljs-title">inital</span> <span class="hljs-title">consition</span>
<span class="hljs-title">k</span> <span class="hljs-operator">=</span> 2
<span class="hljs-title">f</span> <span class="hljs-operator">=</span> <span class="hljs-title">lambda</span> <span class="hljs-title">y</span>, <span class="hljs-title">x</span>: <span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">np</span>.<span class="hljs-title">exp</span>(<span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">y</span>)
<span class="hljs-title">h</span> <span class="hljs-operator">=</span> 0.1
<span class="hljs-title">x</span> <span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">arange</span>(0, 1 <span class="hljs-operator">+</span> <span class="hljs-title">h</span>, <span class="hljs-title">h</span>)
<span class="hljs-title">x_</span><span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">arange</span>(0, 1, 0.0001)
<span class="hljs-title">y0</span> <span class="hljs-operator">=</span> 2 

# <span class="hljs-title">initialize</span> <span class="hljs-title">thw</span> <span class="hljs-title">y</span> <span class="hljs-title">vector</span>
<span class="hljs-title">y</span> <span class="hljs-operator">=</span> <span class="hljs-title">np</span>.<span class="hljs-title">zeros</span>(<span class="hljs-title">len</span>(<span class="hljs-title">x</span>))
<span class="hljs-title">y</span>[0] <span class="hljs-operator">=</span> <span class="hljs-title">y0</span>

# <span class="hljs-title">euler</span> <span class="hljs-title">method</span>
<span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">i</span> <span class="hljs-title">in</span> <span class="hljs-title">range</span>(0, <span class="hljs-title">len</span>(<span class="hljs-title">x</span>) <span class="hljs-operator">-</span> 1):
    <span class="hljs-title">y</span>[<span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1] <span class="hljs-operator">=</span> <span class="hljs-title">y</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">*</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>], <span class="hljs-title">y</span>[<span class="hljs-title">i</span>])
<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x</span>, <span class="hljs-title">y</span>, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'Euler'</span>, <span class="hljs-title">linestyle</span><span class="hljs-operator">=</span><span class="hljs-string">'--'</span>)


# <span class="hljs-title">rk2</span> <span class="hljs-title">method</span>
<span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">i</span> <span class="hljs-title">in</span> <span class="hljs-title">range</span>(0, <span class="hljs-title">len</span>(<span class="hljs-title">x</span>) <span class="hljs-operator">-</span> 1):
    <span class="hljs-title">s1</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>], <span class="hljs-title">y</span>[<span class="hljs-title">i</span>])
    <span class="hljs-title">s2</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>2, <span class="hljs-title">y</span>[<span class="hljs-title">i</span>]<span class="hljs-operator">+</span>  <span class="hljs-title">h</span><span class="hljs-operator">/</span>2<span class="hljs-operator">*</span><span class="hljs-title">s1</span>)
    <span class="hljs-title">y</span>[<span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1] <span class="hljs-operator">=</span> <span class="hljs-title">y</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>2 <span class="hljs-operator">*</span> (<span class="hljs-title">s1</span><span class="hljs-operator">+</span><span class="hljs-title">s2</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x</span>, <span class="hljs-title">y</span>, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'RK2'</span>, <span class="hljs-title">linestyle</span><span class="hljs-operator">=</span><span class="hljs-string">'--'</span>)


# <span class="hljs-title">rk4</span> <span class="hljs-title">method</span>
<span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">i</span> <span class="hljs-title">in</span> <span class="hljs-title">range</span>(0, <span class="hljs-title">len</span>(<span class="hljs-title">x</span>) <span class="hljs-operator">-</span> 1):
    <span class="hljs-title">s1</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>], <span class="hljs-title">y</span>[<span class="hljs-title">i</span>])
    <span class="hljs-title">s2</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>2, <span class="hljs-title">y</span>[<span class="hljs-title">i</span>]<span class="hljs-operator">+</span>  <span class="hljs-title">h</span><span class="hljs-operator">/</span>2<span class="hljs-operator">*</span><span class="hljs-title">s1</span>)
    <span class="hljs-title">s3</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>2, <span class="hljs-title">y</span>[<span class="hljs-title">i</span>]<span class="hljs-operator">+</span>  <span class="hljs-title">h</span><span class="hljs-operator">/</span>2<span class="hljs-operator">*</span><span class="hljs-title">s2</span>)
    <span class="hljs-title">s4</span><span class="hljs-operator">=</span><span class="hljs-title">f</span>(<span class="hljs-title">x</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span>, <span class="hljs-title">y</span>[<span class="hljs-title">i</span>]<span class="hljs-operator">+</span>  <span class="hljs-title">h</span><span class="hljs-operator">*</span><span class="hljs-title">s3</span>)
    <span class="hljs-title">y</span>[<span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1] <span class="hljs-operator">=</span> <span class="hljs-title">y</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">+</span> <span class="hljs-title">h</span><span class="hljs-operator">/</span>3 <span class="hljs-operator">*</span> (<span class="hljs-title">s1</span><span class="hljs-operator">/</span>2<span class="hljs-operator">+</span><span class="hljs-title">s2</span><span class="hljs-operator">+</span><span class="hljs-title">s3</span><span class="hljs-operator">+</span><span class="hljs-title">s4</span><span class="hljs-operator">/</span>2)
<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x</span>, <span class="hljs-title">y</span>, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'RK4'</span>, <span class="hljs-title">linestyle</span><span class="hljs-operator">=</span><span class="hljs-string">'--'</span>)


<span class="hljs-title">plt</span>.<span class="hljs-title">plot</span>(<span class="hljs-title">x_</span>, <span class="hljs-title">np</span>.<span class="hljs-title">exp</span>(<span class="hljs-title">k</span><span class="hljs-operator">*</span><span class="hljs-title">x_</span>)<span class="hljs-operator">+</span>1, <span class="hljs-title">label</span><span class="hljs-operator">=</span><span class="hljs-string">'Exact solution'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">xlabel</span>(<span class="hljs-string">'x'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">ylabel</span>(<span class="hljs-string">'y'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">grid</span>()
<span class="hljs-title">plt</span>.<span class="hljs-title">legend</span>(<span class="hljs-title">loc</span><span class="hljs-operator">=</span><span class="hljs-string">'lower right'</span>)
<span class="hljs-title">plt</span>.<span class="hljs-title">show</span>()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1692263238201/19df3f10-e26a-41b0-b3bb-fe766025ea89.png" alt class="image--center mx-auto" /></p>
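<p>To make the accuracy claims above concrete, the following quick check (a minimal sketch on the same test problem) measures each method's maximum absolute error against the exact solution \(2e^{2x}\):</p>
<pre><code class="lang-python">import numpy as np

# problem setup: y' = 2y, y(0) = 2, whose exact solution is y = 2*exp(2x)
k, h, y0 = 2, 0.1, 2.0
f = lambda x, y: k*y
x = np.arange(0, 1 + h, h)
exact = y0*np.exp(k*x)

def euler_step(x_n, y_n):
    return y_n + h*f(x_n, y_n)

def rk2_step(x_n, y_n):
    s1 = f(x_n, y_n)
    s2 = f(x_n + h, y_n + h*s1)
    return y_n + h/2*(s1 + s2)

def rk4_step(x_n, y_n):
    s1 = f(x_n, y_n)
    s2 = f(x_n + h/2, y_n + h/2*s1)
    s3 = f(x_n + h/2, y_n + h/2*s2)
    s4 = f(x_n + h, y_n + h*s3)
    return y_n + h/3*(s1/2 + s2 + s3 + s4/2)

# march each scheme over [0, 1] and report the worst deviation from the exact solution
for name, step in [('Euler', euler_step), ('RK2', rk2_step), ('RK4', rk4_step)]:
    y = np.zeros(len(x))
    y[0] = y0
    for i in range(len(x) - 1):
        y[i + 1] = step(x[i], y[i])
    print(name, 'max abs error:', np.max(np.abs(y - exact)))
</code></pre>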
<hr />
<p>And that's it for this article.</p>
<p>Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here</strong></a>.</p>
]]></content:encoded></item><item><title><![CDATA[Simulated annealing in Python]]></title><description><![CDATA[Optimization is a crucial aspect of many fields, as it helps us find the best possible solution to a problem. In statistics, for example, it’s common to maximize the likelihood function or minimize the norm of residuals, in microeconomics optimizatio...]]></description><link>https://amm.zanotp.com/simulated-annealing-in-python</link><guid isPermaLink="true">https://amm.zanotp.com/simulated-annealing-in-python</guid><category><![CDATA[Python]]></category><category><![CDATA[optimization]]></category><category><![CDATA[programming]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sat, 18 Mar 2023 13:06:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/OPpCbAAKWv8/upload/641b311c5b6bbd3ccecf7edd8da5fd2c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Optimization is a crucial aspect of many fields, as it helps us find the best possible solution to a problem. In statistics, for example, it’s common to maximize the likelihood function or minimize the norm of residuals, in microeconomics optimization is used to study the behaviour of economic agents, who are assumed to maximize their utility subject to various constraints.</p>
<p>There are many different types of optimization problems, including linear programming, nonlinear programming, convex optimization, and integer programming, to name a few. Each type of optimization problem requires a different approach and a different set of algorithms to solve it.</p>
<p>In this post, I will talk about <strong>simulated annealing</strong>, a well-known algorithm that can still feel exotic to the uninitiated. For the sake of simplicity, I'll talk about minimization problems, since seeking the maximum of a function \(f\) equals seeking the minimum of the function \(-f\).</p>
<h1 id="heading-simulated-annealing">Simulated annealing</h1>
<p>Simulated annealing is an iterative method for solving unconstrained and bound-constrained optimization problems. The algorithm borrows inspiration from the physical process of heating a material and then slowly lowering the <strong>temperature</strong>.</p>
<p>At each iteration of the simulated annealing algorithm, a new point \(x_i\) is randomly generated (if you don't know how computers deal with randomness, see <a target="_blank" href="https://amm.zanotp.com/an-introduction-to-prngs-with-python-and-r">this article</a>). As we'll see in a minute, the distance of the new point \(x\_i\) from the current point \(x\_{i-1}\) is proportional to the temperature and based on a certain probability distribution. The algorithm accepts every new point \(x_i \) such that \(f(x\_i) \leq f(x\_{i-1})\), where \(f\) is the objective function (i.e. the function to be minimized), but it also accepts, with a <strong>certain probability</strong>, points \(x_i \) such that \(f(x\_i) \geq f(x\_{i-1})\). This property is significant: it prevents the algorithm from being trapped in <em>local minima</em>.</p>
<h3 id="heading-simulated-annealing-with-python">Simulated annealing with Python</h3>
<p>First of all, we need to load some packages:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> math
<span class="hljs-keyword">import</span> random <span class="hljs-keyword">as</span> rd
</code></pre>
<p>We now define the parameters we need:</p>
<ul>
<li><p>an objective function \(f\);</p>
</li>
<li><p>a domain (where the algorithm should look for a solution);</p>
</li>
<li><p>an initial temperature;</p>
</li>
<li><p>an initial point (which is usually selected randomly);</p>
</li>
<li><p>a step size;</p>
</li>
<li><p>a maximum number of iterations.</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-comment"># 1) the objective function</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">f</span>(<span class="hljs-params">x</span>):</span>
    <span class="hljs-keyword">return</span> x**<span class="hljs-number">3</span> - <span class="hljs-number">8</span>

<span class="hljs-comment"># 2) the domain</span>
domain = [<span class="hljs-number">-10.</span>, <span class="hljs-number">10.</span>]

<span class="hljs-comment"># 3) initial temperature</span>
start_temp = <span class="hljs-number">100</span>

<span class="hljs-comment"># 4) starting value</span>
x_0 = rd.uniform(domain[<span class="hljs-number">0</span>], domain[<span class="hljs-number">1</span>])

<span class="hljs-comment"># 5) the step size</span>
step_size = <span class="hljs-number">2</span>

<span class="hljs-comment"># 6) maximum number of iterations</span>
max_iter = <span class="hljs-number">1000</span>
iteration = <span class="hljs-number">0</span>
</code></pre>
<p>First of all, we evaluate \(x_0\) and assign \(x_0\) and \(y_0 \) to <code>x_best</code> and <code>y_best</code> (the best values so far) and <code>x_curr</code> and <code>y_curr</code> (the current solution).</p>
<pre><code class="lang-python">y_0 = f(x_0)

x_curr, y_curr = x_0, y_0
x_best, y_best = x_0, y_0
</code></pre>
<p>The first step of the algorithm is to generate a new candidate solution \(x_1 \) from the current solution \(x_0\) and evaluate \(f(x_1)\). We also count an iteration (this step is crucial, otherwise the algorithm would run forever).</p>
<pre><code class="lang-python">x_1 = x_curr + step_size * rd.uniform(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
y_1 = f(x_1)

iteration += <span class="hljs-number">1</span>
</code></pre>
<p>Since we are looking for a <em>minimum</em>, if <code>y_1</code> is smaller than <code>y_best</code>, we assign <code>y_1</code> and <code>x_1</code> to <code>y_best</code> and <code>x_best</code>. We then calculate the difference between <code>y_1</code> and <code>y_curr</code>.</p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> y_1 &lt; y_best:
    x_best, y_best = x_1, y_1

diff = y_1 - y_curr
</code></pre>
<p>Here comes the most exciting part: we update the temperature (using a fast annealing schedule) and use this value to calculate the <em>Metropolis criterion</em>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676404727190/499ebf45-dc0b-4839-aadf-cc34b0bd53d0.png" alt class="image--center mx-auto" /></p>
<p>where \(\Delta y\) is <code>diff</code> and \(t\) is <code>temp</code>. This quantity represents the probability of accepting the transition from \(x\_i\) to \(x\_{i+1}\) and is what allows the algorithm to escape <em>local minima</em>.</p>
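<p>Written out, the acceptance probability is</p>
<p>$$P(\text{accept}) = \begin{cases} 1 \quad \text{if } \Delta y \leq 0\\ e^{-\Delta y / t} \quad \text{otherwise} \end{cases}$$</p>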
<pre><code class="lang-python">temp = start_temp / (iteration + <span class="hljs-number">1.</span>)
metropolis = math.exp(-diff / temp)

<span class="hljs-keyword">if</span> diff &lt;= <span class="hljs-number">0</span> <span class="hljs-keyword">or</span> rd.random() &lt; metropolis:
    x_curr, y_curr = x_1, y_1
</code></pre>
<p>And this is the last step of an iteration. After that, the algorithm generates \(x_2\) and \(y_2\), evaluates them to update <code>x_best</code>, <code>y_best</code>, <code>x_curr</code> and <code>y_curr</code>, and repeats itself until <code>iteration == max_iter</code>.</p>
<h3 id="heading-making-a-function-for-simulated-annealing">Making a function for simulated annealing</h3>
<p>Since the algorithm simply repeats these steps, we may want to wrap it up in a function.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> math
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> random <span class="hljs-keyword">as</span> rd

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">simulated_annealing</span>(<span class="hljs-params">f, domain, step_size, start_temp, max_iter = <span class="hljs-number">1000</span></span>):</span>

    x_0 = rd.uniform(domain[<span class="hljs-number">0</span>], domain[<span class="hljs-number">1</span>])
    y_0 = f(x_0)
    x_curr, y_curr = x_0, y_0
    x_best, y_best = x_0, y_0

    <span class="hljs-keyword">for</span> n <span class="hljs-keyword">in</span> range(max_iter):
        x_i = x_curr + step_size * rd.uniform(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
        y_i = f(x_i)

        <span class="hljs-keyword">if</span> y_i &lt; y_best:
            x_best, y_best = x_i, y_i

        diff = y_i - y_curr

        temp = start_temp/ float(n + <span class="hljs-number">1</span>)
        metropolis = math.exp(-diff / temp)

        <span class="hljs-keyword">if</span> diff &lt;= <span class="hljs-number">0</span> <span class="hljs-keyword">or</span> rd.random() &lt; metropolis:
            x_curr, y_curr = x_i, y_i

    <span class="hljs-keyword">return</span> [y_best, x_best]
</code></pre>
<p>Note that we don't have to count the iterations since we are using a for loop.</p>
<p>If we test the function, we see that for well-chosen parameters the algorithm finds the minimum with good accuracy.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fun</span>(<span class="hljs-params">x</span>):</span>
    <span class="hljs-keyword">return</span> x**<span class="hljs-number">2</span> + np.sin(x**<span class="hljs-number">4</span>)

simulated_annealing(f = fun, domain = [<span class="hljs-number">-3</span>, <span class="hljs-number">3</span>], step_size = <span class="hljs-number">1</span>, start_temp = <span class="hljs-number">100</span>, max_iter = <span class="hljs-number">1000</span>)

<span class="hljs-comment">#&gt; [9.915548806706291e-08, -0.00031488962865622305]</span>
</code></pre>
<p>Finally, we plot what we got (the blue line is the real minimum while the red one is our result):</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> matplotlib <span class="hljs-keyword">import</span> pyplot <span class="hljs-keyword">as</span> plt

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fun</span>(<span class="hljs-params">x</span>):</span>
    <span class="hljs-keyword">return</span> x**<span class="hljs-number">2</span> + np.sin(x**<span class="hljs-number">4</span>)

x = np.linspace(<span class="hljs-number">-3</span>, <span class="hljs-number">3</span>, <span class="hljs-number">1000</span>)
y = fun(x)

plt.plot(x, y)

plt.axvline(x = <span class="hljs-number">0</span>, color = <span class="hljs-string">"blue"</span>, label = <span class="hljs-string">"real minimum"</span>)
plt.axvline(x = <span class="hljs-number">9.915548806706291e-08</span>, color = <span class="hljs-string">"red"</span>, label = <span class="hljs-string">"approximate minimum"</span>)
plt.legend(bbox_to_anchor = (<span class="hljs-number">1.0</span>, <span class="hljs-number">1</span>), loc = <span class="hljs-string">"upper left"</span>)

plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676015990766/bd4504c6-d319-49c5-9d69-2a3c3e991205.png" alt class="image--center mx-auto" /></p>
<p>In the picture, the approximate minimum overlaps the real minimum (they are too close to tell apart) and only the approximate minimum is visible.</p>
<h3 id="heading-beyond-2d">Beyond 2D</h3>
<p>Of course, the algorithm also works in more than one dimension, but the function needs some adjustment. In particular, we have to define a domain for \(y\):</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">simulated_annealing_3d</span>(<span class="hljs-params">f, domain_x, domain_y, step_size, start_temp, max_iter = <span class="hljs-number">1000</span></span>):</span>

    x_0 = rd.uniform(domain_x[<span class="hljs-number">0</span>], domain_x[<span class="hljs-number">1</span>])
    y_0 = rd.uniform(domain_y[<span class="hljs-number">0</span>], domain_y[<span class="hljs-number">1</span>])
    z_0 = f(x_0, y_0)
    x_curr, y_curr, z_curr = x_0, y_0, z_0
    x_best, y_best, z_best = x_0, y_0, z_0

    <span class="hljs-keyword">for</span> n <span class="hljs-keyword">in</span> range(max_iter):
        x_i = x_curr + step_size * rd.uniform(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
        y_i = y_curr + step_size * rd.uniform(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
        z_i = f(x_i, y_i)

        <span class="hljs-keyword">if</span> z_i &lt; z_best:
            x_best, y_best, z_best = x_i, y_i, z_i

        diff = z_i - z_curr

        temp = start_temp / (n + <span class="hljs-number">1</span>)       
        metropolis = math.exp(-diff / temp)

        <span class="hljs-keyword">if</span> diff &lt;= <span class="hljs-number">0</span> <span class="hljs-keyword">or</span> rd.random() &lt; metropolis:
            x_curr, y_curr, z_curr = x_i, y_i, z_i

    <span class="hljs-keyword">return</span> [z_best, y_best, x_best]
</code></pre>
<p>Let's test the function:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fun_3d</span>(<span class="hljs-params">x, y</span>):</span>
    <span class="hljs-keyword">return</span> (x-y)**<span class="hljs-number">2</span> + (x+y)**<span class="hljs-number">2</span>

simulated_annealing_3d(f = fun_3d, domain_x = [<span class="hljs-number">-5</span>, <span class="hljs-number">5</span>], domain_y = [<span class="hljs-number">-5</span>, <span class="hljs-number">5</span>], step_size = <span class="hljs-number">1</span>, start_temp = <span class="hljs-number">1000</span>, max_iter = <span class="hljs-number">10000</span>)

<span class="hljs-comment">#&gt; [0.0007833147844967029, 0.018319454959260906, 0.007486986192318135]</span>
</code></pre>
<p>If we plot the result:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> matplotlib <span class="hljs-keyword">import</span> pyplot <span class="hljs-keyword">as</span> plt

x = np.linspace(<span class="hljs-number">-.1</span>, <span class="hljs-number">.1</span>, <span class="hljs-number">20</span>)
y = np.linspace(<span class="hljs-number">-.1</span>, <span class="hljs-number">.1</span>, <span class="hljs-number">20</span>)
X, Y = np.meshgrid(x, y)
Z = fun_3d(X,Y)

a = np.repeat(<span class="hljs-number">0</span>, <span class="hljs-number">50</span>)
b = np.repeat(<span class="hljs-number">0</span>, <span class="hljs-number">50</span>)
c = np.arange(<span class="hljs-number">0</span>, <span class="hljs-number">.05</span>, <span class="hljs-number">.001</span>)

a_ = np.repeat(0.007486986192318135, 50)  # x of the approximated minimum found above
b_ = np.repeat(0.018319454959260906, 50)  # y of the approximated minimum found above
c_ = np.arange(<span class="hljs-number">0</span>, <span class="hljs-number">.05</span>, <span class="hljs-number">.001</span>)

fig = plt.figure(figsize=(<span class="hljs-number">4</span>,<span class="hljs-number">4</span>))
ax = fig.add_subplot(<span class="hljs-number">111</span>, projection=<span class="hljs-string">'3d'</span>)
ax.plot_wireframe(X, Y, Z, color = <span class="hljs-string">"red"</span>, linewidth = <span class="hljs-number">.3</span>)
ax.plot(a, b, c, color = <span class="hljs-string">"blue"</span>, label = <span class="hljs-string">"real minimum"</span>)
ax.plot(a_, b_, c_, color = <span class="hljs-string">"green"</span>, label = <span class="hljs-string">"approximated minimum"</span>)


ax.set_xlabel(<span class="hljs-string">"x"</span>)
ax.set_ylabel(<span class="hljs-string">"y"</span>)
ax.set_zlabel(<span class="hljs-string">"z"</span>)
plt.legend(bbox_to_anchor = (<span class="hljs-number">1.0</span>, <span class="hljs-number">1</span>), loc = <span class="hljs-string">"upper left"</span>) 

plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676034097906/c5af8256-e82c-4c6a-9126-5498ba3fbbc7.png" alt class="image--center mx-auto" /></p>
<p>Zooming in, we can appreciate the approximation error.</p>
<hr />
<p>And that's it for this article.</p>
<p>Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here</strong></a>.</p>
]]></content:encoded></item><item><title><![CDATA[Efficiently computing eigenvalues and eigenvectors in Python]]></title><description><![CDATA[Let \(M\) be an \(n \times n\) matrix. A scalar \(\lambda \) is an eigenvalue of \(M\) if there is a non-zero vector \(x\) (called eigenvector) s.t.:
$$M x = \lambda x$$Eigenvalues and eigenvectors are crucial in many fields of science. For example, ...]]></description><link>https://amm.zanotp.com/eigen-py</link><guid isPermaLink="true">https://amm.zanotp.com/eigen-py</guid><category><![CDATA[Python]]></category><category><![CDATA[linear algebra ]]></category><category><![CDATA[eigenvalues]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Mon, 20 Feb 2023 11:00:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/xAPS28sng4w/upload/f9a9e9bb6648195ebc538d9c27c1779f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Let \(M\) be an \(n \times n\) matrix. A scalar \(\lambda \) is an <em>eigenvalue</em> of \(M\) if there is a non-zero vector \(x\) (called <em>eigenvector</em>) s.t.:</p>
<p>$$M x = \lambda x$$</p><p>Eigenvalues and eigenvectors are crucial in many fields of science. For example, consider a discrete-time and discrete states Markov chain, whose <em>transition matrix</em> \(M\) is defined as follows:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676541307648/742b137d-3abf-402b-a139-242ad1ddf0da.png" alt class="image--center mx-auto" /></p>
<p>Let the <em>initial state vector</em> \(x_1\) be:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676456064591/4291e494-8276-49b0-9964-a4057b7d56f3.png" alt class="image--center mx-auto" /></p>
<p>We know that from \(M\) and \(x\_1\) we can compute all the successive states:</p>
<p>$$x\_2 = M x\_1$$</p><p>$$x\_3 = M x\_2$$</p><p>and in general</p>
<p>$$x\_k = M x\_{k-1}$$</p><p>We may want to find a vector \(x\) s.t.</p>
<p>$$Mx = x$$</p><p>Vectors with this property are known as <em>steady-state vectors</em>. It can be demonstrated that finding <em>steady-state vectors</em> amounts to finding the eigenvectors \(x\) with eigenvalue 1.</p>
<p>For example, the steady-state vector for the matrix \(M\) is:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676456085121/30e2a302-cb30-484d-b751-086abf0bee34.png" alt class="image--center mx-auto" /></p>
<p>and one can easily show that</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676541411713/3d24f86c-d90a-4b2a-bd45-7ff6b06a86c1.png" alt class="image--center mx-auto" /></p>
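<p>As a quick illustration (using an assumed \(2 \times 2\) column-stochastic matrix, not necessarily the one pictured above), the steady-state vector can be recovered in numpy by selecting the eigenvector associated with the eigenvalue 1:</p>
<pre><code class="lang-python">import numpy as np

# assumed example transition matrix (each column sums to 1)
M = np.array([[0.7, 0.2],
              [0.3, 0.8]])

# eigendecomposition: the columns of vecs are the eigenvectors
vals, vecs = np.linalg.eig(M)

# pick the eigenvector whose eigenvalue is (numerically) closest to 1
v = vecs[:, np.argmin(np.abs(vals - 1))]

# rescale the entries to sum to 1, giving a probability vector
steady_state = v / v.sum()
print(steady_state)  # approximately [0.4 0.6]
</code></pre>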
<p>Finding eigenvalues and eigenvectors is not always easy to do by hand, and there are some algorithms to compute them. Unfortunately, this calculation may be expensive, especially with large matrices, and the result may be inaccurate due to approximations.</p>
<p>However, some algorithms perform better than others, and I want to discuss some of them in this article.</p>
<h2 id="heading-solving-characteristic-equation">Solving characteristic equation</h2>
<p>We can rewrite \(M x = \lambda x\) as</p>
<p>$$M x-\lambda x = 0$$</p><p>$$( M-\lambda I)x = 0$$</p><p>This system has a non-trivial solution (i.e. \(x \neq 0\)) only if \(det(M-\lambda I) = 0\), which is known as the <em>characteristic equation</em>.</p>
<p>Expanding \(det(M-\lambda I) =0\) we obtain a polynomial of degree \(n\), whose roots are the eigenvalues of \(M\). Computing eigenvectors from eigenvalues is trivial: for each eigenvalue \(\lambda\), we just need to find the null space of the matrix \(M-\lambda I\).</p>
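<p>For illustration, this route can be sketched in numpy (a didactic sketch only: <code>np.poly</code> is itself built on an eigenvalue computation under the hood):</p>
<pre><code class="lang-python">import numpy as np

M = np.array([[1.0, 3.0],
              [2.0, 1.0]])

# coefficients of the characteristic polynomial det(M - lambda*I), highest degree first
coeffs = np.poly(M)

# the roots of the characteristic polynomial are the eigenvalues
print(np.roots(coeffs))  # approximately 3.449 and -1.449
</code></pre>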
<p>This is how we compute eigenvalues and eigenvectors by hand, but following this approach on a computer leads to some problems:</p>
<ul>
<li><p>it depends on the computation of the determinant, which is a time-consuming process (due to the symbolic nature of the computation);</p>
</li>
<li><p>there is no general formula for solving polynomial equations of degree higher than 4; even though root-finding techniques exist, like Newton's method, it's tough to find all the roots reliably.</p>
</li>
</ul>
<p>Therefore we need a different approach.</p>
<h2 id="heading-iterative-methods">Iterative methods</h2>
<p>Unfortunately, there is no simple algorithm to directly compute eigenvalues and eigenvectors for general matrices (there are special cases of matrices where it's possible, but I won't cover them in this article).</p>
<p>However, there are iterative algorithms that produce sequences that <em>converge</em> to eigenvectors or eigenvalues. There are several variations of these methods, I'll just cover two of them: the <em>power method</em> and the <em>QR algorithm</em>.</p>
<h3 id="heading-the-power-method">The power method</h3>
<p>This method applies to matrices that have a <em>dominant eigenvalue</em> \(\lambda\_d\) <em>(i.e. an eigenvalue that is larger in absolute value than the other eigenvalues).</em></p>
<p>Let \(M\) be an \(n \times n\) matrix, the power method approximates a dominant eigenvector in the following steps:</p>
<p>$$x\_1 = Mx\_0$$</p><p>$$x\_2 = Mx\_1$$</p><p>$$x\_k = Mx\_{k-1}$$</p><p>And the more steps we take (i.e. the bigger \(k\) is), the more accurate our approximation will be. This is expressed in the following formula</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676561958168/a8407e60-0fd3-438f-9167-7915f9615a89.png" alt class="image--center mx-auto" /></p>
<p>Once we have an approximation of the dominant eigenvector \(x\_d\), we find the corresponding dominant eigenvalue \(\lambda\_d\) with the Rayleigh quotient</p>
<p>$$\frac{(Mx)\cdot x}{x \cdot x} = \frac{(\lambda\_d x)\cdot x}{x \cdot x} = \frac{\lambda\_d (x \cdot x)}{x \cdot x} = \lambda\_d$$</p><p>Once we have \(\lambda\_d\), we use the observation that if \(\lambda\) is an eigenvalue of \(M\), then \(\lambda - \beta\) is an eigenvalue of \(M-\beta I\) for any scalar \(\beta\), while the eigenvectors stay the same. We can then apply the power method to \(M - \lambda\_d I\) to compute a second eigenvalue. Repeating this process allows us to compute all of the eigenvalues.</p>
<p>In Python this is:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">power_method</span>(<span class="hljs-params">M, n_iter = <span class="hljs-number">100</span></span>):</span>
    n = M.shape[<span class="hljs-number">0</span>]
    x_d = np.repeat(<span class="hljs-number">.5</span>, n)
    lambda_d = n

    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n_iter):
        x_0 = x_d
        x_d = np.matmul(M, x_0)
    lambda_d = np.matmul(np.matmul(M, x_d), x_d) / np.matmul(x_d, x_d)

    h = np.zeros((n, n), int)
    np.fill_diagonal(h, lambda_d)
    N = M - h 
    x_1 = np.array([<span class="hljs-number">1</span>, <span class="hljs-number">0</span>])
    lambda_1 = n

    <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> range(n_iter):
        x_0 = x_1
        x_1 = np.matmul(N, x_0)
    lambda_1 = np.matmul(np.matmul(M, x_1), x_1) / np.matmul(x_1, x_1)


    <span class="hljs-keyword">return</span> [[x_d, lambda_d], [x_1, lambda_1]]
</code></pre>
<p>The function above works only for \(2 \times 2\) matrices, but it can easily be extended to \(n \times n\) matrices. We now test the function:</p>
<pre><code class="lang-python">Matr = np.array([[<span class="hljs-number">1</span>, <span class="hljs-number">3</span>], [<span class="hljs-number">2</span>, <span class="hljs-number">1</span>]])

power_method(Matr)
<span class="hljs-comment">#&gt; [[array([1, 0.81649658]), 3.449489742783178],</span>
<span class="hljs-comment">#&gt; [array([-1.22474487, 1]), -1.449489742783178]]</span>
</code></pre>
<p>We can even prove that those values represent a good approximation by checking the equation</p>
<p>$$Mx=\lambda x$$</p><p>Since this is an approximation, the <code>==</code> operator is not suited, so we define the <code>is_close</code> function instead.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">is_close</span>(<span class="hljs-params">x, y</span>):</span>
    <span class="hljs-keyword">if</span> all(abs(x-y) &lt; <span class="hljs-number">1e-5</span>):
        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

Matr = np.array([[<span class="hljs-number">.7</span>, <span class="hljs-number">.2</span>], [<span class="hljs-number">.3</span>, <span class="hljs-number">.8</span>]])

sol = power_method(Matr)
lambda_a = sol[<span class="hljs-number">0</span>][<span class="hljs-number">1</span>]
lambda_b = sol[<span class="hljs-number">1</span>][<span class="hljs-number">1</span>]

x_a = sol[<span class="hljs-number">0</span>][<span class="hljs-number">0</span>]
x_b = sol[<span class="hljs-number">1</span>][<span class="hljs-number">0</span>]

print(is_close(np.matmul(Matr, x_a), lambda_a * x_a))
<span class="hljs-comment">#&gt; True</span>

print(is_close(np.matmul(Matr, x_b), lambda_b * x_b))
<span class="hljs-comment">#&gt; True</span>
</code></pre>
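<p>(NumPy ships an equivalent helper, <code>np.allclose</code>, which compares arrays using relative and absolute tolerances.)</p>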
<p>Above we defined the algorithm as follows</p>
<p>$$x\_k = Mx\_{k-1}$$</p><p>We can notice that if</p>
<p>$$x\_{k-1} = Mx\_{k-2}$$</p><p>then we can substitute</p>
<p>$$x\_k = M^2 x\_{k-2}$$</p><p>By induction, we can prove that</p>
<p>$$x\_k = M^k x\_0$$</p><p>We now use this formula to update the Python function above. The new function is the following:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">power_method_2</span>(<span class="hljs-params">M, n_iter = <span class="hljs-number">100</span></span>):</span>
    n = M.shape[<span class="hljs-number">0</span>]           
    x_d = np.array([<span class="hljs-number">1</span>, <span class="hljs-number">0</span>])

    M_k = np.linalg.matrix_power(M, n_iter) 
    M_k = M_k / np.max(M_k)
    x_d = np.matmul(M_k, x_d)
    x_d = x_d / np.max(x_d)     

    lambda_d = np.matmul(np.matmul(M, x_d), x_d) / np.matmul(x_d, x_d)

    D = np.zeros((n, n), float)
    np.fill_diagonal(D, lambda_d)
    N = M - D
    x_nd = np.array([<span class="hljs-number">1</span>,<span class="hljs-number">0</span>])

    N_k = np.linalg.matrix_power(N, n_iter) 
    N_k= N_k / np.max(N_k)
    x_nd = np.matmul(N_k, x_nd)
    x_nd = x_nd/np.max(x_nd)  

    lambda_nd = np.matmul(np.matmul(N, x_nd), x_nd) / np.matmul(x_nd, x_nd)
    lambda_nd = lambda_nd + lambda_d 

    <span class="hljs-keyword">return</span> [[x_d, lambda_d], [x_nd, lambda_nd]]
</code></pre>
<p>Again we test the function:</p>
<pre><code class="lang-python">Matr = np.array([[<span class="hljs-number">.7</span>, <span class="hljs-number">.2</span>], [<span class="hljs-number">.3</span>, <span class="hljs-number">.8</span>]])

sol_2 = power_method_2(Matr)
lambda_a = sol_2[<span class="hljs-number">0</span>][<span class="hljs-number">1</span>]
lambda_b = sol_2[<span class="hljs-number">1</span>][<span class="hljs-number">1</span>]

x_a = sol_2[<span class="hljs-number">0</span>][<span class="hljs-number">0</span>]
x_b = sol_2[<span class="hljs-number">1</span>][<span class="hljs-number">0</span>]

print(is_close(np.matmul(Matr, x_a), lambda_a * x_a))
<span class="hljs-comment">#&gt; True</span>

print(is_close(np.matmul(Matr, x_b), lambda_b * x_b))
<span class="hljs-comment">#&gt; True</span>
</code></pre>
<p>Now that we are sure both functions work correctly, we can compare their performance.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> timeit

%timeit power_method(Matr)
<span class="hljs-comment">#&gt; 558 µs ± 32.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)</span>

%timeit power_method_2(Matr)
<span class="hljs-comment">#&gt; 144 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)</span>
</code></pre>
<p>And we have a winner: the second function is roughly four times faster than the first one. This makes sense: <code>np.linalg.matrix_power</code> computes \(M^k\) by repeated squaring, so the work happens in a handful of optimized matrix products instead of \(k\) matrix-vector products inside a Python loop.</p>
<h3 id="heading-the-qr-algorithm">The QR algorithm</h3>
<p>One of the best methods for approximating the eigenvalues and the eigenvectors of a matrix applies the <em>QR factorization</em> and for this reason is known as the <em>QR algorithm</em>.</p>
<p>Let \(M\) be an \(n\times n\) matrix; first of all, we need to factor it as</p>
<p>$$M = Q\_0R\_0$$</p><p>then we set</p>
<p>$$M\_1 = R\_0Q\_0$$</p><p>We then factor \(M\_1 = Q\_1R\_1\) and define \(M\_2 = R\_1Q\_1\) and so on.</p>
<p>It can be proven that \(M\) is similar to \(M\_1, M\_2, \dots, M\_k\), which means \(M\) and \(M\_1, M\_2, \dots, M\_k\) have the same eigenvalues.</p>
<p>It can also be shown that, under suitable conditions, the matrices \(M\_k\) converge to a triangular matrix \(T\) whose diagonal elements are the eigenvalues of \(M\).</p>
<p>In Python this is:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">QR_argo</span>(<span class="hljs-params">M, n_iter = <span class="hljs-number">100</span></span>):</span>
    n = M.shape[<span class="hljs-number">1</span>]
    Q_k = np.linalg.qr(M)[<span class="hljs-number">0</span>]
    R_k = np.linalg.qr(M)[<span class="hljs-number">1</span>]
    e_values = []

    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n_iter):
        M_k = np.matmul(R_k, Q_k)
        Q_k = np.linalg.qr(M_k)[<span class="hljs-number">0</span>]
        R_k = np.linalg.qr(M_k)[<span class="hljs-number">1</span>]

    <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> range(M_k.shape[<span class="hljs-number">1</span>]):
        e_values.append(M_k[j, j])

    <span class="hljs-keyword">return</span> e_values
</code></pre>
<p>We can now test the function and compare it to the power method.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">is_close</span>(<span class="hljs-params">x, y</span>):</span>
    <span class="hljs-keyword">if</span> abs(x-y) &lt; <span class="hljs-number">1e-5</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

Matr = np.array([[<span class="hljs-number">1</span>, <span class="hljs-number">3</span>], [<span class="hljs-number">2</span>, <span class="hljs-number">1</span>]])

pow_lambda_a = power_method(Matr)[<span class="hljs-number">0</span>][<span class="hljs-number">1</span>]
pow_lambda_b = power_method(Matr)[<span class="hljs-number">1</span>][<span class="hljs-number">1</span>]
QR_lambda_a = QR_argo(Matr)[<span class="hljs-number">0</span>]
QR_lambda_b = QR_argo(Matr)[<span class="hljs-number">1</span>]

is_close(QR_lambda_a, pow_lambda_a)
<span class="hljs-comment">#&gt; True</span>

is_close(QR_lambda_b, pow_lambda_b)
<span class="hljs-comment">#&gt; True</span>
</code></pre>
<p>Once we have the eigenvalues \(\lambda\_i\), computing the eigenvectors is easy: they are the <em>non-trivial</em> solutions of</p>
<p>$$(M-\lambda\_i I) x=0$$</p>
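<p>In code, one way to do this (a minimal sketch; the helper name is mine) is to take the right-singular vector associated with the smallest singular value of \(M-\lambda\_i I\), which spans its numerical null space:</p>
<pre><code class="lang-python">def eigvec_from_eigval(M, lam):
    # the eigenvector spans the null space of M - lam*I; numerically, the
    # right-singular vector of the smallest singular value approximates it
    A = M - lam * np.eye(M.shape[0])
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

M = np.array([[1., 3.], [2., 1.]])
v = eigvec_from_eigval(M, 3.449489742783178)
print(is_close(np.matmul(M, v)[0] / v[0], 3.449489742783178))
#&gt; True
</code></pre>
<hr />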
<p>And that's it for this article.</p>
<p>Thanks for reading.</p>
<p>For any questions or suggestions related to what I covered in this article, please add them as a comment. In case of more specific inquiries, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact"><strong>here</strong></a>.</p>
]]></content:encoded></item><item><title><![CDATA[An introduction to PRNGs with Python and R]]></title><description><![CDATA[Life's most important questions are, for the most part, nothing but probability problems.
Pierre-Simon de Laplace

Introduction
Imagine this scenario: you and your brother want to go to the cinema. Two movies are played: Interstellar (the one you wan...]]></description><link>https://amm.zanotp.com/an-introduction-to-prngs-with-python-and-r</link><guid isPermaLink="true">https://amm.zanotp.com/an-introduction-to-prngs-with-python-and-r</guid><category><![CDATA[statistics]]></category><category><![CDATA[Cryptography]]></category><category><![CDATA[randomness]]></category><category><![CDATA[random numbers]]></category><category><![CDATA[prngs]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Sun, 29 Jan 2023 23:00:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/aaSTQ-wY5DQ/upload/2a831240ef9f9415c1cb90dc01525477.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>Life's most important questions are, for the most part, nothing but probability problems.</p>
<p>Pierre-Simon de Laplace</p>
</blockquote>
<h1 id="heading-introduction">Introduction</h1>
<p>Imagine this scenario: you and your brother want to go to the cinema. Two movies are showing: Interstellar (the one you want to see) and A Clockwork Orange (the one your brother wants to see).</p>
<p>The classic solution to this problem is flipping a coin, but since we are not unimaginative people (or we don't have a coin) we may want to find a more elegant solution.</p>
<p>Thus let's write a program in Python and R that decides what to see. The program generates a number between 0 and 1: if this number is below 0.5 we watch Interstellar, otherwise A Clockwork Orange is chosen.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> random <span class="hljs-keyword">as</span> rd

x = rd.uniform(<span class="hljs-number">0</span>, <span class="hljs-number">1</span>)

<span class="hljs-keyword">if</span> x &lt; <span class="hljs-number">.5</span>:
    print(<span class="hljs-string">"Interstellar"</span>)
<span class="hljs-keyword">else</span>:
    print(<span class="hljs-string">"A Clockwork Orange"</span>)
</code></pre>
<p>We now do the same in R:</p>
<pre><code class="lang-r">x &lt;- runif(<span class="hljs-number">1</span>)

ifelse(x &lt; <span class="hljs-number">.5</span>, <span class="hljs-string">"Interstellar"</span>, <span class="hljs-string">"A Clockwork Orange"</span>)
</code></pre>
<p>Fair enough, but there is something paradoxical in the previous examples: a computer, a perfectly <em>deterministic machine</em>, is creating something <em>randomly</em>.</p>
<p>In this article, I want to introduce you to <strong>pseudorandom number generators</strong> and their application.</p>
<h2 id="heading-determinism-versus-randomness">Determinism versus randomness</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1674834068845/a7d1ce92-4841-42cc-9cc3-18101b50b441.jpeg" alt class="image--center mx-auto" /></p>
<p>Above I wrote "computer, a perfectly <em>deterministic machine</em>", but what does it mean to be deterministic?</p>
<p>In brief, computers are deterministic because they follow a set of instructions, or a program, in a predictable manner, i.e. given some inputs they always return the same output. The paradox lies in the fact that in the above example, <code>x = rd.uniform(0, 1)</code> and <code>x &lt;- runif(1)</code> return a different value every time the line is executed.</p>
<p>Are <code>x = rd.uniform(0, 1)</code> and <code>x &lt;- runif(1)</code> exceptions to the deterministic property of computers?</p>
<p>The answer is no, and in a minute I'll explain the reasons behind that.</p>
<h2 id="heading-what-is-randomness">What is randomness</h2>
<p>Before diving into PRNGs we need to define <strong>randomness</strong>. We usually call random a sequence of numbers with the following traits:</p>
<ul>
<li><p><strong>lack of pattern</strong>: a random sequence should not have any discernible structure;</p>
</li>
<li><p><strong>independence</strong>: the numbers in a random sequence should not be affected by one another;</p>
</li>
<li><p><strong>unpredictability</strong>: a random sequence of numbers should not be predictable or reconstructible.</p>
</li>
</ul>
<p>It's important to notice that randomness is a complex concept and it's hard to quantify precisely. Therefore it's common to use statistical tests to evaluate the randomness of a sequence of numbers, but this is beyond the scope of this article.</p>
<p>Random number generators are mathematical algorithms or mechanical devices that produce a sequence that follows the above properties.</p>
<p>As you may suppose, there are two types of random number generators:</p>
<ul>
<li><p><strong>true random number generators</strong> (TRNGs from now on)</p>
</li>
<li><p><strong>pseudorandom number generators</strong> (PRNGs from now on)</p>
</li>
</ul>
<p>In this article, I'll just cover PRNGs but be aware that TRNGs exist and have important applications in many fields such as gaming, gambling and cryptography.</p>
<h1 id="heading-pseudorandom-number-generators-prngs">Pseudorandom number generators (PRNGs)</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1674834995478/6742e995-0396-4aa0-827b-2c3363249970.jpeg" alt class="image--center mx-auto" /></p>
<p>As the name suggests, pseudorandom number generators are a type of software used to generate a sequence of numbers that <em>mimic</em> the properties of truly random numbers. The algorithm takes an initial input (the <strong>seed</strong>) from which it produces the sequence. The <strong>seed</strong> is what <em>determines</em> the sequence of numbers: for example, if we set the seed to <code>1234</code>, <code>x</code> remains the same no matter how many times we run the following lines of code. In Python this is:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> random <span class="hljs-keyword">as</span> rd

rd.seed(<span class="hljs-number">1234</span>)

x = rd.uniform(<span class="hljs-number">0</span>, <span class="hljs-number">1</span>)

print(x)
</code></pre>
<p>The same is true for the following R code:</p>
<pre><code class="lang-r">set.seed(<span class="hljs-number">1234</span>)

x &lt;- runif(<span class="hljs-number">1</span>)

print(x)
</code></pre>
<h2 id="heading-properties">Properties</h2>
<p>The quality of a PRNG is judged by its properties. The most important ones are:</p>
<ul>
<li><p><strong>periodicity</strong>: PRNGs will generate a sequence of numbers that repeats itself after a certain number of iterations, known as the <em>period</em>. A PRNG with a long period is more desirable than one with a shorter period;</p>
</li>
<li><p><strong>uniformity</strong>: PRNGs generate numbers that are distributed uniformly across the range of possible values (a quick empirical check is sketched after this list);</p>
</li>
<li><p><strong>independence</strong>: the numbers generated by a PRNG should be independent of one another;</p>
</li>
<li><p><strong>randomness</strong>: the numbers generated by a PRNG should not have any discernible patterns;</p>
</li>
<li><p><strong>seed-ability</strong>: PRNGs should accept a seed so that a sequence can be reproduced, while different seeds produce different sequences.</p>
</li>
</ul>
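<p>As a concrete example, here is a crude empirical check of the uniformity property (just a sketch built on Python's generator; a serious evaluation would use proper statistical tests):</p>
<pre><code class="lang-python">import random as rd

rd.seed(1234)
draws = [rd.uniform(0, 1) for _ in range(100_000)]

# count the share of draws falling in each of ten equal-width bins;
# a uniform generator should put roughly 10% in each bin
counts = [0] * 10
for x in draws:
    counts[min(int(x * 10), 9)] += 1

print([round(c / len(draws), 3) for c in counts])
</code></pre>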
<h2 id="heading-two-prngs-algorithms">Two PRNGs algorithms</h2>
<p>In this section, I want to present two well-known PRNG algorithms to show in practice what PRNGs look like: the <strong>middle square algorithm</strong> and the <strong>linear congruential generators</strong>.</p>
<h3 id="heading-middle-square-algorithm">Middle square algorithm</h3>
<p>Proposed by von Neumann, the middle square algorithm takes a <strong>seed</strong>, squares it and fetches the middle digits of the result as the random number. Let's discuss an example and then implement it in Python and R.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>seed</td><td>square</td><td>random number</td></tr>
</thead>
<tbody>
<tr>
<td>12</td><td>0<strong>14</strong>4</td><td>14</td></tr>
<tr>
<td>33</td><td>1<strong>08</strong>9</td><td>08</td></tr>
<tr>
<td>24</td><td>0<strong>57</strong>6</td><td>57</td></tr>
<tr>
<td>66</td><td>4<strong>35</strong>6</td><td>35</td></tr>
</tbody>
</table>
</div><p>Usually, the algorithm is repeated more than once, i.e. the random number becomes the new seed, which is then squared, and its middle digits become the next random number, and so on.</p>
<p>Here is an implementation in Python:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">middle_square_algo</span>(<span class="hljs-params">seed</span>):</span>

    <span class="hljs-comment"># first of all we square the seed</span>
    square = str(np.square(seed))

    <span class="hljs-comment"># then we need to take the mid-term, we have two possibilities</span>
    <span class="hljs-comment"># the square may have an even number of digits:</span>
    <span class="hljs-keyword">if</span> len(square) % <span class="hljs-number">2</span> == <span class="hljs-number">0</span>:
        half_str = int(len(square) / <span class="hljs-number">2</span>)

    <span class="hljs-comment"># the number has an odd number of digits:</span>
    <span class="hljs-keyword">else</span>:
        half_str = int(len(square) / <span class="hljs-number">2</span> - <span class="hljs-number">.5</span>)


    mid = square[half_str - <span class="hljs-number">1</span> : half_str + <span class="hljs-number">1</span>]
    <span class="hljs-keyword">return</span> int(mid)

<span class="hljs-comment"># finally the testing:</span>

print(middle_square_algo(<span class="hljs-number">12</span>))

<span class="hljs-comment">#&gt; 14</span>
</code></pre>
<p>And here is the R code:</p>
<pre><code class="lang-r">middle_square_algo &lt;- <span class="hljs-keyword">function</span>(seed){

  <span class="hljs-comment"># first of all we square the seed</span>
  square &lt;- seed^<span class="hljs-number">2</span>

  <span class="hljs-comment"># we now need to get the number of digits of square</span>
  len &lt;- nchar(square)

  <span class="hljs-comment"># we have two possible scenarios</span>
  <span class="hljs-comment"># len is even:</span>
  <span class="hljs-keyword">if</span>(len %% <span class="hljs-number">2</span> == <span class="hljs-number">0</span>){

    half_square &lt;- len / <span class="hljs-number">2</span>

  <span class="hljs-comment"># len is odd:  </span>
  } <span class="hljs-keyword">else</span>{

    half_square &lt;- len / 2 - .5

  }
  square &lt;- as.character(square)
  mid &lt;- substr(square, half_square, half_square + <span class="hljs-number">1</span>)

  <span class="hljs-keyword">return</span>(as.double(mid))
}

<span class="hljs-comment"># finally the testing:</span>
print(middle_square_algo(<span class="hljs-number">33</span>))

<span class="hljs-comment">#&gt; 8</span>
</code></pre>
<p>Assuming now that we want to iterate the algorithm more than once, the Python code is:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">middle_square_algo_deep</span>(<span class="hljs-params">seed, deep</span>):</span>

    <span class="hljs-comment"># we just need to repeat what we did before but more than one time</span>

    <span class="hljs-keyword">for</span> rep <span class="hljs-keyword">in</span> range(deep):
        seed = int(middle_square_algo(seed))
    <span class="hljs-keyword">return</span> seed

<span class="hljs-comment"># finally the testing:      </span>
middle_square_algo_deep(<span class="hljs-number">33</span>, <span class="hljs-number">3</span>)

<span class="hljs-comment">#&gt; 9</span>
</code></pre>
<p>And similarly, the R code is:</p>
<pre><code class="lang-r">middle_square_algo_deep &lt;- <span class="hljs-keyword">function</span>(seed, deep=<span class="hljs-number">2</span>){

  <span class="hljs-comment"># we just need to repeat what we did before but more than one time</span>

  <span class="hljs-keyword">for</span>( rep <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:deep){
    seed &lt;- middle_square_algo(seed)
  }

  <span class="hljs-keyword">return</span>(seed)
}

<span class="hljs-comment"># finally the testing:</span>
middle_square_algo_deep(<span class="hljs-number">33</span>, <span class="hljs-number">3</span>)

<span class="hljs-comment">#&gt; 9</span>
</code></pre>
<p>The most important weakness of this algorithm is that it needs an appropriate starting seed: some seeds produce sequences with a very short period.</p>
<p>For example, the seed <code>50</code> has the shortest possible period (1), as shown in the following lines of code:</p>
<pre><code class="lang-r">middle_square_algo_deep(<span class="hljs-number">50</span>, <span class="hljs-number">1</span>)
<span class="hljs-comment">#&gt; 50</span>

middle_square_algo_deep(<span class="hljs-number">50</span>, <span class="hljs-number">2</span>)
<span class="hljs-comment">#&gt; 50</span>

middle_square_algo_deep(<span class="hljs-number">50</span>, <span class="hljs-number">3</span>)
<span class="hljs-comment">#&gt; 50</span>

middle_square_algo_deep(<span class="hljs-number">50</span>, <span class="hljs-number">4</span>)
<span class="hljs-comment">#&gt; 50</span>
</code></pre>
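<p>We can package this check into a small helper. The sketch below (the function name is mine) reuses the Python <code>middle_square_algo</code> defined above and walks the orbit of a seed until a value repeats, returning the length of the cycle:</p>
<pre><code class="lang-python">def middle_square_period(seed, max_iter = 1000):
    # follow the orbit of the middle-square map until a state repeats,
    # then return the length of the cycle we fell into
    seen = {}
    value = seed
    for step in range(max_iter):
        if value in seen:
            return step - seen[value]
        seen[value] = step
        value = middle_square_algo(value)
    return None  # no repetition found within max_iter steps

print(middle_square_period(50))
#&gt; 1
</code></pre>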
<h3 id="heading-linear-congruential-generators">Linear congruential generators</h3>
<p>The linear congruential generators (LCGs) are a family of PRNGs and are probably the most used approach to generating pseudorandom numbers. The algorithms are defined by a linear congruential equation like the following:</p>
<p>$$x_{n+1} = ax_n + b \space \space mod(y)$$</p><p>where \(a\), \(b\) and \(y\) are positive integers and we also need a <strong>seed</strong> \(x\_0\).</p>
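<p>To make the recurrence concrete, here is a minimal LCG sketch in Python (the parameters in the test call are toy values chosen only for illustration, not production-quality constants):</p>
<pre><code class="lang-python">def lcg(seed, a, b, y, length):
    # x_{n+1} = (a * x_n + b) mod y
    xs = [seed]
    for _ in range(length - 1):
        xs.append((a * xs[-1] + b) % y)
    return xs

print(lcg(7, a=5, b=3, y=16, length=10))
#&gt; [7, 6, 1, 8, 11, 10, 5, 12, 15, 14]
</code></pre>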
<p>Let's now consider (and then implement) a generalization of this recurrence: the <strong>Lagged Fibonacci Generator</strong> (LFG).</p>
<p>$$x\_n = a\_1 x\_{n-i} + a\_2 x\_{n-j} + b \space \space mod(y)$$</p><p>We just need to provide the LFG with the initial values \(x\_1\) to \(x\_{max(i, j)}\) and it will generate a pseudorandom sequence of numbers from \(x\_{max(i, j)+1}\) onwards.</p>
<p>Let me work through an example to clear your mind. Let the following equation be our LFG:</p>
<p>$$x\_n = x\_{n-3} + x\_{n-5} \space \space mod(10)$$</p><p>and let's say we want to generate a sequence of random numbers between 0 and 9 from the initial seed [4, 2, 9, 5, 5].</p>
<p>The sequence starts from \(x\_6\): since \(max(i, j) = 5\), the recurrence needs the five previous values, so the terms before \(x\_6\) are exactly the seed.</p>
<p>Thus the sequence is:</p>
<p>$$x\_6 = x\_3 + x\_1 \space \space mod(10) = 9 + 4 \space \space mod(10) = 3$$</p><p>$$x\_7 = x\_4 + x\_2 \space \space mod(10) = 5 + 2 \space \space mod(10) = 7$$</p><p>$$x\_8 = x\_5 + x\_3 \space \space mod(10) = 5 + 9 \space \space mod(10) = 4$$</p><p>and so on.</p>
<p>We now implement the LFG in Python and R. In Python the algorithm is something like this:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">lagged_fib_gen</span>(<span class="hljs-params">seed, i, j, mod, length, a_1 = <span class="hljs-number">1</span>, a_2 = <span class="hljs-number">1</span>, c = <span class="hljs-number">0</span></span>):</span>
    l_f = seed

        <span class="hljs-comment"># we suppose that i &lt; j</span>

    <span class="hljs-keyword">for</span> rep <span class="hljs-keyword">in</span> range(max([i, j]) + <span class="hljs-number">1</span>, length + <span class="hljs-number">1</span>):

        x = (a_1 * l_f[rep - i - <span class="hljs-number">1</span>] + a_2 * l_f[rep - j - <span class="hljs-number">1</span>]) % <span class="hljs-number">10</span>
        l_f.append(x)

    <span class="hljs-keyword">return</span> l_f

<span class="hljs-comment"># finally the testing:</span>
lagged_fib_gen([<span class="hljs-number">4</span>, <span class="hljs-number">2</span>, <span class="hljs-number">9</span>, <span class="hljs-number">5</span>, <span class="hljs-number">5</span>], <span class="hljs-number">3</span>, <span class="hljs-number">5</span>, <span class="hljs-number">10</span>, <span class="hljs-number">10</span>)

<span class="hljs-comment">#&gt; [4, 2, 9, 5, 5, 3, 7, 4, 8, 2]</span>
</code></pre>
<p>In R the algorithm is:</p>
<pre><code class="lang-r">lagged_fib_gen &lt;- <span class="hljs-keyword">function</span>(seed, i, j, mod, length, a_1 = <span class="hljs-number">1</span>, a_2 = <span class="hljs-number">1</span>, c = <span class="hljs-number">0</span>){

  l_f &lt;- seed

  <span class="hljs-keyword">for</span>(rep <span class="hljs-keyword">in</span> (max(c(i, j))+<span class="hljs-number">1</span>):length){

    x &lt;- (a_1 * l_f[rep - i] + a_2 * l_f[rep - j]) %% mod
    l_f[rep] &lt;- x
  }
  <span class="hljs-keyword">return</span>(l_f)
}

<span class="hljs-comment"># finally the testing:</span>
lagged_fib_gen(c(<span class="hljs-number">4</span>, <span class="hljs-number">2</span>, <span class="hljs-number">9</span>, <span class="hljs-number">5</span>, <span class="hljs-number">5</span>), <span class="hljs-number">3</span>, <span class="hljs-number">5</span>, <span class="hljs-number">10</span>, <span class="hljs-number">10</span>)

<span class="hljs-comment">#&gt; 4 2 9 5 5 3 7 4 8 2</span>
</code></pre>
<p>As for the <strong>middle square algorithm</strong>, the quality of LCGs depends on the chosen parameters.</p>
<h1 id="heading-applications-of-prngs">Applications of PRNGs</h1>
<p>Now we know what PRNGs are, but what are they used for? Well, they have many applications; some examples include:</p>
<ul>
<li><p>cryptography: random numbers are used to generate encryption keys (the PRNGs used in cryptography are much more complex than the two I showed before);</p>
</li>
<li><p>modelling: many scientific simulations use random numbers to represent uncertainty;</p>
</li>
<li><p>gaming: random numbers are used to make games less predictable and complex (e.g. biomes generation in Minecraft);</p>
</li>
<li><p>randomized algorithms: some algorithms use randomness to solve problems more efficiently (e.g. the famous Randomized Hill Climbing algorithm).</p>
</li>
</ul>
<h1 id="heading-to-go-further">To go further</h1>
<p>As you may imagine, the world of PRNGs is quite vast and complex and has applications in almost every field of science. This article doesn't aim to be exhaustive on the topic and is no more than a gentle introduction to PRNGs. To go further there are many resources online, but <a target="_blank" href="https://seriouscomputerist.atariverse.com/media/pdf/book/Art%20of%20Computer%20Programming%20-%20Volume%202%20(Seminumerical%20Algorithms).pdf">The Art of Computer Programming - Seminumerical Algorithms</a> by D. Knuth and <a target="_blank" href="https://cran.r-project.org/web/packages/randtoolbox/vignettes/fullpres.pdf">this</a> CRAN vignette are great starting points.</p>
<p>Thanks for reading.</p>
<p>For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me <a target="_blank" href="http://amm.zanotp.com/contact">here</a>.</p>
]]></content:encoded></item><item><title><![CDATA[print("hello world")]]></title><description><![CDATA[Hello world, and welcome to this blog. This article is just an introduction to Algomath μse (is pronounced as "muse"), but first let me introduce myself: my name is Pietro Zanotta and currently I am an economics student in Switzerland (to read more s...]]></description><link>https://amm.zanotp.com/hello-world</link><guid isPermaLink="true">https://amm.zanotp.com/hello-world</guid><category><![CDATA[Blogging]]></category><category><![CDATA[print("hello word")]]></category><category><![CDATA[blog]]></category><dc:creator><![CDATA[Pietro Zanotta]]></dc:creator><pubDate>Fri, 27 Jan 2023 10:47:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/9NpzkH9lb0o/upload/c8400fa0473de5adab52c9f7fb3e679d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello world, and welcome to this blog. This article is just an introduction to Algomath μse (pronounced "muse"), but first let me introduce myself: my name is Pietro Zanotta and I am currently an economics student in Switzerland (to read more see <a target="_blank" href="https://amm.zanotp.com/about">here</a>). My main interests are programming and statistics and this blog is my chance to tell the world what I'm learning in my free time.</p>
<p>Let me take a step back and explain why I decided to start this journey in blogging. In January 2023 I wrote an article about web scraping in R (you can find it <a target="_blank" href="https://statsandr.com/blog/web-scraping-in-r/">here</a>) and I discovered how useful sharing knowledge is for deeply understanding a topic: writing requires not only a thorough comprehension of the subject but also solid summarizing skills. Therefore, I decided to create my own blog and here we are.</p>
<p>Embark on this captivating journey with me, as we explore the boundless horizons of science. Together, we'll uncover the extraordinary in the ordinary and ignite our curiosity to new heights.</p>
]]></content:encoded></item></channel></rss>