Computing Gradients of Quantum Circuits using Parameter Shift Rule

Photo by Sunguk Kim on Unsplash

Computing Gradients of Quantum Circuits using Parameter Shift Rule

Variational Quantum Algorithms (VQAs) are regarded as one of the most promising approaches for leveraging near-term quantum devices as they combine the power of quantum circuits with classical optimization to solve problems in chemistry, material science, machine learning, and beyond. A key challenge in the successful implementation of VQAs, such as the Variational Quantum Eigensolver (VQE) and Quantum Approximate Optimization Algorithm (QAOA), is the ability to efficiently optimize the parameters that govern quantum circuits. This is where the shift rule plays a critical role.

The shift rule provides an analytical and efficient method for computing gradients of parameterized quantum circuits with respect to their parameters. These gradients are essential for guiding the classical optimizer in the search for the optimal parameters that minimize the cost function. Without an accurate and efficient gradient computation method, the optimization would either be too slow or too noisy, leading to suboptimal results. In this context, the shift rule not only enhances the accuracy of the optimization but also enables its scalability, making it a foundational tool for advancing VQA methodologies in near-term quantum computing.

This blog post briefly discuss VQAs and focuses and one of the various shift rule implementations.

Variational Quantum Algorithms

Variational Quantum Algorithms (VQAs) are hybrid quantum-classical approaches that aim to solve optimization problems by minimizing a cost function over a parameterized quantum circuit. The quantum circuit, often referred to as a parameterized quantum circuit (PQC), depends on a set of tunable parameters \({\theta} = (\theta_1, \theta_2, \dots, \theta_n)\). The goal is to find the optimal values of these parameters that minimize a given cost function \( C(\theta)\), which is typically the expectation value of a problem-specific quantum observable.

Mathematically, the cost function can be expressed as:

$$C(\theta) = \bra {\psi(\theta)} H\ket{\psi(\theta)}$$

where

  • \(\ket {\psi(\theta)}\)is the quantum state produced by the PQC,

  • \(H\) is the problem Hamiltonian

Two of the most famous and important VQAs are QAOA and VQEs. Next paragraph is devoted to give an introduction to both.

Variational Quantum Eigensolver

The Variational Quantum Eigensolver (VQE) is a VQA that combines the variational principle from quantum mechanics with classical optimization techniques. VQE is particularly useful for problems in quantum chemistry and materials science, where finding the ground state energy of a Hamiltonian is a central task.

In a VQE setting, the cost function is represented by the energy expectation value, formulated as:

$$E(\theta)=\bra{\psi(\theta)}H\ket{\psi(\theta)}$$

According to the variational principle, this energy is always greater than or equal to the ground state energy \(E_0\)​, meaning:

$$E(\theta)\geq E_0 ​$$

Thus, the goal is to minimize \(E(\theta)\) by adjusting the parameters \(\theta\) to find the state \(\ket{\psi(\theta)}\) that approximates the ground state.

The workflow is the following:

  • define a parametrized unitary \(U(\theta)\) to prepare \(\ket{\psi(\theta)}\)

  • measure the expectation value of the system’s Hamiltonian \(H\) computing \( E(\theta)=\bra{\psi(\theta)}H\ket{\psi(\theta)}\) or the gradient of the expectation value (with shift rule or with numerical methods)
  • provide the measured expectation value \(E(\theta)\) or its gradient to a classical optimization algorithm and iterate until the minimum energy is found.

Quantum Approximate Optimization Algorithm

The Quantum Approximate Optimization Algorithm (QAOA) is a hybrid quantum-classical algorithm designed to solve combinatorial optimization problems, such as the Max-Cut problem. QAOA is again a VQA so it uses a parameterized quantum circuit to encode the problem and iteratively improves the solution by tuning the parameters. It has been proposed as a promising algorithm for near-term quantum devices due to its robustness against noise and hardware limitations.

QAOA aims to minimize a classical objective function, \(C(z)\) where \(z \in \{0,1\}^n\) represents the solution to the optimization problem encoded as a binary string. The goal is to find the string \(\hat z\) that minimizes the objective function.

The algorithm constructs a quantum state that approximates the solution by evolving under two Hamiltonians: a problem Hamiltonian \(H_C\)​ and a mixer Hamiltonian \(H_M\).

The former is meant to represent the cost function of the classical problem, for example for the max-cut problem the classical cost function is:

$$C(z)= \sum_{⟨i,j⟩}w_{ij}(1−z_iz_j)$$

where \(w_{ij}\) is the weight of the edge between vertices \(i\) and \(j\), \(\langle i,j⟩\)denotes the set of all pairs of vertices that are connected by an edge and \(z_i\) is the binary variable representing the partition of vertex \(i\).

The corresponding problem Hamiltonian is formulated in terms of quantum operators as:

$$H_C\sum_{ ⟨i,j⟩}w_{ij}\frac {1-Z_iZ_j} 2$$

where \(Z_i\) is Pauli-Z operator acting on qubit \(i\).

Coming to the mixer Hamiltonian, it should encourage transitions between different solutions by applying a mixing operator that typically consists of Pauli-X gates. The mixer Hamiltonian can defined as:

$${H}_M = \sum_{i} {X}_i$$

where \(X_i\) is the Pauli-X operator acting on qubit \(i\), responsible for flipping the bit \(z_i\).

QAOA workflow alternates between evolving under the problem and mixer Hamiltonians for a series of \(p\) layers, which results in the following quantum state:

$$\ket{ψ(γ,β)}=∏_{j=1}^p​e^{−iβ_j​H_M}​e^{-iγ_j​H_C}​H^{⊗n}\ket0^{⊗n}$$

where:

  • \(\gamma\) are parameter controlling the controlling the evolution under the problem Hamiltonian

  • \(\beta\) are the parameters controlling the evolution under the mixer Hamiltonian

  • \(H^{⊗n}\ket0^{⊗n}\) is the n-fold Hadamard applied on a n-qubit register.

The goal of QAOA is to optimize the parameters \(\gamma\) and \(\beta\) such that the quantum state maximizes the expectation value of the objective function \(C(\gamma, \beta)\), which corresponds to the expectation value of the problem Hamiltonian:

$$C(\gamma, \beta)=\bra{ψ(γ,β)} H_C \ket{ψ(γ,β)}$$

And similarly to VQE, by iteratively updating \(\gamma\) and \(\beta\) using a classical optimizer, the algorithm converges to an approximate solution to the optimization problem.

Gradient Computation Using the Shift Rule

As we have seen in the above example, VQA sometimes requires the quantum circuit to evaluate the gradient of the cost function rather than the function itself. Shift rule is useful because it avoids approximations typically used in numerical differentiation (like finite differences).

The next sections describe one of the formulations proposed for shift rule.

We are interested in the expectation value of an observable \(H\) (e.g., a Hamiltonian) with respect to the quantum state \(\ket{\psi(\theta)}\), which is the state prepared by applying \(U(\theta)\) to the initial state \(\ket0^{\otimes n}\). The expectation value is:

$$C(θ)=\bra{ψ(θ)}H\ket{ψ(θ)}=\bra0U^\dagger(θ)HU(θ)\ket0$$

The goal is to compute the derivative w.r.t. \(\theta\) of \(C(θ)\), so that this information can be supplied to a classical optimizer.

The derivative is

$$\frac{∂C(θ)}{∂_i}​=\frac{\partial〈ψ| U^† ˆQU |ψ〉} {\partial \theta_i}= 〈ψ| U^† Q(\frac{\partial G}{\partial \theta_i}) |ψ〉+ 〈ψ|(\frac{\partial G}{\partial \theta_i})^\dagger QU |ψ〉$$

assuming the parameter \(\theta_i\) only affects a single gate.

Parameter-shift rule for gates with generators with two distinct eigenvalues

If we are given a parametrized quantum gate \(U \) with a parameter \(\theta\) of the form:

$$U(θ_k)=e^{−iθ_kG}$$

where \(G\) is a Hermitian operator, it’s trivial to prove that:

$$\frac{\partial U(θ_k)}{\partial \theta_k}=-iGU(\theta_k)$$

and substituting into the derivative of the circuit equation what we get is:

$$\frac{∂C(θ)}{∂θ_k}​= -i〈ψ| U^† HGU(\theta_k) |ψ〉-i 〈ψ|HU(\theta_k)^\dagger G^\dagger |ψ〉$$

If \(G\) has at most two unique eigenvalues \(\pm r\).

Moreover, one may prove with some algebra that for every operators A, B and Hermitian observable Q:

$$\bra \psi A^\dagger Q B\ket \psi + \bra \psi B^\dagger Q A\ket \psi = \frac 12(\bra \psi (A+B)^\dagger Q (A+B)\ket \psi + \bra \psi (A-B)^\dagger Q (A-B)\ket \psi)$$

therefore, using \(A = I\) and \(B = -ir^{-1}G\) in our problem:

$$\frac{∂C(θ)}{∂θ}​= \frac r2(\bra \psi (I-ir^{-1}G)^\dagger H (I-ir^{-1}G)\ket \psi + \bra \psi (I+ir^{-1}G)^\dagger H(I+ir^{-1}G)\ket \psi)$$

It is also possible to show that for such special \(G\):

$$U(\frac \pi {4r}) = \frac{I-ir^{-1}G}{\sqrt{2}}$$

Hence the partial derivative of the cost function can be estimated place either the gate \(U(\frac \pi{\sqrt 2})\) or \(U(-\frac \pi{\sqrt 2})\) after the gate to be differentiated.

However, since:

$$U(a)U(b) = G(a+b)$$

when \(G \) is a unitarily generated one parameter gate, by substitution this leads to the parameter shift rule:

$$\frac{∂C(θ)}{∂θ_k}​= r(〈ψ| U^†(\theta_k +s) HGU^†(\theta_k +s) |ψ〉- 〈ψ|HU(\theta_k-s)^\dagger U^†(\theta_k -s) |ψ)$$

which is equivalent to:

$$\frac{∂C(θ)}{∂θ_k}​= r(C(\theta +s) - C(\theta-s))$$

where \(s = \frac{\pi}{4r}\).

If the parameter \(\theta_k\) appears in more than a single gate in the circuit, the derivative is obtained using the product rule by shifting the parameter in each gate separately and summing the results.

Differentiation of general gates via linear combination of unitaries

If the \(U(\theta) \) doesn’t has the above mentioned form, we may evaluate the partial derivative of the cost function using an ancilla qubit. The idea is to express \(U(\theta)\) as a linear combination of unitary matrices \(A_j\).

The derivative then becomes:

$$\partial_{\theta_k} C(\theta) = \sum_j a_j (\bra \psi U^\dagger H A_j\ket \psi+ \bra \psi A_j^\dagger H U\ket \psi)$$

where \(a_j\) are real values.

The circuit requires to be instantiated in the:

$$\ket + \ket \psi$$

state and then the controlled \(U\) is applied upon the \(\ket \psi\) state (conditioned on \(\ket \psi\) being in \(\ket 0\) state), followed by a controlled \(A_k\) on the same state (conditioned on \(\ket \psi\) being in \(\ket 1\) state), which results in:

$$\frac 1{\sqrt{2}}(\ket 0 U\ket\psi + \ket 1 A_k\ket\psi)$$

Once another Hadamard gate is applied on the ancilla, the resulting state is:

$$\frac 14 (\ket 0 [G+A_k]\ket \psi + \ket 1 [G-A_k]\ket \psi)$$

One can than estimate \(p_0\) and \(p_1\), respectively the probability of getting the \((G+A_k) \ket\psi\) and \((G-A_k) \ket\psi\) once the ancilla is measured.

With those probabilities the following cost functions can be defined:

$$C_0 = \frac 1{4p_0} \ket \psi (U+A_k)^\dagger H (U+A_k)\ket \psi$$

and similarly:

$$C_1 = \frac 1{4p_1} \ket \psi (U-A_k)^\dagger H (U-A_k)\ket \psi$$

and therefore:

$$\bra \psi U^\dagger H A_k\ket \psi + \bra \psi A_k^\dagger H U\ket \psi = 2(p_0 C_0 - p_1C_1)$$

which means that we can repeat all the steps done for gates with generators with two distinct eigenvalues.


And that's it for this article. Thanks for reading.

For any question or suggestion related to what I covered in this article, please add it as a comment. For special needs, you can contact me here.

Sources: