Classical-to-quantum convolutional neural network transfer learning

Juhyeon Kim
SKKU Advanced Institute of Nanotechnology, Sungkyunkwan University, Suwon, Republic of Korea

Joonsuk Huh (joonsukhuh@gmail.com)
Department of Chemistry, Sungkyunkwan University, Suwon, Republic of Korea
Institute of Quantum Biophysics, Sungkyunkwan University, Suwon, Republic of Korea

Daniel K. Park (dkd.park@yonsei.ac.kr)
Department of Applied Statistics, Yonsei University, Seoul, Republic of Korea
Department of Statistics and Data Science, Yonsei University, Seoul, Republic of Korea

Abstract

Machine learning using quantum convolutional neural networks (QCNNs) has demonstrated success in both quantum and classical data classification. In previous studies, QCNNs attained a higher classification accuracy than their classical counterparts under the same training conditions in the few-parameter regime. However, the general performance of large-scale quantum models is difficult to examine because of the limited size of quantum circuits, which can be reliably implemented in the near future. We propose transfer learning as an effective strategy for utilizing small QCNNs in the noisy intermediate-scale quantum era to the full extent. In the classical-to-quantum transfer learning framework, a QCNN can solve complex classification problems without requiring a large-scale quantum circuit by utilizing a pre-trained classical convolutional neural network (CNN). We perform numerical simulations of QCNN models with various sets of quantum convolution and pooling operations for MNIST data classification under transfer learning, in which a classical CNN is trained with Fashion-MNIST data. The results show that transfer learning from classical to quantum CNN performs considerably better than purely classical transfer learning models under similar training conditions.

1 Introduction

Machine learning (ML) with a parameterized quantum circuit (PQC) is a promising approach for improving existing methods beyond classical capabilities Romero_2017 ; PhysRevA.98.032309 ; Benedetti_2019 ; cong_quantum_2019 ; cerezo2020variational ; mangini_quantum_2021 ; PRXQuantum.2.040337 . This is a classical-quantum hybrid algorithm in which the cost function and its corresponding gradient are computed using quantum circuits PhysRevLett.118.150503 ; PhysRevA.99.032331 and the model parameters are updated classically. Such hybrid ML models are particularly advantageous when cost function minimization is difficult to perform classically peruzzo_variational_2014 ; McClean_2016 ; cong_quantum_2019 . These models optimize the quantum gate parameters under the given experimental setup, and hence can be robust to systematic errors. Furthermore, they are less prone to decoherence because iterative computation can be exploited to reduce the quantum circuit depth. Thus, the hybrid algorithm has the potential to achieve quantum advantage in solving various problems in the noisy intermediate-scale quantum (NISQ) era Preskill2018quantumcomputingin ; bharti2021noisy .

A critical challenge in the utilization of PQC for solving real-world problems is the barren plateau phenomenon in the optimization landscape, which makes training a quantum model that samples from the Haar measure increasingly difficult as the number of qubits increases McClean2018 . One way to avoid the barren plateau is to adopt a hierarchical structure grant_hierarchical_2018 ; pesah2020absence , in which the number of qubits decreases exponentially with quantum circuit depth, such as in quantum convolutional neural networks (QCNNs) cong_quantum_2019 . The hierarchical structure is interesting from a theoretical perspective because of its close connection to tensor networks grant_hierarchical_2018 ; HUANG202189 . Moreover, the shallow depth of a QCNN, which grows logarithmically with the number of input qubits, makes it well suited for NISQ computing. In addition, an information-theoretic analysis shows that the QCNN architecture can help reduce the generalization error PRXQuantum.2.040321 , which is one of the central goals of machine learning. All these factors motivate the application of QCNN for machine learning.

QCNNs have been shown to be useful for solving both quantum cong_quantum_2019 ; maccormack_branching_2020 and classical hur2021quantum problems despite their restricted structure with a shallow-depth quantum circuit. In Ref. hur2021quantum , for binary classification with the MNIST 726791 and Fashion-MNIST xiao2017fashionmnist datasets, QCNN yielded higher classification accuracy than the classical convolutional neural network (CNN) when only 51 or fewer parameters were used to construct these models. The best-known classical CNN-based classifiers for the same datasets typically employ millions of parameters. However, the size of the quantum circuits that can be implemented with current quantum devices is too small to incorporate such a large number of parameters. Therefore, two important issues remain. The first is to verify whether a QCNN can continue to outperform its classical counterpart as the number of trainable model parameters increases. The second is to utilize small QCNNs that can be realized in the near future to the full extent, so that a quantum advantage can be achieved in solving practical problems. The latter is the main focus of this work.

An ML problem for which the quantum advantage in the few-parameter regime can be exploited is transfer learning (TL) Bozinovski2020ReminderOT ; 5288526 ; 10.1007/978-3-030-01424-7_27 ; Goodfellow-et-al-2016 . TL aims to utilize what has been learned in one setting to improve generalization in another setting that is independent of the former. TL can be applied to classical-quantum hybrid networks such that the parameters learned for a classical model are transferred to training a quantum model or vice versa Mari2020transferlearning . In the classical-to-quantum (C2Q) TL scheme, the number of qubits increases with the number of output nodes (or features) of the pre-trained classical neural network. This indicates that the transferred part of a classical neural network should have a small number of output nodes to find applications in the NISQ era. For example, using a pre-trained feedforward neural network with a large number of nodes throughout the layers would not be well suited for near-term hybrid TL. By contrast, building a TL model with a classical and quantum CNN is viable because the number of features in the CNN progressively decreases via subsampling (i.e., pooling), and the QCNN has already exhibited an advantage with a small number of input qubits.

Motivated by the aforementioned observations, we present TL from a classical to quantum convolutional neural network (C2Q-CNN). By conducting numerical simulations with PennyLane bergholm2020pennylane , we investigate the performance of the C2Q-CNN for MNIST data classification, where the classical CNN is pre-trained with Fashion-MNIST data. The simulation benchmarks the classification accuracy under various quantum convolution and pooling operations. We compare C2Q-CNN with various classical-to-classical CNN (C2C-CNN) TL schemes and show that C2Q-CNN achieves a higher classification accuracy than C2C-CNN under similar training conditions.

The remainder of this paper is organized as follows. Section 2 reviews QCNNs and TL, and introduces a generalization of the QCNN pooling operation. Section 3 explains the general framework for C2Q-CNN TL. The simulation results are presented in Section 4, where MNIST data classification is performed with a CNN pre-trained on Fashion-MNIST data and the performance of the C2Q-CNN models is compared with that of various C2C-CNN models. The conclusions and outlook are presented in Section 5.

2 Preliminaries

2.1 Quantum convolutional neural network

Quantum convolutional neural networks are parameterized quantum circuits with unique structures inspired by classical CNNs. In general, QCNNs follow two basic principles of classical CNNs: translational invariance of convolutional operations and dimensionality reduction via pooling. However, QCNNs differ from classical CNNs in several aspects. First, the data are defined in a quantum Hilbert space, whose dimension grows exponentially with the number of qubits. Consequently, the quantum convolutional operation is not an inner product, as in the classical case, but a unitary transformation of a state vector. In other words, a quantum convolution is a linear map that transforms a vector to a vector, whereas a classical convolution operation is a linear map that transforms a vector to a scalar. Second, the pooling in a QCNN traces out half of the qubits, similar to the pooling in a CNN that subsamples the feature space. Typically, the pooling layer includes parameterized two-qubit controlled-unitary gates, and the control qubits are traced out after the gate operations. Without loss of generality, we refer to the structure of a parameterized unitary operator for either convolution or pooling as an ansatz. The cost function of a model with given parameters is defined as the expectation value of some observable with respect to the final quantum state obtained after repeating the quantum convolutional and pooling operations. The QCNN is trained by updating the model parameters to minimize the cost function until a pre-determined convergence condition is met. The general concept of a QCNN is illustrated in Fig. 1 (a), and an example of a circuit with eight input qubits is shown in Fig. 1 (b). The depth of the QCNN circuit after repeating the convolution and pooling until one qubit remains is $O(\log N)$, where $N$ is the number of input qubits. This shallow depth makes the QCNN well suited to the quantum hardware expected in the near future.

Figure 1: (a) Schematics of the QCNN algorithm with (b) an example for eight input qubits. Given a quantum state, $| ψ ⟩_{d}$ , which encodes classical data, the quantum circuit comprises two parts: convolutional filters (rectangles) and pooling (circles). The convolutional filter and pooling use parameterized quantum gates. Three layers of convolution–pooling pairs are presented in this example. In each layer, the convolutional filter applies the identical two-qubit ansatz to the nearest neighbor qubits in a translationally invariant manner. Pooling operations within the layer are identical to each other, but differ from convolutional filters. The pooling operation is represented as a controlled unitary transformation, and the half-filled circle on the control qubit indicates that different unitary gates can be applied to each subspace of the control qubit. The measurement outcome of the quantum circuit is used to calculate the user-defined cost function. A classical computer is used to compute the new set of parameters based on the gradient, and the quantum circuit parameters are updated for the subsequent round. The optimization process is iterated until pre-selected conditions are met.
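The layered structure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `qcnn_schedule`, the ring-shaped (periodic) nearest-neighbor pairing, and the choice of tracing out the first qubit of each pooling pair are hypothetical choices made here to show why the number of convolution and pooling layers is $\log_2 N$.

```python
def qcnn_schedule(n_qubits):
    """Return, layer by layer, the convolution pairs, pooling pairs, and kept qubits.

    Assumptions of this sketch: n_qubits is a power of two, convolutions act on
    nearest neighbors arranged on a ring, and the first qubit of every pooling
    pair (the control) is traced out.
    """
    if n_qubits < 2 or n_qubits & (n_qubits - 1):
        raise ValueError("this sketch assumes a power-of-two number of qubits")
    active = list(range(n_qubits))
    layers = []
    while len(active) > 1:
        m = len(active)
        if m > 2:
            conv_pairs = [(active[i], active[(i + 1) % m]) for i in range(m)]
        else:
            conv_pairs = [(active[0], active[1])]  # a ring of two has one distinct pair
        # Pooling acts on disjoint (control, target) pairs; only the targets survive.
        pool_pairs = [(active[i], active[i + 1]) for i in range(0, m, 2)]
        kept = [target for _, target in pool_pairs]
        layers.append({"conv_pairs": conv_pairs, "pool_pairs": pool_pairs, "kept": kept})
        active = kept
    return layers


layers = qcnn_schedule(8)
for depth, layer in enumerate(layers, start=1):
    print(depth, layer["kept"])
```

For eight input qubits the loop runs three times, halving the number of active qubits in each layer until a single qubit remains for measurement.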

Many ways are available to parameterize the convolution and pooling operations. Because they act on only two qubits, parameterization of a unitary operation that can express any element in $S U (4)$ provides the most general ansatz. In Ref. hur2021quantum , nine different ansatzes were tested for convolution, ranging from an arbitrary $S U (4)$ operation that required 15 parameters to simpler ones that required fewer parameters. The convolution ansatzes tested in the numerical simulations in this study are shown in Fig. 2. Among them, circuits (b) to (j) are the nine ansatzes tested in previous work. We added two convolutional ansatzes, (a) and (k), to our benchmark. The former aims to study the classification capability of a QCNN when only pooling operations are trained. The latter is inspired by the generalized pooling described in the following paragraph, with an $S U (2)$ gate applied to a control qubit to split the subspaces in an arbitrary superposition.

Figure 2: Parameterized quantum circuits used in the convolutional layer. The convolutional circuits from (b) to (j) are adapted from Ref. hur2021quantum , whereas (a) and (k) are the new convolutional circuits tested in this study. $R_{i} (θ)$ is the rotation around the $i$ -axis of the Bloch sphere by an angle of $θ$ , and $H$ is the Hadamard gate. $U (θ, ϕ, λ)$ is an arbitrary single-qubit gate, which can be expressed as $U (θ, ϕ, λ) = R_{z} (ϕ) R_{x} (- π / 2) R_{z} (θ) R_{x} (π / 2) R_{z} (λ)$ . $U (θ, ϕ, λ)$ can implement any unitary operation in $S U (2)$ . As (j) can express an arbitrary two-qubit unitary gate, we test it without any parameterized gates for pooling in addition to ZX pooling and generalized pooling. For (k), we do not apply parameterized gates for pooling. In these cases, pooling simply traces out the top qubit after convolution.

The pooling ansatzes used in previous studies were simple single-qubit-controlled rotations followed by tracing out the control qubit. For example, in Ref. hur2021quantum , a pooling operation in the following form was used:

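$$\mathrm{Tr}_{A}\left[\left(|1\rangle\langle 1|_{A} \otimes R_{z}(\theta_{1})_{B} + |0\rangle\langle 0|_{A} \otimes R_{x}(\theta_{2})_{B}\right) \cdot \rho_{AB}\right], \tag{1}$$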

where $\mathrm{Tr}_{A}(\cdot)$ represents a partial trace over subsystem $A$, $R_{i}(θ)$ is the rotation around the $i$ axis of the Bloch sphere by an angle of $θ$, $θ_{1}$ and $θ_{2}$ are the free parameters, and $ρ_{AB}$ is a two-qubit state subject to pooling. The pooling operation in Eq. (1) is referred to as ZX pooling. In addition to ZX pooling, generalized pooling is introduced as

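$$\mathrm{Tr}_{A}\left[\left(|1\rangle\langle 1|_{A} \otimes U(\theta_{1}, \phi_{2}, \lambda_{3})_{B} + |0\rangle\langle 0|_{A} \otimes U(\theta_{4}, \phi_{5}, \lambda_{6})_{B}\right) \cdot \rho_{AB}\right]. \tag{2}$$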

Here, $U (θ, ϕ, λ) = R_{z} (ϕ) R_{x} (- π / 2) R_{z} (θ) R_{x} (π / 2) R_{z} (λ)$ and can implement any unitary operator in $S U (2)$ . The unitary gates used in ZX pooling and generalized pooling are compared in Fig. 3.
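The relation between the two pooling types can be checked numerically. The sketch below assumes the common convention $R_{i}(θ) = \exp(-iθσ_{i}/2)$, which the text does not state explicitly, and uses hypothetical helper names (`matmul`, `u_gate`, `close`). Under that convention, the gate $U(θ, ϕ, λ)$ defined above equals the Euler product $R_{z}(ϕ) R_{y}(θ) R_{z}(λ)$, and particular parameter choices reproduce the two gates of ZX pooling.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]


def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]


def rz(t):
    return [[cmath.exp(-1j * t / 2), 0], [0, cmath.exp(1j * t / 2)]]


def u_gate(theta, phi, lam):
    """U = Rz(phi) Rx(-pi/2) Rz(theta) Rx(pi/2) Rz(lam), as defined in the text."""
    out = rz(phi)
    for gate in (rx(-math.pi / 2), rz(theta), rx(math.pi / 2), rz(lam)):
        out = matmul(out, gate)
    return out


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


theta1, theta2 = 0.7, 1.3  # arbitrary test angles
zyz = matmul(matmul(rz(0.4), ry(theta2)), rz(-0.9))
print(close(u_gate(theta2, 0.4, -0.9), zyz))                          # Euler (ZYZ) form
print(close(u_gate(0.0, theta1, 0.0), rz(theta1)))                    # recovers Rz(theta1)
print(close(u_gate(theta2, -math.pi / 2, math.pi / 2), rx(theta2)))   # recovers Rx(theta2)
```

With these settings, Eq. (2) reduces to Eq. (1), so ZX pooling is a special case of generalized pooling with four of the six parameters held fixed.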

Figure 3: Parameterized quantum gates used in the pooling layer. The pooling circuit (a) is adapted from Ref. hur2021quantum , and (b) is the generalized pooling method introduced in this work. Generalized pooling applies two arbitrary single-qubit unitary gate rotations, $U (θ_{1}, ϕ_{2}, λ_{3})$ and $U (θ_{4}, ϕ_{5}, λ_{6})$ , which are activated when the control qubit is 1 (filled circle) or 0 (open circle), respectively. The control (first) qubit is traced out after the gate operations to reduce the dimensions. The single-qubit unitary gate is defined as $U (θ, ϕ, λ) = R_{z} (ϕ) R_{x} (- π / 2) R_{z} (θ) R_{x} (π / 2) R_{z} (λ)$ , and it can implement any unitary in $S U (2)$ . The thinner horizontal line (top qubit) indicates the qubit that is being traced out after gate operations.
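As a rough guide to model size, the trainable parameters can be counted from the quantities given in the text: 15 parameters for an arbitrary $SU(4)$ convolution, two for ZX pooling, six for generalized pooling, and none for pooling that only traces out a qubit. The sketch below assumes that parameters are shared within a layer and independent across layers; the function name `total_params` and the dictionary are hypothetical.

```python
# Trainable parameters per pooling gate, read off Eqs. (1) and (2).
POOLING_PARAMS = {"ZX": 2, "generalized": 6, "none": 0}


def total_params(n_qubits, conv_params, pooling):
    """Count trainable parameters for a QCNN with log2(n_qubits) layers.

    Assumes one shared convolution ansatz and one shared pooling gate per
    layer, with independent parameters in different layers.
    """
    if n_qubits < 2 or n_qubits & (n_qubits - 1):
        raise ValueError("this sketch assumes a power-of-two number of qubits")
    n_layers = n_qubits.bit_length() - 1  # exact log2 for powers of two
    return n_layers * (conv_params + POOLING_PARAMS[pooling])


print(total_params(8, 15, "ZX"))           # SU(4) convolution with ZX pooling
print(total_params(8, 15, "generalized"))  # SU(4) convolution with generalized pooling
print(total_params(8, 0, "generalized"))   # only the pooling gates are trained
```

Under these assumptions, eight qubits with an $SU(4)$ convolution and ZX pooling give 51 parameters, consistent with the "51 or fewer parameters" quoted in Section 1 for the earlier study.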

2.2 Transfer learning

Transferring the knowledge accumulated from one task to another is a hallmark of intelligent behavior that human learners exhibit routinely. TL refers to the application of this concept in ML. Specifically, TL aims to improve the training of a new ML model by utilizing a reference ML model that is pre-trained for a different but related task with a different dataset Bozinovski2020ReminderOT ; 5288526 ; 10.1007/978-3-030-01424-7_27 ; Goodfellow-et-al-2016 . TL is known to be particularly useful when training a deep learning model takes a long time owing to the large amount of data, especially if the features extracted in early layers are generic across various datasets. In such cases, starting from a pre-trained network such that only a portion of the model parameters is fine-tuned for a particular task can be more practical than training the entire network from scratch. For example, suppose that a reference neural network has been trained with dataset A to solve task A. To solve task B given dataset B, the neural network is not trained from scratch, as this may require vast computational resources. Instead, the parameters (i.e., weights and biases) found for some of the earlier layers of the reference neural network are used as a set of fixed parameters for the new neural network that is trained to solve task B with dataset B. The successful application of TL can improve training performance by starting from a higher training accuracy, achieving a faster rate of accuracy improvement, and converging to a higher asymptotic training accuracy 10.5555/1803899 .
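The freeze-and-fine-tune idea can be illustrated with a deliberately tiny, purely classical toy in plain Python. Everything here is hypothetical: `FROZEN_W` stands in for pre-trained weights from task A, the two-sample dataset stands in for task B, and the logistic head stands in for whatever model is fine-tuned. Only the head is updated during training.

```python
import math

# Hypothetical weights "pre-trained" on task A; they are never updated below.
FROZEN_W = [[0.5, -0.2], [0.1, 0.8]]


def extract(x):
    """Frozen feature extractor: a linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def predict(head, x):
    """Trainable head: logistic regression on the frozen features."""
    z = sum(h * f for h, f in zip(head, extract(x)))
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(data, steps=200, lr=0.1):
    """Gradient descent on the cross-entropy loss, updating the head only."""
    head = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x, y in data:
            features = extract(x)
            err = predict(head, x) - y
            for j in range(len(head)):
                grad[j] += err * features[j] / len(data)
        head = [h - lr * g for h, g in zip(head, grad)]
    return head


task_b = [([2.0, 0.0], 1), ([0.0, 2.0], 0)]  # toy dataset for the new task
head = fine_tune(task_b)
print([round(predict(head, x), 3) for x, _ in task_b])
```

In the classical-to-quantum setting discussed in this paper, the frozen extractor corresponds to the pre-trained CNN layers and the trainable head corresponds to the QCNN.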

The aforementioned observations imply that TL is also beneficial when the amount of available data is too small to build a good model from scratch. Because processing big data in the NISQ era will be challenging, working with small amounts of data through TL is a promising strategy for near-term quantum ML. The target ML model subjected to fresh training (i.e., fine-tuning) in TL typically has far fewer parameters than the pre-trained model. This fact, together with the success of QCNNs in the few-parameter regime, motivates the development of classical-to-quantum CNN transfer learning.

3 Classical-to-quantum transfer learning

An extension of TL to quantum ML was proposed, and its general concept was formulated in Ref. Mari2020transferlearning . Although the performances of the quantum models were not compared with those of their classical counterparts, three different scenarios of quantum TL, namely C2Q, quantum-to-classical, and quantum-to-quantum, were shown to be feasible. Among these three possible scenarios, we focus on C2Q TL as mentioned in Section 1, because we aim to utilize QCNNs in the few-parameter regime to the full extent. Sufficient reduction of the data dimensionality (i.e., the number of attributes or features) by classical learning would ensure that the size of a quantum circuit subject to training is sufficiently small for implementation with NISQ devices. The dimensionality reduction technique is also necessary to simplify expensive quantum state preparation routines to represent classical data in a quantum state PhysRevA.64.014303 ; PhysRevA.73.012307 ; Mosca:01 ; PhysRevA.83.032302 ; Mottonen:2005:TQS:2011670.2011675 ; araujo_divide-and-conquer_2021 ; 9259210 ; PhysRevA.102.032420 ; araujo2021configurable .

C2Q TL has been utilized for image data classification Mari2020transferlearning ; mogalapalli_classicalquantum_2022 and spoken command recognition qi2021classical . These works serve as a proof of principle for the general idea and present interesting examples to motivate further investigations and benchmarks. However, the parameterized quantum circuits therein are vulnerable to the barren plateau problem, because they follow the basic structure of a fully connected feedforward neural network with the same number of input and output qubits. Moreover, these studies used classical neural networks to significantly reduce the number of data features to only four or eight. This means that most of the feature extraction is performed classically; hence, the necessity of the quantum part is unclear. These studies encode the reduced data onto a quantum circuit using simple single-qubit rotations, also known as qubit encoding grant_hierarchical_2018 ; PhysRevA.102.032420 , which makes the number of model parameters grow polynomially with the number of data features. In contrast, the number of model parameters in our ML algorithm scales logarithmically with the number of input qubits. Furthermore, all of these works use only one type of ansatz based on repetitive applications of single-qubit rotations and controlled-NOT gates. Finally, the performance of C2Q TL was not compared with that of the C2C version in any of these studies. Because the pre-trained classical neural network performs a significant dimensionality reduction (and hence feature extraction), the absence of a direct comparison with C2C TL raises the question of whether the quantum model achieves any advantage over its classical counterparts.

In this study, we developed C2Q TL with a QCNN and compared C2Q TL results with C2C TL. Transferring pre-learned information to QCNN is important because QCNN can avoid the barren plateau effect. Moreover, because previous studies have shown the advantages of QCNNs over their classical counterpart in the few-parameter regime, fine-tuning an ML model with a QCNN is expected to improve the classification performance compared to fine-tuning it with a CNN. An example of TL using C2Q-CNN, which was used in our benchmark studies, is illustrated in Fig. 4.

Figure 4: An example of classical-to-quantum convolutional neural network transfer learning simulated in this work for benchmarking and comparison to purely classical models. Examples of datasets used in simulations are also shown. A reference CNN is trained with the Fashion-MNIST dataset. Then, the transfer learning trains a QCNN for MNIST data classification while using earlier layers of the pre-trained CNN for feature extraction.

The general model is flexible with the choice of data encoding, which loads classical data features to a quantum state $| ψ ⟩_{d}$ , and ansatz, the quantum circuit model subject to training. We performed extensive benchmarking over the various ansatzes presented in Section 2.1 to classify MNIST data using a classical model pre-trained with Fashion-MNIST data. Finally, we compared the classification accuracies of C2Q and various C2C models. The C2Q models performed noticeably better than all C2C models tested in this study under similar training conditions. More details on the simulation and results are presented in the following section.

4 Simulation Results

To demonstrate the advantage of using a QCNN in TL, we performed classical simulations of binary classification using PennyLane bergholm2020pennylane . The benchmark was performed using two standard image datasets, MNIST and Fashion-MNIST, which were accessed through Keras chollet2015keras . Examples of the datasets are shown on the left in Fig. 4. Note that both datasets have $28 \times 28$ features and 10 classes. Among the 10 classes of MNIST data, we performed three independent binary classification tasks aimed at distinguishing between 0 and 1, between 2 and 3, and between 8 and 9. To process classical data with a QCNN, the data must first be encoded into a quantum state. The number of data features that can be encoded in $N$ qubits ranges from $N$ to $2^{N}$ depending on the choice of the encoding method grant_hierarchical_2018 ; PhysRevA.102.032420 ; Havlicek2019 ; araujo_divide-and-conquer_2021 ; araujo2021configurable . Among many options, we used amplitude encoding to represent as many features as possible (see Appendix A). All C2Q-CNN simulations were performed with eight input qubits, into which amplitude encoding loads 256 features. Quantum feature maps that encode only eight features, such as qubit encoding, are not considered because they require extreme dimensionality reduction on the classical end, which may dominate the classification result as described in the previous section. Furthermore, amplitude encoding was shown to work well with QCNNs for classical data classification hur2021quantum .
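The bookkeeping behind amplitude encoding can be sketched in plain Python: a feature vector of length at most $2^{N}$ is zero-padded and rescaled to unit Euclidean norm so that its entries can serve as the amplitudes of an $N$-qubit state. The function name `amplitude_encode` and the zero-padding rule are assumptions of this sketch, and the circuit that actually prepares the state on hardware is not addressed.

```python
import math


def amplitude_encode(features, n_qubits):
    """Map a real feature vector to the amplitudes of an n_qubits-qubit state.

    The vector is zero-padded to length 2**n_qubits and divided by its
    Euclidean norm, so that the squared amplitudes sum to one.
    """
    dim = 2 ** n_qubits
    values = [float(x) for x in features]
    if len(values) > dim:
        raise ValueError("too many features for the given number of qubits")
    values += [0.0] * (dim - len(values))
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        raise ValueError("cannot encode the all-zero vector")
    return [v / norm for v in values]


# 256 hypothetical features, as produced by a classical feature extractor.
state = amplitude_encode(range(1, 257), n_qubits=8)
print(len(state), sum(a * a for a in state))
```

Eight qubits therefore hold 256 features, whereas an encoding with one feature per qubit would hold only eight.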

The reference classical CNN model (Fig. 4) was trained using 60,000 Fashion-MNIST samples for multinomial classification. For the fine-tuning process, we used 10,000 MNIST samples to train the QCNN part of the hybrid TL model. During fine-tuning, the classical CNN part was frozen and used the set of parameters found during the Fashion-MNIST data classification, as required in TL. The trainable parameters in the QCNN were optimized by minimizing the cross-entropy cost function with the Adam optimizer kingma2014adam using PennyLane bergholm2020pennylane . The number of MNIST test samples was approximately 2,000.
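For binary classification, the cross-entropy cost mentioned above takes a simple form. The sketch below assumes that the model output for each sample is a probability $p$ of belonging to class 1, for instance the probability of measuring the final qubit in state $|1⟩$; how the probability is obtained from the circuit is an assumption here, and the function name `cross_entropy` is hypothetical.

```python
import math


def cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


print(cross_entropy([1, 0], [0.9, 0.2]))  # confident and correct: small cost
print(cross_entropy([1, 0], [0.5, 0.5]))  # uninformative prediction: log(2)
```

The cost is lowest when the predicted probabilities agree with the labels, which is what the optimizer drives the circuit parameters toward.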

C2Q transfer models can be split into three sets based on the pooling variation. The first set uses ZX pooling with convolution circuits (a)-(j) in Fig. 2. The second set uses generalized pooling with convolution circuits (a)-(j) in Fig. 2. The third set comprises transfer models without parameterized quantum gates in the pooling layers. We refer to this pooling strategy, which merely traces out one of the qubits, as trivial pooling. Trivial pooling is tested with convolution circuits (j) and (k) in Fig. 2.

To compare the C2Q-CNN TL classification results with those of its classical counterparts, 1D and 2D CNN C2C TL models were constructed with numbers of trainable parameters similar to those of the C2Q-CNN models. The 1D CNN model was composed of a 1D convolution layer and a 1D max pooling layer with 64 trainable parameters. Similarly, the 2D CNN model was composed of a 2D convolution layer and a 2D max pooling layer with 76 trainable parameters. The CNNs subjected to fine-tuning for the MNIST data use the cross-entropy cost function with the Adam optimizer, as in the C2Q-CNN case. These classical CNN architectures are built using Keras chollet2015keras . Detailed descriptions of the C2C models are provided in Appendix B. The training process used mini-batch gradient descent with a batch size of 50 and a learning rate of 0.01. We also fixed the number of training iterations at 200 for the C2Q TL and C2C TL models. The other training conditions were kept the same in the C2Q and C2C transfer models to make the comparison as fair as possible.
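The training settings quoted above (Adam, batch size 50, learning rate 0.01, 200 iterations) can be illustrated with a minimal plain-Python loop on a toy one-parameter problem. The Adam update rule and its default constants ($β_{1} = 0.9$, $β_{2} = 0.999$, $ε = 10^{-8}$) follow the standard formulation; the toy dataset, the quadratic cost, and the names `adam_minimize` and `toy_grad` are hypothetical and unrelated to the actual QCNN cost.

```python
import math
import random


def adam_minimize(grad_fn, theta, data, iterations=200, batch_size=50,
                  lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Mini-batch Adam for a single scalar parameter."""
    rng = random.Random(seed)
    m = v = 0.0
    for t in range(1, iterations + 1):
        batch = rng.sample(data, batch_size)   # draw one mini-batch
        g = grad_fn(theta, batch)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias corrections
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def toy_grad(theta, batch):
    """Gradient of the mean squared distance between theta and the batch values."""
    return sum(2.0 * (theta - x) for x in batch) / len(batch)


# Toy data scattered around 1.0; the minimizer of the full cost is their mean.
toy_data = [1.0 + 0.1 * ((i % 5) - 2) for i in range(1000)]
theta_final = adam_minimize(toy_grad, 0.0, toy_data)
print(theta_final)
```

In the hybrid setting, `grad_fn` would be replaced by the gradient of the cross-entropy cost with respect to the circuit parameters, evaluated on each mini-batch.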

The TL classification results are shown in Fig. 5. Each bar represents the C2Q classification accuracy averaged over ten random initializations of the parameters. Different bars along the x-axis represent different convolution ansatzes, labeled according to Fig. 2. The unfilled, filled, and hatched bars represent the results of ZX pooling, generalized pooling, and trivial pooling, respectively. The blue dashed line and green solid line represent the results of the C2C TL using 1D and 2D CNN architectures, respectively. A full list of simulation results for each of the ten random initialization instances is provided in Appendix C.

The 0 vs. 1 classification results are shown in Fig. 5 (a). In the following, convolution ansatzes are numbered 1 to 11 in the order of circuits (a) to (k) in Fig. 2. For ZX and generalized pooling, most of the average test accuracies were greater than 95% and 98%, respectively. The test accuracies of ZX pooling with convolution ansatzes 2, 4, 7, 8, 9, and 10 and of generalized pooling with all convolution ansatzes are greater than those of the 1D and 2D C2C TL models. The accuracy of trivial pooling with convolution ansatz 10 was also higher than that of C2C TL. In addition, the accuracy of trivial pooling with convolution ansatz 10 is higher than that of trivial pooling with convolution ansatz 11. This is because convolution ansatz 10 has more trainable parameters than convolution ansatz 11. The test accuracy of generalized pooling is greater than that of ZX pooling when the convolution ansatz is the same. This is expected because generalized pooling can be trained to reproduce ZX pooling.

The 2 vs. 3 classification results are shown in Fig. 5 (b). Most of the ZX pooling average accuracies were between 70% and 90%, and most of the generalized pooling average accuracies were between 85% and 90%. The test accuracies in (b) are lower than those in (a) because 2 vs. 3 image classification is more difficult than 0 vs. 1 image classification. All C2Q TL classification accuracies are higher than the C2C TL classification accuracies except for ZX pooling with convolution ansatzes 1 and 3, which use far fewer parameters than the purely classical TL models. As in (a), the test accuracy of generalized pooling is greater than that of ZX pooling when the convolution ansatz is the same. In addition, the accuracy of trivial pooling with convolution ansatz 10 is higher than that with ansatz 11.

The 8 vs. 9 classification results are shown in Fig. 5 (c). Most of the ZX pooling average accuracies were between 70% and 90%, and most of the generalized pooling average accuracies were between 85% and 90%. The test accuracies in (c) are lower than those in (a) because 8 vs. 9 image classification is more difficult than 0 vs. 1 image classification, but they are similar to those in (b). All C2Q TL classification accuracies are higher than the C2C TL classification accuracies except for ZX pooling with convolution ansatzes 1, 2, and 3, which use far fewer parameters than the purely classical TL models. As in (a) and (b), trivial pooling with convolution ansatz 10 has higher accuracy than trivial pooling with ansatz 11. The accuracy of generalized pooling is mostly greater than that of ZX pooling when the convolution ansatz is the same, but there are exceptions, such as convolution ansatz 7. These exceptions show that increasing the number of trainable parameters does not always improve the test accuracy.

The results in Fig. 5 (a), (b), and (c) show a tendency for higher convolution ansatz numbers to yield higher accuracy. This can be interpreted as the QCNN tending to perform better when the convolution circuits have a larger number of trainable parameters. However, increasing the number of trainable parameters does not always improve the test accuracy, because the accuracy is also affected by other factors, such as statistical error and the arrangement of quantum gates. For example, ZX pooling with convolution ansatzes 5, 6, and 7 in (b) uses the same number of trainable parameters, but the average accuracies differ. We did not observe overfitting in our benchmark, but care must be taken to avoid it when the number of model parameters grows beyond what is used in this work.

In summary, generalized pooling mostly produces higher classification test accuracy than ZX pooling with the same convolution circuit. The accuracy of all ZX pooling, generalized pooling, and trivial pooling circuits tends to be higher when the convolution circuits have a larger number of gate parameters. Although C2Q models have fewer trainable parameters than C2C models, most C2Q models outperform C2C models.

Figure 5: Summary of the classification results with PennyLane simulations (quantum part) and Keras (classical part). Each bar represents the classification test accuracy of C2Q TL averaged over 10 instances given by the random initialization of parameters. The different bars along the x-axis correspond to different convolution ansatzes, labeled according to Fig. 2. The unfilled, filled, and hatched bars represent the results of ZX pooling, generalized pooling, and trivial pooling, respectively. The number of trainable parameters of each C2Q model is shown at the top of the x-axis. The horizontal lines represent the results of the C2C TL with 1D and 2D CNN architectures, and the number of trainable parameters of each C2C model is provided in the legend.

5 Conclusion

In this study, we propose a classical-to-quantum CNN (C2Q-CNN), a transfer learning (TL) model that uses some layers of a pre-trained CNN as a starting point for a quantum CNN (QCNN). The QCNN constitutes an extremely compact machine learning (ML) model because the number of trainable parameters grows logarithmically with the number of initial qubits cong_quantum_2019 and is promising because of the absence of barren plateaus pesah2020absence and generalization capabilities PRXQuantum.2.040321 . Supervised learning with a QCNN has also demonstrated classification performance superior to that of its classical counterparts under similar training conditions for a number of canonical datasets hur2021quantum . C2Q-CNN TL provides an approach to utilize the advantages of QCNN in the few-parameter regime to the full extent. Moreover, the proposed method is suitable for implementation in quantum hardware expected to be developed in the near future because it is robust to systematic errors and can be implemented with a shallow-depth quantum circuit. Therefore, C2Q-CNN TL is a strong candidate for practical applications of NISQ computing in ML with a quantum advantage.

To verify the quantum advantage of C2Q-CNN, we compared two classical-to-classical (C2C) TL models with C2Q models. These C2C and C2Q TL models have the same pre-trained CNN model, and the C2C TL models are designed to have slightly larger numbers of model parameters than C2Q TL models. The pre-training was carried out with the Fashion-MNIST dataset for multinomial classification, and the TL models were used for three independent binary classification tasks with the MNIST data. Classification results obtained via simulations using PennyLane and Keras showed that C2Q models produced higher classification accuracy in most cases, even though the number of trainable parameters in the C2Q model was smaller than that in the C2C model.

Potential future research directions are as follows. First, the reason behind the quantum advantage demonstrated by C2Q-CNN remains unclear. Although rigorous analysis is lacking, we speculate that this advantage is related to the ability of a quantum measurement to discriminate non-orthogonal states, for which a classical analog does not exist. Moreover, it would be interesting to verify whether the quantum advantage continues to hold as the number of trainable parameters increases and for other datasets. To increase the number of model parameters for a fixed number of features and input qubits, one may consider generalizing the QCNN model to utilize multiple channels, as in many classical CNN models. Note that, in the TL tested in our experiment, the final dense layer was replaced with a model subjected to fine-tuning, while the entire convolutional part was frozen. Testing various depths of frozen layers would be an interesting topic for future research. For example, freezing a smaller number of layers to use the features of an earlier layer of the convolutional stage can be beneficial when the new dataset is small and significantly different from the reference. Finally, the focus of this study was on inductive TL, for which both the reference and new datasets were labeled. Exploring the possibility of applying quantum techniques to other TL scenarios, such as self-taught, unsupervised, and transductive TL 5288526 , remains an open challenge.

Acknowledgments

This research was supported by the Yonsei University Research Fund of 2022 (2022-22-0124), the National Research Foundation of Korea (Grant Nos. 2019R1I1A1A01050161, 2021M3H3A1038085, 2019M3E4A1079666, 2022M3E4A1074591, and 2021M3E4A1038308), and the KIST Institutional Program (2E31531-22-076).

References

(1) Jonathan Romero, Jonathan P Olson, and Alan Aspuru-Guzik. Quantum autoencoders for efficient compression of quantum data. Quantum Science and Technology, 2(4):045001, Aug 2017.
(2) K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii. Quantum circuit learning. Phys. Rev. A, 98:032309, Sep 2018.
(3) Marcello Benedetti, Erika Lloyd, Stefan Sack, and Mattia Fiorentini. Parameterized quantum circuits as machine learning models. Quantum Science and Technology, 4(4):043001, Nov 2019.
(4) Iris Cong, Soonwon Choi, and Mikhail D. Lukin. Quantum convolutional neural networks. Nature Physics, 15(12):1273–1278, December 2019.
(5) M. Cerezo, Andrew Arrasmith, Ryan Babbush, Simon C. Benjamin, Suguru Endo, Keisuke Fujii, Jarrod R. McClean, Kosuke Mitarai, Xiao Yuan, Lukasz Cincio, and Patrick J. Coles. Variational quantum algorithms. Nature Reviews Physics, 3(9):625–644, 2021.
(6) S. Mangini, F. Tacchino, D. Gerace, D. Bajoni, and C. Macchiavello. Quantum computing models for artificial neural networks. EPL (Europhysics Letters), 134(1):10002, April 2021.
(7) Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Shan You, and Dacheng Tao. Learnability of quantum neural networks. PRX Quantum, 2:040337, Nov 2021.
(8) Jun Li, Xiaodong Yang, Xinhua Peng, and Chang-Pu Sun. Hybrid quantum-classical approach to quantum optimal control. Phys. Rev. Lett., 118:150503, Apr 2017.
(9) Maria Schuld, Ville Bergholm, Christian Gogolin, Josh Izaac, and Nathan Killoran. Evaluating analytic gradients on quantum hardware. Phys. Rev. A, 99:032331, Mar 2019.
(10) Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. Love, Alán Aspuru-Guzik, and Jeremy L. O’Brien. A variational eigenvalue solver on a photonic quantum processor. Nature Communications, 5(1):4213, July 2014.
(11) Jarrod R McClean, Jonathan Romero, Ryan Babbush, and Alán Aspuru-Guzik. The theory of variational hybrid quantum-classical algorithms. New Journal of Physics, 18(2):023023, Feb 2016.
(12) John Preskill. Quantum Computing in the NISQ era and beyond. Quantum, 2:79, August 2018.
(13) Kishor Bharti, Alba Cervera-Lierta, Thi Ha Kyaw, Tobias Haug, Sumner Alperin-Lea, Abhinav Anand, Matthias Degroote, Hermanni Heimonen, Jakob S. Kottmann, Tim Menke, Wai-Keong Mok, Sukin Sim, Leong-Chuan Kwek, and Alán Aspuru-Guzik. Noisy intermediate-scale quantum algorithms. Rev. Mod. Phys., 94:015004, Feb 2022.
(14) Jarrod R. McClean, Sergio Boixo, Vadim N. Smelyanskiy, Ryan Babbush, and Hartmut Neven. Barren plateaus in quantum neural network training landscapes. Nature Communications, 9(1):4812, Nov 2018.
(15) Edward Grant, Marcello Benedetti, Shuxiang Cao, Andrew Hallam, Joshua Lockhart, Vid Stojevic, Andrew G. Green, and Simone Severini. Hierarchical quantum classifiers. npj Quantum Information, 4(1):65, December 2018.
(16) Arthur Pesah, M. Cerezo, Samson Wang, Tyler Volkoff, Andrew T. Sornborger, and Patrick J. Coles. Absence of barren plateaus in quantum convolutional neural networks. Phys. Rev. X, 11:041011, Oct 2021.
(17) Rui Huang, Xiaoqing Tan, and Qingshan Xu. Variational quantum tensor networks classifiers. Neurocomputing, 452:89–98, 2021.
(18) Leonardo Banchi, Jason Pereira, and Stefano Pirandola. Generalization in quantum machine learning: A quantum information standpoint. PRX Quantum, 2:040321, Nov 2021.
(19) Ian MacCormack, Conor Delaney, Alexey Galda, Nidhi Aggarwal, and Prineha Narang. Branching quantum convolutional neural networks. Phys. Rev. Research, 4:013117, Feb 2022.
(20) Tak Hur, Leeseok Kim, and Daniel K. Park. Quantum convolutional neural network for classical data classification. Quantum Machine Intelligence, 4(1):3, 2022.
(21) Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
(22) Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, 2017.
(23) Stevo Bozinovski. Reminder of the first paper on transfer learning in neural networks, 1976. Informatica (Slovenia), 44, 2020.
(24) Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.
(25) Chuanqi Tan, Fuchun Sun, Tao Kong, Wenchang Zhang, Chao Yang, and Chunfang Liu. A survey on deep transfer learning. In Věra Kůrková, Yannis Manolopoulos, Barbara Hammer, Lazaros Iliadis, and Ilias Maglogiannis, editors, Artificial Neural Networks and Machine Learning – ICANN 2018, pages 270–279, Cham, 2018. Springer International Publishing.
(26) Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
(27) Andrea Mari, Thomas R. Bromley, Josh Izaac, Maria Schuld, and Nathan Killoran. Transfer learning in hybrid classical-quantum neural networks. Quantum, 4:340, October 2020.
(28) Ville Bergholm, Josh Izaac, Maria Schuld, Christian Gogolin, M. Sohaib Alam, Shahnawaz Ahmed, Juan Miguel Arrazola, Carsten Blank, Alain Delgado, Soran Jahangiri, Keri McKiernan, Johannes Jakob Meyer, Zeyue Niu, Antal Száva, and Nathan Killoran. PennyLane: Automatic differentiation of hybrid quantum-classical computations, 2020.
(29) Emilio Soria Olivas, Jose David Martin Guerrero, Marcelino Martinez Sober, Jose Rafael Magdalena Benedito, and Antonio Jose Serrano Lopez. Handbook Of Research On Machine Learning Applications and Trends: Algorithms, Methods and Techniques - 2 Volumes. Information Science Reference - Imprint of: IGI Publishing, Hershey, PA, 2009.
(30) Gui-Lu Long and Yang Sun. Efficient scheme for initializing a quantum register with an arbitrary superposed state. Phys. Rev. A, 64:014303, Jun 2001.
(31) Andrei N. Soklakov and Rüdiger Schack. Efficient state preparation for a register of quantum bits. Phys. Rev. A, 73:012307, Jan 2006.
(32) Michele Mosca and Phillip Kaye. Quantum networks for generating arbitrary quantum states. In Optical Fiber Communication Conference and International Conference on Quantum Information, page PB28. Optical Society of America, 2001.
(33) Martin Plesch and Časlav Brukner. Quantum-state preparation with universal gate decompositions. Phys. Rev. A, 83:032302, Mar 2011.
(34) Mikko Möttönen, Juha J. Vartiainen, Ville Bergholm, and Martti M. Salomaa. Transformation of quantum states using uniformly controlled rotations. Quantum Info. Comput., 5(6):467–473, September 2005.
(35) Israel F. Araujo, Daniel K. Park, Francesco Petruccione, and Adenilton J. da Silva. A divide-and-conquer algorithm for quantum state preparation. Scientific Reports, 11(1):6329, March 2021.
(36) T. M. L. Veras, I. C. S. De Araujo, K. D. Park, and A. J. da Silva. Circuit-based quantum random access memory for classical data with continuous amplitudes. IEEE Transactions on Computers, pages 1–1, 2020.
(37) Ryan LaRose and Brian Coyle. Robust data encodings for quantum classifiers. Phys. Rev. A, 102:032420, Sep 2020.
(38) Israel F Araujo, Daniel K Park, Teresa B Ludermir, Wilson R Oliveira, Francesco Petruccione, and Adenilton J da Silva. Configurable sublinear circuits for quantum state preparation. arXiv preprint arXiv:2108.10182, 2021.
(39) Harshit Mogalapalli, Mahesh Abburi, B. Nithya, and Surya Kiran Vamsi Bandreddi. Classical–Quantum Transfer Learning for Image Classification. SN Computer Science, 3(1):20, January 2022.
(40) Jun Qi and Javier Tejedor. Classical-to-quantum transfer learning for spoken command recognition based on quantum neural networks. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8627–8631, 2022.
(41) François Chollet et al. Keras. https://keras.io, 2015.
(42) Vojtech Havlícek, Antonio D. Córcoles, Kristan Temme, Aram W. Harrow, Abhinav Kandala, Jerry M. Chow, and Jay M. Gambetta. Supervised learning with quantum-enhanced feature spaces. Nature, 567(7747):209–212, 2019.
(43) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Appendix A Encoding classical data to a quantum state

The first step in applying quantum ML to a classical dataset is to transform the classical data into a quantum state. Without loss of generality, we consider the classical data given as an $N$-dimensional real vector $\vec{x} \in \mathbb{R}^{N}$. Several encoding methods exist to achieve this, such as algorithms that require a quantum circuit with $O(N)$ width and $O(1)$ depth and algorithms that require $O(\log N)$ width and $O(\mathrm{poly}(N))$ depth araujo_divide-and-conquer_2021 ; 9259210 ; PhysRevA.102.032420 ; araujo2021configurable . Among the various encoding methods explored previously hur2021quantum , we observed that amplitude encoding performs best in most cases.

A.1 Amplitude encoding

Amplitude encoding encodes classical data into the probability amplitudes of the computational basis states. It transforms classical data $x = (x_{1}, \ldots, x_{N})^{\top}$ of dimension $N = 2^{n}$ into an $n$-qubit quantum state $|\psi(x)\rangle$ as follows:

$$U(x) : x \in \mathbb{R}^{N} \to |\psi(x)\rangle = \frac{1}{\|x\|} \sum_{i=1}^{N} x_{i} |i\rangle,$$

(3)

where $|i\rangle$ denotes the $i$th computational basis state. Amplitude encoding can optimize the number of parameters on the $O(\log N)$ scale. However, the quantum circuit depth of amplitude encoding typically increases with $O(\mathrm{poly}(N))$.
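The classical side of Eq. (3) can be illustrated with a minimal sketch: it only computes the normalized amplitudes $x_i / \|x\|$ and the qubit count $n = \log_2 N$, and does not construct the state-preparation circuit itself. The function name `amplitude_encode` and the example vector are hypothetical and chosen for illustration only.

```python
import math


def amplitude_encode(x):
    """Return (amplitudes, n_qubits) for a real vector of length N = 2**n.

    The amplitudes are x_i / ||x||, as in Eq. (3). This is a classical
    illustration only; no quantum circuit is built here.
    """
    n_features = len(x)
    n_qubits = n_features.bit_length() - 1
    if n_features == 0 or 2 ** n_qubits != n_features:
        raise ValueError("the input length must be a power of two")
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("the zero vector cannot be normalized")
    return [v / norm for v in x], n_qubits


# Hypothetical 4-dimensional input, which fits into 2 qubits.
amplitudes, n_qubits = amplitude_encode([3.0, 0.0, 4.0, 0.0])
total_probability = sum(a * a for a in amplitudes)
```

In this sketch, the squared amplitudes sum to one, which is the condition for the vector to represent a valid quantum state.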

A.2 Qubit encoding

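Qubit encoding uses a constant quantum circuit depth, while using $N$ qubits. Qubit encoding rescales each classical feature $x_{i}$ to lie between 0 and $\pi$, and then encodes $x_{i}$ into a single qubit as $|\psi(x_{i})\rangle = \cos\left(\frac{x_{i}}{2}\right)|0\rangle + \sin\left(\frac{x_{i}}{2}\right)|1\rangle$ for $i = 1, \ldots, N$. Therefore, qubit encoding transforms $x = (x_{1}, \ldots, x_{N})^{\top}$ into $N$ qubits as

$$U(x) : x \in \mathbb{R}^{N} \to |\psi(x)\rangle = \bigotimes_{i=1}^{N} \left( \cos\left(\frac{x_{i}}{2}\right)|0\rangle + \sin\left(\frac{x_{i}}{2}\right)|1\rangle \right),$$

(4)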

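where $x_{i} \in [0, \pi)$ for all $i$. This unitary operator $U(x)$ can be expressed as the tensor product of single-qubit unitary operators, $U(x) = \bigotimes_{j=1}^{N} U_{x_{j}}$, where

$$U_{x_{j}} = \begin{pmatrix} \cos\left(\frac{x_{j}}{2}\right) & -\sin\left(\frac{x_{j}}{2}\right) \\ \sin\left(\frac{x_{j}}{2}\right) & \cos\left(\frac{x_{j}}{2}\right) \end{pmatrix}.$$

(5)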
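The product-state structure of qubit encoding can be sketched classically as follows. The sketch builds the $2^{N}$-dimensional state vector by repeated Kronecker products of the single-qubit states $\cos(x_i/2)|0\rangle + \sin(x_i/2)|1\rangle$. The function name `qubit_encode`, the example angles, and the choice of treating the first feature as the most significant bit are assumptions made for illustration.

```python
import math


def qubit_encode(x):
    """Return the state vector of the N-qubit product state.

    Each feature x_i (assumed to be already rescaled to [0, pi)) is mapped
    to cos(x_i / 2)|0> + sin(x_i / 2)|1>. The first feature is taken as the
    most significant bit, which is a convention chosen for this sketch.
    """
    state = [1.0]
    for angle in x:
        single = [math.cos(angle / 2), math.sin(angle / 2)]
        # Kronecker product of the running state with the next qubit.
        state = [a * b for a in state for b in single]
    return state


# Hypothetical two-feature input: qubit 1 stays in |0>, qubit 2 is rotated.
state = qubit_encode([0.0, math.pi / 2])
norm_squared = sum(a * a for a in state)
```

The state vector doubles in length with every added feature, while the circuit itself needs only one rotation per qubit. This is the width-for-depth trade-off relative to amplitude encoding.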

Appendix B Classical Neural Network

We devised classical convolutional neural networks to compare C2C TL and C2Q TL under the same training conditions. In particular, we assigned a similar number of model parameters to C2C TL and C2Q TL. We created 1D and 2D CNN models for the C2C TL using Keras chollet2015keras . We used the same pre-trained CNN model as in C2Q TL, which is introduced in Fig. 4. To compare the test accuracies of C2C and C2Q transfer models under the same conditions, the learning rate, batch size, and number of iterations were fixed, and the Adam optimizer was used.

B.1 1D CNN

The structure of the 1D CNN model is illustrated in Fig. 6. The CNN takes the 256 features produced by the reference (pre-trained) CNN as input, and passes them on to 1D convolution and max pooling layers. The output feature size is reduced to 28 at the end of the max pooling layer. Finally, a dense layer is applied to reduce the size of the output features to two for binary classification. The total number of trainable parameters is 64. The accuracy test results for this model are presented in Appendix C.

B.2 2D CNN

The structure of the 2D CNN model is shown in Fig. 7. The CNN takes as input $16 \times 16$ data, reshaped from the 256 features produced by the reference (pre-trained) CNN. These two-dimensional data are passed through a 2D convolution and max pooling layer twice, and the output features are reduced to eight. Finally, a dense layer is applied to reduce the size of the output features to two for binary classification. The total number of trainable parameters is 76. The accuracy test results for this model are presented in Appendix C.

Appendix C Full List of the Simulation Results

Here, we report the classification accuracy values from all instances of the C2Q-CNN TL models and C2C TL models tested in our simulation. For each combination of convolution and pooling circuits, 10 accuracy values were obtained from the random initialization of the trainable parameters. These values were averaged to produce Fig. 5. The following tables also summarize the means and standard deviations obtained from a random selection of the initial parameters.

The simulation results for C2Q TL are grouped according to the quantum pooling circuit, namely, ZX pooling, generalized pooling, and trivial pooling. For each pooling circuit, the results are further grouped by the type of learning problem, namely, binary classification between 0 and 1, 2 and 3, and 8 and 9. The results of ZX pooling are presented in Tables 1, 2, and 3 for classifying between 0 and 1, 2 and 3, and 8 and 9, respectively. The results with generalized pooling are included in Tables 4, 5, and 6 in the same order. The results with trivial pooling are listed in Table 7.

Convolution circuit	1	2	3	4	5	6	7	8	9	10
Accuracy	90.50	99.34	97.68	96.26	96.64	97.12	98.25	99.10	99.20	99.81
	86.19	98.01	92.53	99.39	97.02	95.70	95.22	98.87	99.81	99.62
	81.80	94.14	90.45	98.96	99.43	97.30	97.64	97.68	98.63	99.53
	92.01	97.87	99.53	97.83	96.69	94.85	99.48	99.72	99.72	99.57
	95.13	98.77	91.73	99.39	96.93	98.11	98.20	99.72	99.57	95.41
	75.56	97.02	96.78	99.15	99.53	97.35	98.58	94.99	98.72	99.76
	96.64	97.16	92.86	94.99	99.05	94.47	99.67	99.53	96.45	97.07
	92.06	99.53	96.41	97.49	98.72	99.24	99.29	99.53	97.97	99.72
	96.50	96.26	92.91	95.18	91.49	93.62	99.57	98.11	99.48	93.81
	86.05	96.12	95.79	99.76	97.12	99.67	98.72	99.62	99.62	99.81
Mean	89.24	97.42	94.67	97.84	97.26	96.74	98.46	98.69	98.92	98.41
Standard deviation	6.49	1.57	2.81	1.71	2.22	1.93	1.26	1.40	0.99	2.09

Table 1: Results of ZX pooling with 0 vs. 1 classification. Most average accuracies are greater than 95%. The average accuracy of convolution circuit 1 is lower than that of the others because it has 0 trainable parameters for convolution layers.
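The summary rows of these tables can be rechecked from the ten per-instance accuracies. The sketch below does so for the convolution circuit 2 column of Table 1. For this one column, the population form of the standard deviation (dividing by 10) reproduces the tabulated value of 1.57, whereas the sample form (dividing by 9) does not; that the same convention is used in every table is an assumption.

```python
from statistics import mean, pstdev, stdev

# Ten test accuracies (%) for convolution circuit 2 in Table 1
# (ZX pooling, 0 vs. 1 classification).
accuracies = [99.34, 98.01, 94.14, 97.87, 98.77,
              97.02, 97.16, 99.53, 96.26, 96.12]

mean_acc = round(mean(accuracies), 2)      # tabulated mean: 97.42
pop_std = round(pstdev(accuracies), 2)     # population form, divides by 10
sample_std = round(stdev(accuracies), 2)   # sample form, divides by 9
```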

Convolution circuit	1	2	3	4	5	6	7	8	9	10
Accuracy	69.44	69.49	74.98	81.59	87.76	83.06	78.60	77.96	88.15	88.79
	69.05	76.00	83.59	86.34	75.22	84.33	91.04	80.90	86.14	87.46
	68.46	77.47	65.38	86.43	83.74	79.73	82.91	76.40	83.94	88.69
	69.39	74.39	81.34	81.00	80.56	84.72	86.43	78.21	86.88	88.88
	69.39	76.10	71.60	84.33	77.72	80.22	77.86	75.27	80.71	80.56
	69.64	77.47	79.43	85.21	81.68	82.96	87.76	81.29	84.38	89.28
	64.94	79.04	73.70	85.01	77.38	71.94	85.16	77.91	86.34	86.68
	68.46	74.39	78.99	85.95	80.56	74.19	84.62	87.17	80.61	85.55
	62.19	85.26	68.17	81.64	81.39	85.85	89.03	78.40	80.95	85.06
	69.93	68.51	77.28	80.90	77.28	85.65	87.81	81.00	83.01	89.08
Mean	68.09	75.81	75.45	83.84	80.33	81.26	85.12	79.45	84.11	87.00
Standard deviation	2.39	4.50	5.53	2.18	3.48	4.56	4.08	3.19	2.61	2.58

Table 2: Results of ZX pooling with 2 vs. 3 classification. The average accuracies are mostly between 70% and 90%. The classification accuracy tends to be higher when the convolution circuit has more trainable parameters.

Convolution circuit	1	2	3	4	5	6	7	8	9	10
Accuracy	57.79	66.41	65.15	79.27	79.73	85.12	86.84	83.36	82.40	87.64
	69.09	84.06	87.80	81.49	87.14	83.26	87.64	85.53	86.74	88.86
	55.93	84.97	74.38	81.95	79.53	84.47	83.91	87.29	86.89	86.08
	67.73	79.88	77.96	82.90	83.86	89.36	88.91	81.85	87.44	88.15
	64.55	80.84	77.31	85.88	86.69	86.38	88.80	89.06	89.36	88.80
	67.52	69.69	75.54	87.80	83.66	84.47	84.17	84.92	88.96	87.64
	60.97	84.87	73.32	86.23	84.77	84.17	90.82	86.54	83.81	89.51
	55.62	85.53	78.21	85.48	86.59	80.38	85.58	84.52	89.91	90.82
	69.79	70.80	78.92	87.14	88.60	89.16	82.25	86.79	85.58	86.43
	61.12	79.78	80.58	79.58	82.00	85.33	80.23	86.13	87.54	87.80
Mean	63.01	78.68	76.92	83.77	84.26	85.21	85.92	85.60	86.86	88.17
Standard deviation	5.17	6.74	5.46	2.97	2.95	2.52	3.13	1.95	2.27	1.33

Table 3: Results of ZX pooling with 8 vs. 9 classification. The average accuracies are typically between 70% and 90%. The classification accuracy tends to be higher when the convolution circuit has more trainable parameters.

Convolution circuit	1	2	3	4	5	6	7	8	9	10
Accuracy	98.35	99.20	99.43	97.59	99.10	97.35	99.01	99.48	96.64	99.62
	98.06	99.24	98.25	99.01	98.91	98.82	99.43	98.87	99.43	99.57
	98.11	98.87	98.82	98.63	99.39	99.15	98.96	99.43	99.05	98.87
	96.12	97.64	98.87	99.24	98.82	98.82	96.03	99.39	99.10	99.20
	96.36	99.05	97.87	98.96	98.82	96.88	98.25	98.91	99.20	99.24
	98.49	99.01	98.53	99.01	99.15	98.91	99.01	98.68	99.29	99.20
	98.01	98.44	99.10	98.72	99.01	99.10	98.96	99.24	98.82	99.29
	98.77	99.67	95.60	98.91	98.96	99.01	99.24	95.22	99.57	99.48
	98.06	98.72	98.72	99.24	99.10	97.97	99.39	99.20	99.24	99.20
	94.85	97.21	99.57	99.48	99.01	98.63	98.82	99.34	99.39	98.91
Mean	97.52	98.70	98.48	98.88	99.03	98.46	98.71	98.78	98.97	99.26
Standard deviation	1.22	0.72	1.07	0.49	0.16	0.75	0.95	1.21	0.80	0.24

Table 4: Results of generalized pooling with 0 vs. 1 classification. The average accuracies are mostly greater than 98%. The average accuracy tends to be higher than that with ZX pooling.

Convolution circuit	1	2	3	4	5	6	7	8	9	10
Accuracy	86.39	87.81	85.50	85.95	89.32	84.13	85.55	87.51	86.97	88.49
	81.68	79.87	84.13	89.23	88.64	87.66	86.48	90.16	88.83	87.61
	88.30	84.18	85.85	85.46	88.59	85.70	88.49	89.32	86.53	87.32
	80.90	85.11	83.45	88.44	87.90	87.71	88.98	85.70	88.54	89.57
	84.48	82.76	81.29	88.49	87.37	89.72	86.68	84.97	88.88	90.45
	79.97	88.54	83.20	85.65	86.63	86.68	86.19	91.04	87.07	90.55
	83.20	84.77	86.68	89.28	84.77	86.97	85.06	86.53	87.76	89.08
	76.64	86.48	84.67	83.99	86.43	86.34	89.72	89.37	88.30	89.42
	77.23	86.34	83.74	88.54	86.88	85.99	86.92	88.00	90.99	91.14
	87.46	84.57	84.97	87.41	87.41	85.55	87.61	88.05	88.69	87.12
Mean	82.62	85.04	84.35	87.24	87.39	86.65	87.17	88.07	88.26	89.07
Standard deviation	3.86	2.38	1.46	1.75	1.25	1.44	1.43	1.85	1.21	1.34

Table 5: Results of generalized pooling with 2 vs. 3 classification. The average accuracies are typically between 85% and 90%. The classification accuracy tends to be higher when the convolution circuit has more trainable parameters. The average accuracy tends to be higher than that with ZX pooling.

Convolution circuit	1	2	3	4	5	6	7	8	9	10
Accuracy	78.52	83.86	81.29	88.45	86.18	84.97	81.64	85.73	88.15	86.74
	79.37	79.12	82.35	84.17	88.50	84.57	86.18	83.91	87.49	88.60
	86.28	80.38	87.39	82.15	87.59	87.29	84.87	85.83	85.63	86.79
	82.70	80.58	84.06	87.24	87.09	84.27	84.22	89.71	87.75	85.48
	85.33	80.69	86.74	85.58	86.69	85.07	86.89	88.65	83.06	90.77
	87.59	85.58	88.35	87.70	82.55	85.22	85.17	87.29	84.72	88.15
	85.98	86.23	87.70	87.54	86.08	85.58	81.19	90.07	85.27	85.22
	83.46	84.42	83.56	84.22	86.13	87.04	84.01	84.42	87.75	89.71
	80.43	80.64	86.54	82.50	77.26	85.12	87.80	89.41	86.33	88.15
	77.76	84.57	83.51	87.09	84.87	87.54	86.18	85.88	84.47	90.42
Mean	82.74	82.61	85.15	85.66	85.30	85.67	84.82	87.09	86.06	88.00
Standard deviation	3.36	2.44	2.35	2.16	3.08	1.12	2.03	2.14	1.62	1.84

Table 6: Results of generalized pooling with 8 vs. 9 classification. The average accuracies are mostly between 85% and 90%. The classification accuracy tends to be higher when the convolution circuit has more trainable parameters. The average accuracy tends to be higher than that with ZX pooling.

Classification	0 vs 1		2 vs 3		8 vs 9
Convolution	10	11	10	11	10	11
Accuracy	97.87	97.16	82.37	83.89	89.06	81.74
	97.49	97.83	78.84	82.03	86.69	87.75
	98.63	88.37	79.87	88.20	86.69	81.64
	98.63	94.56	79.97	76.49	89.31	86.79
	97.12	95.84	79.48	78.26	88.10	83.11
	98.82	97.21	89.28	75.12	87.34	87.90
	99.39	97.21	87.12	83.74	86.13	90.77
	98.77	90.17	86.63	85.70	89.31	88.91
	98.72	96.74	84.33	84.23	89.51	86.99
	98.11	97.68	87.81	84.77	87.54	90.02
Mean	98.35	95.28	83.57	82.24	87.97	86.56
Standard deviation	0.66	3.16	3.74	4.03	1.20	3.13

Table 7: Results of trivial pooling. For all three classification tasks, the average accuracy with convolution circuit 10 is higher than that with convolution circuit 11.

Finally, the results for the purely classical TL, namely C2C TL, are listed in Table 8.

Classification	0 vs 1		2 vs 3		8 vs 9
Transfer model	1D 64	2D 76	1D 64	2D 76	1D 64	2D 76
Accuracy	90.73	99.01	85.99	84.82	82.10	79.17
	98.30	96.88	65.03	81.05	64.20	85.83
	97.78	99.29	78.40	71.30	65.46	74.33
	96.22	99.57	82.52	70.96	71.10	81.59
	95.08	97.40	51.81	83.40	85.22	83.26
	98.96	97.02	77.77	78.26	79.07	78.62
	94.75	96.83	63.47	70.67	69.19	83.96
	98.20	95.32	85.60	58.08	77.71	79.02
	95.84	98.35	85.70	74.68	72.67	83.91
	86.24	93.29	78.55	73.11	80.84	85.58
Mean	95.21	97.30	75.48	74.63	74.76	81.53
Standard deviation	3.76	1.83	10.97	7.43	6.89	3.49

Table 8: Results of C2C transfer learning.
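As an illustration of how the tables can be compared, the following sketch counts how many C2Q convolution circuits have a mean accuracy above the better of the two C2C means for the 2 vs. 3 task. The mean values are copied from Tables 2, 5, and 8; the variable names are hypothetical, and the comparison ignores the standard deviations.

```python
# Mean test accuracies (%) for 2 vs. 3 classification.
c2c_means = {"1D": 75.48, "2D": 74.63}  # Table 8
c2q_means = {
    # Table 2, convolution circuits 1 to 10
    "ZX": [68.09, 75.81, 75.45, 83.84, 80.33,
           81.26, 85.12, 79.45, 84.11, 87.00],
    # Table 5, convolution circuits 1 to 10
    "generalized": [82.62, 85.04, 84.35, 87.24, 87.39,
                    86.65, 87.17, 88.07, 88.26, 89.07],
}

best_c2c = max(c2c_means.values())

# Number of convolution circuits whose mean exceeds the best C2C mean.
wins = {pooling: sum(m > best_c2c for m in means)
        for pooling, means in c2q_means.items()}
```

For this task, the count is 8 of 10 circuits with ZX pooling and 10 of 10 with generalized pooling.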