Self-Score: Self-Supervised Learning on Score-Based Models for MRI Reconstruction

Zhuo-Xu Cui, Chentao Cao, Shaonan Liu, Qingyong Zhu, Jing Cheng, Haifeng Wang, Yanjie Zhu, Dong Liang (Senior Member, IEEE)

This work was supported in part by the National Key R&D Program of China (2020YFA0712202, 2017YFC0108802 and 2017YFC0112903); China Postdoctoral Science Foundation under Grant 2020M682990; National Natural Science Foundation of China (61771463, 81830056, U1805261, 81971611, 61871373, 81729003, 81901736); Natural Science Foundation of Guangdong Province (2018A0303130132); Shenzhen Key Laboratory of Ultrasound Imaging and Therapy (ZDSYS20180206180631473); Shenzhen Peacock Plan Team Program (KQTD20180413181834876); Innovation and Technology Commission of the government of Hong Kong SAR (MRP/001/18X); Strategic Priority Research Program of Chinese Academy of Sciences (XDB25000000).

Corresponding author: dong.liang@siat.ac.cn. Z.-X. Cui and C. Cao contributed equally to this work.

Z.-X. Cui, Q. Zhu and D. Liang are with the Research Center for Medical AI, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. C. Cao, J. Cheng, H. Wang, Y. Zhu and D. Liang are with the Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. S. Liu is with the Department of Computer Science, Inner Mongolia University, Hohhot, China.

Abstract

Recently, score-based diffusion models have shown satisfactory performance in MRI reconstruction. Most of these methods require a large amount of fully sampled MRI data as a training set, which can be difficult to acquire in practice. This paper proposes a fully-sampled-data-free score-based diffusion model for MRI reconstruction, which learns the fully sampled MR image prior in a self-supervised manner on undersampled data. Specifically, we first infer the fully sampled MR image distribution from the undersampled data by Bayesian deep learning, then perturb the data distribution and approximate its probability density gradient by training a score function. Leveraging the learned score function as a prior, we can reconstruct the MR image by performing conditioned Langevin Markov chain Monte Carlo (MCMC) sampling. Experiments on a public dataset show that the proposed method outperforms existing self-supervised MRI reconstruction methods and achieves performance comparable to that of conventional score-based diffusion methods trained on fully sampled data.

Index Terms

self-supervised learning, score-based models, diffusion process, MRI.

1 Introduction

Accurate reconstruction of MR images from undersampled measurements is at the heart of accelerated MRI. Mathematically, MRI reconstruction can be reduced to solving an inverse problem. In the early years, traditional methods, including parallel imaging [26, 9, 21] and compressed sensing [3, 8, 20], were usually used to solve MRI inverse problems by introducing hand-crafted priors for regularization. However, as the acceleration factor increases, the performance of methods based on hand-crafted priors degrades [19]. Inspired by the tremendous success of deep learning (DL), DL methods based on data-driven prior characterization have received considerable attention for MRI reconstruction [32, 35, 38, 10, 12, 4, 14].

Recently, score-based diffusion DL models, including denoising score-matching models [28, 29], denoising diffusion probabilistic models (DDPMs) [11] and unified stochastic differential equation (SDE) models [30], have received much attention. Specifically, the forward process of a score-based diffusion model gradually perturbs the real data to random noise, and the probability density function along the diffusion trajectory is characterized by training a score function. Leveraging the learned score function as a prior, we can perform Langevin Markov chain Monte Carlo (MCMC) sampling to generate the desired data from random noise. Focusing on MRI, if the forward process gradually perturbs fully sampled MR images, the corresponding conditioned Langevin MCMC sampling enables MRI reconstruction. Related studies [13, 5] have shown that score-based diffusion models outperform conventional DL imaging methods in MRI reconstruction accuracy and generalization ability.

For existing score-based diffusion models, the score function must be trained on a fully sampled MRI dataset. However, acquiring large amounts of fully sampled data can be challenging. Thanks to the development of self-supervised learning in computer vision [17, 23, 27], some self-supervised models have been extended to MRI reconstruction. These models require no fully sampled dataset and learn MR reconstruction mappings from undersampled data only [34, 22, 2]. However, these self-supervised learning methods are often inferior to supervised learning methods in terms of reconstruction accuracy. Additionally, self-supervised learning methods usually do not generalize well, e.g., models trained under a certain sampling trajectory perform poorly when applied to other sampling trajectories. As mentioned above, the score-based diffusion model has advantages in terms of reconstruction accuracy and generalization. Therefore, a self-supervised score-based diffusion model with high reconstruction accuracy and strong generalization capability is desirable for MRI reconstruction.

1.1 Contributions and Observations

Motivated by the abovementioned problems, this paper proposes a new self-supervised DL method for MRI reconstruction. Specifically, the main contributions and observations of this work are summarized as follows.

  1. We propose a self-supervised learning score-based diffusion model for the scenario without a fully sampled MRI training set and derive the corresponding conditioned Langevin MCMC sampling for MRI reconstruction.

  2. In the proposed model, we show that the score of fully sampled MR image probability density can be estimated accurately under certain conditions depending only on the undersampled data. It provides theoretical guarantees for the proposed self-supervised learning MRI reconstruction.

  3. In terms of reconstruction accuracy and generalization ability (including sampling trajectories and data shifts), experimental results show that our proposed method outperforms traditional parallel imaging, self-supervised DL, and conventional supervised DL methods and achieves comparable performance with conventional (fully sampled data trained) score-based diffusion methods.

The remainder of the paper is organized as follows. Section 2 reviews related work. Section 3 discusses the proposed self-supervised score-based diffusion model. The implementation details are presented in Section 4. Experiments performed on several data sets are presented in Section 5. Further discussion is presented in Section 6. Section 7 gives concluding remarks. All proofs are presented in the Appendix.

2 Related Work

2.1 Score-Based Diffusion Models

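Suppose a data set contains i.i.d. samples from an unknown distribution $p(x)$, and the score function $s_\theta(x)$ with parameter $\theta$ is an approximation of the gradient of its log probability density, i.e., $s_\theta(x) \approx \nabla_x \log p(x)$. Then, Langevin MCMC sampling is performed according to the score function to obtain samples that obey the distribution $p(x)$, i.e.,

$$x_{i+1} = x_i + \frac{\eta}{2}\, s_\theta(x_i) + \sqrt{\eta}\, z_i \qquad (1)$$

where $\eta > 0$ is the step size and $z_i$ is standard normal. Since $\nabla_x \log p(x)$ is difficult to calculate directly, the denoising score matching method [31] first perturbs the original data $x$ to $\tilde{x}$, and the score function on $\tilde{x}$ can be obtained by solving the following optimization problem:

$$\theta^* = \arg\min_\theta \; \mathbb{E}_{p(x)}\, \mathbb{E}_{p_\sigma(\tilde{x}|x)} \left[ \left\| s_\theta(\tilde{x}) - \nabla_{\tilde{x}} \log p_\sigma(\tilde{x}|x) \right\|_2^2 \right]$$

where $p_\sigma(\tilde{x}|x) = \mathcal{N}(\tilde{x}; x, \sigma^2 I)$. Given a sequence of positive noise scales $\sigma_{\min} = \sigma_1 < \sigma_2 < \cdots < \sigma_N = \sigma_{\max}$, the data noise perturbation process can be considered a discretization of a continuous diffusion process. In particular, several works model this diffusion process through Markov chains [11] and Itô SDEs [30]. Leveraging the learned score function, we can perform the reverse process (Langevin MCMC sampling) of the above diffusion model to generate data from random noise $x_N$, where $x_N \sim \mathcal{N}(0, \sigma_{\max}^2 I)$. Typically, if $\sigma_{\min}$ is sufficiently small, it can be assumed that $p_{\sigma_{\min}}(x) \approx p(x)$.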
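To make the Langevin update concrete, the following is a minimal NumPy sketch that runs unadjusted Langevin dynamics against a score function that is known in closed form. The toy Gaussian target, the step size, the number of steps, and all function names are assumptions chosen for illustration; in the setting of this paper the analytic score would be replaced by a trained network.

```python
import numpy as np

def langevin_sample(score, x0, step, n_steps, rng):
    """Unadjusted Langevin dynamics: x <- x + (step/2)*score(x) + sqrt(step)*z."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        z = rng.standard_normal(x.shape)          # fresh standard normal noise
        x = x + 0.5 * step * score(x) + np.sqrt(step) * z
    return x

# Toy target (assumption for illustration): N(mu, s^2), whose score is exact.
mu, s = 2.0, 0.5
gaussian_score = lambda x: -(x - mu) / s**2

rng = np.random.default_rng(0)
x0 = 3.0 * rng.standard_normal(5000)              # 5000 chains started from broad noise
samples = langevin_sample(gaussian_score, x0, step=1e-2, n_steps=1000, rng=rng)
```

With a small step size the empirical mean and standard deviation of `samples` should lie close to `mu` and `s`; the finite step introduces a small bias, which is one reason practical samplers anneal the step size together with the noise scale.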

Focusing on MRI, the data represent fully sampled MR images. In many practical applications, a large training set of fully sampled MR images is difficult to acquire, which limits the application of score-based diffusion models.

2.2 Self-Supervised Learning Methods

Self-supervised learning has long been studied in MRI. For example, the classical k-space interpolation method first fully samples a calibration region in the central part of k-space, learns a linear kernel from it, and applies the kernel in a translation-invariant manner to interpolate the missing k-space data [9, 21]. Inspired by deep learning, linear kernels have been generalized to convolutional neural networks to improve the accuracy of missing data interpolation [1, 15].

On the other hand, inspired by self-supervised denoising methods in computer vision, self-supervised MRI reconstruction methods that do not require fully sampled calibration data have been proposed [34, 22, 7]. More specifically, undersampled k-space data pairs are constructed in the same way that self-supervised denoising methods construct noisy data pairs, and the interpolation relationship between the missing data and the sampled data is then learned from these pairs to achieve missing data interpolation (MRI reconstruction).

However, in terms of imaging quality, many experiments show that self-supervised learning methods tend to be inferior to supervised learning methods. Moreover, as in k-space interpolation methods, the self-supervised learning of the interpolation kernel relies on the sampling trajectory, which leads to a lack of generalization ability. Therefore, a novel self-supervised learning method with high reconstruction quality and generalization ability is desired for MRI reconstruction.

3 Methodology

In this section, we first estimate the MR image data distribution by self-supervised Bayesian learning and then introduce probability density gradient estimation by the score matching method, which enables Langevin MCMC sampling (MRI reconstruction).

3.1 MRI Forward Model

The forward model of MRI can be expressed as

$$y = Ax + n \qquad (2)$$

where $A$ is the encoding matrix, $y$ is the measurement, $x$ is the image to be reconstructed, and $n$ is the Gaussian measurement noise with scale $\sigma_n$, i.e., $n \sim \mathcal{N}(0, \sigma_n^2 I)$. In particular, for the case of multichannel acquisition, $A = MFS$, where $S$ denotes the coil sensitivities, $F$ represents the Fourier transform, and $M$ denotes the undersampling pattern. Since $A$ is an ill-conditioned operator, it is difficult to reconstruct the image $x$ from $y$ accurately and stably. Therefore, it is necessary to introduce a prior on $x$ to realize regularization. The main task of this paper is to accurately estimate the distribution of the fully sampled image $x$ from the undersampled data $y$.
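As an illustration of the multichannel forward model, the sketch below applies coil sensitivities, a 2D Fourier transform, and an undersampling mask in that order, together with the matching adjoint. The array shapes, the orthonormal FFT convention, the random test data, and the function names are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def forward(x, sens, mask):
    """Encoding operator: coil weighting, per-coil 2D FFT, then k-space masking."""
    coil_images = sens * x[None, :, :]
    kspace = np.fft.fft2(coil_images, axes=(-2, -1), norm="ortho")
    return mask[None, :, :] * kspace

def adjoint(y, sens, mask):
    """Adjoint operator: masking, per-coil inverse FFT, conjugate coil combination."""
    coil_images = np.fft.ifft2(mask[None, :, :] * y, axes=(-2, -1), norm="ortho")
    return np.sum(np.conj(sens) * coil_images, axis=0)

rng = np.random.default_rng(0)
n_coils, ny, nx = 4, 16, 16                       # hypothetical problem size
x = rng.standard_normal((ny, nx)) + 1j * rng.standard_normal((ny, nx))
sens = rng.standard_normal((n_coils, ny, nx)) + 1j * rng.standard_normal((n_coils, ny, nx))
mask = (rng.random((ny, nx)) < 0.3).astype(float)  # random undersampling pattern

noise_scale = 0.01                                # hypothetical noise level
noise = noise_scale * (rng.standard_normal((n_coils, ny, nx))
                       + 1j * rng.standard_normal((n_coils, ny, nx)))
y = forward(x, sens, mask) + mask[None, :, :] * noise   # noisy undersampled measurement
```

A quick sanity check for any such implementation is the adjoint identity: for random `x` and `v`, `np.vdot(forward(x, sens, mask), v)` should equal `np.vdot(x, adjoint(v, sens, mask))` up to floating-point error.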

Figure 1: Illustration of the self-supervised score-based diffusion model. (a) First, the undersampled k-space data pairs are constructed; then the BCNN is trained by minimizing the KL divergence; finally, the model parameter distribution is output. Specifically, the architecture of the BCNN follows the POCS-SPIRiT model-driven neural network [6], and its network parameters contain standard deviation and mean . During both training and testing, normally distributed noise is first generated and then the convolutional kernel is determined by sampling, i.e., . (b) Forward process: learn the score function approximating the probability density gradient of by perturbing the with Gaussian noise at different scales. Reverse process: perform MCMC sampling conditional on the measurement to reconstruct the MR image using the learned score function as a prior.

3.2 Self-Supervised Bayesian Learning

Before presenting our method, we make the following key assumption:

Assumption 1

Let denote the MR image, denote the measurement, and denote a further subsample of . Suppose there exists a mapping with parameter such that

where denotes the image obtained by inverse Fourier transform and channel merging of , i.e., , and denote the Gaussian noise with scales and .

Remark 3.1

This assumption can be considered a generalization of the “linear interpolability” and “translation invariance” of the classical k-space interpolation method. In addition, it has been proved theoretically that such a mapping exists in expectation [22].

According to Assumption 1, we can obtain the conditional distribution , and if we can further obtain the distribution of the parameters , we can infer the distribution of image using the following Bayesian formula

(3)

Therefore, our next objective is to estimate the distribution of . Defining undersampled k-space data pairs , we estimate by Bayesian inference, i.e., by using to estimate , where

, , and . In this paper, we use a Bayesian convolutional neural network (BCNN) to represent and output the distribution . The framework of self-supervised Bayesian DL is shown in Figure 1. In particular, the architecture of the mapping or BCNN can be directly adopted from the POCS-SPIRiT model-driven neural network [6]. To introduce randomness, the network parameters contain standard deviation and mean . For parameter , we first generate normally distributed noise , and then determine the convolutional kernel by sampling, i.e., .
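The kernel sampling step described above can be sketched as a reparameterized draw: each kernel is the mean plus the standard deviation times standard normal noise. The names `mu`, `sigma`, and `sample_kernel`, the 3x3 kernel size, and the numeric values are assumptions for illustration only; the actual BCNN learns these quantities per layer.

```python
import numpy as np

def sample_kernel(mu, sigma, rng):
    """Draw one kernel w = mu + sigma * eps with eps ~ N(0, I) (elementwise)."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

rng = np.random.default_rng(0)
mu = np.full((3, 3), 0.2)       # hypothetical learned kernel mean
sigma = np.full((3, 3), 0.05)   # hypothetical learned kernel standard deviation

# Every forward pass uses a fresh draw, so repeated passes give an ensemble of kernels.
draws = np.stack([sample_kernel(mu, sigma, rng) for _ in range(20000)])
```

Because the randomness enters only through `eps`, gradients with respect to `mu` and `sigma` remain well defined in an automatic differentiation framework, which is what makes this parameterization convenient for training.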

The tight approximation is achieved by minimizing the following KL-divergence:

(4)

The final calculation result is as follows:

Proposition 3.1

Suppose that and prior . The minimum of the KL-divergence (4) can be calculated by

(5)

The calculation process of Proposition 3.1 is shown in Appendix .1.

3.3 Score-Based Diffusion Model

Bringing the above estimated into the Bayesian equation (3), we can obtain the distribution . However, the integral in (3) is difficult to calculate exactly in general. If is accessible, samples obeying the distribution can be collected according to Langevin MCMC sampling (1).

Following the score matching method, we can estimate in a self-supervised manner by the following optimization problem.

Proposition 3.2

Suppose Assumption 1 holds. The minimizer of can be obtained by equivalently minimizing the following objective:

The calculation process of Proposition 3.2 is shown in Appendix .2.

Based on Proposition 3.2, the score function at different perturbation levels can be learned, and the desired samples are then obtained by performing MCMC sampling according to them. More specifically, we perturb the by Gaussian noise with scales that satisfy . Let and let the perturbed data distribution be . If is chosen such that , then holds. Based on Proposition 3.2, we can estimate at all the scales by training a joint score function with the following loss:

(6)

where is the minimizer of objective (5).
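For intuition about joint training over noise scales, the sketch below evaluates a denoising score matching loss summed over a geometric noise schedule. It assumes the common weighting by the squared noise scale and toy Gaussian data whose perturbed score is known exactly; it does not reproduce loss (6), which additionally involves the BCNN output. The schedule endpoints and all names are hypothetical.

```python
import numpy as np

def dsm_loss(score_fn, x, sigmas, rng):
    """Sigma^2-weighted denoising score matching loss, averaged over noise scales."""
    total = 0.0
    for sigma in sigmas:
        z = rng.standard_normal(x.shape)
        x_tilde = x + sigma * z                   # perturbed sample
        target = -z / sigma                       # score of N(x_tilde; x, sigma^2 I)
        residual = score_fn(x_tilde, sigma) - target
        total += sigma**2 * np.mean(np.sum(residual**2, axis=-1))
    return total / len(sigmas)

# Toy data (assumption): x ~ N(0, I), so the perturbed density is N(0, (1 + sigma^2) I)
# and its exact score is -x / (1 + sigma^2).
true_score = lambda x, sigma: -x / (1.0 + sigma**2)
zero_score = lambda x, sigma: np.zeros_like(x)

sigmas = np.geomspace(1.0, 0.01, 10)              # hypothetical noise schedule
x = np.random.default_rng(0).standard_normal((20000, 2))
loss_true = dsm_loss(true_score, x, sigmas, np.random.default_rng(1))
loss_zero = dsm_loss(zero_score, x, sigmas, np.random.default_rng(1))
```

The exact score attains a lower loss than the trivial zero score, which is the property that lets a network trained on this objective approximate the score of the perturbed distribution at every scale.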

After training a score function, we can perform conditional Langevin MCMC sampling to solve problem (2), i.e.,

In particular, the choice of parameter follows that of literature [29], and the conditional Langevin MCMC sampling is detailed in Algorithm 1.

1:  Input: , and ;
2:  Initialize: ;
3:  for  do
4:     
5:     for  do
6:        Draw :
7:     end for
8:     
9:  end for
10:  Output: .
Algorithm 1 Conditional Langevin MCMC Sampling.
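As a rough illustration of conditional Langevin MCMC sampling, the sketch below runs annealed Langevin dynamics with an added data-consistency gradient on a two-pixel toy problem. It is not the authors' Algorithm 1: the diagonal masking operator, the Gaussian prior with analytic score, the step-size rule proportional to the squared noise scale, and the inflation of the likelihood variance by the current noise scale are all assumptions made so that the result can be checked against a closed-form posterior.

```python
import numpy as np

def conditional_langevin(y, mask, noise_std, score_fn, sigmas, eps, n_inner, rng, n_chains):
    """Annealed Langevin sampling with a data-consistency term for A = diag(mask)."""
    x = sigmas[0] * rng.standard_normal((n_chains, y.size))   # start from broad noise
    for sigma in sigmas:                                       # large to small scales
        alpha = eps * sigma**2 / sigmas[-1]**2                 # scale-dependent step size
        for _ in range(n_inner):
            prior_grad = score_fn(x, sigma)
            # Likelihood gradient; adding sigma^2 to the noise variance is an assumption.
            lik_grad = mask * (y - mask * x) / (noise_std**2 + sigma**2)
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * (prior_grad + lik_grad) + np.sqrt(alpha) * z
    return x

# Toy prior (assumption): x ~ N(0, I); its perturbed score is -x / (1 + sigma^2).
prior_score = lambda x, sigma: -x / (1.0 + sigma**2)
mask = np.array([1.0, 0.0])          # first entry measured, second entry unobserved
y = np.array([1.0, 0.0])
noise_std = 0.5
sigmas = np.geomspace(3.0, 0.05, 10)

rng = np.random.default_rng(0)
samples = conditional_langevin(y, mask, noise_std, prior_score, sigmas,
                               eps=2e-3, n_inner=100, rng=rng, n_chains=4000)

# Closed-form Gaussian posterior of the measured entry, for comparison.
post_mean = y[0] / (1.0 + noise_std**2)
post_std = np.sqrt(noise_std**2 / (1.0 + noise_std**2))
```

The measured entry concentrates around the posterior mean, while the unobserved entry falls back to the prior. In MRI the same structure applies with the encoding operator in place of the mask and a learned score network in place of `prior_score`.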

The advantages of our proposed method over existing self-supervised learning methods (learning the mapping directly from undersampled data) are twofold.

  1. Intuitively, the score matching model (6) works by separating out the noise from the current image , so even if is not a perfectly clean image, the impact on learning the noise separation mechanism () is relatively small.

  2. Methodologically, we utilize the prior of a set of models over a set of data rather than the prior of a single model over a single set of data, i.e., , and . It has been shown that such ensemble models can often outperform single models [37].

4 Implementation

The evaluation was performed on the public fastMRI data and on data acquired in-house on Siemens scanners, with various k-space trajectories. The details of the MR data are as follows:

4.1 Data Acquisition

4.1.1 FastMRI data

The raw knee data (https://fastmri.org/) were acquired on 3T Siemens scanners (Siemens Magnetom Skyra, Prisma and Biograph mMR). Data acquisition used a 15-channel knee coil array and the conventional Cartesian 2D TSE protocol employed clinically at NYU School of Medicine. The following sequence parameters were used: echo train length 4, matrix size , in-plane resolution , slice thickness , no gap between slices. Timing varied between systems, with repetition time (TR) ranging between 2200 and 3000 milliseconds, and echo time (TE) between 27 and 34 milliseconds. From these data, we randomly selected T1-weighted data of 34 individuals (1002 slices in total) as the training set and data of 3 individuals (95 slices in total) as the test set.

To verify generalizability, we test the performance of the model trained on knee data on brain MRI reconstruction. Therefore, a randomly selected set of brain data (10 slices in total) from the fastMRI dataset is also used as a test set.

4.1.2 SIAT data

For the training data, 1000 fully sampled multi-contrast slices from 10 subjects were collected on a 3T scanner (Siemens Trio, Siemens AG, Erlangen, Germany), and informed consent was obtained from the imaging subjects in compliance with the IRB policy. The fully sampled data were acquired with a 12-channel head coil with matrix size of and combined into single-channel complex-valued data. For the test set, we drew 50 slices from 7 human brain datasets acquired on three different commercial 3T scanners (SIEMENS AG, Erlangen, Germany; GE Healthcare, Waukesha, WI; United Imaging Healthcare, Shanghai, China).

4.2 Network Architecture and Training

The schematic diagram of the BCNN (including ) architecture is illustrated in Figure 2. Since Assumption 1 stems from the generalization of “linear predictability” and “translation invariance” of the k-space data, we followed the POCS-SPIRiT model-driven neural network [6] to represent the function . In particular, to introduce randomness, we replaced the last layer of the module with a Bayesian neural network. We used the NCSNv2 network [29] to learn the score function, with the following settings: , , number of classes 266, and exponential moving average (EMA) enabled with rate 0.999.

Figure 2: Schematic diagram of the network architecture of the . The upper and lower convolutional network modules exploit redundancies in the image domain and self-consistency in the k-space. denotes the projection onto , i.e., , where is the undersampling pattern of . The above module contains five (Bayesian) convolutional layers and is recursed ten times to obtain the architecture of . In particular, , together with the distribution of , constitutes the BCNN.

The Adam optimizer [16] with is used to minimize the loss functions (5) and (6). The mini-batch size is 1, the number of epochs is 200, and the learning rate is . As shown in Figure 3, when constructing the undersampled data pairs, we followed [34] and obtained by further Gaussian undersampling of . The models were implemented in the open-source PyTorch 1.10 framework [25] with CUDA 11.3 and cuDNN support, on an Ubuntu 20.04 system equipped with an NVIDIA A6000 GPU (48 GB memory).
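One possible way to construct such an undersampled pair is sketched below: a subset of the already sampled k-space locations is drawn with probability proportional to a Gaussian density centered on the k-space center. The keep fraction, the Gaussian width, the centered k-space layout, and the function name are assumptions for illustration; the exact settings of [34] or of this paper are not reproduced.

```python
import numpy as np

def gaussian_subsample(mask, keep_fraction, std_scale, rng):
    """Keep a Gaussian-weighted random subset of the sampled locations in `mask`.

    Assumes a centered k-space layout (zero frequency in the middle of the array).
    """
    ny, nx = mask.shape
    ky, kx = np.meshgrid(np.arange(ny) - ny // 2, np.arange(nx) - nx // 2, indexing="ij")
    weights = np.exp(-0.5 * ((ky / (std_scale * ny)) ** 2 + (kx / (std_scale * nx)) ** 2))
    idx = np.flatnonzero(mask)                    # flat indices of sampled locations
    p = weights.ravel()[idx]
    p = p / p.sum()                               # selection probabilities
    n_keep = int(round(keep_fraction * idx.size))
    chosen = rng.choice(idx, size=n_keep, replace=False, p=p)
    sub = np.zeros(mask.size, dtype=mask.dtype)
    sub[chosen] = 1
    return sub.reshape(mask.shape)

rng = np.random.default_rng(0)
mask = (rng.random((32, 32)) < 0.3).astype(int)   # hypothetical acquisition mask
sub_mask = gaussian_subsample(mask, keep_fraction=0.6, std_scale=0.25, rng=rng)
rest_mask = mask - sub_mask                       # sampled locations left out of the subset
```

Multiplying the measured k-space by `sub_mask` gives the further-undersampled member of the pair, and by construction `sub_mask` never selects a location that was not acquired.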

Figure 3: An example of an undersampling trajectory on and an undersampling trajectory on .

4.3 Performance Evaluation

In this study, all quantitative evaluations were calculated in the image domain. The reconstructed and reference images were derived using an inverse Fourier transform followed by an elementwise square root of the sum of squares (SSoS) operation, i.e. , where denotes the -th element of image , and denotes the -th element of the -th coil image . For quantitative evaluation, the peak signal-to-noise ratio (PSNR), normalized mean square error (NMSE), and structural similarity (SSIM) index [33] were adopted.
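The coil combination and two of the metrics can be sketched as follows. The NMSE and PSNR definitions used here (energy-normalized squared error, and peak taken as the maximum of the reference magnitude) are common conventions assumed for illustration and may differ in detail from those used to produce the reported tables; SSIM is omitted because it needs a windowed implementation.

```python
import numpy as np

def ssos(coil_images):
    """Square root of the sum of squared magnitudes over the coil axis (axis 0)."""
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))

def nmse(ref, rec):
    """Normalized mean square error: ||ref - rec||^2 / ||ref||^2."""
    return np.sum(np.abs(ref - rec) ** 2) / np.sum(np.abs(ref) ** 2)

def psnr(ref, rec):
    """Peak signal-to-noise ratio in dB, with the peak taken from the reference."""
    mse = np.mean(np.abs(ref - rec) ** 2)
    return 10.0 * np.log10(np.max(np.abs(ref)) ** 2 / mse)

# Tiny worked example: two coils, a 1x2 image.
coils = np.array([[[3.0, 0.0]], [[4.0, 0.0]]])
ref = ssos(coils)                 # [[5., 0.]]
rec = np.array([[4.0, 0.0]])
err_nmse = nmse(ref, rec)         # 1 / 25
err_psnr = psnr(ref, rec)         # 10 * log10(25 / 0.5)
```

In this example the reference pixel is 5 and the reconstruction is 4, so the squared error is 1 against a reference energy of 25.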

5 Experimentation Results

In this section, we conducted a series of comparative experiments. In particular, we compared against the traditional PI method SENSE [26]. Since the BCNN (termed “self-supervised”) proposed in Section 3 can be considered an improved version of the self-supervised MRI reconstruction method SSDU [34], we used it as both a comparison method and an ablation method. To further verify the superiority of the proposed method, we also compared it with the supervised MRI reconstruction method ISTA-Net (termed “supervised”) [36]. Our code is available at https://github.com/ZhuoxuCui/Self-Score.

5.1 In-Distribution Performance

In this section, we tested the performance of various methods when sampling patterns and data types (anatomies) are consistent in training and testing. Figure 4 shows the reconstruction results for 5-fold random undersampling of fastMRI knee data and 4-fold uniform undersampling of SIAT brain data. For the traditional PI method SENSE, aliasing artifacts remain in the reconstructed image and noise is amplified to the point that image details are obliterated. Obvious aliasing artifacts also remain in the images reconstructed by the self-supervised learning method. Although the supervised learning method suppresses noise well, slight aliasing remains in the reconstructed image where the red arrow points; the image is also blurrier, and some texture information is visibly lost. Our proposed method performs well in both aliasing suppression and recovery of texture detail. Notably, the proposed fully-sampled-data-free method outperforms the conventional supervised DL method, which requires a fully sampled training dataset; this is of practical significance.

Figure 4: Reconstruction of the fastMRI knee (first two rows) and SIAT brain (last two rows) data at random sampling of and uniform undersampling of , respectively. The values in the corner are NMSE/PSNR/SSIM values. Second and fourth rows illustrate the enlarged views. The grayscale of the reconstructed images is at the right of the figure.

The quantitative results of the above methods are shown in Table 1. In terms of NMSE and PSNR, our method consistently outperforms traditional SENSE and the conventional self-supervised and supervised methods, and its SSIM is at least as high. The above experiments confirm the competitiveness of our method when training and testing conditions are consistent.

| Dataset | Method | NMSE | PSNR (dB) | SSIM |
|---|---|---|---|---|
| fastMRI Knee | SENSE | 0.0412 ± 0.0457 | 27.01 ± 2.96 | 0.52 ± 0.14 |
| fastMRI Knee | Supervised | 0.0072 ± 0.0053 | 34.00 ± 2.06 | 0.90 ± 0.03 |
| fastMRI Knee | Self-Supervised | 0.0089 ± 0.0036 | 32.77 ± 1.45 | 0.88 ± 0.02 |
| fastMRI Knee | Self-Score | 0.0037 ± 0.0012 | 36.46 ± 1.70 | 0.91 ± 0.02 |
| SIAT Brain | SENSE | 0.0312 ± 0.0136 | 31.40 ± 1.57 | 0.71 ± 0.05 |
| SIAT Brain | Supervised | 0.0061 ± 0.0021 | 38.25 ± 1.55 | 0.94 ± 0.01 |
| SIAT Brain | Self-Supervised | 0.0088 ± 0.0022 | 36.54 ± 1.24 | 0.94 ± 0.01 |
| SIAT Brain | Self-Score | 0.0051 ± 0.0016 | 39.04 ± 1.08 | 0.94 ± 0.01 |

Table 1: Quantitative comparison for various methods on the fastMRI and SIAT datasets.

5.2 Out-of-Distribution Performance

Good generalization ability is necessary for realistic applications of DL methods. However, as mentioned earlier, some existing self-supervised MR reconstruction methods are generalized from k-space interpolation and therefore fail when sampling patterns and k-space data structures are inconsistent between training and testing. The proposed method ultimately learns the distribution of fully sampled MR images, free from the limitations of sampling patterns and data structures. Therefore, the proposed method is expected to generalize better. To verify this claim, we designed the following two experiments.

5.2.1 Pattern Shift

In this section, we verified the performance of various methods when the sampling patterns were inconsistent between training and testing. Figure 5 shows the results of various methods trained on a 5-fold random sampling pattern and tested on 6-fold uniformly undersampled data. Although SENSE is not affected by the pattern shift, it performs poorly because it does not introduce a sufficiently strong prior. Aliasing artifacts remain in the reconstruction results of both the supervised and self-supervised learning methods. In particular, comparing Figure 4 with Figure 5, we can see that both supervised and self-supervised methods degrade significantly due to the pattern shift. In contrast, our proposed method achieves satisfactory performance in both aliasing suppression and detail recovery. The quantitative metrics are shown in Table 2, confirming the superiority of our method.

Figure 5: Reconstruction results under uniform undersampling at . The corresponding models were trained on a randomly sampled pattern at . The values in the corner are NMSE/PSNR/SSIM values of each slice. Second and third rows illustrate the enlarged and error views, respectively. The grayscale of the reconstructed images and the color bar of the error images are at the right of the figure.
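
| Dataset | Method | NMSE | PSNR (dB) | SSIM |
|---|---|---|---|---|
| fastMRI Knee | SENSE | 0.0529 ± 0.0535 | 25.84 ± 2.97 | 0.48 ± 0.14 |
| fastMRI Knee | Supervised | 0.0152 ± 0.0091 | 30.67 ± 2.06 | 0.88 ± 0.02 |
| fastMRI Knee | Self-Supervised | 0.0171 ± 0.0094 | 30.19 ± 2.09 | 0.87 ± 0.02 |
| fastMRI Knee | Self-Score | 0.0049 ± 0.0023 | 35.49 ± 1.54 | 0.90 ± 0.03 |
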
Table 2: Quantitative comparison for various methods on the fastMRI knee dataset.

The above experiments validate the superior generalization ability of the proposed method in terms of sampling pattern shifts.

5.2.2 Data Shift

In this section, we verified the performance of various methods when the data types (anatomies) were inconsistent between training and testing. Figure 6 shows the results of various methods trained on fastMRI knee data and tested on fastMRI brain data. The self-supervised method based on k-space interpolation is no longer applicable because the two data sets have different numbers of channels. Similar to the pattern shift experiment, SENSE performs poorly, although it is not affected by the data shift. The supervised learning method leaves residual aliasing artifacts in the reconstructed image, showing that it also generalizes poorly under data shift. The proposed method reconstructs the images accurately, which demonstrates its superior generalization under data shift.

Figure 6: Reconstruction results of fastMRI brain data under uniform undersampling at . The corresponding models were trained on fastMRI knee data. The values in the corner are NMSE/PSNR/SSIM values of each slice. Second and third rows illustrate the enlarged and error views, respectively. The grayscale of the reconstructed images and the color bar of the error images are at the right of the figure.

| Dataset | Method | NMSE | PSNR (dB) | SSIM |
|---|---|---|---|---|
| fastMRI Brain | SENSE | 0.1703 ± 0.1559 | 22.05 ± 1.56 | 0.31 ± 0.09 |
| fastMRI Brain | Supervised | 0.0273 ± 0.0124 | 29.37 ± 1.22 | 0.85 ± 0.03 |
| fastMRI Brain | Self-Score | 0.0068 ± 0.0042 | 35.53 ± 1.31 | 0.90 ± 0.03 |

Table 3: Quantitative comparison for various methods on the fastMRI brain dataset.

The above experiments verified that the proposed method can accurately reconstruct images and has good generalization ability without the fully sampled training data.

6 Discussion

In this study, we proposed a self-supervised score-based diffusion model for MRI reconstruction. In the above comparative experiments, we verified the superiority of the proposed method compared to the traditional parallel imaging, self-supervised DL and conventional supervised DL methods. However, some areas still need further discussion or improvement for our proposed model.

6.1 Comparison Experiment with Conventional Score-Based Method

Theoretically, the accuracy with which the proposed method estimates the data distribution depends mainly on Assumption 1. Although it is a natural generalization of “linear predictability” and “translation invariance” in k-space interpolation methods, there may still be cases where it is not fully accurate. Therefore, we designed comparison experiments with the conventional score-based diffusion method trained on fully sampled data to verify the accuracy of the proposed self-supervised learning method in estimating the data distribution.

Figure 7 shows the reconstruction results of the conventional score-based method (termed score) and our proposed self-supervised score-based method (termed self-score) for fastMRI knee and SIAT brain data under 6-fold and 4-fold uniform undersampling, respectively. Visually, the two methods perform almost identically. The quantitative metrics are shown in Table 4. The proposed self-score method performs almost identically to the score method on the fastMRI knee dataset and even slightly better than the score method on the SIAT brain dataset. This experiment validates the accuracy of the proposed self-supervised learning method for data distribution estimation.

Figure 7: Reconstruction of the fastMRI knee (first two rows) and SIAT brain (last two rows) data at uniform undersampling of and , respectively. The values in the corner are NMSE/PSNR/SSIM values. Second and fourth rows illustrate the enlarged views. The grayscale of the reconstructed images is at the right of the figure.
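
| Dataset | Method | NMSE | PSNR (dB) | SSIM |
|---|---|---|---|---|
| fastMRI Knee | Score | 0.0048 ± 0.0027 | 35.64 ± 1.78 | 0.90 ± 0.04 |
| fastMRI Knee | Self-Score | 0.0049 ± 0.0023 | 35.49 ± 1.54 | 0.90 ± 0.03 |
| SIAT Brain | Score | 0.0057 ± 0.0019 | 38.57 ± 1.17 | 0.92 ± 0.03 |
| SIAT Brain | Self-Score | 0.0051 ± 0.0016 | 39.04 ± 1.08 | 0.94 ± 0.02 |
| fastMRI Brain | Score | 0.0066 ± 0.0054 | 36.10 ± 1.64 | 0.90 ± 0.04 |
| fastMRI Brain | Self-Score | 0.0068 ± 0.0042 | 35.53 ± 1.31 | 0.90 ± 0.03 |
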
Table 4: Quantitative comparison for various methods on the fastMRI and SIAT dataset.

We also compared the generalization capabilities of the two methods described above, i.e., trained on the fastMRI knee dataset but applied to reconstruct 6-fold uniformly undersampled fastMRI brain MR images. From the results presented in Figure 8, we can find that the proposed self-supervised score-based method trained on undersampled data performs very close to the method trained on fully sampled data in terms of the visual perception and error of the reconstructed images. The above experimental result further validates the accuracy of the proposed self-supervised learning method for estimating the distribution of the fully sampled MRI data.

Figure 8: Reconstruction of the fastMRI brain at uniform undersampling of . The values in the corner are NMSE/PSNR/SSIM values. Second row illustrates the error view. The grayscale of the reconstructed images and the color bar of the error images are at the right of the figure.

6.2 Further Discussion and Improvements

As mentioned above, although Assumption 1 is a common assumption in k-space interpolation methods, there is no rigorous theory to guarantee its correctness. Therefore, designing a self-supervised score-based method for MRI reconstruction that does not rely on Assumption 1 is left for future work.

For the parameter distribution estimation, we use a simple Gaussian distribution to approximate it. Alternatively, it is also possible to collect samples by MCMC sampling and then approximate the distribution of by . Given the successful application of approximation in computer vision [24, 18], such an approach is adopted in this paper. The MCMC method and more accurate distribution approximation methods are left as options for future work.

Finally, for conventional DL MRI reconstruction methods, the learned network maps the undersampled k-space data directly to the reconstructed image, which takes little time. Our method, like other score-based diffusion methods, reconstructs the image through iterative MCMC sampling, which takes a relatively long time. Reducing the reconstruction (sampling) time by introducing acceleration techniques, such as momentum, is a direction worth investigating in the future.
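
To illustrate why sampling-based reconstruction is slower than a single forward pass, the following toy sketch runs unadjusted Langevin dynamics with a known score. The closed-form score of a 1-D Gaussian stands in for the learned score network; the step size, iteration count, and function names are illustrative assumptions, and the conditioning on k-space data used in the proposed method is omitted.

```python
import numpy as np

def gaussian_score(x, mean, std):
    """Score (gradient of the log-density) of N(mean, std^2)."""
    return -(x - mean) / std ** 2

def langevin_sample(score_fn, x0, step, n_iters, rng):
    """Unadjusted Langevin dynamics:
    x <- x + step * score(x) + sqrt(2 * step) * z, with z ~ N(0, I)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        z = rng.standard_normal(x.shape)
        x = x + step * score_fn(x) + np.sqrt(2.0 * step) * z
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = np.zeros(5000)  # 5000 independent chains, all started at 0
    samples = langevin_sample(
        lambda x: gaussian_score(x, 2.0, 0.5), x0,
        step=1e-3, n_iters=2000, rng=rng,
    )
    # The sample mean and std should approach the target values 2.0 and 0.5.
    print(samples.mean(), samples.std())
```

Each iteration requires one evaluation of the score function, so a run of this length would correspond to thousands of network evaluations when the score is a neural network, whereas an end-to-end reconstruction network needs only one forward pass.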

7 Conclusion

This paper proposed a self-supervised score-based diffusion model for MRI reconstruction. Unlike the conventional score-based method, the proposed method learns the fully sampled MR image prior from undersampled data in a self-supervised manner. Based on the learned prior, this paper also demonstrated how to implement MRI reconstruction by MCMC sampling conditioned on undersampled k-space data. Experiments on the public dataset showed that the proposed method outperforms existing self-supervised and conventional supervised MRI reconstruction methods and achieves performance comparable to that of conventional score-based diffusion methods trained on fully sampled data. Our method could be a powerful framework for MRI reconstruction, and further development of this method may enable even larger gains in the future.

.1 Proof of Proposition 3.1

By Bayes' formula, the KL divergence can be calculated as

For the first part, we have

For the second part, we have

Combining the above two equations completes the proof.
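
For orientation, the Bayes-rule step in variational arguments of this kind typically takes the generic form sketched below. The symbols $q(\theta)$ (approximating distribution), $p(\theta)$ (prior), and $\mathcal{D}$ (observed data) are placeholders assumed for this sketch and are not necessarily the notation of Proposition 3.1.

```latex
\begin{aligned}
\mathrm{KL}\big(q(\theta)\,\|\,p(\theta\mid\mathcal{D})\big)
&= \mathbb{E}_{q(\theta)}\big[\log q(\theta) - \log p(\theta\mid\mathcal{D})\big] \\
&= \mathbb{E}_{q(\theta)}\big[\log q(\theta) - \log p(\theta) - \log p(\mathcal{D}\mid\theta)\big] + \log p(\mathcal{D}) \\
&= \mathrm{KL}\big(q(\theta)\,\|\,p(\theta)\big)
   - \mathbb{E}_{q(\theta)}\big[\log p(\mathcal{D}\mid\theta)\big]
   + \log p(\mathcal{D}).
\end{aligned}
```

The last term does not depend on $q$, so minimizing the KL divergence over $q$ amounts to treating the remaining two parts separately, a prior-matching part and a data-likelihood part.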

.2 Proof of Proposition 3.2

This proof follows [31]. First, we have

where denotes a constant. For the second term, we have

where the first and fourth equations make use of the derivative property of the log function. On the other hand, we have

where denotes a constant. The proof is completed.
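
The argument of [31] on which this proof builds can be summarized in generic notation as follows. The symbols $p(x)$, $q_\sigma(\tilde{x}\mid x)$, $s_\theta$, $S(\theta)$, $C_1$, and $C_2$ are assumptions chosen for this sketch and need not match the notation of Proposition 3.2.

```latex
\begin{aligned}
& q_\sigma(\tilde{x}) = \int q_\sigma(\tilde{x}\mid x)\,p(x)\,\mathrm{d}x, \\[4pt]
& \mathbb{E}_{q_\sigma(\tilde{x})}\Big[\tfrac12\big\|s_\theta(\tilde{x})
   - \nabla_{\tilde{x}}\log q_\sigma(\tilde{x})\big\|^2\Big]
  = \mathbb{E}_{q_\sigma(\tilde{x})}\Big[\tfrac12\|s_\theta(\tilde{x})\|^2\Big]
   - S(\theta) + C_1, \\[4pt]
& S(\theta)
  = \int q_\sigma(\tilde{x})\,\big\langle s_\theta(\tilde{x}),
      \nabla_{\tilde{x}}\log q_\sigma(\tilde{x})\big\rangle\,\mathrm{d}\tilde{x}
  = \int \big\langle s_\theta(\tilde{x}),
      \nabla_{\tilde{x}} q_\sigma(\tilde{x})\big\rangle\,\mathrm{d}\tilde{x} \\
& \phantom{S(\theta)}
  = \iint p(x)\,\big\langle s_\theta(\tilde{x}),
      \nabla_{\tilde{x}} q_\sigma(\tilde{x}\mid x)\big\rangle\,\mathrm{d}x\,\mathrm{d}\tilde{x}
  = \mathbb{E}_{p(x)\,q_\sigma(\tilde{x}\mid x)}\Big[\big\langle s_\theta(\tilde{x}),
      \nabla_{\tilde{x}}\log q_\sigma(\tilde{x}\mid x)\big\rangle\Big], \\[4pt]
& \mathbb{E}_{p(x)\,q_\sigma(\tilde{x}\mid x)}\Big[\tfrac12\big\|s_\theta(\tilde{x})
   - \nabla_{\tilde{x}}\log q_\sigma(\tilde{x}\mid x)\big\|^2\Big]
  = \mathbb{E}_{q_\sigma(\tilde{x})}\Big[\tfrac12\|s_\theta(\tilde{x})\|^2\Big]
   - S(\theta) + C_2.
\end{aligned}
```

Since $C_1$ and $C_2$ do not depend on $\theta$, the two objectives differ only by a constant and therefore share the same minimizers. The steps that replace $q\,\nabla\log q$ by $\nabla q$ (and back) are the ones that use the derivative property of the log function.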

References

  • [1] M. Akçakaya, S. Moeller, S. Weingärtner, and K. Uğurbil (2019) Scan-specific robust artificial-neural-networks for k-space interpolation (RAKI) reconstruction: database-free deep learning for fast imaging. Magnetic Resonance in Medicine 81 (1), pp. 439–453.
  • [2] M. Akçakaya, B. Yaman, H. Chung, and J. C. Ye (2022) Unsupervised deep learning methods for biological image reconstruction and enhancement: an overview from a signal processing perspective. IEEE Signal Processing Magazine 39 (2), pp. 28–44.
  • [3] E. J. Candès, J. Romberg, and T. Tao (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory 52 (2), pp. 489–509.
  • [4] J. Cheng, Z. Cui, W. Huang, Z. Ke, L. Ying, H. Wang, Y. Zhu, and D. Liang (2021) Learning data consistency and its application to dynamic MR imaging. IEEE Transactions on Medical Imaging 40 (11), pp. 3140–3153.
  • [5] H. Chung and J. C. Ye (2022) Score-based diffusion models for accelerated MRI. Medical Image Analysis, pp. 102479.
  • [6] Z. Cui, J. Cheng, Q. Zhu, Y. Liu, S. Jia, K. Zhao, Z. Ke, W. Huang, H. Wang, Y. Zhu, et al. (2021) Equilibrated zeroth-order unrolled deep networks for accelerated MRI. arXiv preprint arXiv:2112.09891.
  • [7] Z. Cui, S. Jia, Q. Zhu, C. Liu, Z. Qiu, Y. Liu, J. Cheng, H. Wang, Y. Zhu, and D. Liang (2022) K-UNN: k-space interpolation with untrained neural network. arXiv preprint arXiv:2208.05827.
  • [8] D. L. Donoho (2006) Compressed sensing. IEEE Transactions on Information Theory 52 (4), pp. 1289–1306.
  • [9] M. A. Griswold, P. M. Jakob, R. M. Heidemann, M. Nittka, V. Jellus, J. Wang, B. Kiefer, and A. Haase (2002) Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magnetic Resonance in Medicine 47 (6), pp. 1202–1210.
  • [10] Y. Han, L. Sunwoo, and J. C. Ye (2020) k-Space deep learning for accelerated MRI. IEEE Transactions on Medical Imaging 39 (2), pp. 377–386.
  • [11] J. Ho, A. Jain, and P. Abbeel (2020) Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, Vol. 33, pp. 6840–6851.
  • [12] W. Huang, Z. Ke, Z. Cui, J. Cheng, and D. Liang (2021) Deep low-rank plus sparse network for dynamic MR imaging. Medical Image Analysis 73, pp. 102190.
  • [13] A. Jalal, M. Arvinte, G. Daras, E. Price, A. G. Dimakis, and J. Tamir (2021) Robust compressed sensing MRI with deep generative priors. In Advances in Neural Information Processing Systems, Vol. 34, pp. 14938–14954.
  • [14] Z. Ke, W. Huang, Z. Cui, J. Cheng, S. Jia, H. Wang, X. Liu, H. Zheng, L. Ying, Y. Zhu, and D. Liang (2021) Learned low-rank priors in dynamic MR imaging. IEEE Transactions on Medical Imaging, pp. 1–1.
  • [15] T. H. Kim, P. Garg, and J. P. Haldar (2019) LORAKI: autocalibrated recurrent neural networks for autoregressive MRI reconstruction in k-space. arXiv preprint arXiv:1904.09390.
  • [16] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. In Proceedings of the International Conference on Learning Representations.
  • [17] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila (2018) Noise2Noise: learning image restoration without clean data. arXiv preprint arXiv:1803.04189.
  • [18] J. Li, Y. Nan, and H. Ji (2022) Un-supervised learning for blind image deconvolution via Monte-Carlo sampling. Inverse Problems 38 (3), pp. 035012.
  • [19] D. Liang, J. Cheng, Z. Ke, and L. Ying (2020) Deep magnetic resonance image reconstruction: inverse problems meet neural networks. IEEE Signal Processing Magazine 37 (1), pp. 141–151.
  • [20] M. Lustig, D. L. Donoho, and J. M. Pauly (2007) Sparse MRI: the application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine 58 (6), pp. 1182–1195.
  • [21] M. Lustig and J. M. Pauly (2010) SPIRiT: iterative self-consistent parallel imaging reconstruction from arbitrary k-space. Magnetic Resonance in Medicine 64 (2), pp. 457–471.
  • [22] C. Millard and M. Chiew (2022) Self-supervised deep learning MRI reconstruction with Noisier2Noise. arXiv preprint arXiv:2205.10278.
  • [23] N. Moran, D. Schmidt, Y. Zhong, and P. Coady (2020) Noisier2Noise: learning to denoise from unpaired noisy data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  • [24] T. Pang, Y. Quan, and H. Ji (2020) Self-supervised Bayesian deep learning for image recovery with applications to compressive sensing. In European Conference on Computer Vision, pp. 475–491.
  • [25] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala (2019) PyTorch: an imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703.
  • [26] K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger (1999) SENSE: sensitivity encoding for fast MRI. Magnetic Resonance in Medicine 42 (5), pp. 952–962.
  • [27] Y. Quan, M. Chen, T. Pang, and H. Ji (2020) Self2Self with dropout: learning self-supervised denoising from single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  • [28] Y. Song and S. Ermon (2019) Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, Vol. 32.
  • [29] Y. Song and S. Ermon (2020) Improved techniques for training score-based generative models. In Advances in Neural Information Processing Systems, Vol. 33, pp. 12438–12448.
  • [30] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole (2021) Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations.
  • [31] P. Vincent (2011) A connection between score matching and denoising autoencoders. Neural Computation 23 (7), pp. 1661–1674.
  • [32] S. Wang, Z. Su, L. Ying, X. Peng, S. Zhu, F. Liang, D. Feng, and D. Liang (2016) Accelerating magnetic resonance imaging via deep learning. In 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI), pp. 514–517.
  • [33] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612.
  • [34] B. Yaman, S. A. H. Hosseini, S. Moeller, J. Ellermann, K. Uğurbil, and M. Akçakaya (2020) Self-supervised physics-based deep learning MRI reconstruction without fully-sampled data. In 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), pp. 921–925.
  • [35] Y. Yang, J. Sun, H. Li, and Z. Xu (2016) Deep ADMM-Net for compressive sensing MRI. In Advances in Neural Information Processing Systems.
  • [36] J. Zhang and B. Ghanem (2018) ISTA-Net: interpretable optimization-inspired deep network for image compressive sensing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [37] Z.-H. Zhou (2012) Ensemble methods: foundations and algorithms. Taylor & Francis. ISBN 9781439830031, LCCN 2012014555.