Unsupervised diffeomorphic cardiac image registration using parameterization of the deformation field

Ameneh Sheikhjafari sheikhja@ualberta.ca Deepa Krishnaswamy deepa@ualberta.ca Michelle Noga mnoga@ualberta.ca Nilanjan Ray1 nray1@ualberta.ca Kumaradevan Punithakumar1 punithak@ualberta.ca Department of Computing Science, University of Alberta, Edmonton, AB T6G 2G8, Canada Department of Radiology and Diagnostic Imaging, Edmonton, AB T6G 2G8, Canada
11Nilanjan Ray and Kumaradevan Punithakumar contributed equally as co–senior authors to this work.
Abstract

This study proposes an end-to-end unsupervised diffeomorphic deformable registration framework based on moving mesh parameterization. Using this parameterization, a deformation field can be modeled with its transformation Jacobian determinant and curl of end velocity field. The new model of the deformation field has three important advantages; firstly, it relaxes the need for an explicit regularization term and the corresponding weight in the cost function. The smoothness is implicitly embedded in the solution which results in a physically plausible deformation field. Secondly, it guarantees diffeomorphism through explicit constraints applied to the transformation Jacobian determinant to keep it positive. Finally, it is suitable for cardiac data processing, since the nature of this parameterization is to define the deformation field in terms of the radial and rotational components. The effectiveness of the algorithm is investigated by evaluating the proposed method on three different data sets including 2D and 3D cardiac MRI scans. The results demonstrate that the proposed framework outperforms existing learning-based and non-learning-based methods while generating diffeomorphic transformations.

Keywords: Deformable image registration, Diffeomorphic registration learning, Moving mesh grid generation, Unsupervised deep learning

keywords:
\KWDDeformable image registration, Diffeomorphic registration learning, Moving mesh grid generation, Unsupervised deep learning

Preprint submitted to Medical Image Analysis

1 Introduction

Deformable image registration plays a fundamental role in a variety of medical image analyses such as image guided-surgery han2021fracture, visual stabilization sheikhjafari2015robust, reconstruction liu2021rethinking, and the construction of many other image analysis problems krebs2019learning; haskins2019deep; sheikhjafari20153d. Many existing state-of-the-art deformable registration methods use traditional iterative algorithms, such as standard symmetric normalization (SyN) wu2018automated and log-domain based transformation mansi2011ilogdemons. Due to the important properties such as folding-free and invertiblity dalca2018unsupervised of diffeomorphic transformation, a wide range of researchers utilized diffeomorphisms by adding constraints to their formulation zhang2015finite; avants2008symmetric; vercauteren2008symmetric; punithakumar2013regional. These traditional algorithms are computationally expensive and do not learn the features from data to be registered. Recently, sheikhjafari2022training proposed a convolutional neural network (CNN) to model the optimization problem for deformable registration and shared the parameters through a temporal sequence. However, they still establish the displacement field via iterative optimization between images. Even though traditional deformable image registration techniques can generate promising mappings between images, most of these methods require users to identify parameters that match the characteristics of the problem and manually adjust regularization terms for each application to obtain accurate results.

In recent years, the popularity of learning-based registration algorithms has been increasing due to the lower computational costs and execution times krebs2018unsupervised. In supervised-learning methods, a CNN is trained using examples of medical images along with their ground truth transformations to predict the transformations directly on test images rohe2017svf; cao2017deformable. Even though the accuracy of these approaches is considerable, their performance is highly dependent on the quality of the ground truth sang2020imposing. One of the most significant challenges in applying the supervised methods to medical imaging applications is that the actual ground truth of a desired neural network output is not often available.

With that limitation in mind, several unsupervised learning-based image registrations have been proposed. Most of the unsupervised approaches use spatial transformer layers (STN) to warp the moving image in a differentiable way. In this way, an optimization can be performed by using a similarity metric based on the warped image jaderberg2015spatial; de2017end; de2019deep; balakrishnan2018unsupervised; sheikhjafari2018unsupervised.

When image registration is stated as an optimization of a similarity metric alone, it is commonly understood as an ill-posed problem. To tackle this problem, a regularization approach is commonly used. Without regularization, this may result in multiple and physically non-plausible solutions. For instance, it might lead to tissue folding and tearing of anatomical structures in images. Aside from that, while unsupervised approaches can perform well in minimizing a similarity metric between warped moving images and fixed images, important properties such as symmetry, diffeomorphism, and regularity of the retrieved deformation fields are still unclear and missing. rohlfing2011image; haber2004numerical.

Inspired by punithakumar2015right; punithakumar2017gpu; chen2010parameterization, we tackle aforementioned issues with the help of moving mesh parameterization which was originally designed to generate a suitable grid for solving partial differential equations haber2004numerical. We propose a ConvNet method based on unsupervised learning for deformable cardiac registration, which formulates the deformation field by the moving mesh approach. This parameterization naturally leads to a formulation of diffeomorphic image registration as a constrained optimization problem. It also bypasses the need for an explicit regularization term and the corresponding weight in the cost function. Such a strategy has been adopted in the demons algorithm, where unconstrained optimization is followed by Gaussian filtering to impose a smoothness constraint. Using the moving mesh grid generation, we can define a deformation field with its transformation Jacobian determinant and curl of end velocity field which make it appealing to image registration punithakumar2015right; punithakumar2017gpu. The new formulation of the deformation field ensures diffeomorphic properties on the deformation field by explicitly applying constraints on transformation Jacobian determinant to keep it positive. Since the heart motion could be decomposed of radial expansion and twisting garreau2006assessment, defining the deformation field in terms of radial and rotational components makes this formulation suitable for cardiac analysis bijnens2012myocardial.

2 Methodology

Most of the learning-based algorithms formulate the deformable registration problem as the minimization of the following equation:

(1)

where denotes the pixel location in the image domain , denotes the transformation function, and the dissimilarity metric is denoted by . With the above formulation, introducing a regularization is necessarily to obtain a unique solution. Without regularization, this may result in multiple physically non-plausible solutions.

In our setting, we tackle these issues with the help of the moving mesh parameterization.

2.1 Moving Mesh Grid Generation

To avoid adding extra terms to the above formulation and having a unique solution, more constraints are required to be added using a monitor function and curl of end velocity field .

First a continuous monitor function is defined and constrained by:

(2)

The goal here is to find a transformation : , such that the transformation Jacobian determinant is equal to the monitor function :

(3)

To find the transformation which satisfies 3, the following steps need to be taken,

Step 1: A vector field is defined such that:

(4)

Step 2: A velocity vector field based on artificial-time is then constructed from :

(5)

The desire transformation can be found by solving the following ordinary differential equation (ODE) at , where t=0

(6)

Where is the identity mapping and and . Since the is the desire transformation that we are looking for, we drop the subscript and use for the rest of the paper. \@dblfloatalgocf[hpt]     \end@dblfloat The main problem is how to find such that . There are different methods to solve this problem such as the div-curl system. To solve the problem with the div-curl system, we need to find the divergence and curl at each point and set up the div-curl system of equations for each point. By solving this system we can reconstruct a differentiable and invertible transformation.

(7)

To have a unique a constraint need to be applied to the div of the vector field , 7.

The generated transformation now can be parameterized with transformation Jacobian determinant and the curl of the end velocity field.

2.2 Diffeomorphic Image Registration

Using the above parameterization, the diffeomorphic image registration can be formulated as a constrained optimization problem.

Let and be 2D/3D fixed and moving images/volumes, defined over /. We need to find and , that optimize a similarity metric between the warped moving image and fixed image, subject to the following constraints:

(8)

where the is the upper bound and is the lower bound of the transformation Jacobian determinant which were set by the user. The guarantees the diffeomorphism.

Overview of end-to-end unsupervised architecture. The ConvNet
Figure 1: Overview of end-to-end unsupervised architecture. The ConvNet takes the input fixed image() and moving image() and outputs the transformation Jacobian determinant , and the vector field . Then the diffeomorphic forward and backward transformations and are computed using the moving mesh approach. Finally, the moving and fixed images are warped using and .

2.3 Numerical Methods

2.3.1 2D Div-curl solver

We represent the deformation field by divergence and curl (div-curl) system representation Cheng1989 (7). To find under the null condition we converted the (7) into a set of Poisson equations as follows and used a Fast Fourier Transform (FFT) based Poisson solver. As shown in (9) the radial component is given by and the rotational components is given by :

(9)

2.3.2 3D Div-curl solver

The div-curl system for the 3D case is given in Equation (7). Where the divergence of the deformation field represents the radial motion while the curl operator represents the rotation of the media around every point. The 3D operator directly extends from the 2D curl, where each rotational component represents the rotational motion of the deformation field about each of the three axes. As it shown in (10) the radial component is given by and the three rotational components are given by , and . For the 3D version, there ate three unknowns () with four scalar equations which makes this system overdetermined. Furthermore, a dummy variable is introduced to solve the system. (please check liu2006new for more details.)

(10)

Similar to the 2D version, we converted the (10) into a set of Poisson equations as follows:

(11)

Then the Euler method with arbitrary time steps is used to compute the transformation from via (5) and (6). For derivation and numerical implementation details, we refer the reader to liu2006new

2.4 Data driven parameter computation

Despite the traditional methods that iteratively and manually compute the parameters and update the gradient chen2010parameterization; punithakumar2017gpu which are time-consuming, we use an unsupervised CNN and back-propagation Algorithm LABEL:alg6:cnn_alg. In the proposed framework, the network parameters are learnt in an unsupervised fashion and a diffeomorphic deformation field is generated by moving mesh parameterization Figure 1.

As shown in Figure 1, the network takes and as input and outputs the monitor function and the velocity vector filed . Then using the curl of end velocity and a div-cur system a diffeomorphic transformation is computed. To establish the uniqueness of the solution the Dirichlet boundary condition is used zhou2006uniqueness. Additionally, a diffeomorphism, which is corresponded to a positive transformation Jacobian determinant, is enforced explicitly via the monitor function liu2006new. All of the steps are designed to be differentiable and the network parameters are learnt using stochastic gradient descent optimization.

2.5 Registration

To train the framework a set of pair images were given. Then using the monitor function and curl of end velocity, the desire was computed. Finally, the moving image was warped to have the minimum dissimilarity with fixed image . For each pair of image, we simultaneously calculated the forward transformation which registers the fixed image to moving image and the backward transformation which registers the moving image to fixed image . A symmetric loss function is used as follows:

(12)

Where is the forward transformation and is the backward transformation.

The registration process is performed pairwise on both 2D images and 3D volumes. In the cardiac data sets, the end-diastolic and end-systolic images are passed to the proposed framework as input to compute the forward transformation and the reverse transformation . For the 2D version the mean squared error (MSE) and for the 3D version the normalise cross correlation (NCC) is used as dissimilarity metric.

3 Experiments

We perform a series of experiments to evaluate the registration accuracy of the proposed diffeomorphic CNN method against the state-of-the-art methods. The evaluations were performed over three data sets consisting of clinical 2D cardiac MR images to assess the performance of the 2D version of our method. We also evaluated the 3D version of the proposed framework using ACDC data set in 3D.

3.1 Data sets

The following three data sets are considered in this study:

Automated Cardiac Diagnosis Challenge (ACDC) bernard2018deep

This data set contains multiple temporal 2D short-axis cardiac cine MRI sequences acquired from 100 patients and is one of the publicly available data sets for cardiac MRI assessment. The spatial resolution varies from to with a slice thickness of 5 mm to 8 mm (in average 5mm). The testing set contained 20 cases of each of the following cardiac diseases: dilated cardiomyopathy (DCM), hypertrophic cardiomyopathy (HCM), previous myocardial infarction (MINF), abnormal right ventricle (RV) and healthy (Normal). The images are cropped to a size of , and padded the third dimension to for the 3D voxels.

The Sunnybrook Cardiac Challenge data (SCD) radau2009evaluation

This data set contains multiple temporal 2D short-axis cardiac cine MRI scans acquired from 45 patients. Each cine sequence includes 20 frames to cover the cardiac cycle. The data set is equally divided into 15 patient scans for training, 15 patient scans for validation, and 15 patient scans for testing. The image resolution is , with a pixel spacing of 1.25 mm and slice thickness of 8 mm.

Left Atrium (LA)

This data set includes 100 temporal 2D long-axis cine MRI steady-state sequences from the 2, 3 and 4-chamber views, acquired from the University Alberta Hospital. Each cycle includes 25 or 30 frames with image resolutions and image spacing mm. The ground truth manual segmentation is initially performed by a medical student and edited by an experienced radiologist. The 2ch, 3ch and 4ch are used in the rest of the paper to denote 2, 3 and 4-chamber sequences, respectively. The results are compared on end-diastolic and end-systolic frames.

3.2 Quantitative Evaluation Metrics

The proposed method is evaluated quantitatively using four metrics, namely, Dice metric (DM), Hausdorff distance (HD in mm), determinant of Jacobian of the deformation field , and reliability .

Dice Metric The DM dice1945measures is a segmentation-based metric to measure the similarity (overlap) between two regions, warped moving and fixed images. Where the Dice score of 1 indicates complete overlap and Dice score of 0 indicates no overlap. The DM of two regions A and B is formulated as:

(13)

Hausdorff Distance The HD huttenlocher1993comparing is another metric which measures the maximum deviation between two regions’ contours. The HD between two contours and is formulated as:

(14)

where denote the set of all the points in and respectively. The term denotes the Euclidean distance.

Reliability: We also evaluated the performance of the proposed algorithm using a reliability function computed based on DMs for each data set. The complementary cumulative distribution function is defined for each as the probability of obtaining higher than overall volumes.

(15)

measures how reliable the algorithm is in yielding accuracy .

det(J): To analyze deformation regularity in different algorithms, we calculate the determinant of the Jacobian ashburner1999high. If the value of equals , the area remains constant after the transformation, whereas the value smaller or larger than indicates the local area shrinkage or expansion, respectively. The negative value of implies that local folding and twisting have occurred, which are physically not realizable and mathematically not invertible dalca2018unsupervised.

Method Dice HD
Undeformed 0.71 0.15 10.1
Demonyoo2002engineering 0.76 0.10 8.3 0.27 0.36
SyNavants2008symmetric 0.80 0.09 8.1 0.28 0.66
LPMkrebs2019learning 0.79 0.10 7.6 0.38 0.46
MMpunithakumar2017gpu 0.83 0.15 5.64 0 0.81
Elastixmarstal2016simpleelastix 0.84 0.14 4.51 0.12 0.82
Proposed Method 0.88 0.11 3.85 0 0.89
Table 1: Quantitative evaluation of the results for cardiac MRI registration on the 2D ACDC data set. The following metrics are reported for each method: The Dice score (mean standard deviation), Hausdorff distance , the percentage of the number of pixels with negative Jacobian determinant , and reliability . Smaller values of and larger values of indicate more accurate results. Also the smaller indicates less mesh folding. The higher probability values of show that more patients have the dice score higher or equal to . Values that are highlighted in bold indicate the metric that gave the best performance compared to the other algorithms.
(a) 2ch Methods Dice HD Undeformed 0.79 0.07 7.37 Demonsmccormick2014itk 0.84 0.08 7.41 0.38 0.89 SyNavants2008symmetric 0.87 0.06 6.38 0.18 0.95 MMpunithakumar2017gpu 0.84 0.06 6.58 0 0.92 Elastixmarstal2016simpleelastix 0.82 0.11 7.28 0.28 0.74 Proposed Method 0.88 0.04 6.54 0 0.95 (b) 3ch Methods Dice HD Undeformed 0.78 0.08 7.70 Demonsmccormick2014itk 0.85 0.06 7.33 0.36 0.94 SyNavants2008symmetric 0.86 0.13 7.52 0.21 0.93 MMpunithakumar2017gpu 0.83 0.06 6.48 0 0.88 Elastixmarstal2016simpleelastix 0.86 0.10 6.82 0.26 0.9 Proposed Method 0.87 0.05 6.3 0 0.94 (c) 4ch Methods Dice HD Undeformed 0.78 0.09 8.66 Demonsmccormick2014itk 0.82 0.10 7.84 0.43 0.77 SyNavants2008symmetric 0.84 0.11 7.51 0.20 0.86 MMpunithakumar2017gpu 0.83 0.08 6.77 0 0.87 Elastixmarstal2016simpleelastix 0.82 0.10 7.56 0.38 0.64 Proposed Method 0.87 0.05 6.1 0 0.99
Table 2: Quantitative evaluation of the results for cardiac MRI registration on the 2D LA data set. The following metrics are reported for each method: The Dice score (mean standard deviation), Hausdorff distance , the percentage of the number of pixels with negative Jacobian determinant , and reliability . The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber. Values in bold indicate the best performance.
Method Dice HD
Undeformed 0.62 0.15 16.02
Demonsmccormick2014itk 0.68 0.18 12.46 0.4 0.36
SyNavants2008symmetric 0.81 0.16 8.9 0.02 0.70
LPMkrebs2019learning 0.78 0.08 7.6 0.38 0.63
MMpunithakumar2017gpu 0.72 0.12 12.53 0 0.59
Elastixmarstal2016simpleelastix 0.79 0.08 11.12 0.37 0.62
Proposed Method 0.88 0.09 5.25 0 0.90
Table 3: Quantitative evaluation of the results for cardiac MRI registration on the 2D SCD data set. The following metrics are reported for each method: The Dice score (mean standard deviation), Hausdorff distance , the percentage of the number of pixels with negative Jacobian determinant , and reliability . Smaller values of and larger values of indicate more accurate results. Also the smaller indicates less mesh folding. The higher probability values of show that more patients have the dice score higher or equal to . Values that are highlighted in bold indicate the metric that gave the best performance compared to the other algorithms.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Figure 2: Samples of registered images on the left atrium data set with the corresponding deformation field grid (DF). The end-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively. The 2ch, 3ch and 4ch stand for the 2, 3 and 4-chamber.
Samples of registered images on the SCD with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the SCD with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the SCD with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the SCD with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the SCD with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the SCD with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the SCD with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the SCD with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the SCD with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the SCD with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Figure 3: Samples of registered images on the SCD with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the ACDC with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the ACDC with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the ACDC with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the ACDC with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the ACDC with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the ACDC with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the ACDC with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the ACDC with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the ACDC with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Samples of registered images on the ACDC with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Figure 4: Samples of registered images on the ACDC with the corresponding deformation filed grid (DF). End-systolic (ES) frame is the moving image and end-diastolic (ED) frame is the fixed image. The warped ES of each row is shown in the third column. The last column labeled ground truth (GT) displays the true segmentation and the predicted segmentation, which are shown by the green line and blue line respectively.
Method Dice HD
Undeformed 0.71 0.145 10.1
Demonyoo2002engineering 0.80 0.17 8.3 0.34 0.28
SyNavants2008symmetric 0.80 0.091 8.1 0.17 0.51
LPMkrebs2019learning 0.81 0.085 7.3 0.12 0.52
LapIRN mok2020large 0.72 0.162 7.4 0 0.35
MMKrishnaswamy2021 0.75 0.156 7.03 0 0.56
Elastixmarstal2016simpleelastix 0.83 0.161 5.75 0.09 0.60
Proposed Method 0.84 0.06 5.3 0 0.78
Table 4: Quantitative evaluation of the results for cardiac MRI registration on the 3D ACDC data set. The following metrics are reported for each method: The Dice score (mean standard deviation), Hausdorff distance , the percentage of the number of pixels with negative Jacobian determinant , and reliability . Smaller values of and larger values of indicate more accurate results. Also the smaller indicates less mesh folding. The higher probability values of show that more patients have the dice score higher or equal to . Values that are highlighted in bold indicate the metric that gave the best performance compared to the other algorithms.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.
Figure 5: 2D registration results for five example patients, where the first column is the end-systolic image and the second column is the end-diastolic image. The grid deformations in the 3rd column displays the deformation from end-systole to end-diastole, while the last column displays the deformation from end-diastole to end-systole. The color represents the value of the Jacobian determinant, where red indicates values below 0, which is where mesh folding occurs. It can be seen that using the proposed method, no mesh folding occurs.

3.3 Baseline Methods

We compared the performance of the proposed framework with state-of-the-art algorithms, SimpleElastix (Elastix) marstal2016simpleelastix,(MM) punithakumar2017gpu; Krishnaswamy2021, Fast Symmetric Forces Demons (Demons) mccormick2014itk, Symmetric Normalization avants2008symmetric which are optimization based methods and diffeomorphic learning-based methods LPM krebs2019learning and LapIRN mok2020large.

3.3.1 2D Image Registration Results

Tables 1, 2 and 3 provide a summary of the results of the proposed method, the mean and standard deviations of DM, HD, the percentage of the number of pixels with negative Jacobian determinant , and reliability on the held out test set on ACDC, LA, and SCD data sets, respectively. Figures 2, 3, 4 show samples of registered images on the LA, SCD, and ACDC data sets with the corresponding deformation field grid. End-systolic frame is the moving image and end-diastolic frame is the fixed image. The registered image of each row is shown in the third column. Also, the true and predicted segmentation maps are shown by the green and blue line respectively. For each new 2D pair of images, the registration process takes an average of seconds on a GPU.

The ACDC data set is originally a 3D data set where a set of 2D axial slices are stacked to form a 3D volume. To evaluate the 2D version of the proposed framework on ACDC, we computed 2D metrics on each slice separately and aggregated the results over all slices to obtain the final values reported in Table 1.

The presented method shows a better performance among the all compared methods in all aspects e.g., there is a noticeable difference between the obtain Dice score and Hausdorff distance. As can be seen, the improvement is not just limited to these two parameters, the Jacobian determinant is zero which means there is no folding or twisting in the transformation. This is in contrast to other methods where the determinant Jacobian is non-zero.

Figure. 5 shows the end-diastolic and end-systolic images and the determinant of the Jacobian () with grid overlay for five example patients. As shown in all tables and Figure. 5, no negative values were observed on the test data for the proposed method which means our approach produced smooth and regular deformations.

3.3.2 3D Image Registration results

The publicly available Automated Cardiac Diagnosis Challenge (ACDC) data set was employed for the evaluation of the proposed 3D-to-3D registration algorithm. Table 4 provides a summary of the results of the proposed method on the ACDC data set. The presented method displays a better performance among all the compared methods in all aspects e.g., there is a noticeable difference between the obtained Dice score and Hausdorff distance. Also, the higher probability values of proves that the proposed method is more reliable than the other compared methods since more patients have the dice score higher or equal to . In addition, similar to 2D version, the Jacobian determinant is also zero in 3D version which means there is no mesh folding in the transformation. The registration process takes an average of seconds on a GPU to register an unseen 3D pair of images.

Figure 6 displays a correlation plot, where the ground truth volume in mL is plotted against the volume from the proposed method. The clustering of the dots to the reference yellow line indicates the high agreement between the proposed method to the ground truth. The analysis produced a Pearson correlation coefficient of 0.98.

3.3.3 Implementation and Parameters Analysis

The proposed method is implemented in Python programming language using Pytorch module. The network is designed based on a UNet-style architecture ronneberger2015u which includes a convolutional layer with 16 filters, three downsampling layers with 32,64,64 convolutional filters and a stride of two, and upsampling convolutional layers with 64,64,32,32,32,16 filters. The Adam optimization with learning rate of is used for all the three datasets. The proposed framework is evaluated on an NVIDIA GeForce GTX 1080 Ti GPU.

To guarantee the diffeomorphism and keep the transformation determinant Jacobian positive, different activation functions are used to apply constraints on and and keep their range in and respectively. Where can be any value in range of , we set in our experiment. Then using the two hyper-parameters, lower bound and upper bound of the transformation Jacobian determinant , the user can control the amount of movement which directly affects the evaluation metrics. By increasing the values of and , each node in a grid (each pixel) can have a larger displacement; however, after a certain point, the results do not change significantly. We vary the precision , and set them to , respectively. The chosen values resulted in the best Dice score and HD distance.

The ground truth volume in mL plotted against the volume from the proposed method, where each patient is represented by a blue dot. The yellow dotted line indicates the y=x line for reference. The Pearson correlation coefficient calculated is 0.98, revealing a high correlation of the proposed method to the ground truth.
Figure 6: The ground truth volume in mL plotted against the volume from the proposed method, where each patient is represented by a blue dot. The yellow dotted line indicates the y=x line for reference. The Pearson correlation coefficient calculated is 0.98, revealing a high correlation of the proposed method to the ground truth.

4 Conclusion

In this work, we build a principled connection between classical registration methods and recent learning-based approaches. We propose an end-to-end framework for diffeomorphic image registration and derive a learning algorithm that leverages a convolutional neural network and unsupervised learning for fast runtime. To achieve diffeomorphic transforms, we integrate a new parameterization of deformation fields for 2D-to-2D and 3D-to-3D diffeomorphic registration algorithm for the application of MRI cardiac registration, which describe a deformation field with its transformation Jacobian determinant and curl of the end velocity field. It also relaxes the need for an explicit regularization to produce a physically plausible result, as smoothness is implicitly embedded in the solution. Removing explicit regularization makes the need for an empirical trade-off between the similarity term and the regularization term, which may cause bias beg2005computing, unnecessary.

Furthermore, by directly requiring the transformation Jacobian to be positive, the deformation can be ensured to be diffeomorphic. The other desirable constraints also can be enforced within the same framework using an explicit restriction on the transformation Jacobian such as incompressibility constraint. Additionally, the proposed parameterization naturally describes a deformation field in terms of radial and rotational components, making it especially suited for processing cardiac data noble2002myocardial. Our algorithm can infer the registration of new image pairs in under a second, which is significantly faster than traditional iterative methods. Compared to recent learning-based methods, our method offers a guarantee of a diffeomorphic transform.

The proposed algorithm was evaluated on end-diastolic to end-systolic cardiac cine-MRI registration on two publicly available ACDC Challenge Bernard2018 and Sunnybrook data sets (SCD)radau2009evaluation as well as a set of left atrium images obtained from the Mazankowski Alberta Heart Institute. The proposed algorithm is diffeomorphic, allowing it to capture the true deformation of the cardiac tissue. Observing the percentage of voxels with a Jacobian determinant less than zero, most of the other registration methods yielded mesh folding for either the MRI data sets. The presence of mesh folding may result in the inability of these methods to capture the true anatomical motion.

Acknowledgment

The authors wish to thank Alberta Innovates for the AICE Concepts funding that supported this research work.

References