Fast Auto-Differentiable Digitally Reconstructed Radiographs for Solving Inverse Problems in Intraoperative Imaging

Vivek Gopalakrishnan 1Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA, USA
12Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
2{vivekg,polina}@csail.mit.edu
   Polina Golland 1Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA, USA
12Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
2{vivekg,polina}@csail.mit.edu
Abstract

The use of digitally reconstructed radiographs (DRRs) to solve inverse problems such as slice-to-volume registration and 3D reconstruction is well-studied in preoperative settings. In intraoperative imaging, the utility of DRRs is limited by the challenges in generating them in real-time and supporting optimization procedures that rely on repeated DRR synthesis. While immense progress has been made in accelerating the generation of DRRs through algorithmic refinements and GPU implementations, DRR-based optimization remains slow because most DRR generators do not offer a straightforward way to obtain gradients with respect to the imaging parameters. To make DRRs interoperable with gradient-based optimization and deep learning frameworks, we have reformulated Siddon’s method, the most popular ray-tracing algorithm used in DRR generation, as a series of vectorized tensor operations. We implemented this vectorized version of Siddon’s method in PyTorch, taking advantage of the library’s strong automatic differentiation engine to make this DRR generator fully differentiable with respect to its parameters. Additionally, using GPU-accelerated tensor computation enables our vectorized implementation to achieve rendering speeds equivalent to state-of-the-art DRR generators implemented in CUDA and C++. We illustrate the resulting method in the context of slice-to-volume registration. Moreover, our simulations suggest that the loss landscapes for the slice-to-volume registration problem are convex in the neighborhood of the optimal solution, and gradient-based registration promises a much faster solution than prevailing gradient-free optimization strategies. The proposed DRR generator enables fast computer vision algorithms to support image guidance in minimally invasive procedures. Our implementation is publically available at https://github.com/v715/DiffDRR.

Keywords:
DRRs Differentiable programming Inverse problems.

1 Introduction

Digitally reconstructed radiographs (DRRs) are simulated 2D X-ray images generated from 3D computational tomography (CT) volumes using a variety of ray-tracing techniques. While DRRs are widely used in preoperative settings (e.g., optimizing dose delivery in radiation oncology), many potentially valuable intraoperative use cases (e.g., real-time multimodal registration for image-guided procedures) are infeasible due to computational bottlenecks in generating DRRs and using them in slice-to-volume registration. Most open-source CPU-based implementations take about to generate a single DRR [9], which is not fast enough for intraoperative imaging systems with sampling rates of about 7.5 frames per second [8]. Numerous GPU-accelerated DRR generators have been proposed with run times on the order of [1, 5, 7], but to the best of our knowledge, no publically available implementation exists.

The second limitation of currently available DRR generators is that it is challenging to efficiently compute derivatives using these renderers because they are implemented in low-level languages such as CUDA and C++. DRR generators are often used in conjunction with numerical optimization schemes to solve fundamental medical imaging problems (e.g., slice-to-volume registration), and the difficulty in computing derivatives means that gradient-based optimization techniques are often infeasible [13]. While many end-to-end deep learning approaches can solve X-ray to CT registration problems with high accuracy [2, 3], these methods often require large amounts of training data, which can make them impractical for specialized interventional problems. Instead, many applications use iterative gradient-free methods, such as the Nelder-Mead method [11], to optimize an image similarity metric with respect to the parameters of the DRR generator [3, 13]. While these methods are effective for optimizing highly nonlinear loss functions, we show that the loss landscapes for slice-to-volume registration in particular is convex in a large region around the optimum, making this problem better suited for gradient-based optimization methods.

We present a fast vectorized renderer that generates DRRs and their derivatives with respect to image geometry parameters automatically. We utilize PyTorch as a GPU-accelerated tensor algebra library with robust source-to-source automatic differentiation to implement differentiable DRRs. That is, using our implementation, DRR generation can be used as a differential operator to train deep learning algorithms for fast reconstruction and registration algorithms. We analyze the performance of our implementation and the correctness of the automatically obtained derivatives, and demonstrate an experiment where our differentiable DRR generator solves a slice-to-volume registration problem. Our hope is that this open-source package will be useful for translating computer vision algorithms to real-time implementations for interventional applications.

2 Methods

We start by summarizing Siddon’s method [10], commonly used for ray-tracing in DRR synthesis, and its extensions that accelerate rendering speed. We then describe our vectorized implementation of Siddon’s method, which achieves rendering speeds equivalent to those of existing GPU-accelerated methods while also being fully differentiable.

 (a) We assume an idealized model of a projectional radiography imaging system: X-ray beams are emitted with a fixed initial energy from a point source
Figure 1: DRR synthesis. (a) We assume an idealized model of a projectional radiography imaging system: X-ray beams are emitted with a fixed initial energy from a point source , beam energy diminishes as the X-rays travel through the CT volume , and energy in the attenuated beams is measured when the X-rays hit a point on the detector , producing the DRR. (b) In Siddon’s method, the image location value at is a weighted average of the intensities of the voxels through which the ray passes, where the weight is the length of the ray’s intersection with the voxel. The values and parameterize the intersection of the ray with two adjacent planes, and the midpoint identifies the current voxel through which the ray is passing. (c) Our vectorized Siddon’s method generates a DRR in on an NVIDIA GeForce RTX 2080 Ti.

2.1 DRR Generation

The process of generating a DRR models the geometry of an idealized projectional radiography system (Fig. 1a). Let be the X-ray source, and be a target pixel on the detector plane. Then is a ray that originates from (), passes through the imaged volume, and hits the detector plane at (). The total energy attenuation experienced by the X-ray by the time it reaches pixel is given by the following line integral:

(1)

where is the imaged volume. The term endows the unit-free with the physical unit of length. For DRR synthesis, is approximated by a discrete 3D CT volume, and Eq. (1) becomes

(2)

where parameterizes the locations where ray intersects one of the orthogonal planes comprising the CT volume, and is the number of such intersections (Fig. 1b). Note that this model does not account for patterns of reflection and scattering that are present in real X-ray systems. While these simplifications preclude synthesis of realistic X-rays, the model in Eq. (2) has been widely and successfully used in slice-to-volume registration [13]. Additionally, our approach of vectorizing DRR generation might also be interoperable with more sophisticated image synthesis models, an extension we examine further in the Discussion.

2.2 Siddon’s Method and Its GPU Extensions

Siddon’s method [10] provides a parametric method to identify the plane intersections . Let be the CT voxel size in the -direction and be the location of the -th plane in this direction. Then the intersection of ray with the -th plane in the -direction is given by

(3)

with analogous expressions for and . We can use Eq. (3) to compute the values for all the intersections between and the planes in the -direction:

where and denote the first and last intersections of with the -direction planes. Defining and analogously, we construct the array

(4)

which contains values of parameterizing the intersections between and the orthogonal -, -, and -directional planes. We substitute values in the sorted set into Eq. (2) to evaluate , which corresponds to the intensity of pixel in the synthesized DRR.

A faster variant determines consecutive intersecting planes iteratively [4]. For example, the value of at the second plane intersected by is given by . The algorithm iteratively finds the next value of until we reach the edge of the CT volume, making this approach more memory efficient by requiring fewer intermediate values to be stored. This modified algorithm, known as Siddon-Jacobs’ method, is commonly implemented in CUDA and C++ to create multi-threaded GPU-accelerated DRR generators that exploit data parallelism by assigning each thread to trace an independent ray intersecting the detector plane [1, 5, 7].

2.3 Vectorizing Siddon’s Method

While Siddon-Jacobs’ method is more memory efficient, the iterative loop it relies on is not amenable to implementations in vectorized tensor algebra libraries. Thus we vectorize the original Siddon’s method as follows. Let contain the 3D pixel coordinates of a DRR with dimension . We compute the values for intersections with all of the -, -, and -planes for all in parallel:

(5)

where are the dimensions of the CT volume , are the CT voxel indices, are the CT voxel sizes, and and are the Hadamard product and division operators, respectively. Rather than explicitly compute the indices , , and for each ray, as is done in Siddon’s original method, we instead compute the minimum and maximum values of , corresponding to when each ray enters and exits the volume:

where . We filter to include only values in the range and sort each row for . Finally, we evaluate Eq. (2) with this sorted tensor to compute the intensity for each pixel in the DRR, completing a chain of vectorized tensor operations.

Because we reformulated the original Siddon’s method as a series of tensor operations, our vectorized version benefits from the mature GPU compilers and memory allocators developed for optimizing large-scale deep learning models. For empirical evaluation of our method, we also implemented a partially-vectorized version of Siddon-Jacobs’ method in which the updates are still computed iteratively (i.e., with a loop), but the updates are applied in a vectorized form to every target pixel in the detector plane.

2.4 Differentiating DRRs with Respect to Imaging Parameters

We specify the 3D position of the X-ray source and detector plane relative to the CT volume with the following seven geometric parameters: radius that acts as a scaling factor; three rotational degrees of freedom (DoF) ; and three translational DoF . Using spherical coordinates, we express the position of the X-ray source as , where is half of the source-to-detector distance, and and are the azimuthal and polar angles, respectively. We assume the detector plane is tangent to this implied sphere at the point opposite . The orientation of this plane is determined by a rotation about the -axis by the angle . We add the translation to the coordinates of the X-ray source and detector plane to create a reference frame wherein the patient is not perfectly centered relative to the X-ray scanner.

Since every step of our pipeline, from the generation of the pixels on the detector plane to the computation of the pixel intensities, is performed in PyTorch’s tensor framework, the resulting DRRs are differentiable with respect to the parameters described above. That is, the gradient of the loss function evaluated for the DRR with respect to the parameters is obtained automatically for any differentiable .

3 Experiments

We evaluate the proposed efficient implementation of the algorithm on a reference chest CT scan from Slicer3D [6]. We compare the performance of the method to two baseline approaches and illustrate its application for gradient-based slice-to-volume registration.

3.1 Performance Analysis

We compare our vectorized GPU version of Siddon’s method (VGS) to two baseline approaches: a widely used CPU implementation in the Plastimatch package (CP) [9], and our vectorized GPU implementation of Siddon-Jacobs’ method (VGSJ) which we described in Section 2.3. Note that the CUDA-accelerated DRR generator in Plastimatch is not working at the time of publication. GPU benchmarks are run on an NVIDIA GeForce RTX 2080 Ti, and CPU benchmarks are run on an 18-core Linux computer with Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz processors.

3.1.1 Results.

Table 1 summarizes statistics for the run time and accuracy of our method and the two baseline approaches, as well as intensity differences between our method and Plastimatch, and gradient differences between our method and FFD. Our implementation is much faster than Plastimatch, which is to be expected as Plastimatch is executed on the CPU. Numerically, the two implementations are very similar with an average root-mean-square error (RMSE) of (8.31.9) where the images are normalized to the range of . For DRRs smaller than , our method is faster than the vectorized version of Siddon-Jacobs’ despite our method’s high memory requirements. However, at , they have roughly equivalent run times. For smaller image sizes, our DRR generator achieves equivalent run times to previously reported GPU-accelerated implementations [1, 5, 7]. The gradients obtained via PyTorch auto-differentiation for our method are within 0.050.01 of those computed via forward finite differences with a step size of and are an order of magnitude more efficient to generate (35.1 ms ± 73.3 µs vs 400.5 ms ± 821.4 µs).

Timing (ms) RMSE Autograd vs FFD
VGS VGSJ CP VGS vs CP VGS
100 17.6 0.05 380 20.9 1028 132 (6.92.2) 0.03 ± 0.02
200 72.7 0.01 424 4.2 1784 488 (8.72.7) 0.06 ± 0.01
300 165 0.13 432 19.2 2941 821 (6.41.2) 0.08 ± 0.02
400 296 0.06 425 2.8 6472 643 (9.01.9) 0.03 ± 0.005
500 453 41.2 425 4.6 8472 478 (11.74.3) 0.07 ± 0.006
Table 1: Benchmark results. The dimension of the DRRs is . Each metric is averaged over runs. (VGS = Vectorized GPU Siddon’s method, VGSJ = Vectorized GPU Siddon-Jacobs’ method, and CP = CPU Plastimatch.)

3.2 DRR-based Gradient Descent for Slice-to-Volume Registration

We use our auto-differentiable DRR generator to implement slice-to-volume registration with synthetic DRRs. Specifically, we generate a fixed DRR from a set of ground truth parameters , and generate a second moving DRR from a set of random initial parameters . We use basic gradient descent to minimize the negative Zero-Normalized Cross-Correlation (ZNCC) between the fixed DRR and the moving DRR.

3.2.1 Image Similarity Metrics are Locally Convex.

First, we conduct a simulation study to show that the loss landscape generated by negative ZNCC is convex in a neighborhood around . We generate moving DRRs by sampling rotational and positional displacements. We sample all parameters uniformly from ranges of for and , for , and for , , and around the ground truth parameters . Although our generator provides gradients with respect to other model parameters like the source-to-detector distance () and the DRR’s dimensions and pixel spacing (), we assume that those parameters are fixed (e.g., provided in the DICOM header), and instead focus our analysis on this 6 DoF registration problem. We observe that negative ZNCC is locally convex (Fig. 2), suggesting that it would be an apt loss function to optimize with gradient descent. We observed similar loss landscapes for the norm.

 Rotational and positional displacements were sampled uniformly from ranges of
Figure 2: Negative ZNCC is convex around the optimal DRR parameters. Rotational and positional displacements were sampled uniformly from ranges of for and , for , and for , , and .
 We generated a moving DRR from randomly initialized parameters and used gradient descent to maximize similarity with a fixed DRR. Convergence was achieved for 745/1000 simulated DRRs in an average 66 iterations (
Figure 3: Differentiable DRRs can be used to perform slice-to-volume registration. We generated a moving DRR from randomly initialized parameters and used gradient descent to maximize similarity with a fixed DRR. Convergence was achieved for 745/1000 simulated DRRs in an average 66 iterations (). Examples of the optimization process are visualized at initial, intermediate (the 20th and 40th iterations), and final steps. DRRs for which convergence fails to occur get stuck in a local minimum with low negative ZNCC.

3.2.2 Differentiable DRR Registration Converges Quickly.

Given a fixed DRR and a moving DRR, we optimized the parameters of the moving DRR with a basic implementation of gradient descent. We used different update rates for the rotational and translational parameters because they have different units ( and ), and momentum . Additionally, to investigate the local minima in which our gradient descent algorithm could get stuck, we expanded the space of possible initializations beyond a convex neighborhood to for , , and , and for , , and . The size of the CT volume is .

For each randomly initialized DRR, we ran 250 iterations of gradient descent and determined that the moving DRR had converged to the fixed DRR if the negative ZNCC between the two images was less than . If this did not occur within 250 iterations, we treated the run as having failed to converge. Of the 1,000 randomly initialized DRRs we generated, 745 converged and 255 failed to converge (Fig. 3a). The initializations that converged solved the 6 DoF slice-to-volume registration problem iterations ( ).

We visualize multiple optimization steps for our gradient descent registration algorithm (Fig. 3c). From the examples that converged, our model successfully recovers the true pose parameters from challenging initializations, reaching a more reasonable estimate by the intermediate 20th iteration (Fig. 3b). In one example of an initialization that failed to converge, we see that the model gets stuck in a local minimum (Fig. 3c). In the final iteration in this example, the estimated DRR is orthogonal to the sagittal plane. The geometry of this scene looks similar to the coronal DRR (ground truth), giving the illusion of two lungs. We emphasize that our goal is not to propose a novel registration algorithm, but to provide an efficient DRR synthesis procedure that can support numerous downstream optimization applications, including registration.

4 Discussion

We present a fast auto-differentiable DRR generator that can be used to solve inverse problems in intraoperative imaging. By reformulating Siddon’s method for ray-tracing through a CT volume as series of vectorized tensor operations, we obtain gradients of image loss functions with respect to DRR generating parameters while achieving rendering speeds equivalent to multi-threaded GPU-accelerated generators written in low-level languages such as CUDA and C++. This approach promises to enable fast solutions to DRR-based optimization problems with gradient methods, a strategy which was previously infeasible due to the inefficiency of generating gradients with finite differences. We demonstrate the effectiveness of this approach by solving a 6 DoF slice-to-volume registration problem using a locally convex image loss function.

Our future research on auto-differentiable DRRs will investigate their use in a deep learning framework to pre-train for specific intraoperative imaging tasks (e.g., spatiotemporal registration in interventional cardiology). Such pre-training could yield even faster solutions to inverse problems by finding better initializations or powering end-to-end models. One issue that might limit the effectiveness of such pre-training is that DRRs generated using Siddon’s method are not realistic because they do not model any form of scattering. We will investigate fusing our method with DeepDRR [12], a deep learning framework that estimates scattering effects using Monte Carlo simulations, to produce DRRs that are simultaneously realistic, fast, and differentiable.


Acknowledgements. The authors thank Ruby Liu and Allie Forman for helpful feedback and support. This work was supported, in part, by NIH NIBIB 5T32EB1680, NIH NIBIB NAC P41EB015902, and NIH NINDS U19NS115388.

References

  • [1] M. De Greef, J. Crezee, J. Van Eijk, R. Pool, and A. Bel (2009) Accelerated ray tracing for radiotherapy dose calculations on a gpu. Medical physics 36 (9Part1), pp. 4095–4102. Cited by: §1, §2.2, §3.1.1.
  • [2] J. Esteban, M. Grimm, M. Unberath, G. Zahnd, and N. Navab (2019) Towards fully automatic x-ray to ct registration. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 631–639. Cited by: §1.
  • [3] B. Hou, A. Alansary, S. McDonagh, A. Davidson, M. Rutherford, J. V. Hajnal, D. Rueckert, B. Glocker, and B. Kainz (2017) Predicting slice-to-volume transformation in presence of arbitrary subject motion. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 296–304. Cited by: §1.
  • [4] F. Jacobs, E. Sundermann, B. De Sutter, M. Christiaens, and I. Lemahieu (1998) A fast algorithm to calculate the exact radiological path through a pixel or voxel space. Journal of computing and information technology 6 (1), pp. 89–94. Cited by: §2.2.
  • [5] S. Mori, M. Kobayashi, M. Kumagai, and S. Minohara (2009) Development of a gpu-based multithreaded software application to calculate digitally reconstructed radiographs for radiotherapy. Radiological physics and technology 2 (1), pp. 40–45. Cited by: §1, §2.2, §3.1.1.
  • [6] S. Pieper, M. Halle, and R. Kikinis (2004) 3D slicer. In 2004 2nd IEEE international symposium on biomedical imaging: nano to macro, pp. 632–635. Cited by: §3.
  • [7] D. Ruijters, B. M. ter Haar Romeny, and P. Suetens (2008) GPU-accelerated digitally reconstructed radiographs. BioMED 8, pp. 431–435. Cited by: §1, §2.2, §3.1.1.
  • [8] K. Sadamatsu and Y. Nakano (2016) The effect of low frame rate fluoroscopy on the x-ray dose during coronary intervention.. Internal medicine 55 15, pp. 1943–6. Cited by: §1.
  • [9] G. C. Sharp, R. Li, J. Wolfgang, G. Chen, M. Peroni, M. F. Spadea, S. Mori, J. Zhang, J. Shackleford, and N. Kandasamy (2010) Plastimatch: an open source software suite for radiotherapy image processing. In Proceedings of the XVI’th International Conference on the use of Computers in Radiotherapy (ICCR), Cited by: §1, §3.1.
  • [10] R. L. Siddon (1985) Fast calculation of the exact radiological path for a three-dimensional ct array. Medical physics 12 (2), pp. 252–255. Cited by: §2.2, §2.
  • [11] S. Singer and J. Nelder (2009) Nelder-mead algorithm. Scholarpedia 4 (7), pp. 2928. Cited by: §1.
  • [12] M. Unberath, J. Zaech, S. C. Lee, B. Bier, J. Fotouhi, M. Armand, and N. Navab (2018) DeepDRR–a catalyst for machine learning in fluoroscopy-guided procedures. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 98–106. Cited by: §4.
  • [13] I. Van Der Bom, S. Klein, M. Staring, R. Homan, L. W. Bartels, and J. P. Pluim (2011) Evaluation of optimization methods for intensity-based 2d-3d registration in x-ray guided interventions. In Medical Imaging 2011: Image Processing, Vol. 7962, pp. 657–671. Cited by: §1, §2.1.