
Solving inverse problems using data-driven models

Published online by Cambridge University Press:  14 June 2019

Simon Arridge
Affiliation:
Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK E-mail: S.Arridge@cs.ucl.ac.uk
Peter Maass
Affiliation:
Department of Mathematics, University of Bremen, Postfach 330 440, 28344 Bremen, Germany E-mail: pmaass@math.uni-bremen.de
Ozan Öktem
Affiliation:
Department of Mathematics, KTH – Royal Institute of Technology, SE-100 44 Stockholm, Sweden E-mail: ozan@kth.se
Carola-Bibiane Schönlieb
Affiliation:
Department of Applied Mathematics and Theoretical Physics, Cambridge University, Wilberforce Road, Cambridge CB3 0WA, UK E-mail: C.B.Schoenlieb@damtp.cam.ac.uk

Abstract

Recent research in inverse problems seeks to develop a mathematically coherent foundation for combining data-driven models, and in particular those based on deep learning, with domain-specific knowledge contained in physical–analytical models. The focus is on solving ill-posed inverse problems that are at the core of many challenging applications in the natural sciences, medicine and life sciences, as well as in engineering and industrial applications. This survey paper aims to give an account of some of the main contributions in data-driven inverse problems.

Information

Type
Research Article
Creative Commons
CC BY-NC-ND
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s) 2019

Figure 4.1. Parameter optimality for TV denoising in Theorem 4.2. The non-convexity of the loss function, even for this one-parameter optimization problem, is clearly visible. Courtesy of Pan Liu.
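The non-convexity of the loss in a single regularization parameter can be reproduced in a toy experiment. The following numpy sketch is purely illustrative and is not the authors' code: the piecewise-constant signal, the noise level and the smoothed-TV gradient-descent solver are all assumptions chosen for demonstration. It sweeps the parameter $\lambda$ and records the reconstruction error, the quantity whose non-convex landscape the figure shows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1D piecewise-constant signal and noisy observation
# (signal, noise level and solver settings are assumptions for demonstration).
f_true = np.concatenate([np.zeros(40), np.ones(30), 0.4 * np.ones(30)])
g = f_true + 0.3 * rng.standard_normal(f_true.size)

def tv_denoise(g, lam, eps=0.05, step=0.005, iters=800):
    """Gradient descent on 0.5*||u - g||^2 + lam * sum sqrt((u_{i+1}-u_i)^2 + eps^2),
    a smoothed surrogate for 1D TV denoising."""
    u = g.copy()
    for _ in range(iters):
        d = np.diff(u)
        w = d / np.sqrt(d ** 2 + eps ** 2)   # derivative of the smoothed TV term
        grad_tv = np.zeros_like(u)
        grad_tv[:-1] -= w                    # adjoint of the difference operator
        grad_tv[1:] += w
        u = u - step * ((u - g) + lam * grad_tv)
    return u

# One-parameter loss: reconstruction error as a function of lambda
lams = np.linspace(0.0, 1.0, 11)
losses = [float(np.mean((tv_denoise(g, lam) - f_true) ** 2)) for lam in lams]
```

Plotting `losses` against `lams` gives a one-parameter loss curve of the kind optimized in the bilevel setting; already in this toy case the curve need not be convex in $\lambda$.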


Figure 4.2. Example of $g$ for $f_{\text{true}}=u_{5}$ and $10\%$ noise.


Figure 4.3. Comparison of Tikhonov reconstructions and results obtained with DIP. Reconstructions are shown for different fixed values of $\lambda$. The network was trained with the standard gradient descent method and a learning rate of $0.05$. In (a) $500$ epochs were used, whereas in (b) $2000$ were used.


Figure 4.4. Reconstructions with an adaptive $\lambda$ for different starting values $\lambda_{0}$. The networks were trained with gradient descent using a learning rate of $0.1$. In all cases $3000$ epochs were used.


Figure 5.1. Learned iterative method in model parameter space. Illustration of the unrolled scheme in (5.11) for $N=2$ in the context of CT image reconstruction (Section 7.3.1). Each $\Gamma_{\theta_{i}}:X\rightarrow X$ is a CNN, $g\in Y$ is the measured data, and $f^{0}$ is an initial image, usually taken as zero.
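The unrolling pattern behind such schemes can be sketched as follows. This is a minimal numpy stand-in, not the paper's implementation: the forward operator, the tiny two-layer ReLU updates and their random (untrained) weights are all illustrative assumptions. Only the structure reflects (5.11): $N$ fixed iterations, each feeding the current iterate and the data-fit gradient $A^{T}(Af-g)$ into a learned update with its own parameters $\theta_{i}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 16, 12, 2                              # image size, data size, unrolled iterations
A = rng.standard_normal((m, n)) / np.sqrt(n)     # stand-in linear forward operator

def make_gamma():
    """One learned update Gamma_theta_i : X -> X (untrained stand-in weights)."""
    W1 = 0.1 * rng.standard_normal((n, 2 * n))
    W2 = 0.1 * rng.standard_normal((n, n))
    def gamma(f, grad):
        h = np.maximum(W1 @ np.concatenate([f, grad]), 0.0)  # ReLU layer
        return f - W2 @ h                                    # residual update
    return gamma

gammas = [make_gamma() for _ in range(N)]        # separate parameters per iteration

def unrolled_reconstruction(g):
    f = np.zeros(n)                              # f^0, usually taken as zero
    for gamma in gammas:                         # N fixed, unrolled iterations
        grad = A.T @ (A @ f - g)                 # gradient of 0.5 * ||A f - g||^2
        f = gamma(f, grad)
    return f

g = A @ rng.standard_normal(n)                   # synthetic measured data
f_N = unrolled_reconstruction(g)
```

In practice the weights would be trained end-to-end on supervised pairs $(f,g)$; here they are left random because only the architecture of the unrolled operator is being illustrated.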


Figure 5.2. Learned iterative method in both model parameter and data spaces. Illustration of the operator obtained by unrolling the scheme in (5.14) for $N=3$ in the context of CT image reconstruction (Section 7.3.1).


Figure 6.1. Reconstructions of a five-point phantom (pixel size 1 mm) provided by Knopp et al. (2016) obtained using Tikhonov (with $\alpha=0.1\times 10^{-6}$) and sparsity-promoting (with $\alpha=0.1$) regularization with and without TLS. (a–d) Results from using a measured noisy forward operator. (e–h) Results from a knowledge-driven forward operator. Figure adapted from Kluth and Maass (2017).


Figure 7.1. The network design with eight parameters, a setting that yields a matrix–vector multiplication of the input.


Table 7.1. The errors of the inverse net with an ill-conditioned matrix $\mathbf{\mathsf{A}}_{\epsilon}$ (i.e. $\epsilon\ll 1$) are large, and the computed reconstructions with the test data are meaningless.
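The failure mode reported in the table is the basic instability of inverting an ill-conditioned operator: data noise is amplified by the reciprocal of the smallest singular value, and a network trained to emulate the inverse inherits this amplification. A minimal numpy sketch (a linear-algebra illustration with an assumed diagonal matrix, not the trained inverse net from the text) makes the effect quantitative.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_inversion_error(eps, noise=1e-2, trials=200):
    """Average error of direct inversion g -> A_eps^{-1} g under small data noise.

    A_eps = diag(1, eps) has condition number 1/eps, so the noise component
    along the second coordinate direction is amplified by 1/eps.
    """
    A = np.diag([1.0, eps])
    A_inv = np.linalg.inv(A)
    errs = []
    for _ in range(trials):
        f_true = rng.standard_normal(2)
        g = A @ f_true + noise * rng.standard_normal(2)   # noisy measurement
        errs.append(np.linalg.norm(A_inv @ g - f_true))
    return float(np.mean(errs))

well_conditioned = mean_inversion_error(eps=1.0)    # error stays at the noise level
ill_conditioned = mean_inversion_error(eps=1e-4)    # error blown up by roughly 1/eps
```

For $\epsilon=10^{-4}$ the average reconstruction error is several orders of magnitude larger than for $\epsilon=1$, mirroring the meaningless reconstructions reported in the table.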


Figure 7.2. (a–c) Effect of choosing $\beta$ on TGV$^{2}$ denoising with optimal $\alpha$. (d–f) Effect of choosing $\alpha$ too large in TGV$^{2}$ denoising.


Figure 7.3. Contour plot of the objective functional in $\operatorname{TGV}^{2}$ denoising in the $(\alpha,\beta)$-plane.


Figure 7.4. Optimal denoising results for $\operatorname{TGV}^{2}$, $\operatorname{ICTV}$ and TV, all with $L_{2}^{2}$ as data discrepancy.


Table 7.2. Quantified results for the parrot image (image width/height $s=256$ pixels), using $L_{2}^{2}$ discrepancy.


Table 7.3. Cross-validated computations on the BSDS300 data set (Martin et al. 2001), split into two halves of 100 images each. TV regularization with $L^{2}$ discrepancy as fidelity function. ‘Learning’ and ‘validation’ indicate the halves used for learning $\alpha$ and for computing the average PSNR and SSIM, respectively. Noise variance $\sigma=10$.


Table 7.4. Cross-validated computations on the BSDS300 data set (Martin et al. 2001), split into two halves of 100 images each. $\operatorname{TGV}^{2}$ regularization with $L^{2}$ discrepancy. ‘Learning’ and ‘validation’ indicate the halves used for learning $\alpha$ and for computing the average PSNR and SSIM, respectively. Noise variance $\sigma=10$.


Figure 7.5. Optimized impulse-Gaussian denoising: (a) original image, (b) noisy image with Gaussian noise of variance $0.005$ and (c) with $5\%$ of pixels corrupted with impulse noise, (d) impulse noise residuum, (e) Gaussian noise residuum. Optimal parameters $\hat{\lambda}_{1}=734.25$ and $\hat{\lambda}_{2}=3401.2$.


Figure 7.6. Optimized Poisson–Gauss denoising: (a) original image, (b) noisy image corrupted by Poisson noise and Gaussian noise with mean zero and variance $0.001$, (c) denoised image. Optimal parameters $\hat{\lambda}_{1}=1847.75$ and $\hat{\lambda}_{2}=73.45$.


Figure 7.7. Example from supervised training data used to train the learned iterative and learned post-processing methods used in Figure 7.8.


Figure 7.8. Reconstructions of the Shepp–Logan phantom using different methods. The window is set to $[0.1,0.4]$, corresponding to the soft tissue of the modified Shepp–Logan phantom. We can see that the learned iterative method does indeed approximate the Bayes estimator, which here equals the conditional mean.


Figure 7.9. Reconstructions of a $512\times 512$ pixel human phantom along with two zoom-in regions indicated by small circles. The left zoom-in has a true feature, whereas texture in the right zoom-in is uniform. The window is set to $[-200,200]$ Hounsfield units. Among the methods tested, only the learned iterative method (learned primal–dual algorithm) correctly recovers these regions. In the others, the true feature in the left zoom-in is indistinguishable from other false features of the same size/contrast, and the right zoom-in has a streak artefact. The improvement that comes with using a learned iterative method thus translates into true clinical usefulness.


Table 7.5. Summary of results shown in Figures 7.8 and 7.9 where an SSIM score of $1$ corresponds to a perfect match. Note that the learned iterative method (learned primal–dual algorithm) significantly outperforms TV regularization even when reconstructing the Shepp–Logan phantom. With respect to run-time, the learned iterative method involves calls to the forward operator, and is therefore slower than learned post-processing by a factor of ${\approx}6$. Compared with TV-regularized reconstruction, all learned methods are at least two orders of magnitude faster.
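SSIM scores such as those summarized here compare the means, variances and covariance of two images. As a rough illustration, the following numpy sketch computes a single-window ("global") SSIM; the standard metric averages this quantity over sliding Gaussian windows, so this simplified variant is an assumption for demonstration only, not the exact score used in the table.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM of two images (no sliding window, so a coarse score)."""
    C1 = (0.01 * data_range) ** 2                 # standard SSIM stabilizing constants
    C2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()            # covariance between the two images
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / (
        (mx ** 2 + my ** 2 + C1) * (vx + vy + C2)
    )

# A perfect match scores 1; any degradation pushes the score below 1.
img = np.linspace(0.0, 1.0, 256).reshape(16, 16)
noisy = np.clip(img + 0.1 * np.random.default_rng(0).standard_normal(img.shape), 0.0, 1.0)
score = ssim_global(img, noisy)
```

The identity pair attains the maximal score of $1$, which is the sense in which "an SSIM score of $1$ corresponds to a perfect match" above.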


Figure 7.10. Reconstruction from real measurement data of a human palm, without adjustments of the training data. The images shown are top-down maximum intensity projections. (a) Result of the deep gradient descent (DGD) trained on images without added background. (b) TV reconstruction obtained from fully sampled data.


Figure 7.11. Example of real measurement data of a human palm. Volumetric images are shown using top-down maximum intensity projections. (a) Initialization from subsampled data, and (b) the DGD $G_{\hat{\theta}_{k}}$ after five iterations. (c) TV reconstruction of subsampled data with an emphasis on the data fit. (d) Reference TV reconstruction from fully sampled limited-view data. All TV reconstructions were computed with 20 iterations.


Table 7.6. CT reconstruction on the LIDC dataset using various methods. Note that the learned post-processing and RED methods require training on supervised data, while the adversarial regularizer only requires training on unsupervised data.


Figure 7.12. Exemplar CT reconstructions on the LIDC dataset under low-noise corruption. (a, b) Left to right: ground truth, FBP, TV, post-processing and adversarial regularization. (c, d) Data (CT sinograms): (c) data used for reconstructions in (a); (d) data used for reconstructions in (b).


Figure 7.13. MPI reconstructions of two phantoms using different methods: (a)–(d) phantom with 4 mm distance between tubes containing ferromagnetic nanoparticles; (e)–(h) phantom with 2 mm distance. The methods used are Kaczmarz with $L^{2}$-discrepancy ($\tilde{\lambda}=5\times 10^{-4}$), $\ell_{1}$-regularization ($\tilde{\lambda}=5\times 10^{-3}$) and DIP ($\eta=5\times 10^{-5}$) for both cases. Photos of phantoms taken by T. Kluth at the University Medical Center, Hamburg–Eppendorf.


Figure 7.14. Joint tomographic reconstruction and segmentation of grey matter. Images are shown using a $[-100,100]$ HU window and segmentations using a $[0,1]$ window. The choice $C=0.9$ seems to be a good compromise between good reconstruction and good segmentation, so it clearly helps to use a loss that includes the reconstruction and not only the task.


Figure 7.15. Test data: (a) subset of CT data from an ultra-low-dose three-dimensional helical scan and (b) the corresponding FBP reconstruction. Images are shown using a display window set to $[-150,200]$ Hounsfield units.


Figure 7.16. Conditional mean and pointwise standard deviation (pStd) computed from test data (Figure 7.15) using posterior sampling (Section 5.2.1) and direct estimation (Section 5.1.6).
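Given samples from the posterior, both displayed quantities are plain Monte Carlo estimates: the conditional mean is the sample average and the pointwise standard deviation (pStd) is the pixelwise sample standard deviation. A minimal numpy sketch follows; the Gaussian samples below are a synthetic stand-in, an assumption for demonstration, not the output of the trained posterior sampler from Section 5.2.1.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in posterior samples f_1, ..., f_K ~ p(f | g) on an 8 x 8 image
# (synthetic Gaussians; a real posterior sampler would produce these).
K = 100
samples = 0.5 + 0.1 * rng.standard_normal((K, 8, 8))

cond_mean = samples.mean(axis=0)        # Monte Carlo estimate of E[f | g]
pstd = samples.std(axis=0, ddof=1)      # pointwise standard deviation (pStd)
```

Direct estimation (Section 5.1.6) instead trains a network to output these two images in a single forward pass, avoiding the cost of drawing many samples.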


Figure 7.17. (b) Suspected tumour (red) and reference region (blue) shown in the sample posterior mean image. (c) Average contrast differences between the tumour and reference region. The histogram is computed by posterior sampling applied to test data (Figure 7.15); the yellow curve is from direct estimation (Section 5.1.6), and the true value is the red threshold. (a) The normal dose image that confirms the presence of the feature.