
An introduction to continuous optimization for imaging

Published online by Cambridge University Press:  23 May 2016

Antonin Chambolle
Affiliation: CMAP, École Polytechnique, CNRS, France. E-mail: antonin.chambolle@cmap.polytechnique.fr

Thomas Pock
Affiliation: ICG, Graz University of Technology, AIT, Austria. E-mail: pock@icg.tugraz.at

Abstract

A large number of imaging problems reduce to the optimization of a cost function, with typical structural properties. The aim of this paper is to describe the state of the art in continuous optimization methods for such problems, and present the most successful approaches and their interconnections. We place particular emphasis on optimal first-order schemes that can deal with typical non-smooth and large-scale objective functions used in imaging problems. We illustrate and compare the different algorithms using classical non-smooth problems in imaging, such as denoising and deblurring. Moreover, we present applications of the algorithms to more advanced problems, such as magnetic resonance imaging, multilabel image segmentation, optical flow estimation, stereo matching, and classification.

Information

Type: Research Article
Copyright: © Cambridge University Press, 2016

Figure 2.1. Total variation based image denoising. (a) Original input image, and (b) noisy image containing additive Gaussian noise with standard deviation ${\it\sigma}=0.1$. (c) Denoised image obtained by minimizing the ROF model using ${\it\lambda}=0.1$.
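The ROF model behind this figure minimizes a quadratic data term plus the total variation of the image. One simple way to approach it is gradient descent on a smoothed TV energy; the sketch below (NumPy) illustrates the idea, with illustrative values for the smoothing parameter `eps` and step size `tau`, and is not the authors' implementation.

```python
import numpy as np

def denoise_rof_smoothed(f, lam=0.1, eps=0.1, tau=0.1, iters=300):
    """Gradient descent on a smoothed ROF energy
    E(u) = 0.5*||u - f||^2 + lam * sum sqrt(|grad u|^2 + eps^2).
    A minimal sketch; lam, eps and tau are illustrative choices."""
    u = f.astype(float).copy()
    for _ in range(iters):
        # forward differences with Neumann boundary conditions
        ux = np.diff(u, axis=1, append=u[:, -1:])
        uy = np.diff(u, axis=0, append=u[-1:, :])
        norm = np.sqrt(ux**2 + uy**2 + eps**2)
        px, py = ux / norm, uy / norm
        # discrete divergence (negative adjoint of the forward differences;
        # px[:, -1] and py[-1, :] are zero by construction)
        div = np.diff(px, axis=1, prepend=0.0) + np.diff(py, axis=0, prepend=0.0)
        u -= tau * ((u - f) - lam * div)
    return u
```

With `eps = 0.1` and `lam = 0.1` the gradient is roughly 9-Lipschitz, so the step size `tau = 0.1` keeps the descent monotone.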

Figure 2.2. An image deblurring problem. (a) Original image, and (b) blurry and noisy image (Gaussian noise with standard deviation ${\it\sigma}=0.01$) together with the known blur kernel. (c, d) Image deblurring without (${\it\lambda}=0$) and with (${\it\lambda}=5\times 10^{-4}$) total variation regularization. Observe the noise amplification when there is no regularization.

Figure 2.3. Denoising an image containing salt-and-pepper noise. (a) Original image, and (b) noisy image that has been degraded by adding $20\%$ salt-and-pepper noise. (c) Denoised image obtained from the TV-$\ell _{1}$ model, and (d) result obtained from the ROF model.

Figure 4.1. Comparison of accelerated and non-accelerated gradient schemes. (a) Comparison of the solutions $x$ of GD and AGD after $10\,000$(!) iterations. (b) Rate of convergence for GD and AGD together with their theoretical worst-case rates, and the lower bound for smooth optimization. For comparison we also provide the rate of convergence for CG. Note that CG exactly touches the lower bound at $k=99$.
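The accelerated scheme (AGD) in this comparison differs from plain gradient descent only by an extrapolation step. A generic sketch of both iterations follows (step size $1/L$, momentum weight $k/(k+3)$, a standard choice for Nesterov's method; this is an illustration, not the code used in the experiment).

```python
import numpy as np

def gd(grad, x0, L, iters):
    # plain gradient descent with constant step size 1/L
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x -= grad(x) / L
    return x

def agd(grad, x0, L, iters):
    # Nesterov's accelerated gradient: gradient step at an
    # extrapolated point y, then momentum with weight k/(k+3)
    x = np.array(x0, dtype=float)
    y = x.copy()
    for k in range(iters):
        x_new = y - grad(y) / L
        y = x_new + (k / (k + 3.0)) * (x_new - x)
        x = x_new
    return x
```

On an ill-conditioned quadratic, `agd` reaches a given accuracy in roughly the square root of the number of iterations `gd` needs, in line with the $O(1/k^2)$ versus $O(1/k)$ worst-case rates discussed here.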

Figure 4.2. Minimizing the primal ROF model using smoothed (Huber) total variation applied to the image in Figure 2.1. The figure shows the convergence of the primal–dual gap using plain gradient descent for different settings of the smoothing parameter ${\it\varepsilon}$.

Figure 4.3. Comparison of different gradient-based methods applied to Moreau–Yosida regularization of the dual ROF model using the image in Figure 2.1. Accelerated gradient descent (AGD) and the quasi-Newton method (l-BFGS) are significantly faster than plain gradient descent (GD).
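The Moreau–Yosida regularization referenced in this figure replaces a non-smooth function by its Moreau envelope, whose gradient is Lipschitz and is computable from the proximal map. A one-dimensional sketch for the absolute value (illustrative only):

```python
import numpy as np

def prox_abs(x, gamma):
    # proximal map of gamma*|.|, i.e. soft thresholding
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def moreau_grad(x, gamma):
    # gradient of the Moreau envelope of |.| with parameter gamma:
    # grad = (x - prox(x)) / gamma, which is (1/gamma)-Lipschitz
    return (x - prox_abs(x, gamma)) / gamma
```

Because this gradient is smooth, off-the-shelf smooth solvers such as AGD or l-BFGS can be applied to the regularized problem, which is exactly the setting of the comparison above.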

Figure 4.4. Convergence of accelerated proximal gradient descent methods for minimizing the dual Huber-ROF model using the image in Figure 2.1. Using the correct modulus of strong convexity (${\it\mu}={\it\varepsilon}/{\it\lambda}$), the FISTA algorithm performs much better than the FISTA variant that does not take into account the correct value of ${\it\mu}$. Interestingly, a tuned proximal heavy ball (HB) algorithm that uses the correct value of ${\it\mu}$ clearly outperforms FISTA and seems to coincide with the lower bound of first-order methods.
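The ${\it\mu}$-aware FISTA variant in this comparison replaces the usual momentum sequence by a constant weight derived from the ratio $q={\it\mu}/L$. A generic proximal-gradient sketch (the momentum rule is the standard strongly convex choice; the problem instance and parameter names are illustrative):

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista_mu(gradf, prox_g, x0, L, mu, iters):
    # FISTA with constant momentum (1 - sqrt(q)) / (1 + sqrt(q)), q = mu/L,
    # appropriate when the smooth part is mu-strongly convex
    q = mu / L
    beta = (1.0 - np.sqrt(q)) / (1.0 + np.sqrt(q))
    x = np.array(x0, dtype=float)
    y = x.copy()
    for _ in range(iters):
        x_new = prox_g(y - gradf(y) / L, 1.0 / L)
        y = x_new + beta * (x_new - x)
        x = x_new
    return x
```

Using the wrong ${\it\mu}$ (e.g. ${\it\mu}=0$, recovering plain FISTA momentum) forfeits the linear convergence rate that the correct constant momentum delivers, which is what the figure illustrates.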

Figure 4.5. Minimizing the dual ROF model applied to the image in Figure 2.1. This experiment shows that an accelerated proximal block descent algorithm (FISTA-chains) that exactly solves the ROF problem on horizontal and vertical chains significantly outperforms a standard accelerated proximal gradient descent (FISTA) implementation. (a) Comparison based on iterations, (b) comparison based on the CPU time.

Figure 5.1. Minimizing the ROF model applied to the image in Figure 2.1. This experiment shows that the accelerated primal–dual method with optimal dynamic step sizes (aPDHG) is significantly faster than a primal–dual algorithm that uses fixed step sizes (PDHG). For comparison we also show the performance of accelerated proximal gradient descent (FISTA).
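The fixed-step-size PDHG iteration for ROF alternates a projected dual ascent step with a proximal primal step on the data term. A minimal NumPy sketch (the step sizes satisfy ${\it\tau}{\it\sigma}\Vert D\Vert ^{2}\leq 1$ with $\Vert D\Vert ^{2}\leq 8$; this is an illustration, not the paper's code):

```python
import numpy as np

def pdhg_rof(f, lam=0.1, iters=200):
    # primal-dual (PDHG) iteration for min_u lam*TV(u) + 0.5*||u - f||^2
    tau = sigma = 1.0 / np.sqrt(8.0)   # tau * sigma * ||D||^2 <= 1
    u = f.astype(float).copy()
    ubar = u.copy()
    px = np.zeros_like(u); py = np.zeros_like(u)
    for _ in range(iters):
        # dual ascent + projection onto the pointwise 2-ball of radius lam
        gx = np.diff(ubar, axis=1, append=ubar[:, -1:])
        gy = np.diff(ubar, axis=0, append=ubar[-1:, :])
        px += sigma * gx; py += sigma * gy
        scale = np.maximum(1.0, np.sqrt(px**2 + py**2) / lam)
        px /= scale; py /= scale
        # primal proximal step for 0.5*||u - f||^2, then overrelaxation
        div = np.diff(px, axis=1, prepend=0.0) + np.diff(py, axis=0, prepend=0.0)
        u_new = (u + tau * (div + f)) / (1.0 + tau)
        ubar = 2.0 * u_new - u
        u = u_new
    return u
```

The accelerated variant in the figure additionally updates ${\it\tau}$, ${\it\sigma}$ and the overrelaxation parameter at every iteration to exploit the strong convexity of the data term.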

Figure 5.2. Minimizing the TV-deblurring problem applied to the image in Figure 2.2. We compare the performance of a primal–dual algorithm with explicit gradient steps (PD-explicit) and a primal–dual algorithm that uses a full splitting of the objective function (PD-split). PD-explicit seems to perform slightly better at the beginning, but PD-split performs better for higher accuracy.

Figure 5.3. Minimizing the TV-$\ell _{1}$ model applied to the image in Figure 2.3. The plot shows a comparison of the convergence of the primal gap between the primal–dual (PDHG) algorithm and the forward–backward–forward (FBF) algorithm. PDHG and FBF perform almost equally well, but FBF requires twice as many evaluations of the linear operator. We also show the performance of a plain subgradient method (SGM) in order to demonstrate the clear advantage of PDHG and FBF exploiting the structure of the problem.

Figure 5.4. Comparison of ADMM and accelerated ADMM (aADMM) for solving the ROF model applied to the image in Figure 2.1. For comparison we also plot the convergence of the accelerated primal–dual algorithm (aPDHG). The ADMM methods are fast, especially at the beginning.
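ADMM, as compared here, alternates two subproblem solves with a multiplier update. A toy sketch in scaled form on the splitting $\min_{x}{\textstyle \frac{1}{2}}\Vert x-b\Vert ^{2}+{\it\lambda}\Vert z\Vert _{1}$ subject to $x=z$ (a generic illustration, not the ROF solver used in the experiment):

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def admm_toy(b, lam, rho=1.0, iters=100):
    # scaled ADMM for min 0.5*||x - b||^2 + lam*||z||_1  s.t.  x = z
    x = np.zeros_like(b); z = np.zeros_like(b); w = np.zeros_like(b)
    for _ in range(iters):
        x = (b + rho * (z - w)) / (1.0 + rho)  # quadratic subproblem
        z = soft(x + w, lam / rho)             # l1 proximal subproblem
        w += x - z                             # scaled multiplier update
    return z
```

For ROF the quadratic subproblem involves the linear operator $D$ and is typically solved with an FFT or a linear solver, which is where most of the per-iteration cost of ADMM lies.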

Figure 5.5. Solving the image deblurring problem from Example 2.2. (a) Problem (2.7) after 150 iterations of Douglas–Rachford (DR) splitting. (b) Huber variant after 150 iterations with accelerated DR splitting. The figure shows that after the same number of iterations, the accelerated algorithm yields a higher PSNR value.

Figure 6.1. Image deblurring using a non-convex variant of the total variation. The plot shows the convergence of the primal energy for the non-convex TV model using ADMM and iPiano. In order to improve the presentation in the plot, we have subtracted a strict lower bound from the primal energy. ADMM is faster at the beginning but iPiano finds a slightly lower energy.

Figure 6.2. Image deblurring using non-convex functions after $150$ iterations. (a, b) Results of the non-convex TV-deblurring energy obtained from ADMM and iPiano. (c) Result obtained from the non-convex learned energy, and (d) convolution filters $D_{k}$ sorted by their corresponding ${\it\lambda}_{k}$ value (in descending order) used in the non-convex learned model. Observe that the learned non-convex model leads to a significantly better PSNR value.

Figure 7.1. Contrast invariance of the TV-$\ell _{1}$ model. (a–d) Result of the TV-$\ell _{1}$ model for varying values of the regularization parameter ${\it\lambda}$. (e–h) Result of the ROF model for varying values of ${\it\lambda}$. Observe the morphological property of the TV-$\ell _{1}$ model: structures are removed only with respect to their size, independently of their contrast.

Figure 7.2. Total variation based image denoising in the presence of Poisson noise. (a) Aerial view of Graz, Austria, (b) noisy image degraded by Poisson noise. (c) Result using the ROF model, and (d) result using the TV-entropy model. One can see that the TV-entropy model leads to improved results, especially in dark regions, and exhibits better contrast.

Figure 7.3. Comparison of TV and TGV$^{2}$ denoising. (a) Original input image, and (b) noisy image, where we have added Gaussian noise with standard deviation ${\it\sigma}=0.1$. (c) Result obtained from the ROF model, and (d) result obtained by minimizing the TGV$^{2}$ model. The main advantage of the TGV$^{2}$ model over the ROF model is that it is better at reconstructing smooth regions while still preserving sharp discontinuities.

Figure 7.4. Denoising a colour image using the vectorial ROF model. (a) Original RGB colour image, and (b) its noisy variant, where Gaussian noise with standard deviation ${\it\sigma}=0.1$ has been added. (c) Solution of the vectorial ROF model using the Frobenius norm, and (d) solution using the nuclear norm. In smooth regions the two variants lead to similar results, while in textured regions the nuclear norm leads to significantly better preservation of small details (see the close-up views).

Figure 7.5. TV regularized reconstruction of one slice of an MRI of a knee from partial Fourier data. (a) Least-squares reconstruction without using total variation regularization, and (b) reconstruction obtained from the total variation regularized reconstruction model.

Figure 7.6. Optical flow estimation using total variation. (a) A blending of the two input images. (b) A colour coding of the computed velocity field; the colour coding scheme is shown in the upper left corner of the image.

Figure 7.7. Image inpainting using shearlet regularization. (a) Original image, and (b) input image in which only a randomly chosen fraction of $10\%$ of the image pixels is kept. (c) Reconstruction using TV regularization, and (d) reconstruction using the shearlet model. Observe that the shearlet-based model leads to significantly better reconstruction of small-scale and elongated structures.

Figure 7.8. Piecewise smooth approximation using the Mumford–Shah functional. (a) Original image $u^{\diamond }$, and (b) piecewise smooth approximation $u$ extracted from the convex relaxation. (c) Three-dimensional rendering of the subgraph of the relaxed function $v$ which approximates the subgraph $\mathbf{1}_{u}$ of the image $u$. Note the tendency of the Mumford–Shah functional to produce smooth regions terminated by sharp discontinuities.

Figure 7.9. Computing a globally optimal solution of a large-scale stereo problem using the calibration method. (a) Left input image showing the region around the Freiheitsplatz in the city of Graz. (b) Computed disparity map, where the intensity is proportional to the height above the ground. Black pixels indicate occluded pixels that have been determined by a left–right consistency check.

Figure 7.10. Interactive image segmentation using the continuous two-label image segmentation model. (a) Input image overlaid with the initial segmentation provided by the user. (b) The weighting function $w$, computed using the negative $\log$-ratio of two Gaussian mixture models fitted to the initial segments. (c) Binary solution of the segmentation problem, and (d) the result of performing background removal.

Figure 7.11. Demonstration of the quality of different relaxations. (a) Input image, where the task is to compute a partition of the grey zone in the middle of the image using the three colours as boundary constraints. (b) Colour-coded solution using the simple relaxation $C_{{\mathcal{P}}_{1}}$, and (c) result using the stronger relaxation $C_{{\mathcal{P}}_{2}}$. Observe that the stronger relaxation exactly recovers the true solution, which is a triple junction.

Figure 7.12. Interactive image segmentation using the multilabel Potts model. (a) Input image overlaid with the initial segmentation provided by the user. (b) Final segmentation, where the colour values correspond to the average colours of the segments. (c–f) The corresponding phases $u_{k}$. Observe that the phases are close to binary and hence the algorithm was able to find an almost optimal solution.

Figure 7.13. A $16$-neighbourhood system on the grid. The black dots refer to the grid points $x_{i,j}$, the shaded squares represent the image pixels ${\rm\Omega}_{i,j}$, and the line segments $l_{i,j,k}$ connecting the grid points are depicted by thick lines.

Figure 7.14. Comparison of TVX$^{0}$ (b–e) and TVX$^{1}$ (f–i) regularization for shape denoising. One can see that TVX$^{0}$ leads to a gradually simplified polygonal approximation of the shape in $F$ whereas TVX$^{1}$ leads to an approximation by piecewise smooth shapes.

Figure 7.15. Visualization of the measure ${\it\mu}$ in the roto-translation space for the image of Figure 7.14(e), obtained using TVX$^{0}$ regularization. Observe that the measure ${\it\mu}$ indeed concentrates on thin lines in this space.

Figure 7.16. Image inpainting using TVX$^{1}$ regularization. (a,c,e) Input image with $90\%$ missing pixels and recovered solutions. (b,d,f) Input image with $80\%$ missing lines and recovered solutions.

Figure 7.17. Denoising an image containing salt-and-pepper noise. (a) Noisy image degraded by $20\%$ salt-and-pepper noise. (b) Denoised image using TVX$^{1}$ regularization. Note the significant improvement over the result of the TV-$\ell _{1}$ model, shown in Figure 2.3.

Figure 7.18. Image denoising using a patch-based Lasso model. (a) Original image, and (b) its noisy variant, where additive Gaussian noise with standard deviation $0.1$ has been added. (c) Learned dictionary containing $81$ atoms with patch size $9\times 9$, and (d) final denoised image.
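The patch-based Lasso model behind this figure seeks sparse coefficients for each patch over a dictionary. A generic ISTA sketch for the underlying problem $\min_{a}{\textstyle \frac{1}{2}}\Vert Da-y\Vert ^{2}+{\it\lambda}\Vert a\Vert _{1}$ (here $D$ is a placeholder matrix, not the learned $9\times 9$ dictionary):

```python
import numpy as np

def ista_lasso(D, y, lam, iters=200):
    # ISTA for min_a 0.5*||D a - y||^2 + lam*||a||_1 (a minimal sketch)
    L = np.linalg.norm(D, 2) ** 2       # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        g = D.T @ (D @ a - y)           # gradient of the data term
        a = a - g / L
        # soft thresholding = proximal map of (lam/L)*||.||_1
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)
    return a
```

In the denoising application, each noisy patch $y$ is coded this way and the image is rebuilt by averaging the overlapping reconstructions $Da$.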

Figure 7.19. Image denoising using the convolutional Lasso model. (a) The $81$ convolution filters of size $9\times 9$ that have been learned on the original image. (b) Denoised image obtained by minimizing the convolutional Lasso model.

Figure 7.20. MNIST training images and dictionary.

Figure 7.21. MNIST classification results.

Figure 7.22. Inverting a convolutional neural network. (a) Original image used to compute the initial feature vector ${\it\phi}^{\diamond }$. (b) Image recovered from the non-linear deconvolution problem. Due to the high degree of invariances of the CNN with respect to scale and spatial position, the recovered image contains structures from the same object class, but the image looks very different.