From Forward Modeling to Regularized Inversion in Exploration Seismology
Felix J. Herrmann
School of Earth & Atmospheric Sciences — Georgia Institute of Technology
2026-02-16
Inverse Problems
Outline
Part I — Foundations
Forward & inverse problems
Linear forward problem
Strict inverse for invertible systems
When the inverse does not exist
Part II — Least-Squares Inversion
Least-squares formulation
Normal equations
Non-uniqueness & minimum-norm solutions
Part III — SVD & Generalized Inverses
Singular Value Decomposition
Fundamental subspaces
Pseudo-inverse & four cases
Part IV — Regularization
Condition number & ill-posedness
Truncated SVD
Tikhonov regularization
Bayesian interpretation
Part V — Application
Deconvolution as an inverse problem
Numerical examples in Julia
Part I — Foundations
The Forward Problem
In exploration seismology, we relate model parameters \(\mathbf{m}\) (e.g., velocity, density, reflectivity) to observed data \(\mathbf{d}\) (e.g., seismograms) through a forward operator:
\[
\mathbf{d} = F(\mathbf{m})
\]
where \(F : \mathbb{R}^M \to \mathbb{R}^N\) maps from model space to data space.
Key distinction
The forward problem — given \(\mathbf{m}\), compute \(\mathbf{d}\) — is typically well-posed. The inverse problem — given \(\mathbf{d}\), recover \(\mathbf{m}\) — is typically ill-posed.
Forward Problem — Examples
| Forward Operator | Model \(\mathbf{m}\) | Data \(\mathbf{d}\) |
|---|---|---|
| Wave equation | Velocity field | Seismograms |
| Born approximation | Reflectivity | Linearized data |
| Convolution | Reflectivity series | Seismic trace |
All share the common mathematical structure \(\mathbf{d} = F(\mathbf{m})\).
The Linear Forward Problem
When \(F\) is linear, we write
\[
\mathbf{d} = \mathbf{A}\,\mathbf{m}
\]
where \(\mathbf{A} \in \mathbb{R}^{N \times M}\) is the forward modeling matrix.
\(\mathbf{A}^\top\) is the transpose of the matrix \(\mathbf{A}\). \(\mathbf{A}^\top\mathbf{A}\) is the Gram matrix — symmetric positive semi-definite, and positive definite iff \(\mathbf{A}\) has full column rank.
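The full-column-rank condition is easy to probe numerically. A minimal stdlib sketch with toy matrices (all values illustrative):

```julia
using LinearAlgebra

# A tall 3×2 matrix with full column rank: the Gram matrix is positive definite
A = [1.0 0.0; 0.0 1.0; 1.0 1.0]
G = A' * A                          # 2×2 Gram matrix
println(issymmetric(G))             # symmetric by construction
println(isposdef(G))                # positive definite: A has full column rank

# Duplicate a column so A loses rank: the Gram matrix is only semi-definite
B = [1.0 1.0; 0.0 0.0; 1.0 1.0]     # rank 1
println(abs(minimum(eigvals(Symmetric(B' * B)))) < 1e-10)  # a zero eigenvalue
```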
Transpose and Adjoint
The adjoint (or transpose for real-valued matrices) \(\mathbf{A}^H\) satisfies:
\[
\langle \mathbf{A}\mathbf{m}, \mathbf{d} \rangle = \langle \mathbf{m}, \mathbf{A}^H \mathbf{d} \rangle \quad \text{for all } \mathbf{m}, \mathbf{d}.
\]
Non-Uniqueness
If two distinct models fit the data exactly, \(\mathbf{A}\mathbf{m}_1 = \mathbf{A}\mathbf{m}_2 = \mathbf{d}\), the solution is not unique. Equivalently, \(\mathbf{A}(\mathbf{m}_1 - \mathbf{m}_2) = \mathbf{0}\), i.e., the null space \(\mathcal{N}(\mathbf{A})\) is nontrivial.
If \(\hat{\mathbf{m}}\) is any solution, then \(\hat{\mathbf{m}} + \mathbf{m}_0\) is also a solution for any \(\mathbf{m}_0 \in \mathcal{N}(\mathbf{A})\).
Important
Non-uniqueness is fundamental in seismology — limited acquisition geometry means parts of the model are invisible to the data.
Minimum-Norm Solution
Among all solutions, select the one with the smallest norm:
\[
\hat{\mathbf{m}} = \arg\min_{\mathbf{m}}\ \|\mathbf{m}\|_2 \quad \text{subject to} \quad \mathbf{A}\mathbf{m} = \mathbf{d}.
\]
For full-row-rank \(\mathbf{A}\), this has the closed form \(\hat{\mathbf{m}} = \mathbf{A}^\top(\mathbf{A}\mathbf{A}^\top)^{-1}\mathbf{d}\).
\(\mathcal{R}(\mathbf{A})\) denotes the range (or column space) of \(\mathbf{A}\) — the set of all vectors that can be written as \(\mathbf{A}\mathbf{m}\) for some \(\mathbf{m}\).
\(\mathcal{N}(\mathbf{A})\) is the null space (or kernel) of \(\mathbf{A}\) — the set of all vectors that \(\mathbf{A}\) maps to zero:
\[
\mathcal{N}(\mathbf{A}) = \{\mathbf{m} \in \mathbb{R}^M : \mathbf{A}\mathbf{m} = \mathbf{0}\}.
\]
These are the model components that produce no data whatsoever. In seismology, think of structure in the subsurface that your acquisition geometry simply cannot see — it’s there, but it generates no response at the receivers.
The notation parallels \(\mathcal{R}\): where \(\mathcal{R}(\mathbf{A})\) collects everything \(\mathbf{A}\) can produce, \(\mathcal{N}(\mathbf{A})\) collects everything \(\mathbf{A}\) annihilates.
Split any candidate model into orthogonal pieces, \(\mathbf{m} = \mathbf{m}_r + \mathbf{m}_0\) with \(\mathbf{m}_r \in \mathcal{R}(\mathbf{A}^\top)\) and \(\mathbf{m}_0 \in \mathcal{N}(\mathbf{A})\). Since \(\mathbf{A}\mathbf{m}_0 = \mathbf{0}\) by definition of the null space, the constraint \(\mathbf{A}\mathbf{m} = \mathbf{d}\) only sees the row-space component:
\[
\mathbf{A}\mathbf{m} = \mathbf{A}\mathbf{m}_r = \mathbf{d}.
\]
So the null-space component \(\mathbf{m}_0\) is invisible to the data — it’s a free parameter. Now look at what it does to the norm. Because the two components are orthogonal:
\[
\|\mathbf{m}\|_2^2 = \|\mathbf{m}_r\|_2^2 + \|\mathbf{m}_0\|_2^2.
\]
The first term is fixed (determined by the constraint), and the second term is non-negative. The minimum is achieved when \(\mathbf{m}_0 = \mathbf{0}\), which means the solution lives entirely in \(\mathcal{R}(\mathbf{A}^\top)\).
You can also see this directly from the solution formula \(\hat{\mathbf{m}} = \mathbf{A}^\top(\mathbf{A}\mathbf{A}^\top)^{-1}\mathbf{d}\). The leading \(\mathbf{A}^\top\) guarantees the result is a linear combination of rows of \(\mathbf{A}\), i.e., it lives in the row space by construction.
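Both routes to the minimum-norm solution can be checked on a small underdetermined toy system; the matrices below are illustrative only, using nothing beyond the standard library:

```julia
using LinearAlgebra

# Underdetermined system: 2 equations, 4 unknowns
A = [1.0 0.0 1.0 0.0;
     0.0 1.0 0.0 1.0]
d = [2.0, 4.0]

# Minimum-norm solution via the closed form A'(AA')⁻¹ d
m_mn = A' * ((A * A') \ d)

println(norm(A * m_mn - d) < 1e-10)   # fits the data exactly
println(m_mn ≈ pinv(A) * d)           # agrees with the pseudo-inverse solution

# Adding any null-space component keeps the fit but increases the norm
m0 = [1.0, 0.0, -1.0, 0.0]            # A * m0 = 0
println(norm(A * m0) < 1e-10)
println(norm(m_mn) < norm(m_mn + m0))
```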
Part III — SVD & Generalized Inverses
Singular Value Decomposition (SVD)
Any matrix \(\mathbf{A} \in \mathbb{R}^{N \times M}\) admits the factorization:
\[
\mathbf{A} = \mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{V}^\top = \sum_{i=1}^{r} \sigma_i\,\mathbf{u}_i\,\mathbf{v}_i^\top,
\]
with orthogonal \(\mathbf{U} \in \mathbb{R}^{N \times N}\) and \(\mathbf{V} \in \mathbb{R}^{M \times M}\), and singular values \(\sigma_1 \geq \cdots \geq \sigma_r > 0\) on the diagonal of \(\boldsymbol{\Sigma}\).
This is an outer-product expansion — each term is a rank-1 matrix weighted by \(\sigma_i\).
Computational note
We rarely form \(\mathbf{A}\) explicitly in seismology. We work with \(\mathbf{A}\,\mathbf{v}\) (forward) and \(\mathbf{A}^\top\mathbf{u}\) (adjoint/migration) as matrix-free operations.
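Even though production codes stay matrix-free, the outer-product expansion is worth verifying once on a small dense toy matrix (values illustrative, stdlib `svd` only):

```julia
using LinearAlgebra

A = [3.0 1.0; 1.0 3.0; 0.0 2.0]
F = svd(A)                          # thin SVD: F.U, F.S (vector), F.V

# Rebuild A as a sum of rank-1 outer products σᵢ uᵢ vᵢᵀ
A_rebuilt = sum(F.S[i] * F.U[:, i] * F.V[:, i]' for i in eachindex(F.S))
println(A_rebuilt ≈ A)

# Keeping only the largest singular value gives the best rank-1 approximation;
# the spectral-norm error equals the first dropped singular value (Eckart–Young)
A1 = F.S[1] * F.U[:, 1] * F.V[:, 1]'
println(opnorm(A - A1) ≈ F.S[2])
```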
The Four Fundamental Subspaces
The SVD reveals the four fundamental subspaces of \(\mathbf{A}\) (Strang):
\(\mathcal{R}(\mathbf{A}) = \text{span}\{\mathbf{u}_1,\ldots,\mathbf{u}_r\}\) — column space
\(\mathcal{N}(\mathbf{A}^\top) = \text{span}\{\mathbf{u}_{r+1},\ldots,\mathbf{u}_N\}\) — left null space
\(\mathcal{R}(\mathbf{A}^\top) = \text{span}\{\mathbf{v}_1,\ldots,\mathbf{v}_r\}\) — row space
\(\mathcal{N}(\mathbf{A}) = \text{span}\{\mathbf{v}_{r+1},\ldots,\mathbf{v}_M\}\) — null space
The Pseudo-Inverse
The Moore–Penrose pseudo-inverse is
\[
\mathbf{A}^{\dagger} = \mathbf{V}\,\boldsymbol{\Sigma}^{\dagger}\,\mathbf{U}^\top,
\]
where \(\boldsymbol{\Sigma}^{\dagger}\) inverts the nonzero singular values: \(\sigma_i^{\dagger} = 1/\sigma_i\) for \(i = 1,\ldots,r\) and zero otherwise.
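As a sanity check, assembling \(\mathbf{V}\boldsymbol{\Sigma}^{\dagger}\mathbf{U}^\top\) by hand reproduces the standard library's `pinv`; a minimal sketch on a toy rank-2 matrix:

```julia
using LinearAlgebra

A = [1.0 0.0; 0.0 2.0; 0.0 0.0]     # 3×2, rank 2
F = svd(A, full=true)

# Build Σ† by inverting only the nonzero singular values
r = count(>(1e-12), F.S)
Σdag = zeros(size(A, 2), size(A, 1))
for i in 1:r
    Σdag[i, i] = 1 / F.S[i]
end

Adag = F.V * Σdag * F.U'
println(Adag ≈ pinv(A))             # matches LinearAlgebra's pinv
```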
Defining \(\tilde{\mathbf{A}} = \begin{bmatrix} \mathbf{A} \\ \sqrt{\lambda}\,\mathbf{I} \end{bmatrix}\) and \(\tilde{\mathbf{d}} = \begin{bmatrix} \mathbf{d} \\ \mathbf{0} \end{bmatrix}\), verify that the normal equations \(\tilde{\mathbf{A}}^\top\tilde{\mathbf{A}}\,\mathbf{m} = \tilde{\mathbf{A}}^\top\tilde{\mathbf{d}}\) give: \(\left(\mathbf{A}^\top\mathbf{A} + \lambda\,\mathbf{I}\right)\mathbf{m} = \mathbf{A}^\top\mathbf{d}\).
Why this matters
This formulation turns regularized inversion into a standard overdetermined least-squares problem. Solvers like LSQR work directly on the augmented system — no need to form \(\mathbf{A}^\top\mathbf{A}\) explicitly. The damp parameter in LSQR is precisely \(\sqrt{\lambda}\).
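The equivalence is easy to confirm on a toy system; here a plain backslash least-squares solve on the augmented matrix stands in for LSQR (illustrative values, stdlib only):

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]
d = [1.0, 2.0, 3.0]
λ = 0.1

# Augmented overdetermined system [A; √λ I] m ≈ [d; 0]
A_aug = [A; sqrt(λ) * Matrix(I, 2, 2)]
d_aug = [d; zeros(2)]
m_aug = A_aug \ d_aug               # standard least-squares solve

# Same answer from the damped normal equations
m_ne = (A' * A + λ * I) \ (A' * d)
println(m_aug ≈ m_ne)
```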
The L-Curve
Plot \(\|\hat{\mathbf{m}}_\lambda\|_2\) vs. \(\|\mathbf{A}\hat{\mathbf{m}}_\lambda - \mathbf{d}\|_2\) on a log-log scale as \(\lambda\) varies:
The corner of the L marks the optimal \(\lambda\) — the point of maximum curvature where the trade-off between fitting the data and controlling the solution norm is most favorable.
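A minimal sketch of tracing L-curve points on a deliberately ill-conditioned 2×2 toy problem (values illustrative; a full L-curve analysis would locate the corner by curvature):

```julia
using LinearAlgebra

# Ill-conditioned toy problem: nearly parallel columns
A = [1.0 1.0; 1.0 1.0001]
d = [2.0, 2.0001]

# Trace the L-curve: (residual norm, solution norm) as λ varies
λs = 10.0 .^ range(-8, 0, length=9)
points = map(λs) do λ
    m = (A' * A + λ * I) \ (A' * d)
    (norm(A * m - d), norm(m))
end

# Small λ: small residual but larger solution norm; large λ: the reverse
println(points[1][1] < points[end][1])   # residual grows with λ
println(points[1][2] > points[end][2])   # solution norm shrinks with λ
```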
Bayesian Interpretation
Rewrite Tikhonov regularization as maximum a posteriori (MAP) estimation: with Gaussian noise \(\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \mathbf{C}_d)\) and Gaussian prior \(\mathbf{m} \sim \mathcal{N}(\mathbf{0}, \mathbf{C}_m)\),
\[
\hat{\mathbf{m}}_{\text{MAP}} = \arg\min_{\mathbf{m}}\ \tfrac{1}{2}\|\mathbf{A}\mathbf{m} - \mathbf{d}\|_{\mathbf{C}_d^{-1}}^2 + \tfrac{1}{2}\|\mathbf{m}\|_{\mathbf{C}_m^{-1}}^2.
\]

| Quantity | Interpretation |
|---|---|
| \(\mathbf{C}_m^{-1}\) | Model penalty (= \(\lambda\,\mathbf{L}^\top\mathbf{L}\)) |
| \(\mathbf{C}_m^{-1/2}\) | Square root of model precision |
Part V — Application
Deconvolution — The Setup
A sparse reflectivity \(r(t)\) convolved with a bandlimited Ricker wavelet \(w(t)\) produces the observed trace \(d(t)\). The wavelet spectrum \(|W(f)|\) shows the bandpass character — frequencies near zero and above ~60 Hz carry no signal, making inversion ill-posed.
Why Toeplitz? Because the system is time-invariant — shifting \(r\) by one sample shifts \(d\) by one sample. Each column of \(\mathbf{W}\) is the wavelet, shifted down by one row. This means every descending diagonal is constant, which is the defining property of a Toeplitz matrix.
Computational consequence
A Toeplitz matrix–vector product is a convolution and costs \(\mathcal{O}(N \log N)\) via the FFT, not \(\mathcal{O}(NM)\). This is what makes matrix-free seismic operators practical.
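The stdlib-only sketch below builds the Toeplitz matrix explicitly for a toy 3-tap wavelet and checks both the constant-diagonal property and that the matrix-vector product reproduces direct convolution (the \(\mathcal{O}(N \log N)\) FFT route would need an FFT package, so it is not shown):

```julia
using LinearAlgebra

# Toeplitz convolution matrix for a toy 3-tap wavelet
w = [1.0, -2.0, 1.0]
n = 6
W = zeros(n + length(w) - 1, n)
for j in 1:n
    W[j:j+length(w)-1, j] .= w       # each column: wavelet shifted down one row
end

# Every descending diagonal is constant (the defining Toeplitz property)
println(all(W[i, j] == W[i+1, j+1] for i in 1:size(W, 1)-1, j in 1:n-1))

# W * r is exactly the full convolution of w with r
r = [0.0, 1.0, 0.0, 0.0, -1.0, 0.0]
conv_direct = [sum(w[k] * r[j-k+1] for k in 1:length(w) if 1 <= j-k+1 <= n)
               for j in 1:n+length(w)-1]
println(W * r ≈ conv_direct)
```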
Deconvolution — Why It’s Ill-Posed
The wavelet \(w(t)\) is bandlimited, so:
\(\mathbf{W}\) has rapidly decaying singular values
Low- and high-frequency reflectivity components are attenuated
```julia
using JOLI, LinearAlgebra

# Define a convolution operator (matrix-free)
n = 256
w = ricker(n, 25.0)      # Ricker wavelet, 25 Hz
W = joConvolve(n, w)     # JOLI linear operator

# W acts like a matrix but stores only the wavelet
# W * r  → forward (convolution)
# W' * d → adjoint (correlation)
```
The adjoint W' is defined to satisfy \(\langle \mathbf{W}\mathbf{r}, \mathbf{d}\rangle = \langle \mathbf{r}, \mathbf{W}^\top\mathbf{d}\rangle\).
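This identity is the basis of the standard "dot test" for operator implementations. A stdlib sketch with an explicit toy convolution matrix and random vectors (matrix size and seed are arbitrary):

```julia
using LinearAlgebra, Random

# Dot test: verify ⟨W r, d⟩ = ⟨r, Wᵀ d⟩ for a convolution matrix
Random.seed!(0)
n, k = 64, 16
w = randn(k)                        # random toy wavelet
W = zeros(n + k - 1, n)
for j in 1:n
    W[j:j+k-1, j] .= w
end

r = randn(n)                        # random model vector
d = randn(n + k - 1)                # random data vector
lhs = dot(W * r, d)
rhs = dot(r, W' * d)
println(isapprox(lhs, rhs; rtol=1e-10))
```

In matrix-free codes, the same test is run on the operator pair (forward, adjoint) to catch implementation bugs before inversion.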