Motivation, Theory, and Challenges
Schools of EAS, CSE, ECE — Georgia Institute of Technology
2026-03-30

Felix J. Herrmann
School of Earth and Ocean Sciences — Georgia Institute of Technology
Based on Virieux and Operto (2009), Virieux et al. (2017), Jones (2019)
The velocity model problem:
Migration can only construct reliable images of the subsurface if we have an accurate representation of the parameter variation within the Earth. (Jones 2019)
Consequently, much effort has gone into developing model-building tools to estimate these parameters — the most important of which is velocity.
The migration imaging condition. (a) Simple reflection from a flat event. (b) Downgoing source-side wavefield for two-way propagation. (c) Upcoming receiver-side wavefield. (d) Imaging condition forms the final image plus cross-talk terms. Adapted from Jones (2019).
Shallow section from the RTM image: (a) prior to filtering the backscattered noise; (b) after filtering of RTM angle gathers. From Jones (2019).
The low-frequency background “image condition” noise is visible above the critical angle — these are the rabbit-ear components of the FWI gradient.
Definition
Full waveform inversion (FWI) is a high-resolution seismic imaging technique that is based on using the entire content of seismic traces for extracting physical parameters of the medium sampled by seismic waves. (Virieux et al. 2017)
FWI provides the highest resolution of indirect osculating physical methods, compared with results from gravimetry, geomagnetism, or electromagnetism.
These two probing interactions have led to two strategies for imaging the medium: tomographic approaches and migration methods. (Virieux et al. 2017)
Extraction percentage of vertical model variation when considering the vertical component \(k_z\) of the wavenumber vector. Tomographic tools reconstruct the smooth part of the velocity structure, whereas migration tools reconstruct the contrasted part. There is an apparent insensitivity at intermediate values of wavenumbers. After Figure 1.4-3 of Claerbout (1985). From Virieux et al. (2017).
Common shot gather with different organized hierarchies in the time recording. Yellow phases express the reflection regime with time delays, whereas undulations visible on direct waves (red curves) and on hyperbolic branches of reflected phases (yellow curves) identify the transmission regime. From Virieux et al. (2017).
Seismic wave propagation through complex subsurface structures illustrating how diving waves interact with geological heterogeneities. From Sanford et al. (2023).
In our approach we do not use any linearization of the forward modeling — such as the linear Born approach — which is considered in migration approaches. (Virieux et al. 2017)
When we reconstruct our model by beginning with an initial model and revising it iteratively, nonlinearity is implicitly introduced by repeated exact forward modeling at each update stage.
Seismic wave propagation around subsurface igneous sill complexes, showing diving wave paths and their interaction with complex geological structures. From Sanford et al. (2023).
The FWI problem is defined as minimizing the least-squares misfit:
\[ \min_{\mathbf{m}} \mathcal{C}(\mathbf{m}) = \frac{1}{2} \int_0^{T_w} \int_{\partial\Omega_s} \int_{\partial\Omega_r} \left| \mathbf{d}_\text{syn}^\mathbf{m}(\mathbf{x}_r, \mathbf{x}_s, t) - \mathbf{d}_\text{obs}(\mathbf{x}_r, \mathbf{x}_s, t) \right|^2 \, \mathrm{d}x_r \, \mathrm{d}x_s \, \mathrm{d}t \tag{1}\]
where \(\mathbf{d}_\text{syn}^\mathbf{m}\) is the synthetic data computed in model \(\mathbf{m}\), \(\mathbf{d}_\text{obs}\) is the observed data, \(T_w\) is the recording time, \(\partial\Omega_s\) and \(\partial\Omega_r\) are the source and receiver domains. (Virieux et al. 2017, eq. 3)
This problem is solved through local nonlinear optimization. An iterative sequence \(\mathbf{m}_k(\mathbf{x})\) is built from an initial guess \(\mathbf{m}_\text{ini}(\mathbf{x})\):
\[ \mathbf{m}_{k+1} = \mathbf{m}_k + \alpha_k \Delta\mathbf{m}_k \tag{2}\]
where \(\alpha_k \in \mathbb{R}\) is computed through a linesearch or trust-region procedure. (Virieux et al. 2017, eq. 4)
For the Newton method, \(\alpha_k = 1\) and \(\Delta\mathbf{m}_k\) is the solution of:
\[ \mathcal{H}(\mathbf{m}_k) \Delta\mathbf{m}_k = -\nabla\mathcal{C}(\mathbf{m}_k) \tag{3}\]
where \(\mathcal{H}(\mathbf{m}_k)\) is the Hessian matrix and \(\nabla\mathcal{C}(\mathbf{m})\) is the gradient vector. (Virieux et al. 2017, eq. 5)
The FWI method can be expressed in terms of the Jacobian operator \(\mathcal{J}(\mathbf{m}) = \partial\mathbf{d}_\text{syn}/\partial\mathbf{m}\):
\[ \nabla\mathcal{C}(\mathbf{m}) = \mathcal{J}^T(\mathbf{m})(\mathbf{d}_\text{syn}(\mathbf{m}) - \mathbf{d}_\text{obs}) \]
\[ \mathcal{H}(\mathbf{m}) = \mathcal{J}^T(\mathbf{m})\mathcal{J}(\mathbf{m}) + \frac{\partial\mathcal{J}}{\partial\mathbf{m}}(\mathbf{d}_\text{syn}(\mathbf{m}) - \mathbf{d}_\text{obs}) \tag{4}\]
(Virieux et al. 2017, eq. 6)
Dropping the second term of the Hessian gives:
\[ \mathcal{H}_\text{GN}(\mathbf{m}) = \mathcal{J}^T(\mathbf{m})\mathcal{J}(\mathbf{m}) \tag{5}\]
The model update at iteration \(k\) becomes the least-squares solution:
\[ \mathcal{J}(\mathbf{m}_k) \Delta\mathbf{m}_k = -(\mathbf{d}_\text{syn}(\mathbf{m}_k) - \mathbf{d}_\text{obs}) \tag{6}\]
(Virieux et al. 2017, eqs. 7–9)
| Method | Convergence | Cost per iteration |
|---|---|---|
| Gradient descent | Slow (linear) | 1 forward + 1 adjoint |
| Conjugate gradient | Better | 1 forward + 1 adjoint |
| L-BFGS | Good | 1 forward + 1 adjoint |
| Gauss-Newton | Fast (quadratic) | Multiple solves |
| Newton | Fastest (quadratic) | Most expensive |
Wave equation in the time domain:
\[ \mathbf{M}(\mathbf{x}) \frac{\mathrm{d}^2\mathbf{u}(\mathbf{x},t)}{\mathrm{d}t^2} = \mathbf{A}(\mathbf{x})\mathbf{u}(\mathbf{x},t) + \mathbf{s}(\mathbf{x},t) \]
where \(\mathbf{M}\) and \(\mathbf{A}\) are the mass and stiffness matrices, \(\mathbf{s}\) is the source term, and \(\mathbf{u}\) is the seismic wavefield. (Virieux and Operto 2009, eq. 1)
Wave equation in the frequency domain:
\[ \mathbf{B}(\mathbf{x},\omega)\mathbf{u}(\mathbf{x},\omega) = \mathbf{s}(\mathbf{x},\omega) \tag{7}\]
where \(\mathbf{B}\) is the impedance matrix — a sparse complex-valued matrix. (Virieux and Operto 2009, eq. 2)
The relationship between the seismic wavefield and the parameters is nonlinear and can be written compactly as:
\[ \mathbf{u} = \mathbf{G}(\mathbf{m}) \tag{8}\]
(Virieux and Operto 2009, eq. 3)
The data-misfit vector \(\Delta\mathbf{d} = \mathbf{d}_\text{obs} - \mathbf{d}_\text{cal}(\mathbf{m})\) with \(\mathbf{d}_\text{cal}(\mathbf{m})=\mathcal{R}\mathbf{u}\) of dimension \(N\) captures the differences at receiver positions for each source-receiver pair. The misfit function is:
\[ \mathcal{C}(\mathbf{m}) = \frac{1}{2} \Delta\mathbf{d}^\dagger \Delta\mathbf{d} \tag{9}\]
where \(\dagger\) denotes the adjoint operator (transpose conjugate). (Virieux and Operto 2009, eq. 4)
Adjoint-state gradient:
The gradient can be computed without explicitly building the sensitivity matrix, using the adjoint-state method:
\[ \frac{\partial\mathcal{C}}{\partial\mathbf{m}}(\mathbf{x}) = \sum_s \int_0^{T_w} dt \, \boldsymbol{\lambda}^T(\mathbf{x},t) \frac{\partial\mathbf{M}}{\partial\mathbf{m}}(\mathbf{x}) \partial_t \mathbf{w}(\mathbf{x},t) \]
(Virieux et al. 2017, eq. 28)
The gradient is formed by the zero-lag crosscorrelation between the adjoint wavefield \(\boldsymbol{\lambda}\) and the time derivative of the incident wavefield \(\mathbf{w}\), multiplied by a local scattering operator.
Misfit function gradient showing the first Fresnel zone between a source and receiver in a homogeneous background. The central ellipse represents the first Fresnel zone. The outer fringes represent migration isochrones whose width decreases with depth according to the scattering angle.The dashed red lines delineate the first Fresnel zone and an isochrone surface. From Virieux et al. (2017).
The first Fresnel zone is the region where the phase time does not differ by more than half a period with respect to the direct arrival.
The scattering wavenumber vector \(\mathbf{k}\) linking the experimental setup to spatial resolution:
\[ \mathbf{k} = \frac{2f}{c_0} \cos\left(\frac{\theta}{2}\right) \mathbf{n} = \frac{4\pi}{\lambda} \cos\left(\frac{\theta}{2}\right) \mathbf{n} \tag{10}\]
where \(f\) is frequency, \(c_0\) is velocity, \(\theta\) is the diffraction/aperture angle, \(\lambda\) is wavelength, and \(\mathbf{n}\) is the unit vector. (Virieux et al. 2017, eq. 29; Virieux and Operto 2009, eq. 30)


Frequency-domain sensitivity kernel:
Wavepath. (a) Wavepath for a receiver located at a horizontal distance of 70 km. The frequency is 5 Hz and the velocity in the homogeneous background is 6 km/s. (b) Monochromatic Green’s function for a point source. From Virieux and Operto (2009).
Note
Key insights:
Initial model (left) and final FWI model (right) for the Valhall field. Horizontal sections at 175 m, 500 m, and 1000 m depth. Note the significant increase in resolution from left to right — paleochannels, glacier imprints, and gas cloud contours become visible. From Virieux et al. (2017).
Common receiver gather in (a) the initial model and (b) the final model. Red arrows identify first-arrival phases, and black arrows identify post-critical phases. Note the increasing extraction of information achieved by FWI. The source signal is shown as an insert and differs between the initial and final models. From Virieux et al. (2017).
Initial model in the left panels and final model in the right panels. Horizontal sections are shown at depths of 175 m, 500 m, and 1000 m. Note the significant increase of resolution from left to right. The paleoriver features are nicely visible in (d) but cannot be seen in (a), at a depth of 175 m. In (d), one can identify also the strong imprint of the acquisition. At a depth of 500 m, imprints on bedrock of former glaciers are focused in (e) but are not visible in (b). Finally, at a depth of 1000 m, very precise contours of the gas cloud are clear in (f) but are completely blurred in (c). Acquisition imprints decrease with depth but are still present. From Virieux et al. (2017)
FWI is an ill-posed problem — an infinite number of models can match the data. The misfit function can be augmented:
\[ \mathcal{C}(\mathbf{m}) = \frac{1}{2} \Delta\mathbf{d}^\dagger \mathbf{W}_d \Delta\mathbf{d} + \frac{1}{2} \varepsilon (\mathbf{m} - \mathbf{m}_\text{prior})^\dagger \mathbf{W}_m (\mathbf{m} - \mathbf{m}_\text{prior}) \tag{11}\]
where:
(Virieux and Operto 2009, eq. 17)
The problem:

The Born approximation requires that the starting model allows matching observed traveltimes with an error less than half the period (Beydoun and Tarantola 1988):
\[ \frac{\Delta t}{T_L} < \frac{1}{N_A} \]
where \(T_L\) is the simulation duration and \(N_A\) is the number of propagated wavelengths.
For 50 wavelengths, the traveltime error must be less than 1%!
Comparative study of objective functions to overcome noise and bandwidth limitations in FWI. From Tejero et al. (2015).
Characterising ice slabs in firn using seismic full waveform inversion. From Pearce et al. (2023).
Long-wavelength FWI updates in the presence of cycle skipping. From Ramos-Martínez et al. (2019).
(a) Time shift \(\Delta t\) between field data (yellow) and modelled data (blue). Centre: far-offset traces for various model errors. Bottom: residual traces. (b) Cost function for various time shifts — note the local minima on either side of the global minimum. From Jones (2019).



Sensitivity of FWI to the accuracy of different starting models for the synthetic Valhall model. (a) Close-up of the true model. (b–d) FWI models from different starting models. (e–g) Velocity profiles comparing true (black), starting (blue), and FWI models (red). From Virieux and Operto (2009).
Frequency continuation: The hierarchical multiscale strategy successively processes data subsets of increasing resolution power to incorporate progressively smaller wavenumbers in the tomographic models. (Virieux and Operto 2009)
Laplace-Fourier approach: Using complex-valued frequencies equivalent to exponential damping of seismograms:
\[ P\left(\omega + \frac{i}{\tau}\right) e^{i\omega t} = \int_{-\infty}^{+\infty} p(t) e^{-(t-t_0)/\tau} e^{i\omega t} \, dt \tag{12}\]
The Laplace-domain inversion provides a smooth velocity model as a starting model for subsequent Fourier-domain FWI. (Shin and Cha 2009; Virieux and Operto 2009)
Multiscale strategy for elastic FWI applied to the overthrust model. (a) FWI velocity models \(V_P\) (top) and \(V_S\) (bottom). (b) Well-log comparison. (c) Synthetic seismograms showing final residuals. From Virieux and Operto (2009).
Laplace-Fourier-domain waveform inversion for the BP benchmark model. (a) True model. (b) Starting model. (c) After Laplace-domain inversion. (d) After Laplace-Fourier-domain inversion. (e) After frequency-domain FWI. All structures are successfully imaged from a very crude starting model. From Virieux and Operto (2009).
While multiscale techniques mitigate some of the effects of cycle skipping they to not overcome the problem.
So, far we mainly looked at transmitted diving waves whose depth of penetration depends on
In the next lectures, we will exploit

Banana-doughnut kernels vs. rabbit ears:
The sensitivity kernels of acoustic FWI for one source-receiver pair: (a) without and (b) with a prior reflector. The banana-shaped kernel (blue) represents diving-wave contributions; the rabbit-ears-shaped kernel (yellow) represents reflected-wave contributions. \(\theta\) is the aperture angle. From Kim et al. (2022).
The overall gradient response is comprised of three constituent parts: (a) the direct arrival “banana” refraction path (low-\(k\) transmission), (b) the usual migration-response-like elliptical arc (high-\(k\) “migration” FWI), and (c) the “cross-talk” rabbit ears (low-\(k\) updates from reflections). Courtesy of Chao Wang, ION. From Jones (2019).

(a) PreSDM image with single shot gradient contribution superimposed. (b) After summing gradient contributions for all shots — blue indicates where to increase velocity, red where to decrease. From Jones (2019).
Key takeaways from industrial implementation (Jones 2019):
Two-stage approach:
The following slides illustrate a hands-on implementation of FWI using Devito, a domain-specific language for finite-difference computations. Based on tutorials from the SLIM Group.
Seismic survey setup.
Three key steps:
\[ \begin{cases} m \dfrac{d^2 u(\mathbf{x},t)}{dt^2} - \nabla^2 u(\mathbf{x},t) + \eta \dfrac{d u(\mathbf{x},t)}{dt} = q & \text{in } \Omega \\[6pt] u(\cdot,0) = 0, \quad \dfrac{du}{dt}\big|_{t=0} = 0 \end{cases} \]
where:
from devito import TimeFunction, Eq, solve, Operator
# Define wavefield and PDE
u = TimeFunction(name="u", grid=model.grid,
time_order=2, space_order=2)
pde = model.m * u.dt2 - u.laplace + model.damp * u.dt
# Create time-marching stencil: solve for u(t+dt)
stencil = Eq(u.forward, solve(pde, u.forward))
# Source injection and receiver interpolation
src_term = src.inject(field=u.forward,
expr=src * dt**2 / model.m)
rec_term = rec.interpolate(expr=u.forward)
# Create and run operator
op = Operator([stencil] + src_term + rec_term,
subs=model.spacing_map)
op(time=time_range.num-1, dt=model.critical_dt)Synthetic shot record from the two-layer model. The direct arrival and reflection from the interface are clearly visible.
The forward simulation can be written as:
\[ \mathbf{A}(\mathbf{m}) \, \mathbf{u} = \mathbf{q} \]
The adjoint (back-propagation) solves:
\[ \mathbf{A}(\mathbf{m})^T \, \mathbf{v} = \delta\mathbf{d} \]
where \(\delta\mathbf{d} = \mathbf{P}_r \mathbf{u} - \mathbf{d}_\text{obs}\) is the data residual.
. . .
\[ \text{Image} = \sum_{t=1}^{n_t} \mathbf{u}[t] \, \mathbf{v}[t] \]
The forward and adjoint wavefields meet at reflector positions at zero time offset.



def ImagingOperator(model, image):
v = TimeFunction(name='v', grid=model.grid,
time_order=2, space_order=4)
u = TimeFunction(name='u', grid=model.grid,
time_order=2, space_order=4,
save=geometry.nt)
# Adjoint wave equation (note negated damping)
eqn = model.m * v.dt2 - v.laplace + model.damp * v.dt.T
stencil = Eq(v.backward, solve(eqn, v.backward))
# Inject data residual at receiver locations
res_term = residual.inject(field=v.backward,
expr=residual * dt**2 / model.m)
# Imaging condition: correlate u and v
image_update = Eq(image, image - u * v)
return Operator([stencil] + res_term + [image_update],
subs=model.spacing_map)for i in range(nshots):
# Forward model with true velocity
true_d, _, _ = solver.forward(vp=model.vp)
# Forward model with smooth velocity
smooth_d, u0, _ = solver.forward(
vp=model0.vp, save=True)
# Back-propagate residual
residual = smooth_d.data - true_d.data
op_imaging(u=u0, v=v, vp=model0.vp,
residual=residual)
\[ \min_{\mathbf{m}} \; \Phi_s(\mathbf{m}) = \frac{1}{2} \left\lVert \mathbf{P}_r \mathbf{u} - \mathbf{d} \right\rVert_2^2, \quad \text{s.t.} \quad \mathbf{u} = \mathbf{A}(\mathbf{m})^{-1} \mathbf{P}_s^T \mathbf{q}_s \]
\[ \nabla \Phi_s(\mathbf{m}) = \sum_{t=1}^{n_t} \mathbf{u}[t] \, \mathbf{v}_{tt}[t] = \mathbf{J}^T \delta\mathbf{d}_s \]
where \(\mathbf{v}_{tt}\) is the second-order time derivative of the adjoint wavefield:
\[ \mathbf{A}^T(\mathbf{m}) \, \mathbf{v} = \mathbf{P}_r^T \, \delta\mathbf{d} \]
Note
The FWI gradient is the RTM imaging condition with an extra \(\partial_{tt}\) applied to the adjoint field.



Transmission geometry: sources on one side, receivers on the other — records mostly diving waves, which are essential for FWI convergence.
def fwi_gradient(vp_in):
grad = Function(name="grad", grid=model.grid)
objective = 0.
for i in range(nshots):
geometry.src_positions[0, :] = source_locations[i, :]
# Forward model with true velocity → observed data
_, _, _ = solver.forward(vp=model.vp, rec=d_obs)
# Forward model with current estimate → save wavefield
_, u0, _ = solver.forward(vp=vp_in, save=True, rec=d_syn)
# Compute residual and accumulate objective
compute_residual(residual, d_obs, d_syn)
objective += .5 * norm(residual)**2
# Back-propagate residual → gradient
solver.gradient(rec=residual, u=u0, vp=vp_in, grad=grad)
return objective, grad




Important
The gradient has the same sign as the true perturbation — confirming that gradient descent moves the model in the correct direction.
Stochastic optimization and computational efficiency:
Effective method for parameter estimation with PDE constraints (Haber, Chung, and Herrmann 2012)
Seismic waveform inversion by stochastic optimization (Leeuwen, Aravkin, and Herrmann 2011)
Robustness and scalability:
Robust inversion, dimensionality reduction, and randomized sampling (Herrmann, Erlangga, and Lin 2012)
3D frequency-domain seismic inversion with controlled sloppiness (Leeuwen and Herrmann 2014)
Common theme
All four works address the computational bottleneck of FWI by leveraging ideas from stochastic optimization, randomized linear algebra, and dimensionality reduction.
Key takeaways: