An Uncertainty-Aware Digital Shadow for Underground CO2 Storage Monitoring

Digital Twins for Physical Systems – Capstone Lecture

Felix J. Herrmann

Georgia Institute of Technology

2026-03-11

Computational Science and Engineering — Georgia Institute of Technology

Note

This slide deck is derived from Gahlot, Orozco, Yin, et al. (2025b).

Outline

  1. Motivation – Why Geological Carbon Storage needs Digital Twins
  2. Problem Formulation – Nonlinear state dynamics, observations & Bayesian filtering
  3. Methodology – Lifecycle of the Digital Shadow
    • Forecast, Training & Analysis steps
    • Summary statistics & Conditional Normalizing Flows
    • Linear vs. nonlinear prior-to-posterior mappings
  4. Algorithms – Neural data assimilation
  5. Setup & Validation – In silico case study
  6. Multimodal Case Study – Seismic, wells & combined
  7. Results & Discussion
  8. Conclusions & Future Work

Part I: Motivation

The Need for Geological Carbon Storage

Acquisition setup with wells

The Challenge: Unknown Subsurface Properties

  • Fluid-flow properties (permeability) are poorly known
  • Displacement of brine by supercritical CO2 is intricate
  • Different permeability realizations \(\rightarrow\) vastly different plume outcomes
    • Left: plume stays within storage complex
    • Right: plume breaches containment due to unknown high-permeability streak
  • Both realizations consistent with baseline seismic survey!

Two vastly different plume outcomes for two permeability realizations consistent with the same baseline survey

Important

We cannot rely on reservoir simulations alone – we need data assimilation with time-lapse monitoring data!

Digital Twin for Enhanced Oil Recovery

From Digital Shadows to Digital Twins

IBM’s definition of a Digital Twin:

A digital twin is a virtual representation of an object or system designed to reflect a physical object accurately. It spans the object’s lifecycle, is updated from real-time data and uses simulation, machine learning and reasoning to help make decisions.

Digital Shadow (Asch 2022):

  • Captures temporal evolution of CO2 plume via unidirectional data integration
  • Combines: reservoir simulation + wave-equation inversion + ensemble Bayesian filtering + neural posterior density estimation
  • Foundation for future Digital Twin with control, reasoning & decision-making

Key innovations over standard EnKF:

  1. Stochastic treatment of permeability \(\boldsymbol{\kappa}\)
  2. No Gaussian/linearity assumptions
  3. Learned nonlinear prior-to-posterior mapping
  4. Integration of multimodal time-lapse data

Part II: Problem Formulation

State Dynamics Model

The time-varying state \(\mathbf{x}_k = \bigl(\mathbf{x}_k[S_{\text{CO}_2}],\, \mathbf{x}_k[\delta p]\bigr)\) evolves as:

\[ \mathbf{x}_{k} = \mathcal{M}_k\bigl(\mathbf{x}_{k-1}, \boldsymbol{\kappa}_k\bigr),\quad \boldsymbol{\kappa}_k\sim p(\boldsymbol{\kappa}) \quad \text{for}\quad k=1\cdots K \tag{1}\]

where:

Symbol Meaning
\(\mathcal{M}_k\) Dynamics operator (multiphase flow simulator)
\(\mathbf{x}_k[S_{\text{CO}_2}]\) CO2 saturation at timestep \(k\)
\(\mathbf{x}_k[\delta p]\) Pressure perturbation at timestep \(k\)
\(\boldsymbol{\kappa}\) Permeability field – stochastic nuisance parameter
\(p(\boldsymbol{\kappa})\) Prior distribution from baseline seismic (via WISE)

Note

The permeability \(\boldsymbol{\kappa}\) is drawn anew at each timestep, reflecting our epistemic uncertainty about reservoir properties.

Observation Model

Multimodal time-lapse observations:

\[ \mathbf{y}_k = \mathcal{H}_k(\mathbf{x}_k,\boldsymbol{\epsilon}_k, \mathbf{\bar{m}}_k),\quad \boldsymbol{\epsilon}_k \sim p(\boldsymbol{\epsilon}) \quad\text{for}\quad k=1,2,\cdots,K \tag{2}\]

Two types of observations:

1. Well measurements (direct)

  • Saturation \(S_{\text{CO}_2}\) at well locations
  • Pressure perturbation \(\delta p\) at well locations
  • Sparse but accurate

2. Surface seismic (indirect)

  • Active-source seismic surveys
  • Processed via reverse-time migration (RTM)
  • Broad spatial coverage
  • Subject to noise \(\boldsymbol{\epsilon}_k\)

The observation operator \(\mathcal{H}_k\) is nonlinear, ill-posed, and computationally expensive.

Nonlinear Bayesian Filtering

The goal: recursively compute \(p(\mathbf{x}_k\mid \mathbf{y}_{1:k})\).

Three steps (Asch 2022):

  1. Initialization: Choose prior \(p(\mathbf{x}_0)\)

  2. Prediction (Chapman-Kolmogorov): \[ p(\mathbf{x}_k\mid \mathbf{y}_{1:k-1}) = \mathbb{E}_{\mathbf{x}_{k-1}\sim p(\mathbf{x}_{k-1}\mid\mathbf{y}_{1:k-1})}\bigl[p(\mathbf{x}_{k}\mid\mathbf{x}_{k-1})\bigr] \]

  3. Analysis (Bayes’ rule): \[ p(\mathbf{x}_{k}|\mathbf{y}_{1:k}) \propto p(\mathbf{y}_{k}|\mathbf{x}_{k})\, p(\mathbf{x}_{k}|\mathbf{y}_{1:k-1}) \]

Challenges: No analytical solution for nonlinear, non-Gaussian dynamics + nuisance parameters + high-dimensional state + expensive operators.
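Although the recursion rarely admits a closed form, it can be approximated with samples. A minimal bootstrap particle-filter sketch of the Prediction and Analysis steps for a hypothetical scalar model (the dynamics, noise levels, and quadratic observation are illustrative stand-ins, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
M = 20_000

def advance(x):                  # toy nonlinear dynamics (illustrative only)
    return 0.9 * x + 0.3 * np.sin(x)

x_true = 1.0                     # hidden trajectory generating the field data
x = rng.normal(0.0, 1.0, M)      # particles drawn from the prior p(x_0)

for k in range(3):
    # Prediction (Chapman-Kolmogorov): push particles through the dynamics
    x = advance(x) + 0.2 * rng.normal(size=M)
    x_true = advance(x_true) + 0.2 * rng.normal()
    # Nonlinear, noisy observation of the truth: y = x^2 + noise
    y_obs = x_true**2 + 0.1 * rng.normal()
    # Analysis (Bayes' rule): reweight particles by the likelihood...
    w = np.exp(-0.5 * ((y_obs - x**2) / 0.1) ** 2)
    w /= w.sum()
    # ...and resample to recover an equally weighted posterior ensemble
    x = rng.choice(x, size=M, p=w)
```

The quadratic observation makes the posterior bimodal, which already illustrates why Gaussian assumptions break down; for high-dimensional states and expensive operators this naive scheme is infeasible, motivating the learned approach below.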

Why Not Standard EnKF?

Ensemble Kalman Filter propagates \(M\) particles:

\[ \widehat{\mathbf{x}}^{(m)}_k = \mathbf{x}^{(m)}_k - \mathbb{C}\text{ov}_{\mathbf{x}_k, \mathbf{y}_k}\,\mathbb{C}\text{ov}^{-1}_{\mathbf{y}_k, \mathbf{y}_k}\bigl(\mathbf{y}_k^{(m)}-\mathbf{y}_k^{\text{obs}}\bigr) \]

Limitations:

  • Linear correction \(\rightarrow\) introduces bias for nonlinear problems
  • Accuracy plateaus with increasing ensemble size (Spantini et al. 2022)
  • Cannot handle non-Gaussian statistics
  • Unclear how to handle nuisance parameters (\(\boldsymbol{\kappa}\))

Our approach: Replace the linear mapping with a learned nonlinear prior-to-posterior mapping using Conditional Normalizing Flows (CNFs).
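For concreteness, the EnKF correction above in a few lines of NumPy on a toy linear-Gaussian ensemble (the operator H, dimensions, and noise level are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
M, nx, ny = 256, 3, 2

# Toy ensemble: states x and simulated observations y = Hx + noise
H = rng.normal(size=(ny, nx))
x = rng.normal(size=(M, nx))
y = x @ H.T + 0.1 * rng.normal(size=(M, ny))
y_obs = np.array([0.5, -1.0])

# Sample cross- and auto-covariances over the ensemble
X = x - x.mean(0)
Y = y - y.mean(0)
C_xy = X.T @ Y / (M - 1)
C_yy = Y.T @ Y / (M - 1)

# Linear EnKF correction: x_hat = x - C_xy C_yy^{-1} (y - y_obs)
K = C_xy @ np.linalg.inv(C_yy)
x_hat = x - (y - y_obs) @ K.T
```

Here the analyzed ensemble's simulated observations move toward \(\mathbf{y}_k^{\text{obs}}\); for nonlinear \(\mathcal{H}_k\) this linear shift is biased, which is exactly what the learned mapping removes.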

Part III: Methodology

Lifecycle of the Digital Shadow

Lifecycle of the Digital Shadow at timestep \(k=1\): Forecast \(\rightarrow\) Training \(\rightarrow\) Analysis

Lifecycle – Step by Step

At each timestep \(k\), the Digital Shadow cycles through:

1. Forecast Step

  • Time-advance posterior samples from \(k{-}1\): \[ \mathbf{x}_k^{(m)} \sim p(\mathbf{x}_k \mid \widehat{\mathbf{x}}_{k-1}^{(m)}) \]
  • Simulate observations: \[ \mathbf{y}_k^{(m)} \sim p(\mathbf{y}_k \mid \mathbf{x}_k^{(m)}) \]
  • Produces training ensemble \(\{(\mathbf{x}_k^{(m)}, \mathbf{y}_k^{(m)})\}_{m=1}^M\)

2. Training Step

  • Train CNF \(f_{\boldsymbol{\phi}}\) on ensemble pairs

3. Analysis Step

  • Condition on field data \(\mathbf{y}_k^{\text{obs}}\)
  • Map prior samples to posterior: \[ \mathbf{x}_k^{(m)} \mapsto \widehat{\mathbf{x}}_k^{(m)} \sim p(\widehat{\mathbf{x}}_k \mid \mathbf{y}_{1:k}^{\text{obs}}) \]

Tip

The \(\widehat{\quad}\) symbol distinguishes analyzed states (conditioned on field data) from predicted (digital) states.

Imaging-Based Summary Statistics

Challenge: Raw seismic data is high-dimensional, non-uniform, and the mapping to CO2 states is complex.

Solution: Physics-based summary statistics via Reverse-Time Migration (RTM) (Orozco et al. 2023):

\[ \mathbf{y}_k = \Bigl(\mathbf{J}_{\mathcal{F}[\mathbf{v}]}\Bigr|_{\widetilde{\mathbf{v}}_k}\Bigr)_{\text{ic}}^{\top} \Bigl(\mathcal{F}[\mathbf{\bar{m}}_k]-\mathbf{d}_k\Bigr) \]

where:

  • \(\mathcal{F}[\mathbf{m}]\): nonlinear forward wave operator
  • \(\mathbf{J}_{\mathcal{F}}\): Jacobian with inverse-scattering imaging condition
  • \(\widetilde{\mathbf{v}}_k\): smoothed migration-velocity model
  • \(\mathbf{d}_k = \mathcal{F}[\mathbf{m}_k] + \boldsymbol{\epsilon}_k\): noisy shot records

Key insight: RTM preserves information from shot records (Alsing et al. 2018; Orozco et al. 2023) while mapping to a uniform image domain where the relation to the CO2 plume is simpler.

Rock Physics & Reference Models

Patchy saturation model (Avseth et al. 2010) maps CO2 saturation to seismic properties: \[ \mathbf{m}_k = \mathcal{R}(\mathbf{m}_0, \mathbf{x}_k),\quad k=1\cdots K \]

CO2 saturation-induced changes in wavespeed and density

Reference model for each timestep uses ensemble average: \[ \mathbf{\bar{m}}_k = \mathcal{R}(\mathbf{m}_0, \mathbf{\bar{x}}_k) \quad\text{where}\quad \mathbf{\bar{x}}_k = \frac{1}{M}\sum_{m=1}^M \mathbf{x}_k^{(m)} \]

Training the CNF

Objective: Train CNF to approximate the posterior by minimizing the forward KL divergence:

\[ \widehat{\boldsymbol{\phi}}_k = \arg\min_{\boldsymbol{\phi}} \frac{1}{M}\sum_{m=1}^M \left(\frac{1}{2}\bigl\|f_{\boldsymbol{\phi}}(\mathbf{x}_k^{(m)};\mathbf{y}_k^{(m)})\bigr\|_2^2 - \log\bigl|\det\bigl(\mathbf{J}^{(m)}_{f_{\boldsymbol{\phi}}}\bigr)\bigr|\right) \tag{3}\]

How it works:

  • CNF \(f_{\boldsymbol{\phi}}\): invertible neural network
  • Maps state \(\mathbf{x}\) to latent \(\mathbf{z}\) conditioned on \(\mathbf{y}\)
  • Latent space \(\rightarrow\) standard Gaussian
  • Amortized over ensemble: generalizes over pairs at a given timestep
  • Trained via Simulation-Based Inference (SBI) (Cranmer et al. 2020)

Key property – marginalization:

  • Permeability \(\boldsymbol{\kappa}\) varies across ensemble members
  • Training on these diverse samples \(\rightarrow\) posterior marginalizes over \(\boldsymbol{\kappa}\)
  • Uncertainty from unknown permeability is built into the posterior
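As a sketch of the objective in Equation 3, consider a hypothetical one-layer affine "flow" \(f_{\phi}(x;y) = (x - ay - b)\,e^{-s}\) with \(\log|\det \mathbf{J}| = -s\), trained by plain gradient descent; a real CNF stacks many invertible coupling layers, but the loss is the same:

```python
import numpy as np

rng = np.random.default_rng(8)
N = 4096

# Toy conditional density to learn: p(x | y) = N(2y + 1, 0.5^2)
y = rng.normal(size=N)
x = 2.0 * y + 1.0 + 0.5 * rng.normal(size=N)

# One-layer affine "flow": z = f(x; y) = (x - a*y - b) * exp(-s),
# so log|det J| = -s and the loss is mean(0.5*z^2) + s
a, b, s = 0.0, 0.0, 0.0
lr = 0.05
for _ in range(5000):
    sig = np.exp(s)
    z = (x - a * y - b) / sig
    # Analytic gradients of mean(0.5*z^2) + s with respect to (a, b, s)
    ga = -np.mean(z * y) / sig
    gb = -np.mean(z) / sig
    gs = 1.0 - np.mean(z**2)
    a, b, s = a - lr * ga, b - lr * gb, s - lr * gs
# After training, a ~ 2, b ~ 1, exp(s) ~ 0.5: the flow has
# recovered (approximately) the true conditional density
```

Minimizing the latent energy while penalizing volume contraction forces the flow to whiten \(x\) given \(y\), which is the same mechanism at work in the full CNF.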

Derivation of the Training Loss

Starting from the KL divergence between true and approximate posterior (Cranmer et al. 2020; Radev et al. 2020):

\[ \widehat{\boldsymbol{\phi}} = \arg\min_{\boldsymbol{\phi}} \mathbb{E}_{p(\mathbf{y})}\bigl[\mathbb{KL}\bigl(p(\mathbf{x}\mid \mathbf{y})\,\|\,p_{\boldsymbol{\phi}}(\mathbf{x}\mid \mathbf{y})\bigr)\bigr] \]

  • Expand KL and drop terms independent of \(\boldsymbol{\phi}\): \[= \arg\max_{\boldsymbol{\phi}} \mathbb{E}_{p(\mathbf{x},\mathbf{y})}\bigl[\log p_{\boldsymbol{\phi}}(\mathbf{x}\mid\mathbf{y})\bigr]\]

  • Apply change-of-variables \(p_{\boldsymbol{\phi}}(\mathbf{x}\mid\mathbf{y}) = p\bigl(\mathbf{z}=f_{\boldsymbol{\phi}}(\mathbf{x};\mathbf{y})\bigr)\bigl|\det(\mathbf{J}_{f_{\boldsymbol{\phi}}})\bigr|\): \[= \arg\max_{\boldsymbol{\phi}} \mathbb{E}_{p(\mathbf{x},\mathbf{y})}\Bigl[\log p\bigl(f_{\boldsymbol{\phi}}(\mathbf{x};\mathbf{y})\bigr) + \log\bigl|\det(\mathbf{J}_{f_{\boldsymbol{\phi}}})\bigr|\Bigr]\]

  • With \(p(\mathbf{z}) = \mathcal{N}(0,\mathbf{I})\) and Monte Carlo approximation \(\rightarrow\) training loss in Equation 3

Posterior Sampling

After training, samples from the posterior can be drawn:

\[ \widehat{\mathbf{x}}_k^{(m)} = f^{-1}_{\widehat{\boldsymbol{\phi}}_k}(\mathbf{z}_k^{(m)};\,\mathbf{y}_k^{\text{obs}}),\quad \mathbf{z}_k^{(m)} \sim \mathcal{N}(0,\mathbf{I}) \]

But this reference-to-conditional mapping only yields accurate posterior samples under ideal, fully converged training…

Better approach: Joint-to-conditional mapping (Spantini et al. 2022; Ramgraber et al. 2023):

\[ \widehat{\mathbf{x}}_k^{(m)} = f^{-1}_{\widehat{\boldsymbol{\phi}}_k}\Bigl(\underbrace{f_{\widehat{\boldsymbol{\phi}}_k}(\mathbf{x}_k^{(m)};\,\mathbf{y}_k^{(m)})}_{\mathbf{z}_k^{(m)}};\,\mathbf{y}_k^{\text{obs}}\Bigr) \]

This is a nonlinear extension of EnKF’s correction: \[ \widehat{\mathbf{x}}^{(m)}_{k} = {T}^{\text{nonlin}}_{\mathbf{y}_k^{\text{obs}}}(\mathbf{y}_k^{(m)},\,\mathbf{x}_k^{(m)}) = f^{-1}_{\widehat{\boldsymbol{\phi}}_k}\left(f_{\widehat{\boldsymbol{\phi}}_k}(\mathbf{x}_k^{(m)};\,\mathbf{y}^{(m)}_k);\,\mathbf{y}_k^{\text{obs}}\right) \]
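In the linear-Gaussian special case the composite map collapses exactly to the EnKF correction, which is easy to verify numerically (toy scalar joint, not from the paper; here the affine map with regression coefficient K is one valid transport map):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 200_000

# Linear-Gaussian toy joint: x ~ N(0, 1), y = x + noise
x = rng.normal(size=M)
y = x + 0.5 * rng.normal(size=M)
y_obs = 1.0

# For Gaussians an affine map suffices: f(x; y) = x - K*y with
# K the regression coefficient Cov_xy / Cov_yy estimated from samples
K = np.cov(x, y)[0, 1] / np.var(y)

def f(x_, y_):        # forward map: joint sample -> latent z
    return x_ - K * y_

def finv(z_, y_):     # inverse map, evaluated at the field observation
    return z_ + K * y_

# Joint-to-conditional update: latent z from simulated pairs,
# inverted at y_obs; this equals x - K*(y - y_obs), the EnKF update
x_post = finv(f(x, y), y_obs)
```

For this model the analytic posterior is \(N(K\,y^{\text{obs}},\, 1-K)\) with \(K=0.8\), and the transported samples match its mean and variance; the nonlinear CNF map generalizes precisely this construction.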

Linear vs. Nonlinear Prior-to-Posterior Mapping

EnKF (Linear)

\[ \widehat{\mathbf{x}}^{(m)}_k = \mathbf{x}^{(m)}_k - \mathbb{C}\text{ov}_{\mathbf{x}_k,\mathbf{y}_k}\,\mathbb{C}\text{ov}^{-1}_{\mathbf{y}_k,\mathbf{y}_k}(\mathbf{y}_k^{(m)} - \mathbf{y}_k^{\text{obs}}) \]

  • Linear correction
  • Gaussian assumption
  • Accuracy saturates with ensemble size
  • Cannot handle nuisance parameters

CNF (Nonlinear)

\[ \widehat{\mathbf{x}}^{(m)}_k = f^{-1}_{\widehat{\boldsymbol{\phi}}_k}\bigl(f_{\widehat{\boldsymbol{\phi}}_k}(\mathbf{x}_k^{(m)};\mathbf{y}_k^{(m)});\,\mathbf{y}_k^{\text{obs}}\bigr) \]

  • Nonlinear composite mapping
  • No Gaussian assumption
  • Errors cancel in composition
  • Marginalizes over \(\boldsymbol{\kappa}\)
  • Works with small ensembles (\(M=128\))

Advantages of joint-to-conditional:

  1. Improved sample quality even when training has not fully converged
  2. No reliance on an external reference distribution
  3. Each predicted state is tied one-to-one to its corrected state

Part IV: Algorithms

Algorithm 1: Forecast, Training & Analysis

Procedure Forecast(\(p(\boldsymbol{\kappa}), p(\boldsymbol{\epsilon}_k), \{\widehat{\mathbf{x}}_{k-1}^{(m)}\}_{m=1}^M\)):

for \(m = 1\) to \(M\) do

  Sample from transition density: \(\mathbf{x}^{(m)}_k\sim p(\cdot\mid \widehat{\mathbf{x}}^{(m)}_{k-1})\) via dynamics simulations

  Sample from likelihood: \(\mathbf{y}^{(m)}_k\sim p(\cdot\mid \mathbf{x}^{(m)}_{k})\) via observation simulations

return \(\{(\mathbf{x}_k^{(m)},\mathbf{y}_k^{(m)})\}_{m=1}^{M}\)

Procedure Training(\(\{(\mathbf{x}_k^{(m)},\mathbf{y}_k^{(m)})\}_{m=1}^{M}\)):

\(\displaystyle\widehat{\boldsymbol{\phi}}_k = \arg\min_{\boldsymbol{\phi}} \frac{1}{M}\sum_{m=1}^M \left(\frac{1}{2}\|f_{\boldsymbol{\phi}}(\mathbf{x}_k^{(m)};\mathbf{y}_k^{(m)})\|_2^2 - \log|\det(\mathbf{J}^{(m)}_{f_{\boldsymbol{\phi}}})|\right)\)

return \(\widehat{\boldsymbol{\phi}}_k\)

Procedure Analysis(\(\mathbf{y}^{\text{obs}}_{k}, \widehat{\boldsymbol{\phi}}_k, \{(\mathbf{x}_k^{(m)},\mathbf{y}_k^{(m)})\}_{m=1}^{M}\)):

for \(m = 1\) to \(M\) do

  Compute latent variables: \(\mathbf{z}^{(m)}_k = f_{\widehat{\boldsymbol{\phi}}_k}(\mathbf{x}_k^{(m)};\mathbf{y}_k^{(m)})\)

  Apply prior-to-posterior map: \(\widehat{\mathbf{x}}_k^{(m)} = f^{-1}_{\widehat{\boldsymbol{\phi}}_k}(\mathbf{z}_k^{(m)};\mathbf{y}_k^{\text{obs}})\)

return \(\{\widehat{\mathbf{x}}^{(m)}_k\}_{m=1}^M\)

Algorithm 2: Sequential SBI

Require: Datasets \(\mathbf{y}_k^{\text{obs}}\), collected sequentially at times \(k=1,\ldots,K\)

Output: Estimated ensemble \(\{\widehat{\mathbf{x}}^{(m)}_k\}_{m=1}^M\) for \(k=1,\ldots,K\)

  1. Initialize: For \(m = 1\) to \(M\): sample \(\widehat{\mathbf{x}}^{(m)}_0 \sim p(\mathbf{x}_0)\)

  2. For \(k = 1\) to \(K\):

    • \(\{(\mathbf{x}_k^{(m)}, \mathbf{y}_k^{(m)})\}_{m=1}^{M} =\) Forecast\((p(\boldsymbol{\kappa}), p(\boldsymbol{\epsilon}_k), \{\widehat{\mathbf{x}}_{k-1}^{(m)}\}_{m=1}^M)\)
    • \(\widehat{\boldsymbol{\phi}}_k =\) Training\((\{(\mathbf{x}_k^{(m)}, \mathbf{y}_k^{(m)})\}_{m=1}^{M})\)
    • Collect field data \(\mathbf{y}_k^{\text{obs}}\)
    • \(\{\widehat{\mathbf{x}}_k^{(m)}\}_{m=1}^M =\) Analysis\((\mathbf{y}^{\text{obs}}_{k}, \widehat{\boldsymbol{\phi}}_k, \{(\mathbf{x}_k^{(m)}, \mathbf{y}_k^{(m)})\}_{m=1}^{M})\)

This procedure implements neural sequential Bayesian inference – the basis of our Digital Shadow.
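Algorithm 2 in miniature: scalar stand-ins for the dynamics, the observation operator, and the nuisance parameter \(\boldsymbol{\kappa}\), with a linear regression map standing in for the trained CNF (all values illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
M, n_steps = 4096, 4

# Scalar stand-in dynamics: the state drifts by a nuisance parameter kappa
advance = lambda x, kappa: x + kappa
kappa_true = 0.7
x_true = 0.0

# Initialize: ensemble drawn from p(x_0)
x_ens = rng.normal(0.0, 1.0, M)

for k in range(n_steps):
    # Forecast: kappa is drawn anew for every member, as in Equation 1
    kappa = rng.normal(0.5, 0.3, M)
    x_ens = advance(x_ens, kappa)
    y_ens = x_ens + 0.2 * rng.normal(size=M)      # simulated observations
    # Collect "field data" from the hidden ground-truth trajectory
    x_true = advance(x_true, kappa_true)
    y_obs = x_true + 0.2 * rng.normal()
    # Training + Analysis: a linear regression map as a stand-in for the CNF
    gain = np.cov(x_ens, y_ens)[0, 1] / np.var(y_ens)
    x_ens = x_ens - gain * (y_ens - y_obs)
# x_ens now approximates samples of p(x_k | y_obs_{1:k})
```

Even with a biased nuisance prior (mean 0.5 versus the true 0.7), sequential assimilation keeps the ensemble tracking the hidden trajectory; in the full Digital Shadow the regression gain is replaced by the learned nonlinear map.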

Part V: Setup & Validation

Probabilistic Baseline Model

Velocity samples from WISE

Corresponding permeability samples
  • Baseline seismic data (SNR \(12\) dB) processed with WISE (Yin et al. 2024)
  • Posterior velocity samples \(\mathbf{v}_0 \sim p(\mathbf{v}_0 \mid \mathbf{y}_0^{\text{obs}})\) converted to permeability \(\boldsymbol{\kappa} \sim p(\boldsymbol{\kappa})\)
  • High variability reflects epistemic uncertainty in reservoir properties
  • Study area: South-Western North Sea, Compass model (Jones et al. 2012)

Ground-Truth Simulations

Ground-truth states: saturation (left), pressure perturbation (center), seismic observations (right) for \(k=1\cdots 4\)

Simulation parameters:

  • Grid: \(512 \times 256\), spacing \(6.25\) m
  • 24 timesteps of 80 days each
  • 4 seismic surveys at 480-day intervals
  • Injection: \(0.05\,\text{m}^3/\text{s}\) at \(1200\) m depth
  • Total: \(5.8\) Mt CO2 (\(1.1\) Mt/year)

Forecast Ensemble Variability

Predicted state samples at \(k=1\) showing variability from different permeability realizations

  • Different permeability realizations \(\rightarrow\) vastly different CO2 plume shapes
  • Both saturation and pressure perturbations vary significantly
  • This variability is essential for training the CNFs

Impact of the Analysis Step

Corrections by the Analysis step at \(k=3\): individual samples and conditional means

  • Top row: CO2 saturation corrections
  • Bottom row: Pressure perturbation corrections
  • Conditioning on observed data reduces sample variation and produces more resolved estimates

Impact on Errors and Uncertainty

Errors and standard deviations before and after the Analysis step at \(k=3\)

  • Errors relative to ground truth are significantly reduced after correction
  • Uncertainty (conditional standard deviation) tightens during the Analysis step
  • This validates the prior-to-posterior mapping

Part VI: Multimodal Case Study

Experimental Setup

Three conditioning scenarios compared:

Modality Data used
Wells only Saturation + pressure at well locations
Seismic only Time-lapse migrated seismic images
Wells + Seismic Both modalities combined

Ensemble parameters:

  • \(M = 128\) ensemble members
  • 120 training / 8 validation / 1 test (ground-truth)
  • 120 training epochs, Adam optimizer, LR \(= 5 \times 10^{-4}\)
  • Noise magnitude schedule: \(0.1 \rightarrow 0.005\)

Saturation Results: Timestep \(k=1\)

Posterior mean, errors, and standard deviations at \(k=1\) for four scenarios

Saturation Results: Timestep \(k=2\)

Posterior mean, errors, and standard deviations at \(k=2\)

Saturation Results: Timestep \(k=3\)

Posterior mean, errors, and standard deviations at \(k=3\)

Saturation Results: Timestep \(k=4\)

Posterior mean, errors, and standard deviations at \(k=4\)

Warning

Well-only data fails to capture the critical leftward protruding plume feature that breaches containment!

Saturation Evolution: Unconditioned

Unconditioned saturation evolution

Saturation Evolution: Wells vs. Seismic

Wells Conditioned

Seismic Conditioned

Saturation Evolution: Combined Conditioning

Seismic + Wells Conditioned saturation evolution

Pressure Evolution: Unconditioned

Unconditioned pressure evolution

Pressure Evolution: Wells vs. Seismic

Wells Conditioned

Seismic Conditioned

Pressure Evolution: Combined Conditioning

Seismic + Wells Conditioned pressure evolution

Part VII: Quantitative Results

Performance Metrics

SSIM-Error

Relative RMSE

Relative STD

Key observations:

  • Conditioning on time-lapse data improves all metrics
  • Well-only corrections are minor (orange)
  • Seismic (blue) and combined (green) yield substantial improvements
  • Quality deteriorates between surveys but recovers after Analysis
  • Relative errors remain flat or decrease with multimodal conditioning

Metric Definitions

SSIM-Error: \(1 - \text{SSIM}(\mathbf{x}^\ast_k, \bar{\widehat{\mathbf{x}}}_k)\)

RMSE: \[ \text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^N\bigl(\mathbf{x}^\ast[i] - \bar{\widehat{\mathbf{x}}}[i]\bigr)^2} \]

Relative RMSE: \[ \Delta\text{RMSE}_k = \frac{\text{RMS}(\mathbf{e}_k)}{\text{RMS}(\mathbf{x}^\ast_k)}, \quad \mathbf{e}_k = \mathbf{x}^\ast_k - \bar{\widehat{\mathbf{x}}}_k \]

Mean Standard Deviation: \[ \bar{\sigma} = \frac{1}{N}\sum_{i=1}^N \bar{\widehat{\boldsymbol{\sigma}}}[i] \]

Relative STD: \[ \Delta\text{STD}_k = \frac{\text{mean}(\bar{\widehat{\boldsymbol{\sigma}}}_k)}{\text{RMS}(\bar{\widehat{\mathbf{x}}}_k)} \]

These normalizations account for biases from plume growth over time.
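A sketch of these metrics on synthetic fields (shapes and noise levels hypothetical; SSIM would come from an image library such as scikit-image and is omitted here):

```python
import numpy as np

rng = np.random.default_rng(4)
x_true = rng.random((64, 64))                       # ground-truth state x*_k
x_mean = x_true + 0.05 * rng.normal(size=(64, 64))  # conditional-mean estimate
sigma = np.full((64, 64), 0.05)                     # pointwise posterior std

rms = lambda a: np.sqrt(np.mean(a**2))

e = x_true - x_mean                    # error field e_k
rmse = rms(e)                          # plain RMSE
rel_rmse = rms(e) / rms(x_true)        # Delta RMSE_k: normalized by the state
rel_std = sigma.mean() / rms(x_mean)   # Delta STD_k: normalized uncertainty
```

Normalizing by the RMS of the growing plume is what allows curves at different timesteps to be compared on one axis.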

Training Diagnostics

Training and validation loss

SSIM during training
  • Loss decreases smoothly for both training and validation
  • SSIM improves steadily
  • No sign of overfitting

Uncertainty Calibration

Calibration: binned MSE vs. posterior deviation

Calibration test (Guo et al. 2017):

  • Group deviations into \(L\) bins \(B_l\)
  • For each bin compute:

\[ \text{UQ}(B_l) = \frac{1}{|B_l|}\sum_{i \in B_l} \bar{\widehat{\boldsymbol{\sigma}}}[i] \]

\[ \text{MSE}(B_l) = \frac{1}{|B_l|}\sum_{i \in B_l} (\mathbf{x}^\ast[i] - \widehat{\mathbf{x}}[i])^2 \]

  • Good calibration \(\rightarrow\) points along \(y = x\)
  • Calibration degrades at later timesteps as plume grows
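The binned test in code, for a synthetic posterior that is calibrated by construction, so the per-bin (UQ, RMSE) pairs fall on the diagonal:

```python
import numpy as np

rng = np.random.default_rng(5)
N, L = 50_000, 10

# Synthetic calibrated posterior: realized errors are drawn
# with exactly the predicted pointwise standard deviation
sigma = rng.uniform(0.02, 0.3, N)              # predicted std per pixel
err2 = (sigma * rng.normal(size=N)) ** 2       # realized squared errors

# Group deviations into L quantile bins and compare UQ vs sqrt(MSE) per bin
edges = np.quantile(sigma, np.linspace(0.0, 1.0, L + 1))
bins = np.digitize(sigma, edges[1:-1])         # bin index 0..L-1 per point
uq = np.array([sigma[bins == l].mean() for l in range(L)])
bin_rmse = np.array([np.sqrt(err2[bins == l].mean()) for l in range(L)])
# Calibrated posterior -> (uq, bin_rmse) points lie near the diagonal y = x
```

Points above the diagonal would indicate overconfidence (errors larger than predicted), which is the failure mode that appears at later timesteps as the plume grows.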

Part VIII: 3D Extension & Discussion

Scaling to 3D: Ground Truth & Seismic

Ground truth

Seismic image

\(3.2 \times 3.2 \times 3.2\) km\(^3\) volume, \(128^3\) grid, 16000 receivers, 16 sources (Gahlot, Orozco, & Herrmann 2025)

Scaling to 3D: Forecast vs. Posterior

Forecast mean

Posterior mean

Scaling to 3D: Error & Uncertainty

Error

Standard deviation

Key Findings

  1. Unknown high-permeability streak undetected by well-only data, but captured by seismic – critical for containment monitoring

  2. Multimodal data (seismic + wells) produces the best estimates across all metrics and timesteps

  3. Uncertainty calibration: pointwise standard deviation correlates with errors relative to ground truth

  4. Nonlinear mapping via CNFs handles complex non-Gaussian posterior distributions that EnKF cannot capture

  5. Small ensemble size (\(M=128\)) sufficient thanks to joint-to-conditional mapping

  6. Scalable to 3D via physics-based summary statistics (RTM) and memory-efficient invertible networks

Open-Source Software Stack

All results are fully reproducible:

Component Package Purpose
Reservoir simulation JutulDarcy.jl Multiphase flow
Seismic modeling JUDI.jl Wave simulations & RTM
Neural networks InvertibleNetworks.jl Conditional normalizing flows
Full pipeline Twin4GCS.jl Digital Shadow implementation

Conclusions & Future Work

Conclusions

  • Introduced an uncertainty-aware Digital Shadow for underground CO2 storage monitoring

  • Combines high-fidelity simulations with neural data assimilation via Simulation-Based Inference

  • Nonlinear prior-to-posterior mapping via CNFs extends Ensemble Kalman Filtering to handle:

    • Non-Gaussian statistics
    • Stochastic nuisance parameters (permeability)
    • Nonlinear dynamics and observations
  • Multimodal conditioning (seismic + wells) produces the most accurate state estimates

  • Seismic data is essential for detecting critical flow pathways that wells alone miss

  • First proof-of-concept for an uncertainty-aware, scalable Digital Shadow for GCS

  • Lays the foundation for a Digital Twin to optimize underground storage operations

Future Work

Methodological extensions:

  • Include geomechanical, chemical & elastic effects
  • Permeability inference from time-lapse data

Towards a Digital Twin:

  • Optimized operations: maximize injectivity while avoiding fracture pressure (Gahlot et al. 2024)
  • Adaptive surveys: trigger follow-up when injection deviates from expectations
  • Optimal monitoring: select well locations to reduce uncertainty most (Orozco, Gahlot, et al. 2024)

From Digital Shadow to Digital Twin

The Digital Shadow provides state estimation. A full Digital Twin adds control and decision-making (Gahlot et al. 2024):

Digital Twin lifecycle: training, inference, and decision-making

Key question: How do we maximize CO2 injectivity while avoiding fracture pressure?

Digital Twin with Experimental Design

Digital Twin with experimental design: forecasted plumes, synthetic observations, trained inference network with optimal well density, and posterior inference from field data

Bayesian Experimental Design

Solution: Choose monitoring well locations \(\mathbf{d}^\ast\) that maximize the Expected Information Gain (EIG) (Orozco, Gahlot, et al. 2024):

\[ \mathbf{d}^\ast = \arg\max_{\mathbf{d}} \;\mathbb{E}_{p(\mathbf{y} \mid \mathbf{d})}\bigl[\mathbb{KL}\bigl(p(\mathbf{x} \mid \mathbf{y}, \mathbf{d})\,\|\,p(\mathbf{x})\bigr)\bigr] \]

Key insight: Maximizing the expected posterior log-density (our CNF training objective) is equivalent to maximizing the EIG:

\[ \max_{\mathbf{d}}\;\mathbb{E}_{p(\mathbf{x},\mathbf{y} \mid \mathbf{d})}\bigl[\log p_{\boldsymbol{\phi}}(\mathbf{x} \mid \mathbf{y}, \mathbf{d})\bigr] \;\Longleftrightarrow\; \max_{\mathbf{d}}\;\text{EIG}(\mathbf{d}) \]

This means we can jointly optimize the CNF parameters \(\boldsymbol{\phi}\) and the experimental design \(\mathbf{d}\)!
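For a hypothetical linear-Gaussian design problem \(y = d\,x + \text{noise}\) the EIG has a closed form, which can be checked against a Monte Carlo average of the posterior-to-prior KL divergence (all values illustrative):

```python
import numpy as np

# Hypothetical linear-Gaussian design: y = d*x + noise, with
# x ~ N(0, s2x) and noise ~ N(0, s2e). Then
# EIG(d) = 0.5 * log(1 + d^2 * s2x / s2e).
s2x, s2e = 1.0, 0.1

def eig(d):
    return 0.5 * np.log(1.0 + d**2 * s2x / s2e)

designs = np.array([0.2, 0.5, 1.0, 2.0])
gains = eig(designs)       # more sensitive designs carry more information

# Monte Carlo check at d = 1: average the posterior-to-prior KL over y
rng = np.random.default_rng(7)
d = 1.0
sp2 = 1.0 / (1.0 / s2x + d**2 / s2e)              # posterior variance
y = rng.normal(0.0, np.sqrt(d**2 * s2x + s2e), 1_000_000)
mu_p = sp2 * d * y / s2e                          # posterior mean given y
kl = 0.5 * (sp2 / s2x + mu_p**2 / s2x - 1.0 - np.log(sp2 / s2x))
```

The sample average of `kl` matches `eig(1.0)`, illustrating why maximizing the expected posterior log-density over designs is the same as maximizing information gain.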

Optimal Well Placement: Results

Learned optimal well density

Selected well locations (red) vs. optimal (black)

Optimized vs. random well placement: RMSE comparison
  • Red crosses: optimized placement consistently outperforms random (blue boxes)
  • Improvement grows with number of monitoring timesteps

The Control Problem – Many thanks to Haoyun Li

Goal: Maximize cumulative CO2 injection while ensuring reservoir pressure does not exceed fracture pressure \(p_{\text{frac}}\) (Gahlot et al. 2024).

\[ \begin{aligned} \max_{q(t)} \quad & \int_{0}^{T} q(t)\,\mathrm{d}t \\ \text{subject to}\quad & \widehat{\text{POF}}_{\tau}(q) \le \varepsilon \quad \text{or} \quad \widehat{\text{CVaR}}_{\alpha}(\boldsymbol{\mathcal{L}}(q)) \le \gamma, \end{aligned} \]

where either the POF or the CVaR constraint is active (not both simultaneously), \(\varepsilon\) is the maximum allowable exceedance probability, and \(\gamma\) is the maximum allowable tail severity.

Because the subsurface parameters \(\mathbf{Z}\) are random, the constraint \(g(q, \mathbf{Z}) \leq 0\) cannot be enforced deterministically – we need risk measures.

Controlled Injectivity: Comparison

Non-fracture vs. fracture scenarios: safety margin (top), pressure perturbation (middle), and CO2 saturation (bottom) at \(t=480\) days

Risk Measures: POF, Quantile & CVaR

Relationship between POF, Quantile, and CVaR

Probability of Failure (POF)

The Probability of Failure quantifies the chance that the constraint is violated:

\[ \text{POF}_{\epsilon}(\mathbf{d}) = \mathbb{P}\bigl[g(\mathbf{d}, \mathbf{Z}) > 0\bigr] \leq \epsilon \]

  • One-line: the probability that pressure exceeds the fracture threshold
  • Parameterized by tolerance \(\epsilon\) (e.g., \(\epsilon = 0.05\) means \(\leq 5\%\) failure probability)

Quantile (Value-at-Risk)

The Quantile (or Value-at-Risk) at level \(\alpha\) is the threshold below which a fraction \(\alpha\) of outcomes fall:

\[ Q_{\alpha}(\mathbf{d}) = \inf\bigl\{t : \mathbb{P}\bigl[g(\mathbf{d}, \mathbf{Z}) \leq t\bigr] \geq \alpha\bigr\} \]

  • One-line: the worst-case constraint value at confidence level \(\alpha\)
  • The POF constraint \(\text{POF}_\epsilon \leq \epsilon\) is equivalent to requiring \(Q_{1-\epsilon} \leq 0\)

Conditional Value-at-Risk (CVaR)

CVaR (also called Expected Shortfall) averages over the worst outcomes beyond the quantile:

\[ \text{CVaR}_{\gamma}(\mathbf{d}) = \mathbb{E}\bigl[g(\mathbf{d}, \mathbf{Z}) \mid g(\mathbf{d}, \mathbf{Z}) \geq Q_{\gamma}(\mathbf{d})\bigr] \]

  • One-line: the expected severity of constraint violations in the worst \((1-\gamma)\) fraction of scenarios
  • Parameterized by \(\gamma\) (e.g., \(\gamma = 0.95\) averages over the worst \(5\%\))
  • More conservative than POF: penalizes not just the probability but also the magnitude of failure
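All three risk measures are simple functionals of Monte Carlo samples of the constraint value \(g(\mathbf{d},\mathbf{Z})\); a sketch with illustrative Gaussian samples (negative values = safe):

```python
import numpy as np

rng = np.random.default_rng(6)
# Monte Carlo samples of the constraint value g(d, Z); negative = safe
g = rng.normal(-1.0, 0.5, 200_000)

pof = np.mean(g > 0)             # Probability of Failure
q95 = np.quantile(g, 0.95)       # Quantile / Value-at-Risk at level 0.95
cvar95 = g[g >= q95].mean()      # CVaR: average of the worst 5% of outcomes
```

By construction CVaR is never below the quantile, and unlike POF it grows with the severity of tail violations rather than just their probability, which is why it is the more conservative constraint in the control problem.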

Fracture Probability vs. Injectivity

Empirical CDF of fracture probability vs. injection rate for POF (\(\varepsilon=0.0\), \(\varepsilon=0.01\)) and CVaR (\(\gamma=0.1\), \(\alpha=0.01\)) constraints with bootstrap confidence intervals

  • CVaR constraint yields higher optimal injection rates while maintaining safety
  • All constrained approaches keep fracture probability below the \(1\%\) threshold

Controlled Injection: POF vs. CVaR

Controlled vs. Uncontrolled Injection

Thank You!

Paper: Gahlot, Orozco, Yin, Bruer & Herrmann (2025). “An uncertainty-aware Digital Shadow for underground multimodal CO2 storage monitoring.” Geophysical Journal International, 242(1), ggaf176.

Code: github.com/slimgroup/Twin4GCS.jl

Data: doi.org/10.5281/zenodo.15204624

Group website: slim.gatech.edu

Acknowledgements:

  • Georgia Research Alliance & ML4Seismic Center partners
  • US National Science Foundation grant OAC 2203821

References

Alsing, J., Wandelt, B. & Feeney, S., 2018. Massive optimal data compression and density estimation for scalable, likelihood-free inference in cosmology. Monthly Notices of the Royal Astronomical Society, 477, 2874–2885, Oxford University Press.
Asch, M., 2022. A toolbox for digital twins: From model-based to data-driven, SIAM.
Avseth, P., Mukerji, T. & Mavko, G., 2010. Quantitative seismic interpretation: Applying rock physics tools to reduce interpretation risk, Cambridge University Press.
Bui, M., Adjiman, C.S., Bardow, A., Anthony, E.J., Boston, A., Brown, S., Fennell, P.S., et al., 2018. Carbon capture and storage (CCS): The way forward. Energy & Environmental Science, 11, 1062–1176, Royal Society of Chemistry.
Cranmer, K., Brehmer, J. & Louppe, G., 2020. The frontier of simulation-based inference. Proceedings of the National Academy of Sciences, 117, 30055–30062. doi:10.1073/pnas.1912789117
Deng, Z., Orozco, R., Gahlot, A.P. & Herrmann, F.J., 2025. Probabilistic joint recovery method for CO2 plume monitoring. arXiv preprint arXiv:2501.18761.
Jones, C.E., Edgar, J.A., Selvage, J.I. & Crook, H., 2012. Building complex synthetic models to evaluate acquisition geometries and velocity inversion technologies. In 74th EAGE Conference and Exhibition Incorporating EUROPEC 2012, cp–293, European Association of Geoscientists & Engineers. doi:10.3997/2214-4609.20148575
Elsemüller, L., Olischläger, H., Schmitt, M., Bürkner, P.-C., Koethe, U. & Radev, S.T., 2024. Sensitivity-aware amortized Bayesian inference. Transactions on Machine Learning Research. Retrieved from https://openreview.net/forum?id=Kxtpa9rvM0
Gahlot, A.P. & Herrmann, F.J., 2025. Enhancing robustness of digital shadow for CO2 storage monitoring with augmented rock physics modeling. arXiv preprint arXiv:2502.07171.
Gahlot, A.P., Li, H., Yin, Z., Orozco, R. & Herrmann, F.J., 2024. A digital twin for geological carbon storage with controlled injectivity. arXiv preprint arXiv:2403.19819.
Gahlot, A.P., Orozco, R. & Herrmann, F.J., 2025. Advancing geological carbon storage monitoring with 3D digital shadow technology. arXiv preprint arXiv:2502.07169.
Gahlot, A.P., Orozco, R., Yin, Z., Bruer, G. & Herrmann, F., 2025, April. Dataset - digital twin for geological carbon storage monitoring, Zenodo. doi:10.5281/zenodo.15204624
Gahlot, A.P., Orozco, R., Yin, Z., Bruer, J. & Herrmann, F.J., 2025. An uncertainty-aware digital shadow for underground multimodal CO\(_2\) storage monitoring. Geophysical Journal International, 242, ggaf176. doi:10.1093/gji/ggaf176
Grady, T.J., Khan, R., Louboutin, M., Yin, Z., Witte, P.A., Chandra, R., Hewett, R.J., et al., 2023. Model-parallel fourier neural operators as learned surrogates for large-scale parametric PDEs. Computers & Geosciences, 178, 105402, Elsevier.
Guo, C., Pleiss, G., Sun, Y. & Weinberger, K.Q., 2017. On calibration of modern neural networks. International conference on machine learning, pp. 1321–1330, PMLR.
IPCC (Intergovernmental Panel on Climate Change), 2018. Global warming of 1.5°C. An IPCC special report on the impacts of global warming of 1.5°C above pre-industrial levels and related global greenhouse gas emission pathways, in the context of strengthening the global response to the threat of climate change, sustainable development, and efforts to eradicate poverty, IPCC, Geneva.
IPCC special report, 2005. IPCC special report on carbon dioxide capture and storage, Cambridge: Cambridge University Press.
IPCC special report, 2018. Global warming of 1.5°C. An IPCC Special Report on the impacts of global warming of 1.5°C, 1, 43–50.
Ketzer, J.M., Iglesias, R.S. & Einloft, S., 2012. Reducing greenhouse gas emissions with CO\(_{2}\) capture and geological storage. in Handbook of climate change mitigation, pp. 1405–1440, Springer US. doi:10.1007/978-1-4419-7991-9_37
Møyner, O., Johnsrud, M., Nilsen, H.M., Raynaud, X., Lye, K.O. & Yin, Z., 2023, April. Sintefmath/jutul.jl: v0.2.6, Zenodo. doi:10.5281/zenodo.7855605
Orozco, R., Gahlot, A. & Herrmann, F.J., 2024. BEACON: Bayesian experimental design acceleration with conditional normalizing flows – a case study in optimal monitor well placement for CO\(_2\) sequestration. arXiv preprint arXiv:2404.00075.
Orozco, R., Siahkoohi, A., Rizzuti, G., van Leeuwen, T. & Herrmann, F.J., 2023. Adjoint operators enable fast and amortized machine learning based Bayesian uncertainty quantification. Medical Imaging 2023: Image Processing, Vol. 12464, pp. 357–367, SPIE.
Orozco, R., Witte, P., Louboutin, M., Siahkoohi, A., Rizzuti, G., Peters, B. & Herrmann, F.J., 2024. InvertibleNetworks.jl: A julia package for scalable normalizing flows. Journal of Open Source Software, 9, 6554, The Open Journal. doi:10.21105/joss.06554
Radev, S.T., Mertens, U.K., Voss, A., Ardizzone, L. & Köthe, U., 2020. BayesFlow: Learning complex stochastic models with invertible neural networks. IEEE transactions on neural networks and learning systems, 33, 1452–1466, IEEE.
Ramgraber, M., Baptista, R., McLaughlin, D. & Marzouk, Y., 2023. Ensemble transport smoothing. Part II: Nonlinear updates. Journal of Computational Physics: X, 17, 100133, Elsevier.
Ringrose, P., 2020. How to store CO\(_{2}\) underground: Insights from early-mover CCS projects, Vol. 129, Springer.
Ringrose, P., 2023. Storage of carbon dioxide in saline aquifers: Building confidence by forecasting and monitoring, Society of Exploration Geophysicists.
Schmitt, M., Radev, S.T. & Bürkner, P.-C., 2023. Fuse it or lose it: Deep fusion for multimodal simulation-based inference. arXiv preprint arXiv:2311.10671.
Spantini, A., Baptista, R. & Marzouk, Y., 2022. Coupling techniques for nonlinear ensemble filtering. SIAM Review, 64, 921–953, SIAM.
Witte, P.A., Louboutin, M., Kukreja, N., Luporini, F., Lange, M., Gorman, G.J. & Herrmann, F.J., 2019. A large-scale framework for symbolic implementations of seismic inversion algorithms in julia. Geophysics, 84, F57–F71. doi:10.1190/geo2018-0174.1
Yin, Z., Orozco, R., Louboutin, M. & Herrmann, F.J., 2024. WISE: Full-waveform variational inference via subsurface extensions. Geophysics, 89, 1–31, Society of Exploration Geophysicists.