An Uncertainty-Aware Digital Shadow for Underground CO2 Storage Monitoring

Digital Twins for Physical Systems – Capstone Lecture

Felix J. Herrmann

Georgia Institute of Technology

2026-03-11

Computational Science and Engineering — Georgia Institute of Technology

Note

This slide deck is derived from Gahlot, Orozco, Yin, et al. (2025b).

Outline

  1. Motivation – Why Geological Carbon Storage needs Digital Twins
  2. Problem Formulation – Nonlinear state dynamics, observations & Bayesian filtering
  3. Methodology – Lifecycle of the Digital Shadow
    • Forecast, Training & Analysis steps
    • Summary statistics & Conditional Normalizing Flows
    • Linear vs. nonlinear prior-to-posterior mappings
  4. Algorithms – Neural data assimilation
  5. Setup & Validation – In silico case study
  6. Multimodal Case Study – Seismic, wells & combined
  7. Results & Discussion
  8. Conclusions & Future Work

Part I: Motivation

The Need for Geological Carbon Storage

Acquisition setup with wells

The Challenge: Unknown Subsurface Properties

  • Fluid-flow properties (permeability) are poorly known
  • Displacement of brine by supercritical CO2 is intricate
  • Different permeability realizations \(\rightarrow\) vastly different plume outcomes
    • Left: plume stays within storage complex
    • Right: plume breaches containment due to unknown high-permeability streak
  • Both realizations consistent with baseline seismic survey!

Two vastly different plume outcomes for two permeability realizations consistent with the same baseline survey

Important

We cannot rely on reservoir simulations alone – we need data assimilation with time-lapse monitoring data!

Digital Twin for Enhanced Oil Recovery

From Digital Shadows to Digital Twins

IBM’s definition of a Digital Twin:

A digital twin is a virtual representation of an object or system designed to reflect a physical object accurately. It spans the object’s lifecycle, is updated from real-time data and uses simulation, machine learning and reasoning to help make decisions.

Digital Shadow (Asch 2022):

  • Captures temporal evolution of CO2 plume via unidirectional data integration
  • Combines: reservoir simulation + wave-equation inversion + ensemble Bayesian filtering + neural posterior density estimation
  • Foundation for future Digital Twin with control, reasoning & decision-making

Key innovations over standard EnKF:

  1. Stochastic treatment of permeability \(\boldsymbol{\kappa}\)
  2. No Gaussian/linearity assumptions
  3. Learned nonlinear prior-to-posterior mapping
  4. Integration of multimodal time-lapse data

Part II: Problem Formulation

State Dynamics Model

The time-varying state \(\mathbf{x}_k = \bigl(\mathbf{x}_k[S_{\text{CO}_2}],\, \mathbf{x}_k[\delta p]\bigr)\) evolves as:

\[ \mathbf{x}_{k} = \mathcal{M}_k\bigl(\mathbf{x}_{k-1}, \boldsymbol{\kappa}_k\bigr),\quad \boldsymbol{\kappa}_k\sim p(\boldsymbol{\kappa}) \quad \text{for}\quad k=1\cdots K \tag{1}\]

where:

Symbol Meaning
\(\mathcal{M}_k\) Dynamics operator (multiphase flow simulator)
\(\mathbf{x}_k[S_{\text{CO}_2}]\) CO2 saturation at timestep \(k\)
\(\mathbf{x}_k[\delta p]\) Pressure perturbation at timestep \(k\)
\(\boldsymbol{\kappa}\) Permeability field – stochastic nuisance parameter
\(p(\boldsymbol{\kappa})\) Prior distribution from baseline seismic (via WISE)

Note

The permeability \(\boldsymbol{\kappa}\) is drawn anew at each timestep, reflecting our epistemic uncertainty about reservoir properties.

Observation Model

Multimodal time-lapse observations:

\[ \mathbf{y}_k = \mathcal{H}_k(\mathbf{x}_k,\boldsymbol{\epsilon}_k, \mathbf{\bar{m}}_k),\quad \boldsymbol{\epsilon}_k \sim p(\boldsymbol{\epsilon}) \quad\text{for}\quad k=1,2,\cdots,K \tag{2}\]

Two types of observations:

1. Well measurements (direct)

  • Saturation \(S_{\text{CO}_2}\) at well locations
  • Pressure perturbation \(\delta p\) at well locations
  • Sparse but accurate

2. Surface seismic (indirect)

  • Active-source seismic surveys
  • Processed via reverse-time migration (RTM)
  • Broad spatial coverage
  • Subject to noise \(\boldsymbol{\epsilon}_k\)

The observation operator \(\mathcal{H}_k\) is nonlinear, ill-posed, and computationally expensive.

Nonlinear Bayesian Filtering

The goal: recursively compute \(p(\mathbf{x}_k\mid \mathbf{y}_{1:k})\).

Three steps (Asch 2022):

  1. Initialization: Choose prior \(p(\mathbf{x}_0)\)

  2. Prediction (Chapman-Kolmogorov): \[ p(\mathbf{x}_k\mid \mathbf{y}_{1:k-1}) = \mathbb{E}_{\mathbf{x}_{k-1}\sim p(\mathbf{x}_{k-1}\mid\mathbf{y}_{1:k-1})}\bigl[p(\mathbf{x}_{k}\mid\mathbf{x}_{k-1})\bigr] \]

  3. Analysis (Bayes’ rule): \[ p(\mathbf{x}_{k}|\mathbf{y}_{1:k}) \propto p(\mathbf{y}_{k}|\mathbf{x}_{k})\, p(\mathbf{x}_{k}|\mathbf{y}_{1:k-1}) \]

Challenges: No analytical solution for nonlinear, non-Gaussian dynamics + nuisance parameters + high-dimensional state + expensive operators.
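Although the recursion rarely admits a closed form, it can be approximated with samples. A minimal bootstrap particle-filter sketch of the Prediction and Analysis steps for a hypothetical scalar model (the dynamics, noise levels, and quadratic observation are illustrative stand-ins, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
M = 20_000

def advance(x):                  # toy nonlinear dynamics (illustrative only)
    return 0.9 * x + 0.3 * np.sin(x)

x_true = 1.0                     # hidden trajectory generating the field data
x = rng.normal(0.0, 1.0, M)      # particles drawn from the prior p(x_0)

for k in range(3):
    # Prediction (Chapman-Kolmogorov): push particles through the dynamics
    x = advance(x) + 0.2 * rng.normal(size=M)
    x_true = advance(x_true) + 0.2 * rng.normal()
    # Nonlinear, noisy observation of the truth: y = x^2 + noise
    y_obs = x_true**2 + 0.1 * rng.normal()
    # Analysis (Bayes' rule): reweight particles by the likelihood...
    w = np.exp(-0.5 * ((y_obs - x**2) / 0.1) ** 2)
    w /= w.sum()
    # ...and resample to recover an equally weighted posterior ensemble
    x = rng.choice(x, size=M, p=w)
```

The quadratic observation makes the posterior bimodal, which already illustrates why Gaussian assumptions break down; for high-dimensional states and expensive operators this naive scheme is infeasible, motivating the learned approach below.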

Why Not Standard EnKF?

Ensemble Kalman Filter propagates \(M\) particles:

\[ \widehat{\mathbf{x}}^{(m)}_k = \mathbf{x}^{(m)}_k - \mathbb{C}\text{ov}_{\mathbf{x}_k, \mathbf{y}_k}\,\mathbb{C}\text{ov}^{-1}_{\mathbf{y}_k, \mathbf{y}_k}\bigl(\mathbf{y}_k^{(m)}-\mathbf{y}_k^{\text{obs}}\bigr) \]

Limitations:

  • Linear correction \(\rightarrow\) introduces bias for nonlinear problems
  • Accuracy plateaus with increasing ensemble size (Spantini et al. 2022)
  • Cannot handle non-Gaussian statistics
  • Unclear how to handle nuisance parameters (\(\boldsymbol{\kappa}\))

Our approach: Replace the linear mapping with a learned nonlinear prior-to-posterior mapping using Conditional Normalizing Flows (CNFs).
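For concreteness, the EnKF correction above in a few lines of NumPy on a toy linear-Gaussian ensemble (the operator H, dimensions, and noise level are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
M, nx, ny = 256, 3, 2

# Toy ensemble: states x and simulated observations y = Hx + noise
H = rng.normal(size=(ny, nx))
x = rng.normal(size=(M, nx))
y = x @ H.T + 0.1 * rng.normal(size=(M, ny))
y_obs = np.array([0.5, -1.0])

# Sample cross- and auto-covariances over the ensemble
X = x - x.mean(0)
Y = y - y.mean(0)
C_xy = X.T @ Y / (M - 1)
C_yy = Y.T @ Y / (M - 1)

# Linear EnKF correction: x_hat = x - C_xy C_yy^{-1} (y - y_obs)
K = C_xy @ np.linalg.inv(C_yy)
x_hat = x - (y - y_obs) @ K.T
```

Here the analyzed ensemble's simulated observations move toward \(\mathbf{y}_k^{\text{obs}}\); for nonlinear \(\mathcal{H}_k\) this linear shift is biased, which is exactly what the learned mapping removes.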

Part III: Methodology

Lifecycle of the Digital Shadow

Lifecycle of the Digital Shadow at timestep \(k=1\): Forecast \(\rightarrow\) Training \(\rightarrow\) Analysis

Lifecycle – Step by Step

At each timestep \(k\), the Digital Shadow cycles through:

1. Forecast Step

  • Time-advance posterior samples from \(k{-}1\): \[ \mathbf{x}_k^{(m)} \sim p(\mathbf{x}_k \mid \widehat{\mathbf{x}}_{k-1}^{(m)}) \]
  • Simulate observations: \[ \mathbf{y}_k^{(m)} \sim p(\mathbf{y}_k \mid \mathbf{x}_k^{(m)}) \]
  • Produces training ensemble \(\{(\mathbf{x}_k^{(m)}, \mathbf{y}_k^{(m)})\}_{m=1}^M\)

2. Training Step

  • Train CNF \(f_{\boldsymbol{\phi}}\) on ensemble pairs

3. Analysis Step

  • Condition on field data \(\mathbf{y}_k^{\text{obs}}\)
  • Map prior samples to posterior: \[ \mathbf{x}_k^{(m)} \mapsto \widehat{\mathbf{x}}_k^{(m)} \sim p(\widehat{\mathbf{x}}_k \mid \mathbf{y}_{1:k}^{\text{obs}}) \]

Tip

The \(\widehat{\quad}\) symbol distinguishes analyzed states (conditioned on field data) from predicted (digital) states.

Imaging-Based Summary Statistics

Challenge: Raw seismic data is high-dimensional, non-uniform, and the mapping to CO2 states is complex.

Solution: Physics-based summary statistics via Reverse-Time Migration (RTM) (Orozco et al. 2023):

\[ \mathbf{y}_k = \Bigl(\mathbf{J}_{\mathcal{F}[\mathbf{v}]}\Bigr|_{\widetilde{\mathbf{v}}_k}\Bigr)_{\text{ic}}^{\top} \Bigl(\mathcal{F}[\mathbf{\bar{m}}_k]-\mathbf{d}_k\Bigr) \]

where:

  • \(\mathcal{F}[\mathbf{m}]\): nonlinear forward wave operator
  • \(\mathbf{J}_{\mathcal{F}}\): Jacobian with inverse-scattering imaging condition
  • \(\widetilde{\mathbf{v}}_k\): smoothed migration-velocity model
  • \(\mathbf{d}_k = \mathcal{F}[\mathbf{m}_k] + \boldsymbol{\epsilon}_k\): noisy shot records

Key insight: RTM preserves information from shot records (Alsing et al. 2018; Orozco et al. 2023) while mapping to a uniform image domain where the relation to the CO2 plume is simpler.

Rock Physics & Reference Models

Patchy saturation model (Avseth et al. 2010) maps CO2 saturation to seismic properties: \[ \mathbf{m}_k = \mathcal{R}(\mathbf{m}_0, \mathbf{x}_k),\quad k=1\cdots K \]

CO2 saturation-induced changes in wavespeed and density

Reference model for each timestep uses ensemble average: \[ \mathbf{\bar{m}}_k = \mathcal{R}(\mathbf{m}_0, \mathbf{\bar{x}}_k) \quad\text{where}\quad \mathbf{\bar{x}}_k = \frac{1}{M}\sum_{m=1}^M \mathbf{x}_k^{(m)} \]

Training the CNF

Objective: Train CNF to approximate the posterior by minimizing the forward KL divergence:

\[ \widehat{\boldsymbol{\phi}}_k = \arg\min_{\boldsymbol{\phi}} \frac{1}{M}\sum_{m=1}^M \left(\frac{1}{2}\bigl\|f_{\boldsymbol{\phi}}(\mathbf{x}_k^{(m)};\mathbf{y}_k^{(m)})\bigr\|_2^2 - \log\bigl|\det\bigl(\mathbf{J}^{(m)}_{f_{\boldsymbol{\phi}}}\bigr)\bigr|\right) \tag{3}\]

How it works:

  • CNF \(f_{\boldsymbol{\phi}}\): invertible neural network
  • Maps state \(\mathbf{x}\) to latent \(\mathbf{z}\) conditioned on \(\mathbf{y}\)
  • Latent space \(\rightarrow\) standard Gaussian
  • Amortized over ensemble: generalizes over pairs at a given timestep
  • Trained via Simulation-Based Inference (SBI) (Cranmer et al. 2020)

Key property – marginalization:

  • Permeability \(\boldsymbol{\kappa}\) varies across ensemble members
  • Training on these diverse samples \(\rightarrow\) posterior marginalizes over \(\boldsymbol{\kappa}\)
  • Uncertainty from unknown permeability is built into the posterior
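As a sketch of the objective in Equation 3, consider a hypothetical one-layer affine "flow" \(f_{\phi}(x;y) = (x - ay - b)\,e^{-s}\) with \(\log|\det \mathbf{J}| = -s\), trained by plain gradient descent; a real CNF stacks many invertible coupling layers, but the loss is the same:

```python
import numpy as np

rng = np.random.default_rng(8)
N = 4096

# Toy conditional density to learn: p(x | y) = N(2y + 1, 0.5^2)
y = rng.normal(size=N)
x = 2.0 * y + 1.0 + 0.5 * rng.normal(size=N)

# One-layer affine "flow": z = f(x; y) = (x - a*y - b) * exp(-s),
# so log|det J| = -s and the loss is mean(0.5*z^2) + s
a, b, s = 0.0, 0.0, 0.0
lr = 0.05
for _ in range(5000):
    sig = np.exp(s)
    z = (x - a * y - b) / sig
    # Analytic gradients of mean(0.5*z^2) + s with respect to (a, b, s)
    ga = -np.mean(z * y) / sig
    gb = -np.mean(z) / sig
    gs = 1.0 - np.mean(z**2)
    a, b, s = a - lr * ga, b - lr * gb, s - lr * gs
# After training, a ~ 2, b ~ 1, exp(s) ~ 0.5: the flow has
# recovered (approximately) the true conditional density
```

Minimizing the latent energy while penalizing volume contraction forces the flow to whiten \(x\) given \(y\), which is the same mechanism at work in the full CNF.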

Derivation of the Training Loss

Starting from the KL divergence between true and approximate posterior (Cranmer et al. 2020; Radev et al. 2020):

\[ \widehat{\boldsymbol{\phi}} = \arg\min_{\boldsymbol{\phi}} \mathbb{E}_{p(\mathbf{y})}\bigl[\mathbb{KL}\bigl(p(\mathbf{x}\mid \mathbf{y})\,\|\,p_{\boldsymbol{\phi}}(\mathbf{x}\mid \mathbf{y})\bigr)\bigr] \]

  • Expand KL and drop terms independent of \(\boldsymbol{\phi}\): \[= \arg\max_{\boldsymbol{\phi}} \mathbb{E}_{p(\mathbf{x},\mathbf{y})}\bigl[\log p_{\boldsymbol{\phi}}(\mathbf{x}\mid\mathbf{y})\bigr]\]

  • Apply change-of-variables \(p_{\boldsymbol{\phi}}(\mathbf{x}\mid\mathbf{y}) = p\bigl(\mathbf{z}=f_{\boldsymbol{\phi}}(\mathbf{x};\mathbf{y})\bigr)\bigl|\det(\mathbf{J}_{f_{\boldsymbol{\phi}}})\bigr|\): \[= \arg\max_{\boldsymbol{\phi}} \mathbb{E}_{p(\mathbf{x},\mathbf{y})}\Bigl[\log p\bigl(f_{\boldsymbol{\phi}}(\mathbf{x};\mathbf{y})\bigr) + \log\bigl|\det(\mathbf{J}_{f_{\boldsymbol{\phi}}})\bigr|\Bigr]\]

  • With \(p(\mathbf{z}) = \mathcal{N}(0,\mathbf{I})\) and Monte Carlo approximation \(\rightarrow\) training loss in Equation 3

Posterior Sampling

After training, samples from the posterior can be drawn:

\[ \widehat{\mathbf{x}}_k^{(m)} = f^{-1}_{\widehat{\boldsymbol{\phi}}_k}(\mathbf{z}_k^{(m)};\,\mathbf{y}_k^{\text{obs}}),\quad \mathbf{z}_k^{(m)} \sim \mathcal{N}(0,\mathbf{I}) \]

But this reference-to-conditional mapping only yields accurate posterior samples under ideal, fully converged training…

Better approach: Joint-to-conditional mapping (Spantini et al. 2022; Ramgraber et al. 2023):

\[ \widehat{\mathbf{x}}_k^{(m)} = f^{-1}_{\widehat{\boldsymbol{\phi}}_k}\Bigl(\underbrace{f_{\widehat{\boldsymbol{\phi}}_k}(\mathbf{x}_k^{(m)};\,\mathbf{y}_k^{(m)})}_{\mathbf{z}_k^{(m)}};\,\mathbf{y}_k^{\text{obs}}\Bigr) \]

This is a nonlinear extension of EnKF’s correction: \[ \widehat{\mathbf{x}}^{(m)}_{k} = {T}^{\text{nonlin}}_{\mathbf{y}_k^{\text{obs}}}(\mathbf{y}_k^{(m)},\,\mathbf{x}_k^{(m)}) = f^{-1}_{\widehat{\boldsymbol{\phi}}_k}\left(f_{\widehat{\boldsymbol{\phi}}_k}(\mathbf{x}_k^{(m)};\,\mathbf{y}^{(m)}_k);\,\mathbf{y}_k^{\text{obs}}\right) \]
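In the linear-Gaussian special case the composite map collapses exactly to the EnKF correction, which is easy to verify numerically (toy scalar joint, not from the paper; here the affine map with regression coefficient K is one valid transport map):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 200_000

# Linear-Gaussian toy joint: x ~ N(0, 1), y = x + noise
x = rng.normal(size=M)
y = x + 0.5 * rng.normal(size=M)
y_obs = 1.0

# For Gaussians an affine map suffices: f(x; y) = x - K*y with
# K the regression coefficient Cov_xy / Cov_yy estimated from samples
K = np.cov(x, y)[0, 1] / np.var(y)

def f(x_, y_):        # forward map: joint sample -> latent z
    return x_ - K * y_

def finv(z_, y_):     # inverse map, evaluated at the field observation
    return z_ + K * y_

# Joint-to-conditional update: latent z from simulated pairs,
# inverted at y_obs; this equals x - K*(y - y_obs), the EnKF update
x_post = finv(f(x, y), y_obs)
```

For this model the analytic posterior is \(N(K\,y^{\text{obs}},\, 1-K)\) with \(K=0.8\), and the transported samples match its mean and variance; the nonlinear CNF map generalizes precisely this construction.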

Linear vs. Nonlinear Prior-to-Posterior Mapping

EnKF (Linear)

\[ \widehat{\mathbf{x}}^{(m)}_k = \mathbf{x}^{(m)}_k - \mathbb{C}\text{ov}_{\mathbf{x}_k,\mathbf{y}_k}\,\mathbb{C}\text{ov}^{-1}_{\mathbf{y}_k,\mathbf{y}_k}(\mathbf{y}_k^{(m)} - \mathbf{y}_k^{\text{obs}}) \]

  • Linear correction
  • Gaussian assumption
  • Accuracy saturates with ensemble size
  • Cannot handle nuisance parameters

CNF (Nonlinear)

\[ \widehat{\mathbf{x}}^{(m)}_k = f^{-1}_{\widehat{\boldsymbol{\phi}}_k}\bigl(f_{\widehat{\boldsymbol{\phi}}_k}(\mathbf{x}_k^{(m)};\mathbf{y}_k^{(m)});\,\mathbf{y}_k^{\text{obs}}\bigr) \]

  • Nonlinear composite mapping
  • No Gaussian assumption
  • Errors cancel in composition
  • Marginalizes over \(\boldsymbol{\kappa}\)
  • Works with small ensembles (\(M=128\))

Advantages of joint-to-conditional:

  1. Improved sample quality even when training has not fully converged
  2. No reliance on an external reference distribution
  3. Each predicted state is tied one-to-one to its corrected state

Part IV: Algorithms

Algorithm 1: Forecast, Training & Analysis

Procedure Forecast(\(p(\boldsymbol{\kappa}), p(\boldsymbol{\epsilon}_k), \{\widehat{\mathbf{x}}_{k-1}^{(m)}\}_{m=1}^M\)):

for \(m = 1\) to \(M\) do

  Sample from transition density: \(\mathbf{x}^{(m)}_k\sim p(\cdot\mid \widehat{\mathbf{x}}^{(m)}_{k-1})\) via dynamics simulations

  Sample from likelihood: \(\mathbf{y}^{(m)}_k\sim p(\cdot\mid \mathbf{x}^{(m)}_{k})\) via observation simulations

return \(\{(\mathbf{x}_k^{(m)},\mathbf{y}_k^{(m)})\}_{m=1}^{M}\)

Procedure Training(\(\{(\mathbf{x}_k^{(m)},\mathbf{y}_k^{(m)})\}_{m=1}^{M}\)):

\(\displaystyle\widehat{\boldsymbol{\phi}}_k = \arg\min_{\boldsymbol{\phi}} \frac{1}{M}\sum_{m=1}^M \left(\frac{1}{2}\|f_{\boldsymbol{\phi}}(\mathbf{x}_k^{(m)};\mathbf{y}_k^{(m)})\|_2^2 - \log|\det(\mathbf{J}^{(m)}_{f_{\boldsymbol{\phi}}})|\right)\)

return \(\widehat{\boldsymbol{\phi}}_k\)

Procedure Analysis(\(\mathbf{y}^{\text{obs}}_{k}, \widehat{\boldsymbol{\phi}}_k, \{(\mathbf{x}_k^{(m)},\mathbf{y}_k^{(m)})\}_{m=1}^{M}\)):

for \(m = 1\) to \(M\) do

  Compute latent variables: \(\mathbf{z}^{(m)}_k = f_{\widehat{\boldsymbol{\phi}}_k}(\mathbf{x}_k^{(m)};\mathbf{y}_k^{(m)})\)

  Apply prior-to-posterior map: \(\widehat{\mathbf{x}}_k^{(m)} = f^{-1}_{\widehat{\boldsymbol{\phi}}_k}(\mathbf{z}_k^{(m)};\mathbf{y}_k^{\text{obs}})\)

return \(\{\widehat{\mathbf{x}}^{(m)}_k\}_{m=1}^M\)

Algorithm 2: Sequential SBI

Require: Datasets \(\mathbf{y}_k^{\text{obs}}\), collected sequentially at times \(k=1,\ldots,K\)

Output: Estimated ensemble \(\{\widehat{\mathbf{x}}^{(m)}_k\}_{m=1}^M\) for \(k=1,\ldots,K\)

  1. Initialize: For \(m = 1\) to \(M\): sample \(\widehat{\mathbf{x}}^{(m)}_0 \sim p(\mathbf{x}_0)\)

  2. For \(k = 1\) to \(K\):

    • \(\{(\mathbf{x}_k^{(m)}, \mathbf{y}_k^{(m)})\}_{m=1}^{M} =\) Forecast\((p(\boldsymbol{\kappa}), p(\boldsymbol{\epsilon}_k), \{\widehat{\mathbf{x}}_{k-1}^{(m)}\}_{m=1}^M)\)
    • \(\widehat{\boldsymbol{\phi}}_k =\) Training\((\{(\mathbf{x}_k^{(m)}, \mathbf{y}_k^{(m)})\}_{m=1}^{M})\)
    • Collect field data \(\mathbf{y}_k^{\text{obs}}\)
    • \(\{\widehat{\mathbf{x}}_k^{(m)}\}_{m=1}^M =\) Analysis\((\mathbf{y}^{\text{obs}}_{k}, \widehat{\boldsymbol{\phi}}_k, \{(\mathbf{x}_k^{(m)}, \mathbf{y}_k^{(m)})\}_{m=1}^{M})\)

This procedure implements neural sequential Bayesian inference – the basis of our Digital Shadow.
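Algorithm 2 in miniature: scalar stand-ins for the dynamics, the observation operator, and the nuisance parameter \(\boldsymbol{\kappa}\), with a linear regression map standing in for the trained CNF (all values illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
M, n_steps = 4096, 4

# Scalar stand-in dynamics: the state drifts by a nuisance parameter kappa
advance = lambda x, kappa: x + kappa
kappa_true = 0.7
x_true = 0.0

# Initialize: ensemble drawn from p(x_0)
x_ens = rng.normal(0.0, 1.0, M)

for k in range(n_steps):
    # Forecast: kappa is drawn anew for every member, as in Equation 1
    kappa = rng.normal(0.5, 0.3, M)
    x_ens = advance(x_ens, kappa)
    y_ens = x_ens + 0.2 * rng.normal(size=M)      # simulated observations
    # Collect "field data" from the hidden ground-truth trajectory
    x_true = advance(x_true, kappa_true)
    y_obs = x_true + 0.2 * rng.normal()
    # Training + Analysis: a linear regression map as a stand-in for the CNF
    gain = np.cov(x_ens, y_ens)[0, 1] / np.var(y_ens)
    x_ens = x_ens - gain * (y_ens - y_obs)
# x_ens now approximates samples of p(x_k | y_obs_{1:k})
```

Even with a biased nuisance prior (mean 0.5 versus the true 0.7), sequential assimilation keeps the ensemble tracking the hidden trajectory; in the full Digital Shadow the regression gain is replaced by the learned nonlinear map.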

Part V: Setup & Validation

Probabilistic Baseline Model

Velocity samples from WISE

Corresponding permeability samples
  • Baseline seismic data (SNR \(12\) dB) processed with WISE (Yin et al. 2024)
  • Posterior velocity samples \(\mathbf{v}_0 \sim p(\mathbf{v}_0 \mid \mathbf{y}_0^{\text{obs}})\) converted to permeability \(\boldsymbol{\kappa} \sim p(\boldsymbol{\kappa})\)
  • High variability reflects epistemic uncertainty in reservoir properties
  • Study area: South-Western North Sea, Compass model (Jones et al. 2012)

Ground-Truth Simulations

Ground-truth states: saturation (left), pressure perturbation (center), seismic observations (right) for \(k=1\cdots 4\)

Simulation parameters:

  • Grid: \(512 \times 256\), spacing \(6.25\) m
  • 24 timesteps of 80 days each
  • 4 seismic surveys at 480-day intervals
  • Injection: \(0.05\,\text{m}^3/\text{s}\) at \(1200\) m depth
  • Total: \(5.8\) Mt CO2 (\(1.1\) Mt/year)

Forecast Ensemble Variability

Predicted state samples at \(k=1\) showing variability from different permeability realizations

  • Different permeability realizations \(\rightarrow\) vastly different CO2 plume shapes
  • Both saturation and pressure perturbations vary significantly
  • This variability is essential for training the CNFs

Impact of the Analysis Step

Corrections by the Analysis step at \(k=3\): individual samples and conditional means

  • Top row: CO2 saturation corrections
  • Bottom row: Pressure perturbation corrections
  • Conditioning on observed data reduces sample variation and produces more resolved estimates

Impact on Errors and Uncertainty

Errors and standard deviations before and after the Analysis step at \(k=3\)

  • Errors relative to ground truth are significantly reduced after correction
  • Uncertainty (conditional standard deviation) tightens during the Analysis step
  • This validates the prior-to-posterior mapping

Part VI: Multimodal Case Study

Experimental Setup

Three conditioning scenarios compared:

Modality Data used
Wells only Saturation + pressure at well locations
Seismic only Time-lapse migrated seismic images
Wells + Seismic Both modalities combined

Ensemble parameters:

  • \(M = 128\) ensemble members
  • 120 training / 8 validation / 1 test (ground-truth)
  • 120 training epochs, Adam optimizer, LR \(= 5 \times 10^{-4}\)
  • Noise magnitude schedule: \(0.1 \rightarrow 0.005\)

Saturation Results: Timestep \(k=1\)

Posterior mean, errors, and standard deviations at \(k=1\) for four scenarios

Saturation Results: Timestep \(k=2\)

Posterior mean, errors, and standard deviations at \(k=2\)

Saturation Results: Timestep \(k=3\)

Posterior mean, errors, and standard deviations at \(k=3\)

Saturation Results: Timestep \(k=4\)

Posterior mean, errors, and standard deviations at \(k=4\)

Warning

Well-only data fails to capture the critical leftward protruding plume feature that breaches containment!

Saturation Evolution: Unconditioned

Unconditioned saturation evolution

Saturation Evolution: Wells vs. Seismic

Wells Conditioned

Seismic Conditioned

Saturation Evolution: Combined Conditioning

Seismic + Wells Conditioned saturation evolution

Pressure Evolution: Unconditioned

Unconditioned pressure evolution

Pressure Evolution: Wells vs. Seismic

Wells Conditioned

Seismic Conditioned

Pressure Evolution: Combined Conditioning

Seismic + Wells Conditioned pressure evolution

Part VII: Quantitative Results

Performance Metrics

SSIM-Error

Relative RMSE

Relative STD

Key observations:

  • Conditioning on time-lapse data improves all metrics
  • Well-only corrections are minor (orange)
  • Seismic (blue) and combined (green) yield substantial improvements
  • Quality deteriorates between surveys but recovers after Analysis
  • Relative errors remain flat or decrease with multimodal conditioning

Metric Definitions

SSIM-Error: \(1 - \text{SSIM}(\mathbf{x}^\ast_k, \bar{\widehat{\mathbf{x}}}_k)\)

RMSE: \[ \text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^N\bigl(\mathbf{x}^\ast[i] - \bar{\widehat{\mathbf{x}}}[i]\bigr)^2} \]

Relative RMSE: \[ \Delta\text{RMSE}_k = \frac{\text{RMS}(\mathbf{e}_k)}{\text{RMS}(\mathbf{x}^\ast_k)}, \quad \mathbf{e}_k = \mathbf{x}^\ast_k - \bar{\widehat{\mathbf{x}}}_k \]

Mean Standard Deviation: \[ \bar{\sigma} = \frac{1}{N}\sum_{i=1}^N \bar{\widehat{\boldsymbol{\sigma}}}[i] \]

Relative STD: \[ \Delta\text{STD}_k = \frac{\text{mean}(\bar{\widehat{\boldsymbol{\sigma}}}_k)}{\text{RMS}(\bar{\widehat{\mathbf{x}}}_k)} \]

These normalizations account for biases from plume growth over time.
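A sketch of these metrics on synthetic fields (shapes and noise levels hypothetical; SSIM would come from an image library such as scikit-image and is omitted here):

```python
import numpy as np

rng = np.random.default_rng(4)
x_true = rng.random((64, 64))                       # ground-truth state x*_k
x_mean = x_true + 0.05 * rng.normal(size=(64, 64))  # conditional-mean estimate
sigma = np.full((64, 64), 0.05)                     # pointwise posterior std

rms = lambda a: np.sqrt(np.mean(a**2))

e = x_true - x_mean                    # error field e_k
rmse = rms(e)                          # plain RMSE
rel_rmse = rms(e) / rms(x_true)        # Delta RMSE_k: normalized by the state
rel_std = sigma.mean() / rms(x_mean)   # Delta STD_k: normalized uncertainty
```

Normalizing by the RMS of the growing plume is what allows curves at different timesteps to be compared on one axis.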

Training Diagnostics

Training and validation loss

SSIM during training
  • Loss decreases smoothly for both training and validation
  • SSIM improves steadily
  • No sign of overfitting

Uncertainty Calibration

Calibration: binned MSE vs. posterior deviation

Calibration test (Guo et al. 2017):

  • Group deviations into \(L\) bins \(B_l\)
  • For each bin compute:

\[ \text{UQ}(B_l) = \frac{1}{|B_l|}\sum_{i \in B_l} \bar{\widehat{\boldsymbol{\sigma}}}[i] \]

\[ \text{MSE}(B_l) = \frac{1}{|B_l|}\sum_{i \in B_l} (\mathbf{x}^\ast[i] - \widehat{\mathbf{x}}[i])^2 \]

  • Good calibration \(\rightarrow\) points along \(y = x\)
  • Calibration degrades at later timesteps as plume grows
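The binned test in code, for a synthetic posterior that is calibrated by construction, so the per-bin (UQ, RMSE) pairs fall on the diagonal:

```python
import numpy as np

rng = np.random.default_rng(5)
N, L = 50_000, 10

# Synthetic calibrated posterior: realized errors are drawn
# with exactly the predicted pointwise standard deviation
sigma = rng.uniform(0.02, 0.3, N)              # predicted std per pixel
err2 = (sigma * rng.normal(size=N)) ** 2       # realized squared errors

# Group deviations into L quantile bins and compare UQ vs sqrt(MSE) per bin
edges = np.quantile(sigma, np.linspace(0.0, 1.0, L + 1))
bins = np.digitize(sigma, edges[1:-1])         # bin index 0..L-1 per point
uq = np.array([sigma[bins == l].mean() for l in range(L)])
bin_rmse = np.array([np.sqrt(err2[bins == l].mean()) for l in range(L)])
# Calibrated posterior -> (uq, bin_rmse) points lie near the diagonal y = x
```

Points above the diagonal would indicate overconfidence (errors larger than predicted), which is the failure mode that appears at later timesteps as the plume grows.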

Part VIII: 3D Extension & Discussion

Scaling to 3D: Ground Truth & Seismic

Ground truth

Seismic image

\(3.2 \times 3.2 \times 3.2\) km\(^3\) volume, \(128^3\) grid, 16000 receivers, 16 sources (Gahlot, Orozco, & Herrmann 2025)

Scaling to 3D: Forecast vs. Posterior

Forecast mean

Posterior mean

Scaling to 3D: Error & Uncertainty

Error

Standard deviation

Key Findings

  1. Unknown high-permeability streak undetected by well-only data, but captured by seismic – critical for containment monitoring

  2. Multimodal data (seismic + wells) produces the best estimates across all metrics and timesteps

  3. Uncertainty calibration: pointwise standard deviation correlates with errors relative to ground truth

  4. Nonlinear mapping via CNFs handles complex non-Gaussian posterior distributions that EnKF cannot capture

  5. Small ensemble size (\(M=128\)) sufficient thanks to joint-to-conditional mapping

  6. Scalable to 3D via physics-based summary statistics (RTM) and memory-efficient invertible networks

Open-Source Software Stack

All results are fully reproducible:

Component Package Purpose
Reservoir simulation JutulDarcy.jl Multiphase flow
Seismic modeling JUDI.jl Wave simulations & RTM
Neural networks InvertibleNetworks.jl Conditional normalizing flows
Full pipeline Twin4GCS.jl Digital Shadow implementation

Conclusions & Future Work

Conclusions

  • Introduced an uncertainty-aware Digital Shadow for underground CO2 storage monitoring

  • Combines high-fidelity simulations with neural data assimilation via Simulation-Based Inference

  • Nonlinear prior-to-posterior mapping via CNFs extends Ensemble Kalman Filtering to handle:

    • Non-Gaussian statistics
    • Stochastic nuisance parameters (permeability)
    • Nonlinear dynamics and observations
  • Multimodal conditioning (seismic + wells) produces the most accurate state estimates

  • Seismic data is essential for detecting critical flow pathways that wells alone miss

  • First proof-of-concept for an uncertainty-aware, scalable Digital Shadow for GCS

  • Lays the foundation for a Digital Twin to optimize underground storage operations

Future Work

Methodological extensions:

  • Include geomechanical, chemical & elastic effects
  • Permeability inference from time-lapse data

Towards a Digital Twin:

  • Optimized operations: maximize injectivity while avoiding fracture pressure (Gahlot et al. 2024)
  • Adaptive surveys: trigger follow-up when injection deviates from expectations
  • Optimal monitoring: select well locations to reduce uncertainty most (Orozco, Gahlot, et al. 2024)

From Digital Shadow to Digital Twin

The Digital Shadow provides state estimation. A full Digital Twin adds control and decision-making (Gahlot et al. 2024):

Digital Twin lifecycle: training, inference, and decision-making

Key question: How do we maximize CO2 injectivity while avoiding fracture pressure?

Digital Twin with Experimental Design

Digital Twin with experimental design: forecasted plumes, synthetic observations, trained inference network with optimal well density, and posterior inference from field data

Bayesian Experimental Design

Solution: Choose monitoring well locations \(\mathbf{d}^\ast\) that maximize the Expected Information Gain (EIG) (Orozco, Gahlot, et al. 2024):

\[ \mathbf{d}^\ast = \arg\max_{\mathbf{d}} \;\mathbb{E}_{p(\mathbf{y} \mid \mathbf{d})}\bigl[\mathbb{KL}\bigl(p(\mathbf{x} \mid \mathbf{y}, \mathbf{d})\,\|\,p(\mathbf{x})\bigr)\bigr] \]

Key insight: Maximizing the expected posterior log-density (our CNF training objective) is equivalent to maximizing the EIG:

\[ \max_{\mathbf{d}}\;\mathbb{E}_{p(\mathbf{x},\mathbf{y} \mid \mathbf{d})}\bigl[\log p_{\boldsymbol{\phi}}(\mathbf{x} \mid \mathbf{y}, \mathbf{d})\bigr] \;\Longleftrightarrow\; \max_{\mathbf{d}}\;\text{EIG}(\mathbf{d}) \]

This means we can jointly optimize the CNF parameters \(\boldsymbol{\phi}\) and the experimental design \(\mathbf{d}\)!
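For a hypothetical linear-Gaussian design problem \(y = d\,x + \text{noise}\) the EIG has a closed form, which can be checked against a Monte Carlo average of the posterior-to-prior KL divergence (all values illustrative):

```python
import numpy as np

# Hypothetical linear-Gaussian design: y = d*x + noise, with
# x ~ N(0, s2x) and noise ~ N(0, s2e). Then
# EIG(d) = 0.5 * log(1 + d^2 * s2x / s2e).
s2x, s2e = 1.0, 0.1

def eig(d):
    return 0.5 * np.log(1.0 + d**2 * s2x / s2e)

designs = np.array([0.2, 0.5, 1.0, 2.0])
gains = eig(designs)       # more sensitive designs carry more information

# Monte Carlo check at d = 1: average the posterior-to-prior KL over y
rng = np.random.default_rng(7)
d = 1.0
sp2 = 1.0 / (1.0 / s2x + d**2 / s2e)              # posterior variance
y = rng.normal(0.0, np.sqrt(d**2 * s2x + s2e), 1_000_000)
mu_p = sp2 * d * y / s2e                          # posterior mean given y
kl = 0.5 * (sp2 / s2x + mu_p**2 / s2x - 1.0 - np.log(sp2 / s2x))
```

The sample average of `kl` matches `eig(1.0)`, illustrating why maximizing the expected posterior log-density over designs is the same as maximizing information gain.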

Optimal Well Placement: Results

Learned optimal well density

Selected well locations (red) vs. optimal (black)

Optimized vs. random well placement: RMSE comparison
  • Red crosses: optimized placement consistently outperforms random (blue boxes)
  • Improvement grows with number of monitoring timesteps

The Control Problem – Many thanks to Haoyun Li

Goal: Maximize cumulative CO2 injection while ensuring reservoir pressure does not exceed fracture pressure \(p_{\text{frac}}\) (Gahlot et al. 2024).

\[ \begin{aligned} \max_{q(t)} \quad & \int_{0}^{T} q(t)\,\mathrm{d}t \\ \text{subject to}\quad & \widehat{\text{POF}}_{\tau}(q) \le \varepsilon \quad \text{or} \quad \widehat{\text{CVaR}}_{\alpha}(\boldsymbol{\mathcal{L}}(q)) \le \gamma, \end{aligned} \]

where either the POF or the CVaR constraint is active (not both simultaneously), \(\varepsilon\) is the maximum allowable exceedance probability, and \(\gamma\) is the maximum allowable tail severity.

Because the subsurface parameters \(\mathbf{Z}\) are random, the constraint \(g(q, \mathbf{Z}) \leq 0\) cannot be enforced deterministically – we need risk measures.

Controlled Injectivity: Comparison

Non-fracture vs. fracture scenarios: safety margin (top), pressure perturbation (middle), and CO2 saturation (bottom) at \(t=480\) days

Risk Measures: POF, Quantile & CVaR

Relationship between POF, Quantile, and CVaR

Probability of Failure (POF)

The Probability of Failure quantifies the chance that the constraint is violated:

\[ \text{POF}_{\epsilon}(\mathbf{d}) = \mathbb{P}\bigl[g(\mathbf{d}, \mathbf{Z}) > 0\bigr] \leq \epsilon \]

  • One-line: the probability that pressure exceeds the fracture threshold
  • Parameterized by tolerance \(\epsilon\) (e.g., \(\epsilon = 0.05\) means \(\leq 5\%\) failure probability)

Quantile (Value-at-Risk)

The Quantile (or Value-at-Risk) at level \(\alpha\) is the threshold below which a fraction \(\alpha\) of outcomes fall:

\[ Q_{\alpha}(\mathbf{d}) = \inf\bigl\{t : \mathbb{P}\bigl[g(\mathbf{d}, \mathbf{Z}) \leq t\bigr] \geq \alpha\bigr\} \]

  • One-line: the worst-case constraint value at confidence level \(\alpha\)
  • The POF constraint \(\text{POF}_\epsilon \leq \epsilon\) is equivalent to requiring \(Q_{1-\epsilon} \leq 0\)

Conditional Value-at-Risk (CVaR)

CVaR (also called Expected Shortfall) averages over the worst outcomes beyond the quantile:

\[ \text{CVaR}_{\gamma}(\mathbf{d}) = \mathbb{E}\bigl[g(\mathbf{d}, \mathbf{Z}) \mid g(\mathbf{d}, \mathbf{Z}) \geq Q_{\gamma}(\mathbf{d})\bigr] \]

  • One-line: the expected severity of constraint violations in the worst \((1-\gamma)\) fraction of scenarios
  • Parameterized by \(\gamma\) (e.g., \(\gamma = 0.95\) averages over the worst \(5\%\))
  • More conservative than POF: penalizes not just the probability but also the magnitude of failure
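All three risk measures are simple functionals of Monte Carlo samples of the constraint value \(g(\mathbf{d},\mathbf{Z})\); a sketch with illustrative Gaussian samples (negative values = safe):

```python
import numpy as np

rng = np.random.default_rng(6)
# Monte Carlo samples of the constraint value g(d, Z); negative = safe
g = rng.normal(-1.0, 0.5, 200_000)

pof = np.mean(g > 0)             # Probability of Failure
q95 = np.quantile(g, 0.95)       # Quantile / Value-at-Risk at level 0.95
cvar95 = g[g >= q95].mean()      # CVaR: average of the worst 5% of outcomes
```

By construction CVaR is never below the quantile, and unlike POF it grows with the severity of tail violations rather than just their probability, which is why it is the more conservative constraint in the control problem.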

Fracture Probability vs. Injectivity

Empirical CDF of fracture probability vs. injection rate for POF (\(\varepsilon=0.0\), \(\varepsilon=0.01\)) and CVaR (\(\gamma=0.1\), \(\alpha=0.01\)) constraints with bootstrap confidence intervals

  • CVaR constraint yields higher optimal injection rates while maintaining safety
  • All constrained approaches keep fracture probability below the \(1\%\) threshold

Controlled Injection: POF vs. CVaR

Controlled vs. Uncontrolled Injection

Thank You!

Paper: Gahlot, Orozco, Yin, Bruer & Herrmann (2025). “An uncertainty-aware Digital Shadow for underground multimodal CO2 storage monitoring.” Geophysical Journal International, 242(1), ggaf176.

Code: github.com/slimgroup/Twin4GCS.jl

Data: doi.org/10.5281/zenodo.15204624

Group website: slim.gatech.edu

Acknowledgements:

  • Georgia Research Alliance & ML4Seismic Center partners
  • US National Science Foundation grant OAC 2203821

References

Alsing, J., Wandelt, B. & Feeney, S., 2018. Massive optimal data compression and density estimation for scalable, likelihood-free inference in cosmology. Monthly Notices of the Royal Astronomical Society, 477, 2874–2885, Oxford University Press.
Asch, M., 2022. A toolbox for digital twins: From model-based to data-driven, SIAM.
Avseth, P., Mukerji, T. & Mavko, G., 2010. Quantitative seismic interpretation: Applying rock physics tools to reduce interpretation risk, Cambridge University Press.
Bui, M., Adjiman, C.S., Bardow, A., Anthony, E.J., Boston, A., Brown, S., Fennell, P.S., et al., 2018. Carbon capture and storage (CCS): The way forward. Energy & Environmental Science, 11, 1062–1176, Royal Society of Chemistry.
Cranmer, K., Brehmer, J. & Louppe, G., 2020. The frontier of simulation-based inference. Proceedings of the National Academy of Sciences, 117, 30055–30062. doi:10.1073/pnas.1912789117
Deng, Z., Orozco, R., Gahlot, A.P. & Herrmann, F.J., 2025. Probabilistic joint recovery method for CO2 plume monitoring. arXiv preprint arXiv:2501.18761.
Jones, C.E., Edgar, J.A., Selvage, J.I. & Crook, H., 2012. Building complex synthetic models to evaluate acquisition geometries and velocity inversion technologies. In 74th EAGE Conference and Exhibition Incorporating EUROPEC 2012, cp–293, European Association of Geoscientists & Engineers. doi:10.3997/2214-4609.20148575
Elsemüller, L., Olischläger, H., Schmitt, M., Bürkner, P.-C., Koethe, U. & Radev, S.T., 2024. Sensitivity-aware amortized Bayesian inference. Transactions on Machine Learning Research. Retrieved from https://openreview.net/forum?id=Kxtpa9rvM0
Gahlot, A.P. & Herrmann, F.J., 2025. Enhancing robustness of digital shadow for CO2 storage monitoring with augmented rock physics modeling. arXiv preprint arXiv:2502.07171.
Gahlot, A.P., Li, H., Yin, Z., Orozco, R. & Herrmann, F.J., 2024. A digital twin for geological carbon storage with controlled injectivity. arXiv preprint arXiv:2403.19819.
Gahlot, A.P., Orozco, R. & Herrmann, F.J., 2025. Advancing geological carbon storage monitoring with 3D digital shadow technology. arXiv preprint arXiv:2502.07169.
Gahlot, A.P., Orozco, R., Yin, Z., Bruer, G. & Herrmann, F., 2025, April. Dataset - digital twin for geological carbon storage monitoring, Zenodo. doi:10.5281/zenodo.15204624
Gahlot, A.P., Orozco, R., Yin, Z., Bruer, J. & Herrmann, F.J., 2025. An uncertainty-aware digital shadow for underground multimodal CO\(_2\) storage monitoring. Geophysical Journal International, 242, ggaf176. doi:10.1093/gji/ggaf176
Grady, T.J., Khan, R., Louboutin, M., Yin, Z., Witte, P.A., Chandra, R., Hewett, R.J., et al., 2023. Model-parallel fourier neural operators as learned surrogates for large-scale parametric PDEs. Computers & Geosciences, 178, 105402, Elsevier.
Guo, C., Pleiss, G., Sun, Y. & Weinberger, K.Q., 2017. On calibration of modern neural networks. International conference on machine learning, pp. 1321–1330, PMLR.
IPCC (Intergovernmental Panel on Climate Change), 2018. Global warming of 1.5°C. An IPCC special report on the impacts of global warming of 1.5°C above pre-industrial levels and related global greenhouse gas emission pathways, in the context of strengthening the global response to the threat of climate change, sustainable development, and efforts to eradicate poverty, IPCC, Geneva.
IPCC special report, 2005. IPCC special report on carbon dioxide capture and storage, Cambridge: Cambridge University Press.
IPCC special report, 2018. Global warming of 1.5°C. An IPCC Special Report on the impacts of global warming of 1.5°C, 1, 43–50.
Ketzer, J.M., Iglesias, R.S. & Einloft, S., 2012. Reducing greenhouse gas emissions with CO\(_{2}\) capture and geological storage. in Handbook of climate change mitigation, pp. 1405–1440, Springer US. doi:10.1007/978-1-4419-7991-9_37
Møyner, O., Johnsrud, M., Nilsen, H.M., Raynaud, X., Lye, K.O. & Yin, Z., 2023, April. Sintefmath/jutul.jl: v0.2.6, Zenodo. doi:10.5281/zenodo.7855605
Orozco, R., Gahlot, A. & Herrmann, F.J., 2024. BEACON: Bayesian experimental design acceleration with conditional normalizing flows – a case study in optimal monitor well placement for CO\(_2\) sequestration. arXiv preprint arXiv:2404.00075.
Orozco, R., Siahkoohi, A., Rizzuti, G., van Leeuwen, T. & Herrmann, F.J., 2023. Adjoint operators enable fast and amortized machine learning based Bayesian uncertainty quantification. Medical Imaging 2023: Image Processing, Vol. 12464, pp. 357–367, SPIE.
Orozco, R., Witte, P., Louboutin, M., Siahkoohi, A., Rizzuti, G., Peters, B. & Herrmann, F.J., 2024. InvertibleNetworks.jl: A julia package for scalable normalizing flows. Journal of Open Source Software, 9, 6554, The Open Journal. doi:10.21105/joss.06554
Radev, S.T., Mertens, U.K., Voss, A., Ardizzone, L. & Köthe, U., 2020. BayesFlow: Learning complex stochastic models with invertible neural networks. IEEE transactions on neural networks and learning systems, 33, 1452–1466, IEEE.
Ramgraber, M., Baptista, R., McLaughlin, D. & Marzouk, Y., 2023. Ensemble transport smoothing. Part II: Nonlinear updates. Journal of Computational Physics: X, 17, 100133, Elsevier.
Ringrose, P., 2020. How to store CO\(_{2}\) underground: Insights from early-mover CCS projects, Vol. 129, Springer.
Ringrose, P., 2023. Storage of carbon dioxide in saline aquifers: Building confidence by forecasting and monitoring, Society of Exploration Geophysicists.
Schmitt, M., Radev, S.T. & Bürkner, P.-C., 2023. Fuse it or lose it: Deep fusion for multimodal simulation-based inference. arXiv preprint arXiv:2311.10671.
Spantini, A., Baptista, R. & Marzouk, Y., 2022. Coupling techniques for nonlinear ensemble filtering. SIAM Review, 64, 921–953, SIAM.
Witte, P.A., Louboutin, M., Kukreja, N., Luporini, F., Lange, M., Gorman, G.J. & Herrmann, F.J., 2019. A large-scale framework for symbolic implementations of seismic inversion algorithms in julia. Geophysics, 84, F57–F71. doi:10.1190/geo2018-0174.1
Yin, Z., Orozco, R., Louboutin, M. & Herrmann, F.J., 2024. WISE: Full-waveform variational inference via subsurface extensions. Geophysics, 89, 1–31, Society of Exploration Geophysicists.