Multi-Level Distributional Validation of Agent-Based Contact Tracing Against Empirical Epidemiological Data

A reusable validation framework that compares distributions generated by an agent-based model (ABM) with empirical reference data at three levels: contact network structure, contact tracing (CT) process parameters, and aggregate outcomes.

Category: SO (Physics and Society) | arXiv: 2601.14632

Validation Overview

Distributional Checks Passed: 2/5
Age-Mixing Cosine Similarity: 0.9856
Age-Mixing RMSE: 0.55
Samples per Distribution: 5,000
Best KS (Notification Delay): 0.0382
Worst KS (Traced Fraction): 1.0

Problem Statement & Methods

The Validation Gap

Agent-based models (ABMs) of epidemic contact tracing (CT) rely on synthetic populations and assumed operational parameters, yet their CT processes are rarely validated against real-world epidemiological data. This framework addresses the open problem identified by Chae et al. (2026), who acknowledged that their ABM simulations could not be validated quantitatively against actual CT logs.

Three-Level Validation Framework

Level 1 -- Contact Network Structure: Daily contact degree distributions and age-mixing matrices compared against POLYMOD survey data (Mossong et al., 2008).

Level 2 -- CT Process Parameters: Notification delays (KDCA), contacts per interview (CDC), and recall probabilities (Bi et al.) compared against operational reference data.

Level 3 -- Aggregate CT Outcomes: Overall fraction of contacts traced (reference: Park et al.) and epidemic trajectory metrics.

Statistical Tests: Kolmogorov-Smirnov (KS) statistic, Jensen-Shannon (JS) divergence, and Earth Mover's Distance (EMD), with 95% bootstrap confidence intervals (1,000 resamples) reported for the EMD.
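The three measures can be sketched with NumPy and SciPy as follows. This is a minimal illustration, not the framework's actual code: the function names, the 50-bin shared histogram used for JS divergence, and the percentile bootstrap that resamples both samples are all assumptions. The example inputs reuse the notification-delay parameters from the tables on this page, read as Gamma(shape, scale).

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import ks_2samp, wasserstein_distance


def js_divergence(sim, ref, bins=50):
    """JS divergence (natural log) between histograms on shared bin edges."""
    edges = np.histogram_bin_edges(np.concatenate([sim, ref]), bins=bins)
    p, _ = np.histogram(sim, bins=edges)
    q, _ = np.histogram(ref, bins=edges)
    # SciPy normalises the counts and returns the JS *distance*,
    # i.e. the square root of the divergence, so square the result.
    return float(jensenshannon(p, q) ** 2)


def bootstrap_emd_ci(sim, ref, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the EMD (assumed resampling scheme)."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        s = rng.choice(sim, size=sim.size, replace=True)
        r = rng.choice(ref, size=ref.size, replace=True)
        stats[i] = wasserstein_distance(s, r)
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(stats, [tail, 1.0 - tail])
    return float(lo), float(hi)


def compare(sim, ref, n_boot=1000):
    """Return KS statistic, JS divergence, EMD, and the EMD bootstrap interval."""
    sim = np.asarray(sim, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return {
        "ks": float(ks_2samp(sim, ref).statistic),
        "js": js_divergence(sim, ref),
        "emd": float(wasserstein_distance(sim, ref)),
        "emd_ci": bootstrap_emd_ci(sim, ref, n_boot=n_boot),
    }


# Illustrative inputs only: 5,000 draws per distribution.
demo_rng = np.random.default_rng(42)
reference = demo_rng.gamma(shape=2.5, scale=0.6, size=5000)
simulated = demo_rng.gamma(shape=2.0, scale=0.8, size=5000)
print(compare(simulated, reference))
```

The JS value depends on the binning, so results from this sketch are not expected to reproduce the reported figures exactly.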

Interactive Validation Charts

[Charts: KS Statistics by Distribution; JS Divergence by Distribution; EMD with 95% Bootstrap Confidence Intervals; Validation Coverage (1 - KS); Daily Contact Distribution (POLYMOD Reference); Contacts per Interview Distribution]

Age-Mixing Matrix (POLYMOD Reference)

Mean Daily Contacts Between Age Groups

Cosine similarity: 0.9856 | RMSE: 0.55

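| Age group | 0-17 | 18-34 | 35-64 | 65+ |
| --- | --- | --- | --- | --- |
| 0-17 | 7.4 | 1.9 | 3.5 | 0.6 |
| 18-34 | 1.9 | 5.8 | 3.2 | 0.5 |
| 35-64 | 3.5 | 3.2 | 4.5 | 1.0 |
| 65+ | 0.6 | 0.5 | 1.0 | 2.2 |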
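The two age-mixing scores can be computed by flattening both matrices and comparing them element by element. The sketch below uses the reference matrix shown above. The simulated matrix is a hypothetical placeholder (the page does not list the ABM's matrix), and the function names are illustrative.

```python
import numpy as np

AGE_GROUPS = ["0-17", "18-34", "35-64", "65+"]

# Reference matrix as listed on this page (mean daily contacts).
POLYMOD_REF = np.array([
    [7.4, 1.9, 3.5, 0.6],
    [1.9, 5.8, 3.2, 0.5],
    [3.5, 3.2, 4.5, 1.0],
    [0.6, 0.5, 1.0, 2.2],
])


def cosine_similarity(a, b):
    """Cosine of the angle between two matrices treated as flat vectors."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rmse(a, b):
    """Root-mean-square error over all matrix cells."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical ABM output: same mixing pattern, 10% fewer contacts overall.
simulated = 0.9 * POLYMOD_REF

print("cosine:", cosine_similarity(simulated, POLYMOD_REF))
print("rmse:  ", rmse(simulated, POLYMOD_REF))
```

Cosine similarity ignores a uniform rescaling of the matrix while RMSE does not, so the two scores carry complementary information: the first reflects the mixing pattern, the second the absolute contact levels.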

Complete Validation Results

| Level | Distribution | KS Statistic | JS Divergence | EMD | EMD 95% CI Low | EMD 95% CI High | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Daily contacts | 0.0809 | 0.0044 | 2.3547 | 1.8996 | 2.8262 | PASS |
| 2 | Notification delay | 0.0382 | 0.0045 | 0.1128 | 0.0914 | 0.1363 | PASS |
| 2 | Contacts per interview | 0.3933 | 0.1948 | 4.3947 | 4.2387 | 4.5213 | FAIL |
| 2 | Recall probability | 0.1924 | 0.1475 | 0.1013 | 0.0960 | 0.1062 | FAIL |
| 3 | Traced fraction | 1.0 | 0.6931 | 3.6296 | 3.6267 | 3.6327 | FAIL |
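As a worked check on these rows, the short script below recomputes the pass count and the validation coverage (1 - KS) from the reported values. It also compares the traced-fraction JS divergence with ln 2, the upper bound of JS divergence in natural-log units. That the page uses natural logs is an inference from the value 0.6931, not something the page states.

```python
import math

# (level, distribution, KS statistic, JS divergence, status) as reported above.
RESULTS = [
    (1, "Daily contacts", 0.0809, 0.0044, "PASS"),
    (2, "Notification delay", 0.0382, 0.0045, "PASS"),
    (2, "Contacts per interview", 0.3933, 0.1948, "FAIL"),
    (2, "Recall probability", 0.1924, 0.1475, "FAIL"),
    (3, "Traced fraction", 1.0, 0.6931, "FAIL"),
]

# Validation coverage as defined in the chart title: 1 - KS.
coverage = {name: round(1 - ks, 4) for _, name, ks, _, _ in RESULTS}
passed = sum(status == "PASS" for *_, status in RESULTS)

print(f"{passed}/{len(RESULTS)} checks passed")  # 2/5 checks passed
for name, value in coverage.items():
    print(f"{name}: coverage = {value}")

# Two distributions with no overlap give KS = 1 and JS = ln 2 (natural log).
js_upper_bound = math.log(2)
print(f"ln 2 = {js_upper_bound:.4f}")  # ln 2 = 0.6931
```

Read this way, the traced-fraction row indicates no overlap at all between simulated and reference samples. An EMD of 3.63 against a reference confined to the interval [0, 1] also implies that the simulated values lie outside that interval, which may point to a difference in scale or units rather than a modest parameter mismatch.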

Empirical Reference Data Summary

| Distribution | Source | n | Mean | Median | Std Dev | Min | Max |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Daily contacts | POLYMOD (NegBin, mean=13.4, disp=0.5) | 5,000 | 13.11 | 6.0 | 18.71 | 0 | 216 |
| Notification delay | KDCA (Gamma, shape=2.5, scale=0.6) | 5,000 | 1.50 | 1.29 | 0.97 | 0.02 | 8.13 |
| Contacts per interview | CDC (Poisson, lambda=5.0) | 5,000 | 5.00 | 5.0 | 2.26 | 0 | 14 |
| Recall probability | Bi et al. (Beta, a=6, b=4) | 5,000 | 0.60 | 0.61 | 0.15 | 0.14 | 0.97 |
| Traced fraction | Park et al. (Beta, a=12, b=7) | 5,000 | 0.63 | 0.64 | 0.11 | 0.22 | 0.90 |
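The reference samples summarised above are draws from parametric distributions, which can be reproduced approximately with NumPy. The sketch below assumes that "disp" is the negative binomial size parameter k (so the variance is mean + mean^2 / k). The function name, dictionary keys, and seed are illustrative.

```python
import numpy as np


def draw_reference_samples(n=5000, seed=0):
    """Draw n samples from each parametric reference distribution."""
    rng = np.random.default_rng(seed)

    # NumPy parameterises the negative binomial by (n, p); convert from
    # (mean, k) using p = k / (k + mean). Treating "disp" as k is an assumption.
    mean, k = 13.4, 0.5
    p = k / (k + mean)

    return {
        "daily_contacts": rng.negative_binomial(k, p, size=n),
        "notification_delay": rng.gamma(shape=2.5, scale=0.6, size=n),
        "contacts_per_interview": rng.poisson(lam=5.0, size=n),
        "recall_probability": rng.beta(6, 4, size=n),
        "traced_fraction": rng.beta(12, 7, size=n),
    }


samples = draw_reference_samples()
for name, values in samples.items():
    print(f"{name}: mean={values.mean():.2f}, std={values.std(ddof=1):.2f}")
```

Under this reading the theoretical standard deviation of daily contacts is sqrt(13.4 + 13.4^2 / 0.5), roughly 19.3, which is in line with the 18.71 reported for the 5,000 draws.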

Surrogate ABM Configuration

| Parameter | Value | Description |
| --- | --- | --- |
| Population size | 10,000 | Number of simulated agents |
| Mean contacts | 12.0 | Mean daily contacts per agent |
| Contact dispersion | 0.45 | Negative binomial overdispersion |
| Recall rate | 0.55 | Probability of recalling a contact |
| Notification delay | Gamma(2.0, 0.8) | Days from confirmation to notification |
| Tracing success | 0.70 | Probability a notified contact is reached |
| R0 | 2.5 | Basic reproduction number |
| Infectious period | 7.0 days | Mean infectious period |
| Simulation days | 90 | Duration of simulation |
| Total infections | 10,000 | Observed epidemic outcome |
| Peak daily cases | 1,332 | Maximum daily new infections |
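One way to keep such a configuration explicit and reusable is a small immutable record. The sketch below is illustrative only: the class and field names are hypothetical, and Gamma(2.0, 0.8) is read as (shape, scale), matching the notation used for the reference distributions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurrogateABMConfig:
    """Input parameters from the configuration table (field names are hypothetical)."""
    population_size: int = 10_000
    mean_contacts: float = 12.0
    contact_dispersion: float = 0.45
    recall_rate: float = 0.55
    delay_shape: float = 2.0   # Gamma shape (assumed parameter order)
    delay_scale: float = 0.8   # Gamma scale (assumed parameter order)
    tracing_success: float = 0.70
    r0: float = 2.5
    infectious_period_days: float = 7.0
    simulation_days: int = 90

    @property
    def mean_notification_delay(self) -> float:
        """Mean of a Gamma distribution: shape * scale, in days."""
        return self.delay_shape * self.delay_scale


# Reported outcomes from the same table, kept separate from the inputs.
OUTCOMES = {"total_infections": 10_000, "peak_daily_cases": 1_332}

config = SurrogateABMConfig()
attack_rate = OUTCOMES["total_infections"] / config.population_size

print(f"mean notification delay: {config.mean_notification_delay:.1f} days")
print(f"attack rate: {attack_rate:.0%}")
```

With the values listed, the simulated mean notification delay is 1.6 days against a reference mean of 1.50 days, and the reported total of 10,000 infections equals the population size, meaning every agent was infected within the 90 simulated days.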

Key Findings