Generalization of the T2w VERIDAH Model Across MRI Scanners

A systematic study of cross-scanner generalization for vertebra segmentation and labeling, evaluating domain adaptation strategies across manufacturers and field strengths.

- Computer Vision Track

Source Dice: 0.9101
Avg Target Dice: 0.5877
Gap Recovery (Hist+TTA): 73.79%
Fine-tuned Dice: 0.8897

Problem & Methods

The VERIDAH vertebra labeling model achieves high accuracy on T2-weighted TSE MRI from the NAKO cohort (Siemens 3T), but its generalization to scans from different MRI scanners remains unexplored. This study systematically evaluates cross-scanner generalization across three manufacturers (Siemens, GE, Philips) at two field strengths (1.5T, 3T), quantifies the domain gap, and compares five adaptation strategies.

Adaptation Strategies

Histogram Matching

Normalizes intensities by mapping each target image's intensity distribution onto that of the source domain.
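A minimal sketch of histogram matching by quantile mapping, using NumPy. The function name `match_histogram` and the synthetic arrays are illustrative assumptions; the preprocessing code actually used in the study is not given in the source.

```python
import numpy as np

def match_histogram(target, reference):
    """Map target intensities so their empirical CDF matches the reference's."""
    target = np.asarray(target, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Unique target values, the index of each voxel among them, and their counts.
    t_values, t_inverse, t_counts = np.unique(
        target.ravel(), return_inverse=True, return_counts=True
    )
    r_values, r_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical CDFs: fraction of voxels at or below each unique value.
    t_cdf = np.cumsum(t_counts) / target.size
    r_cdf = np.cumsum(r_counts) / reference.size

    # For each target quantile, look up the reference intensity at that quantile.
    mapped_values = np.interp(t_cdf, r_cdf, r_values)
    return mapped_values[t_inverse].reshape(target.shape)

# Synthetic stand-ins for a source-domain slice and a target-domain slice.
rng = np.random.default_rng(0)
source_like = rng.normal(100.0, 20.0, size=(32, 32))
target_like = rng.normal(60.0, 5.0, size=(32, 32))
matched = match_histogram(target_like, source_like)
```

Because the mapping is monotonic, the rank order of voxel intensities within the target image is preserved; only the intensity scale changes.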

Test-Time Augmentation (TTA)

Averages predictions over geometric and intensity transformations at inference time.
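The averaging step can be sketched as follows. The source does not list which transformations were used, so the left-right flip and gamma adjustment here are assumptions chosen for illustration, as are the names `predict_with_tta` and `toy_model`. For vertebra labeling, any geometric transform would need to keep the cranio-caudal ordering intact.

```python
import numpy as np

def predict_with_tta(predict_fn, image, gammas=(0.9, 1.0, 1.1)):
    """Average predictions over last-axis flips and mild gamma changes."""
    image = np.asarray(image, dtype=np.float64)
    outputs = []
    for flip in (False, True):
        for gamma in gammas:
            # Intensity transform: gamma adjustment (image assumed scaled to [0, 1]).
            view = np.power(image, gamma)
            # Geometric transform: flip along the last axis.
            if flip:
                view = view[..., ::-1]
            pred = predict_fn(view)
            # Undo the flip so every prediction is aligned with the input grid.
            if flip:
                pred = pred[..., ::-1]
            outputs.append(pred)
    return np.mean(outputs, axis=0)

def toy_model(x):
    """Hypothetical stand-in for a segmentation network: a per-pixel sigmoid."""
    return 1.0 / (1.0 + np.exp(-10.0 * (x - 0.5)))

rng = np.random.default_rng(0)
image = rng.random((8, 8))
probs = predict_with_tta(toy_model, image)
```

Averaging is done on probability maps rather than on hard labels, so the final segmentation would be obtained by thresholding or taking the argmax of `probs`.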

Histogram + TTA

Applies histogram matching as preprocessing and then averages predictions with TTA. It achieves the highest average Dice (0.8256) of the strategies that use no annotated target-domain data.

Adversarial Adaptation

Trains the model against a domain discriminator so that its learned feature representations become scanner-invariant.

Fine-tuning

Updates model parameters using annotated target-domain data. It reaches the highest average Dice of all strategies (0.8897), at the cost of requiring target-domain annotations.

Cross-Scanner Direct Transfer

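| Scanner | Gap | Dice | ID Rate | MSD (mm) |
|---|---|---|---|---|
| NAKO Siemens 3T | 0.000 | 0.9101 | 0.9656 | 0.8433 |
| Philips 3T | 0.159 | 0.7717 | 0.8488 | 1.2657 |
| GE 3T | 0.249 | 0.6935 | 0.7368 | 1.5035 |
| Siemens 1.5T | 0.307 | 0.6381 | 0.7248 | 1.6215 |
| GE 1.5T | 0.547 | 0.4301 | 0.4896 | 2.2260 |
| Philips 1.5T | 0.571 | 0.4053 | 0.4984 | 2.2895 |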
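For reference, the Dice score reported in these tables is the standard overlap measure between a predicted and a reference mask. The sketch below shows the usual binary-mask definition only; how the study aggregates Dice across vertebrae and subjects is not stated in the source, and the function name and toy masks are illustrative.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy masks: 3 foreground voxels each, 2 of them shared.
pred_mask = [[1, 1, 0], [0, 1, 0]]
true_mask = [[1, 0, 0], [0, 1, 1]]
score = dice_coefficient(pred_mask, true_mask)
print(f"{score:.4f}")  # prints 0.6667
```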

Adaptation Strategy Comparison

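| Scanner | None | Histogram | TTA | Hist+TTA | Adversarial | Fine-tuned |
|---|---|---|---|---|---|---|
| GE 1.5T | 0.4207 | 0.6890 | 0.6390 | 0.7848 | 0.7601 | 0.8813 |
| GE 3T | 0.6885 | 0.8075 | 0.7856 | 0.8525 | 0.8394 | 0.8951 |
| Philips 1.5T | 0.4045 | 0.6804 | 0.6292 | 0.7796 | 0.7565 | 0.8810 |
| Philips 3T | 0.7660 | 0.8442 | 0.8303 | 0.8715 | 0.8634 | 0.8993 |
| Siemens 1.5T | 0.6356 | 0.7843 | 0.7577 | 0.8397 | 0.8249 | 0.8919 |
| Average | 0.5831 | 0.7611 | 0.7284 | 0.8256 | 0.8089 | 0.8897 |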

Regional Analysis

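| Scanner | Cervical | Thoracic | Lumbar | Sacral |
|---|---|---|---|---|
| NAKO Siemens 3T | 0.8915 | 0.9058 | 0.9449 | 0.9164 |
| Philips 3T | 0.7199 | 0.7800 | 0.8192 | 0.7980 |
| GE 3T | 0.6231 | 0.7097 | 0.7477 | 0.7208 |
| Siemens 1.5T | 0.5538 | 0.6582 | 0.6994 | 0.6807 |
| GE 1.5T | 0.3037 | 0.4683 | 0.5054 | 0.4788 |
| Philips 1.5T | 0.2635 | 0.4486 | 0.4907 | 0.4517 |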

Domain Shift Component Analysis

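| Component | Domain Gap | Dice | Dice Drop |
|---|---|---|---|
| Resolution | 0.199 | 0.7300 | 0.1751 |
| Field Strength | 0.150 | 0.7734 | 0.1317 |
| Contrast | 0.125 | 0.7946 | 0.1104 |
| Noise | 0.120 | 0.7990 | 0.1061 |
| Manufacturer | 0.100 | 0.8163 | 0.0888 |
| Intensity Bias | 0.080 | 0.8333 | 0.0718 |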

Per-Vertebra Analysis (Philips 1.5T)

Sample Size for Effective Adaptation (GE 1.5T)

| Samples | Dice | Improvement |
|---|---|---|
| 0 | 0.4248 | - |
| 5 | 0.4858 | +0.0610 |
| 10 | 0.5421 | +0.1173 |
| 20 | 0.6231 | +0.1983 |
| 50 | 0.7592 | +0.3344 |
| 100 | 0.8217 | +0.3969 |
| 200 | 0.8342 | +0.4094 |
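One way to read the diminishing returns off this table is to compute the Dice gained per additional annotated sample between consecutive rows. The short sketch below uses the GE 1.5T values as listed; the helper name `marginal_gains` is illustrative.

```python
samples = [0, 5, 10, 20, 50, 100, 200]
dice = [0.4248, 0.4858, 0.5421, 0.6231, 0.7592, 0.8217, 0.8342]

def marginal_gains(samples, dice):
    """Dice gained per additional annotated sample between consecutive rows."""
    gains = []
    for i in range(1, len(samples)):
        gains.append((dice[i] - dice[i - 1]) / (samples[i] - samples[i - 1]))
    return gains

gains = marginal_gains(samples, dice)
for lo, hi, gain in zip(samples, samples[1:], gains):
    print(f"{lo:>3} -> {hi:>3} samples: {gain:.6f} Dice per sample")
```

The per-sample gain falls at every step, from about 0.0122 for the first five samples to about 0.000125 between 100 and 200 samples.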

Key Findings

Significant performance degradation on target scanners. The average Dice across target scanners (0.5877) is 0.3224 points below the source domain (0.9101), confirming that single-cohort training does not guarantee cross-scanner reliability.
Field strength is a dominant factor. All 1.5T scanners perform substantially worse than the 3T scanners, with Dice scores of 0.4053 to 0.6381 for 1.5T targets versus 0.6935 to 0.7717 for 3T targets.
Cervical vertebrae are most vulnerable. Cervical Dice drops to 0.2635 on Philips 1.5T (from 0.8915 on the source scanner). Cervical vertebrae are disproportionately affected because of their smaller size and less distinctive morphology.
Resolution mismatch is the largest single factor. Spatial resolution differences cause a Dice drop of 0.1751, exceeding field strength (0.1317) and contrast (0.1104) variations.
Lightweight adaptation is highly effective. Histogram matching + TTA recovers 73.79% of the performance gap (Dice from 0.5877 to 0.8256) without any model retraining.
Moderate annotation suffices for fine-tuning. On GE 1.5T, 50 annotated target-domain samples improve Dice from 0.4248 to 0.7592, with diminishing returns beyond 100 samples.
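The gap-recovery figure quoted above follows directly from the reported Dice values: it is the fraction of the source-to-target gap that an adapted model closes. A short worked check, with illustrative helper and variable names:

```python
SOURCE_DICE = 0.9101

# Direct-transfer Dice on the five target scanners
# (Philips 3T, GE 3T, Siemens 1.5T, GE 1.5T, Philips 1.5T).
target_scores = [0.7717, 0.6935, 0.6381, 0.4301, 0.4053]
target_dice = round(sum(target_scores) / len(target_scores), 4)

def gap_recovery(adapted_dice, source=SOURCE_DICE, target=target_dice):
    """Fraction of the source-to-target Dice gap closed by an adapted model."""
    return (adapted_dice - target) / (source - target)

recovered = gap_recovery(0.8256)  # average Dice with Hist+TTA
print(f"target average: {target_dice:.4f}")       # prints 0.5877
print(f"gap recovered:  {100 * recovered:.2f}%")  # prints 73.79%
```

The baseline here is the 0.5877 average from the direct-transfer table, which is the value the 73.79% figure is based on.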