Learned vs Innate Tolerance for Incorrect Perspective

A factorial framework decomposing perspective tolerance in vision architectures into innate (architectural) and learned (training) components across 288 experimental conditions.

Category: Computer Vision | Track: Research | 6 Architectures | 4 Distortion Types | 288 Conditions
Highest innate tolerance: 0.6432 (ResNet-50), which retains ~64% of base accuracy without perspective-diverse training
Correlation between spatial invariance score and learned fraction: r = -0.9937, a near-perfect negative correlation
Highest learned fraction: 22.59% (MLP-Mixer-B), the greatest reliance on training data diversity among all architectures
Total experimental conditions: 288 = 6 architectures x 2 regimes x 4 distortions x 6 severity levels

Problem Statement

Modern vision architectures vary dramatically in their ability to recognize objects under perspective distortions, yet the source of this tolerance remains poorly understood.

Two broad sources of perspective tolerance exist: innate tolerance arising from architectural design choices (e.g., convolutional weight sharing, pooling hierarchies) and learned tolerance acquired through training on perspective-diverse data. Prior studies typically conflate these two sources, making it difficult to attribute observed robustness. This work disentangles these contributions through a controlled factorial experimental design.

Methodology

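The study uses a factorial design: each of the six architectures is trained under two regimes (perspective-diverse vs. perspective-restricted) and evaluated on four calibrated perspective distortions at six severity levels.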
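The size of the experimental grid follows directly from the factorial structure. The sketch below enumerates it; the severity values are an assumption (the page states only that there are six levels), and the variable names are illustrative.

```python
from itertools import product

ARCHITECTURES = ["ResNet-50", "ConvNeXt-T", "ViT-B/16", "DeiT-S", "Swin-T", "MLP-Mixer-B"]
REGIMES = ["diverse", "restricted"]
DISTORTIONS = ["tilt", "pan", "off-axis", "combined"]
# Assumption: six evenly spaced severities in [0, 1]; the actual levels are not given.
SEVERITIES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

# One condition per (architecture, regime, distortion, severity) combination.
conditions = list(product(ARCHITECTURES, REGIMES, DISTORTIONS, SEVERITIES))
print(len(conditions))  # 6 * 2 * 4 * 6 = 288
```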

\[ \tau_{\text{total}} = \tau_{\text{innate}} + \tau_{\text{learned}} \]
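One way to make the decomposition concrete is sketched below. The exact definitions are not spelled out here, so the sketch assumes that innate tolerance is the accuracy retained under restricted training relative to the base accuracy a0, that total tolerance is the same ratio under diverse training, and that the learned fraction φ is the learned share of the total. The function name and arguments are hypothetical.

```python
def decompose_tolerance(acc_diverse, acc_restricted, base_acc):
    """Split tolerance for one (architecture, distortion, severity) cell.

    Assumed definitions (not confirmed by the source):
      tau_total   = acc_diverse / base_acc
      tau_innate  = acc_restricted / base_acc
      tau_learned = tau_total - tau_innate
      phi         = tau_learned / tau_total
    """
    if base_acc <= 0:
        raise ValueError("base_acc must be positive")
    tau_total = acc_diverse / base_acc
    tau_innate = acc_restricted / base_acc
    tau_learned = tau_total - tau_innate
    # Guard against a zero total so that phi stays defined.
    phi = tau_learned / tau_total if tau_total > 0 else 0.0
    return {
        "tau_total": tau_total,
        "tau_innate": tau_innate,
        "tau_learned": tau_learned,
        "phi": phi,
    }
```

By construction the two components sum to the total, so the additive identity holds for every cell regardless of the accuracies passed in.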

Tilt: rotation about the horizontal axis by up to 60 degrees, simulating looking up or down at an object.

Pan: rotation about the vertical axis by up to 60 degrees, simulating a lateral change of viewpoint.

Off-axis: translation of the principal point, simulating objects at the periphery of the field of view.

Combined: composition of all three primitives, representing the worst-case compound distortion.
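The four distortion types can be expressed as planar homographies under a pinhole camera model. The sketch below is one plausible construction, not necessarily the one used in the experiments: the focal length `f`, the principal point `(cx, cy)`, the sign conventions, and the composition order for the combined case are all assumptions, and every function name is hypothetical.

```python
import math

def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_homography(axis, degrees, f, cx, cy):
    """Homography K R K^-1 induced by rotating a pinhole camera about its center.

    axis="x" models tilt (horizontal axis), axis="y" models pan (vertical axis).
    f, cx, cy are an assumed focal length and principal point in pixels.
    """
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    if axis == "x":
        rot = [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    elif axis == "y":
        rot = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    else:
        raise ValueError("axis must be 'x' or 'y'")
    k = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    k_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    return matmul3(matmul3(k, rot), k_inv)

def off_axis_homography(dx, dy):
    """Shift the principal point by (dx, dy) pixels, a pure image translation."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]

def combined_homography(tilt_deg, pan_deg, dx, dy, f, cx, cy):
    """Compose tilt, then pan, then the off-axis shift (the order is an assumption)."""
    h = rotation_homography("x", tilt_deg, f, cx, cy)
    h = matmul3(rotation_homography("y", pan_deg, f, cx, cy), h)
    return matmul3(off_axis_homography(dx, dy), h)

def warp_point(h, x, y):
    """Apply a homography to a pixel coordinate, including perspective division."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w
```

With this convention, a pan of θ moves the image center horizontally by f·tan(θ) pixels and leaves its vertical coordinate unchanged, which gives a quick sanity check on any implementation.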

Architectures Evaluated

Architecture | Family        | Spatial Inv. Score (σ) | Depth | Params (M)
ResNet-50    | Convolutional | 0.72                   | 50    | 25.6
ConvNeXt-T   | Convolutional | 0.68                   | 28    | 28.6
ViT-B/16     | Attention     | 0.45                   | 12    | 86.6
DeiT-S       | Attention     | 0.48                   | 12    | 22.1
Swin-T       | Attention     | 0.61                   | 24    | 28.3
MLP-Mixer-B  | MLP           | 0.35                   | 12    | 59.9

Accuracy Degradation Curves

[Figure: accuracy versus distortion severity. Solid lines show diverse training; dashed lines show restricted training.]

Tolerance Decomposition

Stacked innate and learned components per architecture, averaged across all severity levels and distortion types.

[Figure panels: Innate vs Learned Tolerance; Learned Fraction by Architecture]

Spatial Invariance vs Learned Fraction

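Across the six architectures, the spatial invariance score σ and the learned fraction φ are almost perfectly negatively correlated (r = -0.9937): the more invariance an architecture builds in, the smaller the share of its tolerance that has to come from perspective-diverse training data.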
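The correlation can be checked against the per-architecture values reported in the decomposition table. The sketch below computes a plain Pearson coefficient from those six (σ, φ) pairs; the helper name `pearson_r` is illustrative.

```python
import math

# (sigma, phi in percent) per architecture, copied from the decomposition table.
DATA = {
    "ResNet-50": (0.72, 16.46),
    "ConvNeXt-T": (0.68, 16.80),
    "ViT-B/16": (0.45, 20.78),
    "DeiT-S": (0.48, 19.90),
    "Swin-T": (0.61, 18.50),
    "MLP-Mixer-B": (0.35, 22.59),
}

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

sigmas = [sigma for sigma, _ in DATA.values()]
phis = [phi for _, phi in DATA.values()]
r = pearson_r(sigmas, phis)
print(f"r = {r:.4f}")  # should land close to the reported -0.9937
```

With only six points, the coefficient is sensitive to each architecture's value, so it is best read as a summary of this particular set of models.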

[Figure panels: Correlation of σ vs φ (r = -0.9937); Family Comparison: Mean Tolerance]

Analysis by Distortion Type

Off-axis distortions are the best tolerated innately (τ_innate = 0.6509). Combined distortions are the worst tolerated innately (τ_innate = 0.5352) and yield the highest learned fraction (φ = 21.04%).

[Figure panels: Innate vs Learned by Distortion Type; Severity-Dependent Gap (Combined Distortion)]

Learned Fraction Heatmap

Learned fraction at high severity (s >= 0.6) across all architecture-distortion combinations. Darker shading indicates greater dependence on diverse training.
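A heatmap of this kind could be built by filtering per-condition results to the high-severity range and aggregating per architecture-distortion pair. The sketch below assumes a simple mean over the retained severities and a hypothetical record layout (dicts with `architecture`, `distortion`, `severity`, and `phi` keys); neither is confirmed by the source.

```python
from collections import defaultdict

def high_severity_learned_fraction(records, threshold=0.6):
    """Average phi over severities >= threshold for each (architecture, distortion).

    `records` is an iterable of dicts with keys "architecture", "distortion",
    "severity" and "phi". This layout and the use of a mean are assumptions.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        if rec["severity"] >= threshold:
            key = (rec["architecture"], rec["distortion"])
            sums[key] += rec["phi"]
            counts[key] += 1
    # Only cells with at least one high-severity record appear in the result.
    return {key: sums[key] / counts[key] for key in sums}
```

Each key of the returned dictionary corresponds to one heatmap cell, so six architectures and four distortion types would give 24 cells.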

Data Tables

Complete numerical results from all experimental conditions.

Tolerance Decomposition by Architecture

Architecture | Family    | σ    | τ_innate        | τ_learned       | φ (Learned Frac.)
ResNet-50    | Conv      | 0.72 | 0.6432 ± 0.1307 | 0.1197 ± 0.0357 | 16.46%
ConvNeXt-T   | Conv      | 0.68 | 0.6322 ± 0.1300 | 0.1208 ± 0.0313 | 16.80%
ViT-B/16     | Attention | 0.45 | 0.5658 ± 0.1431 | 0.1377 ± 0.0432 | 20.78%
DeiT-S       | Attention | 0.48 | 0.5759 ± 0.1394 | 0.1335 ± 0.0337 | 19.90%
Swin-T       | Attention | 0.61 | 0.6171 ± 0.1422 | 0.1310 ± 0.0421 | 18.50%
MLP-Mixer-B  | MLP       | 0.35 | 0.5405 ± 0.1383 | 0.1475 ± 0.0417 | 22.59%
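Family-level means can be derived from the per-architecture rows above. The sketch below takes an unweighted mean over the architectures in each family, which is one plausible reading of the family comparison and an assumption about how it was aggregated; the names are illustrative.

```python
from statistics import mean

# (family, tau_innate, tau_learned) per architecture, copied from the table above.
ROWS = {
    "ResNet-50": ("Conv", 0.6432, 0.1197),
    "ConvNeXt-T": ("Conv", 0.6322, 0.1208),
    "ViT-B/16": ("Attention", 0.5658, 0.1377),
    "DeiT-S": ("Attention", 0.5759, 0.1335),
    "Swin-T": ("Attention", 0.6171, 0.1310),
    "MLP-Mixer-B": ("MLP", 0.5405, 0.1475),
}

def family_means(rows):
    """Return {family: (mean tau_innate, mean tau_learned)}, unweighted."""
    grouped = {}
    for family, innate, learned in rows.values():
        grouped.setdefault(family, []).append((innate, learned))
    return {
        family: (mean([i for i, _ in vals]), mean([l for _, l in vals]))
        for family, vals in grouped.items()
    }

for family, (innate, learned) in family_means(ROWS).items():
    print(f"{family}: innate={innate:.4f}, learned={learned:.4f}")
```

On these rows the convolutional family has the highest mean innate tolerance and the MLP family the lowest, while the ordering for mean learned tolerance is reversed.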

Tolerance Decomposition by Distortion Type

Distortion | τ_innate        | τ_learned       | φ (Learned Frac.)
Tilt       | 0.6148 ± 0.1344 | 0.1329 ± 0.0366 | 18.67%
Pan        | 0.5823 ± 0.1367 | 0.1324 ± 0.0389 | 19.57%
Off-axis   | 0.6509 ± 0.1299 | 0.1299 ± 0.0428 | 17.40%
Combined   | 0.5352 ± 0.1422 | 0.1317 ± 0.0390 | 21.04%
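The ordering of distortion types can be read off these rows programmatically. The short sketch below ranks them by innate tolerance and by learned fraction, using only the tabulated means; the variable names are illustrative.

```python
# (tau_innate, phi in percent) per distortion type, copied from the table above.
BY_DISTORTION = {
    "Tilt": (0.6148, 18.67),
    "Pan": (0.5823, 19.57),
    "Off-axis": (0.6509, 17.40),
    "Combined": (0.5352, 21.04),
}

# Rank from highest to lowest on each measure.
by_innate = sorted(BY_DISTORTION, key=lambda d: BY_DISTORTION[d][0], reverse=True)
by_phi = sorted(BY_DISTORTION, key=lambda d: BY_DISTORTION[d][1], reverse=True)

print(by_innate)  # ['Off-axis', 'Tilt', 'Pan', 'Combined']
print(by_phi)     # ['Combined', 'Pan', 'Tilt', 'Off-axis']
```

The two rankings are exact reverses of each other: the distortion types that are tolerated least well innately are the ones with the largest learned fraction.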

Base Accuracy (Severity = 0, Diverse Training)

Architecture | Base Accuracy (a0)
ResNet-50    | 0.7640
ConvNeXt-T   | 0.8173
ViT-B/16     | 0.8101
DeiT-S       | 0.7980
Swin-T       | 0.8211
MLP-Mixer-B  | 0.7480

Detailed Results: Combined Distortion

Architecture | Severity | Acc. (Diverse) | Acc. (Restricted) | τ_innate | τ_learned | φ