Parameterization Ambiguity in Idealized Autoregressive Transformers

Systematic computational study of continuous symmetries, solution manifold geometry, sparsity, and algorithmic multiplicity in transformer models.

Symmetry Upper Bound: 320
Empirical Null-Space Dim: 4.0
Max Sparsity (Perfect Acc): 95%
Mean Cosine Similarity: ~0.0
Independent Models: 20

[Figure: Symmetry Scaling: Parameters vs. Symmetry Dimension]
[Figure: Ambiguity Ratio vs. Model Dimension]
[Figure: Null-Space Dimension Across Tasks]
[Figure: Sparsity vs. Accuracy (Copy-Last and XOR)]
[Figure: Overparameterization: Null-Space Dim vs. Model Size]
[Figure: PCA Variance Explained (20 Solutions)]

Symmetry Group Dimensions Across Architectures

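| Config (d, L, H) | Total Params | QK Sym | VO Sym | MLP Sym | Total Sym | Ratio (Total Sym / Total Params) |
|---|---|---|---|---|---|---|
| d=8, L=1, H=1 | 800 | 64 | 64 | 32 | 160 | 0.200 |
| d=8, L=1, H=2 | 800 | 32 | 32 | 32 | 96 | 0.120 |
| d=16, L=1, H=2 | 3,200 | 128 | 128 | 64 | 320 | 0.100 |
| d=16, L=2, H=2 | 6,272 | 256 | 256 | 128 | 640 | 0.102 |
| d=32, L=2, H=4 | 25,088 | 512 | 512 | 256 | 1,280 | 0.051 |
| d=64, L=4, H=8 | 198,656 | 2,048 | 2,048 | 1,024 | 5,120 | 0.026 |
| d=128, L=6, H=8 | 1,187,840 | 12,288 | 12,288 | 3,072 | 27,648 | 0.023 |
| d=512, L=12, H=8 | 37,879,808 | 393,216 | 393,216 | 24,576 | 811,008 | 0.021 |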
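The symmetry counts in this table can be reproduced with a few lines of arithmetic. The sketch below rests on formulas inferred from the listed numbers, not stated in the source: QK and VO each contribute L·H·(d/H)² (one invertible d_head × d_head transform per head per layer), and the MLP contributes 4·d per layer (assuming a hidden width of 4d with one degree of freedom per hidden unit). The names `symmetry_dims` and `ambiguity_ratio` are hypothetical, and the total parameter count is taken from the table as an input, not derived.

```python
def symmetry_dims(d, L, H, mlp_mult=4):
    """Symmetry dimensions per component, under formulas inferred from the table.

    Assumptions (not confirmed by the source):
      QK and VO: one (d/H) x (d/H) invertible transform per head per layer.
      MLP: one degree of freedom per hidden unit, hidden width = mlp_mult * d.
    """
    if d % H != 0:
        raise ValueError("d must be divisible by H")
    d_head = d // H
    qk = L * H * d_head ** 2
    vo = L * H * d_head ** 2
    mlp = L * mlp_mult * d
    return {"qk": qk, "vo": vo, "mlp": mlp, "total": qk + vo + mlp}


def ambiguity_ratio(total_sym, total_params):
    """Fraction of parameter directions accounted for by the symmetry count."""
    return total_sym / total_params


if __name__ == "__main__":
    # (d, L, H, total params as listed in the table)
    configs = [(8, 1, 1, 800), (16, 1, 2, 3200), (512, 12, 8, 37879808)]
    for d, L, H, n_params in configs:
        dims = symmetry_dims(d, L, H)
        ratio = ambiguity_ratio(dims["total"], n_params)
        print(f"d={d}, L={L}, H={H}: {dims} ratio={ratio:.3f}")
```

Under these assumed formulas the QK and VO terms grow as d²/H per layer while the total parameter count grows roughly as d² per layer, which is consistent with the ratio falling as d and H increase in the table.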

Null-Space Analysis Summary

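| Task | Converged | Total Params | Mean Null Dim | Std | Upper Bound |
|---|---|---|---|---|---|
| Copy-Last (V=2, T=3) | 8/8 | 3,136 | 4.0 | 0.0 | 320 |
| XOR (V=2, T=3) | 8/8 | 3,136 | 4.0 | 0.0 | 320 |
| Copy-Last (V=2, T=4) | 8/8 | 3,136 | 16.1 | 0.33 | 320 |
| Copy-Last (V=3, T=2) | 8/8 | 3,168 | 0.0 | 0.0 | 320 |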
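The source does not say how the empirical null-space dimension was measured. One common approach, sketched here purely as an assumption, is to take the Jacobian of the model outputs with respect to the parameters at a converged solution and count the parameter directions that leave the outputs unchanged to first order (number of parameters minus numerical rank). The toy model below, a product of two 2×2 matrices, is not one of the tasks in the table; `numerical_jacobian`, `null_space_dim`, and `toy_forward` are hypothetical names.

```python
import numpy as np


def numerical_jacobian(f, theta, eps=1e-6):
    """Central-difference Jacobian of f (flattened outputs) w.r.t. theta."""
    theta = np.asarray(theta, dtype=float)
    n_out = np.asarray(f(theta), dtype=float).size
    J = np.empty((n_out, theta.size))
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        plus = np.asarray(f(theta + step), dtype=float).ravel()
        minus = np.asarray(f(theta - step), dtype=float).ravel()
        J[:, i] = (plus - minus) / (2 * eps)
    return J


def null_space_dim(J, rel_tol=1e-6):
    """Number of parameters minus the numerical rank of J."""
    s = np.linalg.svd(J, compute_uv=False)
    if s.size == 0 or s[0] == 0:
        return J.shape[1]
    rank = int(np.sum(s > rel_tol * s[0]))
    return J.shape[1] - rank


def toy_forward(theta):
    """Toy two-layer linear model: outputs are W2 @ W1 applied to the 2x2 identity."""
    W1 = theta[:4].reshape(2, 2)
    W2 = theta[4:].reshape(2, 2)
    return (W2 @ W1).ravel()


if __name__ == "__main__":
    # Both factors are invertible, so this is a generic point of the toy model.
    theta0 = np.array([1.0, 2.0, 3.0, 5.0, 2.0, 1.0, 1.0, 1.0])
    J = numerical_jacobian(toy_forward, theta0)
    print(null_space_dim(J))  # 8 parameters, 4 independent output directions
```

In the toy model the outputs depend only on the product W2 W1, so replacing (W1, W2) with (A W1, W2 A⁻¹) for any invertible 2×2 matrix A leaves them unchanged; the four flat directions found numerically are exactly that family. The relative tolerance `rel_tol` is a judgment call, and results on a trained network can be sensitive to it.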