Weight-Activation Gap in MoE Models

Pearson r (Weight vs Act MSO): -0.112 (p = 0.596, not significant)

Act MSO / Weight MSO Ratio: 93-851x (across all scales)

Architectures with Gap: 5/5 (universal presence)

Max Expert Params Tested: 4.2M (gap persists at all scales)

Regularization Scan: Weight vs Activation MSO

Model Dim	Expert Params	Weight MSO	Activation MSO	Gap (Act - Weight)	Ratio (Act/Weight)
32	65K	2.40e-4	2.24e-2	0.0222	93x
64	262K	6.27e-5	1.57e-2	0.0156	250x
128	1.0M	1.46e-5	7.89e-3	0.0079	540x
256	4.2M	4.29e-6	3.65e-3	0.0036	851x
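
The two derived columns can be rechecked from the raw MSO values. The sketch below assumes Gap is Activation MSO minus Weight MSO and Ratio is Activation MSO divided by Weight MSO; the source does not state these definitions, but both reproduce the tabulated figures. The names `rows` and `derived_columns` are illustrative and not part of the original analysis.

```python
# Rows copied from the table above: (model_dim, weight_mso, activation_mso).
rows = [
    (32, 2.40e-4, 2.24e-2),
    (64, 6.27e-5, 1.57e-2),
    (128, 1.46e-5, 7.89e-3),
    (256, 4.29e-6, 3.65e-3),
]


def derived_columns(weight_mso, activation_mso):
    """Return (gap, ratio) under the assumed definitions:
    gap = activation - weight, ratio = activation / weight."""
    gap = activation_mso - weight_mso
    ratio = activation_mso / weight_mso
    return gap, ratio


# One (dim, gap, ratio) tuple per table row.
derived = [(dim, *derived_columns(w, a)) for dim, w, a in rows]

for dim, gap, ratio in derived:
    print(f"dim={dim:<4d} gap={gap:.4f} ratio={ratio:.0f}x")
```

In these four rows the absolute gap shrinks as the model dimension grows while the ratio increases, because Weight MSO falls faster than Activation MSO.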