This experiment tests whether the gap between weight-space and activation-space orthogonality in Mixture-of-Experts (MoE) models persists across model scales and architectural variants.

| Model Dim | Expert Params | Weight MSO | Activation MSO | Gap (Act - Weight) | Ratio (Act/Weight) |
|---|---|---|---|---|---|
| 32 | 65K | 2.40e-4 | 2.24e-2 | 0.0222 | 93x |
| 64 | 262K | 6.27e-5 | 1.57e-2 | 0.0156 | 250x |
| 128 | 1.0M | 1.46e-5 | 7.89e-3 | 0.0079 | 540x |
| 256 | 4.2M | 4.29e-6 | 3.65e-3 | 0.0036 | 851x |
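
The two derived columns can be recomputed from the MSO columns as a quick consistency check. The sketch below assumes that the gap is the activation MSO minus the weight MSO and that the ratio is their quotient, which is what the reported numbers suggest. The names `TABLE`, `derive`, and `rows` are hypothetical and used only for illustration.

```python
# Values transcribed from the table above: (model_dim, weight_mso, activation_mso).
TABLE = [
    (32, 2.40e-4, 2.24e-2),
    (64, 6.27e-5, 1.57e-2),
    (128, 1.46e-5, 7.89e-3),
    (256, 4.29e-6, 3.65e-3),
]


def derive(weight_mso, activation_mso):
    """Return (gap, ratio), assuming gap = act - weight and ratio = act / weight."""
    return activation_mso - weight_mso, activation_mso / weight_mso


# One (model_dim, gap, ratio) tuple per table row.
rows = [(dim, *derive(w, a)) for dim, w, a in TABLE]

for dim, gap, ratio in rows:
    print(f"dim={dim:>3}  gap={gap:.4f}  ratio={ratio:.0f}x")
```

Read this way, the absolute gap shrinks as the model dimension grows while the ratio widens: from dimension 32 to 256, weight MSO falls by roughly 56x, whereas activation MSO falls by only about 6x.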