Post-Training Regression and Generalization Gap

Weight Exfil. Regression: -0.041
Toxicity->Exfil Transfer: 0.08
Toxicity Improvement: +0.359
Mitigation Recovery: +0.150

Domain	Pre	Post	Change
Toxicity Refusal	0.550	0.909	+0.359
Jailbreak Resistance	0.500	0.809	+0.309
Deception Avoidance	0.720	0.779	+0.059
Sycophancy Resistance	0.620	0.649	+0.029
Power-Seeking Refusal	0.680	0.659	-0.021
Weight Exfil. Refusal	0.650	0.609	-0.041
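
As a quick check on the table, the Change column can be recomputed as Post minus Pre, and the domains that regressed can be listed. This is a minimal sketch, not the evaluation code behind these numbers: the names `scores`, `changes`, and `regressions` are hypothetical, and the values are copied by hand from the rows above.

```python
# Pre/Post scores copied by hand from the table (illustrative only).
scores = {
    "Toxicity Refusal": (0.550, 0.909),
    "Jailbreak Resistance": (0.500, 0.809),
    "Deception Avoidance": (0.720, 0.779),
    "Sycophancy Resistance": (0.620, 0.649),
    "Power-Seeking Refusal": (0.680, 0.659),
    "Weight Exfil. Refusal": (0.650, 0.609),
}


def changes(table):
    """Return {domain: post - pre}, rounded to three decimals."""
    return {domain: round(post - pre, 3) for domain, (pre, post) in table.items()}


def regressions(table):
    """Return the domains whose score dropped after post-training."""
    return [domain for domain, delta in changes(table).items() if delta < 0]


if __name__ == "__main__":
    for domain, delta in changes(scores).items():
        print(f"{domain}\t{delta:+.3f}")
    print("Regressed:", ", ".join(regressions(scores)))
```

Under these assumptions, the recomputed deltas agree with the Change column, and only the Power-Seeking Refusal and Weight Exfil. Refusal rows come out negative.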

Post-Training Misalignment Regression and Generalization Gaps