Stabilizing Entropy Regularization in RLVR Training

Comparing entropy control strategies: dynamics, stability, and accuracy

Figure panels: Entropy Dynamics; Accuracy During Training; Strategy Comparison: Stability vs Accuracy; Entropy Coefficient Evolution