Stabilizing Entropy Regularization in RLVR Training
Comparing entropy control strategies: dynamics, stability, and accuracy
Entropy Dynamics
Strategies compared: PID Control, Lagrangian Dual, Fixed Coefficient
Accuracy During Training
Strategy Comparison: Stability vs Accuracy
Entropy Coefficient Evolution