Noise-Expectation vs Gradient-Expectation Objectives
Formal relationship and unified formulation for diffusion policies in online RL
Average gradient alignment (cosine similarity): 0.78
Variance reduction (unified objective): 15-40%
Optimal alpha: 0.5
Q-functions tested: 4
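Here is a minimal sketch of why both objectives can share a target. The notation is ours and is an assumption, not necessarily the original's. The forward process is a_t = m_t a_0 + σ_t ε with ε ~ N(0, I). The target policy is the Boltzmann policy π(a_0 | s) ∝ exp(Q(s, a_0)/τ). "Alpha" is read as the mixing weight of the unified objective. Under standard regularity conditions, both conditional expectations below equal the score of the noised marginal p_t:

```latex
\nabla_{a_t}\log p_t(a_t \mid s)
  = -\frac{1}{\sigma_t}\,\mathbb{E}\!\left[\varepsilon \mid a_t, s\right]
  = \frac{1}{m_t\,\tau}\,\mathbb{E}\!\left[\nabla_{a_0} Q(s, a_0) \mid a_t, s\right]

\hat g_\alpha = \alpha\,\hat g_{\mathrm{noise}} + (1-\alpha)\,\hat g_{\mathrm{grad}},
\qquad
\alpha^\star = \frac{V_{\mathrm{grad}} - C}{V_{\mathrm{noise}} + V_{\mathrm{grad}} - 2C}
```

Here V_noise and V_grad are the variances of the two estimators and C is their covariance. The expression for α* is the standard variance-minimizing weight for combining two estimators of the same quantity. It equals 0.5 exactly when V_noise = V_grad.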
Key Findings
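Both objectives estimate the score of the same Boltzmann policy through complementary mechanisms: one is an expectation over the diffusion noise, the other an expectation over Q-function gradients
High alignment (cosine similarity > 0.7) at moderate temperatures (0.5-2.0)
Complementary variance profiles: the noise-expectation estimator has lower variance for smooth Q-functions, the gradient-expectation estimator for multimodal ones
The unified control-variate formulation reduces estimator variance by 15-40%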
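The findings above can be illustrated with a small NumPy sketch. It computes the two score estimators and their alpha-weighted combination, assuming a Gaussian forward process a_t = m_t a_0 + σ_t ε and a Boltzmann target ∝ exp(Q/τ). The posterior expectations are approximated by self-normalized importance sampling (SNIS). That is our choice, not necessarily the original method. The quadratic Q, `mu`, and every function name are hypothetical. With a quadratic Q the true noised score has a closed form, so the sketch can check both estimators against it.

```python
import numpy as np

def posterior_samples(a_t, q_fn, m_t, sigma_t, tau, n, rng):
    """Self-normalized importance samples from p(a_0 | a_t) (SNIS is our choice).

    Proposal N(a_t / m_t, (sigma_t / m_t)^2 I) matches the Gaussian likelihood in a_0,
    so each importance weight is just the Boltzmann factor exp(Q(a_0) / tau).
    """
    a0 = a_t / m_t + (sigma_t / m_t) * rng.standard_normal((n, a_t.shape[0]))
    log_w = q_fn(a0) / tau
    w = np.exp(log_w - log_w.max())  # shift by the max for numerical stability
    return a0, w / w.sum()

def noise_expectation_score(a_t, a0, w, m_t, sigma_t):
    # Tweedie: score = -E[eps | a_t] / sigma_t with eps = (a_t - m_t * a_0) / sigma_t
    eps = (a_t - m_t * a0) / sigma_t
    return -(w[:, None] * eps).sum(axis=0) / sigma_t

def gradient_expectation_score(a0, w, grad_q_fn, m_t, tau):
    # score = E[grad log pi(a_0) | a_t] / m_t, where grad log pi = grad Q / tau
    return (w[:, None] * grad_q_fn(a0)).sum(axis=0) / (m_t * tau)

def unified_score(s_noise, s_grad, alpha):
    # Both terms target the same score, so alpha only trades off variance
    # (alpha = 1: pure noise-expectation, alpha = 0: pure gradient-expectation)
    return alpha * s_noise + (1.0 - alpha) * s_grad

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical quadratic Q(a) = -||a - mu||^2 / 2: the target is N(mu, tau I), so the
# noised marginal is N(m_t mu, (m_t^2 tau + sigma_t^2) I) with a closed-form score.
mu = np.array([1.0, -0.5])
q_fn = lambda a: -0.5 * ((a - mu) ** 2).sum(axis=-1)
grad_q_fn = lambda a: -(a - mu)

rng = np.random.default_rng(0)
m_t, sigma_t, tau = 0.8, 0.6, 1.0
a_t = np.array([0.3, 0.4])
true_score = -(a_t - m_t * mu) / (m_t**2 * tau + sigma_t**2)

noise_est, grad_est = [], []
for _ in range(500):  # independent repetitions to measure estimator variance
    a0, w = posterior_samples(a_t, q_fn, m_t, sigma_t, tau, 64, rng)
    noise_est.append(noise_expectation_score(a_t, a0, w, m_t, sigma_t))
    grad_est.append(gradient_expectation_score(a0, w, grad_q_fn, m_t, tau))
noise_est, grad_est = np.array(noise_est), np.array(grad_est)

# Per-draw cosine similarity between the two estimates ("gradient alignment")
alignment = np.mean([cosine(sn, sg) for sn, sg in zip(noise_est, grad_est)])
results = {}
for alpha in (0.0, 0.5, 1.0):
    est = unified_score(noise_est, grad_est, alpha)
    results[alpha] = (est.mean(axis=0), est.var(axis=0).sum())
    print(f"alpha={alpha}: mean={results[alpha][0]}, total var={results[alpha][1]:.4f}")
print(f"mean cosine alignment: {alignment:.3f}, true score: {true_score}")
```

With this quadratic Q, both estimates are affine in the same weighted mean of a_0, with slopes of opposite sign. Mixing them therefore cancels much of the sampling noise. That is a property of the toy setup and says nothing about the reported 15-40% figure. SNIS is also only consistent, not unbiased, at a finite sample size.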
[Interactive figure panels: Gradient Alignment, Variance Comparison, Unified Objective Variance, Temperature Sensitivity; Q-function options: Quadratic, Bimodal, Multimodal, Linear]