Noise-Expectation vs Gradient-Expectation Objectives

Formal relationship and unified formulation for diffusion policies in online RL

Avg Alignment: 0.78
Variance Reduction: 15-40%
Optimal alpha: 0.5
Q-Functions Tested: 4

Key Findings

Panels: Gradient Alignment; Variance Comparison; Unified Objective Variance; Temperature Sensitivity