The tables below compare five training methodologies for continuous normalizing flow policies in maximum entropy RL.

| Method | Quality | Energy Distance | MMD | ESS | Loss | Convergence Iteration |
|---|---|---|---|---|---|---|
| RFM | 0.9642 | 0.0508 | 0.0146 | 0.9298 | 0.0446 | 57 |
| SFT | 0.7966 | 0.4663 | 0.0746 | 0.5966 | 0.0840 | 101 |
| KL | 0.7712 | 0.5070 | 0.0672 | 0.5758 | 0.1260 | 126 |
| VFM | 0.8589 | 0.2919 | 0.0371 | 0.6970 | 0.0694 | 84 |
| Diffusion | 0.7963 | 0.4013 | 0.0588 | 0.6589 | 0.0700 | 87 |

| Method | a=0.1 | a=0.5 | a=1.0 | a=2.0 | a=5.0 |
|---|---|---|---|---|---|
| RFM | 0.919 | 0.986 | 0.978 | 0.941 | 0.938 |
| SFT | 0.752 | 0.777 | 0.808 | 0.841 | 0.782 |
| KL | 0.726 | 0.703 | 0.794 | 0.751 | 0.719 |
| VFM | 0.814 | 0.819 | 0.827 | 0.867 | 0.805 |
| Diffusion | 0.751 | 0.824 | 0.857 | 0.847 | 0.780 |
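
The tables do not say how the ESS and energy distance columns were computed. As a rough illustration only, the sketch below shows one common way such quantities are estimated from samples. It assumes ESS is the Kish effective sample size divided by the sample count (which would match the 0 to 1 range in the table) and that energy distance is the plug-in estimate on 1-D samples. The function names `normalized_ess` and `energy_distance` are hypothetical, not taken from the source.

```python
import math


def normalized_ess(log_weights):
    """Kish effective sample size divided by n (assumed definition).

    Takes unnormalized log importance weights; returns a value in (0, 1].
    """
    m = max(log_weights)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_weights]
    s1 = sum(w)
    s2 = sum(x * x for x in w)
    return (s1 * s1) / (s2 * len(w))


def energy_distance(xs, ys):
    """Plug-in estimate of 2E|X-Y| - E|X-X'| - E|Y-Y'| for 1-D samples."""

    def mean_abs(a, b):
        # Average absolute difference over all pairs (p, q).
        return sum(abs(p - q) for p in a for q in b) / (len(a) * len(b))

    return 2 * mean_abs(xs, ys) - mean_abs(xs, xs) - mean_abs(ys, ys)


if __name__ == "__main__":
    # Uniform weights give the maximum normalized ESS.
    print(normalized_ess([0.0, 0.0, 0.0, 0.0]))
    # Identical samples give zero energy distance.
    print(energy_distance([0.0, 1.0], [0.0, 1.0]))
```

Under these assumed definitions, a normalized ESS near 1 means the importance weights are close to uniform, and an energy distance near 0 means the two sample sets are close in distribution, which is consistent with the direction of the RFM row in the first table.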