Unified Memory Management Without External Expert Models
Integrating short-term (STM) and long-term (LTM) memory control within a single LLM agent policy
Unified Policy TSR: 0.682
Cost Reduction: 2.1x
Memory Coherence: 0.744
TSR Improvement: +7.4%
[Figure panels: Task Success Rate; Inference Cost; TSR vs Conversation Length; Training Convergence]
Key Findings
The unified policy achieves a task success rate (TSR) of 0.682 versus 0.635 for the external-expert baseline, a 7.4% relative improvement.
Inference cost drops from 154.3 to 73.5 FLOPs units, a 2.1x reduction.
The unified policy converges fastest in training, reaching a final loss of 0.010.
The unified policy's TSR advantage grows with conversation length, due to better long-horizon memory management.
Joint optimization of memory and task execution enables end-to-end deployment.
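
The headline percentages can be re-derived from the raw figures quoted in the findings. The sketch below is only an arithmetic check, assuming the 7.4% figure is a relative (not absolute) TSR gain and the 2.1x figure is the ratio of the two cost values; the function names are illustrative and do not come from the original work.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


def reduction_factor(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


# Figures quoted in the key findings.
tsr_gain = relative_improvement(0.682, 0.635)   # unified vs external expert
cost_factor = reduction_factor(154.3, 73.5)     # inference cost, FLOPs units

print(f"TSR improvement: {tsr_gain:.1%}")
print(f"Cost reduction: {cost_factor:.1f}x")
```

Under these assumptions the script reproduces the reported 7.4% and 2.1x. In absolute terms the TSR gap is 0.047, or 4.7 points.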