Planning Directly in Latent Action Space

Benchmarking Planners Across Latent Geometry, Dimension, and Horizon

cs.AI · Artificial Intelligence · Jan 2026 · arXiv: 2601.05230

Problem Statement

Develop methods that plan directly in the latent action space learned by latent-action world models trained on in-the-wild video. Construct sampling and optimization procedures over the continuous latent action vectors inferred by the inverse dynamics model, accounting for the geometry of sparsity- or noise-regularized latents, so that goal-directed action sequences can be generated entirely in latent space.
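One of the benchmarked planners, the cross-entropy method (CEM), can be sketched directly over latent action sequences. The linear `rollout` dynamics, the goal latent, and all hyperparameters below are illustrative stand-ins, not the paper's learned world model:

```python
import numpy as np

rng = np.random.default_rng(0)
d, H = 8, 5  # latent action dimension, planning horizon

def rollout(z0, actions):
    # Stand-in latent dynamics: a fixed contractive linear map.
    # The real transition would be the world model's learned network.
    A = np.eye(d) * 0.9
    z = z0
    for a in actions:
        z = A @ z + a
    return z

def cem_plan(z0, z_goal, iters=10, pop=64, elites=8):
    """Cross-entropy method over a length-H sequence of latent actions."""
    mu = np.zeros((H, d))
    sigma = np.ones((H, d))
    for _ in range(iters):
        # Sample candidate action sequences from the current Gaussian.
        cand = mu + sigma * rng.standard_normal((pop, H, d))
        # Score each candidate by final-state distance to the goal latent.
        costs = np.array([np.linalg.norm(rollout(z0, c) - z_goal)
                          for c in cand])
        elite = cand[np.argsort(costs)[:elites]]
        # Refit the sampling distribution to the elite set.
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu, costs.min()

z0, z_goal = np.zeros(d), np.ones(d)
plan, best = cem_plan(z0, z_goal)
```

Because CEM only ever evaluates the dynamics model (no gradients), it applies to all three latent geometries, including the discrete VQ-VAE case, which is consistent with the findings below.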

Latent geometries: VAE · Sparse-EBM · VQ-VAE · Diffusion planner — 5 planners · 4 dimensions · 4 horizons

Figure panels: Planning Amenability by Geometry; Effective Dimension vs Ambient Dimension; Planner Comparison (d=8); Dimension Scaling (VAE, Sparse-EBM, VQ-VAE); Horizon Scaling (VAE, d=8); Trajectory Smoothness by Planner & Geometry

Key Findings

  • CEM achieves the lowest goal distance overall but generates jerky, non-smooth trajectories.
  • The diffusion planner produces the smoothest trajectories at the lowest computational cost (1,050 vs 2,000–3,250 function evaluations).
  • All planners degrade superlinearly with increasing latent dimension, highlighting the curse of dimensionality in latent planning.
  • VQ-VAE exhibits the highest planning amenability; Sparse-EBM has near-full-rank effective dimension, making it hardest to plan in.
  • Gradient-based methods (Gradient descent, SGLD) fail in discrete VQ-VAE geometries due to non-differentiable codebook lookups.
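The last finding can be checked numerically: a nearest-neighbour codebook lookup is piecewise constant in its input, so a finite-difference gradient through it is zero almost everywhere, giving gradient descent and SGLD no signal to follow. The toy codebook and sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
codebook = rng.standard_normal((16, 4))  # 16 codes, 4-dim latents (toy sizes)

def quantize(z):
    # Nearest-neighbour codebook lookup, as in a VQ-VAE bottleneck.
    idx = np.argmin(np.linalg.norm(codebook - z, axis=1))
    return codebook[idx]

z = rng.standard_normal(4)
eps = 1e-5
# Finite-difference "gradient" of the quantized output w.r.t. z.
# A small perturbation almost never changes the selected code, so the
# estimate is exactly zero: the lookup gives no gradient signal.
grad = (quantize(z + eps) - quantize(z)) / eps
```

Workarounds such as straight-through estimation exist for training, but for planning this motivates the sampling-based methods (CEM, diffusion) that the VQ-VAE geometry favors.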