Optimization Landscape & Feasibility in Riemannian AmbientFlow

An empirical investigation of the open problem: which local minima are reached, and do the recoverability theorem's feasibility assumptions hold at those minima?

cs.LG (Machine Learning) · Based on arXiv:2601.18728

Problem Statement

Riemannian AmbientFlow minimizes a combined objective with a variational lower bound and geometric regularization:

L(θ, φ) = L_AmbientFlow(θ, φ) + λ · ‖J_{f_θ}(0)‖_F²

The recoverability theorem requires three feasibility assumptions:

(F1) Data matching: The learned data distribution equals the ground truth.
(F2) Posterior matching: The variational posterior equals the true posterior.
(F3) Geometric constraint: The Jacobian Frobenius norm at the origin is bounded.

The open question: do gradient-based optimizers find local minima that satisfy all three conditions?
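As a concrete sketch, the penalty term ‖J_{f_θ}(0)‖_F² can be estimated by central finite differences for any map f: R^d → R^D. A minimal NumPy illustration (not the paper's implementation), checked against the ground-truth circle map, whose Jacobian has unit Frobenius norm everywhere:

```python
import numpy as np

def jacobian_fro_sq(f, z0, eps=1e-5):
    """Estimate ||J_f(z0)||_F^2 for f: R^d -> R^D via central differences."""
    z0 = np.asarray(z0, dtype=float)
    cols = []
    for i in range(z0.size):
        dz = np.zeros_like(z0)
        dz[i] = eps
        cols.append((f(z0 + dz) - f(z0 - dz)) / (2 * eps))  # i-th Jacobian column
    J = np.stack(cols, axis=1)    # shape (D, d)
    return float(np.sum(J ** 2))  # squared Frobenius norm

# Ground-truth circle map f*(z) = (cos z, sin z): ||J_f*(z)||_F^2 = 1 for all z.
f_circle = lambda z: np.array([np.cos(z[0]), np.sin(z[0])])
print(jacobian_fro_sq(f_circle, [0.0]))  # ≈ 1.0
```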

Circle in R²

Unit circle S¹ parameterized by f*(z) = (cos z, sin z). Intrinsic dim d = 1, ambient dim D = 2.


Sphere in R³

Unit sphere S² via inverse stereographic projection. Intrinsic dim d = 2, ambient dim D = 3.


Helix in R³

Helix f*(t) = (cos t, sin t, t/2π). Intrinsic dim d = 1, ambient dim D = 3.

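The three benchmark manifolds can be sampled under an isotropic Gaussian noise model (σ = 0.1 in the setup below); a minimal sketch, with parameter ranges chosen for illustration rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_manifold(name, n=200, sigma=0.1):
    """Draw n points on the named manifold, plus isotropic Gaussian noise."""
    if name == "circle":                       # S^1 in R^2, d=1, D=2
        z = rng.uniform(0, 2 * np.pi, n)
        x = np.stack([np.cos(z), np.sin(z)], axis=1)
    elif name == "sphere":                     # S^2 in R^3 via inverse stereographic projection
        u, v = rng.normal(size=(2, n))
        s = u**2 + v**2
        x = np.stack([2 * u, 2 * v, s - 1], axis=1) / (s + 1)[:, None]
    elif name == "helix":                      # helix in R^3, d=1, D=3
        t = rng.uniform(0, 2 * np.pi, n)
        x = np.stack([np.cos(t), np.sin(t), t / (2 * np.pi)], axis=1)
    else:
        raise ValueError(name)
    return x + sigma * rng.normal(size=x.shape)

X = sample_manifold("sphere")
print(X.shape)  # (200, 3)
```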

Experimental Setup

200 data points · 10 random starts · 7 λ values · 200 L-BFGS-B iterations · noise σ = 0.1 · 4 experiments
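The multi-start protocol (10 random initializations, up to 200 L-BFGS-B iterations each) can be sketched with SciPy; the toy multimodal objective below is a hypothetical stand-in for L(θ, φ), not the paper's loss:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def multi_start(objective, dim, n_starts=10, maxiter=200):
    """Run L-BFGS-B from several random initializations; report best value and spread."""
    finals = []
    for _ in range(n_starts):
        x0 = rng.normal(size=dim)
        res = minimize(objective, x0, method="L-BFGS-B",
                       options={"maxiter": maxiter})
        finals.append(res.fun)
    vals = np.array(finals)
    return vals.min(), vals.std()   # spread > 0 signals multiple basins

# Toy multimodal stand-in for the combined objective (hypothetical).
f = lambda x: float(np.sum(np.sin(3 * x) ** 2 + 0.1 * x ** 2))
best, spread = multi_start(f, dim=4)
```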

Experiment 1: Multi-Start Landscape Exploration

Objective value across 10 random initializations for each λ. High variance indicates multiple distinct local minima.

Objective Spread (Landscape Complexity)

Standard deviation of the converged objective across starts. Higher spread indicates more distinct local minima.

λ       Circle (std)   Sphere (std)   Helix (std)
0.00    1.837          0.160          2.083
0.01    1.208          0.128          1.823
0.05    1.208          0.117          1.379
0.10    1.624          0.133          0.058
0.50    1.198          0.143          0.031
1.00    1.202          0.187          1.391
2.00    1.606          0.281          0.048
Key insight: The sphere has consistently low objective spread (< 0.28), suggesting a simpler landscape. The circle and helix reach spreads above 1.8 at some λ values, indicating multiple well-separated basins.

Experiment 2: Feasibility Phase Diagram

Aggregate feasibility score combining data matching (F1), posterior matching (F2), and geometric constraint (F3).

Feasibility Decomposition

Breaking down the aggregate score into its three components reveals the fundamental trade-off.

Trade-off: Increasing λ improves the geometric constraint (F3) but degrades data matching (F1). The aggregate feasibility is non-monotonic with a manifold-dependent sweet spot. No configuration achieves near-perfect feasibility (>0.9).
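To make the trade-off concrete, the toy model below uses purely illustrative curves (not fitted to the experiments, and not the paper's aggregation rule): data matching decays with λ, the geometric constraint improves with λ, and a geometric-mean aggregate peaks at an intermediate λ:

```python
import numpy as np

# Illustrative (not fitted) trade-off curves over the λ grid.
lams = np.array([0.0, 0.01, 0.05, 0.1, 0.5, 1.0, 2.0])
f1 = 1.0 / (1.0 + lams)            # F1: data matching degrades with λ (toy model)
f3 = lams / (1.0 + lams)           # F3: geometric constraint improves (toy model)
f2 = np.full_like(lams, 0.8)       # F2: posterior matching held fixed (toy model)
score = (f1 * f2 * f3) ** (1 / 3)  # hypothetical geometric-mean aggregate
best_lam = lams[np.argmax(score)]  # non-monotonic: peaks at an intermediate λ
```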

Experiment 3: Hessian Spectral Analysis

Distribution of directional second derivatives at converged critical points. All positive curvature confirms genuine local minima.

Hessian Summary

Manifold   λ     Min Eigenvalue   Max Eigenvalue   Mean      Negative Dirs
Circle     0.0   12.28            461.67           210.71    0/50
Circle     0.1   10.83            476.92           229.86    0/50
Circle     0.5   10.45            580.49           256.59    0/50
Circle     1.0   11.52            671.04           279.47    0/50
Sphere     0.0   8.18             76.64            36.81     0/50
Sphere     0.1   8.69             75.44            37.65     0/50
Sphere     0.5   7.08             70.88            33.68     0/50
Sphere     1.0   62.76            859.35           375.30    0/50
Helix      0.0   141.03           1870.47          878.69    0/50
Helix      0.1   81.96            982.71           396.46    0/50
Helix      0.5   124.54           1512.43          614.73    0/50
Helix      1.0   326.87           3896.46          1648.25   0/50
Key finding: Zero negative curvature directions detected across all 600 random probes (3 manifolds x 4 λ values x 50 directions). All converged points are genuine local minima.
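The probe protocol can be sketched with second-order central differences along random unit directions; a toy convex quadratic stands in for the converged objective (nothing here is the paper's code):

```python
import numpy as np

def directional_curvatures(f, x, n_dirs=50, eps=1e-4, seed=0):
    """Estimate d^T H d at x along random unit directions d via central differences."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    curvs = []
    for _ in range(n_dirs):
        d = rng.normal(size=x.shape)
        d /= np.linalg.norm(d)
        curvs.append((f(x + eps * d) - 2 * f(x) + f(x - eps * d)) / eps**2)
    return np.array(curvs)

# Convex quadratic x^T A x with A = diag(1, 4): every curvature lies in [2, 8].
quad = lambda x: float(x @ np.diag([1.0, 4.0]) @ x)
c = directional_curvatures(quad, np.zeros(2))
print(int((c < 0).sum()), "/ 50 negative directions")  # 0 / 50 negative directions
```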

Experiment 4: Parameter Continuation

Tracking a single local minimum as λ increases from 0 to 2 reveals smooth deformation without bifurcation.

Path dependence: Feasibility monotonically decreases along the continuation path, while multi-start optimization at large λ sometimes finds better solutions. This shows that the basin reached at λ=0 is not necessarily the most feasible basin at larger λ.
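A continuation loop warm-starts each solve at the previous solution, which is how a single basin is tracked across λ; a minimal sketch with a hypothetical toy objective (data term plus λ-weighted penalty), not the paper's loss:

```python
import numpy as np
from scipy.optimize import minimize

def continuation(objective_at, lambdas, x0):
    """Track one local minimum by warm-starting each solve at the previous solution."""
    path = []
    x = np.asarray(x0, dtype=float)
    for lam in lambdas:
        res = minimize(lambda y: objective_at(y, lam), x,
                       method="L-BFGS-B", options={"maxiter": 200})
        x = res.x                  # warm start for the next λ
        path.append((lam, res.fun))
    return path

# Toy stand-in: data term + λ-weighted penalty (hypothetical).
obj = lambda x, lam: float(np.sum((x - 1.0) ** 2) + lam * np.sum(x ** 2))
path = continuation(obj, np.linspace(0.0, 2.0, 5), np.zeros(3))
```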

Pullback Metric Analysis

Comparing the Riemannian geometry (pullback metric trace) of learned vs. ground-truth diffeomorphisms.

Metric Trace at Origin

Manifold   λ     Tr(G_θ(0))   Tr(G*(0))   Ratio
Circle     0.0   0.450        1.000       0.450
Circle     0.1   0.405        1.000       0.405
Circle     1.0   0.261        1.000       0.261
Sphere     0.0   1.010        8.000       0.126
Sphere     0.1   0.825        8.000       0.103
Sphere     1.0   0.515        8.000       0.064
Helix      0.0   0.416        1.025       0.406
Helix      0.1   0.388        1.025       0.378
Helix      1.0   0.263        1.025       0.256
Systematic underestimation: The learned metric consistently underestimates the true geometry (all ratios < 0.45). On the sphere, the ratio drops as low as 0.064, showing a 15x underestimation. This is a direct consequence of the Jacobian penalty.
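The pullback relation G(z) = J_f(z)^T J_f(z) can be checked analytically for the sphere's chart; a small sketch (author's illustration, not the paper's code) recovering the ground-truth value Tr(G*(0)) = 8 for the inverse stereographic projection:

```python
import numpy as np

def stereo_inv_jacobian(z):
    """Analytic Jacobian (shape (3, 2)) of f(u, v) = (2u, 2v, s-1)/(s+1), s = u^2+v^2."""
    u, v = z
    s = u * u + v * v
    return np.array([
        [2 * (s + 1) - 4 * u * u, -4 * u * v],
        [-4 * u * v,              2 * (s + 1) - 4 * v * v],
        [4 * u,                   4 * v],
    ]) / (s + 1) ** 2

J0 = stereo_inv_jacobian(np.zeros(2))
G0 = J0.T @ J0          # pullback metric at the origin: 4 * I
print(np.trace(G0))     # 8.0
```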

Key Findings

Finding 1: All converged points are genuine local minima.
Zero negative curvature directions were detected across 600 random probes. Gradient-based optimization reliably reaches local minima, not saddle points. However, multiple distinct minima exist (objective spread up to 2.08).
Finding 2: Fundamental feasibility trade-off.
Increasing λ improves the geometric constraint (F3) but degrades data matching (F1). The aggregate feasibility is non-monotonic, with a manifold-dependent sweet spot. Best scores: Circle 0.553 (λ=1.0), Sphere 0.111 (λ=0.1), Helix 0.514 (λ=0.0).
Finding 3: Feasibility assumptions are generically not satisfied.
No tested configuration reaches a feasibility score above 0.6, far from the near-perfect regime. The recoverability theorem's assumptions appear to be generically violated at the local minima found by standard gradient-based optimization.
Finding 4: Systematic geometric underestimation.
The pullback metric at learned solutions underestimates the true geometry by factors of roughly 2–16. The Jacobian penalty directly suppresses Tr(G_θ(0)), pushing the learned map away from the true diffeomorphism.
Finding 5: Path dependence in the landscape.
Parameter continuation reveals smooth deformation without bifurcation, but the continued minimum has worse feasibility than what fresh multi-start optimization achieves. The basin reached at λ=0 is not the most feasible at larger λ.

Implications

These findings suggest that closing the gap between the theoretical recoverability guarantee and practical optimization will require:

Architectural innovations that enforce feasibility by construction, e.g., parameterizations that guarantee the learned distribution matches the data while satisfying the geometric constraint.
Optimization strategies designed to navigate toward feasible basins, such as curriculum strategies on λ or initialization schemes informed by the manifold structure.
Relaxed theoretical results that provide approximate recoverability guarantees under approximate feasibility, bridging the gap to what gradient-based optimization achieves in practice.

Reference

Based on the open problem from: Diepeveen et al., "Riemannian AmbientFlow: Towards Simultaneous Manifold Learning and Generative Modeling from Corrupted Data," arXiv:2601.18728, January 2026.