Multi-Scale Trajectory Forensics for Verifying Source Authenticity of Robotic Demonstrations

Determining whether a robotic manipulation demonstration was generated by an autonomous policy or by hidden human teleoperation

Robotics (cs.RO) Trajectory Forensics

Problem Statement

When a robotic manipulation trajectory appears successful, existing benchmarks provide no mechanism to verify whether the behavior was generated by an autonomous policy or via hidden human teleoperation. This source authenticity ambiguity undermines trustworthy evaluation and enables result fabrication.

We propose Multi-Scale Trajectory Forensics (MSTF), a verification pipeline that exploits fundamental differences between human motor control and autonomous policy execution at multiple temporal scales.

Pipeline Architecture

Trajectory Input (positions, timestamps) → Spectral Forensics → Submovement Decomposition → Watermark Verification → Score Fusion & Classification → Verdict (Autonomous / Teleoperated / Inconclusive)

Key Results

  • Classification Accuracy: 86%
  • Composite AUC: 1.000
  • Precision (Autonomous): 100%
  • Watermark False Positives: 0%

Core Insight

Human neuromuscular control leaves multi-scale statistical fingerprints that are jointly difficult to forge:

  • Spectral: Bandwidth limited to ~8 Hz with physiological tremor at 8-12 Hz
  • Temporal: Velocity profiles decompose into minimum-jerk submovements
  • Smoothness: Low dimensionless jerk from motor planning optimization

Autonomous policies (diffusion, transformer) lack these biomechanical signatures and exhibit distinct spectral/temporal patterns.

Module 1: Spectral Forensics

Exploits the bandwidth limits of human neuromuscular control. We compute the power spectral density of velocity signals and extract diagnostic band ratios:

Band | Range | Interpretation
Submovement | 0.5 - 4.0 Hz | Correction frequency for biological movements
Voluntary | 4.0 - 8.0 Hz | Upper limit of voluntary motor control
Tremor | 8.0 - 12.0 Hz | Physiological tremor (diagnostic for human origin)
High-frequency | 12.0 - 50.0 Hz | Above human voluntary bandwidth (policy artifacts)

Key feature: The presence of 8-12 Hz tremor is a strong indicator of human teleoperation; its absence suggests autonomous execution.
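
The band ratios described above can be illustrated with a minimal sketch. It assumes a uniformly sampled 1-D velocity signal, uses a direct DFT periodogram from the standard library rather than whatever PSD estimator the pipeline actually uses, and normalizes each band by the total power in 0.5-50 Hz; that normalization and the names `periodogram` and `band_ratios` are assumptions, not details given in the source.

```python
import math

# Band edges in Hz, copied from the table above.
BANDS = {
    "submovement": (0.5, 4.0),
    "voluntary": (4.0, 8.0),
    "tremor": (8.0, 12.0),
    "high_frequency": (12.0, 50.0),
}


def periodogram(signal, fs):
    """Periodogram over non-negative frequencies via a direct DFT (O(n^2), fine for short clips)."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC offset
    freqs, power = [], []
    for k in range(n // 2 + 1):
        w = 2.0 * math.pi * k / n
        re = sum(xi * math.cos(w * i) for i, xi in enumerate(x))
        im = sum(xi * math.sin(w * i) for i, xi in enumerate(x))
        freqs.append(k * fs / n)
        power.append((re * re + im * im) / (fs * n))
    return freqs, power


def band_ratios(velocity, fs):
    """Fraction of 0.5-50 Hz velocity power in each band (assumed normalization)."""
    freqs, power = periodogram(velocity, fs)
    sums = {
        name: sum(p for f, p in zip(freqs, power) if lo <= f < hi)
        for name, (lo, hi) in BANDS.items()
    }
    total = sum(sums.values())
    return {name: s / total for name, s in sums.items()}


# Synthetic check: a 2 Hz movement component plus a small 10 Hz tremor-like component.
fs = 100.0
t = [i / fs for i in range(200)]
velocity = [
    math.sin(2 * math.pi * 2 * ti) + 0.05 * math.sin(2 * math.pi * 10 * ti)
    for ti in t
]
ratios = band_ratios(velocity, fs)
print({name: round(r, 4) for name, r in ratios.items()})
```

In this synthetic signal nearly all power sits in the submovement band, with a small but non-zero share in the tremor band; a signal with no 8-12 Hz component would show a tremor ratio near zero.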

Module 2: Submovement Decomposition

Human reaching movements decompose into overlapping minimum-jerk submovements (Flash & Hogan, 1985):

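v(t) = A · 30 · τ² · (1 - τ)²,   τ = (t - t₀) / D,   0 ≤ τ ≤ 1

Here A scales the amplitude, t₀ is the onset time, and D is the duration of the submovement; v(t) = 0 outside this interval.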

We fit up to 8 submovements using a greedy iterative algorithm and evaluate:

  • R²: Reconstruction quality (dominant feature, weight 50%)
  • Physiological fraction: Submovements with durations in [0.15, 1.0] s
  • Interval regularity: Inter-onset intervals exceeding 80 ms
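
The decomposition above can be sketched as follows. The source does not specify its greedy algorithm, so this is one hypothetical variant: each pass centres a candidate submovement on the largest residual peak, searches an assumed grid of durations, and solves for the amplitude by least squares. The function names, the duration grid, and the early-stopping threshold `stop_r2` are all illustrative assumptions.

```python
def min_jerk_basis(t, t0, dur):
    """Unit-amplitude minimum-jerk velocity 30 * tau^2 * (1 - tau)^2, zero outside the submovement."""
    tau = (t - t0) / dur
    if tau <= 0.0 or tau >= 1.0:
        return 0.0
    return 30.0 * tau ** 2 * (1.0 - tau) ** 2


def greedy_submovement_fit(times, velocity, max_sub=8, stop_r2=0.999):
    """Hypothetical greedy fit: add one submovement per pass at the largest residual peak."""
    durations = [0.05 * k for k in range(1, 31)]  # assumed grid of D values, 0.05 .. 1.5 s
    mean_v = sum(velocity) / len(velocity)
    ss_tot = sum((v - mean_v) ** 2 for v in velocity)
    residual = list(velocity)
    subs = []
    r2 = 1.0 - sum(r * r for r in residual) / ss_tot
    while len(subs) < max_sub and r2 < stop_r2:
        peak = max(range(len(residual)), key=lambda i: abs(residual[i]))
        best = None
        for dur in durations:
            t0 = times[peak] - dur / 2.0  # centre the candidate on the peak
            basis = [min_jerk_basis(t, t0, dur) for t in times]
            bb = sum(b * b for b in basis)
            if bb == 0.0:
                continue
            amp = sum(r * b for r, b in zip(residual, basis)) / bb  # least-squares amplitude
            err = sum((r - amp * b) ** 2 for r, b in zip(residual, basis))
            if best is None or err < best[0]:
                best = (err, amp, t0, dur, basis)
        if best is None:
            break
        err, amp, t0, dur, basis = best
        residual = [r - amp * b for r, b in zip(residual, basis)]
        subs.append((amp, t0, dur))
        r2 = 1.0 - err / ss_tot
    return subs, r2


def physiological_fraction(subs, lo=0.15, hi=1.0):
    """Share of fitted submovements whose duration lies in [lo, hi] seconds."""
    return sum(1 for _, _, dur in subs if lo <= dur <= hi) / len(subs)


# Synthetic check: one submovement with A = 0.5, onset 0.2 s, duration 0.6 s, sampled at 100 Hz.
times = [i / 100 for i in range(101)]
velocity = [0.5 * min_jerk_basis(t, 0.2, 0.6) for t in times]
subs, r2 = greedy_submovement_fit(times, velocity)
frac = physiological_fraction(subs)
```

On this noise-free input the first pass already recovers the generating submovement, so the loop stops early. Real trajectories with overlapping submovements would need more passes and would generally benefit from a joint refinement step, which this sketch omits.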

Module 3: Cryptographic Watermarking

Adapts text watermarking (Kirchenbauer et al., 2023) to continuous action spaces. During inference, the policy biases action sampling so that:

SHA-256(quantized_action | nonce | step) mod M < K

Verification uses a one-sided binomial test against the null rate K/M. A watermark is detected when the observed rate of steps satisfying the hash condition significantly exceeds the null expectation (p < 0.01).
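
A minimal sketch of the hash condition and the binomial test follows. The values M = 1000 and K = 500, the quantization resolution, the string serialization of the hash input, and the rejection-sampling `embed` function are all assumptions made for illustration; the source does not describe how the policy biases its sampling.

```python
import hashlib
import math
import random


def is_green(action, nonce, step, M=1000, K=500, resolution=0.01):
    """True when SHA-256(quantized_action | nonce | step) mod M < K."""
    quantized = ",".join(str(round(a / resolution)) for a in action)
    digest = hashlib.sha256(f"{quantized}|{nonce}|{step}".encode()).digest()
    return int.from_bytes(digest, "big") % M < K


def embed(action, nonce, step, rng, tries=64, noise=0.02):
    """Hypothetical embedding: resample small perturbations until the hash condition holds."""
    for _ in range(tries):
        candidate = [a + rng.gauss(0.0, noise) for a in action]
        if is_green(candidate, nonce, step):
            return candidate
    return list(action)  # give up; this step stays unwatermarked


def binomial_tail(k, n, p):
    """One-sided p-value P[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def verify(actions, nonce, M=1000, K=500, alpha=0.01):
    """Count steps passing the hash condition and test against the null rate K / M."""
    hits = sum(is_green(a, nonce, step, M, K) for step, a in enumerate(actions))
    p_value = binomial_tail(hits, len(actions), K / M)
    return p_value < alpha, p_value


rng = random.Random(0)
policy_actions = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(50)]
marked = [embed(a, "secret-nonce", step, rng) for step, a in enumerate(policy_actions)]
detected, p_value = verify(marked, "secret-nonce")
```

Verifying the same actions with a different nonce should yield a hit rate near K/M, so the test would normally not reject. The perturbation size trades detection strength against action distortion.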

Score Fusion

Composite scores combine module outputs with reliability-weighted fusion:

S_auto = 0.35 · s_spec + 0.40 · s_sub + 0.25 · s_wm

Classification uses a margin-based decision rule with consensus relaxation: when two or more modules agree, the decision threshold is lowered for higher sensitivity.
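
The fusion and decision rule can be sketched as below. Only the three weights come from the source; the 0.5 midpoint, the strict and relaxed margins, and the per-module vote thresholds are hypothetical values chosen to show how consensus relaxation could work.

```python
WEIGHTS = {"spectral": 0.35, "submovement": 0.40, "watermark": 0.25}  # from the fusion formula

# Everything below is an illustrative assumption, not a value from the source.
STRICT_MARGIN = 0.15   # margin around 0.5 when fewer than two modules agree
RELAXED_MARGIN = 0.05  # margin when two or more modules agree
VOTE_AUTO = 0.6        # a module "votes" autonomous at or above this score
VOTE_TELEOP = 0.4      # a module "votes" teleoperated at or below this score


def fuse(scores):
    """Weighted sum of per-module autonomous scores in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def classify(scores):
    """Margin-based verdict; the margin shrinks on the side where 2+ modules agree."""
    s_auto = fuse(scores)
    auto_votes = sum(1 for v in scores.values() if v >= VOTE_AUTO)
    teleop_votes = sum(1 for v in scores.values() if v <= VOTE_TELEOP)
    margin_auto = RELAXED_MARGIN if auto_votes >= 2 else STRICT_MARGIN
    margin_teleop = RELAXED_MARGIN if teleop_votes >= 2 else STRICT_MARGIN
    if s_auto >= 0.5 + margin_auto:
        return "autonomous", s_auto
    if s_auto <= 0.5 - margin_teleop:
        return "teleoperated", s_auto
    return "inconclusive", s_auto


# Two modules agree on "autonomous", so the relaxed margin applies.
verdict_a, score_a = classify({"spectral": 0.9, "submovement": 0.8, "watermark": 0.0})
# Only one module votes, so the strict margin leaves this case undecided.
verdict_b, score_b = classify({"spectral": 0.9, "submovement": 0.5, "watermark": 0.5})
# All modules lean towards human control.
verdict_c, score_c = classify({"spectral": 0.1, "submovement": 0.2, "watermark": 0.0})
```

Under these assumed thresholds the first case has a composite score of about 0.635, which clears the relaxed margin but would not clear the strict one; this is the extra sensitivity that consensus relaxation buys.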

Classification Performance

Class | Precision | Recall | F1 Score | Inconclusive
Autonomous | 1.000 | 0.720 | 0.837 | 2
Teleoperated | 0.806 | 1.000 | 0.893 | 0
Overall | Accuracy = 86.0%

Confusion Matrix

Actual vs. Predicted | Pred: Auto | Pred: Teleop | Pred: Inc.
Actual: Auto | 36 | 12 | 2
Actual: Teleop | 0 | 50 | 0

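Conservative bias: in this evaluation the pipeline never labeled a human-teleoperated trajectory as autonomous (0 of 50), while 12 of 50 autonomous trajectories were labeled teleoperated.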
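
The reported precision, recall, F1, and accuracy follow directly from the confusion matrix counts. The short check below recomputes them, treating inconclusive verdicts as misses for recall; that treatment is an assumption, though it reproduces the reported figures.

```python
# Rows are actual classes, columns are predicted labels, as in the confusion matrix.
confusion = {
    "autonomous": {"autonomous": 36, "teleoperated": 12, "inconclusive": 2},
    "teleoperated": {"autonomous": 0, "teleoperated": 50, "inconclusive": 0},
}


def metrics(confusion, cls):
    """Precision, recall, and F1 for one class from a nested confusion dictionary."""
    tp = confusion[cls][cls]
    predicted = sum(row[cls] for row in confusion.values())  # column total
    actual = sum(confusion[cls].values())                    # row total
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(sum(row.values()) for row in confusion.values())
accuracy = sum(confusion[c][c] for c in confusion) / total
auto_metrics = tuple(round(m, 3) for m in metrics(confusion, "autonomous"))
teleop_metrics = tuple(round(m, 3) for m in metrics(confusion, "teleoperated"))
print(auto_metrics, teleop_metrics, accuracy)
```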

ROC Curves (AUC by Module)

  • Composite Score: 1.000
  • Spectral Forensics: 0.994
  • Submovement Decomp.: 0.985

Module Ablation Study

Configuration | Accuracy
Submovement Only | 86.0%
Spectral + Sub. | 85.0%
Full Pipeline | 85.0%
Spectral Only | 78.0%
Watermark Only | 51.0%

Watermark Verification

Condition | Detection Rate | Distortion
Correct key, watermarked | 50.0% | 0.114
Wrong key, watermarked | 0.0% | -
Correct key, unwatermarked | 0.0% | -
Correct key, human traj. | 0.0% | -

Zero false positives across all negative conditions. The 50% true positive rate indicates room for improvement in embedding strength.

Duration Sensitivity

Classification accuracy improves monotonically from 75% at 1.0s to 100% at 10.0s. Longer trajectories provide more statistical evidence for both spectral and submovement analysis.
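
One plausible contributor to this trend, offered here as an illustration rather than as the source's own analysis, is frequency resolution: a record of length T seconds has DFT bins spaced 1/T Hz apart, so short clips leave only a handful of bins inside the 8-12 Hz tremor band. The helper name `bins_in_band` is hypothetical.

```python
def bins_in_band(duration_s, lo_hz, hi_hz):
    """Count DFT bins k / T that fall in [lo_hz, hi_hz) for a record of T seconds.

    Assumes the sampling rate is high enough that hi_hz lies below Nyquist.
    """
    count = 0
    k = 0
    while k / duration_s < hi_hz:
        if k / duration_s >= lo_hz:
            count += 1
        k += 1
    return count


short_clip = bins_in_band(1.0, 8.0, 12.0)   # bins at 8, 9, 10, 11 Hz
long_clip = bins_in_band(10.0, 8.0, 12.0)   # bins every 0.1 Hz from 8.0 to 11.9 Hz
print(short_clip, long_clip)
```

A 1.0 s trajectory therefore estimates tremor-band power from 4 bins, while a 10.0 s trajectory has 40, which gives a much less noisy band-ratio estimate.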

Interactive Trajectory Verifier

Adjust the parameters below to simulate a trajectory and see how the verification pipeline classifies it. These controls model the key signal characteristics that distinguish human from autonomous control.

Trajectory Parameters

Human: 0.001-0.005 | Policy: ~0
Human: <0.05 | Policy: 0.1-0.3
Human: >0.7 | Policy: <0.5
Human: 5-15 (smooth) | Policy: >20 (jerky)

Verification Result

Verdict: TELEOPERATED
Spectral Score (Autonomous): 0.20
Submovement Score (Autonomous): 0.15
Watermark Score: 0.00
Composite Score (Autonomous): 0.14
Bandwidth concentration and tremor presence suggest human teleoperation. Submovement decomposition is consistent with biological motor control.

Primary Reference

[Source Paper] Liu et al. (2026). Trustworthy Evaluation of Robotic Manipulation: A New Benchmark and AutoEval Methods. arXiv:2601.18723.

Key References

[Motor Control] Flash, T. & Hogan, N. (1985). The coordination of arm movements: an experimentally confirmed mathematical model. Journal of Neuroscience, 5(7), 1688-1703.
[Motor Control] Hogan, N. & Sternad, D. (2009). Sensitivity of smoothness measures to movement duration, amplitude, and arrests. Journal of Motor Behavior, 41(6), 529-534.
[Motor Control] Balasubramanian, S. et al. (2012). On the analysis of movement smoothness. J. NeuroEngineering and Rehabilitation, 9(1), 1-12.
[Watermarking] Kirchenbauer, J. et al. (2023). A watermark for large language models. ICML, 17061-17084.
[Robot Learning] Chi, C. et al. (2024). Diffusion Policy: Visuomotor Policy Learning via Action Diffusion. IJRR, 43(2), 159-178.
[Robot Learning] Brohan, A. et al. (2023). RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. CoRL.
[Robot Learning] Mandlekar, A. et al. (2021). What Matters in Learning from Offline Human Demonstrations for Robot Manipulation. CoRL, 1678-1690.
[Benchmarks] James, S. et al. (2020). RLBench: The Robot Learning Benchmark & Learning Environment. IEEE RA-L, 5(2), 3019-3026.