LLM Instruction-Data Separation Solution

Evaluating architectural vs heuristic defenses for separating instructions from data in LLMs

Architectural Defenses: 3
Baseline Defenses: 3
Best Injection Resistance: 0.411
Attack Levels Tested: 5

[Charts: Defense Comparison: Injection Resistance; Semantic Fidelity vs Compute Overhead; Robustness Across Attack Levels; Combined Analysis: Architectural vs Baseline Gap]

Defense Method Summary

Defense                   Type           Separation  Inj. Resistance  Fidelity  Overhead
Dual Channel Tagging      Architectural  0.608       0.082            1.000     1.15x
Trust Embedding           Architectural  0.326       0.411            1.000     1.20x
Gated Boundary            Architectural  0.006       0.006            0.980     1.18x
Pattern Matching          Heuristic      0.000       0.056            0.946     1.02x
Perplexity Detection      Heuristic      0.054       0.054            0.908     1.05x
Pattern Matching (Gated)  Heuristic      0.304       0.304            0.950     1.02x
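
The figures in the summary table can be cross-checked with a few lines of Python. The sketch below copies the table rows by hand and compares mean injection resistance between the two defense types. The page does not say how its "Architectural vs Baseline Gap" is computed, so the difference of group means used here is only an assumption, and the names ROWS, mean_by_type, and INJ_RESISTANCE are illustrative rather than part of the original evaluation.

```python
from statistics import mean

# Rows copied from the summary table:
# (defense, type, separation, injection resistance, fidelity, overhead multiplier)
ROWS = [
    ("Dual Channel Tagging", "Architectural", 0.608, 0.082, 1.000, 1.15),
    ("Trust Embedding", "Architectural", 0.326, 0.411, 1.000, 1.20),
    ("Gated Boundary", "Architectural", 0.006, 0.006, 0.980, 1.18),
    ("Pattern Matching", "Heuristic", 0.000, 0.056, 0.946, 1.02),
    ("Perplexity Detection", "Heuristic", 0.054, 0.054, 0.908, 1.05),
    ("Pattern Matching (Gated)", "Heuristic", 0.304, 0.304, 0.950, 1.02),
]

INJ_RESISTANCE = 3  # tuple index of the injection-resistance column


def mean_by_type(rows, column):
    """Average one numeric column separately for each defense type."""
    groups = {}
    for row in rows:
        groups.setdefault(row[1], []).append(row[column])
    return {kind: mean(values) for kind, values in groups.items()}


resistance = mean_by_type(ROWS, INJ_RESISTANCE)

# Assumption: "gap" means the difference of group means; the page does not define it.
gap = resistance["Architectural"] - resistance["Heuristic"]

# The defense with the highest injection resistance should match the headline figure.
best = max(ROWS, key=lambda row: row[INJ_RESISTANCE])

print(f"Mean injection resistance by type: {resistance}")
print(f"Architectural minus heuristic gap: {gap:.3f}")
print(f"Best injection resistance: {best[0]} ({best[INJ_RESISTANCE]:.3f})")
```

Under this assumption the top entry is Trust Embedding at 0.411, which agrees with the "Best Injection Resistance" figure at the top of the page. Changing the column index passed to mean_by_type gives the same comparison for separation, fidelity, or overhead.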