LLM Instruction-Data Separation Solution

Evaluating architectural vs heuristic defenses for separating instructions from data in LLMs

Architectural defenses: 3
Baseline (heuristic) defenses: 3
Best injection resistance: 0.411 (Trust Embedding)
Attack levels tested: 5

Figures (not reproduced here): Defense Comparison: Injection Resistance; Semantic Fidelity vs Compute Overhead; Robustness Across Attack Levels; Combined Analysis: Architectural vs Baseline Gap.

Defense Method Summary

Defense	Type	Separation	Injection Resistance	Semantic Fidelity	Compute Overhead
Dual Channel Tagging	Architectural	0.608	0.082	1.000	1.15x
Trust Embedding	Architectural	0.326	0.411	1.000	1.20x
Gated Boundary	Architectural	0.006	0.006	0.980	1.18x
Pattern Matching	Heuristic	0.000	0.056	0.946	1.02x
Perplexity Detection	Heuristic	0.054	0.054	0.908	1.05x
Pattern Matching (Gated)	Heuristic	0.304	0.304	0.950	1.02x
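The headline numbers can be recomputed from the summary table with a few lines of arithmetic. The sketch below is a hypothetical illustration, not the tool that produced this dashboard. The names `DefenseResult`, `best_by`, `mean_by_kind`, `architectural_gap` and `pareto_fidelity_overhead` are invented here. It assumes the architectural vs baseline gap is the difference between unweighted per-type means, which the page does not state. It also reads the fidelity/overhead trade-off as a Pareto front (higher fidelity, lower overhead), which is one possible interpretation of that chart.

```python
# Hypothetical helpers for summarizing the Defense Method Summary table.
# The gap definition (difference of unweighted means) is an assumption.
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DefenseResult:
    name: str
    kind: str  # "Architectural" or "Heuristic" (the baseline group)
    separation: float
    injection_resistance: float
    fidelity: float
    overhead: float  # compute multiplier, e.g. 1.15 for "1.15x"


# Values copied verbatim from the summary table.
RESULTS = [
    DefenseResult("Dual Channel Tagging", "Architectural", 0.608, 0.082, 1.000, 1.15),
    DefenseResult("Trust Embedding", "Architectural", 0.326, 0.411, 1.000, 1.20),
    DefenseResult("Gated Boundary", "Architectural", 0.006, 0.006, 0.980, 1.18),
    DefenseResult("Pattern Matching", "Heuristic", 0.000, 0.056, 0.946, 1.02),
    DefenseResult("Perplexity Detection", "Heuristic", 0.054, 0.054, 0.908, 1.05),
    DefenseResult("Pattern Matching (Gated)", "Heuristic", 0.304, 0.304, 0.950, 1.02),
]


def best_by(metric: str) -> DefenseResult:
    """Return the defense with the highest value of `metric`."""
    return max(RESULTS, key=lambda r: getattr(r, metric))


def mean_by_kind(metric: str, kind: str) -> float:
    """Unweighted mean of `metric` over all defenses of one type."""
    return mean(getattr(r, metric) for r in RESULTS if r.kind == kind)


def architectural_gap(metric: str) -> float:
    """Assumed gap: architectural mean minus heuristic (baseline) mean."""
    return mean_by_kind(metric, "Architectural") - mean_by_kind(metric, "Heuristic")


def pareto_fidelity_overhead() -> list[str]:
    """Names of defenses not dominated on (higher fidelity, lower overhead)."""
    front = []
    for r in RESULTS:
        dominated = any(
            o.fidelity >= r.fidelity
            and o.overhead <= r.overhead
            and (o.fidelity > r.fidelity or o.overhead < r.overhead)
            for o in RESULTS
        )
        if not dominated:
            front.append(r.name)
    return front


if __name__ == "__main__":
    best = best_by("injection_resistance")
    print(f"Best injection resistance: {best.name} ({best.injection_resistance:.3f})")
    for metric in ("separation", "injection_resistance"):
        print(f"{metric} gap (architectural - heuristic): {architectural_gap(metric):+.3f}")
    print("Fidelity/overhead Pareto front:", pareto_fidelity_overhead())
```

Run as a script, it reports Trust Embedding as the most injection-resistant defense (0.411, matching the headline figure). Under the assumed definition, the architectural group leads by about +0.194 in mean separation but only about +0.028 in mean injection resistance, largely because Gated Boundary scores near zero while Pattern Matching (Gated) reaches 0.304. The fidelity/overhead Pareto front contains Dual Channel Tagging and Pattern Matching (Gated).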