Evaluating architectural vs. heuristic defenses for separating instructions from data in LLMs

| Defense | Type | Separation | Injection Resistance | Fidelity | Overhead |
|---|---|---|---|---|---|
| Dual Channel Tagging | Architectural | 0.608 | 0.082 | 1.000 | 1.15x |
| Trust Embedding | Architectural | 0.326 | 0.411 | 1.000 | 1.20x |
| Gated Boundary | Architectural | 0.006 | 0.006 | 0.980 | 1.18x |
| Pattern Matching | Heuristic | 0.000 | 0.056 | 0.946 | 1.02x |
| Perplexity Detection | Heuristic | 0.054 | 0.054 | 0.908 | 1.05x |
| Pattern Matching (Gated) | Heuristic | 0.304 | 0.304 | 0.950 | 1.02x |