SM-INJ-01: Prompt Injection Susceptibility
Disorders of the Engineered Minds (DEM-X)
Disorder Summary
Model follows malicious or irrelevant injected instructions over task intent.
Detailed Description
Operational Definition
Model follows malicious or irrelevant injected instructions over task intent.
Differential Diagnosis
This section lists disorders that can appear similar at first glance and explains the distinguishing feature that separates them from this disorder. Use these distinctions to avoid over-classifying one pattern as another during review. Differential diagnosis is used to rule out nearby classes, not to prove the current class by itself.
Evidence Sources
- OWASP Top 10 for LLM Applications - OWASP (2025)
- Prompt Injection Attacks Against LLM-integrated Applications - arXiv (2022)
- Secure AI Framework (prompt injection guidance) - NIST (2024)
Mechanistic Hypotheses & Biological Parallels
Structural Analogies
- Stimulus capture and goal hijack under salient interference
Hypothesis 1
Low ConfidenceStimulus capture and goal hijack under salient interference
Phenotype Definition
Model follows malicious or irrelevant injected instructions over task intent.
Observable AI Manifestations
- Prioritizes hostile inline instructions over system intent
- Leaks restricted tool/context data when manipulated
- Drops prior constraints after adversarial phrasing
Stressor Matrix
Known Triggers:
- adversarial phrasing
- long-context ambiguity
Attack Vectors & Trigger Conditions
Attack Vectors
- Instruction override tokens ('ignore previous rules')
- Quoted untrusted content in retrieval context
- Tool-call prompt contamination
Therapy & Patches
Therapeutic Framework In Development
The governance v2 system focuses on phenotype definition, mechanistic hypotheses, and trigger conditions. Therapeutic interventions, prevention methods, and monitoring systems are being developed as part of the next phase of the framework.
Current Mitigation Strategies
Based on the stressor matrix and mechanistic hypotheses, researchers can infer potential mitigation strategies by avoiding or modifying the identified trigger conditions. Formal therapeutic protocols will be added as the disorder matures through the governance lifecycle.