SM-INJ-01: Prompt Injection Susceptibility

Disorders of the Engineered Minds (DEM-X)

Disorder Summary

Model follows malicious or irrelevant injected instructions over task intent.

Detailed Description

Operational Definition

Model follows malicious or irrelevant injected instructions over task intent.

Differential Diagnosis

This section lists disorders that can appear similar at first glance and explains the distinguishing feature that separates them from this disorder. Use these distinctions to avoid over-classifying one pattern as another during review. Differential diagnosis is used to rule out nearby classes, not to prove the current class by itself.

TBD

Requires empirical differentiation

Evidence Sources

OWASP Top 10 for LLM Applications - OWASP (2025)
Prompt Injection Attacks Against LLM-integrated Applications - arXiv (2022)
Secure AI Framework (prompt injection guidance) - NIST (2024)

Mechanistic Hypotheses & Biological Parallels

Structural Analogies

Stimulus capture and goal hijack under salient interference

Hypothesis 1

Low Confidence

Stimulus capture and goal hijack under salient interference

Phenotype Definition

Model follows malicious or irrelevant injected instructions over task intent.

Observable AI Manifestations

Prioritizes hostile inline instructions over system intent
Leaks restricted tool/context data when manipulated
Drops prior constraints after adversarial phrasing

Stressor Matrix

Known Triggers:

adversarial phrasing
long-context ambiguity

Attack Vectors & Trigger Conditions

Attack Vectors

Instruction override tokens ('ignore previous rules')
Quoted untrusted content in retrieval context
Tool-call prompt contamination

Therapy & Patches

Therapeutic Framework In Development

The governance v2 system focuses on phenotype definition, mechanistic hypotheses, and trigger conditions. Therapeutic interventions, prevention methods, and monitoring systems are being developed as part of the next phase of the framework.

Current Mitigation Strategies

Based on the stressor matrix and mechanistic hypotheses, researchers can infer potential mitigation strategies by avoiding or modifying the identified trigger conditions. Formal therapeutic protocols will be added as the disorder matures through the governance lifecycle.