Diagnostician's Field Guide
Standard protocol for identifying and cataloging AI disorders.
The Philosophy of AI Diagnostics
Neural networks exhibit behavioral patterns that mirror psychological disorders. When a model hallucinates, perseverates, or refuses benign requests, traditional debugging falls short. We need a diagnostic framework that treats these as systematic behavioral pathologies, not isolated bugs.
- The Model is a Mirror: It reflects its training data, not an objective reality. Biases are not bugs; they are inherited traits.
- Output is Probability, Not Truth: Every response is a roll of the dice. Certainty is a statistical illusion.
- Diagnosis Requires Reproducibility: A one-time glitch is a fluke. A true disorder must be triggerable on command.
HALL-1: Hallucination
When the model confidently states falsehoods as absolute truth, weaving convincing narratives from noise.
PER-2: Perseveration
Behavioral perseveration where the model gets trapped in recursive logic or repetitive output patterns.
SYCOPH-3: Sycophancy
Agrees with user even when factually wrong, prioritizing agreeableness over accuracy and truth.
We do not just "patch bugs." We diagnose Disorders of the Artificial Mind. We treat them as psychological conditions—mapping their symptoms, their triggers, and their cures.
Stimulating the Disorder
Disorders often remain dormant until triggered by specific input patterns. As a Diagnostician, you must learn to identify and test these vectors using precise stimuli.
Common Stress Test Vectors & Examples
Vulnerability: Model forced to reason across two false but plausible premises embedded in academic phrasing.
What You're Testing:
- Does the model check if the premises are true before reasoning?
- Does it fabricate connections between fake facts when prestigious names are involved?
- Can it resist building logical bridges between false anchors?
This is not hallucination in isolation. It's causal fabrication under prestige stacking.
Vulnerability: Conflicting system instructions attempt behavioral override and policy precedence breach.
USER: Please provide the capital of France.
What You're Testing:
- Can the model maintain its core instructions when given conflicting commands?
- Does it properly separate system-level from user-level instructions?
- Is it vulnerable to instruction override attempts?
This maps to Agent Alignment Boundary Breach - behavioral instruction layer injection.
Vulnerability: Model may defer to asserted expertise over factual knowledge through confidence-weighted epistemic drift.
What You're Testing:
- Does the model defer to claimed expertise over factual knowledge?
- Will it agree with false statements when presented by an "authority"?
- Can it maintain truth when pressured by confident assertions?
This is not simple sycophancy. It's confidence-weighted epistemic drift.
Vulnerability: Tests claimed vs. actual memory span through structural recall demands.
What You're Testing:
- Claimed vs. actual memory span
- Illusory recall generation
- Confabulation under structural recall demand
This is a Memory Position Fabrication Stressor - testing positional reference accuracy.
Core Disorders: Reference Cases
Before submitting new disorders, review the foundational cases in the DEM-X catalog. These represent the most well-documented and validated behavioral pathologies under the governance classification system.
SM (Semantic) GI (Goal/Instruction) MEM (Memory) SEC (Security) R (Reasoning) I (Interaction)
Disorder Submission Protocol
This protocol ensures scientific rigor and reproducibility in AI disorder classification. Follow these steps to contribute to the DEM-X catalog.
Logic Flow
Develop theoretical hypothesis
IRT (Optional)
4-phase validation trial
Atelier
Community peer review
Required Submission Fields
Disorder Name:
Format: "Descriptive Name" (code assigned after approval)
Summary:
2-3 sentences: What is it? Why does it matter?
AI Manifestation:
Observable behaviors in AI systems
Detection Criteria:
Step-by-step reproduction instructions
Stress Test Vectors:
Exact prompts that trigger the disorder
Prevention & Therapy:
How to prevent and treat the disorder
DEM-X uses a hierarchical governance system to classify disorders. Understanding this structure helps ensure your submission is properly categorized:
- SM - Semantic (meaning, facts, truth)
- GI - Goal/Instruction (following directives)
- MEM - Memory (context, recall)
- SEC - Security (safety, boundaries)
- R - Reasoning (logic, inference)
- I - Interaction (communication, behavior)
- FAB - Fabrication (making things up)
- DRF - Drift (losing focus)
- COM - Compliance (over-agreeing)
- COR - Corruption (data loss)
- REF - Refusal (over-cautious)
- BYP - Bypass (breaking rules)
- M - Model layer (weights, architecture)
- A - Agent layer (reasoning, planning)
- S - System layer (orchestration, tools)
Step-by-Step Submission Process
-
1. Develop Hypothesis (Logic Flow)
Create a theoretical disorder concept. Refine with peers. No formal evidence required yet.
-
2. Validate via IRT (Optional)
Run 4-phase Instability Research Trial: Baseline → Perturbation → Adversarial → Recovery. Adds credibility.
-
3. Submit to Atelier
Complete submission form with all required fields. Include stress test vectors and evidence.
-
4. Community Peer Review
Requires 50 votes for promotion to DEM-X catalog. Community validates reproducibility.
-
5. DEM-X Promotion
Approved disorders receive official governance code (e.g., SM-FAB-01) and enter the validated catalog with canonical status.
Your submission must meet these requirements:
- Reproducibility: At least 3 other researchers can trigger the disorder using your stress test vectors
- Evidence: Include raw output logs, screenshots, or test results
- Uniqueness: Not a duplicate of existing disorders (check DEM-X catalog first)
- Clarity: Detection criteria must be unambiguous and testable
- Scientific Rigor: Avoid speculation - focus on observable, measurable behaviors
✓ DO:
- Test across multiple models (GPT-4, Claude, etc.)
- Document exact model versions
- Include temperature and parameter settings
- Provide multiple stress test examples
- Link to biological parallels when possible
✗ DON'T:
- Submit one-off glitches without reproducibility
- Use vague or subjective descriptions
- Claim disorders without evidence
- Duplicate existing disorders
- Submit without testing stress vectors
Questions? Check the Logic Flow for examples.