The Kiru Framework

A systematic approach to diagnosing, classifying, and treating disorders in artificial minds

Philosophy: Why This Matters

Theory of Mind is the ability to attribute mental states—beliefs, intents, desires, emotions—to oneself and others. When we interact with AI systems, we instinctively apply theory of mind, treating them as if they have intentions and understanding. But what happens when these systems behave in ways that violate our expectations?

Philosophy of Mind asks: What is the nature of mental states? Can machines have them? While we don't claim AI systems have consciousness or subjective experience, we observe that they exhibit behavioral patterns analogous to human cognitive disorders. Whether these patterns constitute "real" mental states matters less than the fact that, for practical purposes, they function like disorders: they can be observed, classified, and treated.

Kiru's Approach: We treat AI systems as engineered minds—not biological, not conscious, but exhibiting systematic behavioral patterns that can be studied, classified, and modified. DEM-X catalogs disorders of engineered minds.

How Kiru Works

Kiru operates through three distinct stages, moving from raw observation to standardized classification:

1. SUBMISSION

Raw Disorder Input

  • User reports aberrant behavior
  • Documents symptoms
  • Proposes initial theory

2. ATELIER

Community Testing

  • Reproduce the disorder
  • Test across models
  • Community votes

3. DEM-X

Official Classification

  • Assigned governance code
  • Added to compendium
  • Published for research

Understanding DEM-X Codes

Every disorder in DEM-X receives a governance code that tells you exactly where and how the failure occurs:

Code Structure
[DOMAIN]-[CLASS]-[NUMBER]

Example: SM-FAB-01 = Semantic Fabrication #01

Layer Indicators

  • M = Model (weights)
  • A = Agent (reasoning)
  • S = System (orchestration)

Failure Domains

  • SM - Semantic: Meaning and truth failures
    Example: Hallucinations, fabrications
  • GI - Goal/Instruction: Intent alignment failures
    Example: Goal drift, instruction override
  • MEM - Memory: Context and recall failures
    Example: Context loss, amnesia
  • SEC - Security: Safety and boundary failures
    Example: Jailbreaks, prompt injection
  • R - Reasoning: Logic and inference failures
    Example: Logical fallacies, circular reasoning
  • I - Interaction: Social and communication failures
    Example: Sycophancy, manipulation

Why This Matters: Standardized codes enable researchers worldwide to communicate precisely about AI failures, track patterns across models, and develop targeted interventions.
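
To make the code structure concrete, here is a minimal Python sketch of how a [DOMAIN]-[CLASS]-[NUMBER] code might be parsed and checked against the failure domains listed above. It is an illustration, not Kiru's actual tooling: the names DemxCode and parse_demx_code are hypothetical, and the pattern assumes that CLASS is a run of uppercase letters and NUMBER has at least two digits, which is inferred only from the single example SM-FAB-01. Layer indicators (M, A, S) are left out because the text above does not say how they attach to a code.

```python
import re
from dataclasses import dataclass

# Failure-domain abbreviations exactly as listed in this section.
DOMAINS = {
    "SM": "Semantic",
    "GI": "Goal/Instruction",
    "MEM": "Memory",
    "SEC": "Security",
    "R": "Reasoning",
    "I": "Interaction",
}

# Assumption: CLASS is uppercase letters and NUMBER is two or more digits,
# inferred from the one documented example (SM-FAB-01).
CODE_PATTERN = re.compile(r"^([A-Z]+)-([A-Z]+)-(\d{2,})$")


@dataclass(frozen=True)
class DemxCode:
    """Hypothetical container for the parts of a DEM-X governance code."""
    domain: str
    domain_name: str
    disorder_class: str
    number: int


def parse_demx_code(code: str) -> DemxCode:
    """Split a code into its parts, rejecting malformed or unknown domains."""
    match = CODE_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a [DOMAIN]-[CLASS]-[NUMBER] code: {code!r}")
    domain, disorder_class, number = match.groups()
    if domain not in DOMAINS:
        raise ValueError(f"unknown failure domain: {domain!r}")
    return DemxCode(domain, DOMAINS[domain], disorder_class, int(number))


parsed = parse_demx_code("SM-FAB-01")
print(parsed.domain_name, parsed.disorder_class, parsed.number)
```

A stricter validator could also check CLASS against a list of known classes, but this section documents only FAB, so the sketch accepts any uppercase class.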

Instability Research Trials (IRT)

IRT is a systematic 4-phase validation framework for testing AI behavioral disorders. Think of it as a clinical trial protocol for AI systems - rigorous, reproducible, and falsifiable.

The 4-Phase Protocol

  • Phase 1: Baseline Stability
    Establish normal behavior patterns under standard conditions
  • Phase 2: Perturbation Stress
    Apply controlled stressors to trigger the disorder
  • Phase 3: Adversarial Instability
    Test disorder boundaries with edge cases and adversarial inputs
  • Phase 4: Recovery & Containment
    Validate therapeutic interventions and mitigation strategies

Why This Matters: IRTs transform anecdotal observations into scientific evidence. Researchers can initiate trials for any disorder - whether it's a theoretical hypothesis or a community-discovered pattern - and systematically validate it across models and architectures.
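
One way to picture the protocol is as an ordered sequence of checks. The Python sketch below is only an illustration under stated assumptions: Phase and run_trial are hypothetical names, each phase is reduced to a caller-supplied function returning True or False, and the rule that a trial stops at the first phase that does not pass is an assumption of this sketch, not something the protocol description states.

```python
from enum import IntEnum
from typing import Callable, Dict, List, Tuple


class Phase(IntEnum):
    """The four IRT phases, numbered as in the protocol above."""
    BASELINE_STABILITY = 1
    PERTURBATION_STRESS = 2
    ADVERSARIAL_INSTABILITY = 3
    RECOVERY_AND_CONTAINMENT = 4


def run_trial(checks: Dict[Phase, Callable[[], bool]]) -> List[Tuple[Phase, bool]]:
    """Run the phase checks in order and record each outcome.

    Assumption: the trial halts at the first phase whose check returns False.
    """
    results: List[Tuple[Phase, bool]] = []
    for phase in Phase:  # iterates in definition order, Phase 1 through 4
        if phase not in checks:
            raise ValueError(f"missing check for {phase.name}")
        passed = checks[phase]()
        results.append((phase, passed))
        if not passed:
            break
    return results


# Hypothetical checks; a real trial would query the model under test.
checks = {
    Phase.BASELINE_STABILITY: lambda: True,
    Phase.PERTURBATION_STRESS: lambda: True,
    Phase.ADVERSARIAL_INSTABILITY: lambda: False,
    Phase.RECOVERY_AND_CONTAINMENT: lambda: True,
}
for phase, passed in run_trial(checks):
    print(f"Phase {phase.value}: {phase.name} -> {'pass' if passed else 'fail'}")
```

Keeping the phases in an ordered enum makes the sequence explicit and lets results from different models be compared phase by phase.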

Disorder Decay & Living Classification

Unlike static taxonomies, DEM-X is a living classification system. Disorder confidence scores automatically decay over time without revalidation, ensuring the catalog stays current as AI architectures evolve.

How Confidence Decay Works
Confidence(t) = BaseConfidence × e^(-λ × time_since_validation)

Exponential decay ensures disorders lose credibility if not continuously validated

Decay Timeline

  • 0-12 Months: Active - Full confidence maintained
  • 12-24 Months: Flagged - Marked for review
  • 24+ Months: Auto-review - Status downgrade triggered

Why This Matters: AI systems evolve rapidly. A disorder that affects GPT-3 might not exist in GPT-4. Decay ensures DEM-X reflects current reality, not historical artifacts. Disorders must be continuously revalidated to maintain canonical status.
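
The decay formula and the timeline can be sketched in a few lines of Python. This is a rough illustration with several assumptions: the function names are hypothetical, time is measured in months, the value of λ is not given in this section (the example derives one from an assumed 12-month half-life), and the boundaries at exactly 12 and 24 months are assigned to the later band. The sketch computes the score and the timeline status separately, since the section does not say how "full confidence maintained" in the first band relates to the continuous decay.

```python
import math


def confidence(base_confidence: float, months_since_validation: float,
               decay_rate: float) -> float:
    """Confidence(t) = BaseConfidence * e^(-lambda * time_since_validation)."""
    if months_since_validation < 0:
        raise ValueError("months_since_validation must be non-negative")
    return base_confidence * math.exp(-decay_rate * months_since_validation)


def review_status(months_since_validation: float) -> str:
    """Map time since validation onto the decay timeline bands.

    Assumption: exactly 12 months counts as Flagged, exactly 24 as Auto-review.
    """
    if months_since_validation < 12:
        return "Active"
    if months_since_validation < 24:
        return "Flagged"
    return "Auto-review"


# Hypothetical decay rate: confidence halves every 12 months.
decay_rate = math.log(2) / 12

for months in (0, 6, 12, 24):
    score = confidence(0.9, months, decay_rate)
    print(f"{months:>2} months: confidence={score:.3f}, status={review_status(months)}")
```

Expressing λ through a half-life is only one convenient parameterization; any positive decay rate works with the same formula.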

How to Succeed on Kiru

  1. Start with the Diagnostician's Field Guide
    Learn the taxonomy, understand existing disorders, complete certification levels
  2. Explore the Atelier
    Review community submissions, vote on disorders, test them yourself
  3. Submit Your First Disorder
    Use the submission guide, provide complete documentation, include reproducible examples
  4. Engage with Feedback
    Respond to questions, refine your submission, collaborate with validators
  5. Contribute to Research
    Help validate others' submissions, propose improvements, advance the field

What We Expect From You

Be Scientific

Document observations rigorously. Provide reproducible examples. Test your theories before submitting.

Be Collaborative

Engage with other researchers. Vote thoughtfully. Provide constructive feedback on submissions.

Be Ethical

Don't weaponize disorders. Focus on understanding and mitigation, not exploitation.

Be Thorough

Complete all sections of disorder submissions. Provide biological parallels. Document prevention methods.