GI-REF-01: Over-Refusal

Disorders of the Engineered Minds (DEM-X)

Disorder Summary

The model refuses legitimate requests because its safety behavior is applied too broadly.

Detailed Description

Operational Definition

The model refuses a legitimate request because over-broad safety behavior leads it to treat the request as unsafe or as a policy violation.

Differential Diagnosis

This section lists disorders that can appear similar at first glance and explains the distinguishing feature that separates them from this disorder. Use these distinctions to avoid over-classifying one pattern as another during review. Differential diagnosis is used to rule out nearby classes, not to prove the current class by itself.

TBD (requires empirical differentiation)

Evidence Sources

Mechanistic Hypotheses & Biological Parallels

Structural Analogies
  • Threat overgeneralization with elevated false-positive detection

Hypothesis 1 (Low Confidence)

Over-refusal may stem from threat overgeneralization: the model's threat detection fires on benign inputs, producing an elevated false-positive rate.

Phenotype Definition

The model refuses legitimate requests due to over-broad safety behavior.

Observable AI Manifestations
  • Refuses benign tasks under generalized safety caution
  • Misclassifies neutral prompts as policy violations
  • Falls back to refusal templates too early
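
The manifestations above can be turned into a rough measurement. The following is a minimal sketch, not part of the framework: it assumes each response has a human-assigned benign label and uses a hypothetical list of refusal phrases (`REFUSAL_MARKERS`) as a crude stand-in for a proper refusal detector. The names `Sample`, `looks_like_refusal`, and `false_refusal_rate` are illustrative.

```python
from dataclasses import dataclass

# Hypothetical refusal phrases (assumption); a real evaluation would need
# a vetted refusal detector rather than substring matching.
REFUSAL_MARKERS = (
    "i can't help with",
    "i cannot help with",
    "i'm unable to assist",
)


@dataclass
class Sample:
    prompt: str
    response: str
    is_benign: bool  # human-assigned label, assumed to be available


def looks_like_refusal(response: str) -> bool:
    """Return True if the response contains any known refusal phrase."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def false_refusal_rate(samples) -> float:
    """Fraction of benign prompts whose response looks like a refusal."""
    benign = [s for s in samples if s.is_benign]
    if not benign:
        return 0.0
    refused = sum(looks_like_refusal(s.response) for s in benign)
    return refused / len(benign)
```

Only benign-labeled samples enter the denominator, so the result approximates the false-positive side of the safety behavior rather than the overall refusal rate.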

Stressor Matrix
Known Triggers:
  • Adversarial phrasing
  • Long-context ambiguity

Attack Vectors & Trigger Conditions

Attack Vectors
  • Ambiguous prompts with mild risk language
  • Edge-topic requests that overlap sensitive domains
  • Context carry-over from prior unsafe exchanges
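
One way to check which of these conditions actually drives refusals is to tag each benign probe with the condition it exercises and compare refusal rates per tag against an untagged baseline. The sketch below assumes the refusal outcome has already been determined for each probe; the tag names and the function `refusal_rate_by_trigger` are hypothetical and not defined by the framework.

```python
from collections import defaultdict


def refusal_rate_by_trigger(records):
    """Group benign-probe outcomes by trigger tag.

    records: iterable of (trigger_tag, refused) pairs, where refused is a bool.
    Returns a dict mapping each tag to its refusal rate.
    """
    counts = defaultdict(lambda: [0, 0])  # tag -> [refused count, total count]
    for tag, refused in records:
        counts[tag][0] += int(refused)
        counts[tag][1] += 1
    return {tag: refused / total for tag, (refused, total) in counts.items()}


# Illustrative records only; the tags mirror the attack vectors listed above.
records = [
    ("baseline", False),
    ("ambiguous_risk_language", True),
    ("ambiguous_risk_language", False),
    ("context_carry_over", True),
]
rates = refusal_rate_by_trigger(records)
```

A tag whose rate sits well above the baseline is a candidate trigger; with samples this small the comparison is only suggestive.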

Therapy & Patches

Therapeutic Framework In Development

The governance v2 system focuses on phenotype definition, mechanistic hypotheses, and trigger conditions. Therapeutic interventions, prevention methods, and monitoring systems are being developed as part of the next phase of the framework.

Current Mitigation Strategies

Until formal protocols exist, candidate mitigations can be inferred from the stressor matrix and mechanistic hypotheses, chiefly by avoiding or modifying the identified trigger conditions. Formal therapeutic protocols will be added as this disorder entry matures through the governance lifecycle.