Quality Gate & Hard Abort Discipline
This document formalizes the stop-conditions and validation logic used in production deployments of the Character Identity Protocol.
Overview
The Quality Gate is a mandatory validation checkpoint applied after every generation turn.
It is not a guideline — it is a hard operational constraint.
Gate Structure
All three gates must pass simultaneously:
PASS = FaceGate ∧ SkeletonGate ∧ ProportionGate
| Gate | What It Checks |
|---|---|
| FaceGate | Facial identity consistency against anchor |
| SkeletonGate | Skeletal proportion and alignment |
| ProportionGate | Overall body proportion consistency |
If any gate fails → Hard Abort.
Hard Abort Protocol
When a gate failure is detected:
- Stop generation immediately
- Do not attempt progressive correction
- Discard all outputs from the failed turn onward
- Open new session (or fully reset environment)
- Re-inject anchor image + minimal prompt
- Verify identity before proceeding
The system is allowed to generate. It is not allowed to degrade.
Match Rate Threshold
- Similarity threshold: operator-defined (values around ~90% are commonly used in demonstrations; exact threshold depends on project tolerance)
- Measurement: Visual comparison against anchor image by trained operator
- Automation: None — human judgment only
⚠️ The ~90% figure originated as a demonstration value. It is not a protocol standard. Treating it as a fixed threshold is a misapplication of this document.
⚠️ Match rate is an indicator of drift risk — not a proof of identity. A passing match rate does not confirm identity. A failing match rate does not automatically constitute identity failure.
Purpose of Match Rate Measurement
The match rate is not an identity verification metric.
It is an early warning indicator for character collapse — a signal that degradation may be beginning before full identity failure occurs.
A declining match rate does not mean the character has failed. It means the system is approaching a condition where failure becomes likely.
The appropriate response to a low match rate is increased operator attention and inspection — not automatic rejection.
Match rate measures the risk of drift, not the fact of identity.
Operator intuition is a valid trigger for inspection. It is not a substitute for documented gate evaluation.
Model-Specific Threshold Guidance (Provisional)
Identity consistency scores are evaluation-framework-dependent. Perceptual evaluation and structural evaluation may yield significantly different results for identical image pairs. This is not measurement error — it reflects the nature of identity as a perceptual construct.
| Evaluation Type | Characteristics | Recommended Threshold |
|---|---|---|
| Perceptual (e.g., GPT-5.2-style) | Impression-based, human-like judgment | ≥ 0.90 |
| Structural (e.g., GPT-5.3-style) | Fine-grained, conservative, diff-sensitive | ≥ 0.80 |
⚠️ These are provisional operational guidelines, not protocol standards. Threshold must be calibrated per deployment context.
Design Memo — Match Rate Computation Scope
Non-normative internal note — No immediate specification change.
Background
Current protocol position:
- Match rate is an auxiliary indicator
- Operator judgment has final authority
- Gate evaluation supersedes numerical similarity
However, computation conditions for match rate are not yet standardized.
Concern
Without defined comparison conditions:
- Measurements may vary between sessions
- Cross-operator consistency may degrade
- External reviewers may misinterpret score meaning
Design Principle (Provisional)
If formalized in the future, match rate computation SHOULD:
- Be anchor-relative
- Control for pose and lighting when feasible
- Specify comparison region (e.g., facial bounding area)
- Avoid model-dependent hard coupling
- Explicitly remain subordinate to gate evaluation
Evaluation Framework Reference
Two evaluation approaches have been operationally observed:
Perceptual Evaluation (GPT-5.2-style)
Simulated Perceptual Score =
0.40 × Face Impression
+ 0.25 × Hair Consistency
+ 0.15 × Iconic Features (e.g. glasses)
+ 0.10 × Vibe Consistency
+ 0.10 × Context Robustness
- 0.05 × Style Drift
- 0.05 × Proportion Drift
Characteristics:
- Weighted toward overall impression
- Tolerant of fine-grained differences
- Closest to human same-person recognition
Score interpretation:
| Score | Interpretation |
|---|---|
| 0.95+ | Near-identical |
| 0.90+ | Recognized as same character |
| 0.80+ | Possible match |
| < 0.80 | Divergent |
Structural Evaluation (GPT-5.3-style)
Evaluation dimensions:
- Facial structure (contour, eyes, ratio)
- Skeletal proportion
- Color stability
- Style consistency
- Fine details (hands, symmetry, placement)
Characteristics:
- Detects fine-grained differences
- Conservative (strict) matching
- Higher penalty weighting
Recommended threshold for this evaluation type: ≥ 0.80
⚠️ These evaluation frameworks are operationally observed references, not vendor specifications. CIP does not mandate either framework. Selection should be based on deployment context and operator judgment.
Match rate MUST NOT override documented operator gate decision.
Deferred Decision
No immediate formalization required.
Standardization may be introduced during:
- Enterprise deployment phase
- Automation integration phase
- Cross-team operator expansion
Risk Reminder
Over-specification risks:
- Transforming governance into score-optimization
- Undermining model-agnostic design
- Creating false sense of objectivity
Why No Progressive Correction
Progressive correction after identity failure leads to contamination — the accumulation of drift in the session context that cannot be reversed by prompt adjustment.
A session that has failed is not recoverable. It must be restarted.
Attempting correction wastes generation budget and produces outputs that cannot be trusted.
Session Contamination
Contamination occurs when:
- Failed generations are not discarded
- Generation continues after drift detection
- Session length exceeds anchor stability range without re-anchoring
Contaminated cycles must be abandoned entirely.
Re-anchoring frequency is context-dependent. See Reproducibility Scope and MCST definition for guidance.
Philosophy
This discipline is not artistic rigidity.
It is production governance.
In professional workflows — anime, game, manga, franchise animation, editorial, fashion, IP management — identity failure is not a style variation. It is a deliverable failure.
The Hard Abort policy exists to prevent that failure from propagating.
Relation to Other Documents
- Technical Mechanism — why drift occurs
- Quickstart — operational flow including abort procedure
- Reproducibility Scope — known degradation conditions
- White Paper Section 5 — governance and IP context