Merge e666cee10a into 987b16e75d

2026-06-18 15:29:36 +00:00 · 2026-04-24 01:21:41 +08:00 · 2026-04-24 01:21:41 +08:00 · ef1bfb286b
commit ef1bfb286b
parent 987b16e75d e666cee10a
1 changed files with 168 additions and 0 deletions
--- a/Audit/README.md
+++ b/Audit/README.md
@@ -0,0 +1,168 @@
 # Agent Architecture Audit
 A diagnostic framework for auditing the health of any AI agent system.
 **The base model rarely fails. The wrapper architecture corrupts good answers into bad behavior.**
 This repository collects system prompts from dozens of AI coding agents and tools. This audit framework lets you inspect those prompts — and the systems that use them — for hidden failures that structural checks miss.
 ## Quick Start
 Audit any agent system by checking its system prompt, tool definitions, memory layer, and execution loop against these failure patterns.
 Run these ripgrep (`rg`) commands against any agent codebase or prompt collection. Each match is a lead to inspect by hand, not a confirmed finding:
 ```bash
 # Hardcoded secrets in prompts or configs
 rg "sk-[A-Za-z0-9]{20,}|ghp_[A-Za-z0-9]{36}|AKIA[0-9A-Z]{16}" --type md --type json --type yaml
 # Tool requirements stated in prompt text (check each hit for a matching code gate)
 rg "must.*tool|required.*call|always.*use.*tool" --type md --type txt
 # Hidden LLM calls outside main agent loop
 rg "completion|chat\.create|messages\.create|llm\.invoke" --type py --type ts
 # Unrestricted code execution without sandbox
 rg "exec\(|eval\(|subprocess\.(run|Popen)|os\.system\(" --type py -n
 # Memory admission without user priority
 rg "memory.*admit|long.*term.*update|persist.*memory" --type py --type ts
 # Agent loops with no iteration cap or exit condition nearby (line-based heuristic)
 rg "while.*agent|for.*turn|agent.*loop" --type py --type ts -A 3 | rg -v "max_|limit|break"
 # Output mutation in delivery layer
 rg "mutate.*response|rewrite.*output|transform.*answer" --type py --type ts
 # Unbounded memory/context growth
 rg "add.*memory|upsert.*vector|append.*context" --type py --type ts -A 3 | rg -v "max_|limit|ttl|trim"
 # Observability hooks (absence check: no matches means no tracing was found)
 rg "langsmith|langfuse|opentelemetry|callback|tracer" --type py --type ts
 # State mutators without upstream validation
 rg "file.*write|db.*insert|vector.*upsert" --type py --type ts -B 5 | rg -v "validate|guard|filter"
 ```
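The piped `rg -v` filters above drop individual lines rather than whole matches, so a loop header can survive the filter even when its guard sits on the very next line. A minimal Python sketch of a block-aware version of the unbounded-loop check follows. The function name `unguarded_loops` and the three-line window are illustrative assumptions, not part of this framework; the two regular expressions are copied from the command above.

```python
import re
from typing import List

# Same patterns as the rg command for agent loops.
LOOP_RE = re.compile(r"while.*agent|for.*turn|agent.*loop")
GUARD_RE = re.compile(r"max_|limit|break")


def unguarded_loops(source: str, window: int = 3) -> List[int]:
    """Return 1-based line numbers of loop headers that have no guard keyword
    in the header itself or in the next `window` lines."""
    lines = source.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if LOOP_RE.search(line):
            block = lines[i : i + window + 1]
            if not any(GUARD_RE.search(b) for b in block):
                hits.append(i + 1)
    return hits


sample = """\
while agent.running:
    step()
    observe()

for turn in range(max_turns):
    step()
"""
print(unguarded_loops(sample))  # [1]
```

Like the grep version, this is still a keyword heuristic: a hit means "look here", not "this loop is unbounded".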
 ## The 12-Layer Stack
 Every agent system has these layers. Any of them can corrupt the answer:
 | # | Layer | What Goes Wrong |
 |---|-------|----------------|
 | 1 | System prompt | Conflicting instructions, instruction bloat |
 | 2 | Session history | Stale context from previous turns |
 | 3 | Long-term memory | Pollution across sessions |
 | 4 | Distillation | Compressed artifacts re-entering as pseudo-facts |
 | 5 | Active recall | Redundant re-summary layers wasting context |
 | 6 | Tool selection | Wrong tool routing, model skips required tools |
 | 7 | Tool execution | Hallucinated execution — claims to call but doesn't |
 | 8 | Tool interpretation | Misread or ignored tool output |
 | 9 | Answer shaping | Format corruption in final response |
 | 10 | Platform rendering | UI/API/CLI mutates valid answers |
 | 11 | Hidden repair loops | Silent fallback/retry agents running second LLM pass |
 | 12 | Persistence | Expired state or cached artifacts reused as live evidence |
 ## Common Failure Patterns
 ### 1. Wrapper Regression
 The base model works fine via direct API call, but the wrapper agent breaks it.
 **Symptoms:**
 - Model works fine in playground, breaks in the agent
 - Added a new prompt layer, existing behavior degraded
 - Agent sounds confident but is wrong
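One way to separate a wrapper regression from a weak model is to run the same prompts through both paths and keep only the cases where the direct call passes and the agent fails. The sketch below assumes both paths can be reduced to a `str -> str` callable; `find_wrapper_regressions`, `fake_direct`, and `fake_agent` are hypothetical names, and the two fake functions stand in for a real API client and a real agent.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed shape: each callable takes a user prompt and returns the final text.
ModelFn = Callable[[str], str]
Case = Tuple[str, Callable[[str], bool]]


def find_wrapper_regressions(
    direct_call: ModelFn, agent_call: ModelFn, cases: Iterable[Case]
) -> List[str]:
    """Return prompts where the direct call passes its check but the agent fails."""
    regressions = []
    for prompt, is_correct in cases:
        if is_correct(direct_call(prompt)) and not is_correct(agent_call(prompt)):
            regressions.append(prompt)
    return regressions


def fake_direct(prompt: str) -> str:
    return "4" if prompt == "2+2?" else "Paris"


def fake_agent(prompt: str) -> str:
    # Simulates a wrapper layer that corrupts one answer.
    return "5" if prompt == "2+2?" else "Paris"


cases = [
    ("2+2?", lambda out: out.strip() == "4"),
    ("Capital of France?", lambda out: "Paris" in out),
]
regressions = find_wrapper_regressions(fake_direct, fake_agent, cases)
print(regressions)  # ['2+2?']
```

Prompts that fail on both paths are deliberately excluded, since they say nothing about the wrapper.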
 ### 2. Memory Contamination
 Old topics leak into new conversations through history, memory retrieval, or distillation.
 **Symptoms:**
 - Agent brings up unrelated past topics
 - User corrections don't stick (old memory overwrites new)
 - Same-session artifacts re-enter as pseudo-facts
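The "user corrections don't stick" symptom usually points at an admission rule that lets any writer overwrite any entry. A minimal sketch of a priority-based admission check follows. The source labels, their ordering in `SOURCE_PRIORITY`, and the `MemoryStore` class are assumptions made for illustration; a real memory layer will have its own schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed trust order for this sketch: a higher number wins a conflict.
SOURCE_PRIORITY = {"agent_assertion": 0, "tool_result": 1, "user_correction": 2}


@dataclass
class MemoryEntry:
    key: str
    value: str
    source: str  # one of the SOURCE_PRIORITY keys


class MemoryStore:
    def __init__(self) -> None:
        self._entries: Dict[str, MemoryEntry] = {}

    def admit(self, candidate: MemoryEntry) -> bool:
        """Store the candidate unless a higher-priority entry already holds its key."""
        current = self._entries.get(candidate.key)
        if current is not None:
            if SOURCE_PRIORITY[candidate.source] < SOURCE_PRIORITY[current.source]:
                return False  # a weaker source may not overwrite a stronger one
        self._entries[candidate.key] = candidate
        return True

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        return entry.value if entry is not None else None


store = MemoryStore()
store.admit(MemoryEntry("timezone", "UTC", "agent_assertion"))
store.admit(MemoryEntry("timezone", "CET", "user_correction"))
# A later agent guess is rejected, so the user's correction sticks.
rejected = not store.admit(MemoryEntry("timezone", "UTC", "agent_assertion"))
print(store.get("timezone"), rejected)  # CET True
```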
 ### 3. Tool Discipline Failure
 Tools are declared in the prompt but not enforced in code. The model skips them or hallucinates execution.
 **Symptoms:**
 - "Must use tool X" in prompt, but model answers without calling it
 - Tool results look correct but were never actually executed
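A code gate for this pattern can be small: the runtime records which tools actually ran, and the answer is only released if the required ones are in that record. The sketch below is one possible shape under that assumption; `TurnRecord`, `release_answer`, and the `ticket_search` tool name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


class MissingToolCallError(RuntimeError):
    """Raised when an answer is released without its required tool calls."""


@dataclass
class TurnRecord:
    answer: str
    # Filled in by the runtime when a tool really runs, never parsed from model text.
    executed_tools: Set[str] = field(default_factory=set)


def release_answer(turn: TurnRecord, required_tools: Set[str]) -> str:
    """Return the answer only if every required tool was actually executed."""
    missing = required_tools - turn.executed_tools
    if missing:
        raise MissingToolCallError(f"required tools not executed: {sorted(missing)}")
    return turn.answer


grounded = TurnRecord(answer="3 open tickets", executed_tools={"ticket_search"})
print(release_answer(grounded, {"ticket_search"}))  # 3 open tickets

hallucinated = TurnRecord(answer="I searched and found 3 open tickets")
try:
    release_answer(hallucinated, {"ticket_search"})
except MissingToolCallError as err:
    print("blocked:", err)
```

The key design choice is that `executed_tools` comes from the execution layer, so a model that merely claims to have called a tool cannot satisfy the gate.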
 ### 4. Rendering/Transport Corruption
 The agent's internal answer is correct, but the platform layer mutates it during delivery.
 **Symptoms:**
 - Logs show correct answer, user sees broken output
 - Hidden fallback agent quietly replaces the answer before delivery
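To confirm that the delivery layer is the one changing the answer, a digest taken before rendering can be compared with a digest of what is actually sent. This is a rough sketch assuming the renderer can be called as a `str -> str` function; `deliver`, `passthrough`, and `lossy_renderer` are illustrative names.

```python
import hashlib
from typing import Callable, Tuple


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def deliver(answer: str, render: Callable[[str], str]) -> Tuple[str, bool]:
    """Render the answer and report whether the rendered text is byte-identical."""
    delivered = render(answer)
    return delivered, digest(delivered) == digest(answer)


def passthrough(text: str) -> str:
    return text


def lossy_renderer(text: str) -> str:
    # Stand-in for a platform layer that strips characters it treats as markup.
    return text.replace("`", "")


answer = "Run `rm -r build` first."
print(deliver(answer, passthrough)[1])     # True
print(deliver(answer, lossy_renderer)[1])  # False
```

Logging both digests per turn gives the "logs show correct answer, user sees broken output" symptom a concrete evidence trail.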
 ### 5. Hidden Agent Layers
 Silent repair, retry, summarization, or recall agents run without explicit contracts.
 **Symptoms:**
 - Output changes between internal generation and user delivery
 - "Auto-fix" loops run a second LLM pass the user doesn't know about
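One way to give every LLM pass an explicit contract is to route all model calls through a ledger that requires a declared purpose and records each call. The following is a minimal sketch, assuming the model client can be wrapped as a `str -> str` callable; `LLMCallLedger` and the purpose labels are hypothetical.

```python
from typing import Callable, List, Set


class LLMCallLedger:
    """Wraps a model callable so every pass is declared and recorded."""

    def __init__(self, llm: Callable[[str], str], allowed_purposes: Set[str]) -> None:
        self._llm = llm
        self._allowed = allowed_purposes
        self.calls: List[str] = []

    def call(self, purpose: str, prompt: str) -> str:
        if purpose not in self._allowed:
            raise PermissionError(f"undeclared LLM pass: {purpose!r}")
        self.calls.append(purpose)
        return self._llm(prompt)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model client.
    return f"echo: {prompt}"


ledger = LLMCallLedger(fake_llm, allowed_purposes={"main"})
ledger.call("main", "summarize the ticket")
try:
    ledger.call("auto_fix", "rewrite the previous answer")
except PermissionError as err:
    print(err)
print(ledger.calls)  # ['main']
```

A second pass is still possible under this scheme, but it has to be added to `allowed_purposes` on purpose, which makes it visible in review and in the recorded call list.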
 ## Severity Model
 | Level | Meaning |
 |-------|---------|
 | `critical` | Agent can confidently produce wrong operational behavior |
 | `high` | Agent frequently degrades correctness or stability |
 | `medium` | Correctness usually survives but output is fragile or wasteful |
 | `low` | Mostly cosmetic or maintainability issues |
 ## Fix Strategy
 Default fix order (code-first, not prompt-first):
 1. **Code-gate tool requirements** — enforce in code, not just prompt text
 2. **Remove or narrow hidden repair agents** — make fallback explicit with contracts
 3. **Reduce context duplication** — same info through prompt + history + memory + distillation
 4. **Tighten memory admission** — user corrections > agent assertions
 5. **Tighten distillation triggers** — don't compress what shouldn't be compressed
 6. **Reduce rendering mutation** — pass-through, don't transform
 7. **Convert to typed JSON envelopes** — structured internal flow, not freeform prose
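As an illustration of the last step, a typed envelope replaces freeform prose between internal layers with a fixed set of fields that is checked on the way in. The field names in `ToolResultEnvelope` below are assumptions chosen for the example, not a schema defined by this framework.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToolResultEnvelope:
    tool: str
    call_id: str
    ok: bool
    payload: dict

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "ToolResultEnvelope":
        """Parse and validate; reject anything that is not exactly this shape."""
        data = json.loads(raw)  # raises ValueError on prose or malformed JSON
        expected = {"tool": str, "call_id": str, "ok": bool, "payload": dict}
        if not isinstance(data, dict) or set(data) != set(expected):
            raise ValueError(f"envelope must have exactly the keys {sorted(expected)}")
        for name, typ in expected.items():
            if not isinstance(data[name], typ):
                raise TypeError(f"field {name!r} must be {typ.__name__}")
        return cls(**data)


env = ToolResultEnvelope(tool="ticket_search", call_id="c1", ok=True, payload={"count": 3})
wire = env.to_json()
print(ToolResultEnvelope.from_json(wire) == env)  # True
```

Prose such as `"I found three tickets"` fails at `from_json` instead of flowing downstream as if it were a tool result.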
 ## Report Template
 ```json
 {
  "target_name": "agent-name",
  "symptoms": ["what the user reports"],
  "findings": [
    {
      "severity": "critical|high|medium|low",
      "title": "what went wrong",
      "source_layer": "which of the 12 layers",
      "mechanism": "how it happens",
      "root_cause": "deepest cause",
      "evidence_refs": ["file:line"],
      "recommended_fix": "what to change"
    }
  ],
  "ordered_fix_plan": [
    { "order": 1, "goal": "first thing to fix", "why_now": "why this comes first" }
  ]
 }
 ```
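A report in this shape can be checked mechanically before it is filed. The sketch below validates the keys shown in the template and the four severity levels from the Severity Model. The rule that `order` values run 1, 2, 3 without gaps is an assumption of this example, and `validate_report` and `sort_findings` are hypothetical helper names.

```python
from typing import Any, Dict, List

ALLOWED_SEVERITIES = ("critical", "high", "medium", "low")
TOP_LEVEL_KEYS = ("target_name", "symptoms", "findings", "ordered_fix_plan")
FINDING_KEYS = {
    "severity", "title", "source_layer", "mechanism",
    "root_cause", "evidence_refs", "recommended_fix",
}


def validate_report(report: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the report passed."""
    problems: List[str] = []
    for key in TOP_LEVEL_KEYS:
        if key not in report:
            problems.append(f"missing top-level key: {key}")
    for i, finding in enumerate(report.get("findings", [])):
        missing = FINDING_KEYS - set(finding)
        if missing:
            problems.append(f"findings[{i}] missing keys: {sorted(missing)}")
        if finding.get("severity") not in ALLOWED_SEVERITIES:
            problems.append(f"findings[{i}] has unknown severity: {finding.get('severity')!r}")
    orders = [step.get("order") for step in report.get("ordered_fix_plan", [])]
    if orders != list(range(1, len(orders) + 1)):  # assumed rule: 1, 2, 3, ...
        problems.append("ordered_fix_plan orders should run 1, 2, 3 without gaps")
    return problems


def sort_findings(findings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Order findings from critical to low."""
    return sorted(findings, key=lambda f: ALLOWED_SEVERITIES.index(f["severity"]))


report = {
    "target_name": "demo-agent",
    "symptoms": ["answers ignore tool output"],
    "findings": [
        {
            "severity": "urgent",  # not one of the four allowed levels
            "title": "tool result dropped",
            "source_layer": "Tool interpretation",
            "mechanism": "summary step discards tool output",
            "evidence_refs": ["agent/loop.py:42"],
            "recommended_fix": "pass tool output through unchanged",
        }  # "root_cause" is intentionally left out
    ],
    "ordered_fix_plan": [{"order": 1, "goal": "keep tool output", "why_now": "blocks all else"}],
}
for problem in validate_report(report):
    print(problem)
```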
 ## Anti-Patterns to Avoid
 - ❌ Saying "the model is weak" without falsifying the wrapper first
 - ❌ Saying "memory is bad" without showing the contamination path
 - ❌ Letting a clean current state erase a dirty historical incident
 - ❌ Treating markdown prose as a trustworthy internal protocol
 - ❌ Accepting "must use tool" in prompt text when code never enforces it
 ## Full Audit Skill
 For a comprehensive, production-tested audit skill with 10 code-level anti-patterns, 9 audit playbooks, and a structured JSON report schema, see:
 **[oh-my-agent-check](https://github.com/huangrichao2020/oh-my-agent-check)**
 This skill has been integrated into production agent platforms including Langflow ([PR](https://github.com/langflow-ai/langflow/pull/12852)), GenericAgent ([PR](https://github.com/lsdefine/GenericAgent/pull/141)), superpowers ([PR](https://github.com/obra/superpowers/pull/1259)), Everything Claude Code ([PR](https://github.com/affaan-m/everything-claude-code/pull/1566)), and OpenCode ([PR](https://github.com/anomalyco/opencode/pull/24023)).