system-prompts-and-models-o.../salesflow-saas/backend/tests/evals
Claude 503bf2e5d7
feat: AI Cost, Quality & Proof OS — complete
AI Layer:
- llm_router.py: routes requests to cheap/mid/high model tiers, enforces the daily budget, caches responses
- token_counter.py: estimates tokens, truncates to budget
- response_cache.py: in-memory cache with TTL per agent
- prompt_registry.py: versioned prompts with stable prefix for caching
- ai_budget.yaml: model costs, agent budgets, daily limits (10 SAR/day)
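
As an illustration of how the routing, daily budget, and per-agent TTL cache described above might fit together, here is a minimal sketch. It is not the code in llm_router.py or response_cache.py. The names `Router`, `TTLCache`, `BudgetExceeded`, `call_model`, and the per-tier costs are all assumptions; only the 10 SAR/day limit comes from the commit message.

```python
import time
from dataclasses import dataclass, field

# Hypothetical per-call costs in SAR; the real values would live in ai_budget.yaml.
TIER_COST_SAR = {"cheap": 0.01, "mid": 0.05, "high": 0.25}
DAILY_LIMIT_SAR = 10.0  # daily limit stated in the commit message


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spending past the daily limit."""


@dataclass
class TTLCache:
    """In-memory cache whose entries expire after a per-agent TTL (assumed design)."""
    default_ttl: float = 300.0
    ttl_by_agent: dict = field(default_factory=dict)
    store: dict = field(default_factory=dict)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self.store[key]  # drop the expired entry
            return None
        return value

    def put(self, agent, key, value, now=None):
        now = time.monotonic() if now is None else now
        ttl = self.ttl_by_agent.get(agent, self.default_ttl)
        self.store[key] = (value, now + ttl)


@dataclass
class Router:
    cache: TTLCache
    spent_today: float = 0.0

    def route(self, agent, prompt, tier, call_model):
        """Return a cached response, or call the model if the budget allows it."""
        key = (agent, tier, prompt)
        cached = self.cache.get(key)
        if cached is not None:
            return cached  # cache hits cost nothing
        cost = TIER_COST_SAR[tier]
        if self.spent_today + cost > DAILY_LIMIT_SAR:
            raise BudgetExceeded(f"{tier} call would exceed {DAILY_LIMIT_SAR} SAR/day")
        response = call_model(tier, prompt)
        self.spent_today += cost
        self.cache.put(agent, key, response)
        return response
```

In this sketch the budget check runs before the model call, so a rejected request never incurs cost, and repeated prompts from the same agent are served from the cache until their TTL expires.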

Guardrails:
- output_validator.py: blocks fake claims + prohibited actions
- cost_guard.py: prevents runaway spending
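
The guardrail behaviour above (blocking fake claims and prohibited actions) could look roughly like the following sketch. The patterns, action names, and the `validate_output` function are hypothetical placeholders, not the rules shipped in output_validator.py.

```python
import re
from dataclasses import dataclass

# Hypothetical examples only; the real rule set is not shown in this commit message.
FORBIDDEN_CLAIM_PATTERNS = [
    re.compile(r"\bguaranteed\b", re.IGNORECASE),
    re.compile(r"\b100% (success|conversion)\b", re.IGNORECASE),
]
PROHIBITED_ACTIONS = {"send_without_consent", "scrape_private_data"}


@dataclass(frozen=True)
class ValidationResult:
    allowed: bool
    reasons: tuple


def validate_output(message: str, action: str) -> ValidationResult:
    """Collect every reason a message/action pair should be blocked."""
    reasons = []
    if action in PROHIBITED_ACTIONS:
        reasons.append(f"prohibited action: {action}")
    for pattern in FORBIDDEN_CLAIM_PATTERNS:
        if pattern.search(message):
            reasons.append(f"forbidden claim matches: {pattern.pattern}")
    # The output is allowed only when no rule fired.
    return ValidationResult(allowed=not reasons, reasons=tuple(reasons))
```

Returning all reasons, rather than stopping at the first match, makes it easier for a test such as test_compliance_gate.py to assert on exactly which rule blocked an output.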

Observability:
- trace.py: trace_id, cost, latency, steps per pipeline run
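
A per-run trace carrying a trace_id, cost, latency, and steps might be sketched as below. The `Trace` class, its `step` context manager, and the field names are assumptions made for illustration; trace.py may be structured differently.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Collects cost and latency for each step of one pipeline run (assumed shape)."""
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    steps: list = field(default_factory=list)

    @contextmanager
    def step(self, name: str, cost_sar: float = 0.0):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the step even if the wrapped code raised.
            self.steps.append({
                "name": name,
                "cost_sar": cost_sar,
                "latency_s": time.perf_counter() - start,
            })

    @property
    def total_cost_sar(self) -> float:
        return sum(s["cost_sar"] for s in self.steps)

    @property
    def total_latency_s(self) -> float:
        return sum(s["latency_s"] for s in self.steps)
```

A pipeline would wrap each stage in `with trace.step("score_lead", cost_sar=0.01): ...` (the step name and cost here are made up) and log `trace.trace_id` alongside the totals at the end of the run.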

Tests: ALL PASS
- 30/30 evals (100%) — 9 sectors, 30 companies
- 10/10 prohibited actions blocked
- 4/4 allowed actions verified
- 3/3 forbidden claims blocked
- 3/3 message quality checks passed

https://claude.ai/code/session_01W1rJthWDkasijTdXCfxVHs
2026-04-26 17:42:47 +00:00
gtm_os_eval_set.jsonl     feat: AI Cost, Quality & Proof OS — complete     2026-04-26 17:42:47 +00:00
test_compliance_gate.py   feat: AI Cost, Quality & Proof OS — complete     2026-04-26 17:42:47 +00:00
test_gtm_os_eval.py       feat: Full Company OS — 9 new agents + scoring engine + compliance engine + evals     2026-04-26 17:20:36 +00:00
test_message_quality.py   feat: AI Cost, Quality & Proof OS — complete     2026-04-26 17:42:47 +00:00