2026-05-01 14:03:52 +03:00


Evidence Pack Specification

The formal spec for what an Evidence Pack contains, who produces it, who reads it, and how long it lives.


1. Purpose

An Evidence Pack is the auditable, read-only bundle attached to every Tier-A or Tier-B decision in Dealix. It answers four questions, in one artifact:

  1. What was decided? (pointer to the DecisionOutput)
  2. On what basis? (sources, excerpts, hashes, freshness)
  3. What did the system actually do? (tool calls, intended vs actual, side-effects)
  4. How can a human understand it? (bilingual memo in board-grade Arabic + English)

2. Structure

Defined in code at dealix/contracts/evidence_pack.py::EvidencePack.

Fields:

| Field | Type | Description |
|---|---|---|
| pack_id | string | Unique, format pack_<16 hex> |
| decision_id | string | FK → DecisionOutput |
| entity_id | string | Business entity (lead / deal / partner) |
| tenant_id | string | Multi-tenant scope |
| agent_name | string | Producing agent |
| model / model_version | string | LLM provenance |
| sources | list[EvidenceSource] | Everything consulted |
| tool_calls | list[ToolCallRecord] | Every tool invocation with intended vs actual |
| prompts | list[PromptRecord] | Prompt templates used |
| data_freshness_window_hours | int | Max age of sources at decision time |
| reviewer_id / reviewed_at | string | Optional HITL reviewer |
| memo | BilingualMemo | Title + body + exec summary AR + EN |
| trace_id | string | OTel trace linking decision → evidence |
| created_at | ISO-8601 | Immutable |
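
The pack_id format above (pack_ followed by 16 hex characters) can be generated and checked with a few lines. This is a minimal sketch, not the code in dealix/contracts/evidence_pack.py: the helper names are hypothetical, and it assumes lowercase hex drawn from a random source, which the spec does not state.

```python
import re
import secrets

# Assumption: 16 lowercase hex characters after the "pack_" prefix.
PACK_ID_PATTERN = re.compile(r"pack_[0-9a-f]{16}")


def new_pack_id() -> str:
    """Return a fresh id; 8 random bytes render as 16 hex characters."""
    return f"pack_{secrets.token_hex(8)}"


def is_valid_pack_id(value: str) -> bool:
    """True only if the whole string matches the pack_<16 hex> format."""
    return PACK_ID_PATTERN.fullmatch(value) is not None
```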

3. When is a pack produced?

Mandatory for any decision where any of the following are true:

  • approval_class ∈ {A2, A3}
  • reversibility_class = R3
  • sensitivity_class = S3
  • confidence < 0.7 on a high-stakes decision

Optional but recommended for A1 decisions that a manager may want to audit.

Skip for routine A0 / R0 / S0 decisions (e.g. an ordinary lead intake does not need a full pack).
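
The rules in this section can be read as a small decision function. The sketch below is illustrative only: DecisionClasses and pack_requirement are hypothetical names, the high_stakes flag stands in for a definition the spec does not give, and the "optional" result for combinations the spec does not mention is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionClasses:
    approval_class: str        # "A0" .. "A3"
    reversibility_class: str   # "R0" .. "R3"
    sensitivity_class: str     # "S0" .. "S3"
    confidence: float
    high_stakes: bool = False  # hypothetical flag; the spec does not define it


def pack_requirement(d: DecisionClasses) -> str:
    """Classify a decision as mandatory / recommended / skip / optional."""
    mandatory = (
        d.approval_class in {"A2", "A3"}
        or d.reversibility_class == "R3"
        or d.sensitivity_class == "S3"
        or (d.high_stakes and d.confidence < 0.7)
    )
    if mandatory:
        return "mandatory"
    if d.approval_class == "A1":
        return "recommended"
    if (d.approval_class, d.reversibility_class, d.sensitivity_class) == ("A0", "R0", "S0"):
        return "skip"
    # Assumption: anything the spec does not cover is left to the caller.
    return "optional"
```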


4. Who produces it?

The agent that emits the DecisionOutput is responsible for assembling a draft pack. In practice this happens in three steps:

  1. Agent emits DecisionOutput with a list of Evidence items.
  2. A pack assembler (in dealix.contracts.evidence_pack) promotes those items into EvidenceSource records and appends tool_calls from the ToolVerificationLedger for the same decision_id.
  3. If the decision requires HITL, the pack is marked unresolved until reviewer_id is set.
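
The three steps above can be sketched with plain dataclasses standing in for the real Pydantic models. Everything here is an assumption made for illustration: DraftPack, assemble_pack, the requires_hitl flag, and the shape of the ledger records (dicts carrying a decision_id key) are not taken from the Dealix codebase.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DraftPack:
    decision_id: str
    sources: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    requires_hitl: bool = False
    reviewer_id: Optional[str] = None

    @property
    def status(self) -> str:
        # Step 3: a HITL pack stays unresolved until a reviewer is recorded.
        if self.requires_hitl and self.reviewer_id is None:
            return "unresolved"
        return "resolved"


def assemble_pack(decision_id, evidence_items, ledger, requires_hitl=False) -> DraftPack:
    # Step 2a: promote each evidence item into a source record (copied here).
    sources = [dict(item) for item in evidence_items]
    # Step 2b: append only the ledger entries that belong to this decision.
    calls = [rec for rec in ledger if rec["decision_id"] == decision_id]
    return DraftPack(decision_id, sources, calls, requires_hitl)
```

Setting reviewer_id on the returned draft flips status from "unresolved" to "resolved".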

5. Content rules

5.1 Source rules

  • MUST include source name (source), and either a URI or an internal reference.
  • MUST include a verbatim excerpt, max 2000 characters.
  • SHOULD include a content hash (SHA-256) of the full retrieved content.
  • MUST include a retrieval timestamp.
  • MUST include a confidence value in [0.0, 1.0].
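
A hedged sketch of how these source rules might be enforced when a source record is built. The function build_source and the keys internal_ref, retrieved_at, and confidence are hypothetical; only source, uri, excerpt, and content_hash appear in the example at the end of this spec. Checking that the excerpt occurs in the retrieved content is one possible reading of "verbatim".

```python
import hashlib
from datetime import datetime, timezone

MAX_EXCERPT_CHARS = 2000


def build_source(source, content, excerpt, confidence, uri=None, internal_ref=None):
    """Validate the section 5.1 rules and return a source record as a dict."""
    if not source:
        raise ValueError("source name is required")
    if uri is None and internal_ref is None:
        raise ValueError("either uri or internal_ref is required")
    if len(excerpt) > MAX_EXCERPT_CHARS:
        raise ValueError("excerpt exceeds 2000 characters")
    if excerpt not in content:
        raise ValueError("excerpt must be a verbatim slice of the content")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0.0, 1.0]")
    # Hash the full retrieved content, not just the excerpt.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return {
        "source": source,
        "uri": uri,
        "internal_ref": internal_ref,
        "excerpt": excerpt,
        "content_hash": f"sha256:{digest}",
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "confidence": confidence,
    }
```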

5.2 Tool call rules

  • MUST record both intended_action and actual_action.
  • MUST flag contradictions (contradiction_flag = True when they differ).
  • MUST list side-effects plainly (e.g. "created contact id=123 in HubSpot").
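
Comparing the free-text intended_action and actual_action strings directly would flag harmless differences in wording, so the sketch below compares a structured form instead. This is an assumption about how "differ" could be evaluated, not the ledger's actual logic; Action and ToolCallSketch are hypothetical types, and a failed call (actual is None) is treated here as a contradiction so that failures are still recorded.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Action:
    operation: str  # e.g. "read", "create"
    target: str     # e.g. "hubspot:contact:456"


@dataclass
class ToolCallSketch:
    tool_name: str
    intended: Action
    actual: Optional[Action]  # None when the call failed before acting
    side_effects: list = field(default_factory=list)
    error: Optional[str] = None

    @property
    def contradiction_flag(self) -> bool:
        # True whenever what happened is not exactly what was intended.
        return self.actual != self.intended
```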

5.3 Memo rules

  • Bilingual AR + EN.
  • Board-grade tone; Gulf business register for Arabic.
  • Length: executive summary ≤ 120 words each; body ≤ 600 words each.
  • MUST reference the decision's top 3 evidence items inline.
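
The length and bilingual rules lend themselves to an automated check. The sketch below takes the memo as a plain dict using the field names from the example in section 11; memo_violations is a hypothetical helper, and counting whitespace-separated tokens is only an approximation of word count, particularly for Arabic.

```python
EXEC_SUMMARY_MAX_WORDS = 120
BODY_MAX_WORDS = 600

LIMITS = (("executive_summary", EXEC_SUMMARY_MAX_WORDS), ("body", BODY_MAX_WORDS))


def memo_violations(memo: dict) -> list:
    """Return a list of rule violations; an empty list means the memo passes."""
    problems = []
    for lang in ("ar", "en"):
        for name, limit in LIMITS:
            key = f"{name}_{lang}"
            text = memo.get(key, "")
            if not text.strip():
                problems.append(f"{key}: missing")
                continue
            words = len(text.split())  # approximation: whitespace tokens
            if words > limit:
                problems.append(f"{key}: {words} words exceeds {limit}")
    return problems
```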

6. Storage

  • Phase 01: in-memory during request; JSON persisted to Postgres evidence_packs table (TODO).
  • Phase 2+: object storage (S3-compatible) for full binary attachments (e.g. retrieved PDFs), with DB row pointing to the object key.
  • Retention: 7 years, aligned with commercial record retention and PDPL legal hold.

7. Access control

  • Evidence Pack Viewer UI reads by pack_id.
  • Authorization via OpenFGA (Phase 2): can_view_evidence_pack(user, pack_id).
  • Phase 01 fallback: role-based — viewer, approver, admin on the relevant tenant.
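
The role-based fallback can be pictured as a tenant-scoped role lookup. This is a minimal sketch under stated assumptions: can_view_pack_rbac and the roles_by_tenant mapping are hypothetical, and it assumes all three roles named above grant read access.

```python
VIEW_ROLES = {"viewer", "approver", "admin"}


def can_view_pack_rbac(roles_by_tenant: dict, pack_tenant_id: str) -> bool:
    """True if the user holds any viewing role on the pack's own tenant."""
    roles = roles_by_tenant.get(pack_tenant_id, set())
    return bool(set(roles) & VIEW_ROLES)
```

Roles held on a different tenant never grant access, which keeps the check aligned with the tenant_id scope on the pack.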

8. Export

  • Read-only by default. Exporting a pack:

    • Is an S2+ action (it's the customer's commercial data).
    • Requires approval class ≥ A1.
    • Is logged to the audit trail.
  • Export formats:

    • JSON (the canonical format)
    • PDF (generated on demand for handoff to a human); the Arabic PDF respects RTL, Gulf typography, and board-document layout.
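
A hedged sketch of the JSON export path with the approval gate and audit logging described above. The function export_pack_json, the list-based audit log, and the choice to log denied attempts as well as successful ones are assumptions for illustration. Passing ensure_ascii=False keeps the Arabic memo text readable in the output.

```python
import json
from datetime import datetime, timezone

APPROVAL_ORDER = ["A0", "A1", "A2", "A3"]


def export_pack_json(pack: dict, approval_class: str, actor: str, audit_log: list) -> str:
    """Serialize a pack to canonical JSON if the approval class is at least A1."""
    allowed = APPROVAL_ORDER.index(approval_class) >= APPROVAL_ORDER.index("A1")
    # Assumption: every attempt is audited, including refused ones.
    audit_log.append({
        "event": "evidence_pack_export",
        "pack_id": pack.get("pack_id"),
        "actor": actor,
        "allowed": allowed,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    if not allowed:
        raise PermissionError("export requires approval class A1 or higher")
    return json.dumps(pack, ensure_ascii=False, sort_keys=True)
```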

9. Integrity

  • pack_id is immutable.
  • Any edit creates a new pack version with a new pack_id; the old one remains accessible.
  • Optional: sign packs with a per-tenant signing key for tamper evidence (Phase 2+).
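
Versioning and tamper evidence can be sketched together. The code below uses an HMAC-SHA256 over a canonical JSON encoding as one possible signing scheme; the spec does not say whether the per-tenant key is symmetric or asymmetric, so treat this as an assumption. The helpers and the supersedes field are hypothetical.

```python
import hashlib
import hmac
import json
import secrets


def canonical_bytes(pack: dict) -> bytes:
    """Stable encoding so the same pack always yields the same signature."""
    return json.dumps(
        pack, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    ).encode("utf-8")


def sign_pack(pack: dict, tenant_key: bytes) -> str:
    return hmac.new(tenant_key, canonical_bytes(pack), hashlib.sha256).hexdigest()


def verify_pack(pack: dict, signature: str, tenant_key: bytes) -> bool:
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(sign_pack(pack, tenant_key), signature)


def revise_pack(old: dict, changes: dict) -> dict:
    """Never edit in place: return a new version with a new pack_id."""
    new = {**old, **changes}
    new["pack_id"] = f"pack_{secrets.token_hex(8)}"
    new["supersedes"] = old["pack_id"]  # hypothetical back-pointer
    return new
```

The old pack is left untouched and still verifies against its original signature, while the revised pack needs a signature of its own.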

10. Anti-patterns

  • Assembling a pack from nothing but the LLM's own prose ("I found that X is true") — must cite sources.
  • Tool calls recorded only on success — failures are evidence too.
  • Memo in only one language.
  • Editing a pack in place instead of versioning.
  • Storing raw content of sensitivity class S3 (personal data) in the memo; store pointers + hashes instead.

11. Example

See dealix/contracts/evidence_pack.py for the Pydantic model. A minimal example:

```python
from dealix.contracts import EvidencePack, EvidenceSource, ToolCallRecord
from dealix.contracts.evidence_pack import BilingualMemo, PromptRecord

pack = EvidencePack(
    decision_id="dec_abc123",
    entity_id="lead_xyz789",
    agent_name="icp_matcher",
    model="claude-sonnet-4-5",
    sources=[
        EvidenceSource(
            source="crm.hubspot.contact",
            uri="hubspot://contacts/456",
            excerpt="Company size: 120 employees; industry: healthcare",
            content_hash="sha256:...",
        ),
    ],
    tool_calls=[
        ToolCallRecord(
            tool_name="hubspot.get_contact",
            intended_action="retrieve contact 456 read-only",
            actual_action="retrieved contact 456 read-only",
            outputs={"id": "456", "industry": "healthcare"},
            verification_status="verified",
        ),
    ],
    prompts=[PromptRecord(template_name="icp_reasoning", template_version="1.0")],
    memo=BilingualMemo(
        title_ar="توصية الملاءمة - مستشفى الرياض",
        title_en="Fit Recommendation — Riyadh Hospital",
        body_ar="...",
        body_en="...",
        executive_summary_ar="الشركة تطابق ملفنا المستهدف بدرجة 0.84 (Tier A)...",
        executive_summary_en="Company matches our ICP at 0.84 (Tier A)...",
    ),
)
```