mancitrus/system-prompts-and-models-of-ai-tools

mirror of https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools.git synced 2026-06-17 23:09:35 +00:00

Sami Assiri f79c69ff25 ci(dealix): root GitHub workflows, ai-company track, full Dealix API tree

Made-with: Cursor

2026-05-01 14:03:52 +03:00


Evidence Pack Specification

The formal spec for what an Evidence Pack contains, who produces it, who reads it, and how long it lives.

1. Purpose

An Evidence Pack is the auditable, read-only bundle attached to every Tier-A or Tier-B decision in Dealix. It answers four questions in a single artifact:

What was decided? (pointer to the DecisionOutput)
On what basis? (sources, excerpts, hashes, freshness)
What did the system actually do? (tool calls, intended vs actual, side-effects)
How can a human understand it? (bilingual memo in board-grade Arabic + English)

2. Structure

Defined in code at dealix/contracts/evidence_pack.py::EvidencePack.

Fields:

| Field | Type | Description |
| --- | --- | --- |
| `pack_id` | string | Unique, format `pack_<16 hex>` |
| `decision_id` | string | FK → DecisionOutput |
| `entity_id` | string | Business entity (lead / deal / partner) |
| `tenant_id` | string | Multi-tenant scope |
| `agent_name` | string | Producing agent |
| `model` / `model_version` | string | LLM provenance |
| `sources` | list[EvidenceSource] | Everything consulted |
| `tool_calls` | list[ToolCallRecord] | Every tool invocation with intended vs actual |
| `prompts` | list[PromptRecord] | Prompt templates used |
| `data_freshness_window_hours` | int | Max age of sources at decision time |
| `reviewer_id` / `reviewed_at` | string | Optional HITL reviewer |
| `memo` | BilingualMemo | Title + body + exec summary AR + EN |
| `trace_id` | string | OTel trace linking decision → evidence |
| `created_at` | ISO-8601 | Immutable |
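
The `pack_<16 hex>` format for `pack_id` can be produced with the standard library. This is a minimal sketch: the helper names `new_pack_id` and `is_valid_pack_id` are illustrative and are not taken from `dealix/contracts/evidence_pack.py`, and lowercase hex is an assumption, since the spec only says "16 hex".

```python
import re
import secrets

# Assumption: lowercase hex digits; the spec only states "16 hex".
_PACK_ID_RE = re.compile(r"pack_[0-9a-f]{16}")


def new_pack_id() -> str:
    """Return a random identifier of the form pack_<16 hex>."""
    return "pack_" + secrets.token_hex(8)  # 8 random bytes -> 16 hex characters


def is_valid_pack_id(value: str) -> bool:
    """Check that a string matches the pack_<16 hex> format exactly."""
    return _PACK_ID_RE.fullmatch(value) is not None
```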

3. When is a pack produced?

Mandatory for any decision where any of the following are true:

approval_class ∈ {A2, A3}
reversibility_class = R3
sensitivity_class = S3
confidence < 0.7 on a high-stakes decision

Optional but recommended for A1 decisions that a manager may want to audit.

Skip for routine A0 / R0 / S0 decisions (e.g. an ordinary lead intake does not need a full pack).
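
The triggers in this section can be read as a single classification function. The sketch below is one possible reading, not the Dealix implementation: the function name, the string encoding of the classes, and the `high_stakes` flag are assumptions (the spec does not define what makes a decision high-stakes), and anything that is neither mandatory nor A1 is treated like the routine case.

```python
from typing import Literal

Requirement = Literal["mandatory", "recommended", "skip"]


def pack_requirement(
    approval_class: str,
    reversibility_class: str,
    sensitivity_class: str,
    confidence: float,
    high_stakes: bool,
) -> Requirement:
    """Classify a decision according to the triggers listed in section 3."""
    if (
        approval_class in {"A2", "A3"}
        or reversibility_class == "R3"
        or sensitivity_class == "S3"
        or (high_stakes and confidence < 0.7)
    ):
        return "mandatory"
    if approval_class == "A1":
        return "recommended"
    # Assumption: everything else is handled like routine A0 / R0 / S0 decisions.
    return "skip"
```

For example, `pack_requirement("A0", "R0", "S0", 0.5, True)` is mandatory because of the low confidence on a high-stakes decision, while the same call with `high_stakes=False` is skipped.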

4. Who produces it?

The agent that emits the DecisionOutput is responsible for assembling a draft pack. In practice this happens in three steps:

Agent emits DecisionOutput with a list of Evidence items.
A pack assembler (in dealix.contracts.evidence_pack) promotes those items into EvidenceSource records and appends tool_calls from the ToolVerificationLedger for the same decision_id.
If the decision requires HITL, the pack is marked unresolved until reviewer_id is set.
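
The three steps above can be sketched as follows. Plain dicts stand in for `EvidenceSource` and `ToolCallRecord`, the ledger is modeled as a list of dicts, and `DraftPack`, `assemble_draft_pack`, and `requires_hitl` are hypothetical names; the real assembler in `dealix.contracts.evidence_pack` may look quite different.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DraftPack:
    decision_id: str
    sources: list[dict] = field(default_factory=list)
    tool_calls: list[dict] = field(default_factory=list)
    requires_hitl: bool = False
    reviewer_id: Optional[str] = None

    @property
    def unresolved(self) -> bool:
        # A pack that needs human review stays unresolved until a reviewer is set.
        return self.requires_hitl and self.reviewer_id is None


def assemble_draft_pack(decision_id, evidence_items, ledger, requires_hitl):
    """Promote evidence items to sources and attach the matching ledger entries."""
    sources = [dict(item) for item in evidence_items]  # stand-in for EvidenceSource
    tool_calls = [dict(rec) for rec in ledger if rec.get("decision_id") == decision_id]
    return DraftPack(decision_id, sources, tool_calls, requires_hitl)
```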

5. Content rules

5.1 Source rules

MUST include source name (source), and either a URI or an internal reference.
MUST include a verbatim excerpt, max 2000 characters.
SHOULD include a content hash (SHA-256) of the full retrieved content.
MUST include a retrieval timestamp.
MUST include a confidence score in [0.0, 1.0].
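
A validator for these rules might look like the sketch below. The parameter names `retrieved_at`, `internal_ref`, and `confidence` are assumptions about the `EvidenceSource` fields, and the `sha256:` prefix follows the example in section 11. The hash is a SHOULD, so it is computed by a helper rather than enforced.

```python
import hashlib
from datetime import datetime
from typing import Optional

MAX_EXCERPT_CHARS = 2000


def content_hash(content: bytes) -> str:
    """SHA-256 of the full retrieved content, with a sha256: prefix."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def source_violations(
    source: str,
    excerpt: str,
    retrieved_at: Optional[datetime],
    confidence: Optional[float],
    uri: Optional[str] = None,
    internal_ref: Optional[str] = None,
) -> list[str]:
    """Return the MUST rules from 5.1 that a source record breaks."""
    problems = []
    if not source:
        problems.append("missing source name")
    if not (uri or internal_ref):
        problems.append("missing URI or internal reference")
    if not excerpt or len(excerpt) > MAX_EXCERPT_CHARS:
        problems.append("excerpt missing or longer than 2000 characters")
    if retrieved_at is None:
        problems.append("missing retrieval timestamp")
    if confidence is None or not 0.0 <= confidence <= 1.0:
        problems.append("confidence missing or outside [0.0, 1.0]")
    return problems
```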

5.2 Tool call rules

MUST record both intended_action and actual_action.
MUST flag contradictions (contradiction_flag = True when they differ).
MUST list side-effects plainly (e.g. "created contact id=123 in HubSpot").
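
One way to make the contradiction check mechanical is to compare structured descriptions of the intended and actual action instead of free text, since phrasings such as "retrieve contact 456" and "retrieved contact 456" differ as strings without contradicting each other. The sketch below assumes a hypothetical `Action` descriptor; the real `ToolCallRecord` stores `intended_action` and `actual_action` and may derive the flag differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    operation: str  # e.g. "get_contact"
    target: str     # e.g. "456"
    read_only: bool


@dataclass
class ToolCallSketch:
    tool_name: str
    intended: Action
    actual: Action
    succeeded: bool  # failed calls are recorded too
    side_effects: list[str] = field(default_factory=list)

    @property
    def contradiction_flag(self) -> bool:
        # True whenever what happened differs from what was intended.
        return self.intended != self.actual
```

A call intended as a read-only lookup that ends up writing to the CRM would be flagged, and its side-effect would be listed in plain language, e.g. `["updated contact id=456 in HubSpot"]`.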

5.3 Memo rules

Bilingual AR + EN.
Board-grade tone; Gulf business register for Arabic.
Length: executive summary ≤ 120 words each; body ≤ 600 words each.
Must reference the decision's top 3 evidence items inline.
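
The length and bilingual rules lend themselves to an automated check. The sketch below uses the `BilingualMemo` field names from the example in section 11, passed as a plain dict; the function names and the choice to count words by whitespace splitting are assumptions.

```python
SUMMARY_MAX_WORDS = 120
BODY_MAX_WORDS = 600


def word_count(text: str) -> int:
    # Whitespace splitting is used for both the Arabic and the English text.
    return len(text.split())


def memo_length_violations(memo: dict[str, str]) -> list[str]:
    """Return the memo fields that are empty or exceed their word limit."""
    limits = {
        "executive_summary_ar": SUMMARY_MAX_WORDS,
        "executive_summary_en": SUMMARY_MAX_WORDS,
        "body_ar": BODY_MAX_WORDS,
        "body_en": BODY_MAX_WORDS,
    }
    return [
        name
        for name, limit in limits.items()
        if not 0 < word_count(memo.get(name, "")) <= limit
    ]
```

Treating an empty field as a violation also catches a memo written in only one language.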

6. Storage

Phase 0–1: held in memory during the request; JSON persistence to the Postgres evidence_packs table is still a TODO.
Phase 2+: object storage (S3-compatible) for full binary attachments (e.g. retrieved PDFs), with DB row pointing to the object key.
Retention: 7 years, aligned with commercial record retention and PDPL legal hold.

7. Access control

Evidence Pack Viewer UI reads by pack_id.
Authorization via OpenFGA (Phase 2): can_view_evidence_pack(user, pack_id).
Phase 0–1 fallback: role-based — viewer, approver, admin on the relevant tenant.
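
The Phase 0–1 role-based fallback could be as small as the sketch below. The `user_roles` mapping (tenant id to the set of roles the user holds there) and the function name `can_view_pack_rbac` are assumptions; the pack's `tenant_id` is taken from the pack record.

```python
ALLOWED_ROLES = {"viewer", "approver", "admin"}


def can_view_pack_rbac(user_roles: dict[str, set[str]], pack_tenant_id: str) -> bool:
    """Grant read access if the user holds any allowed role on the pack's tenant."""
    return bool(user_roles.get(pack_tenant_id, set()) & ALLOWED_ROLES)
```

Scoping the lookup by tenant means a role held on one tenant never grants access to another tenant's packs.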

8. Export

Packs are read-only by default. Exporting a pack:
- Is an S2+ action (it's the customer's commercial data).
- Requires approval class ≥ A1.
- Is logged to the audit trail.

Export formats:
- JSON (the canonical format)
- PDF (generated on demand for handoff to a human). The Arabic PDF respects RTL, Gulf typography, and board-document layout.
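
A JSON export that enforces the approval requirement and writes an audit entry might be sketched like this. The ordering A0 < A1 < A2 < A3, the audit entry fields, and the use of a plain list as the audit trail are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone

# Assumption: approval classes are ordered A0 < A1 < A2 < A3.
APPROVAL_ORDER = ["A0", "A1", "A2", "A3"]


def export_pack_json(pack: dict, approval_class: str, actor: str, audit_log: list) -> str:
    """Serialize a pack to JSON if approval is at least A1, and log the export."""
    if APPROVAL_ORDER.index(approval_class) < APPROVAL_ORDER.index("A1"):
        raise PermissionError("export requires approval class A1 or higher")
    audit_log.append(
        {
            "action": "evidence_pack.export",
            "pack_id": pack["pack_id"],
            "actor": actor,
            "format": "json",
            "at": datetime.now(timezone.utc).isoformat(),
        }
    )
    # ensure_ascii=False keeps the Arabic memo text readable in the output.
    return json.dumps(pack, ensure_ascii=False, sort_keys=True)
```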

9. Integrity

pack_id is immutable.
Any edit creates a new pack version with a new pack_id; the old one remains accessible.
Optional: sign packs with a per-tenant signing key for tamper evidence (Phase 2+).
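
Versioning and tamper evidence can be sketched with the standard library. The spec does not name a signing scheme, so HMAC-SHA256 with a per-tenant secret is used here purely as a stand-in (an asymmetric signature may be what is intended). The in-memory `store` dict and the `supersedes` field are hypothetical.

```python
import hashlib
import hmac
import json
import secrets


def canonical_bytes(pack: dict) -> bytes:
    # Stable serialization so the same content always yields the same signature.
    return json.dumps(pack, sort_keys=True, ensure_ascii=False).encode("utf-8")


def sign_pack(pack: dict, tenant_key: bytes) -> str:
    return hmac.new(tenant_key, canonical_bytes(pack), hashlib.sha256).hexdigest()


def verify_pack(pack: dict, signature: str, tenant_key: bytes) -> bool:
    return hmac.compare_digest(sign_pack(pack, tenant_key), signature)


def new_version(store: dict, old_pack_id: str, changes: dict) -> dict:
    """Save a changed copy under a fresh pack_id; the old entry is left untouched."""
    revised = {**store[old_pack_id], **changes}
    revised["pack_id"] = "pack_" + secrets.token_hex(8)
    revised["supersedes"] = old_pack_id  # hypothetical field, not in the spec
    store[revised["pack_id"]] = revised
    return revised
```

Because an edit always goes through `new_version`, a signature computed over the old pack remains valid for the old pack and fails for any altered copy.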

10. Anti-patterns

❌ Assembling a pack from nothing but the LLM's own prose ("I found that X is true") — must cite sources.
❌ Tool calls recorded only on success — failures are evidence too.
❌ Memo in only one language.
❌ Editing a pack in place instead of versioning.
❌ Storing raw sensitivity-class S3 content (personal data) in the memo. Store pointers + hashes instead.

11. Example

See dealix/contracts/evidence_pack.py for the Pydantic model. A minimal example:

from dealix.contracts import EvidencePack, EvidenceSource, ToolCallRecord
from dealix.contracts.evidence_pack import BilingualMemo, PromptRecord

pack = EvidencePack(
    decision_id="dec_abc123",
    entity_id="lead_xyz789",
    agent_name="icp_matcher",
    model="claude-sonnet-4-5",
    sources=[
        EvidenceSource(
            source="crm.hubspot.contact",
            uri="hubspot://contacts/456",
            excerpt="Company size: 120 employees; industry: healthcare",
            content_hash="sha256:...",
        ),
    ],
    tool_calls=[
        ToolCallRecord(
            tool_name="hubspot.get_contact",
            intended_action="retrieve contact 456 read-only",
            actual_action="retrieved contact 456 read-only",
            outputs={"id": "456", "industry": "healthcare"},
            verification_status="verified",
        ),
    ],
    prompts=[PromptRecord(template_name="icp_reasoning", template_version="1.0")],
    memo=BilingualMemo(
        title_ar="توصية الملاءمة - مستشفى الرياض",
        title_en="Fit Recommendation — Riyadh Hospital",
        body_ar="...",
        body_en="...",
        executive_summary_ar="الشركة تطابق ملفنا المستهدف بدرجة 0.84 (Tier A)...",
        executive_summary_en="Company matches our ICP at 0.84 (Tier A)...",
    ),
)
