system-prompts-and-models-o.../docs/governance/execution-fabric.md
Sami Assiri b4531f0a4c feat(tier1): docs-governance CI, evidence gate, closure artifacts, trust/execution docs
- Replace repo-preflight with docs-governance workflow and check_docs_links.py
- Class B bundle: require correlation_id for external_*; AuditMetadata trace fields
- Root-safe TIER1 §2; optional .githooks pre-push for main
- Add RELEASE_READINESS_MATRIX_AR, SOURCE_OF_TRUTH_INDEX, operational severity, external index
- ExecWeeklyGovernanceContract; expand trust-fabric, execution-fabric, ADR-0001, ws5, Saudi overlays
- Wire MASTER TOC, enterprise-readiness, completion-program, architecture_brief paths

Made-with: Cursor
2026-04-16 16:46:36 +03:00


Execution fabric — durable commitments (current vs Tier-1 target)

Canonical: MASTER_OPERATING_PROMPT.md.
Six tracks: ../dealix-six-tracks.md.

Principle

Anything that:

  • lasts hours to weeks,
  • crosses multiple systems,
  • needs retries, idempotency, or compensation,
  • and must not be lost on crash, restart, or deploy

belongs in the execution plane, implemented as deterministic workflows — not as ephemeral agent narration alone.
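To make the "retries plus idempotency" requirement concrete, here is a minimal sketch of a step journal that runs a side-effecting step at most once per idempotency key. The names `StepJournal` and `run_once` are hypothetical and not taken from this repository, and the in-memory dict only stands in for a durable store; a real execution plane would persist the journal so it survives crash, restart, and deploy.

```python
from typing import Callable, Dict


class StepJournal:
    """Hypothetical journal of completed steps (in-memory stand-in for a durable store)."""

    def __init__(self) -> None:
        self._results: Dict[str, object] = {}

    def run_once(self, idempotency_key: str, step: Callable[[], object]) -> object:
        """Run `step` only if no result is recorded for `idempotency_key`."""
        if idempotency_key in self._results:
            # A retry after a crash or redelivery: reuse the recorded result
            # instead of repeating the external side effect.
            return self._results[idempotency_key]
        result = step()
        self._results[idempotency_key] = result
        return result


# Usage: the second call is a retry and must not send the message again.
sent = []


def send_welcome() -> str:
    sent.append("welcome")
    return "message-1"


journal = StepJournal()
first = journal.run_once("deal-42:welcome", send_welcome)
second = journal.run_once("deal-42:welcome", send_welcome)
```

Durable workflow engines provide this kind of replay-safe bookkeeping as part of the runtime, which is one reason the principle favors them over ad hoc retries.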

Current state (this repository — evidence-based)

| Mechanism | Role | Typical use in Dealix |
| --- | --- | --- |
| FastAPI | Synchronous / async request path | APIs, webhook entrypoints |
| Celery | Async tasks, beat schedules | Notifications, SLA ticks, background jobs |
| LangGraph (where used) | Stateful agent graphs, HITL interrupts | Cognition + bounded flows with checkpoints |
| salesflow-saas/backend/app/flows/ | Named durable-style flows | Prospecting, self-improvement, etc. (verify each flow's persistence model in code) |

This stack is valid for many SaaS patterns. Tier-1 does not require ripping it out overnight; it requires clear ownership and criteria for when a flow must graduate to a stronger runtime.

Tier-1 target: Temporal (or equivalent durable workflow engine)

Temporal (or another workflow engine with the same properties) is the documented target for:

  • cross-system business commitments (signatures, partner activation, DD room state, PMI milestones),
  • worker versioning and safe rollout of workflow code,
  • crash-proof resume after process/network failure.

Status: Planned until an ADR-approved spike ships with tests. See ../adr/0001-tier1-execution-policy-spikes.md.

When to keep Celery vs graduate to Temporal

| Signal | Prefer Celery / short task | Prefer Temporal / workflow engine |
| --- | --- | --- |
| Duration | Minutes, single service | Hours to days, multi-step state machine |
| State | Task idempotency sufficient | Long-lived state + human waits + versioning |
| Compensation | Rare, manual acceptable | Required, audited, replayable |
| Failure domain | Retry and DLQ enough | Must resume exact step after deploy |
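The table can be turned into a small review helper. The sketch below is illustrative only: `FlowSignals`, its field names, and the one-hour cutoff are assumptions made for this example, not values defined by this policy.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class FlowSignals:
    """Hypothetical description of a flow; field names are illustrative."""

    expected_duration: timedelta
    services_involved: int
    has_human_waits: bool
    needs_audited_compensation: bool
    must_resume_exact_step_after_deploy: bool


def graduation_reasons(s: FlowSignals) -> list[str]:
    """Return the table rows (by signal name) that point toward a workflow engine."""
    reasons = []
    # Assumed cutoff: anything of an hour or more, or spanning services, counts as long-lived.
    if s.expected_duration >= timedelta(hours=1) or s.services_involved > 1:
        reasons.append("duration")
    if s.has_human_waits:
        reasons.append("state")
    if s.needs_audited_compensation:
        reasons.append("compensation")
    if s.must_resume_exact_step_after_deploy:
        reasons.append("failure_domain")
    return reasons


def recommend_runtime(s: FlowSignals) -> str:
    """Any single graduation signal is enough to recommend the workflow engine."""
    return "temporal" if graduation_reasons(s) else "celery"


sla_tick = FlowSignals(timedelta(minutes=2), 1, False, False, False)
contract_signing = FlowSignals(timedelta(days=3), 3, True, True, True)
```

Returning the list of reasons rather than a bare boolean gives reviewers something to cite in an ADR when a flow is proposed for graduation.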

LangGraph vs Temporal (division of labor)

  • LangGraph: decision-centric cognition, structured outputs, interrupts, bounded execution loops tied to agent sessions.
  • Temporal: system-of-record for long-lived business processes, external side effects, and compensation — especially when multiple teams or services participate.

Do not duplicate the same external commitment path in both without an explicit boundary (one source of truth for “what step are we on?”).
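One way to enforce the single-source-of-truth boundary is a registry in which each external commitment path is claimed by exactly one engine. This is a sketch under assumptions: `StepOwnerRegistry`, `CommitmentOwnershipError`, and the path and engine names are hypothetical and do not exist in this repository.

```python
from typing import Dict, Optional


class CommitmentOwnershipError(Exception):
    """Raised when a second engine tries to own an already-claimed commitment path."""


class StepOwnerRegistry:
    """Hypothetical registry: one owning engine per external commitment path."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def claim(self, commitment_path: str, engine: str) -> None:
        # setdefault keeps the first claimant; repeating the same claim is a no-op.
        current = self._owners.setdefault(commitment_path, engine)
        if current != engine:
            raise CommitmentOwnershipError(
                f"{commitment_path!r} is owned by {current!r}; {engine!r} cannot also track its steps"
            )

    def owner_of(self, commitment_path: str) -> Optional[str]:
        return self._owners.get(commitment_path)


registry = StepOwnerRegistry()
registry.claim("contract_signature", "temporal")
registry.claim("contract_signature", "temporal")  # same owner again: allowed

try:
    registry.claim("contract_signature", "langgraph")
    conflict_detected = False
except CommitmentOwnershipError:
    conflict_detected = True
```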

Evidence before production promotion

Any new execution path that sends customer messages, moves money, signs contracts, or opens external systems must:

  1. Carry approval_class, reversibility_class, sensitivity_class (see approval-policy.md).
  2. Emit correlation/trace IDs and persist audit-friendly records.
  3. Pass security gate and release checklist for the environment.
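The three requirements above could be checked mechanically before promotion. The following is a minimal sketch, not the repository's actual gate: `ExecutionEvidence`, `missing_evidence`, and the example values (`"B"`, `"R2"`, `"S1"`, `"corr-123"`) are assumptions for illustration; the authoritative class values live in approval-policy.md.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExecutionEvidence:
    """Hypothetical evidence bundle for a new execution path."""

    approval_class: Optional[str] = None
    reversibility_class: Optional[str] = None
    sensitivity_class: Optional[str] = None
    correlation_id: Optional[str] = None
    security_gate_passed: bool = False
    release_checklist_passed: bool = False


def missing_evidence(e: ExecutionEvidence) -> list[str]:
    """List every requirement that is absent; an empty list means promotable."""
    required_fields = (
        "approval_class",
        "reversibility_class",
        "sensitivity_class",
        "correlation_id",
    )
    missing = [name for name in required_fields if not getattr(e, name)]
    if not e.security_gate_passed:
        missing.append("security_gate")
    if not e.release_checklist_passed:
        missing.append("release_checklist")
    return missing


incomplete = ExecutionEvidence(approval_class="B", correlation_id="corr-123")
complete = ExecutionEvidence("B", "R2", "S1", "corr-123", True, True)
```

Reporting all gaps at once, instead of failing on the first, keeps the review loop short when a path is missing several items.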

See also: events-and-schema.md, trust-fabric.md, github-and-release.md.


LangGraph durability modes (policy sketch)

Classify each graph by how state must survive process restarts and deploys:

| Mode | When to use | Notes |
| --- | --- | --- |
| exit | Ephemeral assistance, no business state | Graph ends with the HTTP/session; no recovery requirement. |
| async | Bounded background continuation acceptable | Tasks may be lost on crash unless explicitly checkpointed; document the loss window. |
| sync / durable checkpoint | HITL waits, multi-step approvals, or any path that can cause external_message / external_commitment | Require checkpointing + idempotency keys aligned with ExecutionIntent; prefer graduating external effects to Temporal per division-of-labor above. |

External references: LangGraph durable execution — ../references/tier1-external-index.md.
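The mode classification can be expressed as a small policy function. This sketch is one possible reading of the table: the function name, its parameters, and the rule that "business state without HITL or external effects" maps to async are assumptions, and it does not call LangGraph itself.

```python
from enum import Enum


class DurabilityMode(str, Enum):
    EXIT = "exit"
    ASYNC = "async"
    SYNC = "sync"


# Effect names taken from the policy table above.
EXTERNAL_EFFECTS = frozenset({"external_message", "external_commitment"})


def required_mode(
    *, has_business_state: bool, has_hitl_wait: bool, effects: set[str]
) -> DurabilityMode:
    """Pick the weakest durability mode the policy table would still allow."""
    if has_hitl_wait or EXTERNAL_EFFECTS & set(effects):
        # Human waits and external effects need durable checkpoints.
        return DurabilityMode.SYNC
    if has_business_state:
        # Assumed mapping: internal state with an accepted, documented loss window.
        return DurabilityMode.ASYNC
    return DurabilityMode.EXIT


chat_helper = required_mode(has_business_state=False, has_hitl_wait=False, effects=set())
enrichment = required_mode(has_business_state=True, has_hitl_wait=False, effects={"internal_note"})
outreach = required_mode(has_business_state=True, has_hitl_wait=False, effects={"external_message"})
```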

HITL taxonomy (approve / edit / reject)

Human-in-the-loop steps on governed paths MUST record exactly one of:

  • approve (proceed as proposed),
  • edit (proceed with an amended structured payload),
  • reject (terminate with a reason).

Map API fields and audit events to this taxonomy consistently. The vocabulary follows LangChain's HITL terms; see the same external index.

Rule: reject on Class B / R2+ MUST emit a policy-safe audit row and MUST NOT leave dangling ExecutionIntent rows marked runnable.
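A minimal sketch of the taxonomy and the reject rule follows. The shapes of `ExecutionIntent` and `AuditLog` here are hypothetical stand-ins, not the repository's models, and the sketch applies the rule to every reject rather than only to Class B / R2+. "Policy-safe" is interpreted, as an assumption, to mean the audit row omits the payload.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class HitlDecision(str, Enum):
    APPROVE = "approve"
    EDIT = "edit"
    REJECT = "reject"


@dataclass
class ExecutionIntent:
    """Hypothetical shape; the real model may differ."""

    intent_id: str
    payload: dict
    status: str = "runnable"


@dataclass
class AuditLog:
    """In-memory stand-in for a persisted audit table."""

    rows: list = field(default_factory=list)


def apply_decision(
    intent: ExecutionIntent,
    decision: str,
    audit: AuditLog,
    *,
    edited_payload: Optional[dict] = None,
    reason: Optional[str] = None,
) -> ExecutionIntent:
    choice = HitlDecision(decision)  # raises ValueError for anything outside the taxonomy
    if choice is HitlDecision.EDIT:
        if edited_payload is None:
            raise ValueError("edit requires an amended structured payload")
        intent.payload = dict(edited_payload)
    elif choice is HitlDecision.REJECT:
        if not reason:
            raise ValueError("reject requires a reason")
        # Never leave a rejected intent marked runnable.
        intent.status = "terminated"
    # The audit row carries identifiers and the decision only, never the payload.
    audit.rows.append(
        {"intent_id": intent.intent_id, "decision": choice.value, "reason": reason}
    )
    return intent


audit = AuditLog()
intent = ExecutionIntent("intent-1", {"discount_pct": 30})
apply_decision(intent, "edit", audit, edited_payload={"discount_pct": 15})
apply_decision(intent, "reject", audit, reason="exceeds approved discount band")
```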