Execution fabric — durable commitments (current vs Tier-1 target)
Canonical: MASTER_OPERATING_PROMPT.md.
Six tracks: ../dealix-six-tracks.md.
Principle
Anything that:
- lasts hours to weeks,
- crosses multiple systems,
- needs retries, idempotency, or compensation,
- and must not be lost on crash, restart, or deploy
belongs in the execution plane, implemented as deterministic workflows — not as ephemeral agent narration alone.
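As a rough illustration of what "must not be lost on crash, restart, or deploy" implies, the sketch below records each completed step in a store and skips it on the next run. It is a minimal sketch using only the Python standard library, not how Dealix or any workflow engine implements durability. The names `open_store`, `run_workflow`, and the `completed_steps` table are hypothetical.

```python
import sqlite3
from typing import Callable, List, Tuple

# Hypothetical step type: receives the workflow id and performs one side effect.
Step = Callable[[str], None]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the table that records which steps of which workflow completed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS completed_steps ("
        "workflow_id TEXT, step_name TEXT, "
        "PRIMARY KEY (workflow_id, step_name))"
    )
    return conn


def run_workflow(
    conn: sqlite3.Connection, workflow_id: str, steps: List[Tuple[str, Step]]
) -> List[str]:
    """Run steps in order, skipping any step already recorded as completed.

    Returns the names of the steps executed during this call.
    """
    executed = []
    for name, step in steps:
        done = conn.execute(
            "SELECT 1 FROM completed_steps WHERE workflow_id = ? AND step_name = ?",
            (workflow_id, name),
        ).fetchone()
        if done:
            continue  # finished in an earlier run; do not repeat the side effect
        # If the step raises, nothing is recorded and the next run retries it.
        # A crash between the step and the commit also causes a retry, so the
        # step itself still has to be idempotent.
        step(workflow_id)
        conn.execute(
            "INSERT INTO completed_steps VALUES (?, ?)", (workflow_id, name)
        )
        conn.commit()
        executed.append(name)
    return executed
```

Calling `run_workflow` again with the same `workflow_id` after a failure resumes at the first unrecorded step. Passing a file path to `open_store` instead of the in-memory default is what would let the record survive a process restart.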
Current state (this repository — evidence-based)
| Mechanism | Role | Typical use in Dealix |
|---|---|---|
| FastAPI | Synchronous / async request path | APIs, webhook entrypoints |
| Celery | Async tasks, beat schedules | Notifications, SLA ticks, background jobs |
| LangGraph (where used) | Stateful agent graphs, HITL interrupts | Cognition + bounded flows with checkpoints |
| salesflow-saas/backend/app/flows/ | Named durable-style flows | Prospecting, self-improvement, etc. (verify each flow’s persistence model in code) |
This stack is valid for many SaaS patterns. Tier-1 does not require ripping it out overnight; it requires clear ownership and criteria for when a flow must graduate to a stronger runtime.
Tier-1 target: Temporal (or equivalent durable workflow engine)
Temporal (or another workflow engine with the same properties) is the documented target for:
- cross-system business commitments (signatures, partner activation, DD room state, PMI milestones),
- worker versioning and safe rollout of workflow code,
- crash-proof resume after process/network failure.
Status: Planned until an ADR-approved spike ships with tests. See ../adr/0001-tier1-execution-policy-spikes.md.
When to keep Celery vs graduate to Temporal
| Signal | Prefer Celery / short task | Prefer Temporal / workflow engine |
|---|---|---|
| Duration | Minutes, single service | Hours–days, multi-step state machine |
| State | Task idempotency sufficient | Long-lived state + human waits + versioning |
| Compensation | Rare, manual acceptable | Required, audited, replayable |
| Failure domain | Retry and DLQ enough | Must resume exact step after deploy |
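The signals in the table can be turned into a small review checklist. The sketch below is one possible encoding, not an agreed policy. `FlowProfile`, its fields, `graduation_signals`, `recommend_runtime`, and the one-hour threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FlowProfile:
    """Hypothetical description of a flow; field names are illustrative."""

    expected_duration_hours: float
    waits_for_humans: bool
    needs_audited_compensation: bool
    must_resume_exact_step: bool


def graduation_signals(flow: FlowProfile, long_running_hours: float = 1.0) -> List[str]:
    """Return the table rows that point toward a workflow engine."""
    signals = []
    # Duration: minutes fit a short task, hours to days suggest a workflow.
    if flow.expected_duration_hours >= long_running_hours:
        signals.append("duration")
    # State: human waits imply long-lived state beyond task idempotency.
    if flow.waits_for_humans:
        signals.append("state")
    # Compensation: required and audited rather than rare and manual.
    if flow.needs_audited_compensation:
        signals.append("compensation")
    # Failure domain: retry plus DLQ is not enough if the exact step must resume.
    if flow.must_resume_exact_step:
        signals.append("failure_domain")
    return signals


def recommend_runtime(flow: FlowProfile) -> str:
    """Suggest 'workflow-engine' when any signal fires, otherwise 'celery'."""
    return "workflow-engine" if graduation_signals(flow) else "celery"
```

Treating any single signal as sufficient is a deliberately conservative choice for this sketch; a team could just as reasonably require two or more signals before graduating a flow.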
LangGraph vs Temporal (division of labor)
- LangGraph: decision-centric cognition, structured outputs, interrupts, bounded execution loops tied to agent sessions.
- Temporal: system-of-record for long-lived business processes, external side effects, and compensation — especially when multiple teams or services participate.
Do not duplicate the same external commitment path in both without an explicit boundary (one source of truth for “what step are we on?”).
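One way to make that boundary explicit is to record, per external commitment, which runtime owns its step state and to reject a second claim. The following is a minimal in-memory sketch. `StepOwnerRegistry`, `CommitmentOwnershipError`, and the commitment and runtime names used with it are hypothetical and do not come from this repository.

```python
from typing import Dict, Optional


class CommitmentOwnershipError(Exception):
    """Raised when a second runtime tries to own the same commitment path."""


class StepOwnerRegistry:
    """Tracks which runtime answers 'what step are we on?' for a commitment."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def claim(self, commitment: str, runtime: str) -> None:
        """Register `runtime` as the owner; re-claiming by the same runtime is a no-op."""
        current = self._owners.get(commitment)
        if current is not None and current != runtime:
            raise CommitmentOwnershipError(
                f"{commitment!r} is already owned by {current!r}, not {runtime!r}"
            )
        self._owners[commitment] = runtime

    def owner_of(self, commitment: str) -> Optional[str]:
        """Return the owning runtime, or None if nobody has claimed it."""
        return self._owners.get(commitment)
```

In practice such a check would more likely live in a startup assertion or a review checklist than in a runtime object. The point is only that ownership is declared once and conflicts fail loudly.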
Evidence before production promotion
Any new execution path that sends customer messages, moves money, signs contracts, or opens external systems must:
- Carry approval_class, reversibility_class, sensitivity_class (see approval-policy.md).
- Emit correlation/trace IDs and persist audit-friendly records.
- Pass security gate and release checklist for the environment.
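The first two requirements can be sketched as a record that every execution path builds before it acts. The three class field names come from the list above; everything else (`ExecutionEvidence`, `missing_evidence`, `to_audit_record`, the `action` field, and the UUID-based correlation ID) is an assumption for illustration. The allowed values for each class are defined in approval-policy.md and are not reproduced here.

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import Dict, List

REQUIRED_CLASSES = ("approval_class", "reversibility_class", "sensitivity_class")


@dataclass(frozen=True)
class ExecutionEvidence:
    """Hypothetical evidence record attached to one external action."""

    action: str
    approval_class: str
    reversibility_class: str
    sensitivity_class: str
    # Assumption: a random UUID serves as the correlation/trace ID.
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def missing_evidence(record: ExecutionEvidence) -> List[str]:
    """List the required class fields that are empty or whitespace-only."""
    return [name for name in REQUIRED_CLASSES if not getattr(record, name).strip()]


def to_audit_record(record: ExecutionEvidence) -> Dict[str, str]:
    """Return a plain dict suitable for persisting, or raise if evidence is incomplete."""
    missing = missing_evidence(record)
    if missing:
        raise ValueError(f"missing evidence fields: {', '.join(missing)}")
    return asdict(record)
```

Refusing to produce an audit record when a class is missing means an incomplete path fails before the side effect rather than after it.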
See also: events-and-schema.md, trust-fabric.md, github-and-release.md.