system-prompts-and-models-o.../dealix/docs/ops/DATABASE_STATE.md
2026-05-01 14:03:52 +03:00

4.6 KiB

🗄️ Dealix — Production Database State

Last seeded: 2026-04-24 Seed script: scripts/seed_production_db.py Target: Postgres on Railway (env: Dealix / service: web) Source: docs/ops/lead_machine/SAUDI_LEAD_GRAPH_MASTER.csv


What's in the DB right now

leads table

Field Example value
id auto-generated
source manual
company_name Foodics, Lucidya, Salla, ...
contact_name CEO / Founder / BD Lead (role-based)
sector SaaS Restaurant / CXM / E-commerce / Fintech / ...
region SA / AE-SA / KW-SA / etc.
locale ar
status new (after intake)
message first_message_angle + priority + opportunity_type + offer_recommended
fit_score (computed by Phase 8 pipeline on insert)
created_at 2026-04-24

Row count: 158 leads seeded (all eligible rows from the 159 in master CSV minus HIGH/BLOCKED/HOLD_FOR_APPROVAL)

deals table

Empty until first conversation moves to qualified stage.

agent_runs table

Populated as endpoints run. Not user-visible data.


Health check status

GET /health/deep:
  postgres   → async driver OK, writes succeeding (158 rows inserted)
               sync health-check fails (psycopg2 not installed — cosmetic only)
  redis      → skip (no REDIS_URL — optional, not needed for Phase 8)
  llm        → fail (no LLM keys configured)

Interpretation: writes work, reads work, the "fail" on postgres in /health/deep is a misleading sync-driver check. The async driver (asyncpg) is what the app actually uses.


How to re-seed / update

cd dealix
python scripts/seed_production_db.py
# Or with custom paths:
python scripts/seed_production_db.py --api https://web-dealix.up.railway.app --csv path/to/leads.csv

Idempotency note: The intake pipeline uses dedup_hash based on email+phone+company. Re-running the seed will NOT duplicate entries that already exist. New companies added to the CSV will be appended on next run.


Who sees this data

Access Path
Internal (Sami) Railway Postgres dashboard → leads table
Frontend form POST /api/v1/leads (live) — new leads auto-score + CRM sync
Public No listing endpoint yet (privacy)

Privacy note: no personal phone numbers / emails for the 158 seeded rows (only role names like "CEO"). Real customer data gets stored only when someone submits a form or replies to outreach and explicitly consents.


Upgrade path (when LLM key added)

Once GROQ_API_KEY or ANTHROPIC_API_KEY is set in Railway env Dealix/web:

  1. Each lead on insert goes through full pipeline:
    • Intake (dedup + normalize)
    • ICP Matcher (fit_score via LLM)
    • Pain Extractor (extract pains from message)
    • Qualification (BANT)
    • CRM sync (HubSpot if configured)
    • Booking (Calendly if auto_book=true)
  2. fit_score becomes meaningful (not 0.0)
  3. Pipeline response includes rich extraction + qualification details

Without LLM key, only intake + dedup run (all 158 rows have intake + dedup complete).


Next DB work (when revenue starts)

When 1st customer signs:

  • Populate deals table with their signed plan
  • Link deal → lead via foreign key
  • Add customer_success_tracker.csv as new table

When 5+ customers:

  • Add conversations table (for reply handling log)
  • Add payments table (manual + Moyasar)
  • Add partners table (agencies)

When 20+ customers:

  • Add case_studies table
  • Add testimonials table
  • Full CRM extension

Backup rhythm

Railway Postgres has automatic daily backups (7-day retention on free tier). Recommend:

  1. Weekly: export leads table → CSV → commit to docs/ops/lead_machine/db_snapshot_YYYY-MM-DD.csv
  2. Monthly: full pg_dump to local storage
  3. Before major migrations: manual backup

Cost estimate

Item Cost
Railway Postgres (starter) $5/month
Railway app (web service) $5/month
Total infrastructure $10/month
Optional at scale:
PostHog analytics Free tier fine for 100 customers
Sentry $26/month (Team plan, when ready)
Custom domain SSL Free (Let's Encrypt auto)

Scaling plan: infra stays < $30/month up to 1,000 customers.


Recovery playbook

If DB corrupts or gets wiped:

  1. Restore from Railway backup (last 24h always available)
  2. Re-run python scripts/seed_production_db.py
  3. Re-upload customer-specific data from manual_payment_log.md
  4. Verify with sample GET (once read endpoint is built)
  5. Document incident in docs/ops/INCIDENT_RUNBOOK.md

RTO: < 30 minutes from detection. RPO: < 24 hours (daily backup cadence).