mancitrus/system-prompts-and-models-of-ai-tools

mirror of https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools.git synced 2026-06-17 23:09:35 +00:00

Sami Assiri f79c69ff25 ci(dealix): root GitHub workflows, ai-company track, full Dealix API tree

Made-with: Cursor

2026-05-01 14:03:52 +03:00

3.4 KiB


🚨 Dealix — Incident Runbook

Use when production breaks. Goal: Restore service + communicate + learn.

Severity Levels

Level	Definition	Response Time	Communication
SEV-1	Full outage, no customers can use service	< 15 min	Immediate Slack + email customers
SEV-2	Major feature broken, most customers affected	< 1 hour	Slack alert, status page update
SEV-3	Minor bug, one customer affected	< 4 hours	Individual customer comms
SEV-4	Cosmetic, not user-blocking	< 24 hours	Ticket only

SEV-1 Response (Full Outage)

Within 5 minutes

Confirm: Open /healthz in a browser. If it returns 5xx or times out → SEV-1.
Check Railway dashboard → service status
Check UptimeRobot → when did it start?
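
The /healthz check above can also be run from a terminal. The following is a minimal sketch, not part of the Dealix tooling: the function names and the example host are hypothetical, and the INVESTIGATE bucket for non-2xx, non-5xx codes is an assumption, since the runbook only defines the 5xx/timeout rule.

```shell
# Hypothetical helpers; names and host below are placeholders.

# Print the HTTP status code for a URL.
# curl reports 000 when the request times out or cannot connect.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true
}

# Apply the runbook rule: 5xx or timeout means SEV-1.
classify_health() {
  case "$1" in
    000|5[0-9][0-9]) echo "SEV-1" ;;
    2[0-9][0-9])     echo "OK" ;;
    *)               echo "INVESTIGATE" ;;  # assumption: not defined by the runbook
  esac
}

# Usage (replace the placeholder host with the real service URL):
#   classify_health "$(http_status "https://api.example.com/healthz")"
```

Keeping the classification separate from the network call makes the rule easy to check without touching production.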

Within 15 minutes

Diagnose:
- Last deploy in Railway?
- Recent PR merged?
- DB connection?
- Moyasar API outage?
Mitigate:
- Roll back last deploy if caused by recent change
- Restart service in Railway
- Check env vars

Communicate

Post to customers (if any active):

We are experiencing a temporary technical issue with the system. The team is working on resolving it.
We will update you within 30 minutes.
- Dealix Team

After Resolution (within 48h)

Write post-mortem:

Timeline
Root cause
What worked
What didn't
Action items to prevent recurrence

Common Issues + Fixes

Issue: `/api/v1/*` returns 404

Likely cause: Deploy failed or wrong Start Command. Fix:

Railway → Deployments → check latest deploy status
If failed: check logs, fix, redeploy
If succeeded but still 404: Settings → Start Command = /app/start.sh

Issue: Moyasar webhook returns 401

Likely cause: Secret mismatch. Fix:

Railway → Variables → MOYASAR_WEBHOOK_SECRET
Moyasar Dashboard → Webhooks → same secret
The two values must be identical strings
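
Comparing the two secrets by eye is error-prone, so a small comparison helper may help. This is a sketch under the assumption that both values have been copied into local shell variables by hand; the function name is hypothetical, and it never prints the secrets themselves.

```shell
# Hypothetical helper: compare two secret values without echoing them.
# Prints "match", "whitespace-mismatch" (equal once all whitespace is removed),
# or "mismatch".
secrets_match() {
  railway_value="$1"
  moyasar_value="$2"
  if [ "$railway_value" = "$moyasar_value" ]; then
    echo "match"
  elif [ "$(printf '%s' "$railway_value" | tr -d '[:space:]')" = \
         "$(printf '%s' "$moyasar_value" | tr -d '[:space:]')" ]; then
    echo "whitespace-mismatch"
  else
    echo "mismatch"
  fi
}

# Usage (values pasted from Railway Variables and the Moyasar Dashboard):
#   secrets_match "$RAILWAY_COPY" "$MOYASAR_COPY"
```

A "whitespace-mismatch" result usually points to a stray space or newline picked up while copying.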

Issue: Database connection refused

Likely cause: DATABASE_URL wrong or Postgres add-on down. Fix:

Railway → PostgreSQL service → check status
Copy connection string
Update env var
Redeploy
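
Before redeploying, it can be worth confirming that the copied connection string points where you expect. The sketch below is illustrative only: the function name is hypothetical, it assumes a postgres:// or postgresql:// URL, falls back to the default PostgreSQL port 5432 when none is given, and does not handle IPv6 literal hosts.

```shell
# Hypothetical helper: extract host:port from a Postgres connection URL.
db_host_port() {
  url="$1"
  case "$url" in
    postgres://*|postgresql://*) ;;
    *) echo "invalid-scheme"; return 1 ;;
  esac
  rest="${url#*://}"           # drop the scheme
  rest="${rest#*@}"            # drop user:password@ if present
  hostport="${rest%%/*}"       # drop /dbname and anything after it
  hostport="${hostport%%\?*}"  # drop a query string if there was no path
  case "$hostport" in
    *:*) echo "$hostport" ;;
    *)   echo "$hostport:5432" ;;  # default PostgreSQL port
  esac
}

# Usage:
#   db_host_port "$DATABASE_URL"
#   pg_isready -d "$DATABASE_URL"   # if the PostgreSQL client tools are installed
```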

Issue: High error rate in Sentry

Likely cause: New deploy or traffic spike. Fix:

Check last deploy diff
If the errors are unrelated to the deploy (traffic spike): scale Railway resources
If the errors are related to the deploy: roll back

Rollback Procedure

Railway Rollback (2 minutes)

Railway → Deployments
Find previous successful deployment
Click ... → Redeploy
Wait for Active status
Verify /healthz = 200
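
The final verification step can be scripted as a polling loop. This is a rough sketch with hypothetical function names and a placeholder host; the defaults of 12 checks at 10-second intervals are an assumption chosen to fit the 2-minute rollback window.

```shell
# Hypothetical helpers; names, host, and defaults are placeholders.

# Print the HTTP status code for a URL (000 on timeout or connection failure).
probe_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true
}

# Poll a URL until it returns 200 or the attempts run out.
# $1 = URL, $2 = max attempts (default 12), $3 = seconds between checks (default 10)
wait_until_healthy() {
  url="$1"
  attempts="${2:-12}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    status="$(probe_status "$url")"
    if [ "$status" = "200" ]; then
      echo "healthy after $i check(s)"
      return 0
    fi
    # Only wait if another check will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "still unhealthy after $attempts check(s), last status: $status"
  return 1
}

# Usage (replace the placeholder host with the real service URL):
#   wait_until_healthy "https://api.example.com/healthz"
```

A non-zero exit means the rollback did not restore service within the window and the incident is still open.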

Git Revert (if a code change caused the incident)

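git checkout main
# Sync with the remote first so the revert applies to the current tip
git pull --ff-only origin main
# Create a new commit that undoes the bad change
git revert <bad-commit-sha>
git push origin main
# CI runs, deploy triggered automatically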

Who to Contact

Issue	Contact
Backend down	Sami (founder, on-call 24/7)
Payment processing	Moyasar support
Domain DNS	Domain registrar
Hosting	Railway support

Monitoring Setup Check

Run monthly:

Sentry alerts still firing? (trigger test error)
UptimeRobot still polling? (check dashboard)
Slack channel #dealix-alerts active?
Emergency phone numbers current?

Learning from Incidents

Every SEV-1 or SEV-2 requires:

Post-mortem within 48 hours
File in docs/ops/postmortems/YYYY-MM-DD-summary.md
Review in weekly team sync (even solo)
Update this runbook if the incident revealed a new pattern
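
To lower the friction of writing the post-mortem, the file can be scaffolded from the section list given earlier in this runbook. The sketch below uses hypothetical function names and assumes the "summary" part of the filename is a short hyphenated slug.

```shell
# Hypothetical helpers for scaffolding a post-mortem file.

# Build the file path. $1 = incident date (YYYY-MM-DD), $2 = short summary slug
postmortem_path() {
  echo "docs/ops/postmortems/$1-$2.md"
}

# Print an empty post-mortem with the runbook's five sections. $1 = title
postmortem_skeleton() {
  printf '# Post-mortem: %s\n\n' "$1"
  for section in "Timeline" "Root cause" "What worked" "What didn't" \
                 "Action items to prevent recurrence"; do
    printf '## %s\n\n' "$section"
  done
}

# Usage (the slug and title are examples):
#   path="$(postmortem_path "$(date +%Y-%m-%d)" "api-outage")"
#   mkdir -p "$(dirname "$path")"
#   postmortem_skeleton "API outage" > "$path"
```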
