feat: Add Second Brain wiki system + enhanced deployment checklist

- wiki/README.md: Wiki system guide with page templates and conventions
- wiki/architecture.md: Dealix architecture wiki page
- deployment-checklist.md: Enhanced with full pre-deploy, deploy, post-deploy, rollback procedures

https://claude.ai/code/session_01LsnvBa7HwF5hs99VZbgLGj
Claude 2026-04-11 08:14:20 +00:00
parent 2717f2943b
commit afd37142fe
3 changed files with 518 additions and 17 deletions


@@ -1,30 +1,280 @@
# Deployment Checklist — Dealix
**Last Updated**: 2026-04-11
**Stack**: FastAPI + Next.js + PostgreSQL 16 + Redis + Celery
---
## Pre-Deploy
- [ ] All tests pass: `pytest -v`
### 1. Database Migrations
```bash
# Run from the repository root; every alembic command below runs inside backend/
cd backend
# Check current migration state
alembic current
# Verify migration chain has no branches (expect exactly one head)
alembic heads
# Run migrations on staging first (DATABASE_URL must point at staging)
alembic upgrade head
# Verify migration applied
alembic current
```
- [ ] All migrations tested on staging with production-like data
- [ ] Destructive migrations (column drops, table deletes) have been reviewed
- [ ] Migration is reversible — `downgrade()` tested
- [ ] Large table migrations have been benchmarked for lock duration
### 2. Environment Variables
Verify all required variables are set in the target environment:
```bash
# Required — app will not start without these
DATABASE_URL=postgresql+asyncpg://user:pass@host:5432/dealix
REDIS_URL=redis://host:6379/0
JWT_SECRET_KEY=<random-64-char-string>
JWT_ALGORITHM=HS256
# Required — core features break without these
GROQ_API_KEY=<groq-api-key>
OPENAI_API_KEY=<openai-api-key>
ULTRAMSG_INSTANCE_ID=<instance-id>
ULTRAMSG_TOKEN=<token>
# Required for billing
STRIPE_SECRET_KEY=<stripe-secret>
STRIPE_WEBHOOK_SECRET=<stripe-webhook-secret>
# Required for monitoring
SENTRY_DSN=<sentry-dsn>
# Optional
SMTP_HOST=smtp.provider.com
SMTP_PORT=587
SMTP_USER=<email>
SMTP_PASSWORD=<password>
```
- [ ] No placeholder values (`changeme`, `xxx`, `TODO`)
- [ ] Secrets are not shared between staging and production
- [ ] JWT_SECRET_KEY is unique per environment
- [ ] Database credentials use a dedicated app user (not `postgres` superuser)
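The variable checks above can be scripted before a deploy. The following is a minimal sketch and is not part of the Dealix codebase. `REQUIRED_VARS`, `PLACEHOLDER`, and `check_env` are hypothetical names, and the required list is copied from the block above. Treating unfilled `<...>` templates as placeholders is an assumption.

```python
import re

# Required variables, copied from the environment block above (optional SMTP_* omitted).
REQUIRED_VARS = [
    "DATABASE_URL", "REDIS_URL", "JWT_SECRET_KEY", "JWT_ALGORITHM",
    "GROQ_API_KEY", "OPENAI_API_KEY", "ULTRAMSG_INSTANCE_ID", "ULTRAMSG_TOKEN",
    "STRIPE_SECRET_KEY", "STRIPE_WEBHOOK_SECRET", "SENTRY_DSN",
]

# Placeholder markers from the checklist, plus unfilled "<...>" templates.
PLACEHOLDER = re.compile(r"changeme|xxx|todo|^<.*>$", re.IGNORECASE)


def check_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is missing or empty")
        elif PLACEHOLDER.search(value):
            problems.append(f"{name} looks like a placeholder")
    # The app should connect as a dedicated user, not the postgres superuser.
    if "://postgres:" in env.get("DATABASE_URL", ""):
        problems.append("DATABASE_URL uses the postgres superuser")
    return problems
```

On the target host, call it as `check_env(dict(os.environ))`. Treat any non-empty result as a blocked deploy.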
### 3. Secrets Management
- [ ] All secrets stored in environment variables (not in code or config files)
- [ ] `.env` file is NOT committed to git (verify: `git ls-files .env` prints nothing)
- [ ] Production secrets are in a secrets manager (AWS SSM, Vault, etc.)
- [ ] API keys have appropriate scope (not admin keys for read-only operations)
### 4. DNS & SSL
- [ ] Domain DNS points to the correct server IP
- [ ] SSL certificate is valid and not expiring within 30 days
- [ ] HTTPS redirect is configured in Nginx
- [ ] API subdomain configured (e.g., `api.dealix.sa`)
- [ ] CORS origins updated to match production domain
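The certificate-expiry item can be checked with the Python standard library alone. This sketch assumes a publicly trusted certificate. The helper names are hypothetical, and the 30-day threshold comes from the checklist.

```python
import socket
import ssl
import time

WARN_DAYS = 30  # from the checklist: fail if the certificate expires within 30 days


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and a certificate's notAfter string."""
    # cert_time_to_seconds parses the "Jan  1 00:00:00 2026 GMT" format used by getpeercert().
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host: str, port: int = 443) -> str:
    """Open a verified TLS connection and return the leaf certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def cert_ok(host: str) -> bool:
    return days_until_expiry(fetch_not_after(host), time.time()) > WARN_DAYS
```

`cert_ok("api.dealix.sa")` returns `False` when 30 or fewer days remain. An invalid or untrusted certificate raises `ssl.SSLCertVerificationError` instead, which should also block the deploy.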
### 5. Backup Verification
- [ ] PostgreSQL automated backup is configured and tested
- [ ] Last backup restore test completed within the past 7 days
- [ ] Redis persistence enabled (AOF or RDB)
- [ ] Backup retention policy: minimum 7 days, recommended 30 days
---
## Deploy
### 1. Build & Deploy
```bash
# Pull latest code
git pull origin main
# Build all containers
docker compose build --no-cache
# Run migrations before starting the app
docker compose run --rm backend alembic upgrade head
# Start all services
docker compose up -d
# Verify all containers are running
docker compose ps
```
### 2. Health Checks
```bash
# Backend health
curl -f https://api.dealix.sa/api/v1/health || echo "BACKEND DOWN"
# Frontend health
curl -f https://app.dealix.sa/ || echo "FRONTEND DOWN"
# Database connectivity (via health endpoint)
curl -s https://api.dealix.sa/api/v1/health | python3 -c "
import sys, json
d = json.load(sys.stdin)
assert d.get('database') == 'ok', 'Database check failed'
print('Database: OK')
"
# Redis connectivity
curl -s https://api.dealix.sa/api/v1/health | python3 -c "
import sys, json
d = json.load(sys.stdin)
assert d.get('redis') == 'ok', 'Redis check failed'
print('Redis: OK')
"
# Celery worker
docker compose exec celery-worker celery -A app.workers inspect ping
```
- [ ] Backend returns 200 on `/api/v1/health`
- [ ] Frontend loads without JavaScript errors
- [ ] Database connection pool is healthy
- [ ] Redis is connected
- [ ] Celery worker is processing tasks
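The two `curl | python3` snippets above call the health endpoint twice. The following sketch fetches it once and checks both components. Like those snippets, it assumes the endpoint returns JSON with `database` and `redis` set to `"ok"`. The function names are hypothetical.

```python
import json
import urllib.request

HEALTH_URL = "https://api.dealix.sa/api/v1/health"
# Keys the snippets above expect to be "ok"; extend if the endpoint reports more components.
EXPECTED_COMPONENTS = ("database", "redis")


def failing_components(payload: dict) -> list[str]:
    """Names of expected components whose status is not "ok"."""
    return [name for name in EXPECTED_COMPONENTS if payload.get(name) != "ok"]


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    # urlopen raises HTTPError for non-2xx responses, similar to `curl -f`.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

`failing_components(fetch_health())` returns an empty list when the backend reports both components healthy.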
### 3. Smoke Tests
Run critical path tests against the deployed environment:
```bash
# Auth flow
curl -X POST https://api.dealix.sa/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email": "smoke-test@dealix.sa", "password": "test-password"}'
# Lead creation (with auth token)
TOKEN="<token-from-login>"
curl -X POST https://api.dealix.sa/api/v1/leads \
-H "Authorization: Bearer ${TOKEN}" \
-H "Content-Type: application/json" \
-d '{"name": "اختبار", "phone": "+966501234567", "source": "smoke_test"}'
# WhatsApp connectivity check
curl -X POST https://api.dealix.sa/api/v1/integrations/whatsapp/test \
-H "Authorization: Bearer ${TOKEN}"
```
- [ ] Login returns valid JWT token
- [ ] Lead creation succeeds with Arabic name
- [ ] WhatsApp integration responds
- [ ] Deal creation and pipeline update work
- [ ] File upload endpoint accepts and stores files
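The "Login returns valid JWT token" item can be checked by decoding the token's structure instead of eyeballing the response. This sketch is not the backend's python-jose code. It does not verify the signature, because that requires `JWT_SECRET_KEY`. It assumes the `HS256` algorithm from the environment block. Which login response field carries the token is also an assumption to confirm against the endpoint.

```python
import base64
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def inspect_jwt(token: str, now: Optional[float] = None) -> dict:
    """Structural smoke check of a JWT. Does NOT verify the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("JWT must have header.payload.signature")
    header = json.loads(_b64url_decode(parts[0]))
    payload = json.loads(_b64url_decode(parts[1]))
    if header.get("alg") != "HS256":  # JWT_ALGORITHM from the environment block
        raise ValueError(f"unexpected alg {header.get('alg')!r}")
    exp = payload.get("exp")
    if exp is not None and exp <= (time.time() if now is None else now):
        raise ValueError("token already expired")
    return payload
```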
---
## Post-Deploy
### 1. Monitor Errors (First 30 Minutes)
- [ ] Watch Sentry for new errors: `https://sentry.io/organizations/dealix/`
- [ ] Check application logs: `docker compose logs -f backend --since 30m`
- [ ] Monitor Celery worker: `docker compose logs -f celery-worker --since 30m`
- [ ] Check for 5xx errors in Nginx access logs
### 2. Performance Verification
```bash
# Response time check (should be <500ms for API, <2s for pages)
curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" https://api.dealix.sa/api/v1/health
curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" https://app.dealix.sa/
```
- [ ] API P95 latency < 500ms
- [ ] Frontend initial load < 3s
- [ ] Database query time < 100ms for common operations
- [ ] No memory leaks (container memory stable after 15 minutes)
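A single `curl` call yields one latency sample, but the P95 target needs many. The following sketch computes a nearest-rank percentile over samples collected by repeating the request, for example 100 runs of the health check above. The function names are hypothetical.

```python
import math


def percentile(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def p95_ok(samples_ms: list[float], budget_ms: float = 500.0) -> bool:
    # Checklist target: API P95 latency < 500ms
    return percentile(samples_ms, 95) < budget_ms
```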
### 3. Verify Arabic UI
- [ ] Dashboard displays correctly in RTL layout
- [ ] Arabic text renders without encoding issues
- [ ] Date displays in Hijri/Gregorian as configured
- [ ] Currency shows as SAR with Arabic numerals option
- [ ] Phone input accepts +966 format
- [ ] Email notifications render Arabic correctly
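For the +966 item, here is a sketch of the normalization a phone field might apply. It assumes only Saudi mobile numbers are valid: `5` followed by eight digits after the country code. It also accepts Arabic-Indic digits, which an Arabic-first UI may receive. The function names are hypothetical; this is not Dealix's actual validator.

```python
import re
import unicodedata
from typing import Optional

# Assumption for this sketch: only Saudi mobile numbers (5XXXXXXXX after the country code).
_SAUDI_MOBILE = re.compile(r"^(?:\+?966|00966|0)?(5\d{8})$")


def to_ascii_digits(text: str) -> str:
    """Map Arabic-Indic digits (e.g. ٠٥٠) to ASCII so users can type either."""
    return "".join(str(unicodedata.decimal(ch)) if ch.isdecimal() else ch for ch in text)


def normalize_saudi_mobile(raw: str) -> Optional[str]:
    """Return the number as +9665XXXXXXXX, or None if it is not a Saudi mobile."""
    cleaned = re.sub(r"[\s\-()]", "", to_ascii_digits(raw))
    match = _SAUDI_MOBILE.match(cleaned)
    return f"+966{match.group(1)}" if match else None
```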
### 4. Notify Stakeholders
- [ ] Post deployment status in team channel
- [ ] Update status page (if applicable)
- [ ] Notify affected customers of any breaking changes
---
## Rollback Procedure
### Immediate Rollback (< 5 minutes)
If critical issues are discovered post-deploy:
```bash
# 1. Stop current containers
docker compose down
# 2. Revert to previous code
git log --oneline -5 # Find the previous stable commit
git checkout <previous-commit-hash>
# 3. Rebuild and restart
docker compose build --no-cache
docker compose up -d
# 4. Verify rollback
curl -f https://api.dealix.sa/api/v1/health
```
### Database Rollback
If the migration caused issues, downgrade the schema before checking out the previous commit. `alembic downgrade` needs the new revision's migration script, and the older code does not contain it. Run the steps below while the new release is still deployed, then roll back the code as described above.
```bash
# 1. Identify current and target revision
docker compose exec backend alembic current
docker compose exec backend alembic history --verbose | head -20
# 2. Downgrade to the previous revision
docker compose exec backend alembic downgrade -1
# 3. Verify
docker compose exec backend alembic current
```
**WARNING**: If the migration was destructive (dropped columns/tables), data may be lost. Restore from backup instead:
```bash
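# Stop the app and workers so nothing writes during the restore
docker compose stop backend celery-worker
# Restore PostgreSQL from backup (pg_restore needs a custom, directory, or tar dump, e.g. pg_dump -Fc)
pg_restore --clean --if-exists -d dealix /backups/dealix_<timestamp>.dump
# Restart the app and verify data integrity after restore
docker compose start backend celery-worker
docker compose exec backend python3 -c "
import asyncio
from sqlalchemy import text
from app.database import engine  # assumed AsyncEngine: the stack uses async SQLAlchemy + asyncpg
async def main():
    async with engine.connect() as conn:
        result = await conn.execute(text('SELECT count(*) FROM leads'))
        print('Lead count:', result.scalar())
asyncio.run(main())
"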
```
### Rollback Checklist
- [ ] Previous version is running and healthy
- [ ] Database is consistent (check foreign key integrity)
- [ ] No orphaned background tasks in Celery
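- [ ] Caches cleared if schema changed: `docker compose exec redis redis-cli FLUSHDB` (Redis is also the Celery broker; if the cache and broker share a database, this also drops queued tasks)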
- [ ] Stakeholders notified of rollback and timeline for fix
---
## Staging vs Production Differences
| Aspect | Staging | Production |
|--------|---------|------------|
| Domain | `staging.dealix.sa` | `app.dealix.sa` / `api.dealix.sa` |
| Database | `dealix_staging` | `dealix_production` |
| Stripe | Test mode keys | Live mode keys |
| WhatsApp | Sandbox instance | Production UltraMSG instance |
| Sentry | `staging` environment tag | `production` environment tag |
| AI Models | Lower-cost models OK | Production model configuration |
| Data | Synthetic test data | Real customer data |
| SSL | Let's Encrypt staging | Let's Encrypt production |
| Backups | Daily, 3-day retention | Hourly, 30-day retention |
| Scaling | Single instance | Load-balanced (when needed) |
| Feature flags | All enabled for testing | Controlled per-tenant |
| Logging | Debug level | Info level (debug on demand) |


@@ -0,0 +1,87 @@
# Dealix Wiki System — Second Brain
## Purpose
The wiki is the canonical knowledge layer for Dealix. Every important decision, architecture choice, customer insight, and operational pattern lives here in structured, linkable pages. AI agents and human contributors both read and write to this wiki.
## Page Template
Every wiki page must start with this header block:
```markdown
# Page Title (عنوان الصفحة)
**Type**: architecture | product | gtm | customer | operations | security | tooling | glossary
**Summary**: One-line English summary
**Summary_AR**: ملخص بسطر واحد بالعربية
**Key Facts**:
- Fact 1
- Fact 2
- Fact 3
**Provenance**: Where this knowledge came from (e.g., "ADR-001", "Customer interview — Acme Corp", "Claude session 2026-04-10")
**Confidence**: high | medium | low
**Related Pages**: [page1](./page1.md), [page2](./page2.md)
**Last Updated**: 2026-04-11
**Stale**: false
---
(Page body goes here — use headers, bullet lists, code blocks, diagrams as needed.)
```
### Field Definitions
| Field | Required | Description |
|-------|----------|-------------|
| **Type** | Yes | Category for indexing. Must match one of the defined types. |
| **Summary** | Yes | English one-liner. Max 120 characters. |
| **Summary_AR** | Yes | Arabic one-liner. Max 120 characters. |
| **Key Facts** | Yes | 3-7 bullet points capturing the essential knowledge. |
| **Provenance** | Yes | Source of the information. Links to ADRs, sessions, interviews, docs. |
| **Confidence** | Yes | `high` = verified by multiple sources or production data. `medium` = single reliable source. `low` = inferred or speculative. |
| **Related Pages** | Yes | At least one link to another wiki page. Lint separately flags orphan pages (pages with no inbound links). |
| **Last Updated** | Yes | ISO date of last meaningful update. |
| **Stale** | Yes | `true` if page has not been reviewed in 30+ days. |
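The field rules in this table are mechanical enough to lint. Below is a minimal sketch of a header validator. It assumes the exact `**Field**: value` layout of the template and that the header ends at the first `---`. It is not the real `KnowledgeBrain.lint()`, whose implementation is not described here.

<!-- untested -->
```python
import re

TYPES = {"architecture", "product", "gtm", "customer", "operations", "security", "tooling", "glossary"}
CONFIDENCE = {"high", "medium", "low"}
REQUIRED = ["Type", "Summary", "Summary_AR", "Key Facts", "Provenance",
            "Confidence", "Related Pages", "Last Updated", "Stale"]
FIELD = re.compile(r"^\*\*(?P<name>[\w ]+)\*\*:\s*(?P<value>.*)$")


def parse_header(page: str) -> tuple[dict[str, str], list[str]]:
    """Split the header block (before the first '---') into fields and Key Facts bullets."""
    fields, facts, current = {}, [], None
    for line in page.split("---", 1)[0].splitlines():
        m = FIELD.match(line.strip())
        if m:
            current = m.group("name")
            fields[current] = m.group("value").strip()
        elif current == "Key Facts" and line.strip().startswith("- "):
            facts.append(line.strip()[2:])
    return fields, facts


def validate_header(page: str) -> list[str]:
    """Return rule violations for one page; an empty list means the header passes."""
    fields, facts = parse_header(page)
    errors = [f"missing {name}" for name in REQUIRED
              if name not in fields or (name != "Key Facts" and not fields[name])]
    if fields.get("Type") and fields["Type"] not in TYPES:
        errors.append(f"unknown Type {fields['Type']!r}")
    if fields.get("Confidence") and fields["Confidence"] not in CONFIDENCE:
        errors.append(f"unknown Confidence {fields['Confidence']!r}")
    if fields.get("Provenance", "").lower() == "unknown":
        errors.append("Provenance may not be 'Unknown'")
    for name in ("Summary", "Summary_AR"):
        if len(fields.get(name, "")) > 120:
            errors.append(f"{name} exceeds 120 characters")
    if not 3 <= len(facts) <= 7:
        errors.append(f"Key Facts has {len(facts)} bullets, expected 3-7")
    return errors
```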
## How to Create a Page
1. Choose the correct **type** from the list above.
2. Create a new `.md` file in `memory/wiki/` using kebab-case naming: `feature-flags.md`, `customer-acme.md`.
3. Fill in all template fields. Do not leave any blank.
4. Add at least one link in **Related Pages** pointing to an existing wiki page.
5. Add the new page to `memory/indexes/master-index.md` under the appropriate section.
6. If the page summarizes a decision, also create or link to an ADR in `memory/adr/`.
## Linking Conventions
- Use relative paths: `[Architecture](./architecture.md)`
- Link to ADRs: `[ADR-001](../adr/001-multi-tenant.md)`
- Link to memory sections: `[Launch Plan](../growth/launch-plan.md)`
- Cross-reference inside page body using inline links, not footnotes.
- Every page should have at least 2 outbound links.
- When mentioning a glossary term for the first time, link to `[glossary](./glossary.md)`.
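The link rules can be checked across the whole wiki. The following sketch counts relative `.md` links per page. It reports orphans (pages with no inbound links) and pages below the two-outbound-link guideline. The function name and the input shape (filename mapped to markdown text) are assumptions.

<!-- untested -->
```python
import posixpath
import re

# Markdown inline links whose target is a relative .md path, e.g. [Architecture](./architecture.md)
LINK = re.compile(r"\[[^\]]+\]\((\.{1,2}/[^)\s]+\.md)\)")


def link_report(pages: dict[str, str], min_outbound: int = 2) -> dict[str, list[str]]:
    """pages maps wiki-relative filenames (e.g. 'architecture.md') to their markdown text."""
    inbound = {name: 0 for name in pages}
    few_outbound = []
    for name, text in pages.items():
        targets = LINK.findall(text)
        if len(targets) < min_outbound:
            few_outbound.append(name)
        for target in targets:
            # ./x.md normalizes to x.md; links leaving the wiki (../adr/...) keep their ../ prefix.
            resolved = posixpath.normpath(target)
            if resolved in inbound and resolved != name:
                inbound[resolved] += 1
    orphans = sorted(name for name, count in inbound.items() if count == 0)
    return {"orphans": orphans, "few_outbound": sorted(few_outbound)}
```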
## Review Schedule
| Cadence | Action |
|---------|--------|
| **Weekly** | Run `KnowledgeBrain.lint()` to detect stale pages (>30 days without update), orphan pages (no inbound links), missing provenance, and duplicates. |
| **Bi-weekly** | Review all `low` confidence pages. Upgrade to `medium` if verified, or archive if obsolete. |
| **Monthly** | Review master index for completeness. Ensure every active service, integration, and process has a wiki page. |
| **Per release** | Update architecture and product pages affected by the release. |
## Stale Page Protocol
1. `KnowledgeBrain.lint()` marks pages as `Stale: true` if `Last Updated` is older than 30 days.
2. Stale pages appear in the weekly review report.
3. A reviewer either:
- Updates the page and sets `Stale: false` with a new `Last Updated` date.
- Archives the page by moving it to `memory/wiki/archive/` and removing it from the master index.
- Confirms the page is still accurate and bumps `Last Updated` without content changes.
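The 30-day rule in step 1 reduces to a date comparison. This sketch assumes `Last Updated` holds an ISO date, as the template requires. `is_stale` and `stale_flag` are hypothetical helpers, not the `KnowledgeBrain` API.

<!-- untested -->
```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=30)  # "older than 30 days" per step 1


def is_stale(last_updated: str, today: date) -> bool:
    """True if the ISO `Last Updated` date is more than 30 days before `today`."""
    return today - date.fromisoformat(last_updated) > STALE_AFTER


def stale_flag(fields: dict[str, str], today: date) -> str:
    """Value lint would write into the page's `Stale` field ("true" or "false")."""
    return "true" if is_stale(fields["Last Updated"], today) else "false"
```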
## Quality Rules
- No page may exist without provenance. "Unknown" is not acceptable.
- Confidence must be justified: `high` requires a link to source material.
- Arabic summaries are mandatory. Dealix is Arabic-first.
- Pages must not exceed 500 lines. Split large topics into sub-pages.
- Code examples must be tested or marked with `<!-- untested -->`.


@@ -0,0 +1,164 @@
# Dealix System Architecture (بنية نظام ديلكس)
**Type**: architecture
**Summary**: Multi-tenant AI CRM built on FastAPI + Next.js + PostgreSQL + Redis + Celery with Arabic-first UX and PDPL compliance.
**Summary_AR**: نظام إدارة علاقات عملاء ذكي متعدد المستأجرين مبني على FastAPI و Next.js و PostgreSQL مع واجهة عربية أولاً والتوافق مع نظام حماية البيانات.
**Key Facts**:
- Backend: FastAPI 0.115.6 on Python 3.12, async everywhere (asyncpg, async SQLAlchemy)
- Frontend: Next.js 15 with App Router, TypeScript 5.7, RTL-first layout
- Database: PostgreSQL 16 with tenant_id isolation on every table; Alembic migrations
- Cache/Queue: Redis 7 for caching + Celery 5.x task broker with 4 workers + Celery Beat scheduler
- AI Engine: Groq (primary) with OpenAI fallback; Arabic NLP, lead scoring, conversation intelligence
- Compliance: PDPL-native: consent checked before every outbound message; penalties of up to SAR 5M per violation
- Multi-agent system: Manus-style orchestrator with 8 specialized roles and event-to-agent routing
**Provenance**: AGENTS.md, CLAUDE.md, memory/architecture/system-overview.md, docker-compose.yml
**Confidence**: high
**Related Pages**: [glossary](./glossary.md), [system-overview](../architecture/system-overview.md)
**Last Updated**: 2026-04-11
**Stale**: false
---
## High-Level Architecture
```
               ┌──────────────────────────┐
               │   Nginx Reverse Proxy    │
               └────────┬────────┬────────┘
                        │        │
           ┌────────────┘        └─────────────┐
           ▼                                   ▼
  ┌──────────────────┐                ┌──────────────────┐
  │ Next.js Frontend │                │ FastAPI Backend  │
  │ (Port 3000)      │                │ (Port 8000)      │
  │ - Dashboard      │                │ - API v1         │
  │ - Landing Page   │                │ - Services Layer │
  │ - Auth Flows     │                │ - AI Engine      │
  │ - Pipeline View  │                │ - Agent System   │
  │ - RTL / Arabic   │                │ - Integrations   │
  └──────────────────┘                └────────┬─────────┘
                                               │
                                ┌──────────────┼──────────────┐
                                ▼              ▼              ▼
                         ┌────────────┐ ┌────────────┐ ┌────────────┐
                         │ PostgreSQL │ │ Redis      │ │ Celery     │
                         │ 16         │ │ 7          │ │ Workers    │
                         │ (asyncpg)  │ │ (cache +   │ │ (4) +      │
                         │            │ │  broker)   │ │ Beat       │
                         └────────────┘ └────────────┘ └────────────┘
```
## Backend Layers
### API Layer (`backend/app/api/v1/`)
RESTful endpoints versioned under `/api/v1/`. JWT authentication via python-jose. All endpoints require tenant context.
Key route groups:
- **Auth**: registration, login, token refresh, password reset
- **Leads**: CRUD, scoring, bulk import, assignment
- **Deals**: pipeline management, stage transitions, forecasting
- **Inbox**: unified WhatsApp + Email + SMS conversation view
- **Sequences**: automated outreach cadences
- **Compliance**: PDPL consent management, data subject rights
- **Proposals / CPQ**: configure-price-quote with Arabic PDF generation
### Services Layer (`backend/app/services/`)
Business logic is isolated from API routes. Each service is a class with async methods.
Core services:
- `ai/` — Arabic NLP (intent, sentiment, entity extraction), lead scoring (0-100), conversation intelligence
- `pdpl/` — Consent manager, data rights handler, audit trail
- `cpq/` — Configure-Price-Quote with SAR currency handling
- `agents/` — Multi-agent orchestrator, 8 specialized roles, event routing, executor with retry
- `sequence_engine.py` — Automated multi-step outreach with channel rotation
- `model_router.py` — Task-specific LLM model selection across providers
- `security_gate.py` — Runtime security verification, PDPL enforcement
- `tool_verification.py` — Agent action audit trail (intent vs claim vs execution)
### Integration Layer (`backend/app/integrations/`)
Adapters for external services:
- **WhatsApp**: UltraMsg API (WhatsApp is the primary channel in Saudi Arabia, with 85% penetration)
- **Email**: SMTP with template rendering
- **SMS**: Twilio / local Saudi provider
- **Payments**: Stripe with SAR support
- **Tax**: ZATCA e-invoicing compliance
### Worker Layer (`backend/app/workers/`)
Celery tasks for async processing:
- Lead scoring recalculation
- Sequence step execution
- Email/WhatsApp delivery
- Analytics aggregation
- Scheduled reports
## Frontend Architecture
- **Framework**: Next.js 15 with App Router
- **Language**: TypeScript 5.7 in strict mode
- **Styling**: Tailwind CSS 3.4 with RTL-first layout (`dir="rtl"`)
- **Fonts**: IBM Plex Sans Arabic (primary), Tajawal (secondary)
- **Components**: Functional components with hooks
- **State**: Server components by default, client where interactivity needed
## Data Architecture
### Multi-Tenant Isolation
Every table includes a `tenant_id` column. All queries are scoped by tenant at the ORM level. Cross-tenant access is a Class C forbidden action.
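The following dependency-free sketch shows the idea behind ORM-level scoping. Every read and write goes through one object bound to a tenant, so no call site can forget the filter. The in-memory store, the `Lead` fields, and the class names are illustrative assumptions. The real implementation works through SQLAlchemy, which is not shown here.

<!-- untested -->
```python
from dataclasses import dataclass


class CrossTenantAccess(PermissionError):
    """Raised for the Class C forbidden action: touching another tenant's data."""


@dataclass
class Lead:
    id: int
    tenant_id: str
    name: str


@dataclass
class TenantScopedRepo:
    tenant_id: str
    store: list[Lead]  # shared table holding every tenant's rows

    def add(self, lead: Lead) -> None:
        if lead.tenant_id != self.tenant_id:
            raise CrossTenantAccess(f"tenant {self.tenant_id} cannot write tenant {lead.tenant_id}")
        self.store.append(lead)

    def list_leads(self) -> list[Lead]:
        # The tenant filter lives here once instead of in every query.
        return [row for row in self.store if row.tenant_id == self.tenant_id]

    def get(self, lead_id: int) -> Lead:
        for row in self.list_leads():
            if row.id == lead_id:
                return row
        # Another tenant's row looks exactly like a missing one, so IDs cannot be probed.
        raise KeyError(lead_id)
```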
### Key Models
- **Lead**: Contact with scoring, source tracking, assignment
- **Deal**: Pipeline stage, value (SAR), probability, close date
- **Company**: Organization with enrichment data
- **Sequence**: Multi-step outreach cadence
- **Consent**: PDPL consent record with purpose, channel, expiry (12 months)
- **Meeting**: Scheduled interactions with intelligence extraction
### Database Conventions
- All money fields use `Numeric` type (never Float)
- Soft-delete before hard-delete
- Alembic for all migrations
- Timezone: Asia/Riyadh (UTC+3)
- Currency: SAR default
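Here is a sketch of the money and time conventions in plain Python. SQLAlchemy `Numeric` columns return `Decimal` by default, so amounts are handled as `Decimal`. Storing timestamps in UTC, converting to Riyadh time for display, and rounding half-up to halalas are assumptions, not documented Dealix rules.

<!-- untested -->
```python
from datetime import datetime, timedelta, timezone
from decimal import ROUND_HALF_UP, Decimal

# Saudi Arabia observes no DST, so a fixed UTC+3 offset matches Asia/Riyadh.
RIYADH = timezone(timedelta(hours=3), "Asia/Riyadh")
HALALA = Decimal("0.01")  # SAR has two decimal places (100 halalas)


def sar(amount: str) -> Decimal:
    """Parse a money amount from a string (never a float) and round to halalas."""
    return Decimal(amount).quantize(HALALA, rounding=ROUND_HALF_UP)


def to_riyadh(ts_utc: datetime) -> datetime:
    """Convert an aware UTC timestamp to Riyadh local time for display."""
    return ts_utc.astimezone(RIYADH)
```

With floats, `0.1 + 0.2 != 0.3`. With `sar()`, the sum of `"0.10"` and `"0.20"` equals `sar("0.30")` exactly.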
## AI Architecture
### LLM Provider Chain
1. **Groq** (llama-3.1-70b): Fast classification, Arabic NLP
2. **Claude**: Sales copy, proposals, complex reasoning
3. **Gemini**: Research, analysis
4. **DeepSeek**: Code generation
5. **OpenAI GPT-4o-mini**: Fallback
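This page does not document how `model_router.py` chooses among these providers. The following is a hypothetical sketch of task-based routing, with GPT-4o-mini as the last resort. The task names, routing table, and `route` signature are assumptions, and each provider is stubbed as a plain function.

<!-- untested -->
```python
from typing import Callable

# Hypothetical routing table mirroring the list above; the real model_router.py may differ.
ROUTES: dict[str, list[str]] = {
    "classification": ["groq", "openai"],
    "arabic_nlp": ["groq", "openai"],
    "sales_copy": ["claude", "openai"],
    "proposal": ["claude", "openai"],
    "research": ["gemini", "openai"],
    "code": ["deepseek", "openai"],
}
FALLBACK = "openai"  # GPT-4o-mini as the last resort


def route(task: str, prompt: str, providers: dict[str, Callable[[str], str]]) -> tuple[str, str]:
    """Try each provider for `task` in order; return (provider, output) from the first success."""
    errors = []
    for name in ROUTES.get(task, [FALLBACK]):
        try:
            return name, providers[name](prompt)
        except Exception as exc:  # outage, rate limit, missing provider, ...
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```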
### Agent System
Manus-style orchestrator with specialized agents:
- Lead Qualifier, Deal Advisor, Meeting Prep, Sequence Optimizer
- Content Generator, Analytics Reporter, Compliance Checker, Escalation Handler
Event-to-agent routing via `router.py`. Executor handles retry logic and escalation to human.
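Below is a hypothetical sketch of event routing plus the executor's retry-then-escalate behavior. The event names and the limit of 3 attempts are assumptions; the real mapping and retry policy in `router.py` are not described here.

<!-- untested -->
```python
from typing import Callable

# Hypothetical event names; the real router.py mapping is not documented here.
EVENT_ROUTES = {
    "lead.created": "Lead Qualifier",
    "deal.stage_changed": "Deal Advisor",
    "meeting.scheduled": "Meeting Prep",
    "sequence.reply_rate_low": "Sequence Optimizer",
    "proposal.requested": "Content Generator",
    "report.weekly": "Analytics Reporter",
    "message.outbound": "Compliance Checker",
}
ESCALATION_AGENT = "Escalation Handler"


def execute(event: str, agents: dict[str, Callable[[str], str]], max_attempts: int = 3) -> str:
    """Run the routed agent with retries; hand off to escalation after max_attempts failures."""
    agent = EVENT_ROUTES.get(event, ESCALATION_AGENT)  # unknown events go straight to escalation
    for _attempt in range(max_attempts):
        try:
            return agents[agent](event)
        except Exception:
            continue
    return agents[ESCALATION_AGENT](event)
```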
## Security & Compliance
### PDPL (نظام حماية البيانات الشخصية)
- Consent required before any outbound message
- Consent tracks: purpose, channel, timestamp, expiry
- Data subject rights: access, correction, deletion
- Full audit trail for consent changes
- Auto-expire after 12 months
- Penalty: up to SAR 5,000,000 per violation
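The consent rules above can be expressed as a single gate evaluated before any outbound message. This sketch matches consent on exact channel and purpose and approximates "12 months" as 365 days; both are assumptions. The `Consent` fields mirror the list above, but the names are hypothetical. A production check would also write to the audit trail.

<!-- untested -->
```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

CONSENT_VALIDITY = timedelta(days=365)  # "auto-expire after 12 months", approximated


@dataclass
class Consent:
    purpose: str
    channel: str               # e.g. "whatsapp", "email", "sms"
    granted_at: datetime       # timezone-aware
    revoked_at: Optional[datetime] = None


def may_send(consents: list[Consent], channel: str, purpose: str, now: datetime) -> bool:
    """True only if an unrevoked, unexpired consent covers this channel and purpose."""
    return any(
        c.channel == channel
        and c.purpose == purpose
        and c.revoked_at is None
        and now - c.granted_at < CONSENT_VALIDITY
        for c in consents
    )
```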
### Authentication
- JWT tokens via python-jose
- Role-based access: owner, admin, manager, sales_rep, viewer
- Tenant-scoped permissions
### Policy Classes
- **Class A** (Auto-allowed): Reading, testing, documentation, analysis
- **Class B** (Approval required): Migrations, messaging, payments, deployments
- **Class C** (Forbidden): Secret exfiltration, cross-tenant access, ungoverned bulk messaging
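The following sketch shows how an agent action might be gated on these classes. The action names are hypothetical. Treating unknown actions as Class B (fail closed) is an assumption rather than a documented rule.

<!-- untested -->
```python
from enum import Enum
from typing import Optional


class PolicyClass(Enum):
    A = "auto-allowed"
    B = "approval required"
    C = "forbidden"


# Hypothetical action names grouped as in the list above.
POLICY = {
    "read": PolicyClass.A, "run_tests": PolicyClass.A,
    "write_docs": PolicyClass.A, "analyze": PolicyClass.A,
    "run_migration": PolicyClass.B, "send_message": PolicyClass.B,
    "charge_payment": PolicyClass.B, "deploy": PolicyClass.B,
    "export_secrets": PolicyClass.C, "cross_tenant_read": PolicyClass.C,
    "bulk_message_ungoverned": PolicyClass.C,
}


def gate(action: str, approved_by: Optional[str] = None) -> bool:
    """Decide whether an agent may perform `action` now."""
    # Unknown actions are treated as Class B: fail closed rather than auto-allow.
    policy = POLICY.get(action, PolicyClass.B)
    if policy is PolicyClass.C:
        return False
    if policy is PolicyClass.B:
        return approved_by is not None
    return True
```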
## Deployment
- **Containerized**: Docker Compose for all services
- **Reverse proxy**: Nginx
- **CI/CD**: GitHub Actions (feature branch → PR → review → staging → canary 10% → production)
- **Monitoring**: Health checks, error tracking, performance metrics