mirror of
https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools.git
synced 2026-06-18 15:29:36 +00:00
# Dealix Data Lake + Lead Graph Playbook

How to use Dealix as a data ingestion + enrichment + outreach-prep system, not a blast tool.

## Mental model

```
Data Lake (raw_lead_imports + raw_lead_rows)
↓ normalize
Lead Graph (accounts + contacts + signals)
↓ enrich (providers)
Scored Accounts (lead_scores)
↓ suppression check + channel policy
Outreach Queue (always approval_required for first 30 days)
```

Raw rows are kept forever. Outreach happens only after compliance gates pass.

## 4 data types Dealix accepts

| Type | Example source | `source_type` |
|---|---|---|
| Owned | Customer CRM, your own form submissions | `owned` |
| Public | Google Search, Google Maps, business directories | `public` / `google_maps` / `google_search` |
| Paid | Vetted vendor lists with documented allowed-use | `paid` |
| Partner | Co-marketing list with explicit consent | `partner` |

**Never accept:** WhatsApp number lists with no source, scraped LinkedIn profiles,
personal emails without opt-in.

## Required metadata per import

```json
{
  "source_name": "vendor_x_saudi_real_estate_2026",
  "source_type": "paid",
  "allowed_use": "business_contact_research_only",
  "consent_status": "legitimate_interest",
  "risk_level": "high",
  "rows": [...]
}
```

If the vendor can't tell you `source`, `allowed_use`, and `last_updated`, don't buy the list.

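
A pre-upload check of the import metadata could look like the sketch below. It assumes the payload shape from the JSON example above and treats all five metadata fields as required. The names `REQUIRED_FIELDS`, `KNOWN_SOURCE_TYPES`, and `metadata_problems` are illustrative and are not taken from the Dealix codebase.

```python
# Hypothetical helper, not part of Dealix. Field names follow the JSON example above.
REQUIRED_FIELDS = ("source_name", "source_type", "allowed_use", "consent_status", "risk_level")

# source_type values listed in the data-types table.
KNOWN_SOURCE_TYPES = {"owned", "public", "google_maps", "google_search", "paid", "partner"}


def metadata_problems(payload: dict) -> list[str]:
    """Return human-readable problems; an empty list means the metadata looks complete."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not payload.get(name)]
    source_type = payload.get("source_type")
    if source_type and source_type not in KNOWN_SOURCE_TYPES:
        problems.append(f"unknown source_type: {source_type}")
    return problems


example = {
    "source_name": "vendor_x_saudi_real_estate_2026",
    "source_type": "paid",
    "allowed_use": "business_contact_research_only",
    "consent_status": "legitimate_interest",
    "risk_level": "high",
    "rows": [],
}
print(metadata_problems(example))  # []
print(metadata_problems({"source_type": "scraped"}))
```

Returning a list of problems rather than raising on the first one lets you send the vendor a single, complete list of what is missing.
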
## Step-by-step ingestion

### 1. Audit the file BEFORE upload

```bash
python scripts/audit_lead_file.py vendor_file.csv
```

The script reports acceptance rate, phone/email validity, and dedup risk. If the
acceptance rate is below 50%, reject the file or ask the vendor to clean it.

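
To make the 50% threshold concrete, here is a rough sketch of how an acceptance rate could be computed. It is not the logic of `scripts/audit_lead_file.py`, which is not shown in this playbook. The column names `email` and `phone`, the loose validity checks, and the rule "a row is accepted if it has at least one usable contact field" are all assumptions.

```python
import csv
import io
import re

# Deliberately loose checks, for illustration only; the real script's rules may differ.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_email(value: str) -> bool:
    return bool(EMAIL_RE.match(value.strip()))


def looks_like_phone(value: str) -> bool:
    # Count digits only, so "+966 50 123 4567" and "966501234567" are treated alike.
    digits = re.sub(r"\D", "", value)
    return 8 <= len(digits) <= 15


def acceptance_rate(csv_text: str) -> float:
    """Share of rows with at least one usable contact field (assumed columns: email, phone)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    accepted = sum(
        1
        for row in rows
        if looks_like_email(row.get("email") or "") or looks_like_phone(row.get("phone") or "")
    )
    return accepted / len(rows)


sample = "name,email,phone\nA,a@example.com,\nB,not-an-email,123\nC,,+966 50 123 4567\nD,,\n"
rate = acceptance_rate(sample)
print(f"acceptance: {rate:.0%}")  # acceptance: 50%
if rate < 0.5:
    print("reject the file or ask the vendor to clean it")
```
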
### 2. Upload

```bash
python scripts/import_leads.py vendor_file.csv \
  --source-name "vendor_x_2026_q2" \
  --source-type paid \
  --allowed-use "business_contact_research_only" \
  --risk-level high \
  --auto-pipeline
```

`--auto-pipeline` runs normalize → dedupe → enrich automatically.

### 3. Or call the API directly

```
POST /api/v1/data/import
POST /api/v1/data/import/{id}/normalize
POST /api/v1/data/import/{id}/dedupe
POST /api/v1/data/import/{id}/enrich    body: {"enrichment_level": "standard", "max_accounts": 25}
GET  /api/v1/data/import/{id}/report
```

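
The post-import calls above can be scripted in order. The sketch below only builds the requests and does not send them. The host in `BASE_URL` is a placeholder, the import id `"123"` is made up, and authentication is omitted because the playbook does not describe it. How the id is returned by `POST /api/v1/data/import` is likewise not documented here.

```python
import json
import urllib.request

BASE_URL = "https://dealix.example.com"  # placeholder host, not a real endpoint


def pipeline_requests(import_id: str) -> list[urllib.request.Request]:
    """Build (but do not send) the post-import calls in the order listed above."""
    prefix = f"{BASE_URL}/api/v1/data/import/{import_id}"
    enrich_body = json.dumps({"enrichment_level": "standard", "max_accounts": 25}).encode()
    return [
        urllib.request.Request(f"{prefix}/normalize", method="POST"),
        urllib.request.Request(f"{prefix}/dedupe", method="POST"),
        urllib.request.Request(
            f"{prefix}/enrich",
            data=enrich_body,
            headers={"Content-Type": "application/json"},
            method="POST",
        ),
        urllib.request.Request(f"{prefix}/report", method="GET"),
    ]


for req in pipeline_requests("123"):
    print(req.get_method(), req.full_url)
```

Each request could then be sent with `urllib.request.urlopen(req)`, stopping at the first failure so that enrichment never runs on rows that were not normalized and deduped.
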
### 4. Discover local Saudi sectors via Google Maps

```bash
python scripts/discover_local_to_csv.py dental_clinic riyadh --max 20
# wrote 20 rows → dental_clinic_riyadh.csv

python scripts/import_leads.py dental_clinic_riyadh.csv \
  --source-name "maps_dental_clinic_riyadh" \
  --source-type google_maps \
  --auto-pipeline
```

### 5. Suppress opt-outs

```
POST /api/v1/data/suppression
body: {"email": "...", "reason": "opt_out_request_2026_04"}
```

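
A suppression list is only useful if lookups match the address regardless of how it was typed. The sketch below is an in-memory stand-in that illustrates the idea. The class name and the assumption that matching ignores case and surrounding whitespace are mine; the playbook does not describe how Dealix stores or matches suppressed addresses.

```python
from typing import Optional


class SuppressionList:
    """In-memory stand-in for the suppression store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._reasons: dict[str, str] = {}

    @staticmethod
    def _key(email: str) -> str:
        # Assumption: matching is case-insensitive and ignores surrounding whitespace.
        return email.strip().lower()

    def add(self, email: str, reason: str) -> None:
        self._reasons[self._key(email)] = reason

    def reason_for(self, email: str) -> Optional[str]:
        """Return the recorded reason, or None if the address is not suppressed."""
        return self._reasons.get(self._key(email))


suppressed = SuppressionList()
suppressed.add("Owner@Example.com ", "opt_out_request_2026_04")
print(suppressed.reason_for("owner@example.com"))  # opt_out_request_2026_04
print(suppressed.reason_for("other@example.com"))  # None
```
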
### 6. Prepare outreach

```
POST /api/v1/outreach/prepare-from-data
body: {"priority": ["P0","P1"], "max_accounts": 25, "persist": true}
```

Returns `ready` / `needs_review` / `blocked` lists. Persisted rows go to
`outreach_queue` with `approval_required=True`; Sami still approves manually.

### 7. Export a CSV for human send

```bash
python scripts/export_outreach_ready.py --priority P0,P1 --max 50 \
  --out today_outreach_50.csv
```

## Compliance guardrails (already enforced)

- Suppression hits → `blocked`, never queued.
- `opt_out=true` on contact → `blocked`.
- `risk_level=high` → `needs_review`, requires explicit approval.
- Missing `allowed_use` → `needs_review`.
- All queue rows have `approval_required=True` for the first 30 days.

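
Read as a decision procedure, the guardrails above check hard blocks first and review triggers second. The sketch below is one way to express that ordering. The `Candidate` dataclass and the `gate` function are hypothetical; the real enforcement code is not shown in this playbook and may evaluate the rules differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical shape; the real queue rows may carry different fields.
    email: str
    opt_out: bool = False
    risk_level: str = "low"
    allowed_use: Optional[str] = None


def gate(candidate: Candidate, suppressed_emails: set[str]) -> str:
    """Apply the guardrails in order: hard blocks first, then review triggers."""
    # suppressed_emails is assumed to hold lowercased addresses.
    if candidate.email.lower() in suppressed_emails:
        return "blocked"
    if candidate.opt_out:
        return "blocked"
    if candidate.risk_level == "high" or not candidate.allowed_use:
        return "needs_review"
    # "ready" rows are still queued with approval_required=True during the first 30 days.
    return "ready"


suppressed_emails = {"gone@example.com"}
print(gate(Candidate("gone@example.com", allowed_use="research"), suppressed_emails))  # blocked
print(gate(Candidate("a@example.com", risk_level="high", allowed_use="research"), suppressed_emails))  # needs_review
print(gate(Candidate("b@example.com", allowed_use="research"), suppressed_emails))  # ready
```

Checking blocks before review triggers means a suppressed, high-risk contact is reported as `blocked` and never reaches a reviewer.
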
## Data quality scoring

Each account gets a `data_quality_score` 0..100 based on field completeness +
signal coverage − negatives (no source, opt-out, high risk). See
`auto_client_acquisition/pipelines/scoring.py::compute_data_quality`.

`/api/v1/data/accounts?priority=P0` lets you pull the highest-DQ + highest-score
accounts ready for action.

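
The shape of that formula can be illustrated with a small sketch. This is not `compute_data_quality`. The weights (60 points for completeness, 40 for signal coverage, 20 off per negative flag) are invented for the example, and the function name and inputs are assumptions.

```python
def data_quality_score(completeness: float, signal_coverage: float, negatives: int) -> int:
    """Illustrative score in 0..100.

    completeness and signal_coverage are fractions in [0, 1]; negatives counts
    flags such as no source, opt-out, or high risk.
    """
    # Hypothetical weights, chosen only to show the "add positives, subtract negatives" shape.
    raw = 60 * completeness + 40 * signal_coverage - 20 * negatives
    # Clamp so heavy penalties cannot push the score below 0 or above 100.
    return max(0, min(100, round(raw)))


print(data_quality_score(1.0, 1.0, 0))   # 100
print(data_quality_score(0.5, 0.25, 1))  # 20
print(data_quality_score(0.2, 0.0, 3))   # 0
```
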
## Google Maps cache policy

Per Google Maps Platform terms, we store `place_id` (allowed) and refresh details
on demand rather than caching everything forever. See
`auto_client_acquisition/connectors/google_maps.py::discover_local`.

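
A minimal sketch of "keep the `place_id`, refresh details on demand" follows. It is not the code in `google_maps.py`. The class name, the injected `fetch` callable, and the TTL value are placeholders; choose the actual retention window from the current Google Maps Platform terms, not from this example.

```python
import time
from typing import Callable


class PlaceDetailsCache:
    """Keep place_id as the long-lived key; treat fetched details as short-lived."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # stand-in for a real Places details call
        self._ttl = ttl_seconds      # placeholder value, see note above
        self._clock = clock
        self._details: dict[str, tuple[float, dict]] = {}

    def get(self, place_id: str) -> dict:
        now = self._clock()
        cached = self._details.get(place_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        details = self._fetch(place_id)  # stale or missing: refresh on demand
        self._details[place_id] = (now, details)
        return details


def fake_fetch(place_id: str) -> dict:
    # Fake fetcher so the example runs without network access.
    return {"place_id": place_id, "name": "Example Clinic"}


cache = PlaceDetailsCache(fake_fetch, ttl_seconds=3600)
print(cache.get("example-place-id")["name"])  # Example Clinic
```

Injecting the clock keeps expiry testable without sleeping.
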
## Don't do

- ❌ Auto-send from raw rows. Always normalize → dedupe → enrich → score → queue.
- ❌ Cold-blast WhatsApp. WhatsApp is inbound only in Dealix.
- ❌ Scrape LinkedIn. Use it for manual research only.
- ❌ Use a list with no `source` or `allowed_use`. Reject the data.
- ❌ Send unapproved messages in the first 30 days of any new customer.