# Dealix Lead Intelligence Router — v1 Spec

**What it is:** A legal, evidence-based engine that discovers, enriches, scores, and routes leads (companies and people) against a natural-language Ideal Customer Profile (ICP), for five use cases: `sales`, `partnership`, `collaboration`, `investor`, and `b2c_audience`.

**What it is not:** A scraper. It does no unauthorized LinkedIn automation, sends no bot messages, and uses no browser extensions that bypass anti-bot defenses.

---

## Core pipeline (10 stages)

```
(1) ICP intake           →  natural-language goal  →  structured ICP + signals required
(2) Source routing       →  decide connectors to call based on what's configured
(3) Discovery            →  produce candidate companies with source attribution
(4) Enrichment           →  technographics + firmographics + public pages
(5) Signal detection     →  buying / partnership / collab / investor / b2c signals
(6) Decision-makers      →  legal role + contact surface identification
(7) Scoring              →  100-point model → priority
(8) Personalization      →  short Arabic/English message tailored to evidence
(9) Compliance check     →  source, opt-out, channel legality, jurisdiction
(10) Export              →  CSV / pipeline_tracker.csv / CRM / GitHub issue
```

Every stage emits an **evidence record** with five fields: `claim`, `source_url`, `source_type`, `collected_at`, and `confidence`.
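One way to picture this contract is a pipeline runner that refuses to advance unless each stage hands back an evidence record. The sketch below is illustrative only: `Evidence`, `LeadContext`, `icp_intake`, and `run_pipeline` are hypothetical names, not the shipped code, and tagging the ICP text as `manual` is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Evidence:
    claim: str
    source_url: Optional[str]  # None when no source exists; never invented
    source_type: str           # website | api | public_page | manual | customer_csv
    collected_at: str          # ISO 8601 timestamp in UTC
    confidence: int            # 0-100


@dataclass
class LeadContext:
    icp_text: str
    evidence: List[Evidence] = field(default_factory=list)


def utc_now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def icp_intake(ctx: LeadContext) -> Evidence:
    # Hypothetical stage 1: records that the ICP text came from the user.
    return Evidence(
        claim=f"ICP provided: {ctx.icp_text}",
        source_url=None,
        source_type="manual",
        collected_at=utc_now(),
        confidence=100,
    )


Stage = Callable[[LeadContext], Evidence]


def run_pipeline(ctx: LeadContext, stages: List[Stage]) -> LeadContext:
    # Run stages in order; each one must return an Evidence record.
    for stage in stages:
        record = stage(ctx)
        if not isinstance(record, Evidence):
            raise TypeError(f"{stage.__name__} did not emit an evidence record")
        ctx.evidence.append(record)
    return ctx
```

Calling `run_pipeline(LeadContext("B2B SaaS in Riyadh"), [icp_intake])` would leave one record in `ctx.evidence`; later stages would append their own in the same way.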

---

## Minimum viable v1 (what actually ships today)

| Stage | v1 implementation | Connector needed |
|-------|-------------------|------------------|
| 1. ICP intake | Free-text form on landing + `POST /api/v1/prospect/discover` body | — |
| 2. Source routing | LLM-native (Claude/Gemini); relies on the model's training knowledge of the Saudi market | — |
| 3. Discovery | LLM-produced candidates with strict "no invention" prompt | (later: Google CSE) |
| 4. Enrichment | LLM + optional manual lookup | (later: Wappalyzer API) |
| 5. Signals | LLM extracts from its knowledge + prompt-provided evidence | (later: job-post/news crawlers) |
| 6. Decision-makers | LLM names public founders/execs; URL only if high-confidence | (later: Apollo People API) |
| 7. Scoring | 100-point model in [ICP_SCORING_MODEL.md](./ICP_SCORING_MODEL.md) | — |
| 8. Personalization | LLM generates ≤280-char Khaliji opening referencing one evidence item | — |
| 9. Compliance | Static checks: channel ≠ LinkedIn-bot; email has opt-out; no PII fabrication | — |
| 10. Export | JSON response → landing UI + `docs/ops/lead_machine/TOP_10_SCORED.csv` | — |

**Ships now.** Connector upgrades slot in later behind env vars (`GOOGLE_SEARCH_API_KEY`, `WAPPALYZER_API_KEY`, `APOLLO_API_KEY`) without changing the pipeline shape.
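A minimal sketch of how source routing might pick connectors from the environment. The three env var names come from this spec; the connector labels and the `available_connectors` helper are assumptions made for illustration.

```python
import os
from typing import Dict, List, Mapping

# Env var names are from the spec; the connector labels are illustrative.
CONNECTOR_ENV_VARS: Dict[str, str] = {
    "google_search": "GOOGLE_SEARCH_API_KEY",
    "wappalyzer": "WAPPALYZER_API_KEY",
    "apollo": "APOLLO_API_KEY",
}


def available_connectors(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the connectors that can be called with the current config.

    The LLM path is always present, matching the v1 behaviour; every other
    connector is enabled only when its API key is set and non-blank.
    """
    enabled = ["llm"]
    for name, var in CONNECTOR_ENV_VARS.items():
        if env.get(var, "").strip():
            enabled.append(name)
    return enabled
```

With no keys configured the function returns only `["llm"]`, so the pipeline shape stays the same whether or not upgrades are switched on.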

---

## Legal boundaries (hard rules)

Allowed:
- Google Custom Search API (100 free queries/day)
- Bing Search API
- Wappalyzer API (technographics)
- Apollo API (people search, enrichment — within plan limits)
- Company public pages (about, careers, pricing, partners, integrations, case studies)
- Public job postings (GulfTalent, Bayt, LinkedIn Jobs public listings)
- Public funding / press pages (MAGNiTT, Crunchbase public, Wamda, ArabNews)
- Customer-provided CSVs
- Manual LinkedIn research (human-driven, browser, no automation)

Not allowed:
- LinkedIn scraping via bots or browser extensions
- Automated LinkedIn DM sending (violates ToS, risks account ban)
- Bypassing anti-bot systems
- Harvesting private/authenticated data
- Storing sensitive PII without operational need
- Mass email spam
- Deceptive outreach or impersonation
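Two of these rules, the ban on automated LinkedIn outreach and the opt-out requirement for email, lend themselves to static checks before a message is exported. The sketch below assumes a hypothetical `compliance_issues` helper and an assumed list of opt-out phrases; neither is taken from the shipped code.

```python
from typing import List

# Assumed opt-out phrases; a real deployment would maintain its own list.
OPT_OUT_MARKERS = ("unsubscribe", "opt out", "إلغاء الاشتراك")


def compliance_issues(channel: str, message: str) -> List[str]:
    """Return human-readable problems; an empty list means the draft passes."""
    issues: List[str] = []
    # Any LinkedIn channel other than the manual one implies automation.
    if channel.lower().startswith("linkedin") and channel != "LinkedIn_manual":
        issues.append("LinkedIn outreach must be manual")
    # Email drafts must carry at least one recognizable opt-out phrase.
    if channel == "email":
        text = message.lower()
        if not any(marker in text for marker in OPT_OUT_MARKERS):
            issues.append("email lacks an opt-out line")
    return issues
```

Returning a list of issues rather than a boolean keeps the reason for a rejection visible, which could feed the `compliance_note` on the lead record.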

Every lead record **must include**:
- `source` — where the claim came from
- `source_type` — website | api | public_page | manual | customer_csv
- `reason` — why this lead is being suggested
- `confidence` — 0-100
- `recommended_channel` — LinkedIn_manual | email | partner_intro | phone | in_person
- `compliance_note` — short string stating legal basis
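The required fields above can be checked mechanically before a lead leaves the pipeline. This is a sketch under assumptions: `validate_lead` is a hypothetical helper, and treating `confidence` as an integer is an assumption (the spec only gives the 0-100 range). The enum values are copied from the list above.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = (
    "source", "source_type", "reason",
    "confidence", "recommended_channel", "compliance_note",
)
SOURCE_TYPES = {"website", "api", "public_page", "manual", "customer_csv"}
CHANNELS = {"LinkedIn_manual", "email", "partner_intro", "phone", "in_person"}


def validate_lead(lead: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; empty means the record is complete."""
    # A field counts as missing when absent, None, or an empty string.
    errors = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if lead.get(name) in (None, "")
    ]
    if errors:
        return errors
    if lead["source_type"] not in SOURCE_TYPES:
        errors.append(f"unknown source_type: {lead['source_type']}")
    conf = lead["confidence"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, int) or not 0 <= conf <= 100:
        errors.append("confidence must be an integer from 0 to 100")
    if lead["recommended_channel"] not in CHANNELS:
        errors.append(f"unknown recommended_channel: {lead['recommended_channel']}")
    return errors
```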

---

## Use cases (5 supported)

| Use case | Who | Signal priority | Recommended channel |
|----------|-----|-----------------|----------------------|
| `sales` | B2B decision-makers with budget | CRM + booking tool + hiring sales + paid ads + recent funding | LinkedIn manual, email |
| `partnership` | agencies, integrators, resellers | agency service + SME customer base + retainer model + complementary tech | LinkedIn manual, partner form |
| `collaboration` | founders, creators, thought leaders | public content on sales/growth + newsletter + podcast + community | LinkedIn manual, email |
| `investor` | VCs, angels active in MENA SaaS/AI | portfolio overlap + recent thesis posts + MENA mandate | warm intro, LinkedIn manual |
| `b2c_audience` | consumer audiences | demographics + behavior + purchase channels | paid ads, WhatsApp broadcast, content |

Each use case has a different scoring weight profile defined in [ICP_SCORING_MODEL.md](./ICP_SCORING_MODEL.md).
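To make the idea of a per-use-case weight profile concrete, here is a rough sketch. The weights and signal keys below are placeholders invented for illustration; the real values live in ICP_SCORING_MODEL.md and are not reproduced here. The only property carried over from the spec is that each profile totals 100 points.

```python
from typing import Dict, Set

# PLACEHOLDER weights, not the real model. Signal keys are hypothetical
# names loosely based on the "Signal priority" column above.
HYPOTHETICAL_WEIGHTS: Dict[str, Dict[str, int]] = {
    "sales": {
        "uses_crm": 25,
        "booking_tool": 15,
        "hiring_sales": 20,
        "paid_ads": 15,
        "recent_funding": 25,
    },
    "investor": {
        "portfolio_overlap": 40,
        "recent_thesis_posts": 25,
        "mena_mandate": 35,
    },
}


def score_lead(use_case: str, detected_signals: Set[str]) -> int:
    """Sum the weights of the signals that were detected for this lead."""
    weights = HYPOTHETICAL_WEIGHTS[use_case]
    if sum(weights.values()) != 100:
        raise ValueError(f"weights for {use_case} must total 100")
    return sum(w for signal, w in weights.items() if signal in detected_signals)
```

Signals that are not part of the chosen profile are ignored, so the same detected set can be scored against several use cases.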

---

## API surface (shipped)

```
POST /api/v1/prospect/discover
  body: {"icp": str, "use_case": str, "count": int}
  returns: ProspectResult JSON (see LEAD_OUTPUT_SCHEMA.json)

POST /api/v1/prospect/demo
  returns: canned 3-lead preview for UI smoke test

GET  /api/v1/prospect/use-cases
  returns: {use_cases: {...}, max_count: 20}
```
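A client might validate the request body locally before calling `POST /api/v1/prospect/discover`. The sketch below assumes a hypothetical `build_discover_request` helper; the five use cases and the `max_count` of 20 come from this spec, while the lower bound of 1 and the non-empty `icp` rule are assumptions.

```python
import json

USE_CASES = {"sales", "partnership", "collaboration", "investor", "b2c_audience"}
MAX_COUNT = 20  # value reported by GET /api/v1/prospect/use-cases


def build_discover_request(icp: str, use_case: str, count: int) -> bytes:
    """Validate the inputs and return a UTF-8 JSON body for the discover call."""
    if not icp.strip():
        raise ValueError("icp must not be empty")
    if use_case not in USE_CASES:
        raise ValueError(f"unknown use_case: {use_case}")
    if not 1 <= count <= MAX_COUNT:
        raise ValueError(f"count must be between 1 and {MAX_COUNT}")
    payload = {"icp": icp, "use_case": use_case, "count": count}
    # ensure_ascii=False keeps Arabic ICP text readable in logs.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")
```

The returned bytes can be sent with any HTTP client using a `Content-Type: application/json` header; the host and authentication scheme are deployment-specific and not covered here.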

---

## Evidence store (contract every field must honor)

```json
{
  "claim": "Foodics raised a $170M Series C in 2022",
  "source_url": "https://magnitt.com/...",
  "source_type": "public_page",
  "collected_at": "2026-04-24T15:40:00Z",
  "confidence": 85
}
```

Fields without a source must be `null`. No invented URLs. No invented phone numbers. No invented emails.
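The null rule can be applied as a final scrub over any field an LLM stage produced. The following is a sketch, assuming each candidate field arrives as a small dict with `value` and `source` keys; that shape and both helper names are hypothetical.

```python
from typing import Any, Dict, Optional


def sourced_or_null(value: Any, source: Optional[str]) -> Any:
    """Keep a value only when a non-blank source backs it; otherwise None."""
    if value is None or not (source or "").strip():
        return None
    return value


def scrub_fields(fields: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Map each field name to its value, or to None when it has no source."""
    return {
        name: sourced_or_null(entry.get("value"), entry.get("source"))
        for name, entry in fields.items()
    }
```

Nulling rather than dropping the key keeps the output shape stable, so downstream consumers can tell "unknown" apart from "field not supported".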

---

## Feedback loop (the moat)

Every lead that moves through outreach stages writes back:
- `sent_at` — when outreach went out
- `replied_at` + `reply_sentiment` — positive / neutral / negative
- `demo_booked_at`
- `paid_at` + `revenue_sar`
- `lost_reason` — objection category

This data flows back into:
- Signal weights (which signals actually predict conversion?)
- Message angles (which openings actually got replies?)
- Segment priority (which segments closed fastest?)
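As an illustration of how written-back outcomes could inform signal weights, the sketch below computes a positive-reply rate per signal. It assumes each outcome record also carries a `signals` list naming the signals that triggered the outreach; that field and the `reply_rate_by_signal` helper are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def reply_rate_by_signal(outcomes: List[Dict[str, Any]]) -> Dict[str, float]:
    """Share of sent outreaches per signal that got a positive reply."""
    sent: Dict[str, int] = defaultdict(int)
    positive: Dict[str, int] = defaultdict(int)
    for outcome in outcomes:
        if not outcome.get("sent_at"):
            continue  # never sent, so it says nothing about conversion
        got_positive = (
            bool(outcome.get("replied_at"))
            and outcome.get("reply_sentiment") == "positive"
        )
        for signal in outcome.get("signals", []):
            sent[signal] += 1
            if got_positive:
                positive[signal] += 1
    return {signal: positive[signal] / sent[signal] for signal in sent}
```

The same grouping could be repeated with `demo_booked_at` or `paid_at` as the success condition to compare signals further down the funnel.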

**This is the sovereign layer.** Apollo has 300M contacts; we have the ground-truth feedback loop for the Saudi/GCC B2B market.

---

## Product positioning

> **Arabic-first Lead Intelligence + AI Sales Operations layer for companies and agencies that need to qualify leads, book demos, follow up, and convert inbound/outbound interest into revenue.**

Not: generic chatbot · scraping tool · AI-for-everything · spam machine.

Differentiators:
1. Arabic-first GTM (Saudi Khaliji dialect output by default)
2. Saudi/GCC signal intelligence (local CR, local hiring boards, local funding press)
3. Manual-to-automation ops model (ship by hand, automate what works)
4. Agency/reseller motion (built-in commission model, white-label path)
5. Evidence-backed lead scoring (every claim has a source)
6. Payment + onboarding workflow (Moyasar automated + manual fallback)
7. Founder-led launch engine (sovereign dataset grows with every outreach)
8. Legally safer sourcing (no scraping lock-in risk)

---

## Files in this directory

- [SIGNAL_TAXONOMY.md](./SIGNAL_TAXONOMY.md) — the signal dictionary by use case
- [ICP_SCORING_MODEL.md](./ICP_SCORING_MODEL.md) — 100-point scoring model with weights
- [LEAD_OUTPUT_SCHEMA.json](./LEAD_OUTPUT_SCHEMA.json) — canonical JSON schema
- [TOP_10_SCORED.csv](./TOP_10_SCORED.csv) — today's top 10 leads scored by this model
- [CONNECTOR_ENV_VARS.md](./CONNECTOR_ENV_VARS.md) — required env vars when upgrading connectors