2026-05-01 14:03:52 +03:00

Dealix Lead Intelligence Router — v1 Spec

What it is: A legal, evidence-based engine that discovers, enriches, scores, and routes leads (companies + people) against a natural-language Ideal Customer Profile, for 5 use cases: sales, partnership, collaboration, investor, b2c_audience.

What it is not: a scraper. No unauthorized LinkedIn automation, no bot messaging, no browser extensions that bypass anti-bot defenses.


Core pipeline (10 stages)

(1) ICP intake           →  natural-language goal  →  structured ICP + signals required
(2) Source routing       →  decide connectors to call based on what's configured
(3) Discovery            →  produce candidate companies with source attribution
(4) Enrichment           →  technographics + firmographics + public pages
(5) Signal detection     →  buying / partnership / collab / investor / b2c signals
(6) Decision-makers      →  legal role + contact surface identification
(7) Scoring              →  100-point model → priority
(8) Personalization      →  short Arabic/English message tailored to evidence
(9) Compliance check     →  source, opt-out, channel legality, jurisdiction
(10) Export              →  CSV / pipeline_tracker.csv / CRM / GitHub issue

Every stage emits an evidence record — claim + source_url + source_type + collected_at + confidence.
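As a rough illustration of the evidence record described above, the sketch below models it as a small Python dataclass. The class name `EvidenceRecord`, the helper `make_evidence`, the 0-100 confidence range, and the reuse of the lead-record `source_type` values are assumptions made for this example, not the shipped implementation.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

# Assumption: evidence records share the source_type vocabulary of lead records.
SOURCE_TYPES = {"website", "api", "public_page", "manual", "customer_csv"}


@dataclass(frozen=True)
class EvidenceRecord:
    """One claim plus where and when it was collected (hypothetical class)."""
    claim: str
    source_url: Optional[str]  # None when no source is known; never invented
    source_type: str
    collected_at: str          # UTC timestamp in ISO 8601 form
    confidence: int            # assumed to be on a 0-100 scale

    def __post_init__(self):
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source_type: {self.source_type!r}")
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")


def make_evidence(claim, source_url, source_type, confidence, now=None):
    """Build a record stamped with the current (or supplied) UTC time."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return EvidenceRecord(claim, source_url, source_type, stamp, confidence)
```

Calling `asdict(record)` yields a plain dict with the five keys listed above, which could then be serialized alongside the stage output.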


Minimum viable v1 (what actually ships today)

| Stage | v1 implementation | Connector needed |
|---|---|---|
| 1. ICP intake | Free-text form on landing + POST /api/v1/prospect/discover body | |
| 2. Source routing | LLM-native (Claude/Gemini): uses training knowledge of Saudi market | |
| 3. Discovery | LLM-produced candidates with strict "no invention" prompt | later: Google CSE |
| 4. Enrichment | LLM + optional manual lookup | later: Wappalyzer API |
| 5. Signals | LLM extracts from its knowledge + prompt-provided evidence | later: job-post/news crawlers |
| 6. Decision-makers | LLM names public founders/execs; URL only if high-confidence | later: Apollo People API |
| 7. Scoring | 100-point model in ICP_SCORING_MODEL.md | |
| 8. Personalization | LLM generates ≤280-char Khaliji opening referencing one evidence item | |
| 9. Compliance | Static checks: channel ≠ LinkedIn-bot; email has opt-out; no PII fabrication | |
| 10. Export | JSON response → landing UI + docs/ops/lead_machine/TOP_10_SCORED.csv | |

Ships now. Connector upgrades slot in later behind env vars (GOOGLE_SEARCH_API_KEY, WAPPALYZER_API_KEY, APOLLO_API_KEY) without changing the pipeline shape.
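One way the env-var gating could look is sketched below. The three variable names come from the paragraph above; the connector labels, the `select_connectors` function, and the choice to always include the LLM path are assumptions for illustration only.

```python
import os

# Env var names are from the spec; the connector labels are hypothetical.
CONNECTOR_ENV_VARS = {
    "google_cse": "GOOGLE_SEARCH_API_KEY",
    "wappalyzer": "WAPPALYZER_API_KEY",
    "apollo": "APOLLO_API_KEY",
}


def select_connectors(env=None):
    """Return the connectors to call, based on which API keys are configured.

    Assumption: the LLM-native path is always available, and each optional
    connector is enabled only when its env var holds a non-empty value.
    """
    env = os.environ if env is None else env
    enabled = [name for name, var in CONNECTOR_ENV_VARS.items() if env.get(var)]
    return ["llm"] + enabled
```

With no keys set this returns only `["llm"]`, which matches the v1 behavior; setting `APOLLO_API_KEY` would add `"apollo"` without any change to the calling code.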


Allowed:

  • Google Custom Search API (100 free queries/day)
  • Bing Search API
  • Wappalyzer API (technographics)
  • Apollo API (people search, enrichment — within plan limits)
  • Company public pages (about, careers, pricing, partners, integrations, case studies)
  • Public job postings (GulfTalent, Bayt, LinkedIn Jobs public listings)
  • Public funding / press pages (MAGNiTT, Crunchbase public, Wamda, ArabNews)
  • Customer-provided CSVs
  • Manual LinkedIn research (human-driven, browser, no automation)

Not allowed:

  • LinkedIn scraping via bots or browser extensions
  • Automated LinkedIn DM sending (violates ToS, risks account ban)
  • Bypassing anti-bot systems
  • Harvesting private/authenticated data
  • Storing sensitive PII without operational need
  • Mass email spam
  • Deceptive outreach or impersonation

Every lead record must include:

  • source — where the claim came from
  • source_type — website | api | public_page | manual | customer_csv
  • reason — why this lead is being suggested
  • confidence — 0-100
  • recommended_channel — LinkedIn_manual | email | partner_intro | phone | in_person
  • compliance_note — short string stating legal basis
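
The required-field list above lends itself to a simple static check. The following is a minimal sketch, assuming lead records are plain dicts; the function name `validate_lead` and the error message wording are hypothetical.

```python
REQUIRED_FIELDS = (
    "source", "source_type", "reason",
    "confidence", "recommended_channel", "compliance_note",
)
SOURCE_TYPES = {"website", "api", "public_page", "manual", "customer_csv"}
CHANNELS = {"LinkedIn_manual", "email", "partner_intro", "phone", "in_person"}


def validate_lead(lead):
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        # A confidence of 0 is valid, so only None and "" count as missing.
        if lead.get(field) in (None, ""):
            errors.append(f"missing {field}")

    source_type = lead.get("source_type")
    if source_type not in (None, "") and source_type not in SOURCE_TYPES:
        errors.append(f"invalid source_type: {source_type}")

    channel = lead.get("recommended_channel")
    if channel not in (None, "") and channel not in CHANNELS:
        errors.append(f"invalid recommended_channel: {channel}")

    confidence = lead.get("confidence")
    if confidence is not None and confidence != "":
        if not isinstance(confidence, int) or not 0 <= confidence <= 100:
            errors.append("confidence must be an integer from 0 to 100")
    return errors
```

A record with an automated-LinkedIn channel would fail here simply because that channel is not in the allowed set.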

Use cases (5 supported)

| Use case | Who | Signal priority | Recommended channel |
|---|---|---|---|
| sales | B2B decision-makers w/ budget | CRM + booking tool + hiring sales + paid ads + recent funding | LinkedIn manual, email |
| partnership | agencies, integrators, resellers | agency service + SME customer base + retainer model + complementary tech | LinkedIn manual, partner form |
| collaboration | founders, creators, thought leaders | public content on sales/growth + newsletter + podcast + community | LinkedIn manual, email |
| investor | VCs, angels active in MENA | SaaS/AI portfolio overlap + recent thesis posts + MENA mandate | warm intro, LinkedIn manual |
| b2c_audience | consumer audiences | demographics + behavior + purchase channels | paid ads, WhatsApp broadcast, content |

Each use case has a different scoring weight profile defined in ICP_SCORING_MODEL.md.
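
To make the idea of per-use-case weight profiles concrete, here is a small sketch. The real weights live in ICP_SCORING_MODEL.md and are not reproduced here; every number and signal key below is a placeholder chosen only so that each profile sums to 100.

```python
# Placeholder weights, NOT the values from ICP_SCORING_MODEL.md.
WEIGHT_PROFILES = {
    "sales": {
        "crm": 25, "booking_tool": 15, "hiring_sales": 25,
        "paid_ads": 15, "recent_funding": 20,
    },
    "investor": {
        "portfolio_overlap": 40, "recent_thesis_posts": 25, "mena_mandate": 35,
    },
}


def score_lead(use_case, detected_signals):
    """Sum the weights of the signals detected for this lead (max 100)."""
    profile = WEIGHT_PROFILES[use_case]
    return sum(weight for name, weight in profile.items() if name in detected_signals)
```

The same set of detected signals can therefore produce different scores depending on the use case, which is the point of keeping one profile per use case.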


API surface (shipped)

POST /api/v1/prospect/discover
  body: {"icp": str, "use_case": str, "count": int}
  returns: ProspectResult JSON (see LEAD_OUTPUT_SCHEMA.json)

POST /api/v1/prospect/demo
  returns: canned 3-lead preview for UI smoke test

GET  /api/v1/prospect/use-cases
  returns: {use_cases: {...}, max_count: 20}
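
A client might validate and build the discover request body before sending it. The sketch below assumes the five use-case names from this spec and the `max_count` of 20 reported by the use-cases endpoint; the function name and the lower bound of 1 are assumptions.

```python
import json

USE_CASES = {"sales", "partnership", "collaboration", "investor", "b2c_audience"}
MAX_COUNT = 20  # value reported by GET /api/v1/prospect/use-cases


def build_discover_body(icp, use_case, count):
    """Return the JSON body for POST /api/v1/prospect/discover."""
    if not icp.strip():
        raise ValueError("icp must be a non-empty description")
    if use_case not in USE_CASES:
        raise ValueError(f"unsupported use_case: {use_case}")
    if not 1 <= count <= MAX_COUNT:  # assumed lower bound of 1
        raise ValueError(f"count must be between 1 and {MAX_COUNT}")
    # ensure_ascii=False keeps Arabic ICP text readable in the payload.
    return json.dumps(
        {"icp": icp, "use_case": use_case, "count": count},
        ensure_ascii=False,
    )
```

The returned string can be passed as the request body by any HTTP client with a `Content-Type: application/json` header.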

Evidence store (contract every field must honor)

{
  "claim": "Foodics raised a $170M Series C in 2022",
  "source_url": "https://magnitt.com/...",
  "source_type": "public_page",
  "collected_at": "2026-04-24T15:40:00Z",
  "confidence": 85
}

Fields without a source must be null. No invented URLs. No invented phone numbers. No invented emails.
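
The null-when-unsourced rule could be enforced with a small post-processing step, sketched below. The guarded field names and the shape of the `sources` mapping (field name to source URL) are hypothetical choices for this example.

```python
# Hypothetical names for the contact fields that must never be invented.
GUARDED_FIELDS = ("email", "phone", "profile_url")


def null_unsourced(lead, sources):
    """Return a copy of lead with unsourced guarded fields set to None.

    sources maps a field name to the URL it was taken from; a missing or
    empty entry means the value has no evidence behind it.
    """
    cleaned = dict(lead)
    for field in GUARDED_FIELDS:
        if not sources.get(field):
            cleaned[field] = None
    return cleaned
```

Running this after the LLM stages means a hallucinated phone number is dropped unless some stage recorded where it came from.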


Feedback loop (the moat)

Every lead that moves through outreach stages writes back:

  • sent_at — when outreach went out
  • replied_at + reply_sentiment — positive / neutral / negative
  • demo_booked_at
  • paid_at + revenue_sar
  • lost_reason — objection category

This data flows back into:

  • Signal weights (which signals actually predict conversion?)
  • Message angles (which openings actually got replies?)
  • Segment priority (which segments closed fastest?)
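
As one example of how the write-back data could inform signal weights, the sketch below computes a reply rate per signal. The `sent_at` and `replied_at` fields are from the list above; the `signals` list on each outcome record and the function name are assumptions.

```python
from collections import defaultdict


def reply_rate_by_signal(outcomes):
    """Map each signal to replies / sends across leads that were contacted."""
    sent = defaultdict(int)
    replied = defaultdict(int)
    for outcome in outcomes:
        if not outcome.get("sent_at"):
            continue  # never contacted, so it says nothing about reply rates
        for signal in outcome.get("signals", []):
            sent[signal] += 1
            if outcome.get("replied_at"):
                replied[signal] += 1
    return {signal: replied[signal] / sent[signal] for signal in sent}
```

Signals with a consistently higher reply rate would be candidates for a larger weight in the scoring model, subject to having enough outreach volume to trust the ratio.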

This is the sovereign layer. Apollo has 300M contacts; we have the ground-truth feedback loop for the Saudi/GCC B2B market.


Product positioning

Arabic-first Lead Intelligence + AI Sales Operations layer for companies and agencies that need to qualify leads, book demos, follow up, and convert inbound/outbound interest into revenue.

Not: generic chatbot · scraping tool · AI-for-everything · spam machine.

Differentiators:

  1. Arabic-first GTM (Saudi Khaliji dialect output by default)
  2. Saudi/GCC signal intelligence (local commercial registration (CR), local hiring boards, local funding press)
  3. Manual-to-automation ops model (ship by hand, automate what works)
  4. Agency/reseller motion (built-in commission model, white-label path)
  5. Evidence-backed lead scoring (every claim has a source)
  6. Payment + onboarding workflow (Moyasar automated + manual fallback)
  7. Founder-led launch engine (sovereign dataset grows with every outreach)
  8. Legally safer sourcing (no scraping lock-in risk)

Files in this directory