mancitrus/system-prompts-and-models-of-ai-tools

mirror of https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools.git synced 2026-06-18 07:19:35 +00:00

Sami Assiri f79c69ff25 ci(dealix): root GitHub workflows, ai-company track, full Dealix API tree

Made-with: Cursor

2026-05-01 14:03:52 +03:00

7.4 KiB


Dealix Lead Intelligence Router — v1 Spec

What it is: A legal, evidence-based engine that discovers, enriches, scores, and routes leads (companies + people) against a natural-language Ideal Customer Profile, for 5 use cases: sales, partnership, collaboration, investor, b2c_audience.

What it is not: a scraper. No unauthorized LinkedIn automation, no bot messaging, no browser extensions that bypass anti-bot defenses.

Core pipeline (10 stages)

(1) ICP intake           →  natural-language goal  →  structured ICP + signals required
(2) Source routing       →  decide connectors to call based on what's configured
(3) Discovery            →  produce candidate companies with source attribution
(4) Enrichment           →  technographics + firmographics + public pages
(5) Signal detection     →  buying / partnership / collab / investor / b2c signals
(6) Decision-makers      →  legal role + contact surface identification
(7) Scoring              →  100-point model → priority
(8) Personalization      →  short Arabic/English message tailored to evidence
(9) Compliance check     →  source, opt-out, channel legality, jurisdiction
(10) Export              →  CSV / pipeline_tracker.csv / CRM / GitHub issue

Every stage emits an evidence record — claim + source_url + source_type + collected_at + confidence.
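The "every stage emits an evidence record" rule can be enforced mechanically. The following is a minimal sketch, not the shipped implementation: `Evidence`, `PipelineState`, `icp_intake`, and `run_pipeline` are hypothetical names, and only the five evidence fields come from this spec.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Evidence:
    """One evidence record with the five fields named in the spec."""
    claim: str
    source_url: Optional[str]  # None when there is no URL to cite
    source_type: str
    collected_at: str
    confidence: int


@dataclass
class PipelineState:
    """Hypothetical container passed from stage to stage."""
    icp_text: str
    evidence: List[Evidence] = field(default_factory=list)


Stage = Callable[[PipelineState], None]


def utc_now() -> str:
    """Current UTC time in the same format as the spec's collected_at example."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def icp_intake(state: PipelineState) -> None:
    """Stage 1 stand-in: records that the ICP text was supplied by hand."""
    state.evidence.append(Evidence(
        claim="ICP text received: " + state.icp_text,
        source_url=None,
        source_type="manual",
        collected_at=utc_now(),
        confidence=100,
    ))


def run_pipeline(state: PipelineState, stages: List[Stage]) -> PipelineState:
    """Run stages in order and fail fast if a stage adds no evidence."""
    for stage in stages:
        before = len(state.evidence)
        stage(state)
        if len(state.evidence) == before:
            raise RuntimeError(f"{stage.__name__} emitted no evidence record")
    return state
```

Calling `run_pipeline(PipelineState("Saudi B2B SaaS"), [icp_intake])` leaves one record in `state.evidence`. A stage that returns without appending anything raises instead of passing silently.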

Minimum viable v1 (what actually ships today)

Stage	v1 implementation	Connector needed
1. ICP intake	Free-text form on landing + `POST /api/v1/prospect/discover` body	—
2. Source routing	LLM-native (Claude/Gemini) — uses training knowledge of Saudi market	—
3. Discovery	LLM-produced candidates with strict "no invention" prompt	(later: Google CSE)
4. Enrichment	LLM + optional manual lookup	(later: Wappalyzer API)
5. Signals	LLM extracts from its knowledge + prompt-provided evidence	(later: job-post/news crawlers)
6. Decision-makers	LLM names public founders/execs; URL only if high-confidence	(later: Apollo People API)
7. Scoring	100-point model in ICP_SCORING_MODEL.md	—
8. Personalization	LLM generates ≤280-char Khaliji opening referencing one evidence item	—
9. Compliance	Static checks: channel ≠ LinkedIn-bot; email has opt-out; no PII fabrication	—
10. Export	JSON response → landing UI + `docs/ops/lead_machine/TOP_10_SCORED.csv`	—

All ten stages ship now in the form shown above. Connector upgrades slot in later behind env vars (GOOGLE_SEARCH_API_KEY, WAPPALYZER_API_KEY, APOLLO_API_KEY) without changing the pipeline shape.
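One way to keep the pipeline shape fixed while connectors come and go is to resolve each stage's backend from the environment at call time. This is a sketch under assumptions: the three env var names come from this spec, but the stage keys, the connector labels, and `route_sources` are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

# Stage -> (connector label, env var that enables it).
# Env var names come from the spec; stage keys and labels are illustrative.
CONNECTOR_ENV_VARS = {
    "discovery": ("google_cse", "GOOGLE_SEARCH_API_KEY"),
    "enrichment": ("wappalyzer", "WAPPALYZER_API_KEY"),
    "decision_makers": ("apollo", "APOLLO_API_KEY"),
}

LLM_FALLBACK = "llm"  # the v1 default when no connector key is configured


def route_sources(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Pick a backend per stage: the connector if its key is set, else the LLM."""
    env = os.environ if env is None else env
    plan = {}
    for stage, (connector, var) in CONNECTOR_ENV_VARS.items():
        key = env.get(var, "").strip()
        plan[stage] = connector if key else LLM_FALLBACK
    return plan
```

With no keys set, every stage resolves to `"llm"`, which matches the v1 table. Setting only `APOLLO_API_KEY` switches the decision-maker stage and leaves the others untouched.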

Legal boundaries (hard rules)

Allowed:

Google Custom Search API (100 free queries/day)
Bing Search API
Wappalyzer API (technographics)
Apollo API (people search, enrichment — within plan limits)
Company public pages (about, careers, pricing, partners, integrations, case studies)
Public job postings (GulfTalent, Bayt, LinkedIn Jobs public listings)
Public funding / press pages (MAGNiTT, Crunchbase public, Wamda, ArabNews)
Customer-provided CSVs
Manual LinkedIn research (human-driven, browser, no automation)

Not allowed:

LinkedIn scraping via bots or browser extensions
Automated LinkedIn DM sending (violates ToS, risks account ban)
Bypassing anti-bot systems
Harvesting private/authenticated data
Storing sensitive PII without operational need
Mass email spam
Deceptive outreach or impersonation

Every lead record must include:

source — where the claim came from
source_type — website | api | public_page | manual | customer_csv
reason — why this lead is being suggested
confidence — 0-100
recommended_channel — LinkedIn_manual | email | partner_intro | phone | in_person
compliance_note — short string stating legal basis
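The required fields above lend themselves to a static check before export. The sketch below is illustrative only: the field names and allowed values are copied from the list above, while `lead_record_errors` and its error strings are hypothetical.

```python
SOURCE_TYPES = {"website", "api", "public_page", "manual", "customer_csv"}
CHANNELS = {"LinkedIn_manual", "email", "partner_intro", "phone", "in_person"}
REQUIRED_FIELDS = (
    "source", "source_type", "reason",
    "confidence", "recommended_channel", "compliance_note",
)


def lead_record_errors(lead: dict) -> list:
    """Return human-readable problems; an empty list means the record passes."""
    errors = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if lead.get(name) in (None, "")
    ]

    source_type = lead.get("source_type")
    if source_type and source_type not in SOURCE_TYPES:
        errors.append(f"unknown source_type: {source_type}")

    channel = lead.get("recommended_channel")
    if channel and channel not in CHANNELS:
        errors.append(f"channel not allowed: {channel}")

    confidence = lead.get("confidence")
    if confidence is not None and (
        isinstance(confidence, bool)
        or not isinstance(confidence, int)
        or not 0 <= confidence <= 100
    ):
        errors.append("confidence must be an integer from 0 to 100")

    return errors
```

Because the channel set is a closed allowlist, a value such as `linkedin_bot` is rejected without needing a separate denylist.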

Use cases (5 supported)

Use case	Who	Signal priority	Recommended channel
`sales`	B2B decision-makers with budget	CRM + booking tool + hiring sales + paid ads + recent funding	LinkedIn manual, email
`partnership`	agencies, integrators, resellers	agency service + SME customer base + retainer model + complementary tech	LinkedIn manual, partner form
`collaboration`	founders, creators, thought leaders	public content on sales/growth + newsletter + podcast + community	LinkedIn manual, email
`investor`	VCs, angels active in MENA SaaS/AI	portfolio overlap + recent thesis posts + MENA mandate	warm intro, LinkedIn manual
`b2c_audience`	consumer audiences	demographics + behavior + purchase channels	paid ads, WhatsApp broadcast, content

Each use case has a different scoring weight profile defined in ICP_SCORING_MODEL.md.
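To illustrate how a per-use-case weight profile might turn detected signals into a 100-point score, here is a small sketch. The weights and priority thresholds below are invented for illustration and are not the values in ICP_SCORING_MODEL.md; the signal keys are shorthand for the table entries above, and `score_lead` and `priority` are hypothetical names.

```python
# Hypothetical weights: each profile sums to 100. Real values live in
# ICP_SCORING_MODEL.md and may differ.
WEIGHTS = {
    "sales": {
        "crm": 25,
        "booking_tool": 15,
        "hiring_sales": 25,
        "paid_ads": 15,
        "recent_funding": 20,
    },
    "investor": {
        "portfolio_overlap": 40,
        "recent_thesis_posts": 25,
        "mena_mandate": 35,
    },
}


def score_lead(use_case: str, detected_signals: set) -> int:
    """Sum the weights of the signals that were actually detected."""
    profile = WEIGHTS[use_case]
    if sum(profile.values()) != 100:
        raise ValueError(f"weights for {use_case} must sum to 100")
    return sum(w for signal, w in profile.items() if signal in detected_signals)


def priority(score: int) -> str:
    """Map a score to a tier; the 70 and 40 cut-offs are assumptions."""
    if score >= 70:
        return "high"
    if score >= 40:
        return "medium"
    return "low"
```

Under these assumed weights, a `sales` lead with only `crm` and `recent_funding` detected scores 45 and lands in the medium tier. Signals that are not part of the profile contribute nothing.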

API surface (shipped)

POST /api/v1/prospect/discover
  body: {"icp": str, "use_case": str, "count": int}
  returns: ProspectResult JSON (see LEAD_OUTPUT_SCHEMA.json)

POST /api/v1/prospect/demo
  returns: canned 3-lead preview for UI smoke test

GET  /api/v1/prospect/use-cases
  returns: {use_cases: {...}, max_count: 20}
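A client can validate its input before calling `POST /api/v1/prospect/discover`. The sketch below only builds the request and does not send it. The path, the body keys, the use-case names, and the 20-lead limit come from this spec; the base URL, the lower bound of 1, the reading of `max_count` as the ceiling on `count`, and the helper name are assumptions.

```python
import json
from urllib.request import Request

USE_CASES = {"sales", "partnership", "collaboration", "investor", "b2c_audience"}
MAX_COUNT = 20  # mirrors max_count from GET /api/v1/prospect/use-cases


def build_discover_request(base_url: str, icp: str, use_case: str, count: int) -> Request:
    """Validate arguments and return an unsent POST request for /prospect/discover."""
    if not icp.strip():
        raise ValueError("icp must be non-empty text")
    if use_case not in USE_CASES:
        raise ValueError(f"unsupported use_case: {use_case}")
    if not 1 <= count <= MAX_COUNT:
        raise ValueError(f"count must be between 1 and {MAX_COUNT}")

    body = json.dumps({"icp": icp, "use_case": use_case, "count": count})
    return Request(
        base_url.rstrip("/") + "/api/v1/prospect/discover",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` against a real deployment. Any authentication the service requires is not covered by this spec and would need to be added.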

Evidence store (contract every field must honor)

{
  "claim": "Foodics raised a $170M Series C in 2022",
  "source_url": "https://magnitt.com/...",
  "source_type": "public_page",
  "collected_at": "2026-04-24T15:40:00Z",
  "confidence": 85
}

Fields without a source must be null. No invented URLs. No invented phone numbers. No invented emails.
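The "no source, no value" rule can be applied as a final scrub before a lead leaves the system. This is a minimal sketch: `evidence_problems` and `scrub_unsourced` are hypothetical helpers, and the timestamp format is inferred from the `collected_at` example above.

```python
from datetime import datetime
from urllib.parse import urlparse


def evidence_problems(record: dict) -> list:
    """Check URL shape, timestamp format, and confidence range of one record."""
    problems = []

    url = record.get("source_url")
    if url is not None:
        parsed = urlparse(url) if isinstance(url, str) else None
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("source_url is not an absolute http(s) URL")

    try:
        datetime.strptime(record.get("collected_at") or "", "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        problems.append("collected_at is not a UTC timestamp like 2026-04-24T15:40:00Z")

    confidence = record.get("confidence")
    if (
        isinstance(confidence, bool)
        or not isinstance(confidence, int)
        or not 0 <= confidence <= 100
    ):
        problems.append("confidence must be an integer from 0 to 100")

    return problems


def scrub_unsourced(fields: dict, sources: dict) -> dict:
    """Replace every field value that has no recorded source with None."""
    return {
        name: (value if sources.get(name) else None)
        for name, value in fields.items()
    }
```

`scrub_unsourced` takes the lead's fields and a parallel mapping of field name to source. Anything the LLM produced without a matching source entry, such as a guessed email address, comes out as `None`.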

Feedback loop (the moat)

Every lead that moves through outreach stages writes back:

sent_at — when outreach went out
replied_at + reply_sentiment — positive / neutral / negative
demo_booked_at
paid_at + revenue_sar
lost_reason — objection category

This data flows back into:

Signal weights (which signals actually predict conversion?)
Message angles (which openings actually got replies?)
Segment priority (which segments closed fastest?)

This is the sovereign layer. Apollo has 300M contacts; we have the ground-truth feedback loop for the Saudi/GCC B2B market.
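As a sketch of how the write-back fields could feed signal weights, the function below computes a positive-reply rate per signal. The `sent_at`, `replied_at`, and `reply_sentiment` fields come from the list above. The `signals` key on each outcome and the function name are assumptions, and a real weighting scheme would need more data and more care than a raw rate.

```python
from collections import defaultdict


def positive_reply_rate_by_signal(outcomes: list) -> dict:
    """For each signal, the share of sent outreach that got a positive reply."""
    sent = defaultdict(int)
    positive = defaultdict(int)

    for outcome in outcomes:
        if outcome.get("sent_at") is None:
            continue  # never sent, so it says nothing about conversion
        got_positive = (
            outcome.get("replied_at") is not None
            and outcome.get("reply_sentiment") == "positive"
        )
        for signal in outcome.get("signals", []):
            sent[signal] += 1
            if got_positive:
                positive[signal] += 1

    return {signal: positive[signal] / sent[signal] for signal in sent}
```

Signals that only appear on unsent leads are left out of the result, which avoids dividing by zero.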

Product positioning

Arabic-first Lead Intelligence + AI Sales Operations layer for companies and agencies that need to qualify leads, book demos, follow up, and convert inbound/outbound interest into revenue.

Not: generic chatbot · scraping tool · AI-for-everything · spam machine.

Differentiators:

Arabic-first GTM (Saudi Khaliji dialect output by default)
Saudi/GCC signal intelligence (local CR, local hiring boards, local funding press)
Manual-to-automation ops model (ship by hand, automate what works)
Agency/reseller motion (built-in commission model, white-label path)
Evidence-backed lead scoring (every claim has a source)
Payment + onboarding workflow (Moyasar automated + manual fallback)
Founder-led launch engine (sovereign dataset grows with every outreach)
Legally safer sourcing (no scraping lock-in risk)

Files in this directory

SIGNAL_TAXONOMY.md — the signal dictionary by use case
ICP_SCORING_MODEL.md — 100-point scoring model with weights
LEAD_OUTPUT_SCHEMA.json — canonical JSON schema
TOP_10_SCORED.csv — today's top 10 leads scored by this model
CONNECTOR_ENV_VARS.md — required env vars when upgrading connectors
