5.3 KiB
Saudi B2B Data Source Catalog
Curated list of Saudi business directory sources — legality-graded for Dealix ingestion.
Hard rule: never ingest a source unless its recommended_use is green or yellow. Red-flagged sources fall outside Dealix policy and don't go into the Data Lake.
Source rating system
| Color | Meaning | Ingestion |
|---|---|---|
| 🟢 Green | Public + clearly permissive | Direct ingest with source_type=public |
| 🟡 Yellow | Public but ToS-sensitive | Ingest only company-level data; require manual approval; cite source |
| 🔴 Red | Scraping forbidden, paywalled-without-allowed-use, or platform-restricted | Do NOT ingest |
1. Saudi Chambers of Commerce (Official) 🟢
| Source | URL | Coverage | Access |
|---|---|---|---|
| Riyadh Chamber Member Directory | chamber.org.sa | Riyadh members | Public business directory |
| Jeddah Chamber Member Directory | jcci.org.sa | Jeddah members | Public business directory |
| Asharqia (Eastern Province) Chamber | chamber.org.sa/eastern | Dammam/Khobar/Jubail | Public business directory |
| Madinah Chamber | mcci.org.sa | Madinah | Public business directory |
| Makkah Chamber | mcc.org.sa | Mecca | Public business directory |
| Federation of Saudi Chambers | csc.org.sa | National | Public business directory |
Why green: These are public chambers serving members; member directories are explicitly published for business networking.
Ingest plan: crawl with requests_bs4 provider → extract company name, sector, contact info → import as source_type=public, source_name=chamber_<city>_<year_month>, allowed_use=business_contact_research_only.
2. Government / Open Data 🟢
| Source | URL | Coverage |
|---|---|---|
| SDAIA Open Data Portal | data.gov.sa | National datasets, business registry samples |
| MCI (Ministry of Commerce) public registry | mci.gov.sa | Commercial registrations |
| MISA (Investment) license registry | misa.gov.sa | Foreign-investment-licensed entities |
| ZATCA taxpayer registry (limited) | zatca.gov.sa | VAT-registered entities (lookups only) |
Why green: Government open-data programs publish business-registry-style datasets explicitly for reuse.
3. Trade Associations 🟢
| Source | Sector |
|---|---|
| Saudi Real Estate Refereeing Center | real estate |
| Saudi Contractors Authority | construction |
| Saudi Tourism Authority registry | hospitality, events |
| Saudi Authority for Data and AI vendor catalog | tech |
4. Saudi Business Directory Websites 🟡
| Source | Notes |
|---|---|
| daleel-saudi.com | Yellow — public listings, but ToS limits bulk export. Ingest individual lookups only. |
| saudisteps.com | Yellow — sector-specific directories |
| arabbusinessguide.com | Yellow — pan-GCC, includes Saudi |
Ingestion rule: lookup individual companies for enrichment, not bulk download.
5. Map Data via Google Places 🟢
Already wired into Dealix as MapsProvider. Per Google Maps Platform terms:
- Store
place_id(allowed) - Store internal scores + Dealix-derived metadata
- Refresh place details on demand
- Display Google attribution where shown to end users
Treat Google Places as the canonical real-time enrichment source for sectors where directory coverage is thin (clinics, salons, gyms, training centers).
6. LinkedIn 🔴 — DO NOT SCRAPE
LinkedIn explicitly prohibits automated data collection. Dealix uses LinkedIn for:
- Manual research only
- Manual message sending only
- Never for data ingestion
7. Paid B2B Vendors (Procurement Channel) 🟡
When you eventually buy data, demand from the vendor:
source_documentation— where each row came fromcollection_method— how it was gatheredallowed_useclause in writinglast_updatedtimestamp per row- Sample of 100 rows for the audit script BEFORE you pay
- Refund clause if acceptance rate < 80%
Reject vendors who can't answer all 6.
Indicative Saudi/MENA vendors (always vet):
- ZoomInfo (paid; check Saudi coverage)
- Apollo.io (paid; check Saudi coverage; Dealix already has Apollo plugin scaffolding)
- Common Room (paid; check Saudi coverage)
- Local Saudi data brokers (vet very carefully)
8. Customer-Owned Data 🟢
Best source. When a Dealix customer signs up:
- Their CRM export
- Their website form submissions
- Their old email lists (with consent record)
Always classify as source_type=owned and consent_status=owned.
Ingestion priority order
1. Customer-owned (green)
2. Chamber directories (green)
3. Government open data (green)
4. Google Places (green via MapsProvider)
5. Saudi business directory websites (yellow — lookup only)
6. Trade associations (green)
7. Paid vendors (yellow — vet thoroughly)
8. LinkedIn (red — manual only, never ingest)
Anti-patterns
- ❌ Scraping behind login walls
- ❌ Reusing leaked databases
- ❌ Buying lists "without source documentation"
- ❌ Mixing sources in one import without
source_typeper row - ❌ Treating personal-domain emails (gmail/hotmail) as business contact under PDPL legitimate-interest
Quarterly review
Every quarter, audit the Data Lake:
- Sources by
source_typecount - Average
data_quality_scoreper source - Reply rate per source (after first month live)
- Suppression hits per source (proxy for spam complaints)
- Drop sources with poor quality or compliance flags