system-prompts-and-models-of-ai-tools

mancitrus/system-prompts-and-models-of-ai-tools


mirror of https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools.git synced 2026-06-18 07:19:35 +00:00

Commit Graph

Author

SHA1

Message

Date

Dealix Builder

342bcf8ea5

feat(paid-beta): operational layer for first 499 SAR — playbook + workflow + board + scorecard + landing CTA

Move from GO_PRIVATE_BETA (technical readiness) to PAID_BETA_READY
(first revenue) — operational, not architectural.

Deliverables:
- docs/PAID_BETA_OPERATING_PLAYBOOK.md
  10-section Arabic playbook: gate to Paid Beta, 7-day day-by-day
  plan (Staging → Outreach → Demos → Diagnostic → Pilot Sale →
  Pilot Day1/Day2 → Proof+Upsell), weekly targets (50-70 messages /
  5-10 replies / 3-5 demos / 1+ payment), 8 hard operational rules,
  daily cadence, what NOT to add, Public Launch criteria.

- docs/FIRST_PILOT_DELIVERY_WORKFLOW.md
  48-hour Arabic Pilot delivery: T+0 intake (15 fields) → T+24h
  Free Diagnostic (3 opportunities + 1 Arabic message + 1 risk + 1
  service recommendation) → T+48h Pilot 499 (10 opportunities + 7-day
  follow-up plan + Proof Pack) → day 7 final Proof Pack + 30-minute
  review + 3 upgrade paths. Pilot success criteria + 8-row metrics table.

- docs/PRIVATE_BETA_OPERATING_BOARD.md
  15-column Sheet template (company, person, segment, source, channel,
  message_sent, reply_status, demo_booked, diagnostic_sent,
  pilot_offered, price, paid, proof_pack_sent, next_step, notes) +
  status flow + ICP distribution + 3-wave follow-up templates +
  daily routine + PDPL privacy rules + CSV header.

- landing/private-beta.html
  Pilot 499 SAR offer prominent at top (badge + hero CTA), dedicated
  3-card pricing section (Pilot 499 / Free Diagnostic / Growth OS
  Monthly 2,999), 7-day refund/case-study guarantee, mailto CTAs
  with prefilled subject + body, removed duplicate pricing block.

- scripts/paid_beta_daily_scorecard.py (274 lines)
  argparse with --messages, --replies, --demos, --pilots, --payments,
  --proof-packs, --as-of, --json. Computes reply_rate / demo_rate /
  pilot_rate / payment_rate, daily verdict (ON_TRACK / BEHIND /
  OFF_TRACK), weekly verdict (BLOCKERS / STRETCH_PENDING /
  WEEKLY_TARGETS_HIT), and rule-based next_actions in Arabic.
  Targets: 50-70 messages / 5-15 replies / 3-7 demos / 2-3 pilots /
  1-2 paid / 1+ proof pack per week.

- tests/unit/test_paid_beta_scorecard.py
  12 tests: zero-input, on-track day, tone-action trigger, payment
  → proof-pack action, full-week target hit, conversion rates,
  Arabic text rendering, JSON validity, CLI text/json modes,
  --as-of today/explicit.
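
The scorecard's flags and conversion rates listed above can be sketched roughly as follows. This is not the 274-line script itself: the commit does not say how each rate is defined or what the verdict thresholds are, so the denominators (each stage measured against the previous one), the helper names `safe_rate` and `conversion_rates`, and the zero defaults are all assumptions. The `--proof-packs` and `--as-of` flags and the verdict logic are left out.

```python
import argparse
import json


def safe_rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def conversion_rates(messages: int, replies: int, demos: int,
                     pilots: int, payments: int) -> dict:
    # Assumption: each rate is measured against the previous funnel stage.
    return {
        "reply_rate": safe_rate(replies, messages),
        "demo_rate": safe_rate(demos, replies),
        "pilot_rate": safe_rate(pilots, demos),
        "payment_rate": safe_rate(payments, pilots),
    }


def build_parser() -> argparse.ArgumentParser:
    # Flag names follow the commit message; the defaults are assumptions.
    parser = argparse.ArgumentParser(description="Paid beta daily scorecard (sketch)")
    for flag in ("--messages", "--replies", "--demos", "--pilots", "--payments"):
        parser.add_argument(flag, type=int, default=0)
    parser.add_argument("--json", action="store_true")
    return parser


def main(argv=None) -> str:
    args = build_parser().parse_args(argv)
    rates = conversion_rates(args.messages, args.replies, args.demos,
                             args.pilots, args.payments)
    if args.json:
        # ensure_ascii=False keeps Arabic text readable if it is added later.
        return json.dumps(rates, ensure_ascii=False)
    return "\n".join(f"{name}: {value:.0%}" for name, value in rates.items())
```

Calling `main(["--messages", "60", "--replies", "6"])` returns the text report; adding `"--json"` returns the same rates as a JSON string.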

Hard rules (unchanged):
- No live WhatsApp / Gmail / Calendar send without env flag + approval.
- No Moyasar API charge — manual invoice/payment-link only.
- No LinkedIn scraping / auto-DM — Lead Gen Forms + manual outreach.
- No cold WhatsApp without opt-in (PDPL hard-block).
- Every message passes safety_eval + saudi_tone_eval.
- Every action recorded in Action Ledger.

Validation:
- python -m compileall api auto_client_acquisition: clean.
- pytest tests/unit (excl. tenacity-dep tests): 950 passed, 2 skipped.
- python scripts/smoke_inprocess.py: SMOKE_INPROCESS_OK (8/8 endpoints).
- python scripts/paid_beta_daily_scorecard.py text + --json: both render
  correctly with Arabic + verdict + next_actions.
- tests/unit/test_positioning_lock.py: 10 passed (no prohibited
  phrases introduced in updated landing/private-beta.html).

Test count: 949 → 962 (+12 new, 1 prior already counted).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 18:39:36 +03:00

Dealix Builder

bcf545c22e

feat(self-improving): Hermes-inspired Agent Platform — 6 layers + 30 endpoints + 76 tests + Private Beta launch

Security Curator (4 modules): the first line of defense
- secret_redactor: 11 patterns (GitHub PAT, OpenAI/Anthropic/Supabase/WhatsApp/Moyasar/Sentry/Google/AWS/private keys); never returns raw secret
- patch_firewall: blocks .env / credentials.json / RSA keys; scans added lines for secret patterns
- trace_redactor: masks phones (+966...) and emails for PII safety
- tool_output_sanitizer: cleans tool outputs before they hit ledger/Proof Pack/UI/observability
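
The redaction and masking behaviour described for secret_redactor and trace_redactor can be sketched as below. The three secret patterns are illustrative only (the commit names 11 pattern families but not their regexes), and the function names `redact_secrets` and `mask_pii`, the replacement labels, and the nine-digit assumption for +966 numbers are all hypothetical.

```python
import re

# Illustrative patterns; the real module's 11 patterns are not shown in the commit.
SECRET_PATTERNS = {
    "github_pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),
}
# Assumption: Saudi numbers appear as +966 followed by nine digits.
PHONE_RE = re.compile(r"\+966\d{9}")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_secrets(text: str) -> str:
    """Replace every matched secret with a label so the raw value is never returned."""
    for name, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text


def mask_pii(text: str) -> str:
    """Mask +966 phone numbers and email addresses before text reaches a trace."""
    text = PHONE_RE.sub("+966*********", text)
    return EMAIL_RE.sub("[email]", text)
```

For example, `mask_pii("call +966512345678")` keeps the country code and hides the subscriber digits.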

Growth Curator (5 modules): self-improvement
- message_curator: grades Arabic messages (0..100), detects 8 risky phrases, suggests Saudi-tone skeleton
- playbook_curator: scores playbooks by outcome (accept/reply/meeting/deal); winner/promising/needs_work/archive
- mission_curator: scores completed missions; ship_it_widely/iterate/rework_or_retire
- skill_inventory: deterministic 23-skill catalog across 5 layers
- curator_report: weekly Arabic summary "ماذا تعلمنا هذا الأسبوع" ("What we learned this week")

Meeting Intelligence (5 modules)
- transcript_parser: accepts Google Meet entries OR plain "Speaker: text" format
- meeting_brief: 6-section pre-meeting brief in Arabic (objective/questions/objections/offer/next-step)
- objection_extractor: 8 categories (price/timing/authority/trust/integration/competitor/results/complexity)
- followup_builder: email + WhatsApp drafts; live_send_allowed=False always
- deal_risk: 0..100 score from objections + missing next-step + decision-maker absence + days-since-touch
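
The plain "Speaker: text" input format accepted by transcript_parser can be sketched as follows. The `Utterance` type, the function name `parse_plain_transcript`, and the rule that lines without a speaker prefix continue the previous utterance are assumptions. Google Meet entries are not handled here.

```python
from typing import NamedTuple


class Utterance(NamedTuple):
    speaker: str
    text: str


def parse_plain_transcript(raw: str) -> list[Utterance]:
    """Parse lines of the form "Speaker: text".

    Assumption: a line without a "Speaker:" prefix continues the previous
    utterance. A continuation line that itself contains a colon would be
    misread as a new speaker; this sketch does not try to resolve that.
    """
    utterances: list[Utterance] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        speaker, sep, text = line.partition(":")
        if sep and speaker.strip():
            utterances.append(Utterance(speaker.strip(), text.strip()))
        elif utterances:
            last = utterances[-1]
            utterances[-1] = Utterance(last.speaker, f"{last.text} {line}".strip())
    return utterances
```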

Model Router (5 modules)
- provider_registry: 7 providers (Claude Sonnet/Haiku, GPT-4-class, GPT-4o-mini, Gemini Pro, Azure OAI KSA-region, Local Qwen Arabic-tuned)
- task_router: 10 task types × routing decisions with reasons_ar
- cost_policy: bulk → low; output > 1500 tokens → high
- fallback_policy: high-sensitivity workloads prefer KSA-region/self-hosted FIRST
- usage_dashboard: deterministic demo of all task routes
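
The two policy rules above (cost class and KSA-first fallback) can be sketched as below. The commit states only "bulk → low" and "output > 1500 tokens → high"; checking bulk first and defaulting to "medium" are assumptions, as are the `Provider` fields and both function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    ksa_region: bool = False
    self_hosted: bool = False


def cost_class(is_bulk: bool, expected_output_tokens: int) -> str:
    # Rules from the commit: bulk -> low; output > 1500 tokens -> high.
    # Assumptions: bulk is checked first, and "medium" is the default class.
    if is_bulk:
        return "low"
    if expected_output_tokens > 1500:
        return "high"
    return "medium"


def order_for_sensitivity(providers: list[Provider],
                          high_sensitivity: bool) -> list[Provider]:
    """For high-sensitivity work, try KSA-region or self-hosted providers first."""
    if not high_sensitivity:
        return list(providers)
    # sorted() is stable, so the original order is kept within each group.
    return sorted(providers, key=lambda p: not (p.ksa_region or p.self_hosted))
```

A stable sort is used so that an existing preference order among providers survives the reordering.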

Connector Catalog (3 modules): integrations catalog
- 14 connectors (WhatsApp Cloud, Gmail, Calendar, Google Meet, Moyasar, LinkedIn Lead Forms, Google Business Profile, X API, Instagram, Sheets, CRM, Website Forms, Composio, MCP Gateway)
- Each has launch_phase (1-4), risk_level, allowed_actions, blocked_actions, Arabic risk dossier
- WhatsApp blocks cold_send_without_consent; Moyasar blocks store_card_number; MCP requires allowlist
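
A catalog entry with the fields listed above might be modelled as in this sketch. The field names launch_phase, risk_level, allowed_actions and blocked_actions and the action name `cold_send_without_consent` come from the commit; the `Connector` class, the `is_allowed` helper, the deny-by-default behaviour, and the sample values are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Connector:
    key: str
    launch_phase: int  # 1-4 per the commit
    risk_level: str
    allowed_actions: frozenset = field(default_factory=frozenset)
    blocked_actions: frozenset = field(default_factory=frozenset)


def is_allowed(connector: Connector, action: str) -> bool:
    """Blocked actions always win; unknown actions are denied (assumption)."""
    if action in connector.blocked_actions:
        return False
    return action in connector.allowed_actions


# Hypothetical entry: only the blocked action name is taken from the commit.
whatsapp = Connector(
    key="whatsapp_cloud",
    launch_phase=1,
    risk_level="high",
    allowed_actions=frozenset({"draft_message"}),
    blocked_actions=frozenset({"cold_send_without_consent"}),
)
```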

Agent Observability (5 modules): agent monitoring and evaluations
- trace_events: SHA256-hashes user/company IDs; sanitizes payload/output before logging
- safety_eval: 7 rules (guarantee, scarcity_fake, medical_claim, financial, regulatory, personal_data, urgency); 0..100 → safe/needs_review/blocked
- saudi_tone_eval: positive markers (هلا "hi", لاحظت "I noticed", يناسبك "does it suit you") vs negative (تحية طيبة وبعد, a stiff formal salutation; synergy; leverage); arabic_ratio bonus
- eval_pack: 5 curated cases with expected verdicts
- cost_tracker: per workflow/provider/task_type aggregation
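
The ID hashing described for trace_events can be sketched as below. The event shape, the function names, and the pluggable `sanitize` callback are assumptions; the commit states only that user/company IDs are SHA256-hashed and that payloads are sanitized before logging.

```python
import hashlib
from typing import Callable


def hash_id(raw_id: str) -> str:
    """Return the hex SHA-256 digest of an identifier."""
    return hashlib.sha256(raw_id.encode("utf-8")).hexdigest()


def build_trace_event(user_id: str, company_id: str, event: str, payload: dict,
                      sanitize: Callable[[str], str] = lambda s: s) -> dict:
    # Only hashed IDs are stored; string payload values pass through sanitize.
    return {
        "user": hash_id(user_id),
        "company": hash_id(company_id),
        "event": event,
        "payload": {k: sanitize(v) if isinstance(v, str) else v
                    for k, v in payload.items()},
    }
```

An unsalted hash of a short or guessable ID can be reversed by enumeration, so a real deployment may want a keyed hash; the commit does not say which approach is used.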

Routers (6 new) — 30 endpoints
- /api/v1/security-curator/{demo, redact, inspect-diff, sanitize-output}
- /api/v1/growth-curator/{skills/inventory, messages/grade, messages/improve, messages/duplicates, missions/next, report/weekly, report/demo}
- /api/v1/meeting-intelligence/{brief, brief/demo, transcript/summarize, followup/draft, deal-risk}
- /api/v1/model-router/{providers, tasks, route, cost-class, usage/demo}
- /api/v1/connector-catalog/{catalog, summary, status, risks, {key}}
- /api/v1/agent-observability/{trace/build, safety/eval, tone/eval, evals/run}
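
As a quick check of the "30 endpoints" figure, the brace notation above can be expanded into full paths. The `ROUTES` table simply transcribes the list; `expand_routes` is a hypothetical helper, and `{key}` is kept as a literal path parameter.

```python
ROUTES = {
    "security-curator": ["demo", "redact", "inspect-diff", "sanitize-output"],
    "growth-curator": ["skills/inventory", "messages/grade", "messages/improve",
                       "messages/duplicates", "missions/next", "report/weekly",
                       "report/demo"],
    "meeting-intelligence": ["brief", "brief/demo", "transcript/summarize",
                             "followup/draft", "deal-risk"],
    "model-router": ["providers", "tasks", "route", "cost-class", "usage/demo"],
    "connector-catalog": ["catalog", "summary", "status", "risks", "{key}"],
    "agent-observability": ["trace/build", "safety/eval", "tone/eval", "evals/run"],
}


def expand_routes(routes: dict, prefix: str = "/api/v1") -> list:
    """Expand each group and suffix into a full path string."""
    return [f"{prefix}/{group}/{suffix}"
            for group, suffixes in routes.items()
            for suffix in suffixes]
```

`len(expand_routes(ROUTES))` is 30, matching the heading.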

Tests (6 new files, 76 tests)
- test_security_curator: 16 tests (PAT detect, key redact, env diff block, payload scan, trace mask)
- test_growth_curator: 16 tests (Arabic grade, risky phrases, dup detect, playbook scoring, mission recommend, weekly report)
- test_meeting_intelligence: 13 tests (transcript parse, brief sections, objection extract, followup drafts, deal risk)
- test_dealix_model_router: 11 tests (every task → ≥1 provider, KSA-region for high sensitivity, cost class, primary override)
- test_agent_observability: 12 tests (trace hashing, safety verdicts, tone scoring, eval pack)
- test_connector_catalog: 11 tests (≥12 connectors, every has risk/blocked actions, WA cold-send blocked, Moyasar card-storage blocked)

Docs (8 new + 1 updated)
- AGENT_SECURITY_CURATOR.md (Arabic)
- GROWTH_CURATOR_STRATEGY.md (Arabic)
- MEETING_INTELLIGENCE.md (Arabic)
- MODEL_PROVIDER_ROUTER.md (Arabic)
- CONNECTOR_CATALOG.md (Arabic)
- AGENT_OBSERVABILITY_EVALS.md (Arabic)
- PRIVATE_BETA_LAUNCH_TODAY.md (Arabic) — go-checklist + offer + risks
- DEMO_SCRIPT_12_MINUTES.md (Arabic) — minute-by-minute demo flow
- FIRST_20_OUTREACH_MESSAGES.md (Arabic) — 7 personas + 3 follow-ups, all under safety/tone evals
- DEALIX_100_PERCENT_LAUNCH_PLAN.md — added §34 Self-Improving Agent Platform + §35 Private Beta Launch

Landing
- landing/private-beta.html — Arabic RTL, dark theme, pricing, 11 demo endpoints, safety banner

Test results
- 76/76 new tests pass
- Full suite: 663 passed, 2 skipped (missing API keys, unrelated)
- 0 existing tests broken

Safety
- All 6 layers honor approval-first, draft-only, no-live-send
- Hash user/company IDs before any trace
- No secrets in logs/embeddings/traces (3-layer defense: redactor + sanitizer + firewall)
- Saudi tone eval rejects "تحية طيبة وبعد" (a stiff formal salutation) and "synergy"-style corporate boilerplate
- Safety eval blocks "ضمان 100%" ("100% guarantee") + medical claims + fake urgency
- Connector Catalog: WhatsApp blocks cold-send, Moyasar blocks card storage, MCP requires allowlist

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 16:30:18 +03:00

2 Commits