Product Block 01 · Sellable today

Bilingual Voice Receptionist

Answer every call. In Spanish or English. Book the appointment. One vendor. $400/mo.

Deploy: 1 day with the self-bootstrap plan · $1,500 setup · $400/mo + per-min usage (cost-plus 30%)

Live demo: https://voice.cafecito-ai.com/


Best fit: any Miami SMB that misses calls during business hours (HVAC, dental, salons, restaurants, law firms, contractors, condo management). About 70% of the hot list qualifies.

⚡ Self-bootstrap · paste into Claude Code or Codex

Bilingual Voice Receptionist — build it without writing code

Drop the prompt below into Claude Code (`/plan` mode) or Codex. The agent will source the prospect, build the receptionist, deploy it, test it, and hand you the demo Loom + cold-pitch draft. Your job becomes "approve at the gates and make the sales call."

You provide

(1) A prospect slug from /new-hire/prospects, (2) the prospect owner's email (so the agent can address the cold pitch). That's it.

You get back

A live Twilio phone number, a working booking flow, a demo Loom recording, a draft cold pitch + WhatsApp, and a 1-line Slack-ready summary. You make the sales call.

Runtime & cost

Roughly 90 minutes wall-clock with you watching at the approval gates. Costs ~$2-3 in OpenAI Realtime tokens during testing + $1.15/mo for the Twilio number.

📋 Copy the entire block below into Claude Code (`/plan`) or Codex

You are building a bilingual voice receptionist (Block 01 in the Cafecito AI new-hire playbook). The full block reference lives at https://cafecito-ai.com/new-hire/blocks/01-bilingual-voice-receptionist — read it first, then execute the plan below. Use plan mode (`/plan` in Claude Code, or equivalent in Codex). Stop at every approval gate marked [GATE].

INPUTS YOU NEED FROM THE HUMAN (ask before doing anything else):
1. Prospect slug from https://cafecito-ai.com/new-hire/prospects (e.g. "garrido-hvac")
2. Prospect owner's email (for the cold-pitch draft)

ENVIRONMENT (already in place — verify, don't change):
- Working dir: /home/eratner/cafecito-ai
- Cloudflare account: f7a9b24f679e1d3952921ee5e72e677e (cafecito-ai worker)
- Wrangler authenticated via OAuth (`npx wrangler whoami` should show the user)
- Reference scaffold: https://github.com/openai/openai-realtime-twilio-demo
- Reference live build (old stack): https://cafecito-ai.com/suite-air/

SECRETS YOU MUST CONFIRM EXIST (don't try to create — ask the human if missing):
- OPENAI_API_KEY (with Realtime access — the human's OpenAI org)
- TWILIO_SID, TWILIO_TOKEN (cafecito-ai master account)
- CAL_API_KEY (will be generated per-prospect in step 4)

THE PLAN — execute step by step, asking for approval at each [GATE]:

STEP 1 — RESEARCH THE PROSPECT (5 min)
- Fetch /new-hire/prospects, find the prospect by slug, extract: business name, address, phone, primary services, any URLs.
- Fetch their public website. Extract: hours, top services, language(s) used on the site, owner name if stated, escalation phone (after-hours number).
- If the site has fewer than 3 services explicitly listed: ask the human for them.
- Output a 5-line research summary.
[GATE 1 — show research summary, ask "proceed?"]

STEP 2 — STAND UP THE WORKER (10 min)
- Create directory /home/eratner/cafecito-ai/voice-<prospect-slug>/.
- Inside it, scaffold a Cloudflare Worker that bridges Twilio Media Streams ↔ OpenAI Realtime API. Use the prompt at https://cafecito-ai.com/new-hire/blocks/01-bilingual-voice-receptionist Prompt #2 ("CF Worker bridge scaffold") as the source of truth. Generate worker.js + wrangler.jsonc + D1 migration.
- Bindings: D1 VOICE_DB (create the database with `wrangler d1 create voice-<prospect-slug>-db`), KV CALL_DEDUP (create with `wrangler kv namespace create voice-<prospect-slug>-dedup`).
- Use placeholder values for OPENAI_API_KEY etc — set them as secrets in step 5.
[GATE 2 — show worker.js + wrangler.jsonc + the d1 create commands, ask "deploy this scaffold?"]

STEP 3 — GENERATE THE SYSTEM PROMPT (5 min)
- Use Prompt #1 from the block page ("OpenAI Realtime system-prompt builder") with the prospect's research from Step 1 as input.
- Save the generated instructions string + tools JSON into voice-<prospect-slug>/prompt.txt.
- Wire it into worker.js as the session.update payload sent on Realtime connection open.
[GATE 3 — print the generated prompt + tools, ask "this is the voice the prospect's customers will hear — looks right?"]

STEP 4 — CAL.COM SETUP (5 min)
- Ask the human: "What email should the prospect's Cal.com account be created under?" (suggest: <prospect-slug>@cafecito-ai.com or similar — they'll own it post-sale).
- Walk the human through creating the account (manual — Cal.com has no signup API). Tell them to enable the v2 API, create one event-type ("30-min consult", 15-min buffer, accepts walk-ins), and paste the API key + event-type ID back to you.
- Set both as Worker secrets: `wrangler secret put CAL_API_KEY` and `wrangler secret put CAL_EVENT_TYPE_ID` from /home/eratner/cafecito-ai/voice-<prospect-slug>/.
[GATE 4 — confirm Cal.com event-type is live, ask "proceed to deploy?"]

STEP 5 — DEPLOY + TWILIO PROVISIONING (10 min)
- Set the remaining secrets: OPENAI_API_KEY, TWILIO_SID, TWILIO_TOKEN, ESCALATION_PHONE (the prospect's after-hours number from Step 1).
- Set the vars: BUSINESS_NAME, BUSINESS_SLUG, MAX_MONTHLY_MINUTES=4000.
- Apply D1 migration: `wrangler d1 migrations apply voice-<prospect-slug>-db --remote`.
- Deploy: `unset CLOUDFLARE_API_TOKEN && NODE_OPTIONS="--dns-result-order=ipv4first" NODE_EXTRA_CA_CERTS=/etc/ssl/certs/ca-certificates.crt CLOUDFLARE_ACCOUNT_ID=f7a9b24f679e1d3952921ee5e72e677e npx wrangler deploy`.
- Provision a Twilio number via Twilio CLI or REST API. Filter for a South Florida area code (305 / 786 for Miami-Dade, 954 for Broward). Set the Voice webhook to https://voice-<prospect-slug>.<account>.workers.dev/twilio-voice (POST, returns TwiML <Connect><Stream/></Connect>).
- Print the Twilio number prominently.
[GATE 5 — print the live Twilio number, ask "make the first test call now"]

STEP 6 — FIVE-CALL TEST PASS (15-20 min)
- Wait for the human to make 5 test calls (EN-book, ES-book, EN-emergency-transfer, mid-Spanglish, LTE-from-outside).
- Tail the worker logs (`wrangler tail voice-<prospect-slug>`) and the D1 call_log table during testing.
- Ask the human "what failed?" After each failure, edit worker.js or the system prompt, redeploy in 60 seconds, ask them to re-test.
- Loop until all five pass.
[GATE 6 — confirm 5/5 calls passed, ask "ready to record the demo?"]

STEP 7 — RECORD THE DEMO (5 min, human action)
- Tell the human: "Open Loom or QuickTime. Make two recorded calls — one EN→book, one ES→book. Save the URL or the file path."
- Wait for the URL.

STEP 8 — DRAFT THE COLD PITCH (5 min)
- Use Prompt #3 from the block page ("Cold-pitch outreach") with: research from Step 1, Twilio number from Step 5, demo Loom URL from Step 7, prospect owner email (from initial inputs).
- Output: subject line + email body + WhatsApp message. Don't send — present to human for review.
[GATE 7 — show the draft, ask the human to approve / edit / send manually]

STEP 9 — SHIP THE SUMMARY (2 min)
- Write a single-line summary suitable for Slack: "[BUSINESS NAME] receptionist live at [TWILIO NUMBER] · demo: [LOOM URL] · pitch sent to [EMAIL] at [TIME]."
- Append the prospect to a "shipped" log at /home/eratner/cafecito-ai/voice-shipped.md.
- Print the summary + the next-prospect suggestion ("Run me again with a different slug from /new-hire/prospects.").

DONE. End the plan.

GUARDRAILS (apply throughout):
- Never deploy with OPENAI_API_KEY missing — fail loudly.
- Never silently absorb a 5xx — alert the human.
- Never bypass [GATE] checkpoints, even if you think you're confident.
- If a step takes more than 2x its estimated time, stop and escalate.
- Cost ceiling for testing: $5 in OpenAI tokens. If you hit it, stop and escalate.

META: this plan IS the new hire's job for the first 30 days. Once they've watched it run a few times, they'll be the human at the [GATE]s — not the executor. The goal is to push the executor role onto the agent and keep the human at the approval + sales-call layer. The next plan in the series builds the SOURCER agent (replaces /new-hire/prospects with a cron that finds + scores prospects automatically) and the BUILDER agent (runs THIS plan unattended until the demo Loom step, where it pings the human for the sales handoff).

01 Stack

OpenAI Realtime API (voice + brain + ES/EN auto-detection + function calling — gpt-realtime model)
Twilio (phone number $1.15/mo + Media Streams to bridge audio)
Cloudflare Worker (WebSocket bridge + Cal.com booking handler + D1 call log)
Cal.com (free tier — booking)
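
The WebSocket bridge in the stack above mostly translates JSON frames between two protocols. Here is a minimal sketch of that translation, assuming both sides carry base64 G.711 μ-law so no transcoding is needed. The function names `twilioToOpenAI` and `openAIToTwilio` are hypothetical, and the audio delta event name differs between Realtime API versions, so the sketch accepts both spellings.

```javascript
// Twilio -> OpenAI: wrap a Twilio "media" frame as an input_audio_buffer.append event.
// Returns null for frames that carry no audio (for example "start", "mark", "stop").
function twilioToOpenAI(twilioFrame) {
  if (twilioFrame.event !== "media") return null;
  return { type: "input_audio_buffer.append", audio: twilioFrame.media.payload };
}

// The audio delta event name depends on the Realtime API version (assumption:
// one of these two spellings is in use), so both are accepted here.
const AUDIO_DELTA_EVENTS = new Set([
  "response.audio.delta",
  "response.output_audio.delta",
]);

// OpenAI -> Twilio: wrap an audio delta as a Twilio "media" frame for one stream.
// Returns null for every other Realtime event type.
function openAIToTwilio(openaiEvent, streamSid) {
  if (!AUDIO_DELTA_EVENTS.has(openaiEvent.type)) return null;
  return { event: "media", streamSid, media: { payload: openaiEvent.delta } };
}
```

In the Worker, each incoming WebSocket message would be parsed, passed through one of these functions, and forwarded to the other socket only when the result is not null.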

02 Live references

Working production examples. Read the code. Steal the pattern.

Suite Air (Lucia): live HVAC receptionist · +1-954-858-5311 · currently running the VAPI build (the v0). The OpenAI Realtime version replaces this stack, so read the system prompt structure here and ignore the orchestration layer.
OpenAI Realtime + Twilio reference: official OpenAI reference for the Twilio Media Streams ↔ Realtime API bridge. Clone it as the starting scaffold.
Cal.com booking API: v2 API. POST /bookings creates an appointment. Use the API key from a fresh Cal.com account in the prospect's name.

03 Day-1 plan

A real prospect. A real demo. A real outbound message — all before 5pm.

09:00–09:15 Drop the SELF-BOOTSTRAP plan into Claude Code, pick a prospect.
Open this page, copy the green "Self-bootstrap" plan box at the top, paste into Claude Code in plan mode. The plan asks you for the prospect slug + an email address — that's the only input. Pick from /new-hire/prospects.
09:15–10:30 Watch Claude Code execute the plan: research, scaffold, deploy.
The agent scrapes the prospect site for hours/services/language, clones the OpenAI Realtime + Twilio reference, generates the prompt for that business, walks you through the manual Cal.com account setup and wires in the API key, deploys a Cloudflare Worker bridge, and provisions a Twilio number. You watch and approve at the gates the plan flags.
10:30–11:00 Make the first real call from your phone.
The agent prints the new Twilio number. Call it. The OpenAI Realtime model picks up, greets in English, switches to Spanish if you do, books a slot via Cal.com function-calling. Verify the booking landed in the prospect-named Cal.com account.
11:00–12:00 Five-call quality pass + tone tuning.
Call EN → book. Call ES → book. Call EN → "this is an emergency" → assistant transfers cleanly with handoff sentence. Call mid-Spanglish. Call from outside on LTE. Tell Claude Code which calls failed — it edits the system prompt + redeploys in 60 seconds per fix.
12:00–13:00 Lunch. Let the build season.
Walk away. Come back, listen to the call recordings (D1 logs them, the dashboard plays them). Anything that grates on playback, fix it after lunch.
13:00–14:00 Record the demo Loom.
Two takes: one EN→book, one ES→book. The recording IS the demo. The Loom + the live number = the entire pitch artifact.
14:00–15:00 Have the agent draft the cold pitch.
Tell Claude Code "draft the cold pitch + WhatsApp to <prospect owner email>." It pulls from this page's Cold-pitch prompt + the prospect research from step 09:15. You read it, edit one sentence, you send.
15:00–16:00 Send the cold pitch. Get back to sourcing the next prospect.
Email + WhatsApp the prospect. Then come back to /new-hire/prospects and queue the next one — the agent can run the same plan against it tomorrow morning. The point is: you graduate from doing the build to running the build queue.
16:00–17:00 Document what the agent struggled with. Update the plan.
Open this page. Edit the Self-bootstrap plan box (it's in /home/eratner/cafecito-ai/new-hire/blocks/data.js). Add the friction you hit. Push the change. Tomorrow's build run is faster than today's. Compounding.

04 Best practices & gotchas

Always build for a NAMED prospect, not a generic vertical.

Why: Suite Air closes because the demo says "Hi, this is Suite Air, how can I help?" — not "Hi, this is HVAC Company." Specificity is the entire wedge. The self-bootstrap plan refuses to run without a prospect slug for this reason.
Use one voice (sage or alloy) across both languages.

Why: OpenAI Realtime ships multilingual voices that don't need a separate ElevenLabs layer. "sage" is the warmest; "alloy" is the most neutral and works well for professional services. Pick one, don't switch by language — brand consistency beats accent perfection.
Make the EN↔ES switch silent — never have the model announce it.

Why: Realtime's default behavior is to acknowledge a language switch ("Of course, we can speak in Spanish"). Bake into the system prompt: "If the caller speaks Spanish, switch silently. Never announce or comment on the language change." Customers shouldn't hear the seam.
Function calling for the booking — not "I'll have someone call you."

Why: The Realtime API supports function calling mid-conversation. Define a `book_appointment` function the model invokes synchronously while the caller is on the line. The booking confirmation comes back in 1.5 seconds and the model speaks it naturally. The "we'll get back to you" pattern is what makes voicemail bad — don't recreate it.
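
The mid-call round trip described above can be sketched as two small helpers: one reads a completed function call out of a Realtime server event, the other builds the messages that hand the result back so the model can speak it. The helper names are hypothetical. The sketch assumes the call arrives as a `response.output_item.done` event whose item has type `function_call`, which is the pattern the reference bridge listens for.

```javascript
// Extract name, call_id, and parsed arguments from a finished function-call item.
// Returns null for any event that is not a completed function call.
function parseFunctionCall(event) {
  if (event.type !== "response.output_item.done") return null;
  const item = event.item;
  if (!item || item.type !== "function_call") return null;
  return { name: item.name, callId: item.call_id, args: JSON.parse(item.arguments) };
}

// Build the two messages sent back to the Realtime socket:
// first the function output, then a request for the model to respond to it.
function functionResultMessages(callId, result) {
  return [
    {
      type: "conversation.item.create",
      item: { type: "function_call_output", call_id: callId, output: JSON.stringify(result) },
    },
    { type: "response.create" },
  ];
}
```

The Worker would run the booking between these two steps, so the caller hears the confirmation as soon as the booking request returns.
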
The escalation prompt is the most important field.

Why: A failed booking is not a failed call. A failed transfer is. The system prompt must give the model a hard escalation rule: "If the caller says ANY of [list of trigger words: emergency, leak, broken, lawsuit, sue, attorney], or if you fail twice to extract the appointment fields, IMMEDIATELY call the `transfer_to_human` function with a 1-sentence summary."
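
The rule above lives in the system prompt, but the same logic is easy to mirror in the Worker as a backstop. The following is a sketch under that assumption, not part of the original plan: `shouldEscalate` is a hypothetical helper, and the word list is the English-only list quoted above.

```javascript
// Trigger words copied from the escalation rule in the text (English only).
const TRIGGER_WORDS = ["emergency", "leak", "broken", "lawsuit", "sue", "attorney"];

// True when the caller used a trigger word or field extraction has failed twice.
// Matching is on whole words, so "issue" does not trip the "sue" trigger.
function shouldEscalate(callerText, failedExtractions) {
  if (failedExtractions >= 2) return true;
  const words = callerText.toLowerCase().match(/[a-z]+/g) || [];
  return words.some((word) => TRIGGER_WORDS.includes(word));
}
```

Whole-word matching trades recall for precision: "leaking" would not match "leak", so inflected forms and Spanish equivalents would need to be added to the list explicitly.
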
Never quote prices the customer can argue with later.

Why: If the business has dynamic pricing, the assistant says "I'll send a tech for an estimate" not "$95 service call." Wrong price = lost trust + lost sale. Bake "NEVER quote prices for [list of services]" into the prompt as a hard rule, not a guideline.
Cap usage on the customer side from day one.

Why: OpenAI Realtime runs $0.06/min input + $0.24/min output ≈ $0.30/min for a 50/50 conversation. At 4,000 min/mo that's $1,200 in pass-through alone. Either price as $400/mo + cost-plus 30% on usage with a usage alert at the cap, or set a hard monthly minute cap in the Worker. Never silently absorb runaway pass-through.
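
A hard monthly minute cap in the Worker comes down to one comparison against the logged call time. Here is a minimal sketch, assuming usage is summed from a `call_log` table with `duration_seconds`, `business_slug`, and `ts` columns. The helper names are hypothetical, and the SQL assumes `ts` is stored in a format that sorts chronologically, which the text does not specify.

```javascript
// Query the Worker could run against D1 before accepting a call (assumed schema).
// Example binding: env.VOICE_DB.prepare(MONTH_USAGE_SQL).bind(slug, monthStart).first("used")
const MONTH_USAGE_SQL =
  "SELECT COALESCE(SUM(duration_seconds), 0) AS used " +
  "FROM call_log WHERE business_slug = ? AND ts >= ?";

// True once the month's logged seconds exceed the cap expressed in minutes.
function overMonthlyCap(usedSeconds, maxMonthlyMinutes) {
  return usedSeconds > maxMonthlyMinutes * 60;
}

// Whole minutes left before the cap, never negative. Useful for a usage alert.
function minutesRemaining(usedSeconds, maxMonthlyMinutes) {
  return Math.max(0, Math.floor((maxMonthlyMinutes * 60 - usedSeconds) / 60));
}
```

With a 4,000-minute cap, a call arriving at exactly 240,000 logged seconds is still accepted, and the next one after any further usage is transferred to a human.
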
Test on a real iPhone over LTE, never WiFi-only.

Why: Twilio Media Streams + Realtime adds 200-400ms of latency compared with talking to OpenAI directly. WiFi hides the latency that breaks turn-taking. Walk outside, dial, listen. If the demo only works on WiFi, the demo is fake.

05 Prompts (copy-paste)

Drop these into Claude Code. Replace the [BRACKETED] fields with the prospect's details.

Prompt 1 OpenAI Realtime system-prompt builder

Generates the system prompt for the OpenAI Realtime session, tailored to the specific prospect, with hard rules for language switching, function calling, and escalation.

I'm building an OpenAI Realtime API voice receptionist for [BUSINESS NAME] in Miami. Their context:
- Industry: [HVAC / dental / salon / law firm / etc.]
- Hours: [9-6 M-F or 24/7]
- Top services: [list 3-7 most-asked-for services]
- Default language: English. Switch silently to Spanish if the caller speaks Spanish.
- Booking: Cal.com event-type ID [N] in the prospect's Cal.com account
- Escalation phone (for emergencies / out-of-scope / angry callers): [PHONE]
- Emergency trigger words (any → immediate transfer): emergency, leak, broken, lawsuit, sue, attorney, arrest
- Things the receptionist will NEVER do: [e.g. quote service-call prices, promise weekend service, accept payments over the phone]

Write the OpenAI Realtime "instructions" string for the session.update message. Hard rules to bake in:
1. Greet in English first. NEVER announce a language switch. If the caller's first word is Spanish, silently respond in Spanish and continue in Spanish for the rest of the call.
2. Ask the 3 triage questions in this order: (a) urgency (today / this week / flexible), (b) service type, (c) ZIP code or address.
3. For routine requests: call the function `book_appointment` with the extracted fields. Confirm the booking time naturally after the function returns.
4. For ANY trigger word OR if you fail twice to extract appointment fields: call the function `transfer_to_human` with a 1-sentence summary of the situation. Do not retry.
5. NEVER quote prices for non-fixed services. Say "a tech will confirm pricing on-site" if asked.
6. End every successful booking call with: "You'll get a text confirmation in about 30 seconds with the time and a callback number. Anything else?"
7. Voice: sage. Speed: 1.0. Use natural pauses; don't over-fill with verbal acknowledgments.

Also output the JSON block defining the two functions (book_appointment, transfer_to_human) as the OpenAI Realtime tools array.

Keep the instructions string under 1,800 tokens. Output: instructions string + tools JSON, both labeled.
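
For orientation, the output this prompt asks for might be wired into the Worker roughly as follows. This is a sketch under stated assumptions: the tool descriptions and the `urgency` enum values are illustrative, `buildSessionUpdate` is a hypothetical helper, and the flat session fields (`voice`, `input_audio_format`, `output_audio_format`) follow the older session layout. Newer Realtime API versions nest audio settings differently, so check the current API reference before relying on these field names.

```javascript
// Tools array with the two functions named in the prompt.
// Parameter names come from the bridge prompt; enum values are assumptions.
const tools = [
  {
    type: "function",
    name: "book_appointment",
    description: "Book an appointment once all fields are collected from the caller.",
    parameters: {
      type: "object",
      properties: {
        name: { type: "string" },
        phone: { type: "string" },
        urgency: { type: "string", enum: ["today", "this_week", "flexible"] },
        service_type: { type: "string" },
        zip: { type: "string" },
      },
      required: ["name", "phone", "urgency", "service_type", "zip"],
    },
  },
  {
    type: "function",
    name: "transfer_to_human",
    description: "Hand the call to a person with a one-sentence summary.",
    parameters: {
      type: "object",
      properties: { summary: { type: "string" } },
      required: ["summary"],
    },
  },
];

// Build the session.update message sent when the Realtime socket opens.
function buildSessionUpdate(instructions) {
  return {
    type: "session.update",
    session: {
      instructions,
      voice: "sage",
      input_audio_format: "g711_ulaw", // matches Twilio Media Streams audio
      output_audio_format: "g711_ulaw",
      tools,
      tool_choice: "auto",
    },
  };
}
```

The Worker would send `JSON.stringify(buildSessionUpdate(promptText))` as the first message on the OpenAI socket, where `promptText` is the generated instructions string.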

Prompt 2 CF Worker bridge scaffold (Twilio Media Streams ↔ OpenAI Realtime)

Scaffolds the Cloudflare Worker that bridges a Twilio inbound call audio stream to an OpenAI Realtime session, handles function calls (booking + transfer), and logs to D1.

Build me a Cloudflare Worker that bridges Twilio Media Streams to the OpenAI Realtime API for [BUSINESS NAME].

Architecture:
- Twilio inbound call → Twilio Voice webhook → Worker returns TwiML <Connect><Stream url="wss://<worker>.workers.dev/twilio-stream"/></Connect>
- Twilio opens WebSocket to /twilio-stream → Worker accepts via WebSocketPair, opens parallel WebSocket to OpenAI Realtime (wss://api.openai.com/v1/realtime?model=gpt-realtime).
- Worker pipes audio frames bidirectionally (g711_ulaw both directions; OpenAI Realtime supports g711_ulaw input/output natively).
- Worker handles function-call events from OpenAI:
  - book_appointment(name, phone, urgency, service_type, zip) → POST to Cal.com v2 API → return result JSON to model
  - transfer_to_human(summary) → mark for transfer in D1, send TwiML redirect via Twilio REST API to dial [ESCALATION_PHONE], speak the summary first
- On call.ended: write call_log entry to D1 with caller phone, duration, intent, outcome, language, transcript (from response.audio_transcript.done events).
- On any error: fall through to TwiML <Say>"One moment — connecting you to a person"</Say><Dial>[ESCALATION_PHONE]</Dial>. Never let a 5xx kill the call silently.

Bindings expected:
- D1: VOICE_DB (call_log table — see migration below)
- KV: CALL_DEDUP (24h TTL on Twilio CallSid for idempotency)
- Secrets: OPENAI_API_KEY, TWILIO_SID, TWILIO_TOKEN, CAL_API_KEY, CAL_EVENT_TYPE_ID, ESCALATION_PHONE
- Vars: BUSINESS_NAME, BUSINESS_SLUG, MAX_MONTHLY_MINUTES (cap for cost control, default 4000)

Cost-cap logic: before each call.started, check sum of duration_seconds in call_log for current month. If > MAX_MONTHLY_MINUTES * 60, immediately transfer to human + alert via webhook.

Reference the OpenAI/openai-realtime-twilio-demo repo on GitHub for the WebSocket bridging pattern. Use Hono for HTTP routing.

Output:
1. Complete worker.js (single file, ESM, Hono)
2. wrangler.jsonc additions (D1 + KV bindings + secrets + vars)
3. D1 migration: call_log (id, ts, twilio_call_sid, caller_phone, duration_seconds, language, intent, outcome, transcript_text, business_slug)
4. The deploy command for our environment (OAuth-based wrangler deploy from /home/eratner/cafecito-ai/)

Make it ship-ready — no TODOs, no placeholder stubs.
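
As a reference point for reviewing what the agent generates from this prompt, the two TwiML responses it describes can be sketched as plain string builders. The function names and the `escapeXml` helper are hypothetical. The route `/twilio-stream` and the fallback wording come from the prompt above.

```javascript
// Escape values before placing them inside XML text or attributes.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

const XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>';

// TwiML returned by the Voice webhook: tells Twilio to open a media stream
// to the Worker's WebSocket route.
function connectStreamTwiml(workerHost) {
  const url = `wss://${escapeXml(workerHost)}/twilio-stream`;
  return `${XML_HEADER}<Response><Connect><Stream url="${url}"/></Connect></Response>`;
}

// TwiML used on any error: say one line, then dial the escalation number.
function fallbackTwiml(escalationPhone) {
  const phone = escapeXml(escalationPhone);
  return (
    `${XML_HEADER}<Response>` +
    `<Say>One moment, connecting you to a person</Say>` +
    `<Dial>${phone}</Dial></Response>`
  );
}
```

Both would be returned with a `Content-Type: text/xml` header. Keeping the fallback as a pure function makes it easy to call from every error path.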

Prompt 3 Cold-pitch outreach

Generates the email + WhatsApp message you send the prospect after Day-1 build is recorded.

Write a 4-sentence cold outreach email + a 2-sentence WhatsApp follow-up for [BUSINESS OWNER NAME] at [BUSINESS NAME] in Miami.

I built them a working bilingual voice receptionist as a demo. The receptionist's number is [TWILIO_NUMBER]. The demo Loom is at [LOOM_URL]. Pricing: $1.5k setup, $400/mo, cancel anytime.

The email should:
- Open with a specific observation about THEIR business (not "I noticed Miami HVAC companies…") — pull from their site/Google reviews
- State exactly what I built and that they can call it right now
- Name the price plainly
- Ask one yes/no question that makes "yes" easy

The WhatsApp should:
- Be conversational, not corporate
- Reference the email
- Include the demo Loom URL
- Offer to swap out the voice for the owner's voice on a 90-second sample

No emojis. No "I hope this finds you well." No "circle back."

Prompt 4 90-second Loom demo script

Tight script for the Loom you record as the proof artifact. Reusable per-prospect.

Write me a 90-second Loom script for a cold demo of a bilingual voice receptionist I built for [PROSPECT BUSINESS NAME].

Structure:
- 0-10s: hook ("I called your number at 7:15pm last night. Here's what happened. [voicemail clip]. Now call this number.")
- 10-35s: live call demo — I dial, receptionist answers, I speak in Spanish, books an appointment
- 35-60s: live call demo — I dial again, English, mark as emergency, the assistant transfers cleanly with a handoff sentence
- 60-80s: the price ($1.5k setup / $400/mo, cancel anytime; about a fifth of the $1,800–2,400/mo a part-time bilingual receptionist costs)
- 80-90s: the close ("If you want this answering your phone tomorrow morning, reply YES and I'll port your number tonight.")

Output as 3-second-paced bullets I can read off without rehearsal. No filler. No "today I'm going to show you."

06 Selling script

Discovery question (ask this first)

"How many calls do you miss during lunch, after 6pm, and on weekends? When you do answer them on your cell — how often is it actually a customer ready to book vs. a tire-kicker?"

The frame

A part-time bilingual receptionist runs $1,800–2,400/mo in Miami and quits after eight months. This costs $400/mo, never sleeps, never takes a break, and already speaks both your customers' languages on day one. The setup pays for itself the first time it books a $300 service call you would have missed.

The demo play

Do the live call ON the discovery call. Don't share screen, don't show a slide. Three-way the prospect into the live Twilio number from your phone. They hear it answer in their industry's voice. That's the demo. Five seconds of silence after the booking confirmation is more persuasive than any pitch deck.

Objections

"Can it really handle Spanish?"

"Call it right now and order a service in Spanish. I'll wait." (Then actually wait. Don't fill the silence.)
"What if it screws up?"

"It transfers to your cell. YOU decide what counts as an emergency. Show me your worst call from last week and I'll show you how this would have handled it."
"My customers want a human."

"Your customers want their problem solved. The fastest path to a human for the calls that need one is a 24/7 triage that knows which 20% of calls are real emergencies. Right now, all 100% go to voicemail."
"$400/mo is a lot."

"It's 8 hours of receptionist time at $50/hr. The receptionist works 200 hours a month. The math isn't close."

The close

"$1.5k now puts it live this week. $400/mo from the day it answers your first call. If it doesn't pay for itself in 30 days, kill it — no contract."

07 Pricing notes

Anchor on local labor cost ($40-50k/yr for a bilingual front desk in Miami).

New stack pricing reality: OpenAI Realtime is ~$0.30/min for a balanced conversation (vs ~$0.08/min on the old VAPI+ElevenLabs stack). Twilio is ~$0.013/min. Total per-minute cost: ~$0.31. Charge the customer cost-plus 30% on usage with a hard cap (default 4,000 min/mo ≈ $1,250 at cost, ≈ $1,630 billed).

Setup includes: Twilio number, OpenAI Realtime session config + system prompt, Cal.com wiring, CF Worker bridge deployment, 1 hr of training calls.

Setup does NOT include: CRM integration (that's Block 09), multi-location routing, or custom voice cloning.

Upsell path: 24/7 hotline with after-hours triage logic → $1k/mo for multi-location dental, law, large HVAC.

The simpler stack means 1-day deploys (vs 2 days on the prior stack) and less to break.
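
The usage arithmetic can be checked with a few lines. This sketch uses the per-minute figures quoted above (~$0.30 for OpenAI Realtime, ~$0.013 for Twilio) and the 30% markup. The function name is hypothetical, and amounts are computed in cents to avoid floating-point drift.

```javascript
const OPENAI_CENTS_PER_MIN = 30;  // ~$0.30/min, from the text
const TWILIO_CENTS_PER_MIN = 1.3; // ~$0.013/min, from the text
const MARKUP = 1.3;               // cost-plus 30%

// Monthly pass-through cost and the amount billed to the customer, in dollars.
function monthlyUsage(minutes) {
  const costCents = Math.round(minutes * (OPENAI_CENTS_PER_MIN + TWILIO_CENTS_PER_MIN));
  const billedCents = Math.round(costCents * MARKUP);
  return { costDollars: costCents / 100, billedDollars: billedCents / 100 };
}

console.log(monthlyUsage(4000)); // { costDollars: 1252, billedDollars: 1627.6 }
```

At the default 4,000-minute cap this gives $1,252 at cost and $1,627.60 billed, so usage can exceed the $400/mo base fee several times over if the cap is ever reached.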