MVP architecture

Design notes for the first usable build

This MVP is optimized for deployment on Vercel and for two-person, travel-style conversations in North America.

Why no server audio WebSocket?

Vercel Functions are request-scoped and time-limited, so they are a poor home for a long-running audio relay. The MVP therefore uses Vercel for room creation, QR links, short-lived provider tokens, and polling-based WebRTC signaling. Live audio flows browser-to-browser; each listener then opens a provider sidecar for the remote audio track.
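To make "polling-based signaling" concrete, here is a minimal sketch of a per-room mailbox that peers write to and poll from. The type names, field names, and the sequence-cursor scheme are assumptions for illustration, not the project's actual schema; in a deployed build the array would live in a shared store rather than in process memory.

```typescript
// Hypothetical message shape; field names are assumptions, not the project's schema.
type SignalKind = "offer" | "answer" | "candidate";

interface SignalMessage {
  seq: number;      // monotonically increasing cursor within one room
  from: string;     // sender's peer id
  kind: SignalKind;
  payload: string;  // JSON-encoded SDP or ICE candidate
}

// One mailbox per room. Each peer remembers the highest seq it has seen and
// asks only for newer messages on the next poll.
class SignalingMailbox {
  private nextSeq = 1;
  private readonly messages: SignalMessage[] = [];

  // Append a message and return its sequence number.
  post(from: string, kind: SignalKind, payload: string): number {
    const seq = this.nextSeq++;
    this.messages.push({ seq, from, kind, payload });
    return seq;
  }

  // Return messages newer than afterSeq that were sent by someone else.
  poll(peerId: string, afterSeq: number): SignalMessage[] {
    return this.messages.filter((m) => m.seq > afterSeq && m.from !== peerId);
  }
}
```

A browser would call the equivalent of `poll` on a short interval, feed offers, answers, and candidates into its peer connection, and advance its cursor to the highest `seq` it received.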

Provider plan

Primary provider: gpt-realtime-translate. Backup provider: Gemini Live with gemini-3.1-flash-live-preview. Both API keys stay on the server; browsers receive only short-lived session credentials.
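The primary-then-backup ordering can be sketched as a small server-side helper. The model identifiers are the ones named above; the interface names, the `providerOrder` function, and the idea of deriving the order from which keys are present are assumptions for illustration.

```typescript
// Hypothetical types; only the two model identifiers come from the provider plan.
interface ProviderChoice {
  provider: "openai" | "gemini";
  model: string;
}

interface ServerKeys {
  openaiKey?: string;
  geminiKey?: string;
}

// Returns the providers to try, primary first. Keys are checked for presence
// only and are never copied into the result, so the list is safe to log or
// to send to a browser alongside a short-lived credential.
function providerOrder(keys: ServerKeys): ProviderChoice[] {
  const order: ProviderChoice[] = [];
  if (keys.openaiKey) {
    order.push({ provider: "openai", model: "gpt-realtime-translate" });
  }
  if (keys.geminiKey) {
    order.push({ provider: "gemini", model: "gemini-3.1-flash-live-preview" });
  }
  return order;
}
```

A token endpoint could walk this list and mint a session credential from the first provider that succeeds.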

Production prerequisites

Set OPENAI_API_KEY. Set GEMINI_API_KEY if you want the backup provider. Set UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN before deploying; otherwise rooms fall back to local in-memory signaling, which is not reliable across Vercel instances.
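The prerequisites above can be checked at startup with a small helper. This is a sketch under assumptions: the `checkDeployEnv` function and the `DeployCheck` shape are hypothetical, and only the four environment variable names come from these notes.

```typescript
// Hypothetical helper; only the environment variable names come from the notes.
type Env = Record<string, string | undefined>;

interface DeployCheck {
  signaling: "upstash" | "memory"; // which signaling store the build would use
  backupEnabled: boolean;          // true when the Gemini key is present
  problems: string[];              // human-readable issues to surface at startup
}

function checkDeployEnv(env: Env): DeployCheck {
  const problems: string[] = [];

  if (!env["OPENAI_API_KEY"]) {
    problems.push("OPENAI_API_KEY is required");
  }

  // Both Upstash variables are needed; one without the other is treated as missing.
  const hasUpstash = Boolean(
    env["UPSTASH_REDIS_REST_URL"] && env["UPSTASH_REDIS_REST_TOKEN"],
  );
  if (!hasUpstash) {
    problems.push("Upstash variables missing: signaling falls back to in-memory state");
  }

  return {
    signaling: hasUpstash ? "upstash" : "memory",
    backupEnabled: Boolean(env["GEMINI_API_KEY"]),
    problems,
  };
}
```

On the server this could be called as `checkDeployEnv(process.env)`, logging `problems` or failing the deploy when the list is non-empty.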