Built with ElevenLabs

We didn’t build this alone.

Vought is a thin layer of taste on top of an extraordinary stack. ElevenLabs is the engine. Vercel and Render keep it running. The source is public — read every line.

Lead partner
ElevenLabs

The pioneer that made the rest of the loop possible.

ElevenLabs didn’t just ship better TTS. They moved the entire field — voice quality past the uncanny valley, voice cloning from seconds of audio instead of hours, and now Speech Engine: the first primitive that puts STT, TTS, and turn detection on one socket. Every voice product built after them inherits the floor they raised.

We tried the obvious alternative — a Whisper transcription service, a frontier LLM, and a separate TTS vendor stitched together. End-to-end latency landed near two seconds. Voices drifted. Interrupting the AI mid-sentence required custom plumbing. Speech Engine collapsed all three into a single primitive and the loop dropped to well under a second on the first try.

Vought × ElevenLabs · the live loop

Animated edges are live audio and streaming LLM tokens. Static edges are control signals. The diagram is faithful — these are the exact services and the exact direction of flow in production.

One socket, three jobs.

Speech Engine collapses STT, TTS, and turn detection into a single connection. No round-trips between a transcription vendor, a thinking layer, and a synthesis vendor — the integration tax is gone.
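A rough way to picture "one socket, three jobs": every message on the single connection carries a type tag, and the client dispatches on it. The sketch below assumes hypothetical event names (`transcript`, `audio`, `turn`) and JSON framing. Speech Engine's actual wire format is not described on this page and may differ.

```typescript
// Hypothetical event shapes sharing one socket; NOT Speech Engine's documented wire format.
type SocketEvent =
  | { type: "transcript"; text: string; final: boolean } // STT result
  | { type: "audio"; base64: string }                    // TTS audio chunk
  | { type: "turn"; speaker: "user" | "agent" };         // turn-detection signal

interface EventHandlers {
  onTranscript(text: string, final: boolean): void;
  onAudio(base64: string): void;
  onTurn(speaker: "user" | "agent"): void;
}

// Parse one raw frame, hand it to the matching handler, and return its type tag.
function routeEvent(raw: string, handlers: EventHandlers): SocketEvent["type"] {
  const event = JSON.parse(raw) as SocketEvent;
  switch (event.type) {
    case "transcript":
      handlers.onTranscript(event.text, event.final);
      return event.type;
    case "audio":
      handlers.onAudio(event.base64);
      return event.type;
    case "turn":
      handlers.onTurn(event.speaker);
      return event.type;
    default:
      // Unknown tags are rejected rather than silently dropped.
      throw new Error(`unknown event type: ${(event as { type: string }).type}`);
  }
}
```

The point of the single dispatcher is that transcription, synthesis, and turn-taking share one connection lifecycle, so there is no cross-vendor state to reconcile.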

TTS first byte under 300 ms.

eleven_flash_v2 ships the first audio chunk faster than most pipelines finish thinking. That is the entire reason Vought feels like a whisper instead of a robot.
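One way to keep a "first byte under 300 ms" target honest is to record it per response: start a timer when the TTS request goes out and stop it on the first audio chunk. The `FirstChunkTimer` class below is a hypothetical helper, not part of any ElevenLabs SDK; the clock is injectable so it can be tested without real audio.

```typescript
// Hypothetical helper: records time-to-first-audio-chunk for one TTS response.
class FirstChunkTimer {
  private readonly now: () => number;
  private startedAt: number | null = null;
  private firstChunkMs: number | null = null;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Call when the TTS request is sent.
  start(): void {
    this.startedAt = this.now();
    this.firstChunkMs = null;
  }

  // Call on every audio chunk; only the first one after start() is recorded.
  onChunk(): void {
    if (this.startedAt !== null && this.firstChunkMs === null) {
      this.firstChunkMs = this.now() - this.startedAt;
    }
  }

  get elapsedToFirstChunk(): number | null {
    return this.firstChunkMs;
  }

  // True only if a chunk arrived and it arrived strictly under the budget.
  withinBudget(budgetMs = 300): boolean {
    return this.firstChunkMs !== null && this.firstChunkMs < budgetMs;
  }
}
```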

Voice cloning from 30 seconds.

The product’s wow moment — the AI speaking in your voice — exists because cloning is a 30-second capture and a single API call, not a multi-week studio session.
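The 30-second figure translates directly into a byte budget for the capture step. A minimal sketch, assuming raw 16-bit mono PCM at 16 kHz (an illustrative format, not necessarily what Vought records): at 16,000 samples/s × 2 bytes, 30 seconds is 960,000 bytes. `captureSeconds` and `isLongEnoughForClone` are hypothetical helpers.

```typescript
// Illustrative PCM layout; the real capture format is an assumption.
interface PcmFormat {
  sampleRate: number;     // samples per second, per channel
  channels: number;
  bytesPerSample: number; // 2 for 16-bit PCM
}

// Duration of a raw PCM buffer in seconds.
function captureSeconds(byteLength: number, fmt: PcmFormat): number {
  return byteLength / (fmt.sampleRate * fmt.channels * fmt.bytesPerSample);
}

// Gate the clone call on having at least `minSeconds` of audio (30 s by default).
function isLongEnoughForClone(byteLength: number, fmt: PcmFormat, minSeconds = 30): boolean {
  return captureSeconds(byteLength, fmt) >= minSeconds;
}

const MONO_16K: PcmFormat = { sampleRate: 16000, channels: 1, bytesPerSample: 2 };
```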

WebRTC, browser-native.

No SIP gateway, no audio bridge, no native client. The same socket code runs unchanged in a browser on a hackathon laptop and in a production deployment.

Zero-retention by default.

Sessions are ephemeral. We opted into retention_days = -1 at engine creation so transcripts and audio never persist on their side. Compliance gets shorter, not longer.

AbortSignal-aware streams.

When the operator interrupts the whisper, the same AbortController that cancels the LLM also closes the TTS stream cleanly. End-to-end interrupt in under 200 ms.
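The interrupt path relies on one standard mechanism: every stream opened for a turn listens to the same `AbortSignal`, so a single `controller.abort()` closes all of them. The sketch below models the LLM and TTS streams with a hypothetical `CancellableStream` class; only `AbortController` and `AbortSignal` are real platform APIs here.

```typescript
// Hypothetical stand-in for a streaming resource (LLM token stream, TTS audio stream).
class CancellableStream {
  closed = false;
  reason: unknown = undefined;

  constructor(signal: AbortSignal) {
    if (signal.aborted) {
      // Already interrupted: close immediately instead of opening.
      this.close(signal.reason);
    } else {
      // `once: true` removes the listener after the first abort event.
      signal.addEventListener("abort", () => this.close(signal.reason), { once: true });
    }
  }

  close(reason: unknown): void {
    if (this.closed) return; // idempotent
    this.closed = true;
    this.reason = reason;
  }
}

// One controller per assistant turn; aborting it tears down both streams at once.
function startTurn() {
  const controller = new AbortController();
  const llm = new CancellableStream(controller.signal);
  const tts = new CancellableStream(controller.signal);
  return { controller, llm, tts };
}
```

Because the `abort` event fires synchronously inside `controller.abort()`, both streams are marked closed before that call returns.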

The integration, in six lines
await elevenlabs.speechEngine.attach(SPEECH_ENGINE_ID, httpServer, '/ws', {
  onTranscript: async (transcript, signal, session) => {
    const stream = await llm.chat({ signal, /* ... */ }); // AbortSignal threaded
    await session.sendResponse(stream);                  // STT → LLM → TTS
  },
});
elevenlabs.io
Co-partners

The infra that turns “works on my laptop” into production.

Vercel
Marketing site · Product app

Both Next.js apps — the marketing site at vought.com and the product app at app.vought.com — ship to Vercel on every push to main. Preview deployments are how the design review loop stays under five minutes.

vercel.com
Render
Echo Engine · Diarization sidecar · Postgres · Redis

The Node Echo Engine, the Python diart sidecar, and the managed Postgres + Redis instances all run on Render. One Dockerfile per service, deployed from the same monorepo with no custom glue.

render.com
Architecture

Frameworks, services, and where each Dockerfile lives.

Frameworks
  • Next.js 15 · Both apps (App Router)
  • React 18 · UI runtime
  • Tailwind 3 · Design tokens
  • Framer Motion · The four signature motions
  • Zustand · Realtime session state
  • Turborepo · Monorepo orchestration
  • Node.js 20 · Echo Engine runtime
  • Python 3.11 + diart · Diarization sidecar
  • Postgres 15 + pgvector · Playbook RAG
  • Redis · Live session memory
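For the pgvector entry, playbook retrieval comes down to ordering stored embeddings by distance from the query embedding. pgvector's `<=>` operator computes cosine distance (1 minus cosine similarity), so a query shaped like `ORDER BY embedding <=> $1 LIMIT 5` returns the closest chunks. The in-memory sketch below reproduces that ranking only to show the math; `PlaybookChunk` and `topK` are hypothetical names, and the real query would run inside Postgres.

```typescript
// Hypothetical row shape for a playbook chunk and its embedding.
interface PlaybookChunk {
  id: string;
  embedding: number[];
}

// Cosine distance = 1 - cos(a, b), the quantity pgvector's <=> returns.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks closest to the query, nearest first.
function topK(query: number[], chunks: PlaybookChunk[], k: number): PlaybookChunk[] {
  return chunks
    .map((chunk) => ({ chunk, distance: cosineDistance(query, chunk.embedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k)
    .map(({ chunk }) => chunk);
}
```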
Repo · vought-os
vought-os/
├── apps/
│   ├── web/                  → Vercel (marketing)
│   └── app/                  → Vercel (product)
├── services/
│   ├── echo-engine/          → Render (Node, Dockerfile)
│   └── diarization-sidecar/  → Render (Python, Dockerfile)
├── packages/
│   ├── design-system/        → tokens + global.css
│   ├── motion/               → the four signature motions
│   └── ui/                   → Container, Grid, primitives
├── docker-compose.yml        → local Postgres + Redis
└── turbo.json                → pipelines

Each service has its own Dockerfile, its own deployment target, and its own scaling story. The monorepo glue is Turborepo pipelines and a shared design-system package — nothing exotic.
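Turborepo's `"dependsOn": ["^build"]` convention means each workspace's internal dependencies build before the workspace itself. The sketch below models that ordering with Kahn's topological sort over a hypothetical dependency graph for the packages in the tree above (which app imports which package is an assumption). It is not how Turborepo is implemented internally.

```typescript
// Maps each workspace to the internal workspaces it depends on (hypothetical edges).
type WorkspaceGraph = Record<string, string[]>;

// Kahn's algorithm: emit a workspace only once all of its dependencies have been emitted.
function buildOrder(graph: WorkspaceGraph): string[] {
  const pending = new Map<string, number>();      // workspace -> deps not yet built
  const dependents = new Map<string, string[]>(); // dep -> workspaces waiting on it

  for (const [node, deps] of Object.entries(graph)) {
    pending.set(node, deps.length);
    for (const dep of deps) {
      if (!pending.has(dep)) pending.set(dep, 0); // dep may never appear as a key
      dependents.set(dep, [...(dependents.get(dep) ?? []), node]);
    }
  }

  // Start from workspaces with no internal dependencies, sorted for a stable order.
  const ready = Array.from(pending)
    .filter(([, count]) => count === 0)
    .map(([node]) => node)
    .sort();
  const order: string[] = [];

  while (ready.length > 0) {
    const node = ready.shift() as string;
    order.push(node);
    for (const next of dependents.get(node) ?? []) {
      const left = (pending.get(next) ?? 0) - 1;
      pending.set(next, left);
      if (left === 0) ready.push(next);
    }
  }

  // Anything left unbuilt is part of a cycle.
  if (order.length !== pending.size) throw new Error("dependency cycle detected");
  return order;
}
```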

Source

Read the code. It’s all open.

The Echo Engine, the diarization sidecar, the live-call screen, the voice-clone onboarding, the marketing site — every file is on GitHub.

github.com/Sushant6095/vought-os