Built with ElevenLabs

We didn’t build this alone.

Vought is a thin layer of taste on top of an extraordinary stack. ElevenLabs is the engine. Vercel and Render keep it running. The source is public — read every line.


Lead partner

ElevenLabs

The pioneer that made the rest of the loop possible.

ElevenLabs didn’t just ship better TTS. They moved the entire field — voice quality past the uncanny valley, voice cloning from seconds of audio instead of hours, and now Speech Engine: the first primitive that puts STT, TTS, and turn detection on one socket. Every voice product built after them inherits the floor they raised.

We tried the obvious alternative — a Whisper transcription service, a frontier LLM, and a separate TTS vendor stitched together. End-to-end latency landed near two seconds. Voices drifted. Interrupting the AI mid-sentence required custom plumbing. Speech Engine collapsed all three into a single primitive and the loop dropped to well under a second on the first try.

Vought × ElevenLabs · the live loop

I/O          Operator mic     WebRTC, 16 kHz PCM
VOUGHT       Diart sidecar    self vs. other gate
ELEVENLABS   Speech Engine    STT · TTS · turn detection
VOUGHT       Echo Engine      LLM stream, AbortSignal
ELEVENLABS   Cloned voice     eleven_flash_v2 · 412 ms

Animated edges are live audio and streaming LLM tokens. Static edges are control signals. The diagram is faithful — these are the exact services and the exact direction of flow in production.
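
To make the first hop of the loop concrete, here is a rough sketch of what "16 kHz PCM" implies for frame sizes and sample conversion. Only the 16 kHz rate comes from the diagram. The 16-bit mono sample format and the 20 ms frame length are assumptions, and the names frameBytes and floatToPcm16 are hypothetical helpers, not part of the Vought codebase or any ElevenLabs SDK.

```typescript
const SAMPLE_RATE_HZ = 16_000; // from the diagram
const BYTES_PER_SAMPLE = 2;    // assumption: 16-bit signed, mono

/** Bytes in one audio frame of the given duration at 16 kHz, 16-bit mono. */
function frameBytes(frameMs: number): number {
  const samplesPerFrame = (SAMPLE_RATE_HZ * frameMs) / 1000;
  return samplesPerFrame * BYTES_PER_SAMPLE;
}

/**
 * Convert Web Audio style float samples (nominally in [-1, 1]) to
 * 16-bit signed PCM. Out-of-range input is clamped first.
 */
function floatToPcm16(samples: Float32Array): Int16Array {
  return Int16Array.from(samples, (v) => {
    const s = Math.max(-1, Math.min(1, v));
    // Negative values scale to -32768, positive values to 32767.
    return Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  });
}

console.log(frameBytes(20)); // 320 samples * 2 bytes = 640
```

Under those assumptions a 20 ms frame is 640 bytes, so the raw mic stream runs at 32,000 bytes per second before any transport overhead.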

One socket, three jobs.

Speech Engine collapses STT, TTS, and turn detection into a single connection. No round-trips between a transcription vendor, a thinking layer, and a synthesis vendor — the integration tax is gone.

TTS first byte under 300 ms.

eleven_flash_v2 ships the first audio chunk faster than most pipelines finish thinking. That is the entire reason Vought feels like a whisper instead of a robot.

Voice cloning from 30 seconds.

The product’s wow moment — the AI speaking in your voice — exists because cloning is a 30-second capture and a single API call, not a multi-week studio session.

WebRTC, browser-native.

No SIP gateway, no audio bridge, no native client. The same socket runs in a hackathon laptop browser and a production deployment unchanged.

Zero-retention by default.

Sessions are ephemeral. We set retention_days = -1 at engine creation so transcripts and audio never persist on ElevenLabs’ side. Compliance gets shorter, not longer.

AbortSignal-aware streams.

When the operator interrupts the whisper, the same AbortController that cancels the LLM also closes the TTS stream cleanly. End-to-end interrupt in under 200 ms.
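
The interrupt path can be illustrated with the standard AbortController API. This is a minimal sketch, not the Echo Engine source: fakeLlmTokens and speak are hypothetical stand-ins for the LLM stream and the TTS stream, and the timings are arbitrary. What it shows is the pattern of one controller whose signal is handed to both stages, so a single abort() stops token generation and playback together.

```typescript
/** Resolves after `ms`, or rejects early if the signal aborts. */
function sleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    function onAbort(): void {
      clearTimeout(timer);
      reject(new Error("aborted"));
    }
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

/** Hypothetical LLM stand-in: yields one token per `delayMs` until aborted. */
async function* fakeLlmTokens(
  tokens: string[],
  delayMs: number,
  signal: AbortSignal,
): AsyncGenerator<string> {
  for (const token of tokens) {
    await sleep(delayMs, signal); // throws as soon as the signal aborts
    yield token;
  }
}

/** Hypothetical TTS stand-in: records tokens as "spoken" until aborted. */
async function speak(
  stream: AsyncIterable<string>,
  signal: AbortSignal,
  spoken: string[],
): Promise<"finished" | "interrupted"> {
  try {
    for await (const token of stream) {
      if (signal.aborted) return "interrupted";
      spoken.push(token);
    }
    return "finished";
  } catch (err) {
    if (signal.aborted) return "interrupted"; // upstream was cancelled
    throw err; // anything else is a real failure
  }
}

/** One controller, two consumers: aborting it stops both stages. */
async function demo(): Promise<{ outcome: string; spoken: string[] }> {
  const controller = new AbortController();
  const spoken: string[] = [];
  const tokens = ["ask", "about", "their", "renewal", "date"];
  const done = speak(
    fakeLlmTokens(tokens, 20, controller.signal),
    controller.signal,
    spoken,
  );
  setTimeout(() => controller.abort(), 50); // the operator interrupts
  return { outcome: await done, spoken };
}

demo().then((result) => console.log(result.outcome, result.spoken));
```

Because both stages watch the same signal, there is no second cancellation channel to keep in sync, which is the property the paragraph above relies on.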

The integration, in five lines

await elevenlabs.speechEngine.attach(SPEECH_ENGINE_ID, httpServer, '/ws', {
  // Called once per finished operator turn; `signal` aborts on interrupt.
  onTranscript: async (transcript, signal, session) => {
    const stream = await llm.chat({ signal /* , ...other options elided */ }); // AbortSignal threaded
    await session.sendResponse(stream);                // STT → LLM → TTS
  },
});

elevenlabs.io

Co-partners

The infra that turns “works on my laptop” into production.

Vercel

Marketing site · Product app

Both Next.js apps — the marketing site at vought.com and the product app at app.vought.com — ship to Vercel on every push to main. Preview deployments are how the design review loop stays under five minutes.

vercel.com

Render

Echo Engine · Diarization sidecar · Postgres · Redis

The Node Echo Engine, the Python diart sidecar, and the managed Postgres + Redis instances all run on Render. One Dockerfile per service, deployed from the same monorepo with zero glue.

render.com

Architecture

Frameworks, services, and where each Dockerfile lives.

Frameworks

Next.js 15               Both apps (App Router)
React 18                 UI runtime
Tailwind 3               Design tokens
Framer Motion            The four signature motions
Zustand                  Realtime session state
Turborepo                Monorepo orchestration
Node.js 20               Echo Engine runtime
Python 3.11 + diart      Diarization sidecar
Postgres 15 + pgvector   Playbook RAG
Redis                    Live session memory

Repo · vought-os

vought-os/
├── apps/
│   ├── web/                  → Vercel (marketing)
│   └── app/                  → Vercel (product)
├── services/
│   ├── echo-engine/          → Render (Node, Dockerfile)
│   └── diarization-sidecar/  → Render (Python, Dockerfile)
├── packages/
│   ├── design-system/        → tokens + global.css
│   ├── motion/               → the four signature motions
│   └── ui/                   → Container, Grid, primitives
├── docker-compose.yml        → local Postgres + Redis
└── turbo.json                → pipelines

Each service has its own Dockerfile, its own deployment target, and its own scaling story. The monorepo glue is Turborepo pipelines and a shared design-system package — nothing exotic.

Source

Read the code. It’s all open.

The Echo Engine, the diarization sidecar, the live-call screen, the voice-clone onboarding, the marketing site — every file is on GitHub.

github.com/Sushant6095/vought-os