Akshay 🚀

@akshay_pachaar

Let's build a real-time Voice RAG Agent, step-by-step:

Before we begin, here's a quick demo of what we're building.

Tech stack:
- @Cartesia_AI for SOTA text-to-speech
- @AssemblyAI for speech-to-text
- @LlamaIndex to power RAG
- @livekit for orchestration

Let's go! 🚀

Here's an overview of what the app does:
1. Listens to real-time audio
2. Transcribes it via AssemblyAI
3. Uses your docs (via LlamaIndex) to craft an answer
4. Speaks that answer back with Cartesia

Now let's jump into code!

1️⃣ Set up environment and logging

This ensures we can load configurations from .env and keep track of everything in real time.
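As a minimal sketch of this step, assuming a plain `KEY=VALUE` `.env` file: the helper `load_env_file` below is a hypothetical, standard-library-only stand-in for a dedicated loader such as python-dotenv, and the logger name `voice-rag-agent` is illustrative rather than taken from the original code.

```python
import logging
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines and export them into os.environ.

    Hypothetical helper: skips blank lines and '#' comments, and does not
    overwrite variables that are already set in the environment.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)
        # Report the effective value, which may come from the existing environment.
        loaded[key] = os.environ[key]
    return loaded


def setup_logging(level: int = logging.INFO) -> logging.Logger:
    """Configure root logging with timestamps and return a named logger."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return logging.getLogger("voice-rag-agent")
```

Calling `load_env_file()` followed by `setup_logging()` at startup would make API keys available through `os.environ` and give every later step a timestamped log line.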

2️⃣ Set up RAG

This is where your documents get indexed for search and retrieval, powered by LlamaIndex. The agent's answers are grounded in this knowledge base.
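The index-then-retrieve idea can be illustrated with a toy sketch. This is not LlamaIndex's API: it swaps in a simple bag-of-words cosine retriever in pure Python, and the names `TinyIndex`, `add`, and `retrieve` are hypothetical. A real setup would use embeddings for retrieval and an LLM to phrase the grounded answer.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class TinyIndex:
    """Hypothetical in-memory index: stores documents with their word counts."""

    def __init__(self) -> None:
        self._docs: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        """Index one document by precomputing its word counts."""
        self._docs.append((text, tokenize(text)))

    def retrieve(self, query: str, top_k: int = 1) -> list[str]:
        """Return the top_k documents most similar to the query."""
        q = tokenize(query)
        ranked = sorted(self._docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]
```

The retrieved passages are what "grounding" refers to: the answer is composed from them instead of from the model's general knowledge alone.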

3️⃣ Set up Voice Activity Detection

We also want Voice Activity Detection (VAD) for a smooth real-time experience, so we’ll “prewarm” the Silero VAD model. This helps us detect when someone is actually speaking.
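The prewarm pattern itself can be sketched without the real model. The example below is an assumption-laden illustration: it replaces Silero's neural VAD with a simple RMS energy threshold, and `EnergyVAD`, `prewarm`, and the `userdata` dictionary are hypothetical names. The point is only that the detector is loaded once, ahead of time, and reused.

```python
import math
from dataclasses import dataclass


@dataclass
class EnergyVAD:
    """Toy detector: a frame counts as speech if its RMS energy exceeds a threshold."""

    threshold: float = 0.02

    def is_speech(self, frame: list[float]) -> bool:
        """Return True when the frame's RMS amplitude is above the threshold."""
        if not frame:
            return False
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        return rms > self.threshold


def prewarm(userdata: dict) -> None:
    """Load the detector once, before any session starts, and cache it.

    With a real neural VAD this is where the slow model load would happen.
    """
    if "vad" not in userdata:
        userdata["vad"] = EnergyVAD()
```

Because the cached detector lives in shared state, the first caller does not pay the model-loading cost at the moment they start speaking.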

4️⃣ The VoicePipelineAgent and Entry Point

This is where we bring it all together. The agent:
1. Listens to real-time audio.
2. Transcribes it using AssemblyAI.
3. Crafts an answer with your documents via LlamaIndex.
4. Speaks that answer back using Cartesia.
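The four stages above can be sketched as a chain of plain callables. This is a hypothetical illustration, not the VoicePipelineAgent API: `VoicePipeline`, `handle_utterance`, and the `fake_*` stubs are invented here so the example runs without any external service, and the captured audio is represented as raw bytes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical wiring of the stages; each stage is a plain callable."""

    transcribe: Callable[[bytes], str]   # speech-to-text
    answer: Callable[[str], str]         # retrieval-grounded answer
    synthesize: Callable[[str], bytes]   # text-to-speech

    def handle_utterance(self, audio: bytes) -> bytes:
        """Run one captured utterance through STT -> RAG -> TTS."""
        text = self.transcribe(audio)
        reply = self.answer(text)
        return self.synthesize(reply)


# Stub stages standing in for the real services.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_rag(question: str) -> str:
    return f"Answer to: {question}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(transcribe=fake_stt, answer=fake_rag, synthesize=fake_tts)
```

Keeping each stage behind a callable means any one provider could be swapped out without touching the others.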

5️⃣ Run the app

Finally, we tie it all together. We run our agent, specifying the prewarm function and main entrypoint.

That’s it: your Real-Time Voice RAG Agent is ready to roll!
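How a runner might combine a prewarm function with an entrypoint can be sketched as follows. The `run_app` function here is a hypothetical stand-in written for illustration, not the framework's actual launcher, and the string placeholder takes the place of a real loaded model.

```python
import asyncio
from typing import Any, Callable, Coroutine


def run_app(
    prewarm_fnc: Callable[[dict], None],
    entrypoint_fnc: Callable[[dict], Coroutine[Any, Any, str]],
) -> str:
    """Hypothetical runner: prewarm shared state once, then run the entrypoint."""
    userdata: dict = {}
    prewarm_fnc(userdata)  # one-time setup, before any session starts
    return asyncio.run(entrypoint_fnc(userdata))


def prewarm(userdata: dict) -> None:
    """Cache expensive resources; the string is a placeholder for a real model."""
    userdata["vad"] = "loaded-vad-model"


async def entrypoint(userdata: dict) -> str:
    """A real entrypoint would connect to a session and start the agent here."""
    return f"agent started with {userdata['vad']}"


if __name__ == "__main__":
    print(run_app(prewarm, entrypoint))
```

Separating the two functions keeps slow, one-time setup out of the per-session path.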

The entire code is 100% open-source, and you can find it here!

GitHub repo: https://github.com/patchy631/ai-engineering-hub/tree/main/rag-voice-agent

That's a wrap! If you enjoyed this breakdown: Follow me → @akshay_pachaar ✔️ Every day, I share insights and tutorials on LLMs, AI Agents, RAGs, and Machine Learning!