Harsh

@ranaharshraj7

This was during campus placements in Dec '24 (freshers, take notes). CTC: 84 LPA (including ESOPs). Disclaimer: no DSA was asked.

To get an interview call, we had to build a VAD (Voice Activity Detector) from scratch in 2.5 hours on-site (proctored). We were allowed any tool except external APIs. (I do remember @ChatGPTapp giving me hallucinated responses, so I had to go back to the docs.) A dataset was provided (~50 audio files). We were judged on: 1) accuracy of speech detection, 2) code quality, 3) possible improvements to the approach that we couldn't implement. Any kind of architecture was welcome for building the VAD. I went with a Denoiser + WebRTC (GMM-based) approach, as I knew it would give the highest accuracy, and accuracy carried the highest weightage. 7 got shortlisted and I was one of them. The interview was led by the head of the ASR team.
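For readers new to VAD, here is a minimal sketch of the core idea: split audio into short frames and flag each frame as speech or silence. It is not the Denoiser + WebRTC (GMM-based) pipeline mentioned above; it swaps in a plain RMS-energy threshold, which is far less robust to noise. The function names, the 30 ms frame size, and the -35 dB threshold are all illustrative assumptions.

```python
import math


def frame_energy_vad(samples, sample_rate=16000, frame_ms=30, threshold_db=-35.0):
    """Flag each frame as speech (True) or silence (False) by RMS energy.

    samples: list of floats in [-1.0, 1.0].
    All defaults are assumptions chosen for illustration only.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    flags = []
    # Walk over non-overlapping frames; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Clamp to avoid log10(0) on digital silence.
        level_db = 20 * math.log10(max(rms, 1e-10))
        flags.append(level_db > threshold_db)
    return flags


def synthetic_clip(sample_rate=16000):
    """One second of silence followed by one second of a 220 Hz tone."""
    silence = [0.0] * sample_rate
    tone = [0.5 * math.sin(2 * math.pi * 220 * n / sample_rate)
            for n in range(sample_rate)]
    return silence + tone


flags = frame_energy_vad(synthetic_clip())
```

On the synthetic clip, the frames covering the silent first second come back `False` and the frames covering the tone come back `True`. Real recordings would need denoising and a smarter classifier, which is presumably why the denoiser plus GMM-based route scored better on accuracy.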

We started with my internship experience in Tokyo, where I led the ASR, VAD and open-source LLM integration for a company that was into warehouse management robots and was pivoting to add speech functionality to them. We discussed:
> how I patched the WER using NLP to correct or fill in the gaps if the voice breaks midway
> what VAD architecture I used
> how I reduced CPU/GPU load
> how I used different @OpenAI Whisper models to get p95 latency under 800 ms
> the high-level scaling methodologies I used to benchmark and stress-test STT models

Then we moved on to ML and transformer basics (because I was more into LLMs):
> explain the whisper-jax architecture and how it processes audio chunks
> code naive gradient descent from scratch on docs (<s>as @GoogleColab was auto completing for me lmao</s>)
> explain perplexity and what other benchmarks we use for LLMs
> touched on self-attention and the differences between encoder and decoder architectures; that day I realized that almost all the new SOTA models are decoder-only
> he also went into a deep discussion on how we can relate linear algebra to transformers (<s>I took a LinAl course</s>)

At last, we discussed @SarvamAI Bulbul models, especially why they use latent space decomposition and how that helps separate speech content from speaker/style representations.

PS: No tokens were harmed in writing this.
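For anyone prepping for the same "naive gradient descent from scratch" question, a minimal sketch could look like the following. The thread does not say what problem was actually set, so this assumes the simplest case: fitting a line y = w*x + b by minimizing mean squared error. The function name, learning rate, step count, and toy data are all assumptions.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b with full-batch gradient descent on mean squared error.

    lr and steps are illustrative defaults, not tuned values.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
```

With this toy data the fit should land very close to w = 2 and b = 1. A larger learning rate can make the updates diverge, so being able to explain that trade-off is probably as useful in an interview as the loop itself.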