Banana

@banana_baeee

Finally broke the 3k tokens per second input/prompt processing barrier for Qwen 3.5 27B on Spark/GB10 thanks to FlashQLA! Results and steps to reproduce are up on @LottoLabs LocalMaxxing here: https://www.localmaxxing.com/runs/cmouqgx9q00jtld01ajgiran7 3130 t/s pp2048 is close to 4x faster than the fastest M5 Max number I could find on Reddit. For long-running agents, input token processing can be at least as important as output token processing, and Spark shines for that!
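To make the point about long-running agents concrete, here is a rough back-of-the-envelope sketch in Python. Only the 3130 t/s prompt-processing figure comes from the post. The function names are hypothetical, and the sketch assumes the pp2048 rate holds at other prompt lengths, which real hardware will not match exactly. No decode rate is given for this run, so the caller has to supply one.

```python
# Rough latency model for one agent step: prefill (prompt processing)
# followed by decode (token generation). Illustrative only.

PP_TOKENS_PER_S = 3130.0  # prompt-processing rate quoted in the post (pp2048)


def prefill_seconds(prompt_tokens: int, pp_rate: float = PP_TOKENS_PER_S) -> float:
    """Time to process the prompt, assuming a constant prefill rate."""
    if pp_rate <= 0:
        raise ValueError("pp_rate must be positive")
    return prompt_tokens / pp_rate


def decode_seconds(output_tokens: int, tg_rate: float) -> float:
    """Time to generate the output, assuming a constant decode rate."""
    if tg_rate <= 0:
        raise ValueError("tg_rate must be positive")
    return output_tokens / tg_rate


def prefill_share(prompt_tokens: int, output_tokens: int,
                  pp_rate: float, tg_rate: float) -> float:
    """Fraction of total step time spent on prefill (0.0 to 1.0)."""
    pre = prefill_seconds(prompt_tokens, pp_rate)
    dec = decode_seconds(output_tokens, tg_rate)
    return pre / (pre + dec)


if __name__ == "__main__":
    # Time to ingest a 2048-token prompt at the quoted rate.
    print(f"pp2048 prefill: {prefill_seconds(2048):.3f} s")
    # "Close to 4x faster" implies the M5 Max figure is a bit above this value.
    print(f"implied baseline: above {PP_TOKENS_PER_S / 4:.1f} t/s")
```

Calling `prefill_share` with your own measured decode rate shows how much of each agent step is spent reading the prompt. With large contexts and short replies, that share grows quickly.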

My DFlash decode-optimized numbers for 3.6 are here - quite variable, but they can make a big difference. I am hoping to combine the decode and prefill optimizations into one fast 27B dense solution and get the best of both! https://www.localmaxxing.com/runs/cmomgvsoo0007jj04ea52zhz1

My reproduction repositories are here if you want to try this yourself! (Though I hope that ultimately a lot of these sorts of optimizations become vLLM defaults in the future) https://github.com/my-other-github-account/spark-bench-reproducers

There is still lots of room for optimization here. I’m still using generic CUTLASS NVFP4 kernels instead of something GB10-optimized, and I crudely hacked FlashQLA in, so I’m positive there’s headroom once someone smart gets better, official vLLM support for GB10 in there. GB10 has a lot of potential if the software can catch up!