Finally broke the 3k tokens-per-second input/prompt processing barrier for Qwen 3.5 27B on Spark/GB10 thanks to FlashQLA! Results and steps to reproduce are up on @LottoLabs LocalMaxxing here: <a target="_blank" href="https://www.localmaxxing.com/runs/cmouqgx9q00jtld01ajgiran7" color="blue">localmaxxing.com/runs/cmouqgx9q…</a> 3130 t/s at pp2048 is close to 4x faster than the fastest M5 Max number I could find on Reddit. For long-running agents, input token processing can be at least as important as output token processing, and Spark shines for that!
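
To put the prefill number in wall-clock terms, here is a rough back-of-the-envelope sketch. Only the 3130 t/s figure and the 2048-token prompt size come from the post; the baseline of 3130 / 4 t/s is just the "close to 4x" claim turned into a number, not a measured result, and the function name is made up for illustration. It also assumes throughput stays flat across the prompt, which real runs will not exactly match.

```python
def prefill_seconds(prompt_tokens: int, prefill_tps: float) -> float:
    """Time to process a prompt at a constant prefill throughput (tokens/s)."""
    if prefill_tps <= 0:
        raise ValueError("prefill_tps must be positive")
    return prompt_tokens / prefill_tps


SPARK_PP_TPS = 3130.0               # pp2048 figure quoted in the post
BASELINE_PP_TPS = SPARK_PP_TPS / 4  # ASSUMPTION: "close to 4x" taken as exactly 4x
PROMPT_TOKENS = 2048                # matches the pp2048 benchmark size

if __name__ == "__main__":
    fast = prefill_seconds(PROMPT_TOKENS, SPARK_PP_TPS)
    slow = prefill_seconds(PROMPT_TOKENS, BASELINE_PP_TPS)
    print(f"prefill at {SPARK_PP_TPS:.0f} t/s: {fast:.2f} s")
    print(f"prefill at {BASELINE_PP_TPS:.1f} t/s: {slow:.2f} s")
    print(f"saved per {PROMPT_TOKENS}-token prompt: {slow - fast:.2f} s")
```

An agent that re-reads tool output on every step pays this cost on every step, so the per-prompt saving adds up over a long run.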

My DFlash decode-optimized numbers for 3.6 are here - quite variable, but the decode optimizations can make a big difference. I am hoping to combine the decode and prefill optimizations into one fast 27B dense solution and get the best of both! <a target="_blank" href="https://www.localmaxxing.com/runs/cmomgvsoo0007jj04ea52zhz1" color="blue">localmaxxing.com/runs/cmomgvsoo…</a>

My reproduction repositories are here if you want to try this yourself! (Though I hope a lot of these sorts of optimizations ultimately become vLLM defaults.) <a target="_blank" href="https://github.com/my-other-github-account/spark-bench-reproducers" color="blue">github.com/my-other-githu…</a>

Still lots of room for optimization here: I’m still using generic CUTLASS NVFP4 kernels instead of something GB10-optimized, and I crudely hacked FlashQLA in, so I’m positive there’s headroom once someone smart gets better, official vLLM support for GB10 in there. GB10 has a lot of potential if the software can catch up!