Carousel Studio

Repurpose X Threads into LinkedIn & Instagram Carousels

Avi Chawla
@_avichawla

Let's generate our own LLM fine-tuning dataset (100% local):

Before we begin, here's what we're doing today! We'll cover:
- What is instruction fine-tuning?
- Why is it important for LLMs?

Finally, we'll create our own instruction fine-tuning dataset. Let's dive in!

Once an LLM has been pre-trained, it simply continues the prompt as if it were part of one long text in a book or an article. For instance, check this to understand how a pre-trained LLM behaves when prompted 👇

Fine-tuning on a synthetic dataset generated with existing LLMs can improve this behavior. The synthetic data contains fabricated examples of human-AI interactions. Check this sample👇
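
The sample referenced in the post is an image that is not reproduced here. As a rough illustration only, here is what one synthetic record might look like and how it could be rendered into a single training string. The field names (`instruction`, `response`), the record text, and the Alpaca-style template are all assumptions made for this sketch, not taken from the thread.

```python
# Hypothetical record: field names and text are illustrative, not from the thread.
record = {
    "instruction": "Explain what overfitting is in one sentence.",
    "response": (
        "Overfitting is when a model memorizes its training data "
        "and fails to generalize to new data."
    ),
}


def to_training_text(rec: dict) -> str:
    """Render one instruction/response pair as a single training string.

    The '### Instruction / ### Response' layout is an assumed template.
    """
    return (
        f"### Instruction:\n{rec['instruction']}\n\n"
        f"### Response:\n{rec['response']}"
    )


print(to_training_text(record))
```

Training on many pairs in a fixed layout like this is what nudges a model toward answering an instruction instead of merely continuing the text.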

This process is called instruction fine-tuning. Distilabel is an open-source framework that facilitates generating domain-specific synthetic text data using LLMs. Check this to understand the underlying process👇

Next, let's look at the code. First, we start with some standard imports. Check this👇

Moving on, we load the Llama-3 models locally with Ollama. Here's how we do it👇
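
The thread's own code for this step is an image that is not reproduced here. As a stand-in, the sketch below talks to a locally running Ollama server through its REST endpoint using only the standard library, which is not necessarily how the thread does it. The model tag `llama3` and the helper names `build_request` and `generate` are assumptions for illustration.

```python
import json
import urllib.request

# Default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generation request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (requires Ollama to be running)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running and the model pulled (for example via `ollama pull llama3`), a call such as `generate("llama3", "What is instruction fine-tuning?")` should return the model's reply as a string.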

Next, we define our pipeline:
- Load dataset.
- Generate two responses.
- Combine the responses into one column.
- Evaluate the responses with an LLM.
- Define and run the pipeline.

Check this👇
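
The pipeline code itself is an image that is not reproduced here. The steps listed above can be sketched in plain Python, without the framework the thread uses. Everything named here (`run_pipeline`, the toy generators, `length_judge`) is hypothetical, and the stand-in functions replace real LLM calls so the sketch runs on its own.

```python
from typing import Callable

Generator = Callable[[str], str]                 # prompt -> response
Judge = Callable[[str, list[str]], list[float]]  # (instruction, responses) -> ratings


def run_pipeline(seed: list[dict], gen_a: Generator, gen_b: Generator, judge: Judge) -> list[dict]:
    rows = []
    for item in seed:                                            # load the dataset
        instruction = item["instruction"]
        generations = [gen_a(instruction), gen_b(instruction)]   # two responses, kept in one column
        ratings = judge(instruction, generations)                # evaluate the responses
        rows.append(
            {"instruction": instruction, "generations": generations, "ratings": ratings}
        )
    return rows


# Toy stand-ins so the sketch runs without any model; a real run would call LLMs here.
def terse(prompt: str) -> str:
    return "It depends."


def detailed(prompt: str) -> str:
    return f"Here is a step-by-step answer to: {prompt}"


def length_judge(instruction: str, responses: list[str]) -> list[float]:
    # Placeholder scoring by length, NOT a real quality measure.
    return [float(len(r)) for r in responses]


seed_dataset = [{"instruction": "How do I reverse a list in Python?"}]
rows = run_pipeline(seed_dataset, terse, detailed, length_judge)
print(rows[0]["ratings"])
```

Keeping both responses in a single `generations` column lets the judge see them side by side, which is the point of the "combine" step.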

Once the pipeline has been defined, we need to execute it by giving it a seed dataset. The seed dataset helps it generate new but similar samples. Check this code👇

Done! This produces the instruction and response synthetic dataset as desired. Check the sample below👇

Here's the instruction fine-tuning process again for your reference:
- Generate responses from two LLMs.
- Rank the responses using another LLM.
- Pick the best-rated response and pair it with the instruction.

Check this👇
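
The last step, picking the best-rated response and pairing it with the instruction, can be sketched as below. The row layout (`instruction`, `generations`, `ratings`), the sample values, and the `pick_best` helper are assumptions for illustration, not the thread's actual output schema.

```python
import json


def pick_best(row: dict) -> dict:
    """Pair the instruction with its highest-rated response (first one wins ties)."""
    ratings = row["ratings"]
    best_index = max(range(len(ratings)), key=lambda i: ratings[i])
    return {
        "instruction": row["instruction"],
        "response": row["generations"][best_index],
    }


# Hypothetical rated row, as a judge LLM might produce it.
row = {
    "instruction": "Define recall.",
    "generations": [
        "Recall is a metric.",
        "Recall is the share of actual positives the model finds.",
    ],
    "ratings": [2.0, 4.5],
}

pair = pick_best(row)
print(json.dumps(pair))  # one JSON line per training example
```

Writing one JSON object per line gives a JSONL file, a format commonly accepted by fine-tuning tools.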

That's a wrap! If you enjoyed this tutorial: Find me → @_avichawla Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.