Turing Post

@TheTuringPost

Good answers follow good reasoning.

VeriFree is a new method that keeps the benefits of reinforcement learning (RL) but drops the verifier model and rule-based answer checking. Instead, it trains the model to make a known good answer, called a reference answer, more likely.

Benefits:
• Faster and simpler
• Requires less compute
• More stable

Here's how VeriFree works🧵

1. Step-by-step VeriFree workflow:
• The model generates a reasoning trace.
• The final answer isn't checked directly.
• Instead, VeriFree measures how likely the model is to generate the correct (reference) answer, given its reasoning.
• That likelihood becomes the reward: it's higher when the model is confident in the correct answer.
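
A minimal sketch of the reward in this workflow, assuming we already have the per-token log-probabilities the model assigns to the reference answer's tokens when they are placed right after the sampled reasoning trace. The function name `verifree_reward` and the example numbers are hypothetical; in practice these log-probabilities would come from a forward pass of the language model.

```python
import math
from typing import Sequence


def verifree_reward(answer_token_logprobs: Sequence[float]) -> float:
    """Probability of the whole reference answer given the reasoning.

    Hypothetical helper: answer_token_logprobs[i] is assumed to be
    log pi(y*_i | x, z, y*_<i), the model's log-probability of the i-th
    reference-answer token given the question x, the sampled reasoning
    trace z, and the earlier reference tokens.
    """
    # Chain rule: log pi(y* | x, z) is the sum of the per-token log-probs,
    # so the probability of the full answer is exp of that sum.
    return math.exp(sum(answer_token_logprobs))


# Made-up log-probs for a 2-token reference answer after two different traces.
confident = [math.log(0.9), math.log(0.8)]
unsure = [math.log(0.3), math.log(0.2)]
print(round(verifree_reward(confident), 2))  # 0.72
print(round(verifree_reward(unsure), 2))     # 0.06
```

The trace after which the model is more confident in the reference answer gets the larger reward, with no verifier involved.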

2. Smart tokenization: Splitting the model’s response into <reasoning> and <answer> parts must be done carefully. Instead of splitting at "<answer>", the researchers split at "<answer", without the closing ">". This avoids token mismatches at the boundary and keeps training stable.
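
Why can the split point cause a mismatch? A toy illustration, assuming (as can happen with subword tokenizers) that ">" merges with the text that follows it. The greedy longest-match tokenizer and the tiny vocabulary below are hypothetical stand-ins, not the paper's code or a real BPE tokenizer; the point is only to compare "tokenize the full string" against "split, tokenize each half, concatenate".

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Toy longest-match tokenizer (not a real BPE implementation)."""
    max_len = max(len(v) for v in vocab)
    tokens, i = [], 0
    while i < len(text):
        # Take the longest vocabulary entry that matches at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # Unknown character: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens


def split_then_tokenize(text: str, marker: str, vocab: set[str]) -> list[str]:
    """Cut right after `marker`, tokenize both halves separately, concatenate."""
    cut = text.index(marker) + len(marker)
    return greedy_tokenize(text[:cut], vocab) + greedy_tokenize(text[cut:], vocab)


# Hypothetical vocabulary in which ">4" is a single merged token.
VOCAB = {"<answer", ">", ">4", "4", "2"}
response = "<answer>42"  # the tail of a model response

print(greedy_tokenize(response, VOCAB))                  # ['<answer', '>4', '2']
print(split_then_tokenize(response, "<answer>", VOCAB))  # ['<answer', '>', '4', '2']
print(split_then_tokenize(response, "<answer", VOCAB))   # ['<answer', '>4', '2']
```

In this toy case, cutting after "<answer>" produces a token sequence that differs from how the full response is tokenized, while cutting after "<answer" reproduces it exactly.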

3. VeriFree's design also makes it possible to train the model with just one correct (reference) answer per question, without having to generate candidate answers and verify each one during training.
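
One way to see why no answer sampling or verification is needed: if the rule-based reward were an exact match against the reference answer y^*, its expected value over sampled answers is simply the model's probability of y^*. The notation (x for the question, z for the reasoning trace, y for a sampled answer, π_θ for the model) is ours, chosen for illustration.

```latex
\mathbb{E}_{y \sim \pi_\theta(\cdot \mid x, z)}\big[\mathbb{1}\{y = y^*\}\big]
  = \sum_{y} \pi_\theta(y \mid x, z)\,\mathbb{1}\{y = y^*\}
  = \pi_\theta(y^* \mid x, z)
```

The right-hand side needs only the single reference answer y^*, scored once after the trace z, rather than a set of generated answers that each have to be checked.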

4. Why is VeriFree more stable? It skips sampling the final answer: the system simply computes the probability that the model would produce the right answer. This removes the randomness of answer sampling, lowering the variance of the training signal and making learning more efficient.
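
A small simulation of this point, for one fixed reasoning trace. It assumes a toy setting where the only thing that matters about a sampled answer is whether it matches the reference, which happens with a hypothetical probability `p_ref`. Both rewards have the same expected value, but the sampled 0/1 reward fluctuates while the probability-based reward does not. (Randomness from sampling the reasoning trace itself remains in both cases and is not modeled here.)

```python
import random
import statistics


def verifier_style_reward(p_ref: float, rng: random.Random) -> float:
    """Sample a final answer; reward 1 if it matches the reference, else 0.

    Toy assumption: a sampled answer equals the reference with probability p_ref.
    """
    return 1.0 if rng.random() < p_ref else 0.0


def verifree_style_reward(p_ref: float) -> float:
    """Skip sampling: use the probability of the reference answer directly."""
    return p_ref


rng = random.Random(0)
p_ref = 0.7  # hypothetical pi(y* | x, z) for one fixed reasoning trace z
n = 10_000

sampled = [verifier_style_reward(p_ref, rng) for _ in range(n)]
exact = [verifree_style_reward(p_ref) for _ in range(n)]

# Both rewards have about the same mean (close to 0.7)...
print(statistics.mean(sampled), statistics.mean(exact))
# ...but only the sampled one varies: roughly p(1 - p) = 0.21 versus 0.
print(statistics.pvariance(sampled), statistics.pvariance(exact))
```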

VeriFree reinforces answers that follow good reasoning. If the reasoning is off, the training signal gets weaker. This helps the model learn to reason better, not just guess answers.

Paper: https://arxiv.org/abs/2505.21493
Code: https://github.com/sail-sg/VeriFree
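
The "weaker signal when the reasoning is off" effect can be seen by differentiating the objective J(θ) = E_{z∼π_θ(·|x)}[π_θ(y^*|x,z)] with the product rule and the identity ∇π = π ∇log π. This is a standard derivation written in our own notation for illustration, not copied from the paper.

```latex
\begin{aligned}
J(\theta) &= \sum_{z} \pi_\theta(z \mid x)\,\pi_\theta(y^* \mid x, z) \\
\nabla_\theta J(\theta)
  &= \sum_{z} \Big[ \nabla_\theta \pi_\theta(z \mid x)\,\pi_\theta(y^* \mid x, z)
     + \pi_\theta(z \mid x)\,\nabla_\theta \pi_\theta(y^* \mid x, z) \Big] \\
  &= \mathbb{E}_{z \sim \pi_\theta(\cdot \mid x)}\Big[
       \pi_\theta(y^* \mid x, z)\,\nabla_\theta \log \pi_\theta(z \mid x)
     + \pi_\theta(y^* \mid x, z)\,\nabla_\theta \log \pi_\theta(y^* \mid x, z) \Big]
\end{aligned}
```

The first term reinforces reasoning traces in proportion to how likely they make the reference answer. The second term raises the likelihood of the reference answer itself, but it is weighted by π_θ(y^*|x,z), so when a trace makes the reference answer unlikely, that update shrinks.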
