@askalphaxiv
10 views Oct 25, 2025
Your Base Model is Smarter Than You Think

This paper proposes a way to overcome the lack of generation diversity in RL-trained models, without using RL at all!

By using a Markov chain Monte Carlo (MCMC) technique called ‘power sampling’, which reuses a base LLM’s own probabilities, it’s able to beat GRPO without any training or verifiers.
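To make the idea concrete, here is a minimal toy sketch, not the paper's implementation. It assumes that "power sampling" means drawing from a sharpened distribution proportional to p(x)^alpha, where p is the base model's own probability, and it uses an independence Metropolis-Hastings sampler whose proposals come from p itself. The names `power_sample`, `exact_power`, and `alpha`, as well as the three-item toy distribution standing in for an LLM, are hypothetical and chosen only for illustration.

```python
import math
import random


def exact_power(base_probs, alpha):
    """Exact target distribution: p(x)^alpha, renormalized (toy-sized spaces only)."""
    powered = {x: p ** alpha for x, p in base_probs.items()}
    total = sum(powered.values())
    return {x: w / total for x, w in powered.items()}


def power_sample(base_probs, alpha, steps, rng):
    """Independence Metropolis-Hastings targeting p(x)^alpha.

    Proposals are drawn from the base distribution p, so the acceptance
    ratio simplifies to (p(proposal) / p(current)) ** (alpha - 1).
    Assumes every probability in base_probs is strictly positive.
    """
    items = list(base_probs)
    weights = [base_probs[x] for x in items]
    current = rng.choices(items, weights=weights)[0]
    for _ in range(steps):
        proposal = rng.choices(items, weights=weights)[0]
        log_ratio = (alpha - 1.0) * (
            math.log(base_probs[proposal]) - math.log(base_probs[current])
        )
        # Accept with probability min(1, ratio); otherwise keep the current state.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current = proposal
    return current


if __name__ == "__main__":
    # Hypothetical stand-in for a base model's distribution over three answers.
    base = {"a": 0.5, "b": 0.3, "c": 0.2}
    rng = random.Random(0)
    draws = [power_sample(base, alpha=4.0, steps=50, rng=rng) for _ in range(2000)]
    freq_a = draws.count("a") / len(draws)
    print("empirical P(a):", round(freq_a, 3))
    print("exact P(a):   ", round(exact_power(base, 4.0)["a"], 3))
```

Only the base probabilities are needed: there is no reward model, verifier, or gradient step in the loop. In a real LLM setting the state would be a token sequence and the normalizer could not be enumerated, which is why an MCMC scheme is used instead of computing the sharpened distribution directly.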