alphaXiv
@askalphaxiv
Your Base Model is Smarter Than You Think

This paper proposes a way to beat RL's lack of generation diversity, without using RL!

By using a Markov Chain Monte Carlo "power sampling" scheme that reuses a base LLM's own probabilities, it's able to beat GRPO without training or verifiers.
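
For intuition only, here is a toy sketch of what sampling from a "power" distribution with MCMC can look like. It is not the paper's algorithm: it runs Metropolis-Hastings over a small hand-written categorical distribution rather than over LLM token sequences, and the function name `power_sample_mh`, the exponent `alpha`, and the example probabilities are all assumptions made for illustration. Proposals are drawn from the base distribution itself, so the acceptance ratio needs nothing beyond the base probabilities.

```python
import random


def power_sample_mh(probs, alpha, steps, seed=0):
    """Toy Metropolis-Hastings sampler (hypothetical helper, not from the paper).

    Target distribution: proportional to probs[i] ** alpha.
    Proposal distribution: the base distribution `probs` itself.
    Assumes every entry of `probs` is strictly positive.
    """
    rng = random.Random(seed)
    items = list(range(len(probs)))
    current = rng.choices(items, weights=probs)[0]
    samples = []
    for _ in range(steps):
        proposal = rng.choices(items, weights=probs)[0]
        # Independence sampler: the MH ratio is
        #   [target(new) / target(old)] * [q(old) / q(new)]
        # = (p_new / p_old) ** alpha * (p_old / p_new)
        # = (p_new / p_old) ** (alpha - 1)
        ratio = (probs[proposal] / probs[current]) ** (alpha - 1)
        if rng.random() < min(1.0, ratio):
            current = proposal
        samples.append(current)
    return samples


if __name__ == "__main__":
    base = [0.5, 0.3, 0.2]  # made-up base probabilities
    draws = power_sample_mh(base, alpha=4.0, steps=20000, seed=1)
    for i in range(len(base)):
        print(i, draws.count(i) / len(draws))
```

With `alpha` above 1 the chain spends more of its time on outcomes the base distribution already rates as likely, and with `alpha = 1` every proposal is accepted, so the sampler reduces to plain sampling from the base distribution.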