@askalphaxiv, Oct 25, 2025

Your Base Model is Smarter Than You Think

This paper proposes a way to overcome the lack of generation diversity that comes with RL, without using RL at all! It uses a Markov Chain Monte Carlo "power sampling" scheme that reuses the base LLM's own probabilities, and it is able to beat GRPO with no training and no verifiers.

alphaxiv.org/pdf/2510.14901
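
To make the idea concrete, here is a toy sketch, not the paper's algorithm. It assumes "power sampling" means drawing from the base distribution raised to a power `alpha` and renormalized, and it swaps the LLM for a small hand-written table of probabilities. The names `power_sample` and `exact_power` and the Metropolis-Hastings setup with the base distribution as the proposal are illustrative choices.

```python
import random
from collections import Counter


def power_sample(probs, alpha, steps, rng):
    """Run a Metropolis-Hastings chain whose target is probs[x] ** alpha (unnormalized).

    Proposals are drawn from the base distribution itself, so the only
    quantities needed are the base probabilities (no reward, no verifier).
    """
    items = list(probs)
    weights = [probs[x] for x in items]
    current = rng.choices(items, weights=weights)[0]
    for _ in range(steps):
        proposal = rng.choices(items, weights=weights)[0]
        # With proposal q = p and target p ** alpha, the MH ratio
        # [p(x')**alpha * p(x)] / [p(x)**alpha * p(x')] simplifies to:
        ratio = (probs[proposal] / probs[current]) ** (alpha - 1)
        if rng.random() < min(1.0, ratio):
            current = proposal
    return current


def exact_power(probs, alpha):
    """Exact renormalized power distribution, for checking the sampler."""
    z = sum(p ** alpha for p in probs.values())
    return {x: p ** alpha / z for x, p in probs.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    base = {"a": 0.5, "b": 0.3, "c": 0.2}  # stand-in for model probabilities
    n = 2000
    draws = Counter(power_sample(base, 4.0, 20, rng) for _ in range(n))
    print({x: draws[x] / n for x in base})  # empirical frequencies
    print(exact_power(base, 4.0))           # exact target
```

With `alpha > 1` the target sharpens toward outcomes the base distribution already favors, which is the sense in which only the model's own probabilities are reused. A real LLM version would work over token sequences rather than three symbols.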