@DrJimFan: That a second paper dropped ...

Jan 22, 2025

That a *second* paper dropped with tons of RL flywheel secrets and *multimodal* o1-style reasoning is not on my bingo card today. The papers from Kimi (another startup) and DeepSeek remarkably converged on similar findings:

> No need for complex tree search like MCTS. Just linearize the thought trace and do good old autoregressive prediction;
> No need for value functions that require another expensive copy of the model;
> No need for dense reward modeling. Rely as much as possible on the ground-truth end result.
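
A rough sketch of the last two points, under stated assumptions: each sampled thought trace is scored only on whether its final answer matches the ground truth, and the mean reward over the samples for the same prompt stands in for a learned value function. The function names, the exact-match check, and the mean baseline are illustrative assumptions, not the exact estimators from either paper.

```python
from statistics import mean


def outcome_reward(predicted: str, ground_truth: str) -> float:
    """Score only the final answer: 1.0 on exact match, else 0.0.

    No per-step (dense) reward is computed. Exact string match is an
    assumption; a real checker might run test cases or compare numerically.
    """
    return 1.0 if predicted.strip() == ground_truth.strip() else 0.0


def baseline_advantages(rewards: list[float]) -> list[float]:
    """Subtract the group's mean reward from each sample's reward.

    The mean over samples for one prompt acts as the baseline, so no
    second model (value function) is needed. This is an assumed stand-in.
    """
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


# Hypothetical final answers extracted from four sampled traces for one prompt.
sampled_answers = ["42", "41", "42 ", "7"]
rewards = [outcome_reward(a, "42") for a in sampled_answers]
advantages = baseline_advantages(rewards)
```

Traces that end in the correct answer get a positive advantage and the rest a negative one, which is all the signal a policy-gradient update needs in this simplified picture.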

Differences:

> DeepSeek does AlphaZero approach - purely bootstrap through RL w/o human input, i.e. "cold start". Kimi does AlphaGo-Master approach: light SFT to warm up through prompt-engineered CoT traces.
> DeepSeek weights are MIT license (thought leadership!); Kimi does not have a model release yet.
> Kimi shows strong multimodal performance (!) on benchmarks like MathVista, which requires visual understanding of geometry, IQ tests, etc.
> Kimi paper has a LOT more details on the system design: RL infrastructure, hybrid cluster, code sandbox, parallelism strategies; and learning details: long context, CoT compression, curriculum, sampling strategy, test case generation, etc.

Upbeat reads on a holiday!

Whitepaper link: github.com/MoonshotAI/Kim…
