@rohanpaul_ai: LLM for financial trading/deci...
@rohanpaul_ai
60 views
Sep 19, 2025
1
LLM for financial trading/decision making.
A 4B model financial-domain model, Trading-R1, that writes clear analyst theses and turns them into trades.
Its trained on 100K cases over 18 months across 14 tickers, and its backtests show better risk-adjusted returns with smaller drawdowns.
The problem it tackles is simple, quant models are hard to read, and general LLMs write nice text that does not translate into disciplined trades.
The solution starts by forcing a strict thesis format, with separate sections for market data, fundamentals, and sentiment, and every claim must point to evidence from the given context.
Then it learns decisions by mapping outcomes into 5 labels, strong buy, buy, hold, sell, strong sell, using returns that are normalized by volatility over several horizons.
For training, it first copies high-quality reasoning distilled from stronger black-box models using supervised fine-tuning, then it improves with a reinforcement method called group relative policy optimization.
In held-out tests on NVDA, AAPL, AMZN, META, MSFT, and SPY, the combined approach beats small and large baselines on Sharpe and max drawdown, and the authors position it as research support, not high-frequency automation.
๐งต Read on ๐
A 4B model financial-domain model, Trading-R1, that writes clear analyst theses and turns them into trades.
Its trained on 100K cases over 18 months across 14 tickers, and its backtests show better risk-adjusted returns with smaller drawdowns.
The problem it tackles is simple, quant models are hard to read, and general LLMs write nice text that does not translate into disciplined trades.
The solution starts by forcing a strict thesis format, with separate sections for market data, fundamentals, and sentiment, and every claim must point to evidence from the given context.
Then it learns decisions by mapping outcomes into 5 labels, strong buy, buy, hold, sell, strong sell, using returns that are normalized by volatility over several horizons.
For training, it first copies high-quality reasoning distilled from stronger black-box models using supervised fine-tuning, then it improves with a reinforcement method called group relative policy optimization.
In held-out tests on NVDA, AAPL, AMZN, META, MSFT, and SPY, the combined approach beats small and large baselines on Sharpe and max drawdown, and the authors position it as research support, not high-frequency automation.
๐งต Read on ๐
2
๐งต2/n. The 3 steps used to train Trading-R1.
The first step is Structure. The model is taught how to write a thesis in a clear format. It must separate parts like market trends, company fundamentals, and sentiment, and it has to place each claim in the right section.
The second step is Claims. Here the model learns that any claim it makes must be supported by evidence. For example, if it says revenue is growing, it must back that with a source or number provided in the context.
The third step is Decision. The model turns the structured thesis into an actual trading action. It predicts outcomes like strong buy, buy, hold, sell, or strong sell. Its prediction is checked against the true outcome, and it gets rewards or penalties depending on accuracy.
Each step first uses supervised fine-tuning, which means training on examples with correct answers, and then reinforcement fine-tuning, which means refining the model by giving rewards when it produces better outputs.
Finally, all stages are combined, producing Trading-R1, a model that can both write well-structured financial reasoning and map that reasoning into actual trading decisions.
The first step is Structure. The model is taught how to write a thesis in a clear format. It must separate parts like market trends, company fundamentals, and sentiment, and it has to place each claim in the right section.
The second step is Claims. Here the model learns that any claim it makes must be supported by evidence. For example, if it says revenue is growing, it must back that with a source or number provided in the context.
The third step is Decision. The model turns the structured thesis into an actual trading action. It predicts outcomes like strong buy, buy, hold, sell, or strong sell. Its prediction is checked against the true outcome, and it gets rewards or penalties depending on accuracy.
Each step first uses supervised fine-tuning, which means training on examples with correct answers, and then reinforcement fine-tuning, which means refining the model by giving rewards when it produces better outputs.
Finally, all stages are combined, producing Trading-R1, a model that can both write well-structured financial reasoning and map that reasoning into actual trading decisions.
3
๐งต3/n. Three-Stage Financial Trading Model Training Pipeline
In Structure, the model learns to write in a clear format and keep sections organized.
In Claims, it learns to back every statement with quotes or sources, reducing hallucinations.
In Decision, it learns to turn the structured reasoning into buy, hold, or sell calls that are market-aware.
Each stage mixes supervised fine-tuning, reinforcement fine-tuning, and filtering of good examples to steadily improve.
In Structure, the model learns to write in a clear format and keep sections organized.
In Claims, it learns to back every statement with quotes or sources, reducing hallucinations.
In Decision, it learns to turn the structured reasoning into buy, hold, or sell calls that are market-aware.
Each stage mixes supervised fine-tuning, reinforcement fine-tuning, and filtering of good examples to steadily improve.
4
๐งต4/n. How Trading-R1 learns reasoning through distillation, i.e. transferring knowledge from stronger models into a smaller one.
In the top part, called investment thesis distillation, data from sources like news, financials, ratings, and insider info is sampled. A large reasoning model, such as GPT-4 or Qwen, generates a trading proposal. If the proposal is correct, it is kept as a training example. If not, it is rejected. This way, the smaller model learns from high-quality reasoning only.
In the bottom part, called reverse reasoning distillation, the process starts with a trading recommendation. A larger model then breaks this recommendation into reasoning factors, like competitor data, technical analysis, or insider transactions. These reasoning steps are distilled into a smaller model, which merges them into a compact but still structured form of reasoning.
Together, these two methods make sure the smaller Trading-R1 model learns both how to build a thesis from raw data and how to break down a decision into clear reasoning steps.
In the top part, called investment thesis distillation, data from sources like news, financials, ratings, and insider info is sampled. A large reasoning model, such as GPT-4 or Qwen, generates a trading proposal. If the proposal is correct, it is kept as a training example. If not, it is rejected. This way, the smaller model learns from high-quality reasoning only.
In the bottom part, called reverse reasoning distillation, the process starts with a trading recommendation. A larger model then breaks this recommendation into reasoning factors, like competitor data, technical analysis, or insider transactions. These reasoning steps are distilled into a smaller model, which merges them into a compact but still structured form of reasoning.
Together, these two methods make sure the smaller Trading-R1 model learns both how to build a thesis from raw data and how to break down a decision into clear reasoning steps.
5
๐งต5/n. How supervised fine-tuning is applied to make Trading-R1 write structured financial analysis.
The model is trained on sampled financial data that covers things like prices, filings, news, and sentiment. It learns through prompts that simulate the role of a financial analyst responding to stock analysis requests.
During training, the model produces outputs in a strict format, for example giving a buy or sell decision along with a structured thesis. The thesis is broken into key sections such as fundamentals, technical analysis, and insider transactions.
The important point is that supervised fine-tuning forces the model to always organize its reasoning in a consistent template, linking every recommendation back to clear evidence from the data.
This step makes the model reliable at producing well-structured reports instead of loose or unorganized text.
The model is trained on sampled financial data that covers things like prices, filings, news, and sentiment. It learns through prompts that simulate the role of a financial analyst responding to stock analysis requests.
During training, the model produces outputs in a strict format, for example giving a buy or sell decision along with a structured thesis. The thesis is broken into key sections such as fundamentals, technical analysis, and insider transactions.
The important point is that supervised fine-tuning forces the model to always organize its reasoning in a consistent template, linking every recommendation back to clear evidence from the data.
This step makes the model reliable at producing well-structured reports instead of loose or unorganized text.
6
๐งต6/n. How reinforcement learning is used to fine-tune Trading-R1 so its decisions match real market behavior.
The model starts with financial data such as news, filings, and sentiment. It generates a structured thesis and a transaction proposal, like strong buy, buy, hold, sell, or strong sell.
If the thesis is well-structured and the decision matches the correct market outcome, the model receives a reward. If the prediction is wrong or the reasoning is weak, it gets a penalty.
The rewards are combined from 3 parts: structure quality, evidence-based claims, and correctness of the final decision. This prevents the model from just guessing and instead pushes it to provide both sound reasoning and accurate predictions.
This step ensures the model learns to balance readable analysis with decisions that align with actual financial performance.
The model starts with financial data such as news, filings, and sentiment. It generates a structured thesis and a transaction proposal, like strong buy, buy, hold, sell, or strong sell.
If the thesis is well-structured and the decision matches the correct market outcome, the model receives a reward. If the prediction is wrong or the reasoning is weak, it gets a penalty.
The rewards are combined from 3 parts: structure quality, evidence-based claims, and correctness of the final decision. This prevents the model from just guessing and instead pushes it to provide both sound reasoning and accurate predictions.
This step ensures the model learns to balance readable analysis with decisions that align with actual financial performance.
7
Paper โ arxiv.org/abs/2509.11420
Paper Title: "Trading-R1: Financial Trading with LLM Reasoning via Reinforcement Learning"
Paper Title: "Trading-R1: Financial Trading with LLM Reasoning via Reinforcement Learning"





