@omarsar0: A Survey on LLMs in Scientific...
@omarsar0
May 20, 2025
2
What's the paper about?
This paper presents a conceptual framework to understand the evolving role of LLMs in scientific discovery, emphasizing their progression from task-specific tools to autonomous scientific agents.
Anchored in the stages of the scientific method, the survey proposes a three-level taxonomy (LLM as Tool, Analyst, and Scientist) and categorizes over 90 research works accordingly.
3
Three Levels of Autonomy:
Tool (Level 1): LLMs automate discrete tasks (e.g., literature summarization, code snippets) with direct human supervision.
Analyst (Level 2): LLMs independently handle analytical workflows, such as statistical modeling or symbolic regression, requiring less human intervention.
Scientist (Level 3): LLMs autonomously conduct multi-stage research cycles, including hypothesis generation, experimentation, and refinement, with minimal human input.
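As a rough illustration only, the three levels above could be encoded as an ordered enum so that surveyed works can be grouped by autonomy. This is a minimal sketch, not anything from the paper: the names `AutonomyLevel`, `SurveyedWork`, and `group_by_level` are hypothetical, and the sample entries are placeholders rather than works from the survey.

```python
from dataclasses import dataclass
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Hypothetical encoding of the thread's three levels (higher = more autonomous)."""
    TOOL = 1       # discrete tasks under direct human supervision
    ANALYST = 2    # analytical workflows with less human intervention
    SCIENTIST = 3  # multi-stage research cycles with minimal human input


@dataclass(frozen=True)
class SurveyedWork:
    name: str             # placeholder label, not a real system from the paper
    level: AutonomyLevel


def group_by_level(works):
    """Return a dict mapping each autonomy level to the names filed under it."""
    groups = {level: [] for level in AutonomyLevel}
    for work in works:
        groups[work.level].append(work.name)
    return groups


# Placeholder entries for illustration; they are not taken from the paper's tables.
works = [
    SurveyedWork("summarizer-x", AutonomyLevel.TOOL),
    SurveyedWork("regression-agent-y", AutonomyLevel.ANALYST),
    SurveyedWork("research-loop-z", AutonomyLevel.SCIENTIST),
]
groups = group_by_level(works)
```

Using `IntEnum` keeps the levels comparable (`AutonomyLevel.SCIENTIST > AutonomyLevel.TOOL`), which matches the thread's framing of the levels as a progression in autonomy.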
4
Mapping to the Scientific Method
The paper maps LLM applications to all six stages of the scientific method (e.g., hypothesis generation, data analysis, conclusion). The table shows a detailed breakdown of Level 1 works by task and domain.
Characteristics of Level 1 systems include:
- Operate with explicit prompts and limited autonomy
- Enhance researcher productivity in discrete tasks
- Outputs generally require human integration and validation
5
Level 2
Here is the comparison and classification of Level 2 research works in LLM-based scientific discovery.
These are autonomous analytical agents that execute goal-oriented tasks with moderate human oversight.
Characteristics include:
- Perform multi-step reasoning and data modeling
- Manage sequences of tasks (e.g., analyzing experiments, refining models)
- Require humans mainly for goal definition and result validation
6
Level 3
Notable Level 3 systems include The AI Scientist, Agent Laboratory, and Zochi, which demonstrate autonomous literature review, idea development, experimentation, and report generation.
These systems often use agentic workflows and multi-agent feedback loops.
Unlike Level 2 systems, which require humans to define tasks or validate outputs, Level 3 systems may start from broad prompts or even operate autonomously within a domain, with human involvement limited to high-level oversight or quality control.
7
Challenges and Future Directions
The authors highlight key challenges for advancing LLM-based science:
- enabling fully autonomous research cycles
- integrating robotic automation for physical experiments
- achieving transparent and interpretable reasoning
- ensuring continuous self-improvement
- addressing ethical governance and societal alignment
This paper has a comprehensive set of related works for further reading if anyone is interested in specific domains.
Paper: arxiv.org/abs/2505.13259





