WorkflowLLM enables LLMs to handle 70+ action workflows, a 10x improvement over current capabilities
An LLM that can orchestrate real-world automation workflows at production scale
Original Problem 🤔:
Current LLMs can only handle small workflows with around 6 actions and simple logical structures. This falls short of real-world needs where applications like Apple Shortcuts involve 70+ actions and complex branching/looping patterns.
-----
Solution in this Paper 🛠️:
→ Created WorkflowBench - a dataset with 106,763 workflow samples covering 1,503 APIs from 83 applications
→ Collected real workflows from Apple Shortcuts and RoutineHub, converted to Python code, added hierarchical thoughts using ChatGPT
→ Used ChatGPT to generate diverse task queries and expand dataset coverage
→ Trained an annotator model on collected data to generate workflows for new queries
→ Fine-tuned Llama-3.1-8B on this dataset to create WorkflowLlama
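
To make the "converted to Python code, added hierarchical thoughts" step concrete, here is a minimal sketch of what one such sample might look like. The API functions (`get_battery_level`, `set_low_power_mode`, `send_notification`) are hypothetical stand-ins invented for illustration, not names from WorkflowBench, and the comment layout is only an assumption about how task-level and step-level thoughts could be attached to the code.

```python
# Hypothetical stand-ins for app APIs; these names are illustrative only.
def get_battery_level() -> int:
    return 15  # fixed value so the sketch is deterministic


def set_low_power_mode(enabled: bool) -> str:
    return f"low_power_mode={'on' if enabled else 'off'}"


def send_notification(message: str) -> str:
    return f"notify: {message}"


def battery_saver_workflow(threshold: int = 20) -> list[str]:
    # Task-level thought (assumed format): save power when the battery is low
    # and tell the user what happened.
    log = []

    # Step-level thought: read the current battery level.
    level = get_battery_level()

    # Step-level thought: branch on the threshold, the kind of control flow
    # that real shortcuts contain.
    if level < threshold:
        log.append(set_low_power_mode(True))
        log.append(send_notification(f"Battery at {level}%, low power mode enabled"))
    else:
        log.append(send_notification(f"Battery at {level}%, no action needed"))
    return log


print(battery_saver_workflow())
```

A real sample would chain many more actions and nest branches and loops; this one only shows the shape of code plus layered comments.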
-----
Key Insights from this Paper 💡:
→ Data quality and scale are crucial for workflow orchestration capability
→ Three-phase data construction ensures diversity and complexity
→ Hierarchical thought generation improves model understanding
→ Quality confirmation steps maintain dataset integrity
-----
Results 📊:
→ Outperformed all baselines, including GPT-4o
→ Handled complex workflows with 70+ actions, versus about 6 actions for GPT-4o
→ Demonstrated strong generalization to unseen APIs and instructions
→ Achieved 77.5% F1 score on out-of-distribution T-Eval benchmark
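
For readers unfamiliar with F1 on planning tasks, the sketch below shows one simple way such a score can be computed: treat the predicted and reference plans as sets of API names and combine precision and recall. This is an assumption made for illustration; T-Eval's actual plan metric may match steps differently, and `plan_f1` and the API names are hypothetical.

```python
def plan_f1(predicted: list[str], reference: list[str]) -> float:
    """Set-based F1 between predicted and reference API calls (illustrative only)."""
    pred, ref = set(predicted), set(reference)
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0  # also covers empty predictions or references
    precision = overlap / len(pred)  # share of predicted calls that are correct
    recall = overlap / len(ref)      # share of reference calls that were found
    return 2 * precision * recall / (precision + recall)


# One extra predicted call: precision 2/3, recall 1, so F1 is 0.8.
score = plan_f1(
    ["get_weather", "send_message", "set_alarm"],
    ["get_weather", "send_message"],
)
print(round(score, 3))
```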


Paper Title: "WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models"
A podcast on this paper was generated with Google's Illuminate.
