Laiye OpenAPA Achieves 78.3% on OSWorld Benchmark, Ranking #1 in the Agentic Framework Category

Laiye has achieved a breakthrough milestone in the Agentic Framework category of OSWorld, the authoritative benchmark for Computer Use Agents. Laiye OpenAPA scored 78.3%, ranking #1 globally in the Agentic Framework category. Laiye OpenAPA is now available as open source on GitHub.

What Is OSWorld?
OSWorld is the first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across operating systems. Developed by researchers from HKUNLP (University of Hong Kong), CMU (Carnegie Mellon University), and University of Waterloo, it has become the gold standard for evaluating AI agents' ability to operate computers.
Definition: OSWorld is a benchmark environment that tests AI agents on 361 real-world computer tasks using real Ubuntu and Windows systems with authentic applications like Chrome, VS Code, LibreOffice, and Thunderbird—not simulations or sandboxed versions.
When OpenAI, Anthropic, and Google release new models, they use OSWorld as their official benchmark. The benchmark's authority comes from three key characteristics:
- Real Environment: Tests run on actual Ubuntu/Windows systems with real applications (Chrome, VS Code, LibreOffice, Thunderbird)
- Real Tasks: 361 tasks designed by human experts covering office work, programming, browsing, design, and system administration
- Objective Scoring: Each task includes executable validation scripts, so success is determined automatically, not by human judgment
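To make the idea of execution-based scoring concrete, here is a minimal Python sketch of what a checker for a "save a PDF into a folder" task might look like. The function name `check_receipt_saved`, the folder layout, and the file-name pattern are hypothetical illustrations; this is not OSWorld's actual evaluator code.

```python
import re
import tempfile
from pathlib import Path
from typing import Tuple


def check_receipt_saved(receipts_dir: Path, pattern: str) -> float:
    """Return 1.0 if the folder holds a non-empty PDF whose name matches `pattern`, else 0.0.

    Hypothetical checker: it inspects the final state of the file system
    instead of asking a human to judge how the agent behaved.
    """
    regex = re.compile(pattern)
    for path in receipts_dir.glob("*.pdf"):
        if regex.fullmatch(path.name) and path.stat().st_size > 0:
            return 1.0
    return 0.0


def demo() -> Tuple[float, float]:
    """Score an empty folder, then the same folder after a matching file appears."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        pattern = r"aws-invoice-2023-12\.pdf"  # assumed naming convention
        before = check_receipt_saved(folder, pattern)
        (folder / "aws-invoice-2023-12.pdf").write_bytes(b"%PDF-1.4 placeholder")
        after = check_receipt_saved(folder, pattern)
        return before, after
```

The point of the design is that the score depends only on the resulting system state, so any agent, whatever path it took, is judged by the same script.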
A Real OSWorld Task Example
Consider this representative task:
Task: "There's an email in the inbox containing the December AWS bill. Extract the PDF attachment and save it to the receipts folder, following the naming convention of existing files. Then add a corresponding entry to the tally book."
Even a skilled office worker would need significant time to complete this. For an AI agent, the challenge involves:
- 60+ Sequential Operations: Opening email client, locating the message, downloading attachment, checking existing file naming patterns, renaming the file, opening the ledger, finding the correct sheet/row/column, and formatting the entry correctly—any error cascades to failure
- Continuous Reasoning: The agent must comprehend the bill content, deduce naming patterns from existing files, and understand the spreadsheet's row/column structure, rather than simply executing preset scripts
This is just one of 361 tasks. Each percentage point of improvement reflects substantial engineering and algorithmic progress.
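As a back-of-the-envelope illustration of scale, the arithmetic behind a percentage point can be written out in Python. The 361-task count and the 78.3% score are the figures quoted in this article; the helper name `tasks_for_percent` is our own.

```python
TOTAL_TASKS = 361      # task count stated in this article
SCORE_PERCENT = 78.3   # reported OpenAPA score


def tasks_for_percent(percent: float, total: int = TOTAL_TASKS) -> float:
    """Convert a percentage of the benchmark into an equivalent number of tasks."""
    return total * percent / 100.0


tasks_per_point = tasks_for_percent(1.0)            # about 3.61 tasks per point
tasks_completed = tasks_for_percent(SCORE_PERCENT)  # about 282.7 tasks
```

One percentage point therefore corresponds to between three and four additional tasks, and 78.3% corresponds to roughly 283 of the 361 tasks. The result is not a whole number; the article does not say whether that comes from partial credit, averaging over runs, or rounding.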
Two Technical Approaches: Where OpenAPA Stands
Submissions to OSWorld follow two dominant technical approaches:
Approach 1: Specialized Model Route
Train large-scale models on GUI operation data to create "computer-operating" specialized models, paired with lightweight execution layers.
Approach 2: General Model + Agentic Framework Route
Use general-purpose LLMs (Gemini, Claude, GPT) and drive task completion through framework design, planning capabilities, multi-agent collaboration, and context engineering.
Key Trade-offs:
- Specialized models offer greater proficiency in specific capabilities
- Agentic frameworks provide superior transferability, composability, and controllability—the same framework automatically benefits from underlying model improvements and adapts better to enterprise customization needs
OpenAPA achieved 78.3% using Approach 2, securing the #1 position globally. This demonstrates that through architecture and engineering innovation alone—without relying on specially trained models—agent frameworks with general models can reach world-class performance levels.
Key Innovations Behind OpenAPA's Success
How did OpenAPA achieve leading performance using only "general models + agent framework"? The answer lies in its architecture:
1. Hierarchical Planning + Dynamic Reflection
Initial planning defines "what to do" without rigidly locking in "how to do it." At each step, the reflection module recalibrates against the latest screenshot, which helps prevent "drift" in long-horizon tasks.
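The plan-then-reflect loop described above can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not OpenAPA's implementation: `Subgoal`, `run_with_reflection`, and the callables passed in are hypothetical names, and the "screen" is a plain string standing in for a screenshot.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Subgoal:
    description: str   # states what to achieve, not which clicks to use
    done: bool = False


def run_with_reflection(
    subgoals: List[Subgoal],
    observe: Callable[[], str],
    propose: Callable[[Subgoal, str], str],
    execute: Callable[[str], None],
    is_satisfied: Callable[[Subgoal, str], bool],
    max_steps: int = 20,
) -> List[str]:
    """Work through subgoals, re-checking the latest observation before every action."""
    executed: List[str] = []
    for _ in range(max_steps):
        current = next((g for g in subgoals if not g.done), None)
        if current is None:
            break  # every subgoal has been confirmed on screen
        screen = observe()
        if is_satisfied(current, screen):  # reflection step
            current.done = True
            continue
        action = propose(current, screen)  # the "how" is chosen only now
        execute(action)
        executed.append(action)
    return executed


def demo() -> List[str]:
    """Toy environment: executing 'do:X' makes the screen show 'X'."""
    state = {"screen": "inbox"}

    def execute(action: str) -> None:
        state["screen"] = action[len("do:"):]

    goals = [Subgoal("open email"), Subgoal("save attachment")]
    return run_with_reflection(
        goals,
        observe=lambda: state["screen"],
        propose=lambda goal, screen: f"do:{goal.description}",
        execute=execute,
        is_satisfied=lambda goal, screen: screen == goal.description,
    )
```

Because a subgoal is marked done only after the observation confirms it, a failed or misdirected action leads to another attempt instead of silently advancing the plan.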
2. Coding Agent and GUI Agent Collaboration
The Coding Agent handles programmatic work (numerical calculations, data cleaning, file parsing) while the GUI Agent focuses on visual understanding and execution. They validate each other and share knowledge, balancing efficiency with robustness.
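One way to picture this division of labor is a small router plus a cross-check, sketched below in Python. Everything here is a hypothetical illustration: the task kinds, the function names, and the CSV layout are assumptions, not part of OpenAPA's published interface.

```python
# Assumed labels for work that is better done in code than through the GUI.
PROGRAMMATIC_KINDS = {"calculate", "clean_data", "parse_file"}


def route(kind: str) -> str:
    """Send programmatic subtasks to the coding agent, everything else to the GUI agent."""
    return "coding_agent" if kind in PROGRAMMATIC_KINDS else "gui_agent"


def coding_agent_total(csv_text: str) -> float:
    """Coding-agent-style step: parse 'item,amount' rows and sum the amounts."""
    rows = [line.split(",") for line in csv_text.strip().splitlines()[1:]]
    return sum(float(amount) for _, amount in rows)


def cross_check(computed: float, value_seen_on_screen: str, tolerance: float = 0.005) -> bool:
    """Compare the computed value with what the GUI agent reads from the screen."""
    return abs(computed - float(value_seen_on_screen)) <= tolerance
```

For example, `coding_agent_total("item,amount\nEC2,12.50\nS3,3.25\n")` gives 15.75, and `cross_check(15.75, "15.57")` flags a mismatch that would otherwise go unnoticed if only one agent were involved.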
3. Context Engineering for Long-Horizon Tasks
Through sliding window and token budget mechanisms, OpenAPA dynamically retains recent critical screenshots and reasoning traces, enabling stable operation on 100+ step tasks with a 60%+ reduction in token consumption.
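A sliding window combined with a token budget could look something like the following Python sketch. It is a simplified illustration, not OpenAPA's code: the `Step` structure, the four-characters-per-token estimate, and the oldest-first eviction rule are all assumptions, and it makes no attempt to reproduce the 60%+ figure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    index: int
    reasoning: str
    screenshot_tokens: int  # assumed cost of including this step's screenshot


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def build_context(history: List[Step], window: int, budget: int) -> List[str]:
    """Keep screenshots for the last `window` steps, then evict oldest entries to fit `budget`."""
    entries: List[Tuple[str, int]] = []
    cutoff = len(history) - window
    for pos, step in enumerate(history):
        cost = estimate_tokens(step.reasoning)
        label = f"step {step.index}: text"
        if pos >= cutoff:  # inside the sliding window: keep the screenshot too
            cost += step.screenshot_tokens
            label = f"step {step.index}: text+screenshot"
        entries.append((label, cost))
    # Enforce the token budget by dropping the oldest entries first.
    while entries and sum(cost for _, cost in entries) > budget:
        entries.pop(0)
    return [label for label, _ in entries]
```

With five steps of 10 reasoning tokens and 100 screenshot tokens each, `window=2` and `budget=240` keep the two latest screenshots, retain text only for steps 1 and 2, and drop step 0 entirely. A production system would likely rank entries by importance instead of age alone.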
4. Decoupled Reasoning and Localization Models
The main reasoning model handles task understanding and decision-making, while a dedicated vision model manages pixel-level coordinate localization. "Thinking" and "seeing" each focus on their specialty, avoiding the compromises that arise when a single model handles both.
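The separation of "thinking" from "seeing" can be expressed as two narrow interfaces, as in this Python sketch. The `Reasoner` and `Grounder` protocols, the `Intent` type, and the stub classes are hypothetical names chosen for illustration; real implementations would wrap a general LLM and a vision model respectively.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Tuple


@dataclass
class Intent:
    action: str   # e.g. "click"
    target: str   # natural-language description of the UI element


class Reasoner(Protocol):
    def decide(self, task: str, screen_summary: str) -> Intent: ...


class Grounder(Protocol):
    def locate(self, target: str) -> Tuple[int, int]: ...


class StubReasoner:
    """Stands in for the reasoning model: decides what to do, never where."""

    def decide(self, task: str, screen_summary: str) -> Intent:
        return Intent(action="click", target="Save button")


class StubGrounder:
    """Stands in for the localization model: maps a description to pixel coordinates."""

    def __init__(self, known: Dict[str, Tuple[int, int]]) -> None:
        self.known = known

    def locate(self, target: str) -> Tuple[int, int]:
        return self.known[target]


def next_command(task: str, screen_summary: str, reasoner: Reasoner, grounder: Grounder) -> str:
    """Combine the two models: intent from the reasoner, coordinates from the grounder."""
    intent = reasoner.decide(task, screen_summary)
    x, y = grounder.locate(intent.target)
    return f"{intent.action}({x}, {y})"
```

Because the two sides meet only at the `Intent` boundary, either model can be swapped for a newer one without touching the other.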
These designs pursue one goal: enabling "general models + general frameworks" to achieve, on complex real-world tasks, a level of reliability previously possible only with specialized systems.
From RPA to APA to OpenAPA: The Evolution Path
Enterprise process automation is following a clear evolution path, from RPA to APA to OpenAPA.
Laiye recently upgraded RPA to APA by integrating agent capabilities across the entire automation lifecycle—development, execution, and maintenance. This makes enterprise automation more intelligent, flexible, and accessible. Processes previously requiring repeated IT team configuration can now be autonomously completed by agents based on objectives, significantly reducing deployment and change barriers.
OpenAPA explores the next critical evolution path for APA: the "vision-driven, semantic understanding, autonomous planning, self-healing" paradigm centered on Computer Use Agents. Rather than relying on fixed interfaces or scripts, it operates like a human: viewing screens, making judgments, executing actions. This brings stronger interface understanding, task planning, and process self-healing capabilities to APA.
Laiye has proven the engineering value of "agents × process automation" through APA, while OpenAPA continuously injects more flexible, intelligent capabilities into future APA products at the frontier. Together, they represent Laiye's commitment to "next-generation enterprise automation."
Open-Sourced: Available Now
Computer Use Agent development remains in early stages. That's why we're open-sourcing OpenAPA—inviting community developers to explore, grow, and advance Computer Use Agent technology together.
🔗 GitHub Repository: https://github.com/laiye-ai/open-apa
When AI learns to see screens, enterprise process automation will evolve beyond "rule-based execution" toward "goal-driven autonomous completion." Laiye is committed to being a continuous driver and companion on this evolution path.
Frequently Asked Questions (FAQ)
What is OSWorld?
OSWorld is a benchmark environment for evaluating AI agents' ability to operate real computers. It features 361 real-world tasks running on actual Ubuntu and Windows systems with authentic applications like Chrome, VS Code, and LibreOffice. Developed by HKUNLP, CMU, and University of Waterloo researchers, it's considered the gold standard for Computer Use Agent evaluation.
What does the 78.3% score mean?
The 78.3% score means OpenAPA successfully completed 78.3% of the 361 OSWorld benchmark tasks. In the Agentic Framework category (using general models with agent architectures rather than specialized models), this represents the highest score globally as of the benchmark date.
What is the difference between Specialized Models and Agentic Frameworks?
Specialized Models are trained specifically on GUI operations and paired with lightweight execution layers. Agentic Frameworks use general-purpose LLMs (like GPT, Claude, Gemini) and rely on architecture design, planning, and multi-agent collaboration to complete tasks. Agentic Frameworks offer better transferability, composability, and adaptability to enterprise needs.
What makes OpenAPA different from other Computer Use Agent frameworks?
OpenAPA's key innovations include: (1) Hierarchical planning with dynamic reflection for long-horizon tasks; (2) Coding Agent + GUI Agent collaboration for efficiency and robustness; (3) Context engineering reducing token consumption by 60%+; and (4) Decoupled reasoning and localization models for specialized performance.
How does OpenAPA relate to Laiye's APA platform?
APA (Agentic Process Automation) is Laiye's enterprise platform combining agents with process automation. OpenAPA represents the research and architectural innovations that will inform future APA capabilities—particularly the "vision-driven, semantic understanding, autonomous planning" paradigm for screen-based automation.
Is OpenAPA free to use?
Yes. OpenAPA is open source and freely available in the repository at https://github.com/laiye-ai/open-apa. Laiye encourages community contributions and collaboration to advance Computer Use Agent technology.
What types of tasks can OpenAPA handle?
OpenAPA excels at complex, multi-step computer tasks such as: email processing with attachment handling, file organization and renaming, data entry across applications, web browsing and information extraction, document processing, and cross-application workflows requiring reasoning and adaptation.



