Meet CoAct-1: a new multi-agent system that combines GUI-based control with direct programmatic execution

by admin · August 8, 2025

A team of researchers from USC, Salesforce AI and the University of Washington has introduced CoAct-1, a multi-agent computer-using agent (CUA) that marks a significant step forward in automated computer operation. By elevating coding to a first-class action alongside traditional GUI manipulation, CoAct-1 addresses long-standing problems of efficiency and reliability in complex, long-horizon computer tasks. On the demanding OSWorld benchmark, CoAct-1 achieves a state-of-the-art (SOTA) success rate of 60.76%, making it the first CUA to exceed the 60% mark.

Why CoAct-1? Bridging the efficiency gap in computer-using agents

Conventional CUAs rely solely on pixel-based GUI interaction, emulating human users by clicking, typing and navigating interfaces. Although this approach mimics user workflows, it proves fragile and inefficient for complex multi-step tasks, especially those involving dense UI layouts, multi-application pipelines or complex OS operations. A single error (for example, a misclick) can derail the entire workflow, and the action sequence grows longer as task complexity increases.

Efforts to mitigate these issues include pairing GUI agents with high-level planners, as seen in systems such as GTA-1, and modular multi-agent frameworks. However, these methods cannot escape the bottleneck of a GUI-centric action space, which ultimately limits both efficiency and robustness.
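
The fragility of long GUI-only sequences can be illustrated with a back-of-the-envelope model. The sketch below assumes that every step succeeds independently with the same probability. That assumption, the function name and the 0.98 figure are purely illustrative and do not come from the CoAct-1 work.

```python
def sequence_success_probability(per_step_success: float, num_steps: int) -> float:
    """Probability that every step in a sequence succeeds.

    Assumes steps fail independently with an identical probability,
    a simplification chosen for illustration only.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return per_step_success ** num_steps


if __name__ == "__main__":
    # Hypothetical per-step reliability; not a measured value.
    p = 0.98
    for steps in (5, 15, 50):
        print(f"{steps:>3} steps -> {sequence_success_probability(p, steps):.3f}")
```

Under this toy model, the chance of a flawless run shrinks geometrically with sequence length, which is why shortening the sequence matters as much as making each step more accurate.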

CoAct-1: A hybrid architecture with coding as an action

CoAct-1 takes a fundamentally different approach by integrating three specialized agents:

Orchestrator: A high-level planner that decomposes complex tasks and dynamically delegates subtasks to either the Programmer or the GUI Operator, according to task requirements.
Programmer: Performs back-end operations (file management, data processing, environment configuration) directly through Python or Bash scripts, bypassing cumbersome GUI action sequences.
GUI Operator: Uses a vision-language model to interact with visual interfaces when human-like UI navigation is essential.

This hybrid model enables CoAct-1 to strategically replace brittle, lengthy mouse and keyboard operations with concise, reliable code execution, while still leveraging GUI interaction when necessary.
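
The division of labor among the three agents can be sketched as a simple dispatcher. This is a minimal illustration, not CoAct-1's implementation: the `Subtask` class, the `"code"`/`"gui"` labels and both stand-in agent functions are hypothetical, and in the real system the routing decision is made by a language model rather than by a pre-assigned label.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    description: str
    kind: str  # "code" or "gui"; hypothetical labels for this sketch


def programmer(subtask: Subtask) -> str:
    # Stand-in for an agent that would write and run Python or Bash.
    return f"[programmer] scripted: {subtask.description}"


def gui_operator(subtask: Subtask) -> str:
    # Stand-in for an agent that would click and type via a vision model.
    return f"[gui_operator] navigated: {subtask.description}"


# Registry mapping each subtask kind to the agent that handles it.
AGENTS: Dict[str, Callable[[Subtask], str]] = {
    "code": programmer,
    "gui": gui_operator,
}


def orchestrate(subtasks: List[Subtask]) -> List[str]:
    """Delegate each subtask to the agent registered for its kind."""
    results = []
    for subtask in subtasks:
        if subtask.kind not in AGENTS:
            raise ValueError(f"unknown subtask kind: {subtask.kind!r}")
        results.append(AGENTS[subtask.kind](subtask))
    return results


if __name__ == "__main__":
    plan = [
        Subtask("rename all report files", "code"),
        Subtask("confirm the export dialog", "gui"),
    ]
    for line in orchestrate(plan):
        print(line)
```

Keeping the agents behind a common call signature means the orchestrating logic does not need to know how a subtask is carried out, only which kind of agent suits it.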

OSWorld evaluation: record performance

OSWorld, a leading benchmark with 369 tasks covering office productivity, IDEs, browsers, file managers and multi-application workflows, provides a rigorous test for agent systems. Each task reflects a real-world goal stated in natural language and is evaluated by a fine-grained, rule-based scoring system.

Results

Overall SOTA success rate: CoAct-1 achieves 60.76% in the 100+ step category, making it the first CUA to cross the 60% threshold. This outperforms GTA-1 (53.10%), OpenAI CUA 4o (31.40%), UI-TARS-1.5 (29.60%) and other leading frameworks.
Performance under a step budget: With a 100-step budget, CoAct-1 scores 59.93%, again leading all competitors.
Efficiency: CoAct-1 needs an average of 10.15 steps per successful task, compared with 15.22 for GTA-1 and 14.90 for UI-TARS. OpenAI CUA 4o uses fewer steps (6.14) but reaches only a 31.40% success rate.
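
To put the step counts in perspective, the short calculation below turns the averages reported above into relative savings. The helper name is an assumption for illustration; the numbers are the ones quoted in this article.

```python
# Average steps per successful task, as reported in the list above.
AVG_STEPS = {"CoAct-1": 10.15, "GTA-1": 15.22, "UI-TARS": 14.90}


def relative_step_reduction(baseline: float, candidate: float) -> float:
    """Fraction of steps saved by `candidate` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline


if __name__ == "__main__":
    for name in ("GTA-1", "UI-TARS"):
        saving = relative_step_reduction(AVG_STEPS[name], AVG_STEPS["CoAct-1"])
        print(f"CoAct-1 vs {name}: {saving:.1%} fewer steps")
```

By this measure CoAct-1 uses roughly a third fewer steps than GTA-1 on the tasks it completes.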

Breakdown

CoAct-1 leads across task types, especially in workflows that benefit from code execution:

Multi-application: 47.88% (versus 38.34% for GTA-1)
Operating system tasks: 75.00%
VLC: 66.07%
In the productivity and IDE domains (LibreOffice Calc, Writer, VS Code), it consistently leads or ties the SOTA.

Key insights: What drives CoAct-1's gains?

Coding actions replace redundant GUI sequences: For operations such as batch image resizing or advanced file manipulation, a single script replaces dozens of error-prone clicks, reducing both the number of steps and the risk of failure.
Dynamic delegation: The Orchestrator's flexible task allocation ensures that coding and GUI actions are each used where they work best.
Improvement with stronger backbones: The best configuration uses OpenAI CUA 4o for the GUI Operator, OpenAI o3 for the Orchestrator and o4-mini for the Programmer, reaching a score of 60.76%. Configurations that use only smaller or less capable backbones score significantly lower.
Efficiency correlates with reliability: Fewer steps directly reduce the opportunities for error, and step count is a strong predictor of successful completion.
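
To make the first insight concrete, here is a minimal sketch of the kind of script a programmer agent might run in place of a long drag-and-drop sequence: it sorts the files in a folder into subfolders by extension. The function name, the folder layout and the "no_extension" bucket are assumptions for illustration and are not taken from CoAct-1.

```python
from pathlib import Path
import shutil
from typing import Dict


def sort_files_by_extension(folder: Path) -> Dict[str, int]:
    """Move each file in `folder` into a subfolder named after its extension.

    Returns a mapping of subfolder name -> number of files moved.
    Files without an extension go into "no_extension".
    Assumes no file shares its name with an extension subfolder.
    """
    moved: Dict[str, int] = {}
    # sorted() snapshots the listing, so new subfolders are not revisited.
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        bucket = path.suffix.lstrip(".").lower() or "no_extension"
        target_dir = folder / bucket
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
        moved[bucket] = moved.get(bucket, 0) + 1
    return moved
```

Calling `sort_files_by_extension(Path("some_folder"))` handles every file in one step, whereas a GUI agent would need a select, drag and drop action (or several) for each file.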

Conclusion: A leap in generalized computer automation

By elevating coding to a first-class system action alongside GUI manipulation, CoAct-1 achieves a leap in both success rate and efficiency, and illustrates a practical path toward scalable, reliable autonomous computer agents. Its hybrid architecture and dynamic execution logic set a new high-water mark for the CUA field and signal strong progress in real-world computer automation.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that offers in-depth coverage of machine learning and deep learning news that is both technically sound and easily understood by a wide audience. The platform has over 2 million views per month, demonstrating its popularity among its audience.
