Zhipu AI unveils ComputerRL: an AI framework scaling end-to-end reinforcement learning for computer use agents

by admin · August 22, 2025

In the rapidly evolving field of AI-driven automation, Zhipu AI has introduced ComputerRL, a groundbreaking framework designed to enable agents to navigate and manipulate complex digital workspaces. It addresses a core challenge in AI agent development: the mismatch between computer agents and graphical user interfaces (GUIs) designed for humans. By integrating programmatic API calls with direct GUI interactions, ComputerRL enables more efficient and versatile desktop operations, marking an important step toward autonomous computer use agents.

API-GUI paradigm: bridging human-machine interaction

Traditional GUI agents often struggle in environments optimized for human users, which makes actions such as clicking or scrolling inefficient. ComputerRL introduces the API-GUI paradigm, which combines the precision of API calls with the flexibility of GUI-based operations. This hybrid approach lets agents use machine-friendly APIs for tasks that benefit from programmatic control while relying on GUI actions for broader adaptability.

API construction is automated using large language models (LLMs). Users provide sample tasks, and the system analyzes requirements, implements the APIs using relevant Python libraries, and generates test cases. This process keeps the API packages general-purpose, reduces complexity, and improves agent performance. For example, APIs for Ubuntu applications such as GIMP and LibreOffice are integrated, enabling tasks such as image processing or document formatting with fewer steps than GUI-only methods.
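The hybrid action space described above can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from ComputerRL itself: the `HybridExecutor` class, the action dictionary layout, and the `set_cell` function are illustrative names, and the GUI branch only records the requested step instead of driving a real desktop.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class HybridExecutor:
    """Routes agent actions to a registered API or to a GUI primitive (hypothetical)."""
    apis: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    gui_log: List[str] = field(default_factory=list)

    def register_api(self, name: str, fn: Callable[..., Any]) -> None:
        self.apis[name] = fn

    def execute(self, action: Dict[str, Any]) -> Any:
        kind = action["type"]
        if kind == "api":
            fn = self.apis.get(action["name"])
            if fn is None:
                raise KeyError(f"unknown API: {action['name']}")
            return fn(**action.get("args", {}))
        if kind == "gui":
            # A real system would drive the mouse or keyboard here;
            # this sketch only records the requested GUI step.
            entry = f"{action['op']}{tuple(action.get('args', ()))}"
            self.gui_log.append(entry)
            return entry
        raise ValueError(f"unsupported action type: {kind}")


def set_cell(sheet: Dict[str, Any], cell: str, value: Any) -> Dict[str, Any]:
    """Hypothetical spreadsheet API: write a value into an in-memory sheet."""
    sheet[cell] = value
    return sheet


executor = HybridExecutor()
executor.register_api("set_cell", set_cell)

sheet: Dict[str, Any] = {}
# One precise API call replaces several clicks and keystrokes...
executor.execute({"type": "api", "name": "set_cell",
                  "args": {"sheet": sheet, "cell": "A1", "value": 42}})
# ...while GUI actions remain available where no API exists.
executor.execute({"type": "gui", "op": "click", "args": (120, 340)})
```

The design point is that both action kinds pass through one entry point, so a policy can choose per step whichever route is cheaper.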

Scalable infrastructure for large-scale RL training

A major obstacle to training desktop agents is the inefficiency of virtual environments. ComputerRL overcomes this with a distributed reinforcement learning (RL) infrastructure built on Docker and gRPC that supports thousands of parallel Ubuntu virtual machines. The setup is compatible with benchmarks such as AgentBench and addresses problems in earlier systems, including heavy resource consumption and network bottlenecks.

Key features include lightweight VM deployment via qemu-in-docker, scalable multi-node clustering, and a web-based monitoring interface. Paired with the AgentRL framework, it supports fully asynchronous training, decoupling data collection from parameter updates for greater efficiency. This infrastructure allows high-throughput RL with dynamic batch sizing and mitigation of off-policy bias, which makes extended training runs possible without stagnation.
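The idea of decoupling data collection from parameter updates can be sketched with Python's standard `queue` and `threading` modules. This is a toy illustration under stated assumptions, not the AgentRL implementation: `collect_rollouts`, `train`, and the `max_batch` parameter are hypothetical, and the worker threads stand in for environment VMs.

```python
import queue
import threading
from typing import List


def collect_rollouts(worker_id: int, n: int, out: "queue.Queue[dict]") -> None:
    """Stand-in for an environment worker: pushes finished trajectories."""
    for i in range(n):
        out.put({"worker": worker_id, "episode": i})


def train(out: "queue.Queue[dict]", total: int, max_batch: int) -> List[int]:
    """Consume whatever is ready, so batch sizes vary with rollout throughput."""
    batch_sizes: List[int] = []
    seen = 0
    while seen < total:
        batch = [out.get()]  # block until at least one trajectory arrives
        while len(batch) < max_batch:
            try:
                batch.append(out.get_nowait())
            except queue.Empty:
                break
        seen += len(batch)
        # A real trainer would run a parameter update on `batch` here.
        batch_sizes.append(len(batch))
    return batch_sizes


buffer: "queue.Queue[dict]" = queue.Queue()
workers = [threading.Thread(target=collect_rollouts, args=(w, 5, buffer))
           for w in range(4)]
for t in workers:
    t.start()
sizes = train(buffer, total=20, max_batch=8)
for t in workers:
    t.join()
```

Because the trainer never waits for a fixed number of trajectories, slow environments do not stall updates; the individual batch sizes in `sizes` depend on thread timing and differ between runs.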

Entropulse: enhancing RL through alternating training phases

To counter entropy collapse, a common problem in which the agent loses exploratory behavior during prolonged RL, ComputerRL incorporates Entropulse. The method alternates RL phases with supervised fine-tuning (SFT) on successful rollout trajectories, restoring entropy and enabling sustained performance gains.
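To make the terms concrete, here is a small sketch of the two ingredients mentioned above: measuring the entropy of a policy's action distribution, and filtering earlier rollouts down to the successful ones that would feed an SFT phase. The function names, the `success` field, and the example distributions are illustrative assumptions, not details from ComputerRL.

```python
import math
from typing import Dict, List, Sequence


def policy_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_sft_data(rollouts: List[Dict]) -> List[Dict]:
    """Keep only successful trajectories collected during earlier RL phases."""
    return [r for r in rollouts if r["success"]]


# A collapsed policy puts almost all probability on one action,
# while an exploratory one spreads it out.
collapsed = [0.97, 0.01, 0.01, 0.01]
exploratory = [0.25, 0.25, 0.25, 0.25]
low = policy_entropy(collapsed)
high = policy_entropy(exploratory)

rollouts = [
    {"task": "t1", "success": True},
    {"task": "t1", "success": False},
    {"task": "t2", "success": True},
]
sft_data = select_sft_data(rollouts)
```

Here `high` equals the natural log of 4, the maximum for four actions, and `low` is much smaller, which is the kind of gap an entropy curve would expose during training.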

The training pipeline begins with behavior cloning (BC), using trajectories from multiple LLMs for diversity. It then applies step-level Group Relative Policy Optimization (GRPO) with rule-based rewards, assigning positive scores only to correct actions in successful trajectories. Entropulse prevents premature convergence and extends the number of effective training steps by curating diverse, high-quality data from previous rollouts for SFT.
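A rough sketch of the reward rule and group-relative normalization follows. It assumes, for illustration only, that per-step rewards from all trajectories in a group are pooled before normalizing; the `step_ok` flags, the function names, and the `eps` constant are hypothetical, and the actual ComputerRL formulation may differ.

```python
from statistics import mean, pstdev
from typing import List


def step_rewards(success: bool, step_ok: List[bool]) -> List[float]:
    """Rule-based reward: 1.0 only for correct steps in a successful trajectory."""
    return [1.0 if (success and ok) else 0.0 for ok in step_ok]


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards against the group's mean and standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two rollouts of the same task form one group (hypothetical data).
group = [
    {"success": True, "step_ok": [True, True, False]},
    {"success": False, "step_ok": [True, True]},
]
flat = [r for traj in group
        for r in step_rewards(traj["success"], traj["step_ok"])]
advantages = group_relative_advantages(flat)
```

Steps from the failed rollout receive zero reward even when the individual action looked valid, so after normalization only the correct steps of the successful rollout end up with positive advantages.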

Experimental validation on the OSWorld benchmark

The team applied ComputerRL to open-source models such as GLM-4-9B-0414 and Qwen2.5-14B, producing the AutoGLM-OS variants. On the OSWorld benchmark, which evaluates agents in an interactive Ubuntu environment, AutoGLM-OS-9B achieved a 48.1% success rate, surpassing proprietary models such as OpenAI's CUA o3 (42.9%) and Claude 4.0 (30.7%). It also performed well on OSWorld-Verified, with a score of 47.3%.

Ablation studies highlight the framework's advantages. The API-GUI paradigm improves the success rate by 134% over GUI-only baselines, especially in office and professional domains. Training ablations show that BC provides a 31.9% baseline and that the RL phases, with Entropulse-enabled exploration, bring the total up to 45.8%. Entropy curves confirm Entropulse's role in maintaining learning momentum.

Case studies demonstrate practical efficacy, such as creating a sales summary table in LibreOffice Calc or generating system reports through terminal commands. However, error analysis reveals remaining challenges, including visual perception problems (25.8% of failures) and multi-application coordination (34.4%), which point to areas for refinement.

Future directions for desktop autonomy

Looking ahead, ComputerRL sets the stage for more powerful agents capable of handling dynamic environments and long-horizon tasks. Potential advances include expanding training diversity, integrating multimodal perception, and developing hierarchical planning. Safety features such as permission frameworks and action validation will be critical for practical deployment, ensuring aligned and trustworthy automation.

ComputerRL represents a key advance in AI agents, blending scalable RL with innovative interaction paradigms to transform desktop intelligence. As open models such as AutoGLM-OS push boundaries, the framework paves the way for more capable general-purpose agents in everyday computing.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that offers in-depth coverage of machine learning and deep learning news that is both technically sound and understandable to a wide audience. The platform has over 2 million views per month, demonstrating its popularity among its audience.
