Salesforce AI Research introduces WALT (Web Agents that Learn Tools): enabling LLM agents to automatically discover reusable tools from any website

A team of Salesforce AI researchers has introduced WALT (Web Agents that Learn Tools), a framework that reverse-engineers latent website functionality into reusable, callable tools. It re-architects browser automation around these tools instead of long chains of clicks: the agent calls operations such as search, filter, sort, post_comment, and create_listing. This reduces reliance on step-by-step LLM reasoning and makes execution more deterministic.

What does WALT build?

Web agents often fail when page layouts change or when tasks require long action sequences. WALT targets this failure mode by mining site functionality offline and exposing it as tools that encapsulate navigation, selection, extraction, and optional agentic steps. Each tool carries a contract in the form of a schema and examples. At runtime, the agent composes a short program from a few tool calls to complete the task. The design goal is higher success with fewer steps and less reliance on free-form reasoning.
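
To make the idea of a tool contract concrete, here is a minimal Python sketch. It is not WALT's actual implementation: the Tool class, the filter_by_price tool, and the in-memory LISTINGS data are hypothetical stand-ins, and a real tool would run a deterministic browser script rather than a lambda.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    """A callable site capability plus its contract (hypothetical structure)."""
    name: str
    schema: dict[str, type]   # parameter name -> expected Python type
    example: dict[str, Any]   # one known-good invocation, kept as documentation
    run: Callable[..., Any]   # the deterministic script behind the tool

    def __call__(self, **kwargs: Any) -> Any:
        # Enforce the contract before any (simulated) site work happens.
        missing = self.schema.keys() - kwargs.keys()
        extra = kwargs.keys() - self.schema.keys()
        if missing or extra:
            raise TypeError(
                f"{self.name}: missing={sorted(missing)} extra={sorted(extra)}"
            )
        for key, expected in self.schema.items():
            if not isinstance(kwargs[key], expected):
                raise TypeError(f"{self.name}: {key} must be {expected.__name__}")
        return self.run(**kwargs)


# Stand-in data; a real tool would read this from the website.
LISTINGS = [
    {"title": "road bike", "price": 250},
    {"title": "mountain bike", "price": 400},
    {"title": "bike helmet", "price": 30},
]

search = Tool(
    name="search",
    schema={"query": str},
    example={"query": "bike"},
    run=lambda query: [item for item in LISTINGS if query in item["title"]],
)
filter_by_price = Tool(
    name="filter_by_price",
    schema={"items": list, "max_price": int},
    example={"items": [], "max_price": 100},
    run=lambda items, max_price: [i for i in items if i["price"] <= max_price],
)

# The kind of short program an agent might emit: two tool calls, no click chain.
results = filter_by_price(items=search(query="bike"), max_price=300)
print([item["title"] for item in results])  # ['road bike', 'bike helmet']
```

Because the schema is checked up front, a malformed call such as search(q="bike") fails immediately with a TypeError instead of failing partway through a browser session.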

The pipeline has two phases

The two phases are discovery, followed by construction and validation. During discovery, WALT explores the website and proposes candidate tools that map to common goals such as discovery, content management, and communication. During construction and validation, WALT converts traces into deterministic scripts, stabilizes selectors, attempts URL promotion where possible, induces input schemas, and registers a tool only after end-to-end checks pass. This shifts as much work as possible into stable URL and form operations and reserves agentic grounding for the cases that genuinely need it.
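
The "register only after end-to-end checks pass" rule can be sketched as a simple gate. This is an illustrative Python sketch under stated assumptions, not WALT's code: validate_and_register, the registry dict, and both candidate scripts are hypothetical, and the scripts return route strings instead of driving a browser.

```python
from typing import Callable

Script = Callable[[dict], object]  # deterministic script: inputs -> result


def validate_and_register(
    registry: dict[str, Script],
    name: str,
    script: Script,
    checks: list[tuple[dict, Callable[[object], bool]]],
) -> bool:
    """Register `script` under `name` only if every end-to-end check passes."""
    for inputs, is_ok in checks:
        try:
            if not is_ok(script(inputs)):
                return False
        except Exception:
            # A script that crashes is treated the same as a failed check.
            return False
    registry[name] = script
    return True


def good_search(inputs: dict) -> str:
    # Candidate 1: builds the search route from the expected input field.
    return f"/search?q={inputs['query']}"


def brittle_search(inputs: dict) -> str:
    # Candidate 2: relies on a field name the checks never supply.
    return f"/search?q={inputs['keyword']}"


registry: dict[str, Script] = {}
checks = [({"query": "bike"}, lambda result: result == "/search?q=bike")]

accepted = validate_and_register(registry, "search", good_search, checks)
rejected = validate_and_register(registry, "search_v2", brittle_search, checks)
print(accepted, rejected, sorted(registry))  # True False ['search']
```

The point of the gate is that the agent's runtime toolset only ever contains scripts that have already been exercised end to end.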

Results for VisualWebArena and WebArena

On VisualWebArena, WALT reports an average success rate of 52.9%, with per-split results of 64.1% on Classifieds, 53.4% on Shopping, and 39.0% on Reddit. The results table lists baselines such as SGV at 50.2% and ExaCT at 33.7%. Average human performance is 88.7%.

On WebArena, WALT reaches an average success rate of 50.1% across GitLab, Map, Shopping, CMS, Reddit, and Multi. The results table shows WALT ahead of prior methods, outperforming the best skill induction baseline by nine percentage points. Human performance is 78.2%.

Efficiency and ablations

Compared with matched agents that do not use tools, the tool-based agents need roughly 1.4x fewer actions on average. On the Classifieds split, ablations show consistent gains when tools are used across different agent backbones. With GPT-5 mini, WALT achieves a 7% higher success rate with 27% fewer steps, while a strategy based on human demonstrations reaches 66.0% success. Fully autonomous WALT reaches 64.1%, with 5% fewer steps than the human demonstration case. Multimodal DOM parsing adds 2.6% in absolute success rate. External verification adds 3.3%. Across components, WALT records 21.3% fewer steps than baseline policies.
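
The step savings above are quoted both as a ratio (about 1.4x) and as percentages (27%, 21.3%). These figures appear to come from different comparisons, so they need not agree, but converting between the two forms makes them easier to compare. The helper names in this short Python sketch are illustrative only.

```python
def fold_to_percent_fewer(fold: float) -> float:
    """Convert 'N times fewer steps' into a percentage reduction."""
    return (1 - 1 / fold) * 100


def percent_fewer_to_fold(percent: float) -> float:
    """Convert 'P percent fewer steps' into an 'N times fewer' ratio."""
    return 1 / (1 - percent / 100)


print(round(fold_to_percent_fewer(1.4), 1))   # 28.6
print(round(percent_fewer_to_fold(21.3), 2))  # 1.27
print(round(percent_fewer_to_fold(27), 2))    # 1.37
```

So a 1.4x reduction corresponds to roughly 29% fewer steps, and 21.3% fewer steps corresponds to roughly a 1.27x reduction.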

Design choices that reinforce determinism

WALT prefers URL-level operations when a site exposes query parameters or routes for search and filtering. Tool scripts insert bounded agentic steps where a page requires dynamic grounding, for example content extraction or waiting for a page to load. Selector stabilization and schema validation reduce drift when sites change. As a result, the share of agentic operations in the discovered toolset stays low, and deterministic actions such as navigation, input, and clicks dominate.
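
As an illustration of a URL-level operation, the Python sketch below builds a search-and-filter URL directly from parameters instead of typing into a search box and clicking filter widgets. The base URL and the parameter names (q, category, max_price, sort) are assumptions made for the example, not routes taken from WALT or from any benchmark site.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://classifieds.example.com/search"  # placeholder site


def search_url(
    query: str,
    category: Optional[str] = None,
    max_price: Optional[int] = None,
    sort: Optional[str] = None,
) -> str:
    """Build one deterministic URL that encodes the search, filter, and sort."""
    params = {
        "q": query,
        "category": category,
        "max_price": max_price,
        "sort": sort,
    }
    # Drop unset filters so the URL carries only what the caller asked for.
    params = {key: value for key, value in params.items() if value is not None}
    return f"{BASE_URL}?{urlencode(params)}"


url = search_url("road bike", max_price=300, sort="price_asc")
print(url)
# https://classifieds.example.com/search?q=road+bike&max_price=300&sort=price_asc
```

A single navigation to such a URL replaces several UI interactions and does not depend on where the search box or filter controls sit on the page.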

Main points

  1. Method: WALT discovers and validates website-native functionality, then exposes it as callable tools with input schemas, selector stabilization, and URL promotion, replacing brittle step sequences with deterministic operations.
  2. Results (VisualWebArena): The average success rate is 52.9%, with 64.1% on Classifieds, 53.4% on Shopping, and 39.0% on Reddit, ahead of several baselines reported in the paper.
  3. Results (WebArena): The average success rate is 50.1% across GitLab, Map, Shopping, CMS, Reddit, and Multi, showing consistent gains over skill induction and search-based baselines.
  4. Efficiency and ablations: Using tools cuts steps by about 1.4x, with 21.3% fewer actions on average. Multimodal DOM parsing adds 2.6% in absolute success rate, and external verification adds 3.3%.

WALT is a useful pivot from step-sequence agents to tools grounded in site functionality. The framework reverse-engineers latent website functionality into reusable, callable tools spanning discovery, content management, and communication. By promoting UI traces to deterministic tools with schema validation and URL operations, WALT raises web agent success to 52.9% on VisualWebArena and 50.1% on WebArena, while cutting the number of actions by approximately 21.3%. The release ships with a CLI (walt discover, walt agent) and MCP serving for integration.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for the benefit of society. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that stands out for in-depth coverage of machine learning and deep learning news that is both technically sound and easy for a broad audience to understand. The platform has more than 2 million monthly views, illustrating its popularity among readers.
