Researchers at UCLA, UC Merced and Adobe propose METAL: a multi-agent framework that divides chart generation into an iterative collaboration among specialized agents

In today's world of data visualization, creating charts that accurately reflect complex data remains a nuanced challenge. The task involves not only capturing precise layouts, colors, and text placement, but also translating these visual details into code that reproduces the intended design. Traditional methods that directly prompt vision-language models (VLMs) such as GPT-4V often struggle to convert complex visual elements into syntactically correct Python code. The process demands both a keen sense of visual design and careful coding, and in both areas even small discrepancies can produce charts that miss their design goals. These challenges matter most in fields such as financial analysis, academic research, and educational reporting, where clear and accurate data representation is critical.
METAL: A thoughtful multi-agent framework
Researchers from UCLA, UC Merced and Adobe Research have proposed a new framework called METAL. The system divides chart generation into a series of focused steps, each handled by a specialized agent. METAL includes four key agents: the generation agent, which produces the initial Python code; the visual critique agent, which evaluates the rendered chart against the reference; the code critique agent, which reviews the underlying code; and the revision agent, which refines the code based on the feedback it receives. By assigning these roles to separate agents, METAL takes a more deliberate, iterative approach to chart creation. This structure helps ensure that both the visual and the technical elements of the chart are carefully examined and adjusted, so the final output more faithfully reflects the original reference.

Technical insights and practical benefits
One of METAL's notable features is its modular design. Rather than expecting a single model to handle visual interpretation and code generation at once, the framework distributes these responsibilities among dedicated agents. The generation agent first converts the visual information into preliminary Python code. The visual critique agent then scrutinizes the rendered chart and identifies discrepancies in design elements such as layout or color fidelity. Meanwhile, the code critique agent checks the generated code for syntax errors or logic problems that could undermine the chart's accuracy. Finally, the revision agent considers the feedback from both critique agents and adjusts the code accordingly.
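To make the division of labor concrete, here is a minimal Python sketch of a generate, critique, and revise loop. It is not METAL's implementation: every agent below is a hypothetical deterministic placeholder for what would be a VLM or LLM call, a "chart" is reduced to a small dict of visual attributes, and stopping once neither critic reports an issue is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a rendered chart: visual attribute name -> value.
Spec = Dict[str, str]


def render(code: str) -> Spec:
    """Hypothetical renderer: executes the draft and reads the CHART dict it defines.

    A real system would run plotting code and produce an image instead.
    Never exec untrusted model output outside a sandbox.
    """
    namespace: dict = {}
    exec(code, namespace)
    return dict(namespace["CHART"])


def generation_agent(reference: Spec) -> str:
    # Placeholder for a VLM call: this first draft gets the chart type right
    # but misses the color and the title.
    return f"CHART = {{'type': {reference['type']!r}, 'color': 'gray'}}"


def code_critic(code: str) -> List[str]:
    # Placeholder critic: only flags code that does not compile.
    try:
        compile(code, "<chart>", "exec")
    except SyntaxError as err:
        return [f"syntax error: {err.msg}"]
    return []


def visual_critic(rendered: Spec, reference: Spec) -> Spec:
    # Placeholder critic: returns every attribute that differs from the reference.
    return {k: v for k, v in reference.items() if rendered.get(k) != v}


def revision_agent(code: str, visual_fb: Spec, code_fb: List[str]) -> str:
    # Placeholder for an LLM edit step. This toy version rebuilds the draft:
    # it keeps a compilable draft's attributes, discards a broken one,
    # and applies the visual fixes on top.
    spec = {} if code_fb else render(code)
    spec.update(visual_fb)
    return f"CHART = {spec!r}"


def refine(reference: Spec, max_rounds: int = 3) -> Tuple[str, int]:
    """Generate, critique, and revise; returns (final code, revisions made)."""
    code = generation_agent(reference)
    for round_idx in range(max_rounds):
        code_fb = code_critic(code)
        # Only render code that compiles; otherwise there is nothing to look at.
        visual_fb = {} if code_fb else visual_critic(render(code), reference)
        if not code_fb and not visual_fb:
            return code, round_idx  # both critics are satisfied
        code = revision_agent(code, visual_fb, code_fb)
    return code, max_rounds  # budget exhausted; last revision is unchecked


if __name__ == "__main__":
    reference = {"type": "bar", "color": "tab:blue", "title": "Revenue"}
    final_code, revisions = refine(reference)
    print(revisions, final_code)
```

Keeping the two critics as separate functions means a syntax problem and a visual mismatch arrive as distinct, labeled feedback, which is the property the modular design is meant to preserve.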
Another notable aspect of METAL is its test-time scaling behavior. The framework's performance is observed to improve near-linearly with the logarithm of the computational budget as that budget grows from 512 to 8192 tokens. In practice, this means the framework can produce more refined output when additional compute is available at inference time. By iteratively refining the code and the chart, METAL reaches higher accuracy without sacrificing clarity or detail.
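The scaling claim is easiest to read as a log-linear trend: if the score is roughly linear in log2(budget), each doubling of the token budget buys about the same gain, and 512 to 8192 tokens spans four doublings. The Python sketch below illustrates that reading with made-up coefficients, plus a plain least-squares fit one could use to check the trend on one's own measurements. The coefficients, function names, and the choice of log base 2 are assumptions, not figures from the paper.

```python
import math
from typing import List, Sequence, Tuple


def doubling_budgets(start: int = 512, stop: int = 8192) -> List[int]:
    """Token budgets evenly spaced on a log2 scale: 512, 1024, ..., 8192."""
    budgets = []
    budget = start
    while budget <= stop:
        budgets.append(budget)
        budget *= 2
    return budgets


def log_linear_score(budget: int, intercept: float, slope: float) -> float:
    """Hypothetical trend: score grows linearly in log2(budget)."""
    return intercept + slope * math.log2(budget)


def fit_log_linear(points: Sequence[Tuple[int, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of score = intercept + slope * log2(budget)."""
    xs = [math.log2(b) for b, _ in points]
    ys = [s for _, s in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    # Illustrative coefficients only; they are not results from the paper.
    budgets = doubling_budgets()
    scores = [log_linear_score(b, intercept=0.40, slope=0.02) for b in budgets]
    # Under a log-linear trend, every doubling of the budget adds the same gain.
    gains = [later - earlier for earlier, later in zip(scores, scores[1:])]
    print(budgets, [round(g, 4) for g in gains])
    print(fit_log_linear(list(zip(budgets, scores))))
```

On real measurements the fitted slope estimates the gain per doubling, and large residuals from the fitted line would indicate the trend is not actually log-linear over that range.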

Experimental insights and evaluation results
METAL was evaluated on the ChartMIMIC dataset, which contains carefully curated charts paired with their corresponding generation instructions. The evaluation focuses on key aspects such as text clarity, chart type accuracy, color consistency, and layout precision. Compared with more traditional methods such as direct prompting and enhanced prompting, METAL shows improvements in replicating reference charts. For example, with an open-source model such as Llama 3.2-11B, METAL's outputs were closer to the reference charts than those produced by traditional methods. Similar patterns were observed with closed-source models such as GPT-4o, where incremental refinement produced output that was both accurate and visually consistent.
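Per-aspect evaluation of this kind can be sketched as comparing attributes extracted from the generated and reference charts. The Python below is a simplified, hypothetical scorer, not the benchmark's official metric: the `ChartAttributes` fields, the set-based F1 for text and colors, and the exact-match checks for chart type and layout are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class ChartAttributes:
    """Hypothetical attributes extracted from a chart or its plotting code."""
    texts: FrozenSet[str]     # titles, axis labels, tick labels, legend entries
    chart_type: str           # e.g. "bar", "line"
    colors: FrozenSet[str]    # hex colors used by the plotted elements
    layout: Tuple[int, int]   # subplot grid as (rows, cols)


def f1(pred: FrozenSet[str], gold: FrozenSet[str]) -> float:
    """Set-based F1 between predicted and reference items."""
    if not pred and not gold:
        return 1.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def aspect_scores(generated: ChartAttributes, reference: ChartAttributes) -> Dict[str, float]:
    """Score each aspect separately, each in [0, 1]."""
    return {
        "text": f1(generated.texts, reference.texts),
        "chart_type": float(generated.chart_type == reference.chart_type),
        "color": f1(generated.colors, reference.colors),
        "layout": float(generated.layout == reference.layout),
    }


if __name__ == "__main__":
    reference = ChartAttributes(
        texts=frozenset({"Revenue", "2023", "2024"}),
        chart_type="bar",
        colors=frozenset({"#1f77b4"}),
        layout=(1, 1),
    )
    generated = ChartAttributes(
        texts=frozenset({"Revenue", "2023"}),   # one label missing
        chart_type="bar",
        colors=frozenset({"#1f77b4", "#ff7f0e"}),  # one extra color
        layout=(1, 1),
    )
    print(aspect_scores(generated, reference))
```

Reporting the aspects separately shows where a chart falls short: in the usage example the type and layout match exactly, while the missing label and the extra color lower only the text and color scores.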
Ablation studies further highlight the importance of keeping separate critique mechanisms for the visual and code aspects. Performance tends to decline when these components are merged into a single critique agent. This suggests that a tailored approach, in which the nuances of visual design and code correctness are addressed separately, plays a key role in high-quality chart generation.

Conclusion: A measured approach to enhanced chart generation
In summary, METAL offers a balanced, multi-agent approach to the challenges of chart generation by breaking the task into specialized, iterative steps. Instead of relying on a single model to manage both the artistic and the technical dimensions of the task, METAL distributes the workload among agents dedicated to generation, visual critique, code critique, and revision. This approach not only helps translate visual designs into Python code more carefully, but also enables a systematic process for detecting and correcting errors.
Furthermore, thanks to its near-linear scaling with additional tokens, the framework shows promise in settings where accuracy is critical and extra compute can be allocated. Although there is still room for optimization, especially in reducing computational overhead and further refining prompt engineering, METAL represents a thoughtful step forward. Its emphasis on a measured, iterative improvement process makes it a promising tool for applications where reliable chart generation is critical.
This post first appeared on Marktechpost.