
Microsoft Research Introduces Data Formulator: An AI Application That Uses LLMs to Transform Data and Create Rich Visualizations


Most modern visualization authoring tools, such as Charticulator, Data Illustrator, and Lyra, and libraries such as ggplot2 and Vega-Lite, expect tidy data, in which each variable to be visualized is a column and each observation is a row. When the input data is already tidy, the author simply binds data columns to visual channels. Otherwise, the author must first reshape the data, even if the original data is clean and contains all the necessary information. To do so, users must transform their data with a dedicated library (such as tidyverse or pandas) or a separate tool such as Wrangler before they can create a visualization. This requirement presents two main challenges: the need for programming expertise or familiarity with specialized tools, and an inefficient workflow that constantly switches between data transformation and visualization.
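To make the one-variable-per-column, one-observation-per-row requirement concrete, here is a minimal sketch in plain Python. The table, the column names, and the `to_tidy` helper are illustrative assumptions, not taken from the paper; in practice a library such as pandas would normally perform this reshaping.

```python
# Hypothetical wide table: one row per city, one column per month.
wide_rows = [
    {"city": "Seattle", "jan": 5.0, "feb": 6.5},
    {"city": "Austin", "jan": 11.0, "feb": 13.5},
]


def to_tidy(rows, id_column, var_name, value_name):
    """Reshape wide rows into long rows: one observation per row."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            tidy.append(
                {id_column: row[id_column], var_name: column, value_name: value}
            )
    return tidy


tidy_rows = to_tidy(wide_rows, "city", "month", "temperature")
for row in tidy_rows:
    print(row)
```

After reshaping, `month` and `temperature` are ordinary columns, so each can be bound directly to a visual channel such as the x-axis or y-axis.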

Various approaches have emerged to simplify visualization authoring, starting with the Grammar of Graphics, which maps data to visual elements. High-level grammar-based tools such as ggplot2, Vega-Lite, and Altair have gained popularity for their concise syntax and their abstraction of complex implementation details. More advanced approaches include visualization-by-demonstration tools such as Lyra 2 and VbD, which let users specify visualizations through direct manipulation. Natural language interfaces such as ncNet and VisQA have also been developed to make visualization authoring more intuitive. However, these solutions either require tidy data input or, as Falx does, introduce new complexity by focusing on low-level specifications.

Researchers at Microsoft Research have proposed Data Formulator, a visualization authoring tool built around a new paradigm called concept binding. It lets users express their visualization intent by binding data concepts to visual channels, where a data concept can either come from an existing column or be created on demand. The tool supports two ways to create new concepts: deriving data from natural language prompts and reshaping data by example. Once the user selects a chart type and maps the desired concepts, Data Formulator's AI backend infers the necessary data transformations and generates candidate visualizations. The system provides explanatory feedback for multiple candidates, allowing users to inspect, refine, and iterate on their visualizations through an intuitive interface.

Data Formulator's architecture treats data concepts as first-class objects that serve as abstractions of both existing and potential future table columns. This design differs fundamentally from traditional approaches in that it focuses on concept-level transformations rather than table-level operators, which makes it more intuitive for users to communicate with the AI agent and verify its results. The tool's natural language component leverages LLMs' ability to understand high-level intent and natural concepts, while the programming-by-example component provides precise, unambiguous reshaping operations through demonstration. This hybrid architecture lets users work with a familiar shelf-configuration interface while gaining access to powerful transformation capabilities.
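One way to picture a data concept as a first-class object is the sketch below. It is not the tool's actual implementation: the `DataConcept` class, its fields, and the `concepts_to_derive` helper are hypothetical names introduced here only to illustrate the idea that a concept may refer to an existing column or to a column that does not exist yet.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataConcept:
    """Hypothetical concept: an existing column or one still to be derived."""
    name: str
    source_column: Optional[str] = None  # set when the concept is an existing column
    description: Optional[str] = None    # natural language prompt for a derived concept


def concepts_to_derive(encoding: Dict[str, DataConcept], columns: List[str]) -> List[str]:
    """Return the names of bound concepts that are not yet columns of the table."""
    return [
        concept.name
        for concept in encoding.values()
        if concept.source_column not in columns
    ]


# Bind concepts to visual channels; only "date" already exists in the table.
columns = ["date", "new_cases"]
encoding = {
    "x": DataConcept("date", source_column="date"),
    "y": DataConcept("weekly_average", description="7-day moving average of new cases"),
}
print(concepts_to_derive(encoding, columns))
```

In this sketch, the chart encoding is expressed entirely in terms of concepts, and the list returned by `concepts_to_derive` is the part a backend would still have to compute before the chart can be rendered.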

An evaluation of Data Formulator through user testing showed promising results for task completion and usability. Participants completed all assigned visualization tasks in an average of 20 minutes, with Task 6 taking a particularly long time because of its complexity, which involved a 7-day moving average calculation. Although some participants needed occasional hints about concept type selection and data type management, the system's dual-interaction approach proved effective. For derived concepts, users averaged 1.62 prompting attempts with relatively concise descriptions (7.28 words on average), and the system generated about 1.94 candidates per prompt. Most of the challenges encountered were minor and related to unfamiliarity with the interface rather than fundamental usability issues.
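For readers unfamiliar with the transformation behind Task 6, the sketch below computes a 7-day moving average in plain Python. The article does not say whether the study used a trailing or a centered window, so the trailing window here is an assumption, and the `moving_average` function and sample numbers are illustrative only.

```python
def moving_average(values, window=7):
    """Trailing moving average; the first window - 1 positions are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just slid out of the window.
            running_sum -= values[i - window]
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


daily_counts = [10, 12, 14, 16, 18, 20, 22, 24]
print(moving_average(daily_counts))
# The first six entries are None; the last two are 16.0 and 18.0.
```

A derived column like this is the kind of output a user would otherwise have to produce in a separate data transformation step before charting.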

In summary, the team introduced Data Formulator, which represents a significant advance in visualization authoring and addresses the persistent challenge of data transformation through its concept-driven approach. Its combination of AI assistance and user interaction enables authors to create complex visualizations without handling data transformations directly. The user study validated the tool's effectiveness, showing that even users facing complex data transformation requirements could create the visualizations they wanted. Going forward, this concept-driven approach shows promise for influencing the next generation of visual data exploration and authoring tools, potentially removing the long-standing barrier that data transformation poses to visualization authoring.


Check out the paper and GitHub page. All credit for this research goes to the researchers of this project.


The post Microsoft Research Introduces Data Formulator: An AI Application That Uses LLMs to Transform Data and Create Rich Visualizations appeared first on Marktechpost.
