LG AI Research Releases Nexus: An Advanced System Integrating Agent AI and Data Compliance Standards to Address Legal Concerns in AI Datasets

Since the emergence of LLMs, AI research has focused largely on developing ever more powerful models. These cutting-edge models improve the user experience across inference, content generation, and many other tasks. Lately, however, trust in their results, and in the underlying reasoning these models use, has come under scrutiny. When developing such models, the quality of the data, its compliance, and the associated legal risks have become key concerns, because a model's output depends on its underlying dataset.
LG AI Research, a pioneer in the field that previously launched the Exaone models, has developed an Agent AI to address these problems. The Agent AI tracks the life cycle of training datasets used in AI models, comprehensively analyzing legal risks and evaluating potential threats associated with each dataset. LG AI Research also introduced Nexus, through which users can directly explore the results generated by the Agent AI system.
LG AI Research focuses on the training data behind AI models. This matters because AI is rapidly expanding into many sectors, and the biggest concerns surrounding it are legal, security, and ethical. Through this study, LG AI Research found that AI training datasets are redistributed many times and are sometimes linked to hundreds of other datasets, making it practically impossible for humans to track their origins. That lack of transparency can create serious legal and compliance risks.
LG AI Research ensures data compliance through the Agent AI embedded in Nexus, which tracks the life cycles of complex datasets. The Agent AI automatically discovers and analyzes complex, layered dataset relationships. The team developed the Agent AI system using a comprehensive data compliance framework and its Exaone 3.5 model. The system consists of three core modules, each of which has been fine-tuned:
- Navigation Module: This module is extensively trained to navigate web documents and analyze AI-generated text data. It navigates based on an entity's name and type to find links to web pages or license files related to that entity.
- QA Module: This module is trained to take the collected documents as input and extract dependency and license information from them.
- Scoring Module: Finally, this module is trained on a refined dataset labeled by attorneys; it analyzes license details together with the entity's metadata to assess and quantify potential legal risks.
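The three-module pipeline above can be sketched as a simple chain: navigate to find documents, extract license and dependency facts, then score. Everything below is a hypothetical illustration; the class names, stub logic, licenses, and the `A-1`/`C-1` outputs are assumptions, not LG AI Research's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A dataset (or model) whose legal provenance we want to trace."""
    name: str
    entity_type: str  # e.g. "dataset" or "model"

@dataclass
class Evidence:
    """Documents gathered for an entity: license files, web pages, etc."""
    entity: Entity
    documents: list = field(default_factory=list)

def navigate(entity: Entity) -> Evidence:
    """Navigation module: locate license files / web pages for the entity.
    (Stub: a real system would crawl the web here.)"""
    return Evidence(entity, documents=[f"LICENSE for {entity.name}"])

def extract(evidence: Evidence) -> dict:
    """QA module: pull license terms and dependencies out of the documents.
    (Stub: a real system would run an LLM over the documents.)"""
    return {"license": "CC-BY-4.0", "dependencies": []}

def score(metadata: dict) -> str:
    """Scoring module: map extracted terms to a risk grade (illustrative)."""
    permissive = {"CC-BY-4.0", "MIT", "Apache-2.0"}
    return "A-1" if metadata["license"] in permissive else "C-1"

def assess(entity: Entity) -> str:
    """Run the three modules in sequence, as the article describes."""
    return score(extract(navigate(entity)))

print(assess(Entity("example-corpus", "dataset")))  # → A-1
```

The point of the sketch is the data flow, navigation output feeding QA, QA output feeding scoring, rather than any particular extraction or grading logic.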
With this design, the Agent AI completes evaluations roughly 45 times faster than human experts, at a cost roughly 700 times lower.
Other notable results: when 216 datasets randomly sampled from the top 1,000 most-downloaded datasets on Hugging Face were evaluated, the Agent AI detected dependencies with about 81.04% accuracy and identified license documents with 95.83% accuracy.

In this Agent AI, the legal risk assessment of a dataset is based on the data compliance framework developed by LG AI Research. The framework uses 18 key factors, including license terms, data modification rights, derivative-works permissions, potential copyright infringement in outputs, and privacy considerations. Each factor is weighted based on real-world disputes and case law to ensure a practical, reliable risk assessment. Compliance results are then mapped onto a seven-level risk rating system, in which A-1 is the highest rating and requires an explicit commercial-use license or public-domain status, together with consistent rights across all sub-datasets. A-2 through B-2 allow limited use, typically permitting research but restricting commercial use. C-1 and C-2 carry higher risk due to licensing, rights, or privacy issues.
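The weighted-factor idea can be illustrated in a few lines: score each compliance factor, combine the scores with per-factor weights, and bucket the total into the seven grades. The factor names, weights, and even bucket boundaries below are invented for illustration; the article does not disclose LG AI Research's actual 18 factors or their weights.

```python
# Hypothetical weights for five of the framework's 18 factors
# (0 = no risk, 1 = maximum risk for each factor score).
WEIGHTS = {
    "license_clarity": 0.3,
    "modification_rights": 0.2,
    "derivative_permissions": 0.2,
    "copyright_risk": 0.2,
    "privacy_risk": 0.1,
}

# Seven grades from safest to riskiest; intermediate names are assumed.
GRADES = ["A-1", "A-2", "A-3", "B-1", "B-2", "C-1", "C-2"]

def risk_grade(scores: dict) -> str:
    """Bucket a weighted risk score into one of the seven grades."""
    total = sum(WEIGHTS[f] * scores[f] for f in WEIGHTS)  # in [0.0, 1.0]
    # Evenly sized buckets, purely for illustration.
    idx = min(int(total * len(GRADES)), len(GRADES) - 1)
    return GRADES[idx]

# A dataset scoring low risk on every factor lands at A-1;
# one scoring high risk everywhere lands at C-2.
print(risk_grade({f: 0.05 for f in WEIGHTS}))  # → A-1
print(risk_grade({f: 0.95 for f in WEIGHTS}))  # → C-2
```

The design choice worth noting is that the weights encode legal judgment (the article says they are calibrated against real disputes and case law), while the bucketing into grades gives practitioners a simple, ordered vocabulary for risk.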
Nexus sets a new standard for legal stability in AI training datasets, but LG AI Research's vision still has a long way to go. Using Nexus, the team conducted an in-depth analysis of 3,612 major datasets and found that inconsistencies between datasets' rights relationships and their dependency relationships are far more common than expected. Many of these inconsistent datasets feed widely used AI models. For example, of 2,852 AI training datasets identified as commercially available, only 605 (21.21%) remained usable for commercial purposes once dependency risks were taken into account.
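The finding above, that far fewer datasets are commercially usable once dependencies are considered, follows from a simple rule: a dataset is only truly usable if every dataset it was derived from also permits the intended use. A minimal sketch, with an entirely made-up dependency graph:

```python
# Hypothetical dependency graph: each dataset maps to the datasets
# it was derived from.
DEPS = {
    "final-corpus": ["web-crawl", "qa-pairs"],
    "qa-pairs": ["forum-dump"],
    "web-crawl": [],
    "forum-dump": [],
}

# Whether each dataset's own license permits commercial use.
COMMERCIAL_OK = {
    "final-corpus": True,
    "web-crawl": True,
    "qa-pairs": True,
    "forum-dump": False,  # non-commercial license buried in the chain
}

def usable(name: str) -> bool:
    """Commercial use is allowed only if the dataset AND all of its
    transitive dependencies allow it."""
    return COMMERCIAL_OK[name] and all(usable(d) for d in DEPS[name])

print(usable("final-corpus"))  # → False: forum-dump blocks the whole chain
print(usable("web-crawl"))     # → True
```

A single restrictive license anywhere in the chain poisons every downstream dataset, which is why the article's 2,852 nominally commercial datasets collapse to 605 after dependency analysis.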
Recognizing these real-world problems, LG AI Research has set several future goals for advancing both AI technology and its legal environment. The first immediate goal is to expand the scope and depth of the datasets the Agent AI analyzes, aiming to understand the life cycle of all data worldwide while maintaining assessment quality throughout the expansion. Another goal is to evolve the data compliance framework into a global standard: LG AI Research plans to work with the global AI community and legal experts to develop it into an international standard. Finally, in the long run, LG AI Research plans to evolve Nexus into a comprehensive legal risk management system for AI developers, helping to create a secure, legal, data-compliant, and responsible AI ecosystem.
Sources:
- LG Agent AI Research Paper
- Nexus
- LG AI Research LinkedIn Page
- Exaone 3.5 Blog
Thanks to the LG AI Research team for the thought leadership and resources behind this article. The LG AI Research team supported this content.
The post LG AI Research Releases Nexus: An Advanced System Integrating Agent AI and Data Compliance Standards to Address Legal Concerns in AI Datasets appeared first on Marktechpost.