Meet LocAgent: A Graph-Based AI Agent Transforming Code Localization for Scalable Software Maintenance
Software maintenance is an integral part of the software development lifecycle, and developers frequently revisit existing codebases to fix bugs, implement new features, and optimize performance. A key task at this stage is code localization: identifying the specific locations in the codebase that must be modified. This task has become increasingly important as modern software projects grow in scale and complexity. The growing reliance on automation and AI-driven tooling has led to the integration of large language models (LLMs) into supporting tasks such as bug detection, code search, and code suggestion. However, despite the advances LLMs have made on language tasks, enabling these models to understand the semantics and structure of complex codebases remains a technical challenge that researchers continue to work on.
One of the most persistent problems in software maintenance is accurately identifying the parts of the codebase that need to change in response to user-reported issues or feature requests. These descriptions are typically written in natural language and mention symptoms rather than the actual root cause in the code. This disconnect makes it difficult for developers and automated tools to link descriptions to the exact code elements that need to be updated. Furthermore, traditional methods struggle with complex code dependencies, especially when the relevant code spans multiple files or requires hierarchical reasoning. Poor code localization leads to inefficient bug resolution, incomplete patches, and longer development cycles.
Previous methods of code localization rely mainly on dense retrieval models or agent-based approaches. Dense retrieval requires embedding the entire codebase into a searchable vector space, which is difficult to maintain and update for large repositories. These systems often perform poorly when the issue description lacks direct references to the relevant code. Some recent approaches instead use agent-based models that imitate how a human would explore a codebase. However, they usually rely on directory traversal and lack an understanding of deeper semantic connections such as inheritance or function invocation. This limits their ability to handle complex relationships between code elements that are not explicitly linked.
Researchers from Yale University, USC, Stanford University, and All Hands AI developed LocAgent, a graph-guided agent framework for code localization. Instead of relying on lexical matching or static embeddings, LocAgent converts the entire codebase into a directed heterogeneous graph. The graph includes nodes for directories, files, classes, and functions, and edges that capture relationships such as function invocation, file imports, and class inheritance. This structure allows the agent to reason across multiple levels of code abstraction. The system then exposes tools for entity search, graph traversal, and code retrieval, allowing LLMs to explore the codebase step by step. Sparse hierarchical indexing ensures quick access to entities, and the graph design supports multi-hop traversal, which is essential for finding connections between distant parts of the codebase.
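To make the graph construction concrete, here is a minimal sketch of how a repository could be indexed into such a heterogeneous graph using Python's ast module and networkx. The node types and edge labels (contains, imports, inherits) are illustrative assumptions about the schema, not LocAgent's actual implementation.

```python
# Minimal sketch: index a Python repository into a heterogeneous code graph.
# Node types (directory/file/class/function) and edge labels
# (contains/imports/inherits) are illustrative assumptions, not LocAgent's schema.
import ast
import os
import networkx as nx

def build_code_graph(repo_root: str) -> nx.MultiDiGraph:
    graph = nx.MultiDiGraph()
    for dirpath, _, filenames in os.walk(repo_root):
        graph.add_node(dirpath, type="directory")
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            graph.add_node(path, type="file")
            graph.add_edge(dirpath, path, relation="contains")
            try:
                tree = ast.parse(open(path, encoding="utf-8").read())
            except (SyntaxError, UnicodeDecodeError):
                continue
            for node in ast.walk(tree):
                if isinstance(node, ast.ClassDef):
                    qualified = f"{path}::{node.name}"
                    graph.add_node(qualified, type="class")
                    graph.add_edge(path, qualified, relation="contains")
                    # Inheritance edges point to the base-class name;
                    # resolving it to a fully qualified node is omitted here.
                    for base in node.bases:
                        if isinstance(base, ast.Name):
                            graph.add_edge(qualified, base.id, relation="inherits")
                elif isinstance(node, ast.FunctionDef):
                    qualified = f"{path}::{node.name}"
                    graph.add_node(qualified, type="function")
                    graph.add_edge(path, qualified, relation="contains")
                elif isinstance(node, (ast.Import, ast.ImportFrom)):
                    module = getattr(node, "module", None) or node.names[0].name
                    graph.add_edge(path, module, relation="imports")
    return graph
```

Once such a graph exists, multi-hop questions ("which functions call code that imports this module?") reduce to walking typed edges rather than re-embedding or re-scanning the repository.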
LocAgent indexes a codebase in seconds and supports real-time use, making it practical for developers and organizations. The researchers fine-tuned two open-source models, Qwen2.5-7B and Qwen2.5-32B, on a curated set of successful localization trajectories, and these models perform well on standard benchmarks. For example, on the SWE-Bench-Lite dataset, LocAgent achieved 92.7% file-level accuracy with Qwen2.5-32B, compared with 86.13% for Claude-3.5, while other models scored lower. On the newly introduced Loc-Bench dataset, which contains 660 examples spanning bug reports (282), feature requests (203), security issues (31), and performance issues (144), LocAgent again showed competitive results, achieving 84.59% Acc@5 and 87.06% Acc@10 at the file level. Even the smaller Qwen2.5-7B model delivers performance close to high-cost proprietary models at only about $0.05 per example, in stark contrast to roughly $0.66 for Claude-3.5.
The core mechanism relies on a detailed graph-based indexing process. Each node, whether it represents a class or a function, is uniquely identified by its fully qualified name and indexed with BM25 for flexible keyword search. This enables the agent to emulate a chain of reasoning that begins by extracting keywords related to the issue, proceeds through graph traversal, and concludes by retrieving the code for specific nodes. These actions are scored with a confidence estimation method based on prediction consistency across multiple iterations. Notably, when the researchers disabled tools such as TraverseGraph or SearchEntity, performance dropped by up to 18%, underscoring their importance. Multi-hop reasoning is also crucial: restricting traversal to a single hop reduced function-level accuracy from 71.53% to 66.79%.
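For intuition about the keyword-search step, the sketch below indexes fully qualified entity names with BM25 (via the rank_bm25 package) and ranks them against an issue description. The tokenization, the example entity names, and the top-k selection are assumptions for illustration, not LocAgent's actual indexing code.

```python
# Minimal sketch of BM25 keyword search over fully qualified entity names.
# Uses the rank_bm25 package; tokenization and top-k selection are
# illustrative assumptions, not LocAgent's implementation.
import re
from rank_bm25 import BM25Okapi

def tokenize(text: str) -> list[str]:
    # Split "pkg/module.py::ClassName.method_name" into lowercase word pieces,
    # also breaking CamelCase identifiers into separate tokens.
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", text):
        tokens.extend(re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", part))
    return [t.lower() for t in tokens if t]

def search_entities(entity_names: list[str], issue_text: str, k: int = 5) -> list[str]:
    corpus = [tokenize(name) for name in entity_names]
    bm25 = BM25Okapi(corpus)
    scores = bm25.get_scores(tokenize(issue_text))
    ranked = sorted(zip(entity_names, scores), key=lambda x: x[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical example: keywords from an issue description rank candidate entities.
entities = [
    "src/auth/session.py::SessionManager.refresh_token",
    "src/auth/login.py::LoginHandler.authenticate",
    "src/utils/cache.py::LRUCache.evict",
]
print(search_entities(entities, "token refresh fails after session expires"))
```

In the full system, the top-ranked entities would seed the graph traversal step rather than being returned directly as the final answer.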
When applied to downstream tasks such as GitHub issue resolution, LocAgent raises the resolution rate (Pass@10) from 33.58% with an agentless baseline to 37.59% with the fine-tuned Qwen2.5-32B model. The modular, open-source nature of the framework makes it a compelling option for organizations seeking an in-house alternative to commercial LLMs. The introduction of Loc-Bench, with its broader coverage of maintenance tasks, helps ensure fair evaluation free from contamination by pre-training data.
Some key takeaways from the research on LocAgent include the following:
- LocAgent converts the codebase into a directed heterogeneous graph that supports multi-level code reasoning.
- Using Qwen2.5-32B, it achieves up to 92.7% file-level accuracy on SWE-Bench-Lite.
- Compared to proprietary models, code localization costs are reduced by roughly 86%.
- The Loc-Bench dataset was introduced, with 660 examples: 282 bug reports, 203 feature requests, 31 security issues, and 144 performance issues.
- The fine-tuned models (Qwen2.5-7B, Qwen2.5-32B) perform comparably to Claude-3.5.
- Tools such as TraverseGraph and SearchEntity proved essential, with accuracy dropping when they were disabled.
- Real-world utility is demonstrated by the improved GitHub issue resolution rate.
- It provides a scalable, cost-effective alternative to proprietary LLM solutions.
Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project.