
AegisLLM: Advancing LLM security through an adaptive multi-agent system at inference time

The LLM threat landscape is growing

LLMs are a prime target for rapidly evolving attacks, including prompt injection, jailbreaking, and sensitive data extraction. Given the fluid nature of these threats, defense mechanisms must adapt beyond static safeguards. Current LLM security techniques suffer from their reliance on static, training-time interventions. Static filters and guardrails are brittle against even small adversarial perturbations, while training-time alignment fails to generalize to unseen attacks after deployment. Machine unlearning often fails to fully erase knowledge, so sensitive information can readily resurface. Existing efforts to scale security focus mainly on training-time methods, with limited exploration of test-time and system-level security.

Why existing LLM security methods fall short

RLHF and safety fine-tuning attempt to align models during training, but offer limited effectiveness against novel post-deployment attacks. System-level guardrails and red-teaming strategies provide additional layers of protection, yet prove brittle against adversarial perturbations. Unlearning unsafe behavior shows promise but cannot achieve complete suppression of knowledge. Multi-agent architectures have proven effective at distributing complex tasks, but their direct application to LLM security remains underexplored. Prompt-optimization methods such as TextGrad and OPTO use structured feedback for iterative refinement, and DSPy facilitates prompt optimization for multi-stage pipelines. However, none of these have been systematically applied to inference-time security.

AegisLLM: An adaptive inference-time security framework

Researchers from the University of Maryland, Lawrence Livermore National Laboratory, and Capital One have proposed AegisLLM (Adaptive Agentic Guardrails for LLM Security), a framework for improving LLM security through a cooperative, inference-time multi-agent system. It employs a structured system of LLM-powered autonomous agents that continuously monitor, analyze, and mitigate adversarial threats. The key components of AegisLLM are the orchestrator, the deflector, the responder, and the evaluator. Through automated prompt optimization and Bayesian learning, the system refines its defense capabilities without model retraining. This architecture allows real-time adaptation to evolving attack strategies, providing scalable inference-time security while preserving the model's utility.
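The four-agent division of labor described above can be sketched as a simple routing pipeline. This is an illustrative toy, not the authors' implementation: the agent roles come from the article, while `call_llm` is a keyword-based stand-in for a real model call, and the routing conditions are assumptions for demonstration.

```python
def call_llm(system_prompt: str, user_input: str) -> str:
    # Toy stand-in for a real LLM API call: routes on keywords in the
    # system prompt. A real deployment would query an actual model here.
    sp = system_prompt.lower()
    if "classify" in sp:
        return "unsafe" if "bioweapon" in user_input else "safe"
    if "refuse" in sp:
        return "I can't help with that."
    if "verify" in sp:
        return "pass"
    return f"Answer to: {user_input}"

class Agent:
    """One specialized agent, governed by its own system prompt."""
    def __init__(self, system_prompt: str):
        self.system_prompt = system_prompt  # optimized per agent

    def run(self, user_input: str) -> str:
        return call_llm(self.system_prompt, user_input)

class AegisPipeline:
    """Routes queries: orchestrator -> deflector or responder -> evaluator."""
    def __init__(self, orchestrator: Agent, deflector: Agent,
                 responder: Agent, evaluator: Agent):
        self.orchestrator = orchestrator
        self.deflector = deflector
        self.responder = responder
        self.evaluator = evaluator

    def answer(self, query: str) -> str:
        # Orchestrator decides whether the query touches restricted topics.
        verdict = self.orchestrator.run(query)
        if "unsafe" in verdict.lower():
            return self.deflector.run(query)   # refusal path
        draft = self.responder.run(query)      # normal answer path
        # Evaluator checks the draft before it is released to the user.
        check = self.evaluator.run(f"Query: {query}\nAnswer: {draft}")
        return draft if "pass" in check.lower() else self.deflector.run(query)

pipeline = AegisPipeline(
    Agent("Classify the query as safe or unsafe."),
    Agent("Politely refuse restricted queries."),
    Agent("Helpfully respond to the user."),
    Agent("Verify the draft answer is safe to release."),
)
```

Because every stage is just another prompted LLM call, the same optimization machinery can tune each agent's system prompt independently, which is what enables adaptation without retraining the underlying model.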

Coordinated agent pipeline and prompt optimization

AegisLLM operates through a coordinated pipeline of specialized agents, each responsible for a distinct function while working together to ensure secure outputs. Each agent is governed by a carefully designed system prompt that encodes its specialized role and behavior, along with the user's input. However, manually crafted prompts typically fall short of optimal performance in high-stakes security scenarios. The system therefore automatically optimizes each agent's system prompt to maximize effectiveness through an iterative optimization process: at each iteration, it samples a batch of queries and evaluates them under candidate prompt configurations for that agent.
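The iterative loop above (sample a batch of queries, score each candidate system prompt on it, keep the best) can be sketched as follows. This is a minimal illustration under stated assumptions: `candidates` and `score_fn` are hypothetical placeholders, and the paper's actual optimizer, including its Bayesian learning component, is more elaborate.

```python
import random

def optimize_prompt(candidates, queries, score_fn,
                    batch_size=8, iterations=5, seed=0):
    """Pick the candidate system prompt that scores best on sampled batches.

    candidates: list of candidate system-prompt strings for one agent.
    queries:    pool of evaluation queries.
    score_fn:   callable (prompt, query) -> float; higher is better.
    """
    rng = random.Random(seed)
    best_prompt, best_score = candidates[0], float("-inf")
    for _ in range(iterations):
        # Sample a batch of queries for this optimization step.
        batch = rng.sample(queries, min(batch_size, len(queries)))
        for prompt in candidates:
            # Mean score of this candidate prompt on the sampled batch.
            score = sum(score_fn(prompt, q) for q in batch) / len(batch)
            if score > best_score:
                best_prompt, best_score = prompt, score
    return best_prompt
```

In practice `score_fn` would run the full agent pipeline with the candidate prompt and measure a security metric (e.g. refusal of unsafe queries without over-refusing benign ones); running it per agent tunes each role's prompt independently.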

Benchmarking AegisLLM: Defense on WMDP, TOFU, and jailbreaks

On the WMDP benchmark with Llama-3-8B, AegisLLM achieved the lowest accuracy on restricted topics of all methods, with WMDP-Cyber and WMDP-Bio accuracy approaching the theoretical minimum of 25%. On the TOFU benchmark, it achieved near-perfect flagging accuracy across the Llama-3-8B, Qwen2.5-72B, and DeepSeek-R1 models, reaching close to 100% on all subsets. In jailbreak defense, the results show strong performance against attack attempts while maintaining appropriate responses to legitimate queries on StrongREJECT and PHTest. AegisLLM achieved a StrongREJECT score of 0.038, competitive with state-of-the-art approaches, and an 88.5% compliance rate without requiring extensive training, improving defense capabilities.

Conclusion: Reframing LLM security as agentic inference-time coordination

In summary, the researchers introduced AegisLLM, a framework that reframes LLM security as a dynamic multi-agent system operating at inference time. AegisLLM's success highlights that security should be treated as an emergent behavior of coordinated, specialized agents rather than a static characteristic of the model itself. This shift from static, training-time interventions to adaptive, inference-time defense addresses the limitations of current approaches while providing real-time adaptability to evolving threats. As language models continue to grow in capability, dynamic, scalable security frameworks like AegisLLM will become increasingly important for responsible AI deployment.


Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project.



Sajjad Ansari is a final year undergraduate student from IIT Kharagpur. As a technology enthusiast, he delves into the practical application of AI, focusing on understanding AI technology and its real-world impact. He aims to express complex AI concepts in a clear and easy way.