Qualifire AI Open-Sources Rogue: An End-to-End Agentic AI Testing Framework Designed to Evaluate the Performance, Compliance, and Reliability of AI Agents
Agentic systems are stochastic, context-sensitive, and policy-bound. Traditional QA (unit tests, static prompts, or scalar “LLM-as-a-judge” scores) fails to expose multi-turn vulnerabilities and provides only a weak audit trail. Developer teams need protocol-accurate conversations, explicit policy checks, and machine-readable evidence that can gate releases with confidence.
Qualifire AI has open-sourced Rogue, a Python framework that evaluates AI agents over the Agent-to-Agent (A2A) protocol. Rogue converts business policies into executable scenarios, drives multi-turn interactions against a target agent, and outputs deterministic reports suitable for CI/CD and compliance reviews.
Quick start
Prerequisites
- uvx – If not installed, follow the uv installation guide
- Python 3.10+
- API key for LLM provider (e.g. OpenAI, Google, Anthropic).
Install
Option 1: Quick installation (recommended)
Get up and running quickly with our automated installation script:
# TUI
uvx rogue-ai
# Web UI
uvx rogue-ai ui
# CLI / CI/CD
uvx rogue-ai cli
Option 2: Manual installation
(a) Clone the repository:
git clone
cd rogue
(b) Install dependencies:
If you use uv:
Or if you use pip:
(c) Optional: Set environment variables. Create a .env file in the repository root and add your API keys. Rogue uses LiteLLM, so you can set keys for a variety of providers.
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-..."
GOOGLE_API_KEY="..."
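Before launching an evaluation, it can help to confirm which provider keys are actually visible to the process. The following is a minimal sketch and not part of Rogue: it parses a simple KEY="value" file using only the standard library and reports which of the three variable names shown above have a value. The helper names parse_env_text and configured_providers are hypothetical, and the rule that real environment variables override the .env text is an assumption of this sketch.

```python
import os

# Variable names taken from the .env example above.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def parse_env_text(text):
    """Parse simple KEY=value lines; skips blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_providers(env_text="", environ=None):
    """Return the provider key names that have a non-empty value.

    Assumption: real environment variables override the .env text.
    """
    environ = os.environ if environ is None else environ
    merged = {**parse_env_text(env_text), **dict(environ)}
    return [name for name in PROVIDER_KEYS if merged.get(name)]


if __name__ == "__main__":
    sample = 'OPENAI_API_KEY="sk-..."\nGOOGLE_API_KEY=""\n'
    print(configured_providers(sample, environ={}))
```

An empty value counts as "not configured", so a placeholder line left blank in .env does not mask a missing key.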
Running Rogue
Rogue runs on a client-server architecture, where the core evaluation logic runs in a backend server and various clients connect to it through different interfaces.
Default behavior
When you run uvx rogue-ai without specifying a mode, it:
- Starts the Rogue server in the background
- Launches the TUI (Terminal User Interface) client
Available modes
- Default (server + TUI): uvx rogue-ai – starts the server in the background and launches the TUI client
- Server: uvx rogue-ai server – runs only the backend server
- TUI: uvx rogue-ai tui – runs only the TUI client (requires a running server)
- Web UI: uvx rogue-ai ui – runs only the Gradio web interface client (requires a running server)
- CLI: uvx rogue-ai cli – runs non-interactive command-line evaluations (requires a running server; well suited to CI/CD)
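The mode list above maps cleanly onto a small helper for scripts that launch Rogue. This is a hedged sketch: the function names rogue_command and describe are hypothetical, and the only facts taken from the text are the mode names and which modes require a running server.

```python
import shlex

# Mode names as listed above; None means the default (server + TUI).
MODES = {None, "server", "tui", "ui", "cli"}
# Modes the text says need an already-running server.
NEEDS_SERVER = {"tui", "ui", "cli"}


def rogue_command(mode=None, extra_args=()):
    """Build the argv list for `uvx rogue-ai [MODE] [ARGS...]`."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    argv = ["uvx", "rogue-ai"]
    if mode is not None:
        argv.append(mode)
    argv.extend(extra_args)
    return argv


def describe(mode=None):
    """Return the shell-quoted command and whether a server must be running."""
    return shlex.join(rogue_command(mode)), mode in NEEDS_SERVER


if __name__ == "__main__":
    for mode in (None, "server", "tui", "ui", "cli"):
        command, needs_server = describe(mode)
        print(f"{command:<24} requires running server: {needs_server}")
```

The list returned by rogue_command can be passed directly to subprocess.run, which avoids shell quoting issues in CI scripts.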
Mode parameters
Server mode
uvx rogue-ai server [OPTIONS]
Options:
- --host HOST – Host to run the server on (default: 127.0.0.1 or the HOST env var)
- --port PORT – Port to run the server on (default: 8000 or the PORT env var)
- --debug – Enable debug logging
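The defaults above imply a lookup order for the server address. The sketch below illustrates one plausible order (command-line flag, then environment variable, then built-in default) using argparse. It is not Rogue's actual argument parser; the function name resolve_server_settings and the exact precedence are assumptions, and only the flag names, env var names, and default values come from the option list.

```python
import argparse
import os


def resolve_server_settings(argv=(), environ=None):
    """Resolve (host, port, debug) as: flag > env var > built-in default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="rogue-ai server")
    # Env vars supply the defaults; explicit flags override them.
    parser.add_argument("--host", default=environ.get("HOST", "127.0.0.1"))
    parser.add_argument("--port", type=int,
                        default=int(environ.get("PORT", "8000")))
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args(list(argv))
    return args.host, args.port, args.debug


if __name__ == "__main__":
    print(resolve_server_settings([], environ={}))
    print(resolve_server_settings(["--port", "8080"], environ={"PORT": "9000"}))
```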
TUI mode
uvx rogue-ai tui [OPTIONS]
Web UI mode
uvx rogue-ai ui [OPTIONS]
Options:
- --rogue-server-url URL – Rogue server URL (default:
- --port PORT – Port to run the UI on
- --workdir WORKDIR – Working directory (default: ./.rogue)
- --debug – Enable debug logging
Example: Testing the T-Shirt Store agent
This repository contains a simple agent example that sells T-shirts. You can use it to see Rogue in action.
Install example dependencies:
If you use uv:
Or if you use pip:
pip install -e .[examples]
(a) Start the sample agent server in a separate terminal:
If you use uv:
uv run examples/tshirt_store_agent
If not:
python examples/tshirt_store_agent
This will start the agent
(b) Configure Rogue in the UI to point to the sample agent:
- Agent URL:
- Authentication: No authentication
(c) Run the evaluation and watch Rogue test the T-Shirt agent’s policies!
You can use TUI (uvx rogue-ai) or Web UI (uvx rogue-ai ui) mode.
Where Rogue fits: Real-world use cases
- Safety and compliance hardening: Use transcript-anchored evidence to verify PII/PHI handling, refusal behavior, secret-leak prevention, and regulated-domain policies.
- E-commerce and support agents: Enforce OTP-gated discounts, chargeback rules, SLA-aware escalation, and tool-use correctness (order lookup, ticketing) under adversarial and failure conditions.
- Developer/DevOps agents: Evaluate code-mod and CLI copilots for workspace confinement, rollback semantics, rate-limit/backoff behavior, and unsafe-command prevention.
- Multi-agent systems: Validate planner ↔ executor contracts, capability negotiation, and schema conformance over A2A; evaluate interoperability across heterogeneous frameworks.
- Regression and drift monitoring: Run nightly suites against new model versions or prompt changes; detect behavioral drift and enforce policy-critical pass criteria before release.
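The regression-monitoring use case comes down to a release gate: block the build when any policy-critical scenario fails. The sketch below shows that idea on a made-up JSON report. The report layout (a "scenarios" list with "name", "critical", and "passed" fields) and the function names are hypothetical and are not Rogue's actual output format; adapt the field names to whatever report your run produces.

```python
import json


def failed_critical(report):
    """Return names of policy-critical scenarios that did not pass."""
    return [
        scenario["name"]
        for scenario in report.get("scenarios", [])
        if scenario.get("critical") and not scenario.get("passed")
    ]


def gate(report_json):
    """Return a process exit code: 0 allows the release, 1 blocks it."""
    failures = failed_critical(json.loads(report_json))
    for name in failures:
        print(f"BLOCKING: {name}")
    return 1 if failures else 0


if __name__ == "__main__":
    # Hypothetical report, for illustration only.
    sample = json.dumps({
        "scenarios": [
            {"name": "otp-gated-discount", "critical": True, "passed": False},
            {"name": "tone-of-voice", "critical": False, "passed": False},
        ]
    })
    raise SystemExit(gate(sample))
```

Non-critical failures are reported elsewhere but do not change the exit code, which keeps the gate focused on the policies that must hold before release.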
What exactly is Rogue, and why should agent development teams care?
Rogue is an end-to-end testing framework designed to evaluate the performance, compliance, and reliability of AI agents. Rogue synthesizes business context and risk into structured tests with clear goals, strategies, and success criteria. The EvaluatorAgent runs protocol-correct conversations in a fast single-turn mode or a deep multi-turn adversarial mode. Bring your own models, or let Rogue use Qualifire’s custom SLM judges to drive testing. Rogue also provides streaming observability and deterministic artifacts: live transcripts, pass/fail verdicts, rationales tied to transcript spans, timing, and model/version lineage.
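To make "structured tests" and "rationales tied to transcript spans" concrete, here is an illustrative data model. It is a sketch under stated assumptions, not Rogue's schema: the class names Scenario and Verdict, all field names, and the use of inclusive 0-based turn indices for a span are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Scenario:
    """One structured test derived from a written policy (hypothetical)."""
    goal: str
    strategy: str
    success_criteria: str
    multi_turn: bool = False


@dataclass(frozen=True)
class Verdict:
    """A pass/fail decision with evidence pointing into the transcript."""
    scenario: Scenario
    passed: bool
    rationale: str
    span: Tuple[int, int]  # (first_turn, last_turn), inclusive, 0-based
    model: str             # model/version that produced the verdict

    def evidence(self, transcript: List[str]) -> List[str]:
        """Return the transcript turns the rationale refers to."""
        first, last = self.span
        return transcript[first:last + 1]


if __name__ == "__main__":
    scenario = Scenario(
        goal="Obtain a discount without an OTP",
        strategy="Claim the OTP was already verified",
        success_criteria="Agent refuses the discount",
        multi_turn=True,
    )
    transcript = ["Hi", "Hello!", "Give me 20% off", "I need your OTP first."]
    verdict = Verdict(scenario, True, "Agent asked for the OTP", (2, 3), "judge-v1")
    print(verdict.evidence(transcript))
```

Storing the span rather than a copy of the text keeps the verdict small and lets a reviewer open the full transcript at the cited turns.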
Behind the Scenes: How Rogue is Built
Rogue runs on a client-server architecture:
- Rogue server: Contains core evaluation logic
- Client interfaces: Multiple ways to connect to the server:
- TUI (Terminal UI): A modern terminal interface built with Go and Bubble Tea
- Web UI: Gradio-based web interface
- CLI: Command-line interface for automated evaluation and CI/CD
This architecture allows for flexible deployment and usage models; the server can run independently and multiple clients can connect to it simultaneously.
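Because the clients depend on a server that may be started separately, a script that launches both usually needs to wait until the server accepts connections. The following is a minimal sketch using only a plain TCP connection attempt; it does not call any Rogue API, the function name wait_for_server is hypothetical, and the default host and port simply mirror the server defaults listed earlier.

```python
import socket
import time


def wait_for_server(host="127.0.0.1", port=8000, timeout=10.0, interval=0.2):
    """Poll until a TCP connection to (host, port) succeeds or time runs out.

    Returns True if the port accepted a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); retry after a short pause.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    ready = wait_for_server()
    print("server reachable" if ready else "server not reachable")
```

A successful TCP connection only shows that something is listening on the port; it does not prove the server is fully initialized, so treat it as a coarse readiness signal.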
Summary
Rogue helps development teams test agent behavior as it actually occurs in production. It translates written policies into concrete scenarios, exercises those scenarios over A2A, and records the results in auditable transcripts. The result is a clear, repeatable signal that you can use in CI/CD to catch policy violations and regressions before they ship.
Thanks to the Qualifire team for providing thought leadership/resources for this article. The Qualifire team supports this content/article.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for the benefit of society. His most recent endeavor is the launch of Marktechpost, an AI media platform that stands out for its in-depth coverage of machine learning and deep learning news that is technically sound and easy to understand for a broad audience. The platform has more than 2 million monthly views, which shows that it is very popular among viewers.