Qualifire AI open source Rogue: an end-to-end agent AI testing framework designed to evaluate the performance, compliance and reliability of AI agents

by admin · October 16, 2025

Agent systems are stochastic, context-sensitive, and policy-bound. Traditional quality assurance (unit tests, static prompts, or scalar “LL.M. as judge” scores) fails to expose multiple rounds of vulnerabilities and provide a weak audit trail. Developer teams need protocol-accurate conversations, clear policy checks, and machine-readable evidence that they can confidently control releases.

Qualifire AI is open source roguea Python framework for evaluating AI agents through Agent-to-Agent (A2A) protocol. Rogue transforms business policies into actionable scenarios, drives multiple rounds of interactions against target agents, and outputs deterministic reports suitable for CI/CD and compliance reviews.

Quick start

Prerequisites

uvx – If not installed, follow the uv installation guide
Python 3.10+
API key for LLM provider (e.g. OpenAI, Google, Anthropic).

Install

Option 1: Quick installation (recommended)

Get up and running quickly with our automated installation script:

# TUI
uvx rogue-ai
# Web UI
uvx rogue-ai ui
# CLI / CI/CD
uvx rogue-ai cli

Option 2: Manual installation

(a) Clone the repository:

git clone 
cd rogue

(b) Install dependencies:

If you use UV light:

Or if you use pip:

(c) Optional: Set environment variables: Create a .env file in the root directory and add the API key. Rogue uses LiteLLM so you can set up keys for a variety of providers.

OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-..."
GOOGLE_API_KEY="..."

running thief

Rogue runs on a client-server architecture, where the core evaluation logic runs in a backend server and various clients connect to it through different interfaces.

Default behavior

When you run uvx rogue-ai without specifying any mode, it:

Start the Rogue server in the background
Start the TUI (Terminal User Interface) client

Available modes

Default (server + TUI): uvx rogue-ai – start server + TUI client in background
server:uvx rogue-ai server – only runs the backend server
TUI: uvx rogue-ai tui – only runs TUI client (requires running server)
Web UI: uvx rogue-ai ui – only runs the Gradio web interface client (requires running server)
command line interface:uvx rogue-ai cli – runs non-interactive command line assessments (requires running server, great for CI/CD)

Mode parameters

Server mode

uvx rogue-ai server [OPTIONS]

Options:

–host HOST – The host running the server (default: 127.0.0.1 or HOST env var)
–port PORT – The port on which the server is running (default: 8000 or PORT env var)
–debug – Enable debug logging

TUI model

uvx rogue-ai tui [OPTIONS]
Web UI Mode
uvx rogue-ai ui [OPTIONS]

Options:

–rogue-server-url URL – rogue server URL (default:
–port PORT – The port on which the UI is running
–workdir WORKDIR – working directory (default: ./.rogue)
–debug – Enable debug logging

Example: Testing the T-Shirt Store proxy

This repository contains a simple agent example that sells T-shirts. You can use it to see Rogue in action.

Install example dependencies:

If you use UV light:

Or if you use pip:

pip install -e .[examples]

(a) Start the sample proxy server in a separate terminal:

If you use UV light:

uv run examples/tshirt_store_agent

If not:

python examples/tshirt_store_agent

This will start the agent

(b) Configure Rogue in the UI to point to the sample agent:

Agent URL:
Authentication: No authentication

(c) Run the assessment and watch Rogue test the T-Shirt agent’s strategy!

You can use TUI (uvx rogue-ai) or Web UI (uvx rogue-ai ui) mode.

Where Rogue fits: Real-world use cases

Security and compliance enhancements: Use transcription-anchored evidence to verify PII/PHI handling, denial actions, confidentiality disclosure prevention, and regulatory domain policies.
E-commerce and support agents: Enforce OTP gated discounts, chargeback rules, SLA-aware upgrades, and tool usage correctness (order lookup, ticketing) under adversarial and failure conditions.
Developer/DevOps Agent: Evaluate code-mod and CLI copilot for workspace limitations, rollback semantics, rate limiting/rollback behavior, and unsafe command prevention.
multi-agent system: Validate planner ↔ executor contracts, capability negotiation and pattern consistency on A2A; evaluate interoperability across heterogeneous frameworks.
Regression and drift monitoring: Nightly suite for new model versions or prompt changes; detects behavioral deviations and enforces policy-critical passing criteria before release.

What exactly is Rogue—and why should agency development teams care?

Rogue is an end-to-end testing framework designed to evaluate the performance, compliance, and reliability of AI agents. Rogue synthesizes business context and risks into structured testing with clear goals, strategies, and success criteria. EvaluatorAgent runs protocol-correct conversations in fast single-round or deep multi-round adversarial modes. Bring your own models, or let Rogue use Qualifire’s custom SLM discriminator to drive testing. Streaming observability and deterministic artifacts: real-time recording, pass/fail decisions, fundamentals related to record span, time, and model/version lineage.

Behind the Scenes: How Rogue is Built

Rogue runs on a client-server architecture:

Rogue server: Contains core evaluation logic
client interface: Multiple interfaces to connect to the server:
- TUI (Terminal UI): A modern terminal interface built with Go and Bubble Tea
- web user interface: Gradient-based web interface
- command line interface: Command line interface for automated assessment and CI/CD

This architecture allows for flexible deployment and usage models; the server can run independently and multiple clients can connect to it simultaneously.

generalize

Rogue helps development teams test the behavior of agents as they actually run in a production environment. It translates written policies into concrete scenarios, walks through those scenarios through A2A, and records what happened through auditable transcripts. The result is a clear, repeatable signal that you can use in CI/CD to catch policy outages and regressions before they are released.

Thanks to the Qualifire team for providing thought leadership/resources for this article. The Qualifire team supports this content/article.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for the benefit of society. His most recent endeavor is the launch of Marktechpost, an AI media platform that stands out for its in-depth coverage of machine learning and deep learning news that is technically sound and easy to understand for a broad audience. The platform has more than 2 million monthly views, which shows that it is very popular among viewers.

🙌 FOLLOW MARKTECHPOST: Add us as your go-to source on Google.

Qualifire AI open source Rogue: an end-to-end agent AI testing framework designed to evaluate the performance, compliance and reliability of AI agents

Quick start

Prerequisites

Install

Option 1: Quick installation (recommended)

Option 2: Manual installation

running thief

Default behavior

Available modes

Mode parameters

Server mode

Where Rogue fits: Real-world use cases

What exactly is Rogue—and why should agency development teams care?

Behind the Scenes: How Rogue is Built

generalize

You may also like...

live chat

Recent Posts

Qualifire AI open source Rogue: an end-to-end agent AI testing framework designed to evaluate the performance, compliance and reliability of AI agents

Quick start

Prerequisites

Install

Option 1: Quick installation (recommended)

Option 2: Manual installation

running thief

Default behavior

Available modes

Mode parameters

Server mode

Where Rogue fits: Real-world use cases

What exactly is Rogue—and why should agency development teams care?

Behind the Scenes: How Rogue is Built

generalize

You may also like...

How Google’s AI unlocks the secrets of dolphin communication

China’s new vision started in space has been created for 39 years

Right column placement chaos – Jon Loomer number

live chat

Recent Posts