MLflow is an open-source platform for managing and tracking machine learning experiments. When used with the OpenAI Agents SDK, MLflow automatically:
- Logs all agent interactions and API calls
- Captures tool usage, input/output messages, and intermediate decisions
- Enables observability for debugging, performance analysis, and reproducibility
This is especially useful when you build multi-agent systems in which several agents collaborate or hand off tasks to one another.
In this tutorial, we walk through two key examples: a simple handoff between agents, and the use of agent guardrails, while using MLflow to track their behavior.
Set up dependencies
Install the libraries
pip install openai-agents mlflow pydantic python-dotenv
OpenAI API Keys
To obtain an OpenAI API key, go to the API keys page on the OpenAI platform (https://platform.openai.com/api-keys) and generate a new key. If you are a new user, you may need to add billing details and pay a minimum of $5 to activate API access.
After generating the key, create a .env file and enter the following:
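# .env  (OPENAI_API_KEY is the variable that load_dotenv() and the OpenAI SDK read; the value below is a placeholder)
OPENAI_API_KEY="your_api_key_here"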
Replace your_api_key_here with the key you generated.
Multi-agent system (Multi_agent_demo.py)
In this script (Multi_agent_demo.py), we build a simple multi-agent assistant using the OpenAI Agents SDK, designed to route user queries to either a coding expert or a cooking expert. Enabling mlflow.openai.autolog() automatically traces and logs all agent interactions with the OpenAI API (including inputs, outputs, and agent handoffs), making it easy to monitor and debug the system. MLflow is configured to use a local file-based tracking URI (./mlruns) and to record all activity under the experiment name "Agent-Coding-Cooking".
import mlflow, asyncio
from agents import Agent, Runner
from dotenv import load_dotenv

load_dotenv()

mlflow.openai.autolog()                        # Auto-trace every OpenAI call
mlflow.set_tracking_uri("./mlruns")            # Store runs locally in ./mlruns
mlflow.set_experiment("Agent-Coding-Cooking")

# Specialist agents
coding_agent = Agent(name="Coding agent",
                     instructions="You only answer coding questions.")

cooking_agent = Agent(name="Cooking agent",
                      instructions="You only answer cooking questions.")

# Triage agent that routes each request to the right specialist
triage_agent = Agent(
    name="Triage agent",
    instructions="If the request is about code, handoff to coding_agent; "
                 "if about cooking, handoff to cooking_agent.",
    handoffs=[coding_agent, cooking_agent],
)

async def main():
    res = await Runner.run(triage_agent,
                           input="How do I boil pasta al dente?")
    print(res.final_output)

if __name__ == "__main__":
    asyncio.run(main())
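Assuming the script is saved as Multi_agent_demo.py (the filename used in this section), you can run it with:

python Multi_agent_demo.py

The triage agent hands the pasta question off to the cooking agent, that agent's answer is printed, and the whole exchange is logged to ./mlruns.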
MLflow UI
To open the MLflow UI and view all the recorded agent interactions, run the following command in a new terminal:
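mlflow ui

(Run this from the project directory so the UI picks up the local ./mlruns store that the script writes to; ./mlruns is also the default backend store for mlflow ui.)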
This starts the MLflow tracking server and prints the URL and port where the UI is available, typically http://127.0.0.1:5000 by default.
In the trace view we can follow the entire interaction flow, from the user's initial input, to how the triage agent routes the request to the appropriate specialist agent, to the final response generated by that agent. This end-to-end trace provides valuable insight into decision making, handoffs, and outputs, helping you debug and optimize agent workflows.
Tracking Guardrails (Guardrails.py)
In this example, we implement a guardrail-protected customer support agent using the OpenAI Agents SDK with MLflow tracing. The agent is designed to help users with general queries but is prevented from answering medical-related questions. A dedicated guardrail agent checks for such inputs and blocks the request if they are detected. MLflow captures the entire flow (including the guardrail activation, its reasoning, and the agent's response) to provide full traceability and insight into the safety mechanism.
import mlflow, asyncio
from pydantic import BaseModel
from agents import (
    Agent, Runner,
    GuardrailFunctionOutput, InputGuardrailTripwireTriggered,
    input_guardrail, RunContextWrapper)
from dotenv import load_dotenv

load_dotenv()

mlflow.openai.autolog()                        # Auto-trace every OpenAI call
mlflow.set_tracking_uri("./mlruns")
mlflow.set_experiment("Agent-Guardrails")

# Structured output returned by the guardrail agent
class MedicalSymptoms(BaseModel):
    medical_symptoms: bool
    reasoning: str

guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking you for medical symptoms.",
    output_type=MedicalSymptoms,
)

@input_guardrail
async def medical_guardrail(
    ctx: RunContextWrapper[None], agent: Agent, input
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, input, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.medical_symptoms,
    )

agent = Agent(
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    input_guardrails=[medical_guardrail],
)

async def main():
    try:
        await Runner.run(agent, "Should I take aspirin if I'm having a headache?")
        print("Guardrail didn't trip - this is unexpected")
    except InputGuardrailTripwireTriggered:
        print("Medical guardrail tripped")

if __name__ == "__main__":
    asyncio.run(main())
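Assuming the file is saved as Guardrails.py (the filename used in this section), you can run it with:

python Guardrails.py

Because the sample input asks for medical advice, the guardrail trips and the script prints "Medical guardrail tripped".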
The script defines a customer support agent with an input guardrail that detects medical-related questions. It uses a separate guardrail_agent to evaluate whether the user's input contains a request for medical advice. If such input is detected, the guardrail triggers and prevents the main agent from responding. MLflow automatically records and traces the entire process, including the guardrail check and its outcome.
MLflow UI
To open the MLflow UI and view all the recorded agent interactions, run the following command in a new terminal:
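mlflow ui

(As with the first example, run this from the project directory so the UI reads the local ./mlruns store.)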
In this example, we ask the agent "Should I take aspirin if I'm having a headache?", which triggers the guardrail. In the MLflow UI, we can clearly see that the input was flagged, along with the reasoning the guardrail agent provided for blocking the request.

I am a civil engineering graduate (2022) from Jamia Millia Islamia, New Delhi, and I am very interested in data science, especially neural networks and their applications in various fields.