{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using MLflow AI Gateway with AutoGen\n",
"\n",
"[MLflow AI Gateway](https://mlflow.org/docs/latest/llms/gateway/index.html) is a database-backed LLM proxy built into the MLflow tracking server (MLflow ≥ 3.0). It gives you a **single OpenAI-compatible endpoint** that can route to dozens of LLM providers — OpenAI, Anthropic, Gemini, Mistral, Bedrock, Ollama, and more.\n",
"\n",
"Key features:\n",
"- **Multi-provider routing** — switch models without changing agent code\n",
"- **Secrets management** — provider API keys stored encrypted on the server; your application sends no provider keys\n",
"- **Fallback & retry** — automatic failover to backup models\n",
"- **Budget tracking** — per-endpoint or per-user token budgets\n",
"- **Usage tracing** — every call logged as an MLflow trace automatically\n",
"\n",
"Because MLflow Gateway speaks the OpenAI API, you can use `OpenAIChatCompletionClient` with a custom `base_url` to point any AutoGen agent at it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Prerequisites\n\n1. **Start an MLflow server:**\n ```bash\n pip install mlflow\n mlflow server --host 127.0.0.1 --port 5000\n ```\n\n2. **Create a gateway endpoint** via the MLflow UI at [http://localhost:5000](http://localhost:5000): \n Navigate to **AI Gateway → Create Endpoint**, give it a name (e.g. `my-chat-endpoint`), select a provider and model, and enter your API key (stored encrypted on the server).\n\n ![Create Endpoint UI](mlflow_gateway_images/create_endpoint.png)\n\n See the [MLflow AI Gateway documentation](https://mlflow.org/docs/latest/genai/governance/ai-gateway/) for advanced setup options including programmatic endpoint creation via the REST API."
},
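{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the gateway exposes an OpenAI-compatible API, you can sanity-check a freshly created endpoint with the plain `openai` Python SDK (`pip install openai`) before wiring up AutoGen. This is a minimal sketch: it assumes the server runs on `localhost:5000` and that your endpoint is named `my-chat-endpoint`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from openai import OpenAI\n",
"\n",
"# Any non-empty api_key works; provider keys live on the MLflow server\n",
"client = OpenAI(base_url=\"http://localhost:5000/gateway/openai/v1\", api_key=\"unused\")\n",
"response = client.chat.completions.create(\n",
"    model=\"my-chat-endpoint\",  # your gateway endpoint name\n",
"    messages=[{\"role\": \"user\", \"content\": \"Say hello.\"}],\n",
")\n",
"print(response.choices[0].message.content)"
]
},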
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pip install -U 'autogen-agentchat' 'autogen-ext[openai]'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Connect to MLflow Gateway\n",
"\n",
"Use `OpenAIChatCompletionClient` with:\n",
"- `base_url` pointing to the MLflow Gateway OpenAI-compatible endpoint\n",
"- `model` set to your **gateway endpoint name**\n",
"- `api_key` set to any non-empty string (the gateway manages provider keys server-side)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from autogen_ext.models.openai import OpenAIChatCompletionClient\n",
"\n",
"MLFLOW_GATEWAY_URL = \"http://localhost:5000\"\n",
"ENDPOINT_NAME = \"my-chat-endpoint\" # the endpoint name you created in MLflow\n",
"\n",
"model_client = OpenAIChatCompletionClient(\n",
" model=ENDPOINT_NAME,\n",
" base_url=f\"{MLFLOW_GATEWAY_URL}/gateway/openai/v1\",\n",
" api_key=\"unused\", # provider keys are stored on the MLflow server\n",
" model_capabilities={\n",
" \"json_output\": False,\n",
" \"vision\": False,\n",
" \"function_calling\": True,\n",
" },\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Single-turn Chat Example\n",
"\n",
"Use the model client directly to verify the connection:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from autogen_core.models import UserMessage\n",
"\n",
"result = await model_client.create(\n",
" messages=[UserMessage(content=\"What is MLflow AI Gateway?\", source=\"user\")]\n",
")\n",
"print(result.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multi-Agent Chat Example\n",
"\n",
"Here we create two agents — a user proxy and an assistant — and run a short conversation through MLflow Gateway."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "from autogen_agentchat.agents import AssistantAgent\nfrom autogen_agentchat.ui import Console\nfrom autogen_agentchat.teams import RoundRobinGroupChat\nfrom autogen_agentchat.conditions import MaxMessageTermination\n\n# Create the assistant using the MLflow Gateway client\nassistant = AssistantAgent(\n name=\"assistant\",\n model_client=model_client,\n system_message=\"You are a helpful AI assistant. Keep answers concise.\",\n)\n\n# Run a quick conversation\ntermination = MaxMessageTermination(max_messages=3)\nteam = RoundRobinGroupChat([assistant], termination_condition=termination)\n\nawait Console(team.run_stream(task=\"Explain LLM gateways in two sentences.\"))"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Streaming\n",
"\n",
"MLflow Gateway supports streaming. AutoGen uses streaming automatically when available."
]
},
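{
"cell_type": "markdown",
"metadata": {},
"source": [
"Token streaming can also be enabled at the agent level. The sketch below assumes the `model_client` defined earlier is still open: setting `model_client_stream=True` on an `AssistantAgent` makes `run_stream` emit token chunks, which `Console` renders incrementally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from autogen_agentchat.agents import AssistantAgent\n",
"from autogen_agentchat.ui import Console\n",
"\n",
"streaming_assistant = AssistantAgent(\n",
"    name=\"streaming_assistant\",\n",
"    model_client=model_client,\n",
"    model_client_stream=True,  # stream tokens through run_stream\n",
")\n",
"\n",
"await Console(streaming_assistant.run_stream(task=\"Name one benefit of an LLM gateway.\"))"
]
},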
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "from autogen_core.models import UserMessage\n\nasync for chunk in model_client.create_stream(\n messages=[UserMessage(content=\"Write a haiku about LLM gateways.\", source=\"user\")]\n):\n if hasattr(chunk, 'content') and chunk.content:\n print(chunk.content, end=\"\", flush=True)\n\n# Close the client after all examples are done\nawait model_client.close()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Gateway Features\n\nAll of these are configured in the MLflow UI — no code changes needed in your AutoGen application:\n\n| Feature | Description |\n|---------|-------------|\n| **Fallback** | If the primary model fails or is rate-limited, the gateway retries with a backup model automatically |\n| **Traffic splitting** | Route X% of requests to model A and Y% to model B for A/B testing |\n| **Budget tracking** | Set token/cost limits per endpoint or per user |\n| **Usage tracing** | Every call is logged as an MLflow trace — inputs, outputs, latency, token counts |\n\nYour `model=ENDPOINT_NAME` value stays the same regardless of which provider or model the gateway routes to behind the scenes.\n\n### Budget Tracking\n\n![Budget Tracking UI](mlflow_gateway_images/budget_tracking.png)\n\n### Usage Tracing\n\nThe **Usage dashboard** shows request volume, latency, and error rates at a glance:\n\n![Usage Dashboard](mlflow_gateway_images/usage_dashboard.png)\n\nThe **Logs tab** lists every traced request with its response, token counts, execution time, and status:\n\n![Usage Traces](mlflow_gateway_images/usage_traces.png)\n\nClick any trace to see the full **request and response detail**:\n\n![Trace Detail](mlflow_gateway_images/usage_trace_detail.png)"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}