{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using MLflow AI Gateway with AutoGen\n",
"\n",
"[MLflow AI Gateway](https://mlflow.org/docs/latest/llms/gateway/index.html) is a database-backed LLM proxy built into the MLflow tracking server (MLflow ≥ 3.0). It gives you a **single OpenAI-compatible endpoint** that can route to dozens of LLM providers — OpenAI, Anthropic, Gemini, Mistral, Bedrock, Ollama, and more.\n",
"\n",
"Key features:\n",
"- **Multi-provider routing** — switch models without changing agent code\n",
"- **Secrets management** — provider API keys stored encrypted on the server; your application sends no provider keys\n",
"- **Fallback & retry** — automatic failover to backup models\n",
"- **Budget tracking** — per-endpoint or per-user token budgets\n",
"- **Usage tracing** — every call logged as an MLflow trace automatically\n",
"\n",
"Because MLflow Gateway speaks the OpenAI API, you can use `OpenAIChatCompletionClient` with a custom `base_url` to point any AutoGen agent at it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Prerequisites\n\n1. **Start an MLflow server:**\n ```bash\n pip install mlflow\n mlflow server --host 127.0.0.1 --port 5000\n ```\n\n2. **Create a gateway endpoint** via the MLflow UI at [http://localhost:5000](http://localhost:5000): \n Navigate to **AI Gateway → Create Endpoint**, give it a name (e.g. `my-chat-endpoint`), select a provider and model, and enter your API key (stored encrypted on the server).\n\n ![Create Endpoint UI](mlflow_gateway_images/create_endpoint.png)\n\n See the [MLflow AI Gateway documentation](https://mlflow.org/docs/latest/genai/governance/ai-gateway/) for advanced setup options including programmatic endpoint creation via the REST API."
},
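{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the gateway exposes an OpenAI-compatible API, you can sanity-check a freshly created endpoint with the plain `openai` Python SDK (`pip install openai`) before wiring up AutoGen. This is a minimal sketch: it assumes the server runs on `localhost:5000` and that your endpoint is named `my-chat-endpoint`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from openai import OpenAI\n",
"\n",
"# Any non-empty api_key works; provider keys live on the MLflow server\n",
"client = OpenAI(base_url=\"http://localhost:5000/gateway/openai/v1\", api_key=\"unused\")\n",
"response = client.chat.completions.create(\n",
"    model=\"my-chat-endpoint\",  # your gateway endpoint name\n",
"    messages=[{\"role\": \"user\", \"content\": \"Say hello.\"}],\n",
")\n",
"print(response.choices[0].message.content)"
]
},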
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pip install -U 'autogen-agentchat' 'autogen-ext[openai]'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Connect to MLflow Gateway\n",
"\n",
"Use `OpenAIChatCompletionClient` with:\n",
"- `base_url` pointing to the MLflow Gateway OpenAI-compatible endpoint\n",
"- `model` set to your **gateway endpoint name**\n",
"- `api_key` set to any non-empty string (the gateway manages provider keys server-side)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from autogen_ext.models.openai import OpenAIChatCompletionClient\n",
"\n",
"MLFLOW_GATEWAY_URL = \"http://localhost:5000\"\n",
"ENDPOINT_NAME = \"my-chat-endpoint\" # the endpoint name you created in MLflow\n",
"\n",
"model_client = OpenAIChatCompletionClient(\n",
" model=ENDPOINT_NAME,\n",
" base_url=f\"{MLFLOW_GATEWAY_URL}/gateway/openai/v1\",\n",
" api_key=\"unused\", # provider keys are stored on the MLflow server\n",
" model_capabilities={\n",
" \"json_output\": False,\n",
" \"vision\": False,\n",
" \"function_calling\": True,\n",
" },\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Single-turn Chat Example\n",
"\n",
"Use the model client directly to verify the connection:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from autogen_core.models import UserMessage\n",
"\n",
"result = await model_client.create(\n",
" messages=[UserMessage(content=\"What is MLflow AI Gateway?\", source=\"user\")]\n",
")\n",
"print(result.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multi-Agent Chat Example\n",
"\n",
"Here we create two agents — a user proxy and an assistant — and run a short conversation through MLflow Gateway."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "from autogen_agentchat.agents import AssistantAgent\nfrom autogen_agentchat.ui import Console\nfrom autogen_agentchat.teams import RoundRobinGroupChat\nfrom autogen_agentchat.conditions import MaxMessageTermination\n\n# Create the assistant using the MLflow Gateway client\nassistant = AssistantAgent(\n name=\"assistant\",\n model_client=model_client,\n system_message=\"You are a helpful AI assistant. Keep answers concise.\",\n)\n\n# Run a quick conversation\ntermination = MaxMessageTermination(max_messages=3)\nteam = RoundRobinGroupChat([assistant], termination_condition=termination)\n\nawait Console(team.run_stream(task=\"Explain LLM gateways in two sentences.\"))"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Streaming\n",
"\n",
"MLflow Gateway supports streaming. AutoGen uses streaming automatically when available."
]
},
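{
"cell_type": "markdown",
"metadata": {},
"source": [
"Token streaming can also be enabled at the agent level. The sketch below assumes the `model_client` defined earlier is still open: setting `model_client_stream=True` on an `AssistantAgent` makes `run_stream` emit token chunks, which `Console` renders incrementally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from autogen_agentchat.agents import AssistantAgent\n",
"from autogen_agentchat.ui import Console\n",
"\n",
"streaming_assistant = AssistantAgent(\n",
"    name=\"streaming_assistant\",\n",
"    model_client=model_client,\n",
"    model_client_stream=True,  # stream tokens through run_stream\n",
")\n",
"\n",
"await Console(streaming_assistant.run_stream(task=\"Name one benefit of an LLM gateway.\"))"
]
},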
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "from autogen_core.models import UserMessage\n\nasync for chunk in model_client.create_stream(\n messages=[UserMessage(content=\"Write a haiku about LLM gateways.\", source=\"user\")]\n):\n if hasattr(chunk, 'content') and chunk.content:\n print(chunk.content, end=\"\", flush=True)\n\n# Close the client after all examples are done\nawait model_client.close()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Gateway Features\n\nAll of these are configured in the MLflow UI — no code changes needed in your AutoGen application:\n\n| Feature | Description |\n|---------|-------------|\n| **Fallback** | If the primary model fails or is rate-limited, the gateway retries with a backup model automatically |\n| **Traffic splitting** | Route X% of requests to model A and Y% to model B for A/B testing |\n| **Budget tracking** | Set token/cost limits per endpoint or per user |\n| **Usage tracing** | Every call is logged as an MLflow trace — inputs, outputs, latency, token counts |\n\nYour `model=ENDPOINT_NAME` value stays the same regardless of which provider or model the gateway routes to behind the scenes.\n\n### Budget Tracking\n\n![Budget Tracking UI](mlflow_gateway_images/budget_tracking.png)\n\n### Usage Tracing\n\nThe **Usage dashboard** shows request volume, latency, and error rates at a glance:\n\n![Usage Dashboard](mlflow_gateway_images/usage_dashboard.png)\n\nThe **Logs tab** lists every traced request with its response, token counts, execution time, and status:\n\n![Usage Traces](mlflow_gateway_images/usage_traces.png)\n\nClick any trace to see the full **request and response detail**:\n\n![Trace Detail](mlflow_gateway_images/usage_trace_detail.png)"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}