diff --git a/python/docs/src/user-guide/core-user-guide/cookbook/index.md b/python/docs/src/user-guide/core-user-guide/cookbook/index.md index f93f1ba83495..73b2a4192267 100644 --- a/python/docs/src/user-guide/core-user-guide/cookbook/index.md +++ b/python/docs/src/user-guide/core-user-guide/cookbook/index.md @@ -15,6 +15,7 @@ openai-assistant-agent langgraph-agent llamaindex-agent local-llms-ollama-litellm +mlflow-gateway instrumenting topic-subscription-scenarios structured-output-agent diff --git a/python/docs/src/user-guide/core-user-guide/cookbook/mlflow-gateway.ipynb b/python/docs/src/user-guide/core-user-guide/cookbook/mlflow-gateway.ipynb new file mode 100644 index 000000000000..c6bcd54e2d1a --- /dev/null +++ b/python/docs/src/user-guide/core-user-guide/cookbook/mlflow-gateway.ipynb @@ -0,0 +1,151 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Using MLflow AI Gateway with AutoGen\n", + "\n", + "[MLflow AI Gateway](https://mlflow.org/docs/latest/llms/gateway/index.html) is a database-backed LLM proxy built into the MLflow tracking server (MLflow ≥ 3.0). It gives you a **single OpenAI-compatible endpoint** that can route to dozens of LLM providers — OpenAI, Anthropic, Gemini, Mistral, Bedrock, Ollama, and more.\n", + "\n", + "Key features:\n", + "- **Multi-provider routing** — switch models without changing agent code\n", + "- **Secrets management** — provider API keys stored encrypted on the server; your application sends no provider keys\n", + "- **Fallback & retry** — automatic failover to backup models\n", + "- **Budget tracking** — per-endpoint or per-user token budgets\n", + "- **Usage tracing** — every call logged as an MLflow trace automatically\n", + "\n", + "Because MLflow Gateway speaks the OpenAI API, you can use `OpenAIChatCompletionClient` with a custom `base_url` to point any AutoGen agent at it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": "## Prerequisites\n\n1. 
**Start an MLflow server:**\n ```bash\n pip install mlflow\n mlflow server --host 127.0.0.1 --port 5000\n ```\n\n2. **Create a gateway endpoint** via the MLflow UI at [http://localhost:5000](http://localhost:5000): \n Navigate to **AI Gateway → Create Endpoint**, give it a name (e.g. `my-chat-endpoint`), select a provider and model, and enter your API key (stored encrypted on the server).\n\n ![Create Endpoint UI](mlflow_gateway_images/create_endpoint.png)\n\n See the [MLflow AI Gateway documentation](https://mlflow.org/docs/latest/genai/governance/ai-gateway/) for advanced setup options including programmatic endpoint creation via the REST API." }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pip install -U 'autogen-agentchat' 'autogen-ext[openai]'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Connect to MLflow Gateway\n", "\n", "Use `OpenAIChatCompletionClient` with:\n", "- `base_url` pointing to the MLflow Gateway OpenAI-compatible endpoint\n", "- `model` set to your **gateway endpoint name**\n", "- `api_key` set to any non-empty string (the gateway manages provider keys server-side)\n", "- `model_info` describing the capabilities of the model behind the endpoint, since the endpoint name is not a model the client can look up" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from autogen_ext.models.openai import OpenAIChatCompletionClient\n", "\n", "MLFLOW_GATEWAY_URL = \"http://localhost:5000\"\n", "ENDPOINT_NAME = \"my-chat-endpoint\" # the endpoint name you created in MLflow\n", "\n", "model_client = OpenAIChatCompletionClient(\n", " model=ENDPOINT_NAME,\n", " base_url=f\"{MLFLOW_GATEWAY_URL}/gateway/openai/v1\",\n", " api_key=\"unused\", # provider keys are stored on the MLflow server\n", " model_info={\n", " \"json_output\": False,\n", " \"vision\": False,\n", " \"function_calling\": True,\n", " \"family\": \"unknown\",\n", " \"structured_output\": False,\n", " },\n",
")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Single-turn Chat Example\n", + "\n", + "Use the model client directly to verify the connection:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from autogen_core.models import UserMessage\n", + "\n", + "result = await model_client.create(\n", + " messages=[UserMessage(content=\"What is MLflow AI Gateway?\", source=\"user\")]\n", + ")\n", + "print(result.content)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Multi-Agent Chat Example\n", + "\n", + "Here we create two agents — a user proxy and an assistant — and run a short conversation through MLflow Gateway." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "from autogen_agentchat.agents import AssistantAgent\nfrom autogen_agentchat.ui import Console\nfrom autogen_agentchat.teams import RoundRobinGroupChat\nfrom autogen_agentchat.conditions import MaxMessageTermination\n\n# Create the assistant using the MLflow Gateway client\nassistant = AssistantAgent(\n name=\"assistant\",\n model_client=model_client,\n system_message=\"You are a helpful AI assistant. Keep answers concise.\",\n)\n\n# Run a quick conversation\ntermination = MaxMessageTermination(max_messages=3)\nteam = RoundRobinGroupChat([assistant], termination_condition=termination)\n\nawait Console(team.run_stream(task=\"Explain LLM gateways in two sentences.\"))" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Streaming\n", + "\n", + "MLflow Gateway supports streaming. AutoGen uses streaming automatically when available." 
+ ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "from autogen_core.models import UserMessage\n\nasync for chunk in model_client.create_stream(\n messages=[UserMessage(content=\"Write a haiku about LLM gateways.\", source=\"user\")]\n):\n # Streamed chunks are strings; the final yielded item is a CreateResult\n if isinstance(chunk, str):\n print(chunk, end=\"\", flush=True)\n\n# Close the client after all examples are done\nawait model_client.close()" }, { "cell_type": "markdown", "metadata": {}, "source": "## Gateway Features\n\nAll of these are configured in the MLflow UI — no code changes needed in your AutoGen application:\n\n| Feature | Description |\n|---------|-------------|\n| **Fallback** | If the primary model fails or is rate-limited, the gateway retries with a backup model automatically |\n| **Traffic splitting** | Route X% of requests to model A and Y% to model B for A/B testing |\n| **Budget tracking** | Set token/cost limits per endpoint or per user |\n| **Usage tracing** | Every call is logged as an MLflow trace — inputs, outputs, latency, token counts |\n\nYour `model=ENDPOINT_NAME` value stays the same regardless of which provider or model the gateway routes to behind the scenes.\n\n### Budget Tracking\n\n![Budget Tracking UI](mlflow_gateway_images/budget_tracking.png)\n\n### Usage Tracing\n\nThe **Usage dashboard** shows request volume, latency, and error rates at a glance:\n\n![Usage Dashboard](mlflow_gateway_images/usage_dashboard.png)\n\nThe **Logs tab** lists every traced request with its response, token counts, execution time, and status:\n\n![Usage Traces](mlflow_gateway_images/usage_traces.png)\n\nClick any trace to see the full **request and response detail**:\n\n![Trace Detail](mlflow_gateway_images/usage_trace_detail.png)" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.11.0" }
}, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/budget_tracking.png b/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/budget_tracking.png new file mode 100644 index 000000000000..e08645a4ed1d Binary files /dev/null and b/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/budget_tracking.png differ diff --git a/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/create_endpoint.png b/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/create_endpoint.png new file mode 100644 index 000000000000..bea9ea99e7e9 Binary files /dev/null and b/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/create_endpoint.png differ diff --git a/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/usage_dashboard.png b/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/usage_dashboard.png new file mode 100644 index 000000000000..70312879ed73 Binary files /dev/null and b/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/usage_dashboard.png differ diff --git a/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/usage_trace_detail.png b/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/usage_trace_detail.png new file mode 100644 index 000000000000..25d41aac3459 Binary files /dev/null and b/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/usage_trace_detail.png differ diff --git a/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/usage_traces.png b/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/usage_traces.png new file mode 100644 index 000000000000..d750e9f65796 Binary files /dev/null and 
b/python/docs/src/user-guide/core-user-guide/cookbook/mlflow_gateway_images/usage_traces.png differ