feat(fetch-url): support fetching images as multimodal content #1100
base: main
Changes from 1 commit
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,3 @@ | ||
| - Fetch content from a URL. Returns the main text content extracted from the page. Use this when you need to read a specific web page. | ||
| + Fetch content from a URL. Returns the main text content extracted from the page, or the image data if the URL points to an image file. Use this when you need to read a specific web page or image. | ||
|
|
||
| Only public `http`/`https` URLs are supported. Requests to private, loopback, or link-local addresses are refused, and responses larger than 10 MiB are rejected. |
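
The size rule above can be read as a two-stage check: compare the declared `content-length` first, then the measured body length. The following is a minimal TypeScript sketch of that idea, not the PR's implementation. The helper name `assertWithinLimit`, its signature, and the `TEN_MIB` constant are hypothetical; the error message mirrors the one used in the image branch of this PR.

```typescript
// Hypothetical helper: name, signature, and constant are illustrative only.
const TEN_MIB = 10 * 1024 * 1024;

function assertWithinLimit(
  contentLengthHeader: string | null,
  actualBytes: number | null,
  maxBytes: number = TEN_MIB,
): void {
  // Stage 1: use the declared content-length when it parses as a number.
  if (contentLengthHeader !== null) {
    const declared = Number.parseInt(contentLengthHeader, 10);
    if (Number.isFinite(declared) && declared > maxBytes) {
      throw new Error(
        `Response body too large: ${String(declared)} bytes exceeds maxBytes (${String(maxBytes)}).`,
      );
    }
  }
  // Stage 2: servers may omit or misreport content-length, so measure as well.
  if (actualBytes !== null && actualBytes > maxBytes) {
    throw new Error(
      `Response body too large: ${String(actualBytes)} bytes exceeds maxBytes (${String(maxBytes)}).`,
    );
  }
}
```

Calling it once with only the header (before reading the body) and once with the measured length keeps an oversized response from being buffered when the server declares its size honestly.
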
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,7 +7,8 @@ | |
| * 3. Reject responses larger than `maxBytes` (content-length first, | ||
| * then measured body length as a defensive second check). | ||
| * 4. `text/plain` / `text/markdown` → passthrough verbatim. | ||
| - * 5. Otherwise (assumed HTML) → run Readability over a linkedom | ||
| + * 5. `image/*` → download binary, encode as base64, return as image kind. | ||
| + * 6. Otherwise (assumed HTML) → run Readability over a linkedom | ||
| * document. Return `# ${title}\n\n${text}` (title omitted when | ||
| * absent). If extraction yields no meaningful text, fall back to | ||
| * common content containers (`<article>` / `<main>` / `<body>`) | ||
|
|
@@ -172,6 +173,25 @@ export class LocalFetchURLProvider implements UrlFetcher { | |
| } | ||
| } | ||
|
|
||
| const contentType = (response.headers.get('content-type') ?? '').toLowerCase(); | ||
|
|
||
| // Handle image content types | ||
| if (contentType.startsWith('image/')) { | ||
| const arrayBuffer = await response.arrayBuffer(); | ||
| const actualBytes = arrayBuffer.byteLength; | ||
| if (actualBytes > this.maxBytes) { | ||
| throw new Error( | ||
| `Response body too large: ${String(actualBytes)} bytes exceeds maxBytes (${String(this.maxBytes)}).`, | ||
| ); | ||
| } | ||
| const base64 = Buffer.from(arrayBuffer).toString('base64'); | ||
| return { | ||
| content: '', | ||
| kind: 'image', | ||
| image: { mimeType: contentType, base64 }, | ||
|
This accepts every
||
| }; | ||
| } | ||
|
|
||
| const body = await response.text(); | ||
|
|
||
| // Servers may omit content-length — measure again defensively. | ||
|
|
@@ -182,7 +202,6 @@ export class LocalFetchURLProvider implements UrlFetcher { | |
| ); | ||
| } | ||
|
|
||
| - const contentType = (response.headers.get('content-type') ?? '').toLowerCase(); | ||
| if (contentType.startsWith('text/plain') || contentType.startsWith('text/markdown')) { | ||
| return { content: body, kind: 'passthrough' }; | ||
| } | ||
|
|
||
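
The image branch in this diff forwards the lowercased `content-type` header value as `mimeType`. That header can carry parameters (for example `image/png; charset=binary`), and `image/*` also matches formats a model may not accept. One possible way to narrow this is sketched below, assuming an allowlist. The function `parseImageMimeType` and the contents of `ALLOWED_IMAGE_TYPES` are hypothetical and not part of the PR.

```typescript
// Hypothetical allowlist; the actual set of supported formats is an assumption.
const ALLOWED_IMAGE_TYPES = new Set(['image/png', 'image/jpeg', 'image/gif', 'image/webp']);

// Returns the bare media type (no parameters) when it is allowlisted, otherwise null.
function parseImageMimeType(contentTypeHeader: string | null): string | null {
  // Drop parameters such as "; charset=binary" and normalize case and whitespace.
  const [first = ''] = (contentTypeHeader ?? '').split(';');
  const essence = first.trim().toLowerCase();
  return ALLOWED_IMAGE_TYPES.has(essence) ? essence : null;
}
```

A caller could take the image branch only when this returns a non-null value and pass that value as `mimeType`, leaving other `image/*` responses to whatever fallback or error path the tool prefers.
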
When the active model lacks `image_in`, `FetchURL` is still registered whenever a URL fetcher exists (checked `packages/agent-core/src/agent/tool/index.ts`, where only `ReadMediaFile` is capability-gated). Fetching any `image/*` URL now returns an `image_url` part, which providers serialize as image input on the next request, so text-only aliases can fail after a successful fetch instead of receiving an actionable tool error. Pass model capabilities into this tool or degrade/error before emitting the `image_url` part.
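
The second option in this comment (error before emitting the image part) could look roughly like the sketch below. It is only an illustration. The types `FetchResult` and `ModelCapabilities` and the function `ensureImageSupported` are hypothetical; only the `image_in` capability name and the `{ content, kind, image }` result shape come from this PR.

```typescript
// Hypothetical types, simplified from the result shape shown in the diff.
interface ImagePayload {
  mimeType: string;
  base64: string;
}

interface FetchResult {
  content: string;
  kind: string;
  image?: ImagePayload;
}

// Assumed capability flag; how capabilities reach the tool is not shown in the PR.
interface ModelCapabilities {
  image_in: boolean;
}

// Throws an actionable error instead of returning an image to a text-only model.
function ensureImageSupported(result: FetchResult, caps: ModelCapabilities): FetchResult {
  if (result.kind === 'image' && !caps.image_in) {
    const mimeType = result.image?.mimeType ?? 'image';
    throw new Error(
      `Fetched ${mimeType} content, but the active model does not accept image input.`,
    );
  }
  return result;
}
```

Throwing here follows the way the diff already reports oversized bodies, so the failure surfaces as a tool error at fetch time and not on the next provider request.
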