MAIT-204: Add MediaWiki write workspace tool for OWUI#47
Conversation
📝 Walkthrough (Summary by CodeRabbit)

Adds an async MediaWiki write tool exposing a `Tools.Valves` config (`wiki_url`, `username`, `password`, `timeout`, `edit_summary`) and a `Tools.save_to_wiki(...)` method that performs input validation, SSRF-safe URL parsing, threaded mwclient login/save, and optional event emission.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Tool as MediaWiki Tool
    participant Validator as Validator
    participant SSRF as SSRF Checker
    participant Wiki as mwclient.Site
    participant Emitter as Event Emitter
    User->>Tool: save_to_wiki(title, content, __event_emitter__)
    Tool->>Validator: Validate config & inputs
    Validator-->>Tool: Validation result
    Tool->>SSRF: Parse wiki_url & perform SSRF/IP checks
    SSRF-->>Tool: URL/IP safe
    Tool->>Wiki: Connect & login (background thread)
    Wiki-->>Tool: Login result
    Tool->>Wiki: Save or update page with summary (background thread)
    Wiki-->>Tool: Save result / errors (protected, API errors)
    Tool->>Wiki: Fetch articlepath (siteinfo)
    Wiki-->>Tool: articlepath / fallback
    opt Event emitter provided
        Tool->>Emitter: Emit progress/status events
        Emitter-->>Tool: Acks
    end
    Tool-->>User: Return canonical page URL or error string
```
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/mediawiki_write.py`:
- Around line 68-71: The code builds netloc directly from parsed.hostname and
parsed.port which mishandles IPv6 literals; when assembling a URL netloc
(variable netloc) wrap an IPv6 literal in brackets (e.g., '[' + parsed.hostname
+ ']') before appending ':port' but keep using the unbracketed parsed.hostname
for DNS resolution; update all occurrences that construct netloc (the netloc
assignment using parsed.hostname/parsed.port at the top and the similar
constructions referenced at lines 96-97 and 136) to conditionally bracket IPv6
addresses while leaving resolution logic unchanged.
- Around line 124-127: The namespace check in _check_namespace uses ns =
title.split(":", 1)[0].strip().lower() which allows underscore variants (e.g.,
Gadget_definition) to bypass _BLOCKED_NAMESPACES; update the
extraction/normalization to convert underscores to spaces and normalize
whitespace (e.g., replace "_" with " ", collapse consecutive spaces, strip,
lower) before comparing against _BLOCKED_NAMESPACES so equivalents like
"Gadget_definition" match "gadget definition".
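The first fix above can be sketched as a small standalone helper (a hedged sketch; `format_netloc` is an illustrative name, not the tool's actual function):

```python
from urllib.parse import urlsplit

def format_netloc(url: str) -> str:
    """Rebuild a netloc from a parsed URL, bracketing IPv6 literals.

    urlsplit().hostname returns an IPv6 literal without brackets, so the
    brackets must be restored before appending ':port'; DNS resolution
    should keep using the unbracketed hostname.
    """
    parsed = urlsplit(url)
    host = parsed.hostname or ""
    if ":" in host:  # a colon inside the hostname means an IPv6 literal
        host = f"[{host}]"
    return f"{host}:{parsed.port}" if parsed.port else host
```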
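The underscore normalization for the second fix could look like this (a sketch; the exact contents of `_BLOCKED_NAMESPACES` are assumed from the tool's description, not copied from the code):

```python
import re

# Namespaces the tool refuses to write to (lowercase, space-normalized).
_BLOCKED_NAMESPACES = {"mediawiki", "template", "module", "gadget", "gadget definition"}

def namespace_blocked(title: str) -> bool:
    """MediaWiki treats '_' as ' ' in titles, so normalize the prefix
    before comparing: 'Gadget_definition:X' must match 'gadget definition'."""
    ns = title.split(":", 1)[0]
    ns = re.sub(r"\s+", " ", ns.replace("_", " ")).strip().lower()
    return ns in _BLOCKED_NAMESPACES
```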
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ac38becf-0fbd-4395-a20a-f780a5d11e2e
📒 Files selected for processing (1)
tools/mediawiki_write.py
```python
    return host, path, scheme


def _check_ssrf(host: str) -> None:
```
I don't think SSRF protection is needed here: the wiki URL is assumed to be safe because it is provided by privileged users (i.e. admins). Let's remove everything related to SSRF to avoid overcomplicating the tool code.
```python
)


def _check_namespace(title: str) -> None:
```
For now, please allow edits in the NS_MAIN (0) namespace only. The simplest way to enforce this is probably to reject generated titles that contain `:`.
In later revisions, we will make the list of allowed namespaces configurable via a Valve.
```python
elif path == "" or path == "/":
    path = "/w/"
else:
    path = path.rstrip("/") + "/"
```
These assumptions are not always valid: for example, a wiki does not have to expose a /w/ directory if the provided URL uses /wiki/, and so on.
Instead of making these assumptions, let's require the value to be the URL of the api.php script, for example https://site.com/w/api.php, http://site.com/some/api.php, or https://site.com/abc/api.php.
We will then use the scheme, host, and path (ignoring the api.php part) for mwclient's Site instantiation.
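A sketch of the api.php-based parsing this comment asks for (`parse_api_url` stands in for the tool's `_parse_wiki_url`; the exact error wording is illustrative):

```python
from urllib.parse import urlsplit

def parse_api_url(wiki_url: str) -> tuple[str, str, str]:
    """Split an api.php URL into (scheme, host, path) for mwclient.Site.

    mwclient expects the script directory with a trailing slash, e.g.
    path='/w/' for https://site.com/w/api.php, so we require api.php
    explicitly and strip it instead of guessing directory layouts.
    """
    parsed = urlsplit(wiki_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("wiki_url must be an absolute http(s) URL")
    stripped = parsed.path.rstrip("/")
    if not stripped.endswith("/api.php"):
        raise ValueError(
            "wiki_url must point to api.php, e.g. https://site.com/w/api.php"
        )
    path = stripped[: -len("api.php")]  # keeps the trailing '/'
    return parsed.scheme, parsed.netloc, path
```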
```python
    default=30,
    description="Request timeout in seconds.",
)
edit_summary: str = Field(
```
Note for a later iteration: in future versions we will likely want the LLM to generate the summary text rather than hardcoding it as a valve. No action is needed right now.
```python
[[Internal links]], and [https://example.com External links] as appropriate.

Args:
    title: The wiki page title (e.g. "Meeting Notes 2025-04-30")
```
This would be a good place to document the NS_MAIN-only restriction for titles. It probably also makes sense to tell the agent the maximum title length and which characters are invalid in a title; see the legal character list at https://www.mediawiki.org/wiki/Manual:$wgLegalTitleChars (1.39+).
```python
    content: The page content formatted as MediaWiki markup

Returns:
    A URL to the created or updated wiki page, or an error message.
```
During testing, I noticed that the agent also tends to return the full content of the page that has been created or updated. Let's add some instructions to prevent it from doing so.
```python
if len(title) > MAX_TITLE_LENGTH:
    return f"Error: page title exceeds maximum length of {MAX_TITLE_LENGTH} characters."
if len(content.encode("utf-8")) > MAX_CONTENT_LENGTH:
    return f"Error: content exceeds maximum allowed size of {MAX_CONTENT_LENGTH // 1_000_000} MB."
```
This is also a good place to check for illegal title characters.
```python
    await emit(str(e), done=True)
    return f"Error: {e}"

# --- SSRF check (runs in thread — getaddrinfo blocks) ---
```
As suggested earlier, we don't really need this.
```python
    return result["query"]["general"].get("articlepath", "/wiki/$1")

try:
    article_path = await asyncio.to_thread(_get_article_path)
```
Can we please also emit status during this article path fetching step?
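A minimal sketch of emitting a status around this fetch (`emit` and `get_article_path` are placeholders for the tool's own emitter wrapper and threaded siteinfo call):

```python
import asyncio

async def fetch_article_path(emit, get_article_path):
    """Surface a progress status before the blocking siteinfo call,
    falling back to MediaWiki's default articlepath on any failure."""
    await emit("Fetching page URL…")
    try:
        return await asyncio.to_thread(get_article_path)
    except Exception:
        return "/wiki/$1"  # MediaWiki's default articlepath
```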
```
date: 2025-04-30
version: 1.1
license: MIT
description: Allows the AI to save content as a new or updated MediaWiki page when the user asks to save something to the wiki or knowledge base.
```
Please reword to: "Allows creating new or updating existing MediaWiki pages when the user asks to save or update something to the wiki/knowledge base."
```python
# --- Validate configuration ---
if not self.valves.wiki_url:
    await emit("MediaWiki URL is not configured in Tool Valves.", done=True)
```
Since the errors are emitted via status (and as far as I know there is no better way of doing so), can we please prepend them with `Error: `?
```python
Args:
    title: The wiki page title (e.g. "Meeting Notes 2025-04-30")
    content: The page content formatted as MediaWiki markup
```
Not sure, but maybe we should also tell the agent about MAX_CONTENT_LENGTH. I'd guess a limit stated in MB is pointless to it, so maybe express it in characters.
```python
try:
    host, path, scheme = _parse_wiki_url(self.valves.wiki_url)
except ValueError as e:
    await emit(str(e), done=True)
```
Here and in other places: I believe the UI auto-hides a status emit once it is marked done=True. For error messages, which are done=True by design but should not be hidden, we must also supply hidden=False explicitly. See https://docs.openwebui.com/features/extensibility/plugin/tools/development#status-events--fully-compatible
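A sketch of an error emit that stays visible, following the Open WebUI status-event payload shape linked above (`emit_error` is a hypothetical helper; `event_emitter` is the tool's `__event_emitter__` callable):

```python
async def emit_error(event_emitter, message: str) -> None:
    """Emit a terminal error status that is not auto-hidden.

    done=True marks the status as finished, and hidden=False keeps it
    visible; the message is prefixed with 'Error: ' as requested above.
    """
    if event_emitter is None:
        return
    await event_emitter({
        "type": "status",
        "data": {
            "description": f"Error: {message}",
            "done": True,
            "hidden": False,
        },
    })
```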
- Require api.php URL in wiki_url Valve; rewrite _parse_wiki_url to
extract host/path by stripping api.php suffix, validate non-empty host,
and normalize trailing slash — dropping all path-guessing heuristics
- Remove SSRF protection (admin-provided URL is trusted)
- Replace namespace blocklist with colon check (NS_MAIN only)
- Add illegal title char validation per MediaWiki $wgLegalTitleChars
- Extend save_to_wiki docstring with title rules and instruction to
return only the page URL after saving
- Return bare page URL from save_to_wiki (no "Page saved successfully" prefix)
- Add emit("Fetching page URL…") before article-path API call
Actionable comments posted: 1
🧹 Nitpick comments (1)
tools/mediawiki_write.py (1)
236-240: ⚡ Quick win: avoid swallowing URL-fetch failures silently.

The broad `except Exception` fallback works functionally, but without logging it removes the debugging signal for real API/response issues.

Suggested fix:

```diff
-        except Exception:
+        except Exception:
+            log.warning("Failed to fetch articlepath; using /wiki/$1 fallback", exc_info=True)
             page_url = _build_page_url(scheme, host, "/wiki/$1", title)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/mediawiki_write.py` around lines 236-240: the fallback currently swallows all errors from awaiting _get_article_path; change the broad except to `except Exception as e:` and log the exception before using the fallback URL so failures are visible (e.g., call logging.exception(...) or an existing module logger with a message like "Failed to get article path, falling back to default"), then continue to call _build_page_url(scheme, host, "/wiki/$1", title); keep the same functions (_get_article_path and _build_page_url) and the await asyncio.to_thread usage, but surface the exception for debugging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c9be17e4-fd18-4fbb-af3b-631b6152f62e
📒 Files selected for processing (1)
tools/mediawiki_write.py
```python
if path_stripped.endswith("/api.php"):
    path = path_stripped[: -len("/api.php")] + "/"
elif path_stripped == "api.php":
    path = "/"
else:
    path = path_stripped.rstrip("/") + "/"
```
Enforce api.php URL requirement instead of silently accepting other paths.
Line 56 onward currently normalizes non-api.php paths, but this PR’s contract says wiki_url must point to api.php. Accepting arbitrary paths can produce invalid mwclient.Site(path=...) values and failed writes.
Suggested fix

```diff
-        # Strip api.php (with optional trailing slash) from path, then ensure trailing slash
-        path = parsed.path
-        # Remove trailing slash before checking for api.php suffix
-        path_stripped = path.rstrip("/")
-        if path_stripped.endswith("/api.php"):
-            path = path_stripped[: -len("/api.php")] + "/"
-        elif path_stripped == "api.php":
-            path = "/"
-        else:
-            path = path_stripped.rstrip("/") + "/"
+        # Require api.php URL and derive mwclient path from its parent directory
+        path_stripped = parsed.path.rstrip("/")
+        if not path_stripped.endswith("/api.php"):
+            raise ValueError(
+                "wiki_url must be a full api.php URL, e.g. https://wiki.example.com/w/api.php"
+            )
+        base_path = path_stripped[: -len("/api.php")]
+        path = (base_path.rstrip("/") + "/") if base_path else "/"
```
Summary

`tools/mediawiki_write.py`: an OpenWebUI Workspace Tool that lets the AI write content to a MediaWiki instance via Native Tool Calling.

Features
- Page writes via `mwclient`
- Blocks the `MediaWiki:`, `Template:`, `Module:`, and `Gadget:` namespaces
- Runs blocking calls in `asyncio.to_thread()` to avoid blocking the event loop