
MAIT-204: Add MediaWiki write workspace tool for OWUI (#47)

Open
acikabubo wants to merge 2 commits into main from MAIT-204-owui-mediawiki-write-tool

Conversation

@acikabubo (Contributor) commented:

Summary

  • Adds tools/mediawiki_write.py — an OpenWebUI Workspace Tool that lets the AI write content to a MediaWiki instance via Native Tool Calling
  • Triggered by phrases like "save to wiki", "write to wiki", "create a wiki page"
  • Credentials (wiki URL, username, BotPassword) configured via Valves

Features

  • Creates or updates MediaWiki pages using mwclient
  • Converts content to MediaWiki markup before saving
  • SSRF protection: blocks writes to private/internal addresses, checks all DNS results
  • Namespace guard: blocks writes to MediaWiki:, Template:, Module:, Gadget: namespaces
  • Input validation: max title length (255 chars), max content size (2 MB)
  • All blocking I/O runs in asyncio.to_thread() to avoid blocking the event loop
  • Event emitters for real-time status (connecting, saving, done)
  • Returns canonical page URL after successful save
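
The validation and namespace guards listed above can be sketched as a single validator. This is a sketch based on the PR description, not the shipped `tools/mediawiki_write.py`; the constant names and the exact blocklist are assumptions.

```python
# Sketch of the guards described above. Constant names and the blocklist
# follow the PR description; they are assumptions, not the actual tool code.
MAX_TITLE_LENGTH = 255            # characters
MAX_CONTENT_LENGTH = 2_000_000    # bytes (2 MB)
_BLOCKED_NAMESPACES = {"mediawiki", "template", "module", "gadget"}

def validate_page_write(title: str, content: str):
    """Return an error string if the write must be rejected, else None."""
    if not title.strip():
        return "Error: page title must not be empty."
    if len(title) > MAX_TITLE_LENGTH:
        return f"Error: page title exceeds maximum length of {MAX_TITLE_LENGTH} characters."
    if len(content.encode("utf-8")) > MAX_CONTENT_LENGTH:
        return f"Error: content exceeds maximum allowed size of {MAX_CONTENT_LENGTH // 1_000_000} MB."
    # Namespace guard: compare the prefix before the first colon, lowercased.
    ns = title.split(":", 1)[0].strip().lower() if ":" in title else ""
    if ns in _BLOCKED_NAMESPACES:
        return f"Error: writing to the '{ns}:' namespace is not allowed."
    return None
```

The tool would call such a validator before any network I/O, so bad inputs fail fast without touching the wiki.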

coderabbitai Bot commented Apr 30, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Added a MediaWiki integration tool to save or update pages on a MediaWiki site with configurable connection and authentication, edit summaries, and progress/status reporting.
    • Input and configuration validation (titles, content size, namespace restrictions) and clearer error handling; returns a canonical page URL on success.

Walkthrough

Adds an async MediaWiki write tool exposing a Tools.Valves config (wiki_url, username, password, timeout, edit_summary) and a Tools.save_to_wiki(...) method that performs input validation, SSRF-safe URL parsing, threaded mwclient login/save, and optional event emission.

Changes

Cohort / File(s) Summary
MediaWiki Write Tool
tools/mediawiki_write.py
New file implementing Tools with nested Valves(BaseModel) for configuration and an async save_to_wiki method. Implements input validation (title/content constraints), URL parsing and SSRF checks, threaded mwclient connect/login and page save, error handling, articlepath retrieval, and optional progress/event emission.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Tool as MediaWiki Tool
    participant Validator as Validator
    participant SSRF as SSRF Checker
    participant Wiki as mwclient.Site
    participant Emitter as Event Emitter

    User->>Tool: save_to_wiki(title, content, __event_emitter__)
    Tool->>Validator: Validate config & inputs
    Validator-->>Tool: Validation result
    Tool->>SSRF: Parse wiki_url & perform SSRF/IP checks
    SSRF-->>Tool: URL/IP safe
    Tool->>Wiki: Connect & login (background thread)
    Wiki-->>Tool: Login result
    Tool->>Wiki: Save or update page with summary (background thread)
    Wiki-->>Tool: Save result / errors (protected, API errors)
    Tool->>Wiki: Fetch articlepath (siteinfo)
    Wiki-->>Tool: articlepath / fallback
    opt Event emitter provided
        Tool->>Emitter: Emit progress/status events
        Emitter-->>Tool: Acks
    end
    Tool-->>User: Return canonical page URL or error string
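The "background thread" hops in the diagram follow the standard asyncio.to_thread pattern. A minimal stand-in (blocking_save here is a placeholder for mwclient's blocking connect/login/save calls, which is an assumption about the real code):

```python
import asyncio
import time

def blocking_save(title: str, content: str) -> str:
    """Stand-in for mwclient's blocking connect/login/save calls."""
    time.sleep(0.05)  # simulates network I/O that would stall the event loop
    return f"saved:{title}"

async def save_page(title: str, content: str) -> str:
    # Run the blocking call in a worker thread; the event loop stays free
    # to serve other requests (and deliver status events) in the meantime.
    return await asyncio.to_thread(blocking_save, title, content)

result = asyncio.run(save_page("Sandbox", "Hello world"))
```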
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 44.44%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): the title clearly summarizes the main change, adding a MediaWiki write tool for the OpenWebUI workspace; it is specific, concise, and reflects the core functionality added in the pull request.
  • Description check (✅ Passed): the description accurately outlines the tool's purpose, features, and implementation details, including SSRF protection, namespace guards, input validation, async handling, and event emitters.
  • Linked Issues check (✅ Passed): skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): skipped because no linked issues were found for this pull request.




@coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tools/mediawiki_write.py`:
- Around line 68-71: The code builds netloc directly from parsed.hostname and
parsed.port which mishandles IPv6 literals; when assembling a URL netloc
(variable netloc) wrap an IPv6 literal in brackets (e.g., '[' + parsed.hostname
+ ']') before appending ':port' but keep using the unbracketed parsed.hostname
for DNS resolution; update all occurrences that construct netloc (the netloc
assignment using parsed.hostname/parsed.port at the top and the similar
constructions referenced at lines 96-97 and 136) to conditionally bracket IPv6
addresses while leaving resolution logic unchanged.
- Around line 124-127: The namespace check in _check_namespace uses ns =
title.split(":", 1)[0].strip().lower() which allows underscore variants (e.g.,
Gadget_definition) to bypass _BLOCKED_NAMESPACES; update the
extraction/normalization to convert underscores to spaces and normalize
whitespace (e.g., replace "_" with " ", collapse consecutive spaces, strip,
lower) before comparing against _BLOCKED_NAMESPACES so equivalents like
"Gadget_definition" match "gadget definition".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ac38becf-0fbd-4395-a20a-f780a5d11e2e

📥 Commits

Reviewing files that changed from the base of the PR and between afe51ff and b808994.

📒 Files selected for processing (1)
  • tools/mediawiki_write.py

Comment thread tools/mediawiki_write.py Outdated
return host, path, scheme


def _check_ssrf(host: str) -> None:
Member:

I don't think SSRF protection is needed here: the wiki URL is assumed to be safe because it's provided by privileged users (i.e., admins). Let's remove everything SSRF-related to avoid overcomplicating the tool code.

Comment thread tools/mediawiki_write.py Outdated
)


def _check_namespace(title: str) -> None:
Member:

For now, please allow edits to the NS_MAIN (0) namespace only. The simplest way to enforce this is probably to disallow generated titles that contain :.

In later revisions, we will make the list of allowed namespaces configurable via Valve
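
The requested main-namespace-only guard reduces to a one-line check; a sketch (the function name is illustrative):

```python
def is_main_namespace(title: str) -> bool:
    # NS_MAIN-only guard as requested by the reviewer: any colon is treated
    # as a namespace separator and the title is rejected.
    return ":" not in title
```

Note the trade-off: this also rejects main-namespace titles that legitimately contain a colon (e.g. "Star Trek: Picard"), which seems acceptable until the allowed-namespaces Valve lands.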

Comment thread tools/mediawiki_write.py Outdated
elif path == "" or path == "/":
path = "/w/"
else:
path = path.rstrip("/") + "/"
Member:

These assumptions don't always hold; for example, a wiki served from /wiki/ doesn't necessarily have a /w/ directory.

Instead of making these assumptions, let's demand the value for the URL to be the URL of the api.php script, for example: https://site.com/w/api.php or http://site.com/some/api.php or https://site.com/abc/api.php

We will then use the scheme, host, and path (ignoring the api.php part) when instantiating mwclient's Site.
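
A sketch of the parsing the reviewer asks for, requiring an api.php URL and deriving the Site path from its parent directory (function name and error texts are illustrative):

```python
from urllib.parse import urlparse

def parse_api_url(wiki_url: str):
    """Parse a required api.php URL into (scheme, host, path) suitable for
    mwclient.Site. Sketch of the requested behaviour, not the shipped code."""
    parsed = urlparse(wiki_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("wiki_url must be an absolute http(s) URL")
    path = parsed.path.rstrip("/")
    if not path.endswith("/api.php"):
        raise ValueError(
            "wiki_url must point at api.php, e.g. https://site.com/w/api.php"
        )
    # Keep the directory part with its trailing slash: "/w/api.php" -> "/w/"
    base = path[: -len("api.php")]
    return parsed.scheme, parsed.netloc, base or "/"
```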

Comment thread tools/mediawiki_write.py
default=30,
description="Request timeout in seconds.",
)
edit_summary: str = Field(
Member:

Note for later implementation: in further versions, we will likely want to make the LLM guess the summary text rather than hardcoding it as a valve. No action is needed right now

Comment thread tools/mediawiki_write.py
[[Internal links]], and [https://example.com External links] as appropriate.

Args:
title: The wiki page title (e.g. "Meeting Notes 2025-04-30")
Member:

This should be a good place to put the NS_MAIN-only restriction for titles. It probably also makes sense to tell the agent the max title length and which characters are invalid in a title. Please see the legal characters list at https://www.mediawiki.org/wiki/Manual:$wgLegalTitleChars (1.39+)
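
Based on the default $wgLegalTitleChars (which individual wikis can override, so treat this set as an assumption), a character check could look like:

```python
import re

# Complement of MediaWiki's default $wgLegalTitleChars: '#', '<', '>', '[',
# ']', '|', '{', '}' and control characters are never legal in a title.
_ILLEGAL_TITLE_RE = re.compile(r"[#<>\[\]|{}\x00-\x1f\x7f]")

def illegal_title_chars(title: str):
    """Return the set of disallowed characters found in a proposed title."""
    return set(_ILLEGAL_TITLE_RE.findall(title))
```

Surfacing the offending characters in the error message (rather than a bare rejection) also gives the agent something actionable to retry with.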

Comment thread tools/mediawiki_write.py
content: The page content formatted as MediaWiki markup

Returns:
A URL to the created or updated wiki page, or an error message.
Member:

During testing, I noticed that the agent also tends to return the full content of the page that has been created/updated. Let's try to add some instructions in place to prevent it from doing so.

Comment thread tools/mediawiki_write.py
if len(title) > MAX_TITLE_LENGTH:
return f"Error: page title exceeds maximum length of {MAX_TITLE_LENGTH} characters."
if len(content.encode("utf-8")) > MAX_CONTENT_LENGTH:
return f"Error: content exceeds maximum allowed size of {MAX_CONTENT_LENGTH // 1_000_000} MB."
Member:

Good place to check for illegal title chars too

Comment thread tools/mediawiki_write.py Outdated
await emit(str(e), done=True)
return f"Error: {e}"

# --- SSRF check (runs in thread — getaddrinfo blocks) ---
Member:

As suggested earlier we don't really need this

Comment thread tools/mediawiki_write.py
return result["query"]["general"].get("articlepath", "/wiki/$1")

try:
article_path = await asyncio.to_thread(_get_article_path)
Member @vedmaka commented Apr 30, 2026:

Can we please also emit status during this article path fetching step?

Comment thread tools/mediawiki_write.py
date: 2025-04-30
version: 1.1
license: MIT
description: Allows the AI to save content as a new or updated MediaWiki page when the user asks to save something to the wiki or knowledge base.
Member:

Please reword into Allows creating new or updating existing MediaWiki pages when the user asks to save or update something to the wiki/knowledge base.

Comment thread tools/mediawiki_write.py

# --- Validate configuration ---
if not self.valves.wiki_url:
await emit("MediaWiki URL is not configured in Tool Valves.", done=True)
Member:

Since the errors are emitted via status (and AFAIK there is no better way of doing so), can we please prefix them with Error:?

Comment thread tools/mediawiki_write.py

Args:
title: The wiki page title (e.g. "Meeting Notes 2025-04-30")
content: The page content formatted as MediaWiki markup
Member:

Not sure, but maybe we should also note MAX_CONTENT_LENGTH for the agent. It's probably pointless to state the limit in MB, so maybe use characters instead.

Comment thread tools/mediawiki_write.py
try:
host, path, scheme = _parse_wiki_url(self.valves.wiki_url)
except ValueError as e:
await emit(str(e), done=True)
Member:

Here and in other places, I believe the behaviour of the status emit in UI is to auto-hide the status if the status is marked as done=True. Thus for Error messages that are done=True by design and that we do not want to be hidden, we must also supply hidden=False explicitly https://docs.openwebui.com/features/extensibility/plugin/tools/development#status-events--fully-compatible
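
Both status-related requests (the Error: prefix and the explicit hidden=False for terminal errors) can be folded into one emit helper; a sketch with assumed names, exercised here with a capturing emitter:

```python
import asyncio

async def emit_status(event_emitter, description: str, done: bool = False):
    """Emit an OpenWebUI status event. A done=True status auto-hides in the
    UI, so error statuses explicitly pass hidden=False to stay visible.
    Helper name and the 'Error:' convention are illustrative."""
    if event_emitter is None:
        return
    data = {"description": description, "done": done}
    if done and description.startswith("Error:"):
        data["hidden"] = False  # keep terminal errors on screen
    await event_emitter({"type": "status", "data": data})

# Tiny demonstration with a capturing emitter.
events = []

async def _capture(event):
    events.append(event)

asyncio.run(emit_status(_capture, "Error: MediaWiki URL is not configured.", done=True))
```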

- Require api.php URL in wiki_url Valve; rewrite _parse_wiki_url to
  extract host/path by stripping api.php suffix, validate non-empty host,
  and normalize trailing slash — dropping all path-guessing heuristics
- Remove SSRF protection (admin-provided URL is trusted)
- Replace namespace blocklist with colon check (NS_MAIN only)
- Add illegal title char validation per MediaWiki $wgLegalTitleChars
- Extend save_to_wiki docstring with title rules and instruction to
  return only the page URL after saving
- Return bare page URL from save_to_wiki (no "Page saved successfully" prefix)
- Add emit("Fetching page URL…") before article-path API call
@coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tools/mediawiki_write.py (1)

236-240: ⚡ Quick win

Avoid swallowing URL-fetch failures silently.

The broad except Exception fallback works functionally, but without logging it removes debugging signal for real API/response issues.

Suggested fix
-        except Exception:
+        except Exception:
+            log.warning("Failed to fetch articlepath; using /wiki/$1 fallback", exc_info=True)
             page_url = _build_page_url(scheme, host, "/wiki/$1", title)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/mediawiki_write.py` around lines 236 - 240, The fallback currently
swallows all errors from awaiting _get_article_path; change the broad except to
"except Exception as e:" and log the exception before using the fallback URL so
failures are visible (e.g., call logging.exception(...) or an existing module
logger with a message like "Failed to get article path, falling back to
default"), then continue to call _build_page_url(scheme, host, "/wiki/$1",
title); keep the same functions (_get_article_path and _build_page_url) and
await asyncio.to_thread usage but surface the exception for debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tools/mediawiki_write.py`:
- Around line 236-240: The fallback currently swallows all errors from awaiting
_get_article_path; change the broad except to "except Exception as e:" and log
the exception before using the fallback URL so failures are visible (e.g., call
logging.exception(...) or an existing module logger with a message like "Failed
to get article path, falling back to default"), then continue to call
_build_page_url(scheme, host, "/wiki/$1", title); keep the same functions
(_get_article_path and _build_page_url) and await asyncio.to_thread usage but
surface the exception for debugging.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c9be17e4-fd18-4fbb-af3b-631b6152f62e

📥 Commits

Reviewing files that changed from the base of the PR and between b808994 and 99ea499.

📒 Files selected for processing (1)
  • tools/mediawiki_write.py

Comment thread tools/mediawiki_write.py
Comment on lines +56 to +62
if path_stripped.endswith("/api.php"):
path = path_stripped[: -len("/api.php")] + "/"
elif path_stripped == "api.php":
path = "/"
else:
path = path_stripped.rstrip("/") + "/"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce api.php URL requirement instead of silently accepting other paths.

Line 56 onward currently normalizes non-api.php paths, but this PR’s contract says wiki_url must point to api.php. Accepting arbitrary paths can produce invalid mwclient.Site(path=...) values and failed writes.

Suggested fix
-    # Strip api.php (with optional trailing slash) from path, then ensure trailing slash
-    path = parsed.path
-    # Remove trailing slash before checking for api.php suffix
-    path_stripped = path.rstrip("/")
-    if path_stripped.endswith("/api.php"):
-        path = path_stripped[: -len("/api.php")] + "/"
-    elif path_stripped == "api.php":
-        path = "/"
-    else:
-        path = path_stripped.rstrip("/") + "/"
+    # Require api.php URL and derive mwclient path from its parent directory
+    path_stripped = parsed.path.rstrip("/")
+    if not path_stripped.endswith("/api.php"):
+        raise ValueError(
+            "wiki_url must be a full api.php URL, e.g. https://wiki.example.com/w/api.php"
+        )
+    base_path = path_stripped[: -len("/api.php")]
+    path = (base_path.rstrip("/") + "/") if base_path else "/"
