feat(extensions): authenticate GitHub-hosted catalog and download requests with GITHUB_TOKEN/GH_TOKEN #2087
…uests with GITHUB_TOKEN/GH_TOKEN
Pull request overview
Adds GitHub-token authentication to extension catalog fetching and extension ZIP downloads so catalogs/assets hosted in private GitHub repos work when GITHUB_TOKEN/GH_TOKEN is set, while aiming to avoid leaking credentials to non-GitHub hosts.
Changes:
- Introduces `ExtensionCatalog._make_request(url)` to attach an `Authorization` header for GitHub-hosted URLs when a token is available.
- Updates all `urllib.request.urlopen(...)` call sites in `ExtensionCatalog` to use the new request builder.
- Adds unit/integration tests for the request/header behavior and updates user docs to reflect token usage for both catalogs and downloads.
Show a summary per file
| File | Description |
|---|---|
| src/specify_cli/extensions.py | Adds `_make_request` and routes catalog/download `urlopen` calls through it to support authenticated GitHub fetches. |
| tests/test_extensions.py | Adds tests validating auth header behavior and that `urlopen` receives a `Request` containing the header. |
| extensions/EXTENSION-USER-GUIDE.md | Updates env var documentation and adds an example for private GitHub-hosted catalogs. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 2
src/specify_cli/extensions.py
    headers: Dict[str, str] = {}
    token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
    if token and any(
        host in url
        for host in ("raw.githubusercontent.com", "github.com", "api.github.com")
    ):
The GitHub-hosted URL check uses substring matching (host in url), which can incorrectly attach the token to non-GitHub hosts (e.g., https://github.com.evil.com/... or https://internal.example.com/path/github.com/...) and violates the stated goal of preventing credential leakage. Parse the URL and compare urlparse(url).hostname (lowercased) against an allowlist of exact hostnames instead of scanning the full URL string.
    - headers: Dict[str, str] = {}
    - token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
    - if token and any(
    -     host in url
    -     for host in ("raw.githubusercontent.com", "github.com", "api.github.com")
    - ):
    + from urllib.parse import urlparse
    + headers: Dict[str, str] = {}
    + token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
    + hostname = (urlparse(url).hostname or "").lower()
    + github_hosts = {"raw.githubusercontent.com", "github.com", "api.github.com"}
    + if token and hostname in github_hosts:
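To make the difference between the two checks concrete, here is a minimal, network-free sketch. The function names `contains_github_substring` and `is_github_host` are hypothetical and exist only for this comparison; the host set is the one used in the suggestion above.

```python
from urllib.parse import urlparse

# Host allowlist taken from the suggested change; exact, lowercase hostnames.
GITHUB_HOSTS = frozenset({"raw.githubusercontent.com", "github.com", "api.github.com"})


def contains_github_substring(url: str) -> bool:
    """Substring check (the original approach): matches anywhere in the URL."""
    return any(host in url for host in GITHUB_HOSTS)


def is_github_host(url: str) -> bool:
    """Exact hostname check: only the parsed host component is compared."""
    hostname = (urlparse(url).hostname or "").lower()
    return hostname in GITHUB_HOSTS


if __name__ == "__main__":
    samples = [
        "https://github.com/org/repo/releases/download/v1/ext.zip",
        "https://github.com.evil.com/org/repo/ext.zip",
        "https://evil.example.com/github.com/org/repo/ext.zip",
        "https://evil.example.com/download?source=https://github.com/org/repo",
    ]
    for url in samples:
        print(url, contains_github_substring(url), is_github_host(url))
```

The substring check accepts all four sample URLs, while the hostname check accepts only the first, which is the behaviour the stated no-leakage goal requires.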
    catalog = self._make_catalog(temp_dir)
    req = catalog._make_request("https://internal.example.com/catalog.json")
    assert "Authorization" not in req.headers
Current tests cover a generic non-GitHub domain, but they don't cover common spoofing cases that would slip through the current substring-based domain check (e.g., https://github.com.evil.com/... or a non-GitHub host whose path/query contains github.com). Add negative tests for these URL shapes to ensure the auth header is never attached outside the intended allowlist.
    def test_make_request_token_not_added_for_github_lookalike_host(self, temp_dir, monkeypatch):
        """Auth header is not attached to non-GitHub hosts that only contain github.com in the hostname."""
        monkeypatch.setenv("GITHUB_TOKEN", "ghp_testtoken")
        catalog = self._make_catalog(temp_dir)
        req = catalog._make_request("https://github.com.evil.com/org/repo/releases/download/v1/ext.zip")
        assert "Authorization" not in req.headers

    def test_make_request_token_not_added_for_non_github_host_with_github_in_path(self, temp_dir, monkeypatch):
        """Auth header is not attached when a non-GitHub host includes github.com only in the URL path."""
        monkeypatch.setenv("GITHUB_TOKEN", "ghp_testtoken")
        catalog = self._make_catalog(temp_dir)
        req = catalog._make_request("https://evil.example.com/github.com/org/repo/releases/download/v1/ext.zip")
        assert "Authorization" not in req.headers

    def test_make_request_token_not_added_for_non_github_host_with_github_in_query(self, temp_dir, monkeypatch):
        """Auth header is not attached when a non-GitHub host includes github.com only in the query string."""
        monkeypatch.setenv("GITHUB_TOKEN", "ghp_testtoken")
        catalog = self._make_catalog(temp_dir)
        req = catalog._make_request("https://evil.example.com/download?source=https://github.com/org/repo/releases/download/v1/ext.zip")
        assert "Authorization" not in req.headers
Did you also make sure presets are covered similarly?
…stname parsing & added 3 new spoofing tests -> Issue # 2037
…direct leakage -> Issue # 2037
I'd prefer to do it at the same time, as it would be the same kind of change. Thanks for working with us on this!
mnriem
left a comment
Let me know if you can deliver it as part of this PR without too much hassle. Thanks!
…s with GITHUB_TOKEN/GH_TOKEN -> Issue # 2037
Pull request overview
Adds GitHub-token authentication for GitHub-hosted extension (and preset) catalog/ZIP downloads so private GitHub repositories can be used as sources without failing unauthenticated.
Changes:
- Add `_make_request()` and `_open_url()` helpers to attach `Authorization: token …` for GitHub-hosted URLs and use them for catalog fetch + ZIP downloads.
- Add unit/integration-style tests asserting auth headers are (or are not) attached based on URL host and env vars.
- Update user docs for extensions and presets to describe `GH_TOKEN`/`GITHUB_TOKEN` behavior and provide private-catalog examples.
Show a summary per file
| File | Description |
|---|---|
| src/specify_cli/extensions.py | Adds request/open helpers and routes catalog fetch + ZIP download through them. |
| src/specify_cli/presets.py | Mirrors the same authenticated request/open behavior for preset catalogs and ZIP downloads. |
| tests/test_extensions.py | Adds test coverage for request header behavior and that catalog fetch / extension download use it. |
| tests/test_presets.py | Adds similar test coverage for preset catalogs and preset downloads. |
| extensions/EXTENSION-USER-GUIDE.md | Updates env var docs + adds private GitHub catalog usage example. |
| presets/README.md | Adds token env var docs + private GitHub catalog usage example. |
Copilot's findings
- Files reviewed: 6/6 changed files
- Comments generated: 5
src/specify_cli/extensions.py
    _github_hosts = {"raw.githubusercontent.com", "github.com", "api.github.com"}

    class _StripAuthOnRedirect(urllib.request.HTTPRedirectHandler):
        def redirect_request(_self, req, fp, code, msg, headers, newurl):
            new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
            if new_req is not None:
                hostname = (urlparse(newurl).hostname or "").lower()
                if hostname not in _github_hosts:
                    new_req.headers.pop("Authorization", None)
            return new_req
_open_url() strips the Authorization header on redirects to any host outside {raw.githubusercontent.com, github.com, api.github.com}. GitHub archive URLs (e.g. https://github.com///archive/refs/tags/.zip) redirect to codeload.github.com, so with a token set this logic will drop the header and private-repo archive downloads will still 404. Consider including codeload.github.com (and any other GitHub-owned redirect targets you need to support) in the allowlist used for redirect decisions, or otherwise preserving Authorization for redirects that remain within trusted GitHub domains.
src/specify_cli/presets.py
    _github_hosts = {"raw.githubusercontent.com", "github.com", "api.github.com"}

    class _StripAuthOnRedirect(urllib.request.HTTPRedirectHandler):
        def redirect_request(_self, req, fp, code, msg, headers, newurl):
            new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
            if new_req is not None:
                hostname = (urlparse(newurl).hostname or "").lower()
                if hostname not in _github_hosts:
                    new_req.headers.pop("Authorization", None)
            return new_req
_open_url() currently removes the Authorization header when redirected to any host not in the small GitHub host set. GitHub “archive/refs” ZIP URLs commonly redirect to codeload.github.com; stripping auth there will break downloading presets/extensions from private repos even when GITHUB_TOKEN/GH_TOKEN is set. Update the redirect allowlist/logic to treat codeload.github.com (and any other GitHub-owned redirect endpoints you want to support) as trusted so auth is preserved when required.
    | Variable | Description | Default |
    |----------|-------------|---------|
    | `SPECKIT_PRESET_CATALOG_URL` | Override the full catalog stack with a single URL (replaces all defaults) | Built-in default stack |
    | `GH_TOKEN` / `GITHUB_TOKEN` | GitHub token for authenticated requests to GitHub-hosted URLs (`raw.githubusercontent.com`, `github.com`, `api.github.com`). Required when your catalog JSON or preset ZIPs are hosted in a private GitHub repository. | None |

    #### Example: Using a private GitHub-hosted catalog

    ```bash
    # Authenticate with a token (gh CLI, PAT, or GITHUB_TOKEN in CI)
    export GITHUB_TOKEN=$(gh auth token)

    # Search a private catalog added via `specify preset catalog add`
    specify preset search my-template

    # Install from a private catalog
    specify preset add my-template
    ```

    The token is attached automatically to requests targeting GitHub domains. Non-GitHub catalog URLs are always fetched without credentials.
The PR title/description focus on ExtensionCatalog, but this change also introduces the same GitHub-token behavior for PresetCatalog and documents it here. Please update the PR title and/or description to reflect that presets are included as well, so reviewers and release notes capture the full scope.
mnriem
left a comment
Please address Copilot feedback. If not applicable, please explain why
…direct safety, shared helper
…correctly when token is blank
@mnriem review items addressed.
Pull request overview
Adds GitHub-token authentication support for extension/preset catalog fetches and ZIP downloads when the target URL is GitHub-hosted, closing the private-repo “404 without auth” gap while avoiding credential leakage to non-GitHub hosts (including on redirects).
Changes:
- Introduce shared GitHub-authenticated HTTP helpers (`build_github_request` / `open_github_url`) with redirect-time auth stripping.
- Route ExtensionCatalog and PresetCatalog network fetch/download code paths through the new helper.
- Add unit/integration tests plus documentation updates for `GH_TOKEN`/`GITHUB_TOKEN` usage and private GitHub-hosted catalogs.
Show a summary per file
| File | Description |
|---|---|
| src/specify_cli/_github_http.py | New shared helper for building authenticated requests and safely handling redirects. |
| src/specify_cli/extensions.py | Extension catalog fetch + extension ZIP download now use authenticated opener when applicable. |
| src/specify_cli/presets.py | Preset catalog fetch + preset ZIP download now use the shared authenticated opener. |
| tests/test_extensions.py | Adds coverage for request-building and ensures auth headers are passed through fetch/download call sites. |
| tests/test_presets.py | Adds parallel coverage for PresetCatalog request-building and auth propagation. |
| extensions/EXTENSION-USER-GUIDE.md | Updates env var documentation and adds private GitHub-hosted catalog example. |
| presets/README.md | Documents token usage for private GitHub-hosted preset catalogs and downloads. |
Copilot's findings
Comments suppressed due to low confidence (4)
src/specify_cli/extensions.py:1737
`import urllib.request` is now unused in this try-block since the code uses `self._open_url(...)` instead of `urllib.request.urlopen(...)`. Please remove the unused import to avoid ruff F401.

    import urllib.request
    import urllib.error
    with self._open_url(catalog_url, timeout=10) as response:

src/specify_cli/extensions.py:1881
Since this method now downloads via `self._open_url(...)`, the local `import urllib.request` near the top of the function is unused and will fail ruff (F401). Remove the unused `urllib.request` import (you likely still need `urllib.error` for the URLError handler).

    # Download the ZIP file
    try:
        with self._open_url(download_url, timeout=60) as response:
            zip_data = response.read()

src/specify_cli/presets.py:1479
`import urllib.request` / `import urllib.error` are now unused here because the code uses `self._open_url(...)` instead of `urllib.request.urlopen(...)`. Remove the unused imports to avoid ruff F401.

    try:
        import urllib.request
        import urllib.error
        with self._open_url(catalog_url, timeout=10) as response:
            catalog_data = json.loads(response.read())

src/specify_cli/presets.py:1640
Now that downloads use `self._open_url(...)`, the local `import urllib.request` in this method is unused and will be flagged by ruff (F401). Remove the unused import (keep `urllib.error` if you still catch `urllib.error.URLError`).

    try:
        with self._open_url(download_url, timeout=60) as response:
            zip_data = response.read()

- Files reviewed: 7/7 changed files
- Comments generated: 2
…ons and unused imports
Pull request overview
Adds GitHub-token–aware request handling for extension/preset catalog fetches and downloads so private GitHub-hosted catalogs/ZIPs work when GITHUB_TOKEN/GH_TOKEN is set, while aiming to prevent credential leakage to non-GitHub hosts.
Changes:
- Introduces shared GitHub-authenticated urllib helpers (`build_github_request`, `open_github_url`) and wires them into `ExtensionCatalog` and `PresetCatalog` network paths.
- Adds unit/integration tests asserting auth headers are attached for GitHub hosts and not attached for non-GitHub/lookalike URLs.
- Updates user docs to describe token usage for private GitHub-hosted catalogs and downloads.
Show a summary per file
| File | Description |
|---|---|
| src/specify_cli/_github_http.py | New shared helpers for attaching GitHub token headers and handling redirects. |
| src/specify_cli/extensions.py | Routes catalog fetch and ZIP download through GitHub-auth-aware opener helper. |
| src/specify_cli/presets.py | Routes preset catalog fetch and pack download through GitHub-auth-aware opener helper. |
| tests/test_extensions.py | Adds coverage for request building and header propagation in extension flows. |
| tests/test_presets.py | Adds coverage for request building and header propagation in preset flows. |
| extensions/EXTENSION-USER-GUIDE.md | Documents token behavior for GitHub-hosted catalogs/ZIPs and adds example. |
| presets/README.md | Documents token behavior for GitHub-hosted preset catalogs/ZIPs and adds example. |
Copilot's findings
- Files reviewed: 7/7 changed files
- Comments generated: 3
    new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
    if new_req is not None:
        hostname = (urlparse(newurl).hostname or "").lower()
        if hostname not in GITHUB_HOSTS:
            new_req.headers.pop("Authorization", None)
_StripAuthOnRedirect.redirect_request() calls super().redirect_request() first; urllib’s default redirect implementation strips the Authorization header on cross-host redirects. That means redirects between allowed GitHub hosts (e.g. github.com → codeload.github.com) will lose the token, contradicting the docstring and likely breaking private repo archive downloads. Consider capturing the original Authorization header, calling super(), then re-attaching it when the redirect target hostname is still in GITHUB_HOSTS (and explicitly removing it otherwise). Adding a focused unit test for the github.com → codeload.github.com redirect case would prevent regressions.
    - new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
    - if new_req is not None:
    -     hostname = (urlparse(newurl).hostname or "").lower()
    -     if hostname not in GITHUB_HOSTS:
    -         new_req.headers.pop("Authorization", None)
    + original_auth = req.get_header("Authorization")
    + new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
    + if new_req is not None:
    +     hostname = (urlparse(newurl).hostname or "").lower()
    +     if hostname in GITHUB_HOSTS:
    +         if original_auth:
    +             new_req.add_unredirected_header("Authorization", original_auth)
    +     else:
    +         new_req.headers.pop("Authorization", None)
    +         new_req.unredirected_hdrs.pop("Authorization", None)
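The capture-and-re-attach idea in this suggestion can be exercised without network access by calling `redirect_request` directly. The sketch below is an illustration under assumptions, not the PR's code: the class name `KeepAuthWithinGitHub`, the helper `simulate_redirect`, the example URLs, and the placeholder token are all hypothetical, and the host set simply mirrors the four hosts named in this review. It clears any `Authorization` header on the redirected request first and then re-attaches it only for allowlisted hosts, so the outcome does not depend on what the base handler does with the header.

```python
import urllib.request
from urllib.parse import urlparse

# Assumed allowlist: the three documented hosts plus codeload.github.com.
GITHUB_HOSTS = frozenset({
    "raw.githubusercontent.com",
    "github.com",
    "api.github.com",
    "codeload.github.com",
})


class KeepAuthWithinGitHub(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: keep Authorization only for allowlisted hosts."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Capture the header before the base class builds the new request.
        original_auth = req.get_header("Authorization")
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is None:
            return None
        # Start from a clean slate regardless of what was copied over.
        new_req.headers.pop("Authorization", None)
        new_req.unredirected_hdrs.pop("Authorization", None)
        hostname = (urlparse(newurl).hostname or "").lower()
        if original_auth and hostname in GITHUB_HOSTS:
            new_req.add_unredirected_header("Authorization", original_auth)
        return new_req


def simulate_redirect(target_url: str) -> urllib.request.Request:
    """Build a request carrying a placeholder token and redirect it (HTTP 302)."""
    req = urllib.request.Request(
        "https://github.com/org/repo/archive/refs/tags/v1.zip",
        headers={"Authorization": "token example-placeholder"},
    )
    return KeepAuthWithinGitHub().redirect_request(req, None, 302, "Found", {}, target_url)


if __name__ == "__main__":
    kept = simulate_redirect("https://codeload.github.com/org/repo/zip/refs/tags/v1")
    dropped = simulate_redirect("https://evil.example.com/ext.zip")
    print(kept.get_header("Authorization"), dropped.get_header("Authorization"))
```

A test along these lines covers the `github.com` → `codeload.github.com` case the comment asks for, alongside the negative case of a redirect to an untrusted host.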
      |----------|-------------|---------|
      | `SPECKIT_CATALOG_URL` | Override the full catalog stack with a single URL (backward compat) | Built-in default stack |
    - | `GH_TOKEN` / `GITHUB_TOKEN` | GitHub API token for downloads | None |
    + | `GH_TOKEN` / `GITHUB_TOKEN` | GitHub token for authenticated requests to GitHub-hosted URLs (`raw.githubusercontent.com`, `github.com`, `api.github.com`). Required when your catalog JSON or extension ZIPs are hosted in a private GitHub repository. | None |
Docs enumerate GitHub hosts as raw.githubusercontent.com, github.com, and api.github.com, but the implementation also treats codeload.github.com as GitHub-owned (and tests rely on it). Please either include codeload.github.com in this list or adjust the wording so the parenthetical isn’t interpreted as exhaustive.
    - | `GH_TOKEN` / `GITHUB_TOKEN` | GitHub token for authenticated requests to GitHub-hosted URLs (`raw.githubusercontent.com`, `github.com`, `api.github.com`). Required when your catalog JSON or extension ZIPs are hosted in a private GitHub repository. | None |
    + | `GH_TOKEN` / `GITHUB_TOKEN` | GitHub token for authenticated requests to GitHub-hosted URLs (`raw.githubusercontent.com`, `github.com`, `api.github.com`, `codeload.github.com`). Required when your catalog JSON or extension ZIPs are hosted in a private GitHub repository. | None |
    | Variable | Description | Default |
    |----------|-------------|---------|
    | `SPECKIT_PRESET_CATALOG_URL` | Override the full catalog stack with a single URL (replaces all defaults) | Built-in default stack |
    | `GH_TOKEN` / `GITHUB_TOKEN` | GitHub token for authenticated requests to GitHub-hosted URLs (`raw.githubusercontent.com`, `github.com`, `api.github.com`). Required when your catalog JSON or preset ZIPs are hosted in a private GitHub repository. | None |
Docs list GitHub-hosted URLs as raw.githubusercontent.com, github.com, and api.github.com, but the code also supports codeload.github.com (GitHub archive redirect target). Please add codeload.github.com here or rephrase to avoid implying the list is complete.
    - | `GH_TOKEN` / `GITHUB_TOKEN` | GitHub token for authenticated requests to GitHub-hosted URLs (`raw.githubusercontent.com`, `github.com`, `api.github.com`). Required when your catalog JSON or preset ZIPs are hosted in a private GitHub repository. | None |
    + | `GH_TOKEN` / `GITHUB_TOKEN` | GitHub token for authenticated requests to GitHub-hosted URLs (`raw.githubusercontent.com`, `github.com`, `api.github.com`, `codeload.github.com`). Required when your catalog JSON or preset ZIPs are hosted in a private GitHub repository. | None |
mnriem
left a comment
Please address Copilot feedback. If not applicable, please explain why
Description
Fixes #2037. Closes the authentication gap introduced when multi-catalog support landed in #1707.

Before this change, all network requests in `ExtensionCatalog` used bare `urllib.request.urlopen(url)` with no headers. Any catalog or extension ZIP hosted in a private GitHub repository would silently fail with HTTP 404, regardless of whether `GITHUB_TOKEN` or `GH_TOKEN` was set in the environment.

This PR adds a `_make_request(url)` helper on `ExtensionCatalog` that attaches an `Authorization: token <value>` header when:
- `GITHUB_TOKEN` or `GH_TOKEN` is present in the environment, and
- the URL is GitHub-hosted (`raw.githubusercontent.com`, `github.com`, or `api.github.com`)

Non-GitHub URLs are always fetched without credentials to prevent token leakage to third-party hosts.
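For readers who want to see the two conditions side by side, here is a rough standalone sketch of such a request builder. It is not the PR's implementation: the function name `make_request`, the injectable `environ` parameter (added so the example runs without touching the real environment), and the whitespace stripping of token values are assumptions made for illustration.

```python
import os
import urllib.request
from typing import Mapping, Optional
from urllib.parse import urlparse

# Hosts listed in the description above.
GITHUB_HOSTS = frozenset({"raw.githubusercontent.com", "github.com", "api.github.com"})


def make_request(url: str, environ: Optional[Mapping[str, str]] = None) -> urllib.request.Request:
    """Build a Request, adding a token header only for GitHub-hosted URLs."""
    env = os.environ if environ is None else environ
    # GITHUB_TOKEN takes precedence; GH_TOKEN is the fallback.
    # Assumption: blank or whitespace-only values count as "not set".
    token = (env.get("GITHUB_TOKEN") or "").strip() or (env.get("GH_TOKEN") or "").strip()
    hostname = (urlparse(url).hostname or "").lower()
    headers = {}
    if token and hostname in GITHUB_HOSTS:
        headers["Authorization"] = f"token {token}"
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    catalog_url = "https://raw.githubusercontent.com/org/repo/main/catalog.json"
    req = make_request(catalog_url, {"GITHUB_TOKEN": "first", "GH_TOKEN": "second"})
    print(req.get_header("Authorization"))
    req = make_request("https://internal.example.com/catalog.json", {"GITHUB_TOKEN": "first"})
    print(req.get_header("Authorization"))
```

Passing the environment as a mapping keeps the sketch easy to test; production code would more likely read `os.environ` directly.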
The three affected call sites are:
- `_fetch_single_catalog`: fetches catalog JSON from a configured catalog URL
- `fetch_catalog`: legacy single-catalog path used when `SPECKIT_CATALOG_URL` is set
- `download_extension`: downloads extension ZIP from a release asset URL

No behavior change for users without a token set: the code path is identical to before.

Documentation in `EXTENSION-USER-GUIDE.md` has been updated: the existing `GH_TOKEN`/`GITHUB_TOKEN` table entry (which described the token as "for downloads" only) now accurately reflects that it covers catalog fetches as well, and a private-catalog usage example has been added.

Testing
- `uv run specify --help`: CLI loads correctly, all commands present
- … `main` before this change:
  - `TestManifestPathTraversal::test_record_file_rejects_absolute_path`
  - `TestCommandRegistrar::test_codex_skill_registration_uses_fallback_script_variant_without_init_options`
- … `TestExtensionCatalog` in `tests/test_extensions.py`:
  - … `_make_request`: no-token path, `GITHUB_TOKEN`, `GH_TOKEN` fallback, precedence when both are set, non-GitHub URL never gets header (security), `api.github.com` domain
  - … `urlopen` and assert the captured `Request` object carries the auth header: one for `_fetch_single_catalog`, one for `download_extension`

AI Disclosure
This PR was implemented with AI assistance via Claude Code.