Skip to content

Add Sourcepoint first-party CMP integration#625

Open
ChristianPavilonis wants to merge 14 commits intomainfrom
feature/sourcepoint-integration
Open

Add Sourcepoint first-party CMP integration#625
ChristianPavilonis wants to merge 14 commits intomainfrom
feature/sourcepoint-integration

Conversation

@ChristianPavilonis
Copy link
Copy Markdown
Collaborator

@ChristianPavilonis ChristianPavilonis commented Apr 8, 2026

Summary

  • Adds a new Sourcepoint CMP (Consent Management Platform) first-party proxy integration that routes cdn.privacy-mgmt.com and geo.privacymanager.io traffic through Trusted Server, eliminating third-party requests for consent management assets.
  • Implements four rewriting layers (HTML attributes, JS response bodies, runtime config trap, client-side DOM guard) to ensure comprehensive URL coverage across static markup, dynamically loaded scripts, and Next.js hydration payloads.
  • Ships disabled by default with full TOML configuration, integration documentation, 14 Rust unit tests, and 6 TypeScript tests.

Changes

File Change
crates/trusted-server-core/src/integrations/sourcepoint.rs New ~860-line integration module: route registration, CDN/geo proxy handlers, HTML attribute rewriter, JS body rewriter (rewrite_script_content), head injector (window._sp_ property trap), Accept-Encoding scoping, backend selection, and 14 unit tests
crates/trusted-server-core/src/integrations/mod.rs Register sourcepoint module and wire it into the integration registry
crates/js/lib/src/integrations/sourcepoint/index.ts TypeScript entry point that bootstraps the client-side script guard
crates/js/lib/src/integrations/sourcepoint/script_guard.ts Client-side DOM insertion dispatcher handler that intercepts dynamically inserted <script>/<link> elements pointing at Sourcepoint CDN and rewrites them to first-party paths
crates/js/lib/test/integrations/sourcepoint/script_guard.test.ts 6 Vitest tests covering the script guard (rewrites, skips, and teardown)
docs/guide/integrations/sourcepoint.md Integration documentation: overview, setup steps, config reference, architecture
docs/guide/integrations-overview.md Updated overview with Sourcepoint in the Mermaid flowchart, performance characteristics table, and environment variables section
trusted-server.toml Added [integrations.sourcepoint] config stanza (disabled by default)

Closes

Closes #145
Closes #344
Closes #345

Test plan

  • cargo test --workspace
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • JS tests: cd crates/js/lib && npx vitest run
  • JS format: cd crates/js/lib && npm run format
  • Docs format: cd docs && npm run format
  • WASM build: cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1
  • Manual testing via fastly compute serve

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect("should ...")
  • Uses log macros (not println!)
  • New code has tests
  • No secrets or credentials committed

@ChristianPavilonis ChristianPavilonis self-assigned this Apr 8, 2026
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts Fixed
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts Fixed
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts Fixed
@aram356 aram356 requested review from aram356 and prk-Jr April 9, 2026 16:36
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Adds a well-structured Sourcepoint CMP first-party proxy with four rewriting layers (HTML attributes, JS body, runtime config trap, client-side DOM guard). The implementation follows existing integration patterns closely, has 14 Rust + 6 JS tests, and ships disabled by default. A few items need attention before merge.

Blocking

🔧 wrench

  • take_body_str() panics on non-UTF-8 upstream responses: response.take_body_str() at line 400 will panic if the upstream CDN returns invalid UTF-8 with a JS Content-Type. Read bytes and attempt conversion with a fallback instead. (sourcepoint.rs:400)
  • CI failure — format-typescript: Prettier check is failing on test/integrations/sourcepoint/script_guard.test.ts. Must be fixed before merge.

❓ question

  • Redirect Location headers not rewritten: The JS body rewrite is gated on status == 200. If the CDN returns a 3xx redirect with a Location pointing to cdn.privacy-mgmt.com, the browser follows it back to the third-party CDN, defeating the proxy. (sourcepoint.rs:394)

Non-blocking

🤔 thinking

  • Upstream response headers forwarded verbatim: Set-Cookie and other headers from cdn.privacy-mgmt.com pass through to the client on the non-rewritten path. Same as Didomi, but worth confirming this is intended. (sourcepoint.rs:418)
  • Regex matches mismatched quotes: The CDN URL pattern can match 'url" (single open, double close). Extremely unlikely in minified JS but could produce malformed output. (sourcepoint.rs:65)

♻️ refactor

  • Redundant rewrite_sdk guard: Checked in both handles_attribute() and rewrite() — the second is unreachable. (sourcepoint.rs:437)
  • Head injector config property names: Hardcoded Sourcepoint config keys could go stale. A test asserting known property names appear in the generated script would catch omissions. (sourcepoint.rs:467)

🌱 seedling

  • No body size limit: The body is read entirely into memory with no upper bound. A Content-Length check would prevent OOM from unexpected large responses. (sourcepoint.rs:400)
  • Missing validation test for invalid cdn_origin: No test proves that register() fails when cdn_origin is not a valid URL. (sourcepoint.rs:153)

⛏ nitpick

  • Manual [sourcepoint] log prefix: Other integrations rely on the log crate's module path rather than a manual bracketed prefix. (sourcepoint.rs:355)

CI Status

  • cargo fmt: PASS
  • cargo test: PASS
  • vitest: PASS
  • format-typescript: FAIL (Prettier issue in script_guard.test.ts)
  • format-docs: PASS
  • browser integration tests: PASS
  • integration tests: PASS
  • CodeQL: FAIL (likely infra)

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Copy link
Copy Markdown
Collaborator

@prk-Jr prk-Jr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed commit a211eb0. Three blocking issues require fixes before merge; the rest are non-blocking.


Blocking (must fix before merge)

B1 — CI: format-typescript still failing on current HEAD

The latest commit (a211eb02) re-introduced a Prettier violation in crates/js/lib/test/integrations/sourcepoint/script_guard.test.ts. CI's format-typescript check will fail on this commit. Fix with:

cd crates/js/lib && npm run format -- --write

then commit the result.

B2 — SSRF via cdn_origin (inline comment on line 96)

See inline comment. A syntactically valid but host-unrestricted cdn_origin allows proxying to any IP including 169.254.169.254 (cloud metadata). Needs a custom host validator.

B3 — PUT/PATCH unreachable (inline comment on line 337)

routes() only registers GET and POST; the PUT/PATCH arms in handle() are dead code. Either register the methods or remove the arms.


Important (non-blocking)

I1 — Single-quoted '/unified/' missed by SP_ORIGIN_UNIFIED_PATTERN (inline on line 81) — fix is a one-character change to the regex character class.

I2 — BackendConfig::from_url per request (inline on line 357) — pre-compute in new() following the pattern in aps.rs / prebid.rs.

I3 — PR checklist says "Uses tracing macros" but codebase uses log crate

The checklist item reads "Uses tracing macros (not println!)" but CLAUDE.md specifies log. The source correctly uses log::info! throughout. The checklist entry should read "Uses log macros (not println!)" to avoid confusion for future contributors.

I4 — JS-rewrite path drops all upstream headers (inline on line 403) — Response::new() discards Vary, CORS, and security headers. Start from response.clone_without_body() instead.

I5 — No test for build_target_url with a path-bearing cdn_origin (inline on line 337) — set_path silently drops any path prefix in cdn_origin. Should be tested and either fixed or documented as an explicit constraint.


Suggestions (non-blocking)

S1 — URL normalisation logic duplicated across Rust and TypeScript

parse_sourcepoint_url in sourcepoint.rs and normalizeSourcepointUrl in script_guard.ts implement identical protocol-relative URL normalisation in parallel. Not a defect, but a cross-reference comment in each file would prevent future fixes being applied to only one side.

S2 — Redundant rewrite_sdk guard in rewrite() (inline on line 437) — unreachable because handles_attribute already gates on it. Either remove with a comment or document the intentional belt-and-suspenders.

S3 — No test for single-quoted CDN URLs (inline on line 81)

S4 — Unchecked WASM build in PR checklist

The test plan has an unchecked - [ ] WASM build item, but CI's WASM build passes. The checkbox should be checked to accurately reflect build status.

S5 — register missing # Examples doc section (inline on line 314) — CLAUDE.md requires this for all public API functions.

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
ChristianPavilonis and others added 3 commits April 10, 2026 10:48
- Replace #[validate(url)] with a custom validator that restricts
  cdn_origin to *.privacy-mgmt.com hosts, preventing SSRF via
  arbitrary origins (e.g. cloud metadata endpoints).
- Remove unreachable PUT/PATCH arms from the request body match
  since routes() only registers GET and POST.
- Fix Prettier formatting in script_guard.test.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace take_body_str() with take_body_bytes() + String::from_utf8()
  to avoid panicking on non-UTF-8 upstream responses.
- Rewrite Location headers on 3xx redirects that point to
  cdn.privacy-mgmt.com so browsers stay on the first-party proxy.
- Preserve upstream CORS headers on the JS-rewrite path instead of
  discarding them when building a fresh Response.
- Extend SP_ORIGIN_UNIFIED_PATTERN to match both single- and
  double-quoted "/unified/" chunk paths, preserving the original
  quote character in the replacement.
- Normalise log prefixes from [sourcepoint] to Sourcepoint: for
  consistency with APS/Prebid style.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove redundant rewrite_sdk check from rewrite() since
  handles_attribute() already gates on it; update test to verify
  the guard at the handles_attribute level.
- Add # Examples section to register() per documentation standards.
- Add tests for cdn_origin validation (rejects non-privacy-mgmt.com
  hosts, accepts valid origins).
- Add test for single-quoted origin+'/unified/' rewrite pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ChristianPavilonis
Copy link
Copy Markdown
Collaborator Author

Review feedback addressed

All blocking and important findings have been fixed across three commits. Replies posted on each inline thread.

Blocking (fixed)

  • SSRF guard — Custom validate_cdn_origin() restricts host to *.privacy-mgmt.com
  • PUT/PATCH dead code — Removed unreachable arms
  • Prettier — Formatting fixed

Important non-blocking (fixed)

  • UTF-8 safetytake_body_bytes() + String::from_utf8() with graceful fallback
  • Redirect rewriting — 3xx Location headers containing CDN host are rewritten
  • CORS headers — JS-rewrite path now preserves upstream CORS headers
  • Single-quote regexSP_ORIGIN_UNIFIED_PATTERN handles both ' and " quotes
  • Log prefix — Normalised to Sourcepoint: style

Suggestions (fixed)

  • Removed redundant rewrite_sdk guard, added invariant comment
  • Added # Examples to register() docs
  • Added validation tests for cdn_origin
  • Added test for single-quoted unified path

Intentionally skipped (with rationale in thread replies)

  • Regex mismatched quotes — documented limitation, no real-world impact
  • BackendConfig::from_url per-request — matches APS/Prebid pattern
  • Body size limit — future enhancement
  • Config property name test — covered by existing head injector test

All CI checks pass locally: cargo fmt, clippy, cargo test, vitest, Prettier.

Refactor normalizeSourcepointUrl to remove the bare-domain startsWith
check that triggered CodeQL "Incomplete URL substring sanitization"
alerts. The host === exact match was already the security boundary;
now the normalization layer no longer references the CDN hostname at
all, eliminating the static analysis finding.

Add a Content-Length guard (5 MB) before reading upstream response
bodies into memory for JavaScript rewriting, preventing unbounded
memory consumption from unexpectedly large responses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@aram356 aram356 linked an issue Apr 13, 2026 that may be closed by this pull request
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Well-executed revision — every prior blocker is addressed and the four-layer rewriting architecture (HTML attrs / JS body / window._sp_ trap / client DOM guard) is cleanly separated. Two remaining correctness issues in the upstream response handling need fixes before merge: the body-size guard is bypassed when Content-Length is absent, and the UTF-8 fallback path silently drops upstream headers. A few non-blocking refinements around redirect rewriting, config scope, and test coverage follow.

Blocking

🔧 wrench

  • Body-size guard bypassed on chunked/unknown-length responsesMAX_REWRITE_BODY_SIZE only applies when Content-Length is present; otherwise take_body_bytes() reads unbounded (sourcepoint.rs:456-470)
  • UTF-8 fallback drops upstream headers — the fallback constructs a fresh Response with only status + Content-Type, discarding Set-Cookie, CORS, Cache-Control, Vary, etc., diverging from the passthrough path (sourcepoint.rs:480-488)

Non-blocking

♻️ refactor

  • Redirect Location rewrite is a literal String::replace — misses protocol-relative //cdn.privacy-mgmt.com/... values (sourcepoint.rs:430-438)
  • validate_cdn_origin does not check schemeftp://cdn.privacy-mgmt.com passes validation and only fails at the first proxied request (sourcepoint.rs:135-150)

🤔 thinking

  • Config/rewriter host scope mismatchvalidate_cdn_origin accepts *.privacy-mgmt.com but rewriters hardcode cdn.privacy-mgmt.com; configuring a different subdomain silently half-enables the integration (sourcepoint.rs:143, 69, 197, 598)
  • Rewritten-JS path hard-codes Cache-Control — diverges from the passthrough path's apply_cache_headers policy without explanation (sourcepoint.rs:498-501)
  • Property trap only catches reassignment of window._sp_ — nested mutation (window._sp_.config.X = ...) is not patched; also s.replace() in the JS helper is first-occurrence only (sourcepoint.rs:568-600)

🌱 seedling

  • copy_headers forwards Authorization — credential-leak footgun if publishers reuse bearer tokens across origins (sourcepoint.rs:222-233)
  • No direct test for handle() — redirect rewriting, UTF-8 fallback, Content-Length guard, Content-Type gating all lack unit coverage (sourcepoint.rs:372-522)

👍 praise

  • Solid response to the prior review — every prior 🔧/❓ addressed with well-chosen fixes; four-layer rewriting architecture is cleanly separated and well documented

CI Status

  • cargo fmt: PASS
  • cargo clippy: PASS
  • cargo test: PASS
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: PASS
  • browser integration tests: PASS
  • CodeQL: PASS

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
…t rewrite

- Skip JS rewrite when Content-Length is absent (chunked/unknown) to
  prevent unbounded memory reads
- Reuse original response on UTF-8 fallback to preserve upstream headers
- Extract redirect Location rewriting into a URL-parsing helper that
  handles absolute, protocol-relative, and relative redirects
- Tighten cdn_origin validation to exactly cdn.privacy-mgmt.com and
  reject non-HTTP(S) schemes
- Drop Authorization from forwarded headers to prevent credential leakage
- Document Cache-Control divergence and property trap limitations
- Add 7 new tests for validation and redirect rewriting
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Round-3 blockers are all correctly addressed — UTF-8 fallback now reuses the original response (preserving headers), Content-Length-missing short-circuits the rewrite before take_body_bytes, rewrite_redirect_location handles absolute / protocol-relative / relative via Url::join, validate_cdn_origin checks scheme and pins the host, and Authorization is excluded from forwarded headers. No remaining 🔧/❓ findings. Submitting as change-request per author request; the items below are refinements (🤔/♻️/🌱/⛏) rather than correctness blockers — each can be resolved or deferred to a follow-up at the author's discretion.

Non-blocking

♻️ refactor

  • apply_cache_headers invoked at five return sites — redirect branch, CL-too-large branch, CL-missing branch, UTF-8-fallback branch, and the fallthrough (sourcepoint.rs:476, 505, 513, 528, 567). Restructuring handle() around a single exit point (or wrapping the proxied Response in a post-handle adapter) would remove the per-branch bookkeeping. Low priority.

🌱 seedling

  • No direct test for handle() (sourcepoint.rs:414-569) — redirect rewriting, UTF-8 fallback, CL-missing skip, CL-too-large skip, and non-JS passthrough branches all lack unit coverage. Acknowledged in the round-3 response; tracking here for follow-up.
  • CSP interaction not documented (docs/guide/integrations/sourcepoint.md) — publishers with Content-Security-Policy: script-src https://cdn.privacy-mgmt.com will see rewritten first-party scripts blocked the moment they enable the integration. A note under "HTML Rewriting" calling out that CSP must allow the first-party host would save an incident report.

CI Status

  • cargo fmt: PASS
  • cargo clippy: PASS
  • cargo test: PASS
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: PASS
  • browser integration tests: PASS
  • CodeQL: PASS

Comment thread crates/js/lib/src/integrations/sourcepoint/index.ts
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts
@ChristianPavilonis ChristianPavilonis removed the request for review from prk-Jr April 21, 2026 13:59
@ChristianPavilonis
Copy link
Copy Markdown
Collaborator Author

Implemented Sourcepoint follow-up updates and pushed commit.

What I changed:

  • Added __tsjs_sourcepoint runtime config bootstrap in Sourcepoint head inject and gated client-side guard installation with it.
  • Sourcepoint head injection now always emits config script; runtime trap is only included when rewrite_sdk is enabled.
  • Added strict cdn_origin validation to reject URL paths.
  • Preserved response headers on JS rewrite success path while normalizing encoding headers and content type.
  • Added regression coverage for the above behaviors.

I also resolved remaining open review threads and re-requested review from @prk-Jr.

Validation:

  • cargo test --workspace
  • cargo clippy --package trusted-server-core --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • cd crates/js/lib && npx vitest run
  • cd crates/js/lib && npm run -s format -- --check

Commit: e2fdb0c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants