🐎 Laocoön — Supply Chain Guard

"Timeo Danaos et dona ferentes" ("I fear the Greeks, even bearing gifts"), Laocoön on accepting gifts from strangers.

A GitHub Action that inspects dependency lockfile changes in a pull request and uses an LLM to flag likely supply chain attacks — specifically freshly-injected trojans hiding inside an otherwise-trusted package upgrade.

It is ecosystem-pluggable and ships with support for three ecosystems:

| Ecosystem | Lockfiles | Registry | What gets diffed |
|---|---|---|---|
| Elixir / Hex | `mix.lock` | hex.pm | published `.tar` (inner `contents.tar.gz`) |
| JavaScript / npm | `package-lock.json`, `npm-shrinkwrap.json`, `yarn.lock`, `pnpm-lock.yaml` | registry.npmjs.org | published `.tgz` (strips `package/`) |
| Python / PyPI | `poetry.lock`, `uv.lock`, `Pipfile.lock`, pinned `requirements.txt` | pypi.org | published sdist `.tar.gz` |

Each ecosystem surfaces its own prime attack surface to the model — Hex/release hooks, npm lifecycle scripts (postinstall, …), Python setup.py / build hooks.
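For npm, the most telling signal is usually a new or changed install-time lifecycle script, because npm runs it automatically during `npm install`. The sketch below shows that comparison, assuming both `package.json` files have already been extracted from the soaked-baseline and adopted tarballs. `installHookChanges` and its return shape are hypothetical illustrations, not the action's internal API.

```javascript
// Lifecycle scripts npm runs automatically when installing a registry dependency.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"];

/**
 * Hypothetical helper: compare install-time scripts between the soaked
 * baseline and the adopted version. Inputs are raw package.json strings;
 * pass null for baseJson when no soaked baseline exists.
 */
function installHookChanges(baseJson, headJson) {
  const before = (baseJson ? JSON.parse(baseJson).scripts : undefined) ?? {};
  const after = JSON.parse(headJson).scripts ?? {};
  return INSTALL_HOOKS
    .filter((hook) => before[hook] !== after[hook])
    .map((hook) => ({ hook, before: before[hook] ?? null, after: after[hook] ?? null }));
}

// Usage: a trojaned release that quietly adds a postinstall script.
const base = JSON.stringify({ scripts: { test: "jest" } });
const head = JSON.stringify({ scripts: { test: "jest", postinstall: "node setup.js" } });
console.log(installHookChanges(base, head));
// → one change: postinstall added (before: null, after: "node setup.js")
```

This sketch ignores npm's implicit `node-gyp rebuild` for packages that ship a `binding.gyp` without their own `install`/`preinstall` script; a fuller check would treat that as an install hook too.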

The core idea: soaked-baseline diffing

The biggest real-world supply chain risk isn't a long-standing malicious package — those get caught by the ecosystem over time. It's a freshly published trojan release of a trusted package (maintainer account takeover, compromised CI) that you pick up early, before anyone notices.

So instead of diffing the version you had against the version you're getting, Laocoön diffs against a soaked baseline:

the newest release on the same version lineage that is older than the soak window (default 60 days).

A version that's been public for 60+ days has had time for the ecosystem to catch a compromise, so it's treated as presumed-clean. Everything in the diff from that baseline to the adopted version is novel, un-vetted surface — exactly where an injected payload would live.

Lineage-aware: adopting 1.1.2 diffs against the newest soaked 1.1.x (e.g. 1.1.0), not 1.2.x, so you see the real intended changes on the line you're tracking, not unrelated churn. Fallback ladder: same major.minor → same major → any → (none → flagged in the comment).
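The selection rule and fallback ladder can be sketched as follows. This is a minimal illustration under stated assumptions, not the action's implementation: it handles plain `major.minor.patch` versions only (no pre-release ordering), treats "newest" as the highest eligible version, and additionally assumes a baseline should never be newer than the adopted version. `selectSoakedBaseline` and the `{ version, publishedAt }` release shape are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Simplified parser: "1.2.3" -> { major: 1, minor: 2, patch: 3 }.
// Pre-release/build suffixes are not ordered correctly by this sketch.
function parseVersion(v) {
  const [major = 0, minor = 0, patch = 0] = v.split(".").map((n) => parseInt(n, 10) || 0);
  return { major, minor, patch };
}

function compareVersions(a, b) {
  const x = parseVersion(a);
  const y = parseVersion(b);
  return x.major - y.major || x.minor - y.minor || x.patch - y.patch;
}

// releases: [{ version, publishedAt }] with ISO-8601 timestamps.
function selectSoakedBaseline(releases, adopted, { soakDays = 60, now = Date.now() } = {}) {
  const cutoff = now - soakDays * DAY_MS;
  const target = parseVersion(adopted);
  // Soaked: public for at least `soakDays`, and not newer than the adopted version.
  const soaked = releases.filter(
    (r) => Date.parse(r.publishedAt) <= cutoff && compareVersions(r.version, adopted) <= 0,
  );
  // Fallback ladder: same major.minor -> same major -> any.
  const ladder = [
    ["same major.minor", (v) => v.major === target.major && v.minor === target.minor],
    ["same major", (v) => v.major === target.major],
    ["any", () => true],
  ];
  for (const [tier, onLineage] of ladder) {
    const candidates = soaked.filter((r) => onLineage(parseVersion(r.version)));
    if (candidates.length > 0) {
      candidates.sort((a, b) => compareVersions(b.version, a.version)); // newest first
      return { baseline: candidates[0].version, tier };
    }
  }
  return { baseline: null, tier: "none" }; // no soaked baseline: flag it in the comment
}

// Example: adopting 1.1.2 on 2025-06-01 with the default 60-day window.
const releases = [
  { version: "1.0.0", publishedAt: "2024-01-10T00:00:00Z" },
  { version: "1.1.0", publishedAt: "2025-02-01T00:00:00Z" },
  { version: "1.1.1", publishedAt: "2025-05-20T00:00:00Z" }, // too fresh to be a baseline
  { version: "1.1.2", publishedAt: "2025-05-30T00:00:00Z" }, // the adopted release
];
const now = Date.parse("2025-06-01T00:00:00Z");
console.log(selectSoakedBaseline(releases, "1.1.2", { now }));
// → { baseline: '1.1.0', tier: 'same major.minor' }
```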

How it works

1. Detect which known lockfiles changed in the PR (mix.lock, …), diffing PR base → head (the cumulative net change).
2. For each changed or added dependency: select the soaked baseline, download and unpack the published artifacts (the actual tarballs, not the git repo, which can differ), and diff the file trees.
3. Gather registry signals: release age, downloads, owners, publisher (account-takeover check), repository link, retirement/yank.
4. Binaries and minified blobs are never sent to the LLM, because they can't be statically reviewed. A binary that is new or changed vs the soaked baseline is flagged as elevated risk rather than silently passed as clean.
5. Two-stage LLM cascade: a cheap model triages every PR; a stronger model re-reviews only when triage flags risk, install/build hooks changed, a binary changed, or no soaked baseline exists. It is built on the Vercel AI SDK with schema-validated structured output, so the provider is swappable; see Models & providers.
6. Post or update one PR comment plus a job summary; fail the check when risk ≥ fail-on.
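The escalation rule and the `fail-on` gate can be sketched as small pure functions. This is a minimal illustration with hypothetical names; two assumptions are labeled in the comments: "triage flags risk" is taken to mean any triage level above `none`, and `fail-on: none` is taken to disable failing rather than fail every run.

```javascript
// Ascending order, matching the levels accepted by `fail-on`.
const RISK_LEVELS = ["none", "low", "medium", "high", "critical"];

function rank(level) {
  const i = RISK_LEVELS.indexOf(level);
  if (i === -1) throw new Error(`Unknown risk level: ${level}`);
  return i;
}

// Escalation triggers listed above. Assumption: "triage flags risk" = any level above none.
function shouldEscalate({ triageRisk, hooksChanged, binaryChanged, hasSoakedBaseline }) {
  return rank(triageRisk) > rank("none") || hooksChanged || binaryChanged || !hasSoakedBaseline;
}

// Overall PR risk is the highest per-package risk.
function highestRisk(levels) {
  return levels.reduce((max, level) => (rank(level) > rank(max) ? level : max), "none");
}

// Assumption: fail-on "none" disables failing rather than failing every run.
function shouldFail(overallRisk, failOn = "high") {
  if (failOn === "none") return false;
  return rank(overallRisk) >= rank(failOn);
}

// Example: triage says "low" for a package whose postinstall script changed.
const pkg = { triageRisk: "low", hooksChanged: true, binaryChanged: false, hasSoakedBaseline: true };
console.log(shouldEscalate(pkg)); // true
console.log(shouldFail(highestRisk(["low", "high"]))); // true with the default fail-on: high
```

Keeping escalation a pure function of cheap signals means the expensive deep model is only paid for when something concrete changed.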

Idempotency

The action fingerprints the net lockfile diff and stores the fingerprint in the PR comment. Pushes that don't change the net dependency set reuse the existing analysis instead of calling the LLM again. Pair this with a concurrency group using cancel-in-progress: true (as in the example workflow) so rapid pushes cancel superseded runs.
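A minimal sketch of that fingerprint-and-reuse check, assuming the net change set is available as a list of `{ ecosystem, name, from, to }` records and that the fingerprint lives in a hidden HTML comment. The record shape, the `laocoon-fingerprint` marker, and the function names are hypothetical; the action's real comment layout may differ.

```javascript
import { createHash } from "node:crypto";

// Hypothetical hidden-marker format for the PR comment.
const MARKER_RE = /<!-- laocoon-fingerprint: ([0-9a-f]{64}) -->/;

// changes: net base -> head lockfile changes, e.g.
// { ecosystem: "hex", name: "jason", from: "1.4.0", to: "1.4.1" } (from: null = added).
function fingerprint(changes) {
  // Serialize each change and sort, so commit order or lockfile order can't alter the hash.
  const lines = changes
    .map(({ ecosystem, name, from = null, to = null }) => JSON.stringify([ecosystem, name, from, to]))
    .sort();
  return createHash("sha256").update(lines.join("\n")).digest("hex");
}

function renderMarker(changes) {
  return `<!-- laocoon-fingerprint: ${fingerprint(changes)} -->`;
}

// True when the existing comment already analyzed exactly this net change set.
function canReuseAnalysis(existingCommentBody, changes) {
  if (!existingCommentBody) return false;
  const match = existingCommentBody.match(MARKER_RE);
  return match !== null && match[1] === fingerprint(changes);
}
```

Hashing the net change set rather than the raw lockfile text means formatting-only churn or an unrelated lockfile rewrite with identical versions does not trigger a new LLM call.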

Usage

```yaml
# .github/workflows/supply-chain-guard.yml
name: Laocoön Supply Chain Guard
on:
  pull_request:
    paths: ["**/mix.lock"] # add package-lock.json, poetry.lock, … as needed
permissions:
  contents: read
  pull-requests: write
concurrency:
  group: laocoon-${{ github.event.pull_request.number }}
  cancel-in-progress: true
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 } # need base + head
      - uses: u2i/laocoon@v1
        with:
          gemini-api-key: ${{ secrets.GEMINI_API_KEY }}
```

Secrets & setup

The only secret you must provide is an LLM provider API key — and only for the provider(s) your triage-model/deep-model specs use (default config needs just GEMINI_API_KEY). GITHUB_TOKEN is provided automatically by Actions; you do not create it.

| Secret | Needed when | Get it from |
|---|---|---|
| `GEMINI_API_KEY` | `google:` models (default) | https://aistudio.google.com/apikey |
| `ANTHROPIC_API_KEY` | `anthropic:` models | https://console.anthropic.com/ |
| `OPENAI_API_KEY` | `openai:` models | https://platform.openai.com/api-keys |
| any name, passed as `llm-api-key` | `compatible:` models | your gateway (OpenRouter, Together, …) |

Set it on one repo:

```sh
gh secret set GEMINI_API_KEY --repo u2i/your-repo
# paste the key when prompted (avoid --body, which lands in shell history)
```

Or once at the org level, scoped to selected repos (recommended when several repos run Laocoön):

```sh
gh secret set GEMINI_API_KEY --org u2i --visibility selected --repos "repo-a,repo-b"
```

UI equivalent: Settings → Secrets and variables → Actions → New repository (or organization) secret. The workflow then references it as ${{ secrets.GEMINI_API_KEY }} (see the example above).

A missing key fails loudly: when the model is resolved, the run errors with, e.g., Provider "google" requires the GEMINI_API_KEY environment variable.

⚠️ Pull requests from forks

On the standard pull_request trigger, GitHub withholds secrets from fork PRs, so the API key is empty and the run fails closed (the key never leaks to untrusted code). This means external-contributor PRs are not scanned — which is the safe default.

Do not switch to pull_request_target to work around this: it runs with secrets and write access in the base-repo context while checking out untrusted PR code, a well-known secret/token-exfiltration vector. For a key-spending security tool, failing closed on forks is correct. If you only get same-org PRs (no forks), this never comes up.

Models & providers

Models are selected with a provider:model string, so triage and deep can use different tiers — or even different providers. Built on the Vercel AI SDK; supply only the API key(s) your specs use.

| Provider prefix | Backend | Key input / env |
|---|---|---|
| `google:` (default) | Google Gemini | `gemini-api-key` / `GEMINI_API_KEY` |
| `anthropic:` | Anthropic Claude | `anthropic-api-key` / `ANTHROPIC_API_KEY` |
| `openai:` | OpenAI | `openai-api-key` / `OPENAI_API_KEY` |
| `compatible:` | Any OpenAI-compatible endpoint (OpenRouter, Together, Groq, Vercel AI Gateway, Ollama, …) | `llm-api-key` + `llm-base-url` |

A bare model id with no prefix defaults to google:. An unrecognized prefix errors (a typo'd provider fails loudly rather than silently becoming a Gemini model).

```yaml
# Mix providers: cheap Gemini triage, Claude for the deep look.
with:
  gemini-api-key: ${{ secrets.GEMINI_API_KEY }}
  anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
  triage-model: google:gemini-3.1-flash-lite
  deep-model: anthropic:claude-sonnet-4-6
```

```yaml
# Or route everything through one OpenAI-compatible gateway:
with:
  llm-api-key: ${{ secrets.OPENROUTER_API_KEY }}
  llm-base-url: https://openrouter.ai/api/v1
  triage-model: compatible:google/gemini-2.5-flash-lite
  deep-model: compatible:anthropic/claude-sonnet-4
```
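The spec rules stated above (bare id defaults to `google:`, unknown prefix fails loudly, missing key fails loudly) fit in a few lines. A minimal sketch with hypothetical function names; it splits on the first colon only, so `compatible:` model ids that contain colons (e.g. Ollama tags) survive intact. The `compatible:` key check is not modeled because it uses the `llm-api-key` input rather than a fixed env var.

```javascript
// Env var each provider reads its key from (per the provider table above).
// `compatible:` uses the llm-api-key / llm-base-url inputs instead (not modeled here).
const PROVIDER_KEY_ENV = {
  google: "GEMINI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  compatible: null,
};

function parseModelSpec(spec) {
  const i = spec.indexOf(":");
  if (i === -1) return { provider: "google", model: spec }; // bare id defaults to google:
  const provider = spec.slice(0, i);
  const model = spec.slice(i + 1); // later colons stay part of the model id
  if (!Object.hasOwn(PROVIDER_KEY_ENV, provider)) {
    // A typo'd provider fails loudly instead of silently becoming a Gemini model.
    throw new Error(`Unknown model provider "${provider}" in "${spec}"`);
  }
  if (!model) throw new Error(`Missing model id in "${spec}"`);
  return { provider, model };
}

function assertKeyPresent({ provider }, env = process.env) {
  const name = PROVIDER_KEY_ENV[provider];
  if (name && !env[name]) {
    throw new Error(`Provider "${provider}" requires the ${name} environment variable.`);
  }
}

console.log(parseModelSpec("gemini-3.1-flash-lite"));
// → { provider: 'google', model: 'gemini-3.1-flash-lite' }
```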

Inputs

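| Input | Default | Description |
|---|---|---|
| `gemini-api-key` | (none) | Google Gemini API key. Required for `google:` models, which includes the default triage and deep models. |
| `anthropic-api-key` | (none) | Anthropic API key, for `anthropic:` models. |
| `openai-api-key` | (none) | OpenAI API key, for `openai:` models. |
| `llm-api-key` | (none) | API key for a `compatible:` (OpenAI-compatible) endpoint. |
| `llm-base-url` | (none) | Base URL of that endpoint, e.g. `https://openrouter.ai/api/v1`. |
| `github-token` | `${{ github.token }}` | Used to read the PR diff and post the comment. |
| `triage-model` | `gemini-3.1-flash-lite` | Cheap model run on every PR (`provider:model`; a bare id means `google:`). |
| `deep-model` | `gemini-3.5-flash` | Stronger model used only on escalation. |
| `soak-days` | `60` | A release older than this many days is a presumed-clean baseline. |
| `ecosystems` | (auto) | Comma-separated subset of `hex`, `npm`, `pypi` to analyze. Empty = auto-detect. |
| `lockfile` | (auto) | Restrict analysis to a single lockfile path. |
| `base-ref` | (PR base) | Ref to diff against. |
| `fail-on` | `high` | Risk threshold for failing the check: `critical`/`high`/`medium`/`low`/`none`. |
| `comment` | `true` | Post/update a PR comment. |
| `max-diff-bytes` | `60000` | Byte cap on the artifact diff sent to the LLM, per ecosystem. |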
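One plausible way to enforce `max-diff-bytes` is greedy packing of whole per-file diffs, logging whatever doesn't fit. This is an assumption for illustration: the action's actual ordering and drop policy may differ, and `capDiff` and its log line are hypothetical. Sizes are measured in UTF-8 bytes, not JavaScript string length.

```javascript
/**
 * Hypothetical sketch: keep whole per-file diffs until the byte budget is spent.
 * fileDiffs: [{ path, diff }] in priority order (the ordering is the caller's choice).
 */
function capDiff(fileDiffs, maxBytes = 60000) {
  const kept = [];
  const dropped = [];
  let usedBytes = 0;
  for (const { path, diff } of fileDiffs) {
    const size = Buffer.byteLength(diff, "utf8"); // bytes, not UTF-16 code units
    if (usedBytes + size <= maxBytes) {
      kept.push({ path, diff });
      usedBytes += size;
    } else {
      dropped.push(path); // keep scanning: a smaller later file may still fit
    }
  }
  // Dropped files are logged, never silently discarded.
  for (const path of dropped) console.warn(`diff cap (${maxBytes} bytes) reached, dropped: ${path}`);
  return { kept, dropped, usedBytes };
}
```

Keeping whole files rather than truncating mid-file avoids handing the model a diff hunk cut off in the middle of a payload.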

Outputs

| Output | Description |
|---|---|
| `risk-level` | Highest risk found (`none`…`critical`), or `skipped`. |
| `findings-json` | Structured findings as JSON. |

Cost

Cost ≈ input tokens × per-token price, and input tokens grow with the diff bytes sent (capped per ecosystem by max-diff-bytes). The triage model runs on every analyzed PR (fractions of a cent); the deep model fires only on the small fraction of PRs that warrant it. The fingerprint skip means you only pay when the net dependency set actually changes.
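A back-of-the-envelope estimate, assuming roughly 4 bytes of source text per token (a common rule of thumb; real ratios depend on the model's tokenizer and the content) and a hypothetical price of $0.10 per million input tokens. It ignores prompt overhead, output tokens, and the fact that the cap applies per ecosystem, so a PR touching two ecosystems can send up to twice as much.

```javascript
// Rough rule of thumb (assumption): ~4 bytes of source text per token.
function estimateInputCost({ diffBytes, usdPerMillionTokens, bytesPerToken = 4 }) {
  const tokens = Math.ceil(diffBytes / bytesPerToken);
  return { tokens, usd: (tokens / 1_000_000) * usdPerMillionTokens };
}

// A diff at the default 60,000-byte cap, at a hypothetical $0.10 per million input tokens:
const { tokens, usd } = estimateInputCost({ diffBytes: 60_000, usdPerMillionTokens: 0.1 });
console.log(tokens, usd.toFixed(4)); // 15000 0.0015
```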

Adding an ecosystem

Drop a module in src/ecosystems/ exporting id, displayName, lockfiles, parse(contents, filename), isRegistryBacked, packageKey, fetchContext, getReleases, fetchArtifact, and register it in src/ecosystems/index.mjs. Use the shared baseContext/computeCadence from registry-context.mjs so the LLM payload looks identical across registries. The core (diff, soak selection, artifact diff, LLM cascade, GitHub, reporting) is entirely ecosystem-agnostic.
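A skeleton of such a module for a hypothetical toy lockfile with one `name version [source]` entry per line. Only `parse(contents, filename)` has a signature given above; the other members' arguments and return shapes (including treating `isRegistryBacked` as a function) are assumptions, and the registry calls are stubbed. In `src/ecosystems/` these would be named exports of an `.mjs` module; they are grouped in one object here so the sketch runs as a single snippet.

```javascript
// Hypothetical toy ecosystem; a real module would export each member by name.
const toyEcosystem = {
  id: "toy",
  displayName: "Toy / example",
  lockfiles: ["toy.lock"],

  // Assumed return shape: [{ name, version, source }].
  parse(contents, filename) {
    const deps = [];
    for (const raw of contents.split(/\r?\n/)) {
      const line = raw.trim();
      if (!line || line.startsWith("#")) continue;
      const [name, version, source = "registry"] = line.split(/\s+/);
      if (!name || !version) {
        // Skipped entries are logged, not silently dropped.
        console.warn(`[${filename}] skipping unparseable entry: ${line}`);
        continue;
      }
      deps.push({ name, version, source });
    }
    return deps;
  },

  // Assumption: git/path dependencies have no registry artifacts to soak-diff.
  isRegistryBacked(dep) {
    return dep.source === "registry";
  },

  // Assumption: a normalized key that matches the same package across base and head.
  packageKey(dep) {
    return dep.name.toLowerCase();
  },

  // Registry calls are stubbed in this sketch.
  async fetchContext(name) {
    throw new Error(`fetchContext(${name}) is not implemented in this sketch`);
  },
  async getReleases(name) {
    throw new Error(`getReleases(${name}) is not implemented in this sketch`);
  },
  async fetchArtifact(name, version) {
    throw new Error(`fetchArtifact(${name}@${version}) is not implemented in this sketch`);
  },
};
```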

Known per-ecosystem limitations

PyPI: the JSON API exposes no maintainer accounts or download counts, so those signals are absent (author name is a weak proxy). Wheel-only releases (no sdist) can't be source-diffed and are flagged.
npm: yarn.lock/pnpm-lock.yaml parsing is tolerant but not a full grammar; exotic entries may be skipped (logged, not silently dropped).
requirements.txt: only fully-pinned (==) lines are analyzable; ranges/unpinned lines have no exact version to soak-diff.
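The requirements.txt rule can be sketched as a small filter. This is a minimal illustration, not the action's parser: it joins backslash continuations, drops comments, per-line options such as `--hash=...`, and environment markers, normalizes names the PEP 503 way, and accepts only exact `==` pins (rejecting `===`, `==1.*` wildcards, and combined clauses). The function name is hypothetical.

```javascript
// PEP 503-style name normalization: lowercase, runs of "-", "_", "." become "-".
function normalizeName(name) {
  return name.toLowerCase().replace(/[-_.]+/g, "-");
}

// name[extras]==version with an exact version: no "===", no "*" wildcard, no extra clauses.
const PIN_RE = /^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*==(?!=)\s*([^\s*;,]+)$/;

function parsePinnedRequirements(text) {
  const pinned = [];
  const skipped = [];
  // Join backslash line continuations (common with --hash blocks).
  const joined = text.replace(/\\\r?\n/g, " ");
  for (const raw of joined.split(/\r?\n/)) {
    let spec = raw.replace(/(^|\s)#.*$/, ""); // drop comments
    spec = spec.split(/\s+--/)[0]; // drop per-line options like --hash=...
    spec = spec.split(";")[0].trim(); // drop environment markers
    if (!spec) continue;
    const m = spec.match(PIN_RE);
    if (m) {
      pinned.push({ name: normalizeName(m[1]), version: m[2] });
    } else {
      skipped.push(spec); // ranges, bare names, URLs, -r/-e options: no exact version
    }
  }
  return { pinned, skipped };
}

const sample = [
  "# production deps",
  "requests==2.31.0 \\",
  "    --hash=sha256:0123abcd",
  "Django[argon2]==4.2.7 ; python_version >= '3.8'",
  "zope.interface==6.1",
  "flask>=2.0",
  "numpy",
].join("\n");
const { pinned, skipped } = parsePinnedRequirements(sample);
// pinned: requests 2.31.0, django 4.2.7, zope-interface 6.1
// skipped: "flask>=2.0", "numpy"
console.log(pinned, skipped);
```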

Limitations

Heuristic, LLM-based review — not a guarantee. Treat it as a high-signal reviewer, not a gate of last resort.
Binary / precompiled / minified files cannot be statically reviewed; they're flagged, not read. Verify their provenance yourself.
Artifact diffs are byte-capped; dropped files are logged.
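Deciding which files are "binary" or "minified" before they reach the diff is itself a heuristic. A minimal sketch with a hypothetical `classifyFile` helper: the NUL-byte test on the first 8000 bytes mirrors git's binary detection, while the `.min.` name check and the 1000-character longest-line threshold for JS/CSS are illustrative assumptions, not the action's actual rules.

```javascript
const MINIFIABLE = /\.(?:m?js|cjs|css)$/;

/**
 * Hypothetical triage of an artifact file before it is added to the LLM diff.
 * bytes: Uint8Array of the file contents. Returns "binary", "minified", or "text".
 */
function classifyFile(path, bytes, { maxLineLength = 1000 } = {}) {
  // Same idea as git: a NUL byte in the first 8000 bytes means binary.
  if (bytes.subarray(0, 8000).includes(0)) return "binary";
  if (/\.min\.(?:js|css)$/.test(path)) return "minified";
  if (MINIFIABLE.test(path)) {
    const text = new TextDecoder().decode(bytes);
    let longest = 0;
    for (const line of text.split("\n")) longest = Math.max(longest, line.length);
    if (longest > maxLineLength) return "minified"; // one giant line: not reviewable
  }
  return "text";
}
```

Anything classified as `binary` or `minified` would then be flagged by path and hash rather than read, matching the policy above.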

Development

```sh
npm install   # dev/build deps (AI SDK, zod, esbuild)
npm test      # unit tests (node --test)
npm run build # bundle src/ + deps -> dist/index.mjs
```

The parser, soak selection, artifact diff, tar unpacker, and provider resolution are unit-tested; the artifact fetch/unpack/diff path is verified against live hex.pm, npm, and PyPI.

Why a bundle?

A composite GitHub Action has no npm install step at runtime, so the AI SDK dependency tree is pre-bundled into a single committed dist/index.mjs (via esbuild). Edit src/, then run npm run build and commit dist/ before tagging a release. node_modules/ is git-ignored; dist/ is committed.
