Add private-network VM install modes (DNS-01, self-signed) #1106

Draft
RAVEENSR wants to merge 7 commits into wso2:main from RAVEENSR:vm-private-install

RAVEENSR (Member) commented Jun 18, 2026

Purpose

The VM installer has two entry points today — the simple installer (install-vm.sh, sslip.io + Let's Encrypt on :443) and the advanced installer (install-advanced.sh with letsencrypt | byoc | upstream). Both assume the VM is reachable from the public internet. Many companies will not expose the console/API publicly and want Agent Manager reachable only from their private network. The blocker is TLS: public letsencrypt (TLS-ALPN-01) needs the ACME CA to connect inbound to the VM on :443, which a private VM cannot accept.

Related to #1017 (VM standalone install).
Resolves #1081

Goals

Add two TLS modes to the advanced installer so a VM with private ingress and internet egress can run Agent Manager without ever being publicly reachable:

  • letsencrypt-dns (recommended) — public-trusted certificates via the ACME DNS-01 challenge. The CA validates a DNS TXT record and never connects to the VM, so it works behind a firewall. A records may point at the VM's private IP (split-horizon).
  • selfsigned — fully offline: a generated local CA + leaf for teams with neither a public DNS zone nor an internal CA.

byoc (internal-CA cert) and upstream (internal LB) already cover the remaining private variants and are unchanged.
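For orientation, here is a minimal sketch of the TLS lines an operator might put in amp-config.env for each new mode. TLS_MODE, DNS_PROVIDER, and ACME_EMAIL are the keys named in this PR; the AWS_* variables are the ones lego's route53 provider reads, and whether the installer passes them through under exactly these names is an assumption. Host and domain keys come from the --init template and are left out.

```bash
# amp-config.env (TLS fragment only; illustrative values)

# Option A: public-trusted certificates via ACME DNS-01, no inbound :443
TLS_MODE=letsencrypt-dns
# A lego DNS provider code
DNS_PROVIDER=route53
ACME_EMAIL=platform-team@example.com
# Route 53 credentials as read by lego's route53 provider (assumed pass-through)
AWS_ACCESS_KEY_ID=REPLACE_ME
AWS_SECRET_ACCESS_KEY=REPLACE_ME
AWS_REGION=us-east-1
AWS_HOSTED_ZONE_ID=REPLACE_ME

# Option B: fully offline local CA; no further TLS keys required
# TLS_MODE=selfsigned
```

Because the provider credentials sit in this file, it is sensible to keep it readable by root only (for example `chmod 600 amp-config.env`).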

Approach

Both new modes only produce a cert/key on disk and then reuse the existing byoc serving path (Caddyfile rendering, container cert mounts, and validate_cert's SAN/wildcard coverage checks) — so lib-vm.sh is untouched and the serving logic is shared.

  • New deployments/vm/lib-tls.sh: pure helpers (tls_san_list, build_lego_args, render_renewal_units) plus side-effecting issuance (issue_dns01_cert via the dockerized goacme/lego image, generate_selfsigned_ca via openssl, install_renewal_timer).
  • letsencrypt-dns issues a cert covering every service host + the *.<AGENTS_BASE> wildcard, copies it to /opt/amp/certs/, and installs a daily systemd timer (lego renew + caddy reload).
  • selfsigned generates a local CA + leaf and writes the CA to /opt/amp/certs/ca.crt for distribution to client trust stores.
  • validate_config learns the two modes (letsencrypt-dns requires DNS_PROVIDER + ACME_EMAIL; selfsigned requires nothing extra), the --init template documents them, and --dry-run previews the issuance/generation plan without doing it.
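The two pure helpers can be pictured roughly as follows. This is a hypothetical sketch, not the code in lib-tls.sh: the argument order, the one-argument-per-line output, and the /lego state path are assumptions. The lego flags used (`--accept-tos`, `--email`, `--dns`, `--path`, `--domains`, `run`) are lego v4 CLI options.

```bash
# tls_san_list AGENTS_BASE HOST...   (hypothetical signature)
# Prints every service host, then the *.AGENTS_BASE wildcard, one per line,
# dropping empty entries and duplicates while keeping first-seen order.
tls_san_list() {
  local agents_base="$1"
  shift
  printf '%s\n' "$@" "*.${agents_base}" | awk 'NF && !seen[$0]++'
}

# build_lego_args EMAIL DNS_PROVIDER SAN...   (hypothetical signature)
# Prints lego v4 arguments for a DNS-01 issuance, one argument per line,
# so callers can read them into an array without word-splitting "*" SANs.
build_lego_args() {
  local email="$1" provider="$2" san
  shift 2
  printf '%s\n' --accept-tos --email "$email" --dns "$provider" --path /lego
  for san in "$@"; do
    printf '%s\n' --domains "$san"
  done
  printf '%s\n' run
}

# Usage sketch (bash; the volume path and env file are assumptions):
#   args=()
#   while IFS= read -r a; do args+=("$a"); done < <(
#     build_lego_args ops@example.com route53 \
#       $(tls_san_list agents.example.com console.example.com))
#   docker run --rm --env-file dns.env -v /opt/amp/lego:/lego \
#     goacme/lego:v4.35.2 "${args[@]}"
```

Emitting one argument per line keeps the wildcard SAN from being glob-expanded and stays compatible with bash 3.2, which lacks namerefs for returning arrays.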

User stories

As a platform operator at a security-conscious company, I can install Agent Manager on a private-network-only VM with trustworthy TLS — either public-trusted certificates via DNS-01 (no public ingress) or a generated local CA when fully offline.

Release note

The advanced VM installer now supports private-network-only deployments via two new TLS modes: letsencrypt-dns (public-trusted certificates through the ACME DNS-01 challenge, with automatic renewal) and selfsigned (a generated local CA for fully offline installs).

Documentation

Updated documentation/docs/getting-started/on-a-vm.mdx: new "Private network (no public exposure)" section, the two new modes added to the TLS-modes and config-key tables, DNS split-horizon guidance for DNS-01, and troubleshooting entries.

Training

N/A — no training content impact.

Certification

N/A — no impact on certification exams.

Marketing

N/A — no marketing content for this change.

Automation tests

  • Unit tests

Extended deployments/vm/tests/run.sh (SAN list, lego arg builder, renewal-unit rendering, validate_config for both modes, --init template, and --dry-run for both modes) and deployments/vm/tests/preflight.sh (local-CA generation produces a cert whose SANs cover every host + the agent wildcard, verified via validate_cert + openssl). Both suites pass; the changed scripts are shellcheck-clean.

  • Integration tests

The suites are hermetic (no Docker/cluster). A live DNS-01 issuance against a real zone and a selfsigned browser-trust check on a GCP VM are planned as an out-of-band smoke test, mirroring the earlier VM-install smoke tests (this PR is a draft pending that).
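The SAN/wildcard coverage rule that validate_cert is described as checking can be sketched like this (hypothetical function names, not the PR's code). It follows standard TLS hostname matching: either an exact SAN match, or a wildcard `*.<suffix>` that covers exactly one extra left-most label, so `*.agents.example.com` covers `foo.agents.example.com` but neither `agents.example.com` nor `a.b.agents.example.com`. Comparison here is case-sensitive, so names should be lowercased first. On OpenSSL 1.1.1+ a certificate's SANs can be listed with `openssl x509 -noout -ext subjectAltName`.

```bash
# san_covers HOST SAN: succeed if SAN matches HOST exactly, or SAN is a
# wildcard "*.<suffix>" and HOST is exactly one extra label on <suffix>.
san_covers() {
  local host="$1" san="$2" suffix label
  [ "$host" = "$san" ] && return 0
  case "$san" in
    '*.'?*) ;;               # wildcard SAN: go on to the label check
    *) return 1 ;;           # non-wildcard SAN that did not match exactly
  esac
  suffix="${san#\*.}"
  label="${host%."$suffix"}"
  [ "$label" != "$host" ] || return 1    # HOST does not end in ".<suffix>"
  [ -n "$label" ] || return 1
  case "$label" in *.*) return 1 ;; esac # a wildcard covers one label only
  return 0
}

# first_uncovered_host SANS HOST...: SANS is a newline-separated list.
# Prints the first HOST that no SAN covers and fails; succeeds silently
# when every HOST is covered.
first_uncovered_host() {
  local sans="$1" host san ok
  shift
  for host in "$@"; do
    ok=1
    while IFS= read -r san; do
      if san_covers "$host" "$san"; then ok=0; break; fi
    done <<EOF
$sans
EOF
    if [ "$ok" -ne 0 ]; then printf '%s\n' "$host"; return 1; fi
  done
  return 0
}
```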

Security checks

  • Followed secure coding standards in http://wso2.com/technical-reports/wso2-secure-engineering-guidelines? yes
  • Ran FindSecurityBugs plugin and verified report? N/A — shell scripts only, no Java.
  • Confirmed that this PR doesn't commit any keys, passwords, tokens, usernames, or other secrets? yes — DNS-provider credentials are supplied by the operator in amp-config.env at install time and the renewal config copy is written 0600; none are committed.

Samples

N/A — no new samples. The docs include a DNS-01 (AWS Route 53) walkthrough.

Related PRs

Builds on the VM standalone (#1017) and advanced-install work in deployments/vm/.

Migrations (if applicable)

N/A — additive feature; no migration required. Switching TLS_MODE on an existing install only re-renders Caddy and re-issues/regenerates the certificate.

Test environment

Unit + preflight suites run locally on macOS (bash 3.2). The installer targets Linux VMs (Ubuntu/Debian, bash 4+); the real-VM smoke test is pending (see Automation tests).

Learning

Key insight: the ACME DNS-01 challenge validates a DNS TXT record rather than connecting to the host, so it can issue public-trusted (and wildcard) certificates for a server with no public ingress. Used lego (single-binary ACME client, ~150 DNS providers) to keep the stock caddy:2 image and reuse the existing byoc serving path rather than building a custom Caddy image with DNS plugins.
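To make the key insight concrete: under RFC 8555 (section 8.4) the CA looks up a TXT record at `_acme-challenge.<domain>` whose value is the unpadded base64url SHA-256 digest of the challenge's key authorization, and a wildcard name is validated against its base domain. The helpers below are illustrative only (hypothetical names); lego computes and publishes this record for you. They assume `openssl` is on PATH.

```bash
# dns01_record_name DOMAIN: TXT owner name the CA queries for DOMAIN.
# A wildcard "*.x" is authorized against its base domain "x".
dns01_record_name() {
  printf '_acme-challenge.%s\n' "${1#\*.}"
}

# dns01_txt_value KEY_AUTHORIZATION: base64url(SHA-256(key authorization)),
# without "=" padding, as placed in the TXT record.
dns01_txt_value() {
  printf '%s' "$1" \
    | openssl dgst -sha256 -binary \
    | openssl base64 -A \
    | tr '/+' '_-' \
    | tr -d '='
}
```

While an issuance is in progress, `dig +short TXT _acme-challenge.agents.example.com` against a public resolver shows whether the record has propagated; no inbound connection to the VM is involved at any point.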

coderabbitai bot commented Jun 18, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


The advanced VM installer assumed a publicly reachable VM: public
letsencrypt issues via TLS-ALPN-01, which needs the ACME CA to connect
inbound on :443. Companies that keep Agent Manager on a private network
could not use it.

Add two TLS modes to install-advanced.sh for a private VM with internet
egress but no public ingress:

- letsencrypt-dns: public-trusted certs via the ACME DNS-01 challenge.
  The CA validates a DNS TXT record and never connects to the VM, so a
  firewalled VM still gets trusted certs (split-horizon A records are
  fine). Issued by a dockerized lego; a daily systemd timer renews it.
- selfsigned: a generated local CA + leaf for fully offline installs
  with no public zone and no internal CA; the CA is emitted for client
  trust-store distribution.
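A rough sketch of what the selfsigned mode has to produce: a local CA plus a CA-signed leaf whose SANs list every host and the agent wildcard. The function name, file names, key sizes, and lifetimes are assumptions rather than the PR's generate_selfsigned_ca. Small config files are used instead of `-addext` (added in OpenSSL 1.1.1) to keep the commands portable across OpenSSL versions.

```bash
# generate_local_ca OUTDIR AGENTS_BASE HOST...   (hypothetical signature)
# Writes OUTDIR/ca.crt + ca.key (local CA) and OUTDIR/tls.crt + tls.key
# (leaf signed by that CA; SANs = every HOST plus *.AGENTS_BASE).
generate_local_ca() {
  local out="$1" agents_base="$2" sans="" h
  shift 2
  for h in "$@" "*.${agents_base}"; do
    sans="${sans:+$sans,}DNS:${h}"   # DNS:console.example.com,DNS:*.agents...
  done
  mkdir -p "$out" || return 1

  # CA: self-signed, CA:TRUE, allowed to sign certificates.
  cat >"$out/ca.cnf" <<'EOF'
[req]
distinguished_name = dn
prompt = no
x509_extensions = v3_ca
[dn]
CN = Agent Manager Local CA
[v3_ca]
basicConstraints = critical,CA:TRUE
keyUsage = critical,keyCertSign,cRLSign
subjectKeyIdentifier = hash
EOF
  openssl req -x509 -newkey rsa:2048 -nodes -days 3650 -config "$out/ca.cnf" \
    -keyout "$out/ca.key" -out "$out/ca.crt" 2>/dev/null || return 1

  # Leaf: CSR named after the first host, then signed by the CA with the SANs.
  cat >"$out/leaf.cnf" <<EOF
[req]
distinguished_name = dn
prompt = no
[dn]
CN = ${1:-$agents_base}
EOF
  cat >"$out/leaf.ext" <<EOF
basicConstraints = CA:FALSE
keyUsage = critical,digitalSignature,keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = ${sans}
EOF
  openssl req -new -newkey rsa:2048 -nodes -config "$out/leaf.cnf" \
    -keyout "$out/tls.key" -out "$out/tls.csr" 2>/dev/null || return 1
  openssl x509 -req -in "$out/tls.csr" -CA "$out/ca.crt" -CAkey "$out/ca.key" \
    -CAserial "$out/ca.srl" -CAcreateserial -days 397 \
    -extfile "$out/leaf.ext" -out "$out/tls.crt" 2>/dev/null || return 1

  chmod 600 "$out/ca.key" "$out/tls.key"
  rm -f "$out/ca.cnf" "$out/leaf.cnf" "$out/leaf.ext" "$out/tls.csr"
}
```

Only ca.crt is meant for client trust stores; ca.key should stay on the VM (or offline). The 397-day leaf lifetime keeps the certificate under the validity limits some clients enforce.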

Both modes only produce a cert and reuse the existing byoc serving path
(Caddyfile, cert mounts, validate_cert), so lib-vm.sh is unchanged. New
cert production lives in lib-tls.sh. validate_config learns both modes,
the --init template and on-a-vm.mdx document them (with a DNS-01 flow
diagram), and --dry-run previews issuance without doing it.

Related to wso2#1017
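The mode-specific checks that validate_config gains could look like this hypothetical sketch: letsencrypt-dns needs DNS_PROVIDER and ACME_EMAIL, while selfsigned needs nothing extra. The function name and error wording are invented for illustration, and the checks for the existing modes are not reproduced.

```bash
# check_tls_config: reads TLS_MODE, DNS_PROVIDER, ACME_EMAIL from the
# current shell, prints one error per line, and fails if any were found.
check_tls_config() {
  local n=0
  case "${TLS_MODE:-}" in
    letsencrypt-dns)
      if [ -z "${DNS_PROVIDER:-}" ]; then
        echo "TLS_MODE=letsencrypt-dns requires DNS_PROVIDER"; n=$((n + 1))
      fi
      if [ -z "${ACME_EMAIL:-}" ]; then
        echo "TLS_MODE=letsencrypt-dns requires ACME_EMAIL"; n=$((n + 1))
      fi
      ;;
    selfsigned) ;;                  # CA is generated locally: nothing to check
    letsencrypt|byoc|upstream) ;;   # existing modes: their checks omitted here
    *) echo "unknown TLS_MODE '${TLS_MODE:-}'"; n=$((n + 1)) ;;
  esac
  [ "$n" -eq 0 ]
}
```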
RAVEENSR force-pushed the vm-private-install branch from 10b9261 to 11b76b9 on June 18, 2026 09:19
RAVEENSR added 6 commits June 18, 2026 16:09
The Advanced tab interleaved public and private guidance and its label
("custom domain / BYOC / load balancer") no longer fit once private
modes were added, making it hard to follow one coherent path.

Rename the tab to "Advanced" and split its body with nested
Public network | Private network tabs (distinct groupId) that hold the
network-specific bits: which TLS modes apply, the DNS records, and a
copy-paste example config per mode. Shared scaffolding (prerequisites,
configure, TLS-modes reference, install, teardown) stays common, so
there is a single source of truth.

The standalone DNS and Private-network sections are dissolved into the
tabs, the flow diagram moves into the Private tab, the TLS-modes table
gains a Network column, and orphaned #adv-dns / #adv-private anchors
are repointed. Docs site builds clean.

In advisory modes (byoc, upstream, selfsigned, letsencrypt-dns)
validate_dns returns 0 even when some hostnames did not resolve, so
preflight_dns logged "all hostnames resolve to this VM" right after
listing the failures. In the private modes this happens almost every
time, because their internal/private domains rarely resolve at dry-run time.

Gate the success message on an empty DNS_ERRORS, and otherwise tell the
operator to point their private DNS or client hosts entries at the VM.
Reword the advisory note so it fits all advisory modes, not just an LB.
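The gating described in this commit amounts to something like the following hypothetical sketch for the advisory modes. The function name and message wording are invented; only DNS_ERRORS comes from the commit, and it is assumed here to be a newline-separated list filled in by validate_dns.

```bash
# report_dns_preflight: claim success only when DNS_ERRORS is empty;
# otherwise list the failures and point the operator at private DNS.
report_dns_preflight() {
  if [ -z "${DNS_ERRORS:-}" ]; then
    echo "DNS: all hostnames resolve to this VM"
    return 0
  fi
  echo "DNS: some hostnames did not resolve:"
  printf '%s\n' "$DNS_ERRORS" | while IFS= read -r err; do
    if [ -n "$err" ]; then printf '  - %s\n' "$err"; fi
  done
  echo "DNS: advisory for TLS_MODE=${TLS_MODE:-?}; continuing. Point your private"
  echo "     DNS (or client hosts entries) at this VM before using the console."
  return 0   # advisory: never blocks the install
}
```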

Configure is trimmed to just the template-generation step. Each
Public and Private network tab now carries its own TLS modes table,
a complete config-key table (common plus network-specific keys),
BYOC certificate requirements, DNS records, and example configs
(including a new byoc example per tab).

The shared mixed config-key table and the standalone TLS modes
section (with BYOC requirements and upstream topology) were removed;
their content was redistributed into the tabs. Links to the old
#adv-tls-modes and #adv-byoc-san anchors are repointed to the
in-tab subsections.

goacme/lego:latest now resolves to v5.x, whose CLI dropped the global
flags build_lego_args emits (e.g. --accept-tos), so letsencrypt-dns
issuance failed immediately with "flag provided but not defined". Pin to
goacme/lego:v4.35.2 (override via the LEGO_IMAGE env var) in both the
initial issuance and the renewal-timer unit, and add a unit-test guard
so the image can't regress to :latest.
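A hypothetical sketch of rendering the renewal units with the pinned default image and a guard against `:latest`. The unit names, the /opt/amp/lego state path, the `amp-caddy` container name, and the env-file argument are assumptions; copying the renewed files into /opt/amp/certs/ (which the PR describes) is left out for brevity. lego's `renew --days 30` only renews when the certificate expires within 30 days, so the daily run is usually a no-op (this sketch still reloads Caddy each time).

```bash
# Pinned lego v4 image; LEGO_IMAGE in the environment overrides it.
LEGO_IMAGE="${LEGO_IMAGE:-goacme/lego:v4.35.2}"

# render_renewal_units DIR EMAIL PROVIDER ENV_FILE SAN...  (hypothetical)
# Writes DIR/amp-cert-renew.service and DIR/amp-cert-renew.timer.
render_renewal_units() {
  local dir="$1" email="$2" provider="$3" env_file="$4" domains="" san
  shift 4
  case "$LEGO_IMAGE" in
    *:latest|*:) echo "refusing unpinned lego image: $LEGO_IMAGE" >&2; return 1 ;;
    *:*) ;;   # explicit tag, e.g. goacme/lego:v4.35.2
    *) echo "refusing untagged lego image: $LEGO_IMAGE" >&2; return 1 ;;
  esac
  # systemd does not glob, so "*.agents..." needs no quoting in ExecStart.
  for san in "$@"; do domains="$domains --domains $san"; done

  cat >"$dir/amp-cert-renew.service" <<EOF
[Unit]
Description=Renew Agent Manager TLS certificate (lego DNS-01)
Wants=network-online.target
After=network-online.target docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/docker run --rm --env-file $env_file -v /opt/amp/lego:/lego $LEGO_IMAGE --accept-tos --email $email --dns $provider --path /lego$domains renew --days 30
ExecStartPost=/usr/bin/docker exec amp-caddy caddy reload --config /etc/caddy/Caddyfile
EOF

  cat >"$dir/amp-cert-renew.timer" <<'EOF'
[Unit]
Description=Daily Agent Manager certificate renewal check

[Timer]
OnCalendar=daily
RandomizedDelaySec=1h
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```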

A clean single-user advanced install on a fresh VM exhausts the
default fs.inotify.max_user_instances ceiling once k3d is up:
systemd logs "Failed to allocate directory watch: Too many open
files" while enabling the cert-renewal timer, and k3d configmap/
secret watches can silently go stale.

Add ensure_inotify_limits to the shared bootstrap phase (both the
standalone and advanced installers) to raise instances/watches to
k3d-friendly floors and persist them under /etc/sysctl.d. Only keys
below their floor are touched, so a host already tuned higher is
never lowered. The decision is factored into a pure inotify_bump_target
helper with unit tests.
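A hypothetical sketch of the "raise only, never lower" logic. The commit does not state the installer's floors, so the values below are an assumption borrowed from the ones the kind project documents for multi-cluster hosts (512 instances, 524288 watches); the sysctl.d filename is also invented. `render_inotify_conf` takes the /proc directory as a parameter so it can be exercised without root.

```bash
# inotify_bump_target CURRENT FLOOR: print FLOOR if CURRENT is below it,
# otherwise print nothing, so a host already tuned higher is left alone.
inotify_bump_target() {
  if [ "$1" -lt "$2" ]; then printf '%s\n' "$2"; fi
}

# render_inotify_conf [PROC_DIR]: sysctl.d lines for keys below their floor.
render_inotify_conf() {
  local proc="${1:-/proc/sys/fs/inotify}" pair key floor cur target
  for pair in max_user_instances=512 max_user_watches=524288; do  # assumed floors
    key="${pair%%=*}"
    floor="${pair#*=}"
    cur="$(cat "$proc/$key")" || return 1
    target="$(inotify_bump_target "$cur" "$floor")"
    if [ -n "$target" ]; then printf 'fs.inotify.%s = %s\n' "$key" "$target"; fi
  done
}

# ensure_inotify_limits: persist and apply any raised keys (needs root).
ensure_inotify_limits() {
  local conf="/etc/sysctl.d/99-amp-inotify.conf" body
  body="$(render_inotify_conf)" || return 1
  [ -n "$body" ] || return 0          # already at or above both floors
  printf '%s\n' "$body" >"$conf"      # survives reboots
  sysctl -p "$conf"                   # takes effect now
}
```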
Development

Successfully merging this pull request may close these issues.

Provide option to run agent manager in a VM without exposing console like components to the public
