…dening the internal mTLS trust anchor
Background: DisableClientAuthEkuValidation was recently added to
KurrentDB because publicly-trusted CAs are increasingly no longer
issuing certificates with the clientAuth EKU. A consequence is that
mTLS within a cluster using only public-CA certificates is no longer
possible without that flag — and the flag is a short-term mitigation,
not a long-term solution (it weakens RFC 5280 EKU enforcement). A
private PKI is therefore required to run mTLS in the cluster at all.
Given that, widening TrustedRootCertificatesPath to include the system
root store (e.g. /etc/ssl/certs) for the sake of mTLS is strictly
worse than doing nothing — it collapses the cluster's trust boundary
onto every public CA the OS happens to trust, with only the
CertificateReservedNodeCommonName string check standing between the
cluster and any certificate issued by any public CA.
Operators historically reached for a public CA in mTLS for two reasons:
1. To avoid running a private PKI and rotating its certs themselves.
2. To avoid distributing a custom CA to every gRPC client / browser
that talks to the node.
Reason #1 is moot today — a private PKI is required either way once
public CAs stop issuing clientAuth EKU certs. Reason #2 is still real;
terminating TLS at a load balancer / reverse proxy could solve it,
but is not straightforward with gRPC.
The primary recommendation is therefore to use a private PKI for
cluster mTLS. For deployments that also need to solve reason #2, this
change lets them configure a separate, publicly-trusted certificate
served only on matching SNI — without widening the cluster's mTLS
trust anchor.
The node serves this second, publicly-trusted certificate on TLS
connections whose ClientHello SNI matches one of its Subject
Alternative Names, while the node's own internal-CA certificate
continues to be served on every other connection (including all
internal node-to-node HTTPS gossip). The publicly-trusted
certificate's issuing CA never enters TrustedRootCerts, never
participates in InternalClientCertificateValidator, and cannot
authenticate anything as a node. Client certificate handling is
identical on both paths — requested but not required, validated only
against the internal CA — so user-certificate authentication
continues to work for external gRPC clients connecting via the
publicly-trusted hostname.
Result: internal PKI stays strict (operator's private CA only), and
external gRPC clients (and browsers) connecting to the public
hostname see a publicly-trusted certificate — no need to distribute
the internal CA to every client application or browser.
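
The SNI-based selection described above can be sketched as follows. This is a minimal illustration in Python, not the node's actual implementation: the function names, the returned labels, and the single-label wildcard rule are all assumptions made for the example.

```python
from typing import Iterable, Optional


def san_matches(san: str, host: str) -> bool:
    """Return True if a DNS SAN matches host, ignoring case.

    Assumption for this sketch: a leading "*." wildcard covers exactly
    one left-most label.
    """
    san = san.lower().rstrip(".")
    host = host.lower().rstrip(".")
    if san.startswith("*."):
        # "*.example.com" matches "a.example.com" but neither
        # "example.com" nor "a.b.example.com".
        _, _, parent = host.partition(".")
        return bool(parent) and parent == san[2:]
    return san == host


def select_certificate(sni: Optional[str], public_sans: Iterable[str]) -> str:
    """Pick which certificate to serve for a given ClientHello SNI."""
    if sni and any(san_matches(san, sni) for san in public_sans):
        return "publicly-trusted"
    # No SNI, or an SNI the public certificate does not cover:
    # serve the node's internal-CA certificate.
    return "internal"
```

Under these assumptions, a connection with no SNI, or with an SNI that only the node certificate covers, always receives the internal-CA certificate, which is why internal gossip is unaffected.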
Adds four new CertificateFile options (PubliclyTrustedCertificateFile,
PubliclyTrustedCertificatePrivateKeyFile, PubliclyTrustedCertificatePassword,
PubliclyTrustedCertificatePrivateKeyPassword) and four new
CertificateStore options (PubliclyTrustedCertificateStoreLocation,
PubliclyTrustedCertificateStoreName, PubliclyTrustedCertificateSubjectName,
PubliclyTrustedCertificateThumbprint). Nothing fires unless one of
these is set — the feature is purely additive.
At startup the publicly-trusted certificate's DNS names are logged,
and a warning is emitted if any of its SANs overlap with the node
certificate's DNS SANs — that overlap would cause internal
node-to-node HTTPS traffic using such a name for SNI to receive the
publicly-trusted certificate and fail internal mTLS.
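
A rough sketch of the overlap check, in Python for illustration only. The function name is hypothetical, and the comparison is on literal DNS names (case-insensitive); wildcard SANs are deliberately not expanded here, which a real check may need to handle.

```python
from typing import Iterable, Set


def overlapping_dns_names(public_sans: Iterable[str],
                          node_sans: Iterable[str]) -> Set[str]:
    """Return DNS names present on both certificates, lower-cased."""
    public = {name.lower() for name in public_sans}
    node = {name.lower() for name in node_sans}
    return public & node


# Hypothetical SAN lists: the shared name would trigger the warning.
shared = overlapping_dns_names(
    ["db.example.com", "Node1.Cluster.local"],
    ["node1.cluster.local"],
)
if shared:
    print("warning: SANs shared with node certificate:", sorted(shared))
```

Any name in the result is one for which internal traffic using it as SNI would be handed the publicly-trusted certificate.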
A separate warning is emitted if TrustedRootCertificatesPath points at
a well-known OS system trust store directory (e.g. /etc/ssl/certs,
/etc/pki/ca-trust/extracted/pem) or if TrustedRootCertificateStoreName
is 'Root' / 'AuthRoot' — both are indicators of the exact
misconfiguration this feature is meant to make unnecessary. The
warning recommends using the PubliclyTrustedCertificate* options
instead.
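
The trust-store check could look roughly like the sketch below. It is written in Python for illustration; the function name is hypothetical, only the two directories named in this message are listed, and the case-insensitive store-name comparison is an assumption.

```python
import posixpath
from typing import Optional

# Only the directories named in this change; a real check may cover more.
SYSTEM_TRUST_DIRS = ("/etc/ssl/certs", "/etc/pki/ca-trust/extracted/pem")
# Store names compared case-insensitively (an assumption of this sketch).
SYSTEM_STORE_NAMES = ("root", "authroot")


def looks_like_system_trust(path: Optional[str] = None,
                            store_name: Optional[str] = None) -> bool:
    """Return True if the trust anchor appears to be an OS system store."""
    if path:
        # Normalise trailing slashes and "." segments before comparing.
        if posixpath.normpath(path) in SYSTEM_TRUST_DIRS:
            return True
    if store_name and store_name.lower() in SYSTEM_STORE_NAMES:
        return True
    return False
```

When this returns True, the operator would be pointed at the PubliclyTrustedCertificate* options instead.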
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Background
DisableClientAuthEkuValidation was recently added because publicly-trusted CAs are increasingly no longer issuing certs with the clientAuth EKU. A consequence: mTLS within a cluster using only public-CA certs is no longer possible without that flag, and the flag is a short-term mitigation (it weakens RFC 5280 EKU enforcement), not a long-term solution. A private PKI is therefore required to run mTLS in the cluster at all.
Given that, widening TrustedRootCertificatesPath to include the system root store (e.g. /etc/ssl/certs) for the sake of mTLS is strictly worse than doing nothing: it collapses the cluster's trust boundary onto every public CA the OS happens to trust, with only the CertificateReservedNodeCommonName string check standing between the cluster and any cert issued by any public CA.
Operators historically reached for a public CA in mTLS for two reasons:
1. To avoid running a private PKI and rotating its certs themselves.
2. To avoid distributing a custom CA to every gRPC client / browser that talks to the node.
Reason #1 is moot today (private PKI is required either way). Reason #2 is still real; TLS termination at a load balancer / reverse proxy could solve it but isn't straightforward with gRPC.
This PR
The primary recommendation is to use a private PKI for cluster mTLS. For deployments that also need to solve reason #2, this PR lets them configure a separate, publicly-trusted certificate served only on matching SNI — without widening the cluster's mTLS trust anchor.
The node serves this second cert on TLS connections whose ClientHello SNI matches one of its SANs; the internal-CA node certificate continues to be served on every other connection (including internal node-to-node gossip). The publicly-trusted cert's CA never enters TrustedRootCerts and cannot authenticate anything as a node.
Client cert handling is unchanged (requested but not required, validated against the internal CA), so internal-CA-issued user certs still authenticate via the publicly-trusted hostname.
New config
CertificateFile.PubliclyTrustedCertificate{File,PrivateKeyFile,Password,PrivateKeyPassword}
CertificateStore.PubliclyTrustedCertificate{StoreLocation,StoreName,SubjectName,Thumbprint}
Purely additive.
Startup diagnostics
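The publicly-trusted certificate's DNS names are logged at startup.
A warning is emitted if any of its SANs overlap with the node certificate's DNS SANs, since internal node-to-node traffic using such a name for SNI would receive the publicly-trusted certificate and fail internal mTLS.
A separate warning is emitted if TrustedRootCertificatesPath points at a well-known OS trust store (/etc/ssl/certs, /etc/pki/ca-trust/extracted/pem, etc.) or TrustedRootCertificateStoreName is Root / AuthRoot, which is the misconfiguration this feature is meant to make unnecessary.
Test plan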