
Fix/IStartup Capabilites/IPv6, ip version flag, operational metrics #2

Merged
Bierchermuesli merged 5 commits into main from fix/ipv6-resolution-and-ip-version-flag on Apr 2, 2026


Bierchermuesli commented on Mar 15, 2026

Problems

  1. ip_protocol defaulted to a hardcoded ip4, so literal IPv6 addresses like 2600:: always failed with "address 2600::: no suitable address found", as did hostnames (FQDNs) that resolve only to AAAA records
  2. No awareness of system capabilities — auto mode would try IPv6 for hostnames even on IPv4-only systems, causing probes to fail
  3. Every probe wasted a socket open attempt trying udp4 before falling back to ip4:icmp (or vice versa), with misleading log output
  4. No startup validation — a misconfigured system would silently fail on every probe

Changes

Auto IP family detection (--ping.default-ip-protocol, default auto)

  • Literal IPv4/IPv6 addresses are detected by family — no ip_protocol param needed
  • For hostnames, AAAA is tried first (IPv6-first, RFC 8305 / Happy Eyeballs intent), but only if the system can actually open an IPv6 ICMP socket
  • New --ping.default-ip-protocol flag (ip4/ip6/auto) sets the system-wide default; overridable per probe via ip_protocol URL parameter
  • Adds exporter-level operational metrics, split by IP version
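The family selection rules above can be sketched as a small pure function. This is a hedged sketch, not the exporter's code: the name familyOrder, its signature, and the fallback from AAAA to A within one probe are assumptions; ipv6Capable stands for the result of the startup capability check.

```go
package main

import (
	"fmt"
	"net/netip"
)

// familyOrder returns the IP families to try, in order, for one probe.
// proto is the effective ip_protocol ("ip4", "ip6" or "auto").
// ipv6Capable is assumed to come from the startup capability check.
// Hypothetical helper, not the exporter's actual API.
func familyOrder(target, proto string, ipv6Capable bool) ([]string, error) {
	switch proto {
	case "ip4", "ip6":
		// An explicit protocol is used as-is.
		return []string{proto}, nil
	case "auto":
		// handled below
	default:
		return nil, fmt.Errorf("unknown ip_protocol %q", proto)
	}
	// auto + literal address: the address determines its own family.
	if addr, err := netip.ParseAddr(target); err == nil {
		if addr.Unmap().Is4() {
			return []string{"ip4"}, nil
		}
		return []string{"ip6"}, nil
	}
	// auto + hostname: try AAAA first, but only if IPv6 ICMP works here.
	if ipv6Capable {
		return []string{"ip6", "ip4"}, nil
	}
	return []string{"ip4"}, nil
}

func main() {
	for _, t := range []string{"2600::", "192.0.2.1", "example.com"} {
		order, err := familyOrder(t, "auto", false)
		fmt.Println(t, order, err)
	}
}
```

On an IPv4-only host (ipv6Capable false) a hostname resolves straight to ip4, which is the failure mode listed as problem 2.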

Startup capability detection (DetectCapabilities)

  • Probes once at startup which socket types are available: raw (ip4:icmp / ip6:ipv6-icmp) is tried first, with UDP (udp4/udp6) as the fallback. Note, however, that UDP sockets should still be preferred in deployments, and NET_RAW is not needed for them
  • Checks for IPv6 capability, so AAAA/IPv6 pings fail fast on systems that cannot use IPv6
  • performPing uses the detected type directly — no per-probe try-and-fail
  • Startup log shows clear socket type labels (raw/CAP_NET_RAW, udp/unprivileged, or unavailable) instead of ambiguous booleans
  • Process exits immediately with a descriptive error if neither IPv4 nor IPv6 ICMP is available, rather than letting every probe fail

The startup log now reports one of:

  • level=ERROR source=main.go:120 msg="No ICMP socket available — cannot ping anything" ipv4=unavailable ipv6=unavailable hint="add CAP_NET_RAW or set net.ipv4.ping_group_range" (process exits)
  • level=INFO source=main.go:124 msg="ICMP sockets ready" ipv4=udp/unprivileged ipv6=udp/unprivileged
  • level=INFO source=main.go:153 msg="ICMP sockets ready" ipv4=raw/CAP_NET_RAW ipv6=unavailable (IPv4-only probes)

New in /metrics:

  ping_exporter_probes_total{result="success",ip_version="4"} 30
  ping_exporter_probes_total{result="success",ip_version="6"} 12
  ping_exporter_probes_total{result="failure",ip_version="4"} 2

  ping_exporter_packets_sent_total{ip_version="4"} 900
  ping_exporter_packets_sent_total{ip_version="6"} 360

  ping_exporter_packets_received_total{ip_version="4"} 897
  ping_exporter_packets_received_total{ip_version="6"} 358

  ping_exporter_probe_duration_seconds_bucket{ip_version="4",le="0.5"} 28
  ping_exporter_probe_duration_seconds_bucket{ip_version="6",le="0.5"} 11

Note

This branch includes commits from #1 fix/icmp-socket-privileged-detection — that PR should be merged first.

…for raw ICMP sockets

The privileged variable was derived from conn == nil, which was only true
for the dontFragment/v4RawConn path. When udp4 failed and fell back to
ip4:icmp (or ip6:ipv6-icmp succeeded), the flag was incorrectly false,
causing two bugs:

1. dst was wrapped as *net.UDPAddr ("8.8.8.8:0") but ReadFrom on a raw
   ICMP socket returns *net.IPAddr ("8.8.8.8"), so the peer comparison
   always failed and every reply was silently dropped, causing timeout.
2. The ICMP echo ID check was skipped for raw sockets, allowing stray
   packets to match.

Replace it with a useUDP bool that is set only when a udp4/udp6 socket is actually used.
Fixes ping in Docker containers with NET_RAW where udp4 is unavailable
and the ip4:icmp fallback is used. IPv6 fallback from ip6:ipv6-icmp to
udp6 is also correctly tracked.

Move per-packet and socket-selection logs to DEBUG level. Add a single
INFO summary line per probe (sent/received/loss/avg_rtt). Improve error
messages to surface both udp4 and raw socket failures when neither works.

Update README permissions section to document the two socket modes
(unprivileged udp4/udp6 vs raw ip4:icmp/ip6:ipv6-icmp), clarify that
NET_RAW is not required in Docker when the host ping_group_range sysctl
is permissive, and remove the duplicate/incorrect "NET_RAW is required"
note from the Docker usage example.
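Whether the unprivileged mode works depends on the net.ipv4.ping_group_range sysctl. A hedged sketch that reads it on Linux and checks the process's primary GID; the helper names are hypothetical, and since the kernel also honours supplementary groups, this check can report false negatives.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePingGroupRange parses the contents of /proc/sys/net/ipv4/ping_group_range:
// two whitespace-separated GIDs "min max". The kernel default "1 0" (min > max)
// means no group may open unprivileged ICMP sockets.
func parsePingGroupRange(s string) (lo, hi int64, err error) {
	f := strings.Fields(s)
	if len(f) != 2 {
		return 0, 0, fmt.Errorf("unexpected ping_group_range %q", s)
	}
	if lo, err = strconv.ParseInt(f[0], 10, 64); err != nil {
		return 0, 0, err
	}
	if hi, err = strconv.ParseInt(f[1], 10, 64); err != nil {
		return 0, 0, err
	}
	return lo, hi, nil
}

// groupAllowed reports whether gid falls inside the configured range.
func groupAllowed(rangeStr string, gid int64) bool {
	lo, hi, err := parsePingGroupRange(rangeStr)
	return err == nil && lo <= gid && gid <= hi
}

func main() {
	data, err := os.ReadFile("/proc/sys/net/ipv4/ping_group_range")
	if err != nil {
		fmt.Println("cannot read ping_group_range:", err)
		return
	}
	fmt.Println("primary GID allowed to open ping sockets:",
		groupAllowed(string(data), int64(os.Getgid())))
}
```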

Default ip_protocol changes from ip4 to auto. In auto mode, literal IPv4/IPv6
addresses are detected by family so e.g. 2600:: no longer fails with "no
suitable address found". For hostnames, AAAA is tried first (IPv6-first,
RFC 8305 / Happy Eyeballs intent) but only if the system can actually open an
IPv6 ICMP socket — detected once at startup via DetectCapabilities(). This
prevents dual-stack hostname probes from failing on IPv4-only systems.

Add --ping.default-ip-protocol flag (ip4/ip6/auto) to set the system-wide
default, overridable per probe via the ip_protocol URL parameter.

Update README with IP protocol auto-detection behaviour, updated parameter
defaults, and new --ping.default-ip-protocol flag documentation.

… fail fast if none

Extend Capabilities with IPv4UDP/IPv6UDP flags so performPing goes straight
to the correct socket type instead of try-and-fail on every probe.

Detection order: raw (ip4:icmp / ip6:ipv6-icmp) is tried first — if
CAP_NET_RAW or root is present it wins. UDP (udp4/udp6) is the fallback
for truly unprivileged environments where ping_group_range allows it.
This ensures the startup log accurately reflects the socket type in use
(e.g. sudo shows raw/CAP_NET_RAW, not udp/unprivileged).

Logging moved from DetectCapabilities into main so capabilities and the
failure path are reported in one place with clear socket type labels
(raw/CAP_NET_RAW, udp/unprivileged, unavailable) instead of ambiguous
boolean flags.

Fail at startup with a descriptive error if neither IPv4 nor IPv6 ICMP
is available, rather than letting every probe fail silently.

Add exporter-level metrics to /metrics:
- ping_exporter_capability_info{ipv4, ipv6} — socket type in use at startup
- ping_exporter_probes_total{result, ip_version} — probe success/failure counts
- ping_exporter_packets_sent_total{ip_version} — aggregate packets sent
- ping_exporter_packets_received_total{ip_version} — aggregate packets received
- ping_exporter_probe_duration_seconds{ip_version} — probe duration histogram

All metrics are split by ip_version (4/6) derived from probe_ping_ip_version.

Fix IPv6 capability detection: opening ip6:ipv6-icmp on :: succeeds even
when no IPv6 address is configured (the socket opens but cannot route).
Add hasRoutableAddress() which checks for a non-loopback, non-link-local
address on an up interface before attempting to open the ICMP socket.
Containers without IPv6 now correctly report ipv6=unavailable.

Bierchermuesli changed the title Fix/Startup Capabilites and ip version flag to Fix/IPv6/Startup Capabilites, ip version flag global metrics on Mar 15, 2026
Bierchermuesli changed the title Fix/IPv6/Startup Capabilites, ip version flag global metrics to Fix/IStartup Capabilites/IPv6, ip version flag, operational metrics on Mar 15, 2026

hwuethrich left a comment


Very nice! LGTM 🌈

Bierchermuesli merged commit d41bb37 into main on Apr 2, 2026
2 checks passed