Summary
Add OpenTelemetry metrics for public-key lookup and remote JSON-LD document fetches. These metrics should make it easier to see whether remote fetches, cache misses, or failed key resolution are slowing down federation.
Current state
Fedify already creates spans such as activitypub.fetch_key, activitypub.fetch_document, and activitypub.lookup_object. Signature verification also records some key-fetch failure details on spans and span events.
Those traces help explain a single request, but operators still need aggregate metrics to answer questions such as:
How often does Fedify fetch remote public keys?
How often are key lookups served from cache?
Which remote hosts are causing slow document fetches or repeated failures?
Are context-document fetches or actor-document fetches a source of latency?
Proposed solution
Once #619 adds metrics support, add counters and histograms around key lookup and remote document fetch paths.
Proposed instruments:
activitypub.key.lookup: counter, incremented for public-key lookup attempts.
activitypub.key.lookup.duration: histogram, recording key lookup duration in milliseconds.
activitypub.document.fetch: counter, incremented for remote JSON-LD document fetch attempts.
activitypub.document.fetch.duration: histogram, recording fetch duration in milliseconds.
activitypub.document.cache: counter, incremented for cache hits and misses where Fedify can observe them.
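The instruments above could be declared once, in one place, so that every lookup path shares them. The following is a minimal sketch, not Fedify's implementation. `Meter`, `Counter`, and `Histogram` are local interfaces shaped like the OpenTelemetry JS API (`createCounter()`, `createHistogram()`, `add()`, `record()`). `InMemoryMeter` and `createLookupInstruments()` are hypothetical helpers that let the example run without an OpenTelemetry SDK.

```typescript
// Attribute values allowed on a metric data point (a subset of OpenTelemetry's).
type Attributes = Record<string, string | number | boolean>;

// Local shapes mirroring the OpenTelemetry JS API's Counter, Histogram, and Meter.
interface Counter { add(value: number, attributes?: Attributes): void; }
interface Histogram { record(value: number, attributes?: Attributes): void; }
interface InstrumentOptions { description?: string; unit?: string; }
interface Meter {
  createCounter(name: string, options?: InstrumentOptions): Counter;
  createHistogram(name: string, options?: InstrumentOptions): Histogram;
}

// Hypothetical in-memory meter that keeps every data point, useful in tests.
interface DataPoint { name: string; value: number; attributes: Attributes; }

class InMemoryMeter implements Meter {
  readonly points: DataPoint[] = [];
  readonly units = new Map<string, string | undefined>();

  createCounter(name: string, options?: InstrumentOptions): Counter {
    this.units.set(name, options?.unit);
    return {
      add: (value, attributes = {}) => {
        this.points.push({ name, value, attributes });
      },
    };
  }

  createHistogram(name: string, options?: InstrumentOptions): Histogram {
    this.units.set(name, options?.unit);
    return {
      record: (value, attributes = {}) => {
        this.points.push({ name, value, attributes });
      },
    };
  }
}

// The five instruments proposed in this issue (hypothetical helper).
interface LookupInstruments {
  keyLookup: Counter;
  keyLookupDuration: Histogram;
  documentFetch: Counter;
  documentFetchDuration: Histogram;
  documentCache: Counter;
}

function createLookupInstruments(meter: Meter): LookupInstruments {
  return {
    keyLookup: meter.createCounter("activitypub.key.lookup", {
      description: "Public-key lookup attempts.",
    }),
    keyLookupDuration: meter.createHistogram("activitypub.key.lookup.duration", {
      description: "Public-key lookup duration.",
      unit: "ms",
    }),
    documentFetch: meter.createCounter("activitypub.document.fetch", {
      description: "Remote JSON-LD document fetch attempts.",
    }),
    documentFetchDuration: meter.createHistogram("activitypub.document.fetch.duration", {
      description: "Remote JSON-LD document fetch duration.",
      unit: "ms",
    }),
    documentCache: meter.createCounter("activitypub.document.cache", {
      description: "Document cache hits and misses that Fedify can observe.",
    }),
  };
}
```

With a real SDK, the same function would receive a meter from the OpenTelemetry API instead of `InMemoryMeter`. The histograms use `ms` because that is what this proposal specifies. OpenTelemetry's semantic conventions generally recommend seconds (`s`) for durations, so the unit may be worth confirming when the documentation is written.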
Proposed attributes:
activitypub.lookup.kind: public_key, actor, object, context, or other.
activitypub.lookup.result: hit, miss, fetched, not_found, invalid, network_error, or error.
activitypub.remote.host: hostname only, never the full URL.
http.response.status_code: the HTTP status code, when a remote HTTP response exists.
activitypub.cache.enabled: whether the lookup path used a cache-backed loader.
Do not include key IDs, actor IDs, object IDs, full URLs, handles, or JSON-LD context URLs as metric attributes. Full identifiers can stay on spans.
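These attribute rules can be enforced by building every attribute set through one function. In this sketch, `LookupKind`, `LookupResult`, `LookupOutcome`, and `lookupAttributes()` are hypothetical names. The function keeps only `URL.hostname`, which drops the port, path, query, and fragment, and it never copies the URL itself onto the attribute set.

```typescript
// Values taken from the proposed attribute lists.
type LookupKind = "public_key" | "actor" | "object" | "context" | "other";
type LookupResult =
  | "hit" | "miss" | "fetched" | "not_found" | "invalid" | "network_error" | "error";

type Attributes = Record<string, string | number | boolean>;

// Hypothetical description of one finished lookup.
interface LookupOutcome {
  kind: LookupKind;
  result: LookupResult;
  url: string;          // full remote URL; only its hostname is ever recorded
  statusCode?: number;  // set only when a remote HTTP response exists
  cacheEnabled: boolean;
}

function lookupAttributes(outcome: LookupOutcome): Attributes {
  const attributes: Attributes = {
    "activitypub.lookup.kind": outcome.kind,
    "activitypub.lookup.result": outcome.result,
    "activitypub.cache.enabled": outcome.cacheEnabled,
  };
  try {
    // Hostname only: no scheme, port, path, query, or fragment.
    attributes["activitypub.remote.host"] = new URL(outcome.url).hostname;
  } catch {
    // Unparseable URL: omit the host rather than record the raw string.
  }
  if (outcome.statusCode !== undefined) {
    attributes["http.response.status_code"] = outcome.statusCode;
  }
  return attributes;
}
```

Because an unparseable URL yields no `activitypub.remote.host` at all, a malformed identifier cannot end up as a metric attribute value.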
Scope
Instrument public-key lookup and key-fetch paths used by HTTP Signatures and related federation checks.
Instrument remote document fetches done by Fedify document loaders.
Record cache hit/miss metrics where the cache layer can report them without changing KvStore semantics.
Keep user-supplied custom document loaders in scope only when Fedify wraps or calls them directly.
Update docs/manual/opentelemetry.md with metric names, units, and cardinality guidance.
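Document-loader instrumentation could be a wrapper that times each call and records its outcome, which would also cover user-supplied loaders whenever Fedify wraps them. This sketch makes several assumptions. `DocumentLoader` and `RemoteDocument` are simplified local types loosely modeled on Fedify's. Errors carrying a numeric `status` property are treated as HTTP failures, which is a hypothetical convention. `SyntaxError` is treated as an invalid document, and `TypeError` as a network error, since `fetch()` rejects with a `TypeError` when the request itself fails.

```typescript
type Attributes = Record<string, string | number | boolean>;

// Shapes mirroring OpenTelemetry's Counter.add() and Histogram.record().
interface Counter { add(value: number, attributes?: Attributes): void; }
interface Histogram { record(value: number, attributes?: Attributes): void; }

// Simplified local types, loosely modeled on Fedify's document loader.
interface RemoteDocument {
  contextUrl: string | null;
  document: unknown;
  documentUrl: string;
}
type DocumentLoader = (url: string) => Promise<RemoteDocument>;

type FetchResult = "fetched" | "not_found" | "invalid" | "network_error" | "error";

// Hypothetical convention: an error with a numeric `status` came from an HTTP response.
function statusOf(error: unknown): number | undefined {
  if (typeof error === "object" && error !== null && "status" in error) {
    const status = (error as { status: unknown }).status;
    if (typeof status === "number") return status;
  }
  return undefined;
}

function classify(error: unknown): FetchResult {
  const status = statusOf(error);
  if (status === 404 || status === 410) return "not_found";
  if (status !== undefined) return "error";
  if (error instanceof SyntaxError) return "invalid"; // e.g., a malformed JSON body
  if (error instanceof TypeError) return "network_error"; // fetch() rejects this way on network failure
  return "error";
}

// Wraps a loader so that every call records one counter increment and one duration.
function instrumentDocumentLoader(
  loader: DocumentLoader,
  kind: "actor" | "object" | "context" | "other",
  fetches: Counter,
  durations: Histogram,
  now: () => number = () => performance.now(),
): DocumentLoader {
  return async (url) => {
    const start = now();
    const attributes: Attributes = { "activitypub.lookup.kind": kind };
    try {
      attributes["activitypub.remote.host"] = new URL(url).hostname; // host only, never the URL
    } catch {
      // Unparseable URL: omit the host attribute instead of recording the raw string.
    }
    try {
      const remote = await loader(url);
      attributes["activitypub.lookup.result"] = "fetched";
      return remote;
    } catch (error) {
      attributes["activitypub.lookup.result"] = classify(error);
      const status = statusOf(error);
      if (status !== undefined) attributes["http.response.status_code"] = status;
      throw error; // keep the wrapper transparent to callers
    } finally {
      fetches.add(1, attributes);
      durations.record(now() - start, attributes);
    }
  };
}
```

Recording in `finally` lets success and failure share one code path, so no exit from the loader can skip the metrics.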
Acceptance criteria
Key lookup attempts and durations are recorded for success and failure paths.
Remote document fetch attempts and durations are recorded with host-only remote attributes.
Cache hit/miss metrics are emitted where Fedify can observe cache behavior.
Metrics do not include full URLs, key IDs, actor IDs, or object IDs.
Tests cover at least one key lookup cache hit, one remote key fetch, and one failed document fetch.
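The first two test cases could be exercised against a key-lookup path shaped like the sketch below. `instrumentedKeyLookup()`, `KeyCache`, `MapKeyCache`, and `KeyFetcher` are hypothetical simplifications: keys are plain PEM strings rather than Fedify's key objects, and metrics go to a plain callback. The point is that a cache hit records `activitypub.lookup.result: hit` without a remote call, while a miss records `fetched` together with the remote host.

```typescript
type Attributes = Record<string, string | number | boolean>;
interface DataPoint { instrument: string; value: number; attributes: Attributes; }

// Hypothetical key cache: `undefined` is a miss, `null` a cached "key not found".
interface KeyCache {
  get(keyId: string): Promise<string | null | undefined>;
  set(keyId: string, key: string | null): Promise<void>;
}

// Hypothetical in-memory cache for tests.
class MapKeyCache implements KeyCache {
  private readonly entries = new Map<string, string | null>();
  async get(keyId: string): Promise<string | null | undefined> {
    return this.entries.get(keyId);
  }
  async set(keyId: string, key: string | null): Promise<void> {
    this.entries.set(keyId, key);
  }
}

// Fetches a PEM-encoded key from the remote server; null when it does not exist.
type KeyFetcher = (keyId: string) => Promise<string | null>;

async function instrumentedKeyLookup(
  keyId: string,
  cache: KeyCache | undefined,
  fetchRemote: KeyFetcher,
  record: (point: DataPoint) => void,
  now: () => number = () => performance.now(),
): Promise<string | null> {
  const start = now();
  const attributes: Attributes = {
    "activitypub.lookup.kind": "public_key",
    "activitypub.cache.enabled": cache !== undefined,
  };
  try {
    attributes["activitypub.remote.host"] = new URL(keyId).hostname; // never the key ID itself
  } catch {
    // Unparseable key ID: omit the host attribute.
  }
  let result = "error"; // stays "error" if the cache or the fetcher throws
  try {
    const cached = await cache?.get(keyId);
    if (cached !== undefined) {
      result = "hit";
      return cached;
    }
    const key = await fetchRemote(keyId);
    await cache?.set(keyId, key);
    result = key === null ? "not_found" : "fetched";
    return key;
  } finally {
    const done: Attributes = { ...attributes, "activitypub.lookup.result": result };
    record({ instrument: "activitypub.key.lookup", value: 1, attributes: done });
    record({ instrument: "activitypub.key.lookup.duration", value: now() - start, attributes: done });
  }
}
```

Calling the function twice with the same key ID and a shared `MapKeyCache` should produce one `fetched` and one `hit` data point, with the remote fetcher invoked only once.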
Open questions
kvCache() wrapper?