Add finish method to synchronous instruments#4702
Conversation
cdd66a9 to
646e391
Compare
|
One thing that has been asked for is to be able to delete multiple series at once, since callers often don't have access to the complete list of attribute sets they've previously incremented. E.g. remove(http.target=foo) would remove all streams for that http.target. One way to solve this is by matching all attribute sets which don't have the keys provided. Remove without any arguments would match and remove all streams from the instrument. Unfortunately, it wouldn't be backwards-compatible to change this later, so we need to decide if this is important from the start. The other big question (which we need to resolve in the SDK spec PR), is how this impacts start time handling for cumulative metrics in the SDK. If an attribute set is deleted and recreated, the resulting metric data must have non-overlapping start-end time ranges since the cumulative value has been (presumably) reset. One way to solve this would be to require per-attribute-set start times in the SDK (#4184). |
|
A more flexible version of @dashpole's suggestion might look like:
Yeah for example, in java, we always use the same SDK start time for all cumulative series. Whether they see their first data at start time or days later, the start is always the same. I think your suggestion to track per-attribute-set start times in the SDK seems reasonable, but that seems to imply a behavior change from the single constant start time that we currently do java. Are there implications for this? Are there cases where a user with a cumulative backend would be upset to see new series with a start time corresponding to the window where data was first recorded? |
|
Also, if I call something like Makes me wonder if we need a top level instrument level close / remove method, as well as the fine grained method for removing specific series. |
I don't think it will impact Prometheus in any negative way (@ArthurSens might know more, since he opened the original issue). Depending on how strict you want to be, it may make it harder to aggregate timeseries with different start timestamps. If you user the earliest start timestamp, you may be missing data, and not produce an accurate cumulative for the entire time range. |
|
cc @jmacd |
This can be added later and separately from this effort, from what I can tell. |
My concrete use case is tied to a queue system where we report the count of events seen. See https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/ibm-mq-metrics/model/metrics.yaml#L216 When we lose contact with the queue manager, we can no longer report this information. If we get back in contact with the queue manager, we must recreate the time series with the new information, resetting the counter and its start time. We typically alert on delta changes, so we want the time series to be separate. |
How do you reset the start time? All SDKs i'm aware of in OTel today set the cumulative start time at instrument creation time, and never reset it. |
Actually, we would be happier to see this :) I've opened #4184 a long time ago, but never found the time to continue working on it. A start time per time series would help us provide more accurate increase rates. |
|
Since this is now sponsored, I am marking this PR ready for review. The discussion continues. |
Let's put this in as a requirement and try it out in the POCs, see how we fare. |
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
This issue has been raised and resisted a number of times in OpenTelemetry. I understand there is still an unanswered need, but I do not like the verb Consumers should receive the correct finalized value of these series. As @ArthurSens points out in #4184, what we need is a specification for how the data should be transmitted so that the ending of a series is clear. In Prometheus, we have the NaN value, and in OTel we have the missing data point flag, but we've never specified how to set that flag. I would like to see a specification that dictates SDKs have to remember the "finishing" series long enough to send the NaN/missing-data-flag to each reader at least once, otherwise that reader would lose information. Then, to answer #4184, we need to specify that new series must be created with a start time >= the Nan/missing-data-flag previously issued for the same series. |
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
|
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
|
Can I please ask for a maintainer to reopen the pull request? Thanks. |
Sure,
Are you offering to work on that specification? Is it a requirement for this or a follow-up? |
Co-authored-by: Robert Pająk <pellared@hotmail.com>
|
This was discussed on 1/6/2026 as part of the maintainer sync meeting, I reproduce the meeting notes below:
|
|
The start time PR merged, so this should be unblocked now. I think there will be some updates to the start timestamp spec needed to clarify the interaction between finish and start times. "All cumulative timeseries MUST use the initial start timestamp in subsequent collection intervals." probably needs to be changed to "All cumulative timeseries MUST use the initial start timestamp in subsequent collection intervals until Finish is called on the timeseries. After Finish is called, the next measurement to the timeseries SHOULD choose a new start time following the above rules for cumulative aggregations.". |
|
Thanks for the update @dashpole . I think we are unblocked but need to update the PR and the POCs. I have asked @MrAlias if he could help with the Go POC, and I'll try to refresh the Java one. It's been a bit and we need help ; the Python POC could be a nice one to update next.
|
|
I also have a use case for being able to cleanly unregister metrics from the client at runtime... We have multiple autoscaled services that primarily consumes from kafka. As scaling events happen, kafka partitions are reassigned. We scope many of our metrics to kafka partitions where partition is an important metadata tag on the metrics. We track things at a partition level like latest offset committed, records consumed, and some logical time tracking as based on specific messages consumed. If we are unable to cleanly remove metrics at runtime, then we end up with long lived instances reporting stale values for a partition that was reassigned to another instance while that other instance reports current values for the given partition. In the case of java applications, we risk a heap leak over time on the lowest order instances that live the longest and through many partition reassignments. |
|
Can you link the PoCs in the description? |
|
|
||
| **Status**: [Development](../document-status.md) | ||
|
|
||
| Unregister the attribute set. It will no longer be reported. |
There was a problem hiding this comment.
"It will no longer be reported." - forever or until another API call trying to add the attribute set back?
There was a problem hiding this comment.
Registering again will create a new attribute set with a new start time.
|
|
||
| This API MUST accept the following parameter: | ||
|
|
||
| * [Attributes](../common/README.md#attribute) to identify the Instrument. |
There was a problem hiding this comment.
The instrument isn't identified by the attributes. I would consider "time series" or something similar. You also say this below. This also applies to other Finish function definition.
| @@ -1041,6 +1042,13 @@ Note: If a user makes no configuration changes, `Enabled` returns `true` since b | |||
| default `MeterConfig.enabled=true` and instruments use the default | |||
| aggregation when no matching views match the instrument. | |||
|
|
|||
There was a problem hiding this comment.
Some things I would like to see specified in the SDK section:
- Specify how this interacts with export and storage. Something like "The SDK MUST preserve previously-aggregated metric data for the timeseries until it has been Collected, and then MUST clear all internal storage associated with the timeseries."
- Specify how this interacts with timestamps: Something like "When a an aggregation that has been Finished is Collected, its timestamp MUST be the time at which it was Finished". This may require changes elsewhere in the specification.
- Specify what happens if a timeseries has been finished, but a new measurement is made to it. This is a bit tricky. Options are:
- [Preferred] Un-finish the timeseries. This is a new observation, so rather than stop + start again, we can just continue the existing aggregation.
- Move the timestamp back to the current measurement time, but still Finish the timeseries. This might help if there is a race where a measurement is made just after it is Finished.
- Keep the existing timeseries finished, but start a new one with a new start timestamp. This might be complex to implement, as they would both have the same attribute set.
There was a problem hiding this comment.
[Preferred[ Un-finish the timeseries. This is a new observation, so rather than stop + start again, we can just continue the existing aggregation.
👍
|
|
||
| This API MUST accept the following parameter: | ||
|
|
||
| * [Attributes](../common/README.md#attribute) to identify the Instrument. |
There was a problem hiding this comment.
I think we should change this to be able to delete one or more timeseries by treating attributes that are not provided to this as matching. There are a few reasons:
- I can easily delete all timeseries on an instrument by providing no attributes
- As a user, I don't have to keep track of every attribute that i've ever provided to the API. For example, if my HTTP server dynamically creates and deletes routes over time, I would like to be able to Finish all timeseries associated with a particular
http.routewithout needing to enumerate all of the status codes it has ever served. - This would interact much better with filtering. If, for example, I was filtering out all attributes except for
http.routeon my server metrics, it isn't clear if the SDK should Finish the aggregation withhttp.route=fooif I callFinish(http.route=foo, http.response.status_code=200, ...). Was my intent to delete the entire route? Was it to only delete the portion that has status code 200? If, instead, users of the API pass attributes where they want all aggregations containing those attributes to be deleted, the previous example would not delete anything. If I instead calledFinish(http.route=foo), only then would the entire route be deleted.
There was a problem hiding this comment.
Yes. Why not the proposed predicate solution?
I can't find any comments which indicate there are problems / critiques with it.
There was a problem hiding this comment.
Ah, sorry. Missed it. I like your approach better, and support allowing users to provide a predicate or filter function.
I think the only downside is possibly performance, but I don't expect that to be a big concern for Finish.
There was a problem hiding this comment.
Simply out of bandwidth. I will work on it for java.
Add a feature to use per-series start times to match the spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#start-timestamps This is a prerequisite to [finishing / closing](open-telemetry/opentelemetry-specification#4702). Previous prototype: #7719 --------- Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com> Co-authored-by: Robert Pająk <pellared@hotmail.com>
|
We updated the Go PoC to track the current The current Go PoC now includes:
A few observations from the Go implementation:
Based on the PoC, I think the spec would benefit from being more explicit about:
If useful, I can also summarize the concrete Go API/SDK tradeoffs we hit while implementing the PoC. |
|
@atoulme joined the 5/14 java SIG yesterday to discuss this. I asked some questions, following feedback on the java prototype:
We need to be able to have answers to these types of questions to avoid bad / confusing UX. From my POV, the answers aren't good. If the application configured the SDK to use delta temporality, you don't have to worry about calling finish at all. If cumulative, it can help but only in situations where the series are known ahead of time and the caller is aware of their lifecycle. If a caller doesn't call finish, they get the behavior they've been getting the whole time: cumulative metrics stay in memory for the lifetime of the application. We started talking about an alternative solution: What if instead of a finish method, there was a view configuration option (or MeterProvider level, or Meter level, or MetricReader level) that allowed you to specify the TTL for stale cumulative series? I.e. "I want to retain cumulative series for 10 collection cycles (or time based) beyond their last measurement". And what if we coupled this with a new instrument advisory parameter which allowed instrumentation to inform the SDK about how series lifecycle works for that instrument: "for instrument, all series receive measurements regularly, so the absence of a measurement indicates that the series is stale and should be removed after the next collection" This type of solution has a much better UX:
|
Instrumentation should call this when one or more attribute values aren't expected to ever be recorded again. E.g. a pod name of a pod that was deleted, a database pool (from the original issue. If it isn't removed, then cumulative metrics have a memory/timeseries leak. For delta metrics, some downstream components (e.g. deltatocumulative processor) won't know if a timeseries should still be considered active.
Finish is only really useful when the label you care about is not dynamic. With the database pool example above, you would need to know the database pool name, but don't need to know all of the dynamic database pool statuses (made up dynamic attribute). I think we had already discussed a predicate-based API.
Can you give an example? In some sense, an http client has no idea when you will get a 429 again, but I wouldn't want that series cleaned up (I think it would disappear from my graphs instead of showing a rate of zero). I think there are additional ways Finish would be useful. One minor thing is that you would get more accurate end timestamps when something you were monitoring (e.g. the pod or database pool) is deleted. Another is staleness markers (NoDataPointFlag) so that UIs know where to stop drawing lines. We had also discussed this being useful in the context of Bound instruments. If we wanted to go with an expiration to solve this use-case, I would ideally be able to set a TTL on a list of attribute keys. E.g. For each value of |
They already leak today. If
I prefer to think about this from the perspective of the metrics SDK arather than the graph, because graph / query behavior varies by platform. Consider the
Yes it would allow for an explicit end timestamp corresponding to the moment when finish was called. That's not possible with a TTL based approach.
That seems complicated to implement and for the user to reason about for benefit which I think would be pretty niche. |
I wouldn't call this a leak, and it sounds like we might differ on the problem we are trying to solve. In K8s SIG instrumentation, when defining new metrics for k8s components, we don't have any concerns about rare status codes because they are bounded. We don't consider it a leak to keep all status codes around, because it has to keep leaking for a long time before it causes problems. What we do worry about is an attribute where the cardinality of the values isn't bounded, and where the cardinality will continue to grow over the lifetime of the process. As far as i'm aware, this has always been when an attribute is a reference to something external (a pod, a volume, service etc), and those things come and go over time. In those cases, we use an API that allows explicitly managing the state for metrics ourselves. Monitoring external resources is a niche pattern (typically done via the collector in OTel today), and nearly all instrumentation libraries won't need to use Finish. It would maybe also be nice to have a TTL to also intelligently drop series you don't really need. But the thing that is missing from OTel today is that you can't use it to monitor a set of things that change over time.
Fair point; you are right. I agree cleaning up a metric probably shouldn't be a concern. It will probably surprise some users, but I don't think anything will break. |
|
Based on other's feedback above, it seems to me that we can have what we want without a new API surface.
I would say that that what you do as a receiver of the information is independent of what the producer decides to do. If you haven't seen any 429 observations in a while, you should be free to issue a final point with the NO_DATA flag set. When you receive that there is no data in a timeseries, you are free to create a gap in your series or to fill in zeros. I think we should have automatic TTLs; any SDK that decides to remove from its memory and stop reporting a timeseries should issue a final point with NO_DATA set. SDKs that begin reporting the same series again must use a new start timestamp. Why do we need an explicit API, especially given the problems pointed at, e.g.,
This is the norm for a statsd/dogstatsd API user. Delta temporality is the solution for this use-case, and SDKs in my opinion should be organized not to accumulate memory under these conditions. In a related note, I'm aware of large users who have been unable to migrate to OTel away from statsd/dogstatsd because those APIs let users define metric instruments "at record time"; and likewise, I think OTel SDKs should be able to efficiently handle instruments being created on-demand. |
Fixes #2232
Changes
Add the ability to finish reporting a synchronous instrument attribute set. The attribute set under that instrument will no longer report.
POCs
open-telemetry/opentelemetry-go#8050
open-telemetry/opentelemetry-java#7792