feat: Manage prometheus resources #2117

Merged
tolusha merged 9 commits into main from servicemonitor
May 19, 2026
Conversation

@tolusha
Contributor

@tolusha tolusha commented Apr 13, 2026

What does this PR do?

  1. New metrics package (pkg/deploy/metrics/) — a new reconciler that manages Prometheus resources:
    • ServiceMonitor objects for both Che Server (che-host) and DWO (devworkspace-controller)
    • RBAC (Role + RoleBinding) granting the prometheus-k8s service account access to scrape metrics endpoints
    • Adds the openshift.io/cluster-monitoring: "true" label to the operator namespace so OpenShift's built-in monitoring stack discovers the ServiceMonitors
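
The namespace-labelling step described above can be sketched in plain Go. This is a minimal illustration, not code from the PR: the helper name `ensureMonitoringLabel` is hypothetical, and the sketch works on a bare label map rather than on a real Kubernetes Namespace object. It only shows the idempotent "add the label if it is missing, and report whether an update is needed" logic that a reconciler would typically want.

```go
package main

import "fmt"

// clusterMonitoringLabel is the namespace label named in the PR description.
const clusterMonitoringLabel = "openshift.io/cluster-monitoring"

// ensureMonitoringLabel is a hypothetical helper (not taken from the PR).
// If the label is already "true", it returns the input map and false.
// Otherwise it returns a copy with the label set to "true" and true,
// signalling that the namespace would need an update.
func ensureMonitoringLabel(labels map[string]string) (map[string]string, bool) {
	if labels[clusterMonitoringLabel] == "true" {
		return labels, false
	}
	updated := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		updated[k] = v
	}
	updated[clusterMonitoringLabel] = "true"
	return updated, true
}

func main() {
	labels, changed := ensureMonitoringLabel(map[string]string{"app": "che"})
	fmt.Println(labels[clusterMonitoringLabel], changed)

	// A second pass over the already-labelled map is a no-op.
	_, changedAgain := ensureMonitoringLabel(labels)
	fmt.Println(changedAgain)
}
```

Returning a "changed" flag lets the caller skip the API update when nothing differs, which helps avoid the infinite reconciliation loops mentioned in the test scenarios.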

Screenshot/screencast of this PR

N/A

What issues does this PR fix or reference?

https://redhat.atlassian.net/browse/CRW-8629
https://redhat.atlassian.net/browse/CRW-8589
https://redhat.atlassian.net/browse/CRW-8315

How to test this PR?

  1. Deploy Eclipse Che, start a workspace
  2. Ensure that devworkspace and che-server specific metrics are collected without creating any additional resources
    https://eclipse.dev/che/docs/stable/administration-guide/monitoring-che/
    https://eclipse.dev/che/docs/stable/administration-guide/monitoring-the-dev-workspace-operator/

Common Test Scenarios

  • Deploy Eclipse Che
  • Start an empty workspace
  • Open terminal and build/run an image
  • Stop a workspace
  • Check operator logs for reconciliation errors or infinite reconciliation loops

PR Checklist

As the author of this Pull Request I made sure that:

Reviewers

Reviewers, please comment how you tested the PR when approving it.

@openshift-ci

openshift-ci Bot commented Apr 13, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

tolusha added 2 commits May 11, 2026 17:47
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
tolusha added 5 commits May 13, 2026 09:31
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
@tolusha tolusha marked this pull request as ready for review May 13, 2026 08:47
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
@tolusha tolusha requested a review from rohanKanojia May 13, 2026 09:38
@tolusha tolusha requested a review from akurinnoy May 13, 2026 09:38
@tolusha
Contributor Author

tolusha commented May 13, 2026

/retest

Rules: []rbacv1.PolicyRule{
	{
		APIGroups: []string{""},
		Resources: []string{"services", "endpoints", "pods"},
Contributor

Sorry, I might not have the full context. Why is pods needed?

@tolusha
Contributor Author

tolusha commented May 14, 2026

Hi! I'm che-ai-assistant — I help with your pull requests.

Available commands:

  • /che-ai-assistant generate-che-doc — Generate a documentation PR based on this PR's changes
  • /che-ai-assistant ok-pr-review — Run a comprehensive PR review (summary, code review, deep review, impact analysis)
  • /che-ai-assistant help — Show this help message

@rohanKanojia
Contributor

While testing it I observe that ServiceMonitor for devworkspace gets created but che-host is not getting created.

After installing Eclipse Che from these changes and creating a DevWorkspace:

  1. ServiceMonitor got created:
kubectl get servicemonitor -neclipse-che
NAME           AGE
devworkspace   106s
  2. Role/RoleBinding got created:
kubectl get role -neclipse-che | grep prometheus
devworkspace-prometheus                                           2026-05-14T15:39:05Z

kubectl get rolebinding -neclipse-che | grep prometheus
devworkspace-prometheus                                           Role/devworkspace-prometheus                                           2m17s

These get garbage collected when CheCluster gets deleted, confirming ownerReference is working correctly.

How do I enable the equivalent resources for Che? I can confirm that metrics are enabled in the CheCluster CR:

oc get checluster -o yaml | grep -A2  metrics
      metrics:
        enable: true
      pluginRegistry:

Comment thread pkg/deploy/metrics/cheserver_prometheus_resources.go
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
@tolusha
Contributor Author

tolusha commented May 15, 2026

@rohanKanojia
Thank you for noticing that. It has been fixed.

While testing it I observe that ServiceMonitor for devworkspace gets created but che-host is not getting created.
fixed

@tolusha
Contributor Author

tolusha commented May 15, 2026

/che-ai-assistant generate-che-doc

Created documentation PR: eclipse-che/che-docs#3106

@tolusha
Contributor Author

tolusha commented May 18, 2026

/che-ai-assistant ok-pr-review

Contributor Author

@tolusha tolusha left a comment

Review Summary

This PR adds a well-designed metrics reconciler with clean abstractions and good test coverage. The PrometheusResourceProvider interface pattern is well-motivated, and the interval preservation mechanism shows operational awareness.

Key findings:

  • 7 inline comments with specific suggestions
  • Most are minor improvements (documentation, test coverage, deprecated package usage)
  • No blocking issues found

The reconciler design is solid and follows established patterns in the codebase. Great work on the upgrade path with deleteAbandonedResources and graceful degradation for clusters without Prometheus Operator.
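
The conversation does not show how the PR implements the interval preservation mentioned in the review summary. One plausible reading is that a scrape interval edited on an existing ServiceMonitor should survive the next reconcile instead of being reset to the default. The sketch below illustrates that idea only. The `endpoint` type and `preserveIntervals` function are hypothetical stand-ins, not the PR's code and not the Prometheus Operator API.

```go
package main

import "fmt"

// endpoint is a simplified stand-in for a ServiceMonitor endpoint.
// Only the fields needed for this illustration are modelled.
type endpoint struct {
	Port     string
	Interval string
}

// preserveIntervals (hypothetical) copies the interval of each existing
// endpoint onto the desired endpoint with the same port, so that a value
// edited on the cluster is not overwritten by the default on reconcile.
// Desired endpoints without an existing counterpart keep their own interval.
func preserveIntervals(existing, desired []endpoint) []endpoint {
	byPort := make(map[string]string, len(existing))
	for _, e := range existing {
		if e.Interval != "" {
			byPort[e.Port] = e.Interval
		}
	}
	merged := make([]endpoint, len(desired))
	for i, d := range desired {
		if interval, ok := byPort[d.Port]; ok {
			d.Interval = interval
		}
		merged[i] = d
	}
	return merged
}

func main() {
	existing := []endpoint{{Port: "metrics", Interval: "30s"}}
	desired := []endpoint{{Port: "metrics", Interval: "10s"}}
	fmt.Println(preserveIntervals(existing, desired)[0].Interval)
}
```

Matching endpoints by port name is an assumption made for this sketch. A real implementation would need to decide which key identifies "the same" endpoint.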

Comment thread pkg/deploy/metrics/dwo_prometheus_resources.go
Comment thread pkg/deploy/metrics/dwo_prometheus_resources.go
Comment thread config/rbac/cluster_role.yaml
Comment thread pkg/deploy/metrics/prometheus_resources_utils.go
Comment thread pkg/deploy/metrics/init_test.go

func NewMetricsReconciler() *MetricsReconciler {
	return &MetricsReconciler{}
}
Contributor Author

The package-level variable isAbandonedResourcesDeleted is not reset between tests. TestReconcileMetrics sets it to true, so later tests skip the deletion code. Consider adding a reset function in init_test.go or making the flag per-DeployContext.
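
The concern in this comment can be illustrated with a small sketch that moves the flag off the package level. It is a variant of the second suggestion, keeping the state on the reconciler instance rather than on a DeployContext. The field `abandonedResourcesDeleted` and the method `deleteAbandonedResourcesOnce` are hypothetical names chosen for illustration, and the struct is a simplified stand-in for the PR's MetricsReconciler.

```go
package main

import "fmt"

// MetricsReconciler here is a simplified stand-in for the PR's type.
// The flag lives on the instance, so every test that constructs a fresh
// reconciler starts with cleanup still pending.
type MetricsReconciler struct {
	abandonedResourcesDeleted bool // hypothetical field name
}

func NewMetricsReconciler() *MetricsReconciler {
	return &MetricsReconciler{}
}

// deleteAbandonedResourcesOnce (hypothetical) runs cleanup at most once per
// reconciler instance. The flag is set only after cleanup succeeds, so a
// failed attempt is retried on the next call.
func (r *MetricsReconciler) deleteAbandonedResourcesOnce(cleanup func() error) error {
	if r.abandonedResourcesDeleted {
		return nil
	}
	if err := cleanup(); err != nil {
		return err
	}
	r.abandonedResourcesDeleted = true
	return nil
}

func main() {
	calls := 0
	cleanup := func() error { calls++; return nil }

	first := NewMetricsReconciler()
	_ = first.deleteAbandonedResourcesOnce(cleanup)
	_ = first.deleteAbandonedResourcesOnce(cleanup) // skipped: already done

	second := NewMetricsReconciler()
	_ = second.deleteAbandonedResourcesOnce(cleanup) // runs again: fresh state

	fmt.Println(calls)
}
```

With instance-scoped state, test order no longer matters, and no explicit reset helper is needed in init_test.go.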

Comment thread pkg/deploy/metrics/cheserver_prometheus_resources.go
@rohanKanojia
Contributor

I tested it again and now I can confirm both ServiceMonitors are available:

$ kubectl get servicemonitor -neclipse-che
NAME           AGE
che            24s
devworkspace   25s
$ kubectl get role -neclipse-che | grep prometheus
che-prometheus                                                    2026-05-18T10:31:23Z
devworkspace-prometheus                                           2026-05-18T10:31:23Z

@openshift-ci

openshift-ci Bot commented May 18, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rohanKanojia, tolusha

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tolusha tolusha merged commit 518bd8a into main May 19, 2026
25 of 26 checks passed
@tolusha tolusha deleted the servicemonitor branch May 19, 2026 09:27