Hi,
We have run into an issue when draining a Kubernetes node group on which the k8s-image-swapper pods run alongside a bunch of other pods: the swapper is not able to rewrite the image paths for all pods in time.
As a result, I can find that some images are left un-swapped:
```shell
kubectl get pods -A -o jsonpath="{.items[*].spec['initContainers', 'containers'][*].image}" | tr -s '[[:space:]]' '\n' | sort | uniq -c | egrep -v '(dkr.ecr|public.ecr)'
```
Of course, a PDB has been enabled and set to 3, `podAntiAffinity` in `preferred` mode has been configured as well, and even `priorityClassName: system-cluster-critical` has been added (although we didn't expect that to help much). Roughly, the setup looks like the sketch below.
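For context, this is approximately the configuration we are running with; the value keys here are only illustrative and may not match the chart's values.yaml exactly:

```yaml
# Illustrative sketch only -- key names are assumptions, not the chart's exact schema.
replicaCount: 4

podDisruptionBudget:
  enabled: true
  minAvailable: 3          # "set to 3" from above; whether it maps to minAvailable is our choice

priorityClassName: system-cluster-critical

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: k8s-image-swapper
          topologyKey: kubernetes.io/hostname
```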
The total number of running pods is 4. I suppose we could increase the replica count and the PDB accordingly even further, and it probably would help, but I see no point in such inefficient scaling.
I am fairly sure the root cause is in the `livenessProbe` and `readinessProbe`: they report success too fast, while the service is not yet able to handle requests:
https://github.com/estahn/charts/blob/main/charts/k8s-image-swapper/templates/deployment.yaml#L76
Could you please check these probes? Do they represent the service health correctly?
Making this block configurable in the Helm chart might be worth considering too: I guess increasing `successThreshold` from the default 1 to 2 could have some impact, since it would give the service more time to initialize. A rough sketch of what I have in mind follows.
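Something along these lines, exposed via the chart values; this is only a sketch of the kind of knob I mean, not the chart's current schema, and the probe path/port are placeholders (note that Kubernetes only allows `successThreshold: 1` for liveness, so the change would apply to readiness only):

```yaml
# Hypothetical values.yaml block -- names, path and port are placeholders, not the chart's current schema.
readinessProbe:
  httpGet:
    path: /healthz        # placeholder; whatever endpoint the service actually exposes
    port: http
  initialDelaySeconds: 10
  periodSeconds: 5
  successThreshold: 2     # require two consecutive successes before the pod is marked Ready
livenessProbe:
  httpGet:
    path: /healthz        # placeholder
    port: http
  initialDelaySeconds: 15
  periodSeconds: 10       # successThreshold must stay 1 for liveness probes
```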
Almost forgot to say: we were able to mitigate the issue by running `kubectl cordon` and `kubectl rollout restart deployment k8s-image-swapper` first, and only after that running `kubectl drain`. That buys the service some time to start, but we would be very happy to drop this additional logic. The workaround looks roughly like this:
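```shell
# Current workaround; <node-name> is a placeholder, and the rollout-status wait is optional.
kubectl cordon <node-name>
kubectl rollout restart deployment k8s-image-swapper
kubectl rollout status deployment k8s-image-swapper --timeout=120s
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```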
Many thanks for your work!