Routes are not cleaned after scale down/node removal via cluster-autoscaler #734
Comments
Hey @pat-s, could you post the logs of hcloud-cloud-controller-manager and its configuration? The networking part of the configuration is of particular interest.
v1.20.0 running with
Running on a k3s cluster deployed with terraform-hcloud-kube-hetzner. k8s version: 1.29.8
+1 We are hitting the 100 routes limit as well which seems to be related to a lot of node scaling events.
Is there a way to reset routes manually? Or a way to figure out which routes are really in use?
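One way to approach the "which routes are really in use" question is to compare each route's gateway against the internal IPs of the nodes that still exist. The following is only a minimal sketch under assumptions: the `Route` type, the `find_stale_routes` helper, and the example addresses are hypothetical, and fetching the actual routes and node IPs (for example from the Hetzner Cloud console or the Kubernetes API) is not shown.

```python
from typing import Iterable, List, NamedTuple


class Route(NamedTuple):
    # Hypothetical shape of a network route entry.
    destination: str  # e.g. the pod CIDR assigned to a node
    gateway: str      # internal IP of the node the route points at


def find_stale_routes(routes: Iterable[Route], live_node_ips: Iterable[str]) -> List[Route]:
    """Return the routes whose gateway does not belong to any live node."""
    live = set(live_node_ips)
    return [route for route in routes if route.gateway not in live]


# Illustrative data only: one node still exists, one was removed.
routes = [
    Route("10.42.1.0/24", "10.255.0.3"),
    Route("10.42.7.0/24", "10.255.0.9"),
]
stale = find_stale_routes(routes, live_node_ips=["10.255.0.3"])
for route in stale:
    print(f"stale: {route.destination} via {route.gateway}")
```

Routes reported as stale by such a check are only candidates for manual deletion; it is worth confirming that the gateway node is really gone before removing anything.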
I just deleted about 30 routes for a node with internal IP 10.255.0.4 which was removed hours ago. So I checked the logs for this node:
This shows that there is an event when the node is deleted, but the routes are still present.
@fatelgit @pat-s Please open a support ticket on the cloud console https://console.hetzner.cloud/support so we can fix this issue.
As @jooola wrote, if you open an actual support ticket we can use our internal support panels to gain more insight into your projects and see what is happening on our side.
I do think I found the bug without any additional info, though. In the configuration @pat-s posted, you can see that the flag
Based on the logs @fatelgit posted, it seems like the cluster is configured to assign Node Pod CIDRs in the
But HCCM only removes routes from the range specified in the
This mismatch leads to the previous routes not being cleaned up. You should change your hcloud-cloud-controller-manager configuration to only have the correct flag for your cluster setup.
If I find the time today, I will open an issue with
@apricote Opened a support ticket. I found
My cluster CIDR is in fact 10.42.0.0/16, and if it gets overwritten by 10.244.0.0/16, then I understand why the removal is not working.
It looks like the config sent by
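The mismatch discussed above can be illustrated with a small check. This is only a sketch of the described behavior (cleanup limited to routes inside the configured range), not HCCM's actual code; the `is_managed` helper and the `10.42.3.0/24` node range are hypothetical, while the two /16 ranges are the ones mentioned in this thread.

```python
import ipaddress


def is_managed(route_destination: str, configured_cluster_cidr: str) -> bool:
    """Return True if the route destination lies inside the configured range."""
    destination = ipaddress.ip_network(route_destination)
    cluster = ipaddress.ip_network(configured_cluster_cidr)
    return destination.subnet_of(cluster)


# Hypothetical pod CIDR of a single node inside the real cluster CIDR.
node_pod_cidr = "10.42.3.0/24"

# Configured range does not match the cluster: the route falls outside it.
print(is_managed(node_pod_cidr, "10.244.0.0/16"))  # False

# Configured range matches the cluster: the route falls inside it.
print(is_managed(node_pod_cidr, "10.42.0.0/16"))   # True
```

Under this model, a route for a 10.42.x.x node range would never be considered for cleanup while the configured range is 10.244.0.0/16, which is consistent with the accumulation reported in this issue.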
It seems like this change here might be responsible: 2ba4058. Maybe updating https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/blob/master/templates/ccm.yaml.tpl to align with the recent changes might already do it?
I opened kube-hetzner/terraform-hcloud-kube-hetzner#1477. In general, this is not an issue with hcloud-cloud-controller-manager but rather a misconfiguration by the user (through kube-hetzner). Hetzner does not provide official support for this.
TL;DR
See title.
Expected behavior
Routes are removed again after the node is deleted.
Observed behavior
Routes are not removed and accumulate in the account, leading to node startup failures when the route limit is hit (100?).
Minimal working example
No response
Log output
No response
Additional information
Initially posted in kubernetes/autoscaler#7227