Unable to drain out Kubernetes Cluster Node for Maintenance

In this article, we will see how to deal with the error "unable to drain node" in case you get it while trying to drain a Cluster node for maintenance. Maintenance is a periodic activity that every organization performs regularly to keep the Cluster healthy, secure and up to date. In the last article, we saw How to take a Kubernetes Cluster node out for maintenance. In that article, we also covered what to do in case the node does not get drained by the kubectl drain <node_name> command.

Here we will go a step further and understand what we can do if the node still throws the same error even after using the --ignore-daemonsets option. You can find more details in the Kubernetes documentation.

Unable to drain out Kubernetes Cluster Node for Maintenance

In our use case we have a master node and a worker node node-1 on which pods are currently running. Our task is to take node-1 out for maintenance. But when we try to drain node-1 using the kubectl drain node-1 command, we always end up with the error unable to drain node "node-1" as shown below.

root@cyberithub:~# kubectl drain node-1 
node/node-1 cordoned
error: unable to drain node "node-1", aborting command...

There are pending nodes to be drained:
node-1
cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): default/simple-webapp-1
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/kube-flannel-ds-xngfr, kube-system/kube-proxy-hsq9m
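
Before deciding how to handle these pods, it helps to see exactly what is running on the node. A quick check like the one below lists every pod scheduled on node-1 (the namespaces and pod names will of course differ in your cluster):

root@cyberithub:~# kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=node-1

Any pod in that list which is not controlled by a ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet is one that the drain will refuse to delete without --force.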

So as explained in the previous article, we try the --ignore-daemonsets option here to skip the DaemonSet-managed pods, but that also does not help completely. We still see the error "cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet".

root@cyberithub:~# kubectl drain node-1 --ignore-daemonsets
node/node-1 cordoned
error: unable to drain node "node-1", aborting command...

There are pending nodes to be drained:
node-1
error: cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): default/hello-app

So if you check the pod status again, you can see that the hello-app pod is still running on worker node "node-1". This is simply because the pod running on node-1 is not managed by any ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet, as we can confirm right after the listing below.

root@cyberithub:~# kubectl get pods -o wide
NAME                  READY STATUS  RESTARTS AGE   IP         NODE   NOMINATED NODE READINESS GATES
test-746c87566d-8h8gz 1/1   Running    0     19m   10.238.3.5 master <none>         <none>
test-746c87566d-k5mxz 1/1   Running    0     19m   10.238.3.6 master <none>         <none>
test-746c87566d-zslkf 1/1   Running    0     19m   10.238.3.4 master <none>         <none>
hello-app             1/1   Running    0     2m49s 10.238.1.5 node-1 <none>         <none>
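
You can confirm that hello-app is a standalone, unmanaged pod by looking at its ownerReferences. A Deployment-backed pod such as one of the test pods would show a ReplicaSet owner here, whereas a bare pod shows nothing at all (the pod name below is from our example cluster):

root@cyberithub:~# kubectl get pod hello-app -o jsonpath='{.metadata.ownerReferences}'

If this command prints nothing, the pod has no controller behind it, which means nothing will recreate it anywhere once it is evicted.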

So to fix this issue, we need to forcefully evict all the pods from the node using the --force option. But mind you, this option will remove the hello-app pod and it will be lost forever, as it is not part of any DaemonSet, ReplicaSet, ReplicationController, Job or StatefulSet that could recreate it elsewhere. So run this command only when you are completely sure.
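
If you still want a way to bring the pod back after maintenance, one option is to save its manifest before force-draining and re-apply it later. This is only a rough sketch; the exported YAML will contain cluster-specific fields such as status and spec.nodeName that you may want to strip out before re-applying:

root@cyberithub:~# kubectl get pod hello-app -o yaml > hello-app.yaml

Once the maintenance is over and the node is schedulable again, the saved manifest can be recreated with kubectl apply -f hello-app.yaml.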

root@cyberithub:~# kubectl drain node-1 --ignore-daemonsets --force
node/node-1 already cordoned
WARNING: deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: default/hello-app; ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-g2v5g, kube-system/kube-proxy-fsfjt
evicting pod default/hello-app
pod/hello-app evicted
node/node-1 evicted

Now if you check the node status, you can see that worker node "node-1" is evicted.

root@cyberithub:~# kubectl get nodes
NAME   STATUS ROLES  AGE VERSION
master Ready  master 48m v1.20.0

Similarly, if you check the pod status, you can see that no pods are running on node-1 anymore. So now we are good to proceed with node-1 maintenance.

root@cyberithub:~# kubectl get pods -o wide
NAME                  READY STATUS  RESTARTS AGE IP         NODE   NOMINATED NODE READINESS GATES
test-746c87566d-8h8gz 1/1   Running    0     26m 10.238.3.5 master <none>         <none>
test-746c87566d-k5mxz 1/1   Running    0     26m 10.238.3.6 master <none>         <none>
test-746c87566d-zslkf 1/1   Running    0     26m 10.238.3.4 master <none>         <none>
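
Notice that the hello-app pod is gone for good, exactly as the warning said it would be. If this is an application you care about, a more robust approach going forward is to run it under a Deployment instead of as a bare pod, so that future drains simply reschedule it on another node. Below is a minimal sketch, where the image name is just a placeholder and not something from our example cluster:

root@cyberithub:~# kubectl create deployment hello-app --image=<your-image>

Pods created this way are owned by a ReplicaSet, so kubectl drain can evict them without --force and the Deployment controller brings them back up on another available node. Also, once you are done with the maintenance activity, remember to run kubectl uncordon node-1 so that the node becomes schedulable again.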

Sometimes you might have a scenario where you don't want to lose a critical application running on node-1, and hence using the --force option may not be a good idea. In cases like that, you can simply cordon the node by using the kubectl cordon node-1 command, so that the currently running applications won't be removed but no further applications will be scheduled on that node.

root@cyberithub:~# kubectl cordon node-1
node/node-1 cordoned

So if you check the status of node-1, you can see that it is set to SchedulingDisabled. This means no further apps will be scheduled on this node.

root@cyberithub:~# kubectl get nodes
NAME   STATUS                   ROLES  AGE VERSION
master Ready                    master 53m v1.20.0
node-1 Ready,SchedulingDisabled <none> 53m v1.20.0
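
If you want to double-check what cordon actually did, it simply sets the unschedulable flag in the node spec while the existing pods keep running untouched. A quick way to verify, which should print true for a cordoned node:

root@cyberithub:~# kubectl get node node-1 -o jsonpath='{.spec.unschedulable}'

Once the maintenance window is over, run kubectl uncordon node-1 to clear this flag so that the node starts accepting new pods again.

root@cyberithub:~# kubectl uncordon node-1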
