- Cordon (pause node scheduling) and uncordon (resume it)
- Commands used when you want to stop placing Pods on a specific node in the cluster, or to remove the Pods running there
- drain runs cordon internally to mark the node unschedulable, then evicts the running Pods: controller-managed Pods are rescheduled onto other nodes, while standalone Pods are simply terminated
cordon
- Marks a specific node as Unschedulable.
- Once cordoned, the node receives no new Pods, but Pods already running on it are unaffected (see the sketch after this list).
- Use cases
  - Preparing a node for maintenance work
  - Preventing new workloads from being placed on a specific node
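Under the hood, cordon sets spec.unschedulable: true on the Node object, which the node controller mirrors as a node.kubernetes.io/unschedulable:NoSchedule taint. A minimal way to verify this (the node name is illustrative):
$ kubectl cordon node1.example.com
$ kubectl describe node node1.example.com | grep -E 'Unschedulable|Taints'
# Expect Unschedulable: true plus the node.kubernetes.io/unschedulable:NoSchedule taint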
drain
- Safely evicts all Pods from a specific node.
- drain empties the node: evicted Pods are recreated on other nodes by their controllers, or simply terminated if nothing manages them.
- Use cases
  - Maintenance work such as node upgrades, hardware replacement, or cluster scaling
  - Removing a node from the cluster
$ kubectl cordon/uncordon <node name>
$ kubectl drain <node name> --ignore-daemonsets --delete-emptydir-data
--ignore-daemonsets: skip DaemonSet-managed Pods instead of failing on them.
--delete-emptydir-data: also evict Pods that use emptyDir volumes (their local data is lost).
--force: skip the safety check for unmanaged Pods, i.e. also delete Pods not managed by an RC, RS, Job, DaemonSet, or StatefulSet.
- drain never deletes DaemonSet Pods; if you really need them gone, delete them manually. A typical end-to-end flow with these flags is sketched below.
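Putting the flags together, a routine maintenance flow looks like this (a sketch; the node name is illustrative):
$ kubectl drain node1.example.com --ignore-daemonsets --delete-emptydir-data
# ... perform the maintenance work (OS patch, hardware swap, ...) ...
$ kubectl uncordon node1.example.com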
| | cordon | drain |
|---|---|---|
| Primary purpose | Block new Pods from being scheduled on the node | Evict all Pods running on the node |
| Pod state | Existing Pods are kept | Existing Pods are deleted or moved to other nodes |
| When to use | Preparing for maintenance | Performing maintenance |
| Result | Node becomes SchedulingDisabled | Node is emptied and becomes SchedulingDisabled |
root@master:~/pod-scheduling# kubectl cordon node2.example.com
node/node2.example.com cordoned
root@master:~/pod-scheduling# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master.example.com Ready control-plane 27d v1.29.3
node1.example.com Ready <none> 27d v1.29.3
node2.example.com Ready,SchedulingDisabled <none> 27d v1.29.3
node3.example.com Ready <none> 20h v1.29.3
root@master:~/pod-scheduling# kubectl apply -f deploy-nginx.yaml
deployment.apps/webui created
root@master:~/pod-scheduling# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
webui-9dc5b6684-2rlbv 1/1 Running 0 21s 10.44.0.1 node1.example.com <none> <none>
webui-9dc5b6684-d8x9p 1/1 Running 0 21s 10.44.0.2 node1.example.com <none> <none>
webui-9dc5b6684-fk9z7 1/1 Running 0 21s 10.47.0.1 node3.example.com <none> <none>
webui-9dc5b6684-hl5fp 1/1 Running 0 21s 10.47.0.2 node3.example.com <none> <none>
# Pods landed only on node1 and node3 (node2 is cordoned)
# uncordon
root@master:~/pod-scheduling# kubectl uncordon node2.example.com
node/node2.example.com uncordoned
root@master:~/pod-scheduling# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master.example.com Ready control-plane 27d v1.29.3
node1.example.com Ready <none> 27d v1.29.3
node2.example.com Ready <none> 27d v1.29.3
node3.example.com Ready <none> 20h v1.29.3
- uncordon: once maintenance is complete, marks the node schedulable again
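To confirm a node is schedulable again, you can check its spec directly; a small sketch (node name taken from the session above):
$ kubectl get node node2.example.com -o jsonpath='{.spec.unschedulable}'
# Empty output means the field is unset, so the node accepts new Pods again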
# We need to take node01 out for maintenance. Empty the node of all applications and mark it unschedulable.
controlplane ~ ➜ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
blue-667bf6b9f9-5dcqk 1/1 Running 0 79s 10.244.1.3 node01 <none> <none>
blue-667bf6b9f9-8f76x 1/1 Running 0 79s 10.244.1.2 node01 <none> <none>
blue-667bf6b9f9-9ls8z 1/1 Running 0 79s 10.244.0.4 controlplane <none> <none>
controlplane ~ ➜ kubectl drain node01
node/node01 cordoned
error: unable to drain node "node01" due to error:cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-flannel/kube-flannel-ds-splpr, kube-system/kube-proxy-bsg97, continuing command...
There are pending nodes to be drained:
node01
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-flannel/kube-flannel-ds-splpr, kube-system/kube-proxy-bsg97
# drain fails right away: DaemonSet-managed Pods are running on the node, and drain cannot evict them (their controller would immediately recreate them on the same node).
controlplane ~ ✖ kubectl drain node01 --ignore-daemonsets
node/node01 already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-flannel/kube-flannel-ds-splpr, kube-system/kube-proxy-bsg97
evicting pod default/blue-667bf6b9f9-8f76x
evicting pod default/blue-667bf6b9f9-5dcqk
pod/blue-667bf6b9f9-8f76x evicted
pod/blue-667bf6b9f9-5dcqk evicted
node/node01 drained
controlplane ~ ➜ kubectl get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane 14m v1.29.0
node01 Ready,SchedulingDisabled <none> 13m v1.29.0
controlplane ~ ➜ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
blue-667bf6b9f9-2jp7j 1/1 Running 0 50s 10.244.0.5 controlplane <none> <none>
blue-667bf6b9f9-9ls8z 1/1 Running 0 3m46s 10.244.0.4 controlplane <none> <none>
blue-667bf6b9f9-r25q8 1/1 Running 0 50s 10.244.0.6 controlplane <none>
# All Pods have moved to controlplane.
controlplane ~ ➜ kubectl describe node controlplane | grep Taint
Taints: <none>
# Because controlplane carries no taint, Pods can be scheduled on it.
# All maintenance tasks are done; make node01 schedulable again.
controlplane ~ ➜ kubectl uncordon node01
node/node01 uncordoned
controlplane ~ ➜ kubectl get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane 16m v1.29.0
node01 Ready <none> 15m v1.29.0
- Once maintenance is over, node01 is made schedulable again.
# Running the uncordon command on a node will not automatically schedule pods on the node.
# Only when new pods are created will they be placed on node01.
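One way to see this (a sketch; the replica count is arbitrary) is to create new Pods after uncordoning, here by scaling the blue deployment from this lab:
$ kubectl scale deployment blue --replicas=5
$ kubectl get pods -o wide
# The new replicas may land on node01; the existing Pods stay where they are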
controlplane ~ ➜ kubectl get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane 10m v1.31.0
node01 Ready,SchedulingDisabled <none> 10m v1.31.0
controlplane ~ ➜ kubectl uncordon node01
node/node01 uncordoned
controlplane ~ ➜ kubectl get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane 11m v1.31.0
node01 Ready <none> 11m v1.31.0
- If an issue comes up and node01 has to go back into maintenance:
controlplane ~ ➜ kubectl get pods
NAME READY STATUS RESTARTS AGE
blue-667bf6b9f9-2jp7j 1/1 Running 0 9m24s
blue-667bf6b9f9-9ls8z 1/1 Running 0 12m
blue-667bf6b9f9-r25q8 1/1 Running 0 9m24s
hr-app 1/1 Running 0 3m2s
controlplane ~ ➜ kubectl drain node01 --ignore-daemonsets
node/node01 cordoned
error: unable to drain node "node01" due to error:cannot delete Pods declare no controller (use --force to override): default/hr-app, continuing command...
There are pending nodes to be drained:
node01
cannot delete Pods declare no controller (use --force to override): default/hr-app
# An error occurs.
# hr-app is a standalone Pod, not part of a ReplicaSet (or any other controller).
# drain refuses to evict such Pods, so --force has to be used.
# With --force:
controlplane ~ ➜ kubectl drain node01 --ignore-daemonsets --force
node/node01 already cordoned
Warning: deleting Pods that declare no controller: default/hr-app; ignoring DaemonSet-managed Pods: kube-flannel/kube-flannel-ds-splpr, kube-system/kube-proxy-bsg97
evicting pod default/hr-app
pod/hr-app evicted
node/node01 drained
# The Pod is now gone from node01 for good: with no controller to recreate it, it does not come back anywhere.
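(Further below, hr-app reappears as hr-app-67c9b64bcf-k7llx, i.e. recreated under a Deployment.) Running a critical Pod under a controller is what lets it survive eviction; a hedged sketch (the image name is a placeholder):
$ kubectl create deployment hr-app --image=hr-app:1.0   # placeholder image
# A ReplicaSet now recreates the Pod on another node whenever it is evicted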
# hr-app is a critical app and we do not want it to be removed and we do not want to schedule any more pods on node01.
# Mark node01 as unschedulable so that no new pods are scheduled on this node.
## Make sure that hr-app is not affected.
# Do not drain node01, instead use the kubectl cordon node01 command. This will ensure that no new pods are scheduled on this node and the existing pods will not be affected by this operation.
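# The step performed here (reconstructed; cordon's output format appears earlier in this log):
controlplane ~ ➜ kubectl cordon node01
node/node01 cordoned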
controlplane ~ ➜ kubectl get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane 31m v1.29.0
node01 Ready,SchedulingDisabled <none> 30m v1.29.0
controlplane ~ ➜ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
blue-667bf6b9f9-2jp7j 1/1 Running 0 19m 10.244.0.5 controlplane <none> <none>
blue-667bf6b9f9-9ls8z 1/1 Running 0 22m 10.244.0.4 controlplane <none> <none>
blue-667bf6b9f9-r25q8 1/1 Running 0 19m 10.244.0.6 controlplane <none> <none>
hr-app-67c9b64bcf-k7llx 1/1 Running 0 2m49s 10.244.1.5 node01 <none> <none>
# hr-app keeps running on node01, unaffected by the cordon.
[Lab 2]
root@master:~/pod-scheduling# kubectl apply -f deploy-nginx.yaml
deployment.apps/webui created
root@master:~/pod-scheduling# kubectl run db --image=redis
pod/db created
Every 2.0s: kubectl get pods -o wide master.example.com: Sat Apr 13 12:58:37 2024
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
db 1/1 Running 0 59s 10.47.0.2 node3.example.com <none> <none>
webui-9dc5b6684-65wbs 1/1 Running 0 84s 10.36.0.2 node2.example.com <none> <none>
webui-9dc5b6684-gkqn4 1/1 Running 0 84s 10.36.0.1 node2.example.com <none> <none>
webui-9dc5b6684-n9jqj 1/1 Running 0 84s 10.44.0.1 node1.example.com <none> <none>
webui-9dc5b6684-wqvjz 1/1 Running 0 84s 10.47.0.1 node3.example.com <none> <none>
# db is running on node3
root@master:~/pod-scheduling# kubectl drain node3.example.com
node/node3.example.com cordoned
error: unable to drain node "node3.example.com" due to error:[cannot delete Pods declare no controller (use --force to override): default/db, cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/kube-proxy-wvz78, kube-system/weave-net-8qzkj], continuing command...
There are pending nodes to be drained:
node3.example.com
cannot delete Pods declare no controller (use --force to override): default/db
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/kube-proxy-wvz78, kube-system/weave-net-8qzkj
# Errors occur as expected -> extra flags are needed
root@master:~/pod-scheduling# kubectl drain node3.example.com --ignore-daemonsets --force
node/node3.example.com already cordoned
Warning: deleting Pods that declare no controller: default/db; ignoring DaemonSet-managed Pods: kube-system/kube-proxy-wvz78, kube-system/weave-net-8qzkj
evicting pod default/webui-9dc5b6684-wqvjz
evicting pod default/db
pod/webui-9dc5b6684-wqvjz evicted
pod/db evicted
node/node3.example.com drained
Every 2.0s: kubectl get pods -o wide master.example.com: Sat Apr 13 13:02:31 2024
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATE
webui-9dc5b6684-65wbs 1/1 Running 0 5m18s 10.36.0.2 node2.example.com <none> <none>
webui-9dc5b6684-gkqn4 1/1 Running 0 5m18s 10.36.0.1 node2.example.com <none> <none>
webui-9dc5b6684-m9hxd 1/1 Running 0 33s 10.44.0.2 node1.example.com <none> <none>
webui-9dc5b6684-n9jqj 1/1 Running 0 5m18s 10.44.0.1 node1.example.com <none> <none>
# Confirm that no Pods are running on node3 anymore
root@master:~/pod-scheduling# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master.example.com Ready control-plane 27d v1.29.3
node1.example.com Ready <none> 27d v1.29.3
node2.example.com Ready <none> 27d v1.29.3
node3.example.com Ready,SchedulingDisabled <none> 20h v1.29.3
root@master:~/pod-scheduling# kubectl uncordon node3.example.com
node/node3.example.com uncordoned
root@master:~/pod-scheduling# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master.example.com Ready control-plane 27d v1.29.3
node1.example.com Ready <none> 27d v1.29.3
node2.example.com Ready <none> 27d v1.29.3
node3.example.com Ready <none> 20h v1.29.3
root@master:~/pod-scheduling#