
[K8S] TS - Worker Node Failure

certified-kubernetes-administrator-with-practice-tests, Lecture 301

Reference: https://kubernetes.io/docs/reference/node/node-status/

 

  • Check Node Status
$ kubectl get nodes
$ kubectl describe node worker-1
# Check the statuses in the Conditions section carefully.
-> If there is a problem, a condition shows status Unknown (normal values are False/True).
  • Check Node Resources
$ top
$ df -h
  • Check Kubelet Status
$ service kubelet status
$ sudo journalctl -u kubelet
  • Check Certificates
$ openssl x509 -in /var/lib/kubelet/worker-1.crt -text
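Rather than reading the certificate's `Validity` block by eye, `openssl x509 -checkend` reports expiry through its exit code, which makes the check scriptable. A minimal sketch, assuming the worker-1 cert path from above (adjust to your node's actual certificate location):

```shell
#!/bin/sh
# Sketch: warn if a kubelet certificate expires soon.
# The default path below is the worker-1 example; pass your node's cert path.
check_cert_expiry() {
  cert="${1:-/var/lib/kubelet/worker-1.crt}"
  # -checkend 86400 exits non-zero if the cert expires within 24 hours
  if openssl x509 -in "$cert" -noout -checkend 86400; then
    echo "OK: $cert is valid for at least 24h"
  else
    echo "WARN: $cert expires within 24h (or is unreadable)"
  fi
}
```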

 

[Lab 1]

Fix the node.

 

controlplane ~ ➜  kubectl get nodes
NAME           STATUS     ROLES           AGE   VERSION
controlplane   Ready      control-plane   17m   v1.29.0
node01         NotReady   <none>          17m   v1.29.0

# The kubelet is the node agent that manages the containers on each node.
# Confirm that the kubelet is inactive:
node01 ~ ➜  service kubelet status
○ kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: inactive (dead) since Sun 2024-04-28 05:50:38 UTC; 3min 12s ago
       Docs: https://kubernetes.io/docs/
    Process: 2582 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=0/SUCCESS)
   Main PID: 2582 (code=exited, status=0/SUCCESS)

Apr 28 05:34:22 node01 kubelet[2582]: I0428 05:34:22.269241    2582 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"cni-plugin\">
Apr 28 05:34:22 node01 kubelet[2582]: I0428 05:34:22.269355    2582 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock>
Apr 28 05:34:22 node01 kubelet[2582]: I0428 05:34:22.269405    2582 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-acc>
Apr 28 05:34:22 node01 kubelet[2582]: I0428 05:34:22.269436    2582 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-proxy\">
Apr 28 05:34:22 node01 kubelet[2582]: I0428 05:34:22.269463    2582 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"lib-modules\>
Apr 28 05:34:22 node01 kubelet[2582]: I0428 05:34:22.269494    2582 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-acc>
Apr 28 05:34:24 node01 kubelet[2582]: I0428 05:34:24.651624    2582 kubelet_node_status.go:497] "Fast updating node status as it just became ready"
Apr 28 05:34:25 node01 kubelet[2582]: I0428 05:34:25.433110    2582 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/kube-proxy-xq4sk" podSta>
Apr 28 05:34:26 node01 kubelet[2582]: I0428 05:34:26.356982    2582 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-flannel/kube-flannel-ds-vjpnx" >
Apr 28 05:50:38 node01 kubelet[2582]: I0428 05:50:38.557845    2582 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
node01 ~ ✖ service kubelet start

node01 ~ ➜  service kubelet status
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Sun 2024-04-28 05:54:16 UTC; 3s ago
       Docs: https://kubernetes.io/docs/
   Main PID: 9853 (kubelet)
      Tasks: 24 (limit: 251379)
     Memory: 42.4M
     CGroup: /system.slice/kubelet.service
             └─9853 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yam>
             
controlplane ~ ➜  kubectl get nodes
NAME           STATUS   ROLES           AGE   VERSION
controlplane   Ready    control-plane   21m   v1.29.0
node01         Ready    <none>          21m   v1.29.0 # back to Ready
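When scripting this kind of check, `systemctl is-active kubelet` prints just the state (`active`, `inactive`, `failed`) instead of the full status page. As an illustration, here is a sketch (the helper function is my own, not part of the lab) that extracts the state from captured `status` output:

```shell
#!/bin/sh
# Sketch: extract the service state from `systemctl status`-style text.
# On a live node, `systemctl is-active kubelet` gives you this directly.
parse_active_state() {
  # Print the first word after "Active:", e.g. "active" or "inactive"
  sed -n 's/^[[:space:]]*Active: \([a-z-]*\).*/\1/p' | head -n 1
}
```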

 

[Lab 2]

The cluster is broken again. Investigate and fix the issue.

 

controlplane ~ ➜  kubectl get nodes
NAME           STATUS     ROLES           AGE   VERSION
controlplane   Ready      control-plane   23m   v1.29.0
node01         NotReady   <none>          22m   v1.29.0

controlplane ~ ➜  ssh node01
Last login: Sun Apr 28 05:52:35 2024 from 192.7.140.4

node01 ~ ➜  service kubelet status
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: activating (auto-restart) (Result: exit-code) since Sun 2024-04-28 05:57:06 UTC; 7s ago
       Docs: https://kubernetes.io/docs/
    Process: 11203 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
   Main PID: 11203 (code=exited, status=1/FAILURE) # the kubelet process keeps crashing
node01 ~ ➜ service kubelet start

# Starting it again does not change the status
node01 ~ ➜  service kubelet status
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: activating (auto-restart) (Result: exit-code) since Sun 2024-04-28 05:57:47 UTC; 2s ago
       Docs: https://kubernetes.io/docs/
    Process: 11662 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
   Main PID: 11662 (code=exited, status=1/FAILURE)

Check the logs:
node01 ~ ➜  sudo journalctl -u kubelet -f
...
Feb 11 06:52:45 node01 kubelet[11664]: Flag --container-runtime-endpoint has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Feb 11 06:52:45 node01 kubelet[11664]: Flag --pod-infra-container-image has been deprecated, will be removed in a future release. Image garbage collector will get sandbox image information from CRI.
Feb 11 06:52:45 node01 kubelet[11664]: I0211 06:52:45.702634   11664 server.go:206] "--pod-infra-container-image will not be pruned by the image garbage collector in kubelet and should also be set in the remote runtime"
Feb 11 06:52:45 node01 kubelet[11664]: E0211 06:52:45.704942   11664 run.go:72] "command failed" err="failed to construct kubelet dependencies: unable to load client CA file /etc/kubernetes/pki/WRONG-CA-FILE.crt: open /etc/kubernetes/pki/WRONG-CA-FILE.crt: no such file or directory"

node01 ~ ➜  cat /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/WRONG-CA-FILE.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
containerRuntimeEndpoint: ""
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMaximumGCAge: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s

node01 ~ ➜  ls /etc/kubernetes/pki/  # confirm the correct CA path
ca.crt

# Edit clientCAFile to point at the correct path, then verify:
node01 ~ ➜  cat /var/lib/kubelet/config.yaml | grep CAFile
    clientCAFile: /etc/kubernetes/pki/ca.crt

node01 ~ ➜  service kubelet restart

node01 ~ ➜  service kubelet status
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Sun 2024-04-28 06:04:35 UTC; 5s ago
       Docs: https://kubernetes.io/docs/
   Main PID: 14689 (kubelet)
      Tasks: 25 (limit: 251379)
     Memory: 31.8M
     CGroup: /system.slice/kubelet.service
             └─14689 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.ya>

Apr 28 06:04:36 node01 kubelet[14689]: I0428 06:04:36.400446   14689 apiserver.go:52] "Watching apiserver"
Apr 28 06:04:36 node01 kubelet[14689]: I0428 06:04:36.402690   14689 topology_manager.go:215] "Topology Admit Handler" podUID="bd68ca99-fb4f-4c5b-86a0-a90e24695491" podNamespace>
Apr 28 06:04:36 node01 kubelet[14689]: I0428 06:04:36.402783   14689 topology_manager.go:215] "Topology Admit Handler" podUID="fc797d09-d12e-4855-a6f8-003e0dc5f78c" podNamespace>
Apr 28 06:04:36 node01 kubelet[14689]: I0428 06:04:36.440915   14689 desired_state_of_world_populator.go:159] "Finished populating initial desired state of world"
Apr 28 06:04:36 node01 kubelet[14689]: I0428 06:04:36.449060   14689 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-loc>
Apr 28 06:04:36 node01 kubelet[14689]: I0428 06:04:36.449112   14689 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"lib-modules>
Apr 28 06:04:36 node01 kubelet[14689]: I0428 06:04:36.449237   14689 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-loc>
Apr 28 06:04:36 node01 kubelet[14689]: I0428 06:04:36.449367   14689 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"run\" (Uniq>
Apr 28 06:04:36 node01 kubelet[14689]: I0428 06:04:36.449507   14689 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"cni-plugin\>
Apr 28 06:04:36 node01 kubelet[14689]: I0428 06:04:36.449554   14689 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"cni\" (Uniq>

node01 ~ ➜  exit
logout
Connection to node01 closed.

controlplane ~ ➜  kubectl get nodes
NAME           STATUS   ROLES           AGE   VERSION
controlplane   Ready    control-plane   31m   v1.29.0
node01         Ready    <none>          30m   v1.29.0
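The fix in this lab was done by hand in `vi`; the same edit can be scripted. A minimal sketch (the helper name is my own; on a real node you would target `/var/lib/kubelet/config.yaml` and restart the kubelet afterwards):

```shell
#!/bin/sh
# Sketch: point clientCAFile at the real CA certificate.
# Uses GNU sed's in-place edit; on BSD/macOS use `sed -i ''` instead.
fix_client_ca() {
  cfg="$1"; ca="$2"
  sed -i "s|clientCAFile: .*|clientCAFile: ${ca}|" "$cfg"
}
```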

 

[Lab 3]

controlplane ~ ➜  kubectl get nodes
NAME           STATUS     ROLES           AGE   VERSION
controlplane   Ready      control-plane   32m   v1.29.0
node01         NotReady   <none>          31m   v1.29.0

controlplane ~ ➜  ssh node01
Last login: Sun Apr 28 05:57:08 2024 from 192.7.140.3

node01 ~ ➜  service kubelet status
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Sun 2024-04-28 06:05:25 UTC; 59s ago
       Docs: https://kubernetes.io/docs/
   Main PID: 15230 (kubelet)
      Tasks: 33 (limit: 251379)
     Memory: 38.3M
     CGroup: /system.slice/kubelet.service
# kubelet itself is running normally
  • By default, the Kubernetes API server listens for HTTPS requests on port 6443.
node01 ~ ✖ ls /etc/kubernetes/kubelet.conf 
/etc/kubernetes/kubelet.conf

controlplane ~ ✖ cat /etc/kubernetes/manifests/kube-apiserver.yaml  | grep -i port
    - --secure-port=6443
        port: 6443
        port: 6443
        port: 6443

controlplane ~ ➜  kubectl cluster-info
Kubernetes control plane is running at https://controlplane:6443
CoreDNS is running at https://controlplane:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
        
node01 ~ ➜  cat /etc/kubernetes/kubelet.conf
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURCVENDQWUyZ0F3SUJBZ0lJWldBSklGc1daNmt3RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TkRBME1qZ3dOVEk0TWpoYUZ3MHpOREEwTWpZd05UTXpNamhhTUJVeApFekFSQmdOVkJBTVRDbXQxWW1WeWJtVjBaWE13Z2dFaU1BMEdDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLCkFvSUJBUURDWkM0L2ZocTM4S2ZDaDJqbnR4dGtzTExWL3dJUnBiSEtpcCtIS1p6M0hEbGJRUFBSYWdINXB4bEQKYXo5SjJxSWVac21maUpKanUxdDRZNXJQcWZWTmFoMUk1TDRKY1pkVWhEWkF2cExZNERsRVVZV2dwL2Z1NmNBdwpjOWhIS3FnbW5xcStwZGxYeFVkcCtrMzVQUnErRjVoVVJRUGZhNHJwMmV6aG81cFhldVFhVUduZ1V1TmxxNWlNClpaTm9kYUNtWENiMFpOUDVjMHZtWWdxOVVUSkRVVVJPczdYTjhoNXIrRGEyODA5ZVhpQ1dDMCtweUxnQ2I2VDMKaGtPWjBNdGNCaE8xVksyNjRjRER5Nld3cyt1VFVqMm94ZnJEcEhlUHAxLzJCQUVXVHQyVmhsWUIyaXhRRUNHUgp2Z3hiWUdYdGRkMDZjZzlrbDQ5VUZOV3FJWmlCQWdNQkFBR2pXVEJYTUE0R0ExVWREd0VCL3dRRUF3SUNwREFQCkJnTlZIUk1CQWY4RUJUQURBUUgvTUIwR0ExVWREZ1FXQkJRSGVtUXpNRVAwRisyZ0VSL0xkVFNjQ2IvRGt6QVYKQmdOVkhSRUVEakFNZ2dwcmRXSmxjbTVsZEdWek1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQ3B5ZjhlOCsxQQp0Nlk2U0JnVDA3SG5GakJjK3VSWTRac2ZQOThzd2psN2E0MTNTeDZpK3FCTW9jYmpDcFk3VmJrZXRidzlJcTRJCjdSenk2eGcrRUpvUFFvejRZaS8xaFVMWng4VXdwc1dob2I1dG5ZcFdkOXBEbllIbVozTi9kdmMyMGg3TjRrK0EKdFdtRGJJcitTVGJ4TU9KL21zMThqL0gza25TNkhiZDZ1MHAyUW9LeXAyZFhiajlldkFmcFVUSkF0WUJLbVh1bQpWc2pkT2M5WVdHYkpnb0hPR0VFZHpNZmM4TXc2bTMxYU93QldpMFNITFdMU1FYYTdMdjdWV21SWDFGRXZSVmQ1Ckk2R0Y2NStzdDVTL3VyKzUxUEp4Tmt3Qk5QanByTE5iOUVhRUxwQlVHNFVnYUN0NmsxUUx6RXJHeVc0dnZFY3QKZEZrRXRpUWNUTkZICi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
    server: https://controlplane:6553 # the control plane's port should be 6443, not 6553
  name: default-cluster
  ...
    
node01 ~ ➜  vi /etc/kubernetes/kubelet.conf  # change the port to 6443

node01 ~ ➜ cat /etc/kubernetes/kubelet.conf | grep server
    server: https://controlplane:6443
    
# After fixing the config, restart the kubelet.
node01 ~ ➜  service kubelet restart

node01 ~ ➜  service kubelet status
node01 ~ ➜ exit
logout
Connection to node01 closed.

controlplane ~ ➜ kubectl get nodes
NAME           STATUS   ROLES           AGE   VERSION
controlplane   Ready    control-plane   40m   v1.29.0
node01         Ready    <none>          39m   v1.29.0
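A quick way to catch the Lab 3 misconfiguration is to compare the port in `kubelet.conf` against the API server's secure port (6443 by default, per `--secure-port`). A sketch, with a hypothetical helper that reads any kubeconfig-style file:

```shell
#!/bin/sh
# Sketch: pull the API server port out of a kubeconfig's "server:" line
# so it can be compared against the expected secure port (6443 by default).
server_port() {
  sed -n 's|^[[:space:]]*server: https://[^:]*:\([0-9]*\).*|\1|p' "$1" | head -n 1
}
```

A check like `[ "$(server_port /etc/kubernetes/kubelet.conf)" = 6443 ] || echo 'port mismatch'` would have flagged the `6553` typo immediately.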