etcd (Distributed Key-Value Store)
Every object managed by Kubernetes — Nodes, configs, Secrets, Roles, and so on — is stored in etcd.
Losing etcd data can therefore cripple the entire Kubernetes cluster, so a regular backup and restore strategy is essential.
The etcd cluster is hosted as a static pod on the master node and stores information such as the cluster state.
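- For example, on a kubeadm cluster this can be confirmed as follows (pod and node names will differ per cluster):
# etcd runs as a static pod in the kube-system namespace
kubectl get pods -n kube-system | grep etcd
# its manifest lives in the static pod directory on the control plane node
ls /etc/kubernetes/manifests/etcd.yaml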
# ETCDCTL
ETCDCTL is the command-line client for etcd.
- ETCDCTL supports API version 2 and version 3, and defaults to version 2 (in etcd 3.3 and earlier).
- To perform operations such as backup and restore, ETCDCTL_API must be set to 3.
- export ETCDCTL_API=3
controlplane ~ ➜ export ETCDCTL_API=3
controlplane ~ ➜ etcdctl version
etcdctl version: 3.3.13
API version: 3.3
- Note that each API version uses a different set of commands, as the two lists and the short example below show.
# ETCDCTL version2
etcdctl backup
etcdctl cluster-health
etcdctl mk
etcdctl mkdir
etcdctl set
# ETCDCTL version3
etcdctl snapshot save
etcdctl endpoint health
etcdctl get
etcdctl put
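- A quick way to see the difference (a sketch, assuming a local test etcd that still serves the v2 API and accepts unauthenticated connections; a kubeadm etcd additionally requires the TLS flags shown below):
# API version 2 writes and reads keys with set/get under a /-rooted key space
ETCDCTL_API=2 etcdctl set /foo bar
ETCDCTL_API=2 etcdctl get /foo
# API version 3 uses put/get on flat keys for the same operation
ETCDCTL_API=3 etcdctl put foo bar
ETCDCTL_API=3 etcdctl get foo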
# To set the API version, run the following command
export ETCDCTL_API=3
# certificate file path
--cacert /etc/kubernetes/pki/etcd/ca.crt
--cert /etc/kubernetes/pki/etcd/server.crt
--key /etc/kubernetes/pki/etcd/server.key
kubectl exec etcd-master -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl get / --prefix --keys-only --limit=10 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key"
- listen-client-urls: the addresses on which the etcd server listens for client requests
  - Use this address when running etcdctl directly on the node where etcd is installed.
- advertise-client-urls: the addresses advertised to external clients (the addresses clients use to connect to the etcd server)
  - A snapshot is taken with a client command (etcdctl), and the client uses the addresses in advertise-client-urls to connect to the etcd server.
  - To run etcdctl from a remote node where etcd is not running, use the externally reachable IP listed in advertise-client-urls.
  - Likewise, in a multi-node etcd cluster, to snapshot a different member, use the IP set in that member's advertise-client-urls (the configured URLs can be checked as shown below).
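- To see which URLs a running etcd actually uses, check the static pod manifest or the process arguments (paths assume a kubeadm setup):
# On the control plane node, from the static pod manifest
grep -E "listen-client-urls|advertise-client-urls" /etc/kubernetes/manifests/etcd.yaml
# For an external etcd, check the flags of the running process
ps -ef | grep -i etcd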
ETCD Backup
- etcd provides a backup mechanism in case the database is lost due to an unexpected incident such as a master node failure.
- ETCD snapshot (the database can be copied to and restored from a single file)
- etcdctl snapshot save <snapshot filename>
controlplane /var/lib/kubelet ➜ cd /etc/kubernetes/manifests/
controlplane $ cat etcd.yaml | grep file
- --cert-file=/etc/kubernetes/pki/etcd/server.crt
- --key-file=/etc/kubernetes/pki/etcd/server.key
- --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
- --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
- --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
- --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
# trusted-ca-file, cert-file, and key-file are taken from the etcd pod definition.
# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
# --cacert=<trusted-ca-file> --cert=<cert-file> --key=<key-file> \
# snapshot save <backup-file-location>
#
controlplane $ ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
> --cacert=/etc/kubernetes/pki/etcd/ca.crt \
> --cert=/etc/kubernetes/pki/etcd/server.crt \
> --key=/etc/kubernetes/pki/etcd/server.key \
> snapshot save /opt/cluster_backup.db
{"level":"info","ts":1740981811.9616451,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"/opt/cluster_backup.db.part"}
{"level":"info","ts":1740981811.9787233,"logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1740981811.9791672,"caller":"snapshot/v3_snapshot.go:76","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":1740981812.2119951,"logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":1740981812.2281883,"caller":"snapshot/v3_snapshot.go:91","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"6.8 MB","took":"now"}
{"level":"info","ts":1740981812.228554,"caller":"snapshot/v3_snapshot.go:100","msg":"saved","path":"/opt/cluster_backup.db"}
Snapshot saved at /opt/cluster_backup.db
controlplane $ ls /opt/cluster_backup.db
/opt/cluster_backup.db
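- Optionally, verify the saved snapshot before relying on it (on newer etcd releases the same check is also available via etcdutl):
# Print hash, revision, total key count, and size of the snapshot
ETCDCTL_API=3 etcdctl snapshot status /opt/cluster_backup.db -w table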
- If ETCDCTL_API is set to 2, the snapshot command is not available, so use API version 3.
controlplane /etc/kubernetes/manifests ➜ etcdctl snapshot
No help topic for 'snapshot'
# Works with API version 3
controlplane /etc/kubernetes/manifests ➜ ETCDCTL_API=3 etcdctl snapshot
NAME:
snapshot - Manages etcd node snapshots
USAGE:
etcdctl snapshot <subcommand>
# Prefixing every command with ETCDCTL_API=3 is tedious, so export it once instead.
controlplane /etc/kubernetes/manifests ➜ export ETCDCTL_API=3
ETCD Restore
- Apply a database file saved with snapshot save to a running etcd to roll the cluster back to the point when the snapshot was taken.
# ETCDCTL_API=3 etcdctl --data-dir <data-dir-location> snapshot restore snapshot.db
controlplane $ ETCDCTL_API=3 etcdctl --data-dir /var/lib/etcd-new snapshot restore /opt/cluster_backup.db
2025-03-03T06:13:02Z info snapshot/v3_snapshot.go:251 restoring snapshot {"path": "/opt/cluster_backup.db", "wal-dir": "/var/lib/etcd-new/member/wal", "data-dir": "/var/lib/etcd-new", "snap-dir": "/var/lib/etcd-new/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdutl/snapshot/v3_snapshot.go:257\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdutl/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdctl/v3/ctlv3/command.snapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/command/snapshot_command.go:128\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.Start\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/ctl.go:107\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.MustStart\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/ctl.go:111\nmain.main\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/main.go:59\nruntime.main\n\t/home/remote/sbatsche/.gvm/gos/go1.16.3/src/runtime/proc.go:225"}
2025-03-03T06:13:02Z info membership/store.go:119 Trimming membership information from the backend...
2025-03-03T06:13:02Z info membership/cluster.go:393 added member {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2025-03-03T06:13:02Z info snapshot/v3_snapshot.go:272 restored snapshot {"path": "/opt/cluster_backup.db", "wal-dir": "/var/lib/etcd-new/member/wal", "data-dir": "/var/lib/etcd-new", "snap-dir": "/var/lib/etcd-new/member/snap"}
# Data restore complete
controlplane $ cd etcd-new/
controlplane $ tree
.
`-- member
|-- snap
| |-- 0000000000000001-0000000000000001.snap
| `-- db
`-- wal
`-- 0000000000000000-0000000000000000.wal
3 directories, 3 files
# Point the etcd pod's data store at the new data directory.
controlplane $ vi /etc/kubernetes/manifests/etcd.yaml
    volumeMounts:
    - mountPath: /var/lib/etcd            # the etcd data is mounted at this path
      name: etcd-data                     # this name links to the volume defined below
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd-new             # change this path to the new data directory
      type: DirectoryOrCreate
    name: etcd-data
status: {}
# Restart etcd (the kubelet usually restarts it automatically after the manifest changes)
kubectl -n kube-system delete pod etcd-controlplane
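- After editing the manifest, the kubelet recreates the etcd pod; the API server may be briefly unreachable while it comes back. A quick check (a sketch; crictl talks to the container runtime directly, so it works even while kubectl is down):
# Wait for the static pods to come back up
watch kubectl get pods -n kube-system
# If kubectl is temporarily unavailable, check the container directly on the node
crictl ps | grep etcd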
- Restoring an external etcd server
# Copy the backup from student-node to the etcd-server
student-node /opt ➜ scp /opt/cluster2.db etcd-server:/root
cluster2.db 100% 2232KB 102.3MB/s 00:00
# The restore is performed directly on etcd-server, so the local https://127.0.0.1:2379 endpoint is used
etcd-server ~ ➜ ETCDCTL_API=3 etcdctl snapshot restore /root/cluster2.db --data-dir=/var/lib/etcd-data-new
{"level":"info","ts":1736416857.3161871,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/root/cluster2.db","wal-dir":"/var/lib/etcd-data-new/member/wal","data-dir":"/var/lib/etcd-data-new","snap-dir":"/var/lib/etcd-data-new/member/snap"}
{"level":"info","ts":1736416857.3319352,"caller":"mvcc/kvstore.go:388","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":6000}
{"level":"info","ts":1736416857.3374014,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"cdf818194e3a8c32","local-member-id":"0","added-peer-id":"8e9e05c52164694d","added-peer-peer-urls":["http://localhost:2380"]}
{"level":"info","ts":1736416857.4211009,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/root/cluster2.db","wal-dir":"/var/lib/etcd-data-new/member/wal","data-dir":"/var/lib/etcd-data-new","snap-dir":"/var/lib/etcd-data-new/member/snap"}
# Change ownership so the new data directory is owned by the etcd user
etcd-server /var/lib ➜ chown -R etcd:etcd /var/lib/etcd-data-new/
# Update the systemd service
etcd-server /var/lib ➜ vi /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd \
--name etcd-server \
--data-dir=/var/lib/etcd-data-new \
--cert-file=/etc/etcd/pki/etcd.pem \
--key-file=/etc/etcd/pki/etcd-key.pem \
--peer-cert-file=/etc/etcd/pki/etcd.pem \
--peer-key-file=/etc/etcd/pki/etcd-key.pem \
--trusted-ca-file=/etc/etcd/pki/ca.pem \
--peer-trusted-ca-file=/etc/etcd/pki/ca.pem \
--peer-client-cert-auth \
--client-cert-auth \
--initial-advertise-peer-urls https://192.8.47.20:2380 \
--listen-peer-urls https://192.8.47.20:2380 \
--advertise-client-urls https://192.8.47.20:2379 \
--listen-client-urls https://192.8.47.20:2379,https://127.0.0.1:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster etcd-server=https://192.8.47.20:2380 \
--initial-cluster-state new
Restart=on-failure
RestartSec=5
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
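- Compared with the flags the service was originally started with (see the ps output further below), only the data directory changes:
# before: --data-dir=/var/lib/etcd-data
# after:  --data-dir=/var/lib/etcd-data-new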
# reload and restart the etcd service.
etcd-server /var/lib ➜ systemctl daemon-reload
etcd-server /var/lib ➜ systemctl restart etcd
etcd-server /var/lib ➜ systemctl status etcd
● etcd.service - etcd key-value store
     Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2025-01-09 10:11:19 UTC; 5s ago
       Docs: https://github.com/etcd-io/etcd
# It is recommended to restart controlplane components (e.g. kube-scheduler, kube-controller-manager, kubelet) to ensure that they don't rely on some stale data.
student-node /opt ➜ kubectl delete pods kube-controller-manager-cluster2-controlplane kube-scheduler-cluster2-controlplane -n kube-system
pod "kube-controller-manager-cluster2-controlplane" deleted
pod "kube-scheduler-cluster2-controlplane" deleted
ssh cluster2-controlplane
systemctl restart kubelet
systemctl status kubelet
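- Once the components have restarted, confirm that the cluster reflects the restored state (which resources show up depends on what was in the snapshot):
# Check nodes and workloads against the restored etcd data
kubectl get nodes
kubectl get deployments,pods -A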
- Checking the etcd cluster members
# external etcd-cluster
student-node ~ ➜ ssh etcd-server
etcd-server ~ ➜ ps -ef | grep -i etcd
etcd 820 1 0 08:45 ? 00:00:41 /usr/local/bin/etcd --name etcd-server --data-dir=/var/lib/etcd-data --cert-file=/etc/etcd/pki/etcd.pem --key-file=/etc/etcd/pki/etcd-key.pem --peer-cert-file=/etc/etcd/pki/etcd.pem --peer-key-file=/etc/etcd/pki/etcd-key.pem --trusted-ca-file=/etc/etcd/pki/ca.pem --peer-trusted-ca-file=/etc/etcd/pki/ca.pem --peer-client-cert-auth --client-cert-auth --initial-advertise-peer-urls https://192.8.47.20:2380 --listen-peer-urls https://192.8.47.20:2380 --advertise-client-urls https://192.8.47.20:2379 --listen-client-urls https://192.8.47.20:2379,https://127.0.0.1:2379 --initial-cluster-token etcd-cluster-1 --initial-cluster etcd-server=https://192.8.47.20:2380 --initial-cluster-state new
root 1046 969 0 09:32 pts/0 00:00:00 grep -i etcd
etcd-server ~ ➜ ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/etcd/pki/ca.pem --cert=/etc/etcd/pki/etcd.pem --key=/etc/etcd/pki/etcd-key.pem member list
7a9b662b8a759cb7, started, etcd-server, https://192.8.47.20:2380, https://192.8.47.20:2379, false
# A single member is running in the etcd cluster
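- With the same certificates, endpoint health and status can also be checked:
# Health and status of the single-member cluster
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/pki/ca.pem --cert=/etc/etcd/pki/etcd.pem --key=/etc/etcd/pki/etcd-key.pem \
  endpoint health
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/pki/ca.pem --cert=/etc/etcd/pki/etcd.pem --key=/etc/etcd/pki/etcd-key.pem \
  endpoint status -w table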