EKS Version Upgrade

Hanhorang31 2026. 5. 3. 00:19

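Overview

This post summarizes what I learned in the AEWS season 4 study group run by Gasida (가시다) of CloudNet@.

Many thanks to 김성한 of AWS for providing the workshop.

I worked through the AWS workshop (Amazon EKS Upgrades: Strategies and Best Practices) and wrote up my notes.

The workshop introduces EKS version upgrade strategies and walks through them hands-on.

Drawing on the labs, this post records the EKS version upgrade strategy I want to refer back to later.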

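Why upgrade versions

The Kubernetes project is continuously updated with new features, the latest security patches, and bug fixes.

Versions follow semantic versioning in the form x.y.z, and a new minor version (y) is released roughly every four months.

EKS follows the Kubernetes release cycle: each minor version gets 14 months of standard support after it is released on EKS, and up to four minor versions are supported at a time.

When standard support for a minor version ends, it enters extended support for a further 12 months, and the cost rises sixfold (from $0.10 to $0.60 per cluster per hour).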
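To put the sixfold increase in concrete terms, here is a rough sketch of the monthly control plane cost per cluster. The 730 hours per month figure and the function name are my own assumptions for illustration; the hourly rates are the ones quoted above.

```shell
# Monthly control plane cost in cents, using integer arithmetic only.
# 730 hours per month is an assumed average, not an AWS billing rule.
monthly_cost_cents() {
  rate_cents_per_hour=$1   # 10 for standard support, 60 for extended support
  hours=${2:-730}
  echo $(( rate_cents_per_hour * hours ))
}

standard=$(monthly_cost_cents 10)
extended=$(monthly_cost_cents 60)
echo "standard: ${standard} cents, extended: ${extended} cents, difference: $(( extended - standard )) cents"
```

Under these assumptions a single cluster left on extended support costs a few hundred dollars more per month, which adds up quickly across many clusters.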

EKS support calendar by release version

Platform versions

https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html

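Upgrade order

EKS upgrades fall into two approaches: in-place and blue-green.

An in-place upgrade changes the version of the same cluster, in the order control plane → add-ons → data plane.

A blue-green upgrade creates a new cluster and migrates traffic to it.

Upgrade checklist

1. Identify components that depend on the API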

kubectl get ns

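→ Identify the agents that interact directly with the API

2. Check the compatibility of each component

Check version compatibility for the components identified above.

For AWS add-ons, upgrade guidance is available as follows.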

Amazon VPC CNI: Upgrades for Amazon EKS Add-on deployments are limited to single minor version bumps at a time.
kube-proxy: Find upgrade instructions in Updating the Kubernetes kube-proxy self-managed add-on.
CoreDNS: Upgrade guidance is available in Updating the CoreDNS self-managed add-on.
AWS Load Balancer Controller: Compatibility with your EKS version is crucial. Refer to the installation guide for details.
Amazon EBS/EFS CSI Drivers: Installation and upgrade information can be found in Managing the Amazon EBS CSI driver as an Amazon EKS add-on and Amazon EFS CSI driver respectively.
Metrics Server: For more information, see metrics-server on GitHub.
Cluster Autoscaler: Upgrade by modifying the image version within the deployment definition. Given its tight coupling with the scheduler, the Cluster Autoscaler typically requires upgrading alongside the cluster itself. Consult the GitHub releases to locate the latest image compatible with your Kubernetes minor version.
Karpenter: Refer to the Karpenter documentation for installation and upgrade instructions.

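3. Check upgrade compatibility of third-party tools

Compatibility needs to be checked for ingress controllers, CD tools, and monitoring tools.

Basic upgrade requirements

1. Available IPs: the subnets specified when the cluster was created need at least five free IP addresses.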

aws ec2 describe-subnets --subnet-ids \
  $(aws eks describe-cluster --name ${CLUSTER_NAME} \
  --query 'cluster.resourcesVpcConfig.subnetIds' \
  --output text) \
  --query 'Subnets[*].[SubnetId,AvailabilityZone,AvailableIpAddressCount]' \
  --output table

----------------------------------------------------
|                  DescribeSubnets                 |
+---------------------------+--------------+-------+
|  subnet-01a51da1a2446fe12 |  us-west-2a  |  4028 |
|  subnet-084fc89bc815c99c2 |  us-west-2c  |  4070 |
|  subnet-05acf5010e45ae9f3 |  us-west-2b  |  4049 |

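If the existing subnets do not have enough available IP addresses, consider using Enhanced VPC flexibility.
Add a private CIDR block: to expand the IP pool, you can add a new RFC 1918 private IP range to the VPC.
Update subnets based on the new CIDR: update the subnets used by the cluster so that they reflect the CIDR block newly added to the VPC (secondary CIDR).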
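The five-free-IP requirement is easy to check mechanically instead of reading the table by eye. The sketch below is a hypothetical helper (the function name and the sample subnet IDs are mine, not from the workshop) that reads "subnet-id free-ip-count" pairs and prints the subnets below the minimum.

```shell
# Print the subnets whose free IP count is below the minimum (default 5).
# Input: lines of "<subnet-id> <available-ip-count>" on stdin.
subnets_below_min() {
  min=${1:-5}
  while read -r subnet_id free_ips; do
    # Skip blank lines
    [ -n "$subnet_id" ] || continue
    if [ "$free_ips" -lt "$min" ]; then
      echo "$subnet_id"
    fi
  done
}

# Sample data with made-up subnet IDs and counts
printf '%s\n' "subnet-aaa 4028" "subnet-bbb 3" "subnet-ccc 5" | subnets_below_min 5
```

With a real cluster, the input could come from the describe-subnets command above by querying only Subnets[*].[SubnetId,AvailableIpAddressCount] with --output text, since read splits on the tab that the text output uses.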

2. IAM role: check that the IAM role configured for EKS can perform the upgrade.

# Check the trust policy (only the EKS service may assume this role)
ROLE_ARN=$(aws eks describe-cluster --name ${CLUSTER_NAME} \
  --query 'cluster.roleArn' --output text)
aws iam get-role --role-name ${ROLE_ARN##*/} \
  --query 'Role.AssumeRolePolicyDocument' 
---
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EKSClusterAssumeRole",
            "Effect": "Allow",
            "Principal": {
                "Service": "eks.amazonaws.com"
            },
            "Action": [
                "sts:TagSession",
                "sts:AssumeRole"
            ]
        }
    ]
}

# Check the attached policies
aws iam list-attached-role-policies --role-name ${ROLE_ARN##*/}
---
{
    "AttachedPolicies": [
        {
            "PolicyName": "eksworkshop-eksctl-cluster-ClusterEncryption2026042908213083750000001b",
            "PolicyArn": "arn:aws:iam::-:policy/eksworkshop-eksctl-cluster-ClusterEncryption2026042908213083750000001b"
        },
        {
            "PolicyName": "eksworkshop-eksctl-cluster-20260429082101297900000004",
            "PolicyArn": "arn:aws:iam::-:policy/eksworkshop-eksctl-cluster-20260429082101297900000004"
        },
        {
            "PolicyName": "AmazonEKSClusterPolicy",
            "PolicyArn": "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
        },
        {
            "PolicyName": "AmazonEKSVPCResourceController",
            "PolicyArn": "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
        }
    ]
}
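The commands above rely on the shell parameter expansion ${ROLE_ARN##*/} to turn the role ARN into the role name that the IAM commands expect. A minimal sketch of what that expansion does, using a made-up ARN and a hypothetical helper name:

```shell
# Strip everything up to and including the last "/" to get the role name.
role_name_from_arn() {
  echo "${1##*/}"
}

# Hypothetical ARNs for illustration only
role_name_from_arn "arn:aws:iam::111122223333:role/example-cluster-role"
role_name_from_arn "arn:aws:iam::111122223333:role/some/path/example-cluster-role"
```

Both calls print example-cluster-role, because ## removes the longest prefix matching */, so a role created with a path still resolves to its bare name.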

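3. Security groups: check the security groups between the control plane and the nodes.

When you create a cluster, the corresponding tag is added to the security group automatically.

When EKS is created with defaults, traffic between the control plane and the nodes is allowed, and nodes can send traffic to any destination.

If you need to restrict outbound traffic, refer to the minimum requirements below.

In addition, check the following traffic considerations.

External internet access (optional): if nodes need internet access (for example, for EKS API calls or initial registration), configure outbound rules for the specific ports. Private clusters may not need internet access; see the private cluster requirements.
Container image access: nodes need access to a container registry (for example, Amazon ECR or DockerHub) to pull images.
Separate rules for IPv4/IPv6: if the VPC uses both address families, each needs its own rules.

(Note) Commands for looking up AWS IP address ranges

# Download the IP ranges file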
curl -O https://ip-ranges.amazonaws.com/ip-ranges.json

# Addresses for a specific service (GLOBALACCELERATOR)
jq -r '.prefixes[] | select(.service=="GLOBALACCELERATOR") | .ip_prefix' < ip-ranges.json

# Addresses for a specific region and service
jq -r '.prefixes[] | select(.region=="us-east-1") | select(.service=="GLOBALACCELERATOR") | .ip_prefix' < ip-ranges.json

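Backup with Velero

Each Velero operation (on-demand backup, scheduled backup, restore) is a custom resource, defined with a Kubernetes Custom Resource Definition (CRD) and stored in etcd. Velero also includes controllers that process these custom resources to perform backups, restores, and all related operations.

You can back up or restore all objects in the cluster, or filter objects by type, namespace, and/or label.

Velero is ideal for disaster recovery as well as for snapshotting application state before system operations such as a cluster upgrade.

(See the Velero documentation.)

Here I back up to S3 with Velero. (Reference)

1. Install Velero

# Linux (installs Velero v1.14.1)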
wget https://github.com/velero-io/velero/releases/download/v1.14.1/velero-v1.14.1-linux-amd64.tar.gz
tar -xvf velero-v1.14.1-linux-amd64.tar.gz
sudo mv velero-v1.14.1-linux-amd64/velero /usr/local/bin/

# Check the version
velero version

2. Create the S3 bucket and IAM user for backups

# Create a new S3 bucket for testing
BUCKET=velero-backup-s3
REGION=ap-northeast-2

aws s3api create-bucket \
    --bucket $BUCKET \
    --region $REGION \
    --create-bucket-configuration LocationConstraint=$REGION
    

# Create the IAM user and write the policy document
aws iam create-user --user-name velero

cat > velero-policy.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:DescribeSnapshots",
                "ec2:CreateTags",
                "ec2:CreateVolume",
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::${BUCKET}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::${BUCKET}"
            ]
        }
    ]
}
EOF

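# Create an access key, then save its values in credentials-velero
aws iam create-access-key --user-name velero
vi credentials-velero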
---
[default]
aws_access_key_id=<AWS_ACCESS_KEY_ID>
aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>

# Attach the policy to the user as an inline policy
aws iam put-user-policy \
  --user-name velero \
  --policy-name velero \
  --policy-document file://velero-policy.json
  
  

# Install Velero into the cluster
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.14.0 \
  --bucket $BUCKET \
  --secret-file ./credentials-velero \
  --backup-location-config region=$REGION \
  --snapshot-location-config region=$REGION

3. Check the Velero resources

kubectl get -n velero all

4. Backup and restore (without PVs)

# Deploy the example application
git clone https://github.com/vmware-tanzu/velero.git
cd velero
kubectl apply -f examples/nginx-app/base.yaml


# Back up
velero backup create nginx-backup --include-namespaces nginx-example

# Delete the objects
kubectl delete namespaces nginx-example

# Restore
velero restore create --from-backup nginx-backup


# Verify the restore
kubectl get all -n nginx-example
NAME                                  READY   STATUS    RESTARTS   AGE
pod/nginx-deployment-8d6f5856-ftgfn   1/1     Running   0          15s
pod/nginx-deployment-8d6f5856-lgfs8   1/1     Running   0          15s

NAME               TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
service/my-nginx   LoadBalancer   172.20.147.198        80:31399/TCP   15s

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx-deployment   2/2     2            2           15s

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-deployment-8d6f5856   2         2         2       15s

For PV backups, the --csi-snapshot-timeout option sets how long Velero waits for CSI snapshots to complete.

velero backup create nginx-backup --include-namespaces nginx-example --csi-snapshot-timeout=20m

Control plane upgrade

https://catalog.us-east-1.prod.workshops.aws/event/dashboard/en-US/workshop/module-1/illustrative-scenario

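API compatibility for a control plane version upgrade can be checked in the EKS console.

If a Kubernetes manifest uses an API that is no longer compatible, it must be migrated.

Manifest files can be migrated with kubectl-convert.

# Install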
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl-convert"
sudo install -o root -g root -m 0755 kubectl-convert /usr/local/bin/kubectl-convert

# Run
kubectl convert --help

Test

Let's create a manifest that uses the old Ingress API version and check how the migration behaves.

# test-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: my-service
          servicePort: 80

Run the migration

kubectl convert -f test-ingress.yaml --output-version networking.k8s.io/v1
--
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - backend:
          service:
            name: my-service
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
status:
  loadBalancer: {}

-> It works.
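Before converting anything, it helps to know which manifest files still use the old API. The following is a small sketch of my own (the function name and the temporary sample files are hypothetical) that lists files containing a given apiVersion line, so they can then be passed to kubectl convert.

```shell
# List manifest files under a directory that declare a given apiVersion.
find_manifests_with_api() {
  dir=$1
  api_version=$2
  # -r: recurse, -l: file names only, -F: literal match, -x: whole line
  grep -rlFx "apiVersion: ${api_version}" "$dir" 2>/dev/null | sort
}

# Demo with two throwaway manifests
tmp=$(mktemp -d)
printf 'apiVersion: extensions/v1beta1\nkind: Ingress\n' > "$tmp/old-ingress.yaml"
printf 'apiVersion: networking.k8s.io/v1\nkind: Ingress\n' > "$tmp/new-ingress.yaml"
find_manifests_with_api "$tmp" "extensions/v1beta1"
rm -rf "$tmp"
```

The whole-line match keeps the check strict, so a line with trailing spaces or an inline comment would be missed; loosen the pattern if your manifests are formatted that way.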

EKS add-on upgrade

Add-on compatibility per EKS version can be checked as follows.

# List the add-ons in use
eksctl get addon --cluster $CLUSTER_NAME

# Check the available versions
aws eks describe-addon-versions --addon-name coredns --kubernetes-version 1.31 --output table \
    --query "addons[].addonVersions[:10].{Version:addonVersion,DefaultVersion:compatibilities[0].defaultVersion}"
    
-------------------------------------------
|          DescribeAddonVersions          |
+-----------------+-----------------------+
| DefaultVersion  |        Version        |
+-----------------+-----------------------+
|  True           |  v1.11.4-eksbuild.33  |
|  False          |  v1.11.4-eksbuild.32  |
|  False          |  v1.11.4-eksbuild.28  |
|  False          |  v1.11.4-eksbuild.24  |
|  False          |  v1.11.4-eksbuild.22  |
|  False          |  v1.11.4-eksbuild.20  |
|  False          |  v1.11.4-eksbuild.14  |
|  False          |  v1.11.4-eksbuild.10  |
|  False          |  v1.11.4-eksbuild.2   |
|  False          |  v1.11.4-eksbuild.1   |

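Node group upgrade

This section only covers in-place upgrades of Karpenter-managed nodes and Fargate nodes.

Karpenter-managed node upgrade

Karpenter is an open-source cluster autoscaler that provisions right-sized nodes by evaluating the aggregate CPU, memory, and volume requests of unschedulable pods along with other Kubernetes scheduling constraints (for example, affinity and pod topology spread constraints). This simplifies infrastructure management.

Karpenter also does not coordinate capacity management through external infrastructure such as node groups or Amazon EC2 Auto Scaling groups (ASGs).

AMI selection and configuration

AWS releases AMIs frequently, for new Kubernetes versions as well as for patches and CVE (Common Vulnerabilities and Exposures) fixes. You can choose from a range of Amazon EKS optimized AMIs or use your own custom AMI.

The EC2NodeClass resource in Karpenter v1.0.0, the latest stable version referenced by the workshop, has the following characteristics:

Supported AMI families: amiFamily accepts AL2, AL2023, Bottlerocket, Windows2019, Windows2022, and Custom.
Using an alias: amiFamily is required only when spec.amiSelectorTerms.alias is not specified. If the EC2NodeClass has an alias, Karpenter automatically selects the Amazon EKS optimized AMI for that family and version. This lets you pin a specific version of the Amazon EKS optimized AMI.
Fine-grained selection: without an alias, you must set amiFamily, and you can select AMIs with the tags, name, or id fields in amiSelectorTerms.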

Drift

Karpenter uses drift to upgrade Kubernetes nodes in a rolling fashion. When a node provisioned by Karpenter no longer matches the desired spec (it has drifted), Karpenter first creates a new node, then evicts the pods from the old node and deletes it.

While deleting a node, Karpenter cordons it so that no new pods are scheduled onto it, and moves pods off safely using the Kubernetes Eviction API.

Drift behavior for AMIs falls into two cases.

1. Drift with specified AMI values

This approach suits cases where you want direct control over AMI promotion across application environments for consistency.

How it works: if you change the AMI in the NodePool's EC2NodeClass, or associate a different EC2NodeClass with the NodePool, Karpenter detects that the existing nodes have drifted from the desired configuration.
Specifying AMIs: amiSelectorTerms can specify an AMI ID, a name, or specific tags. If several AMIs match, the newest one is used.
Checking status: the AMIs discovered by an EC2NodeClass are listed in its status field (for example, kubectl describe ec2nodeclass [name]).
Replacement: if both the old and the new AMI are discovered, running nodes that use the old AMI are treated as drifted, deleted, and replaced with nodes on the new AMI.

2. Drift with Amazon EKS optimized AMIs

You can select an EKS optimized AMI with the alias term. An alias has the form family@version.

Specifying the version: set the version string to latest, or pin to a specific version using the format of that AMI's GitHub release tags.
Automatic monitoring: with latest, Karpenter monitors the SSM parameters published for the specified AMI family. When it detects that a new AMI has been released, it treats the existing nodes as drifted.
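To make the family@version format concrete, here is a small sketch that splits an alias and reports whether it floats with latest or is pinned. The helper names are my own, and v20240807 is only an illustrative release tag; check the AMI release page for real values.

```shell
# Split an EC2NodeClass AMI alias of the form family@version.
alias_family()  { echo "${1%%@*}"; }
alias_version() { echo "${1#*@}"; }

# "latest" tracks newly published AMIs; any other version string pins the AMI.
alias_is_pinned() {
  [ "$(alias_version "$1")" != "latest" ]
}

for a in "al2023@latest" "al2023@v20240807"; do
  if alias_is_pinned "$a"; then mode="pinned"; else mode="floating"; fi
  echo "$(alias_family "$a") $(alias_version "$a") $mode"
done
```

A floating alias means nodes drift whenever AWS publishes a new AMI, while a pinned alias only drifts when you change the version yourself, which is usually what you want in production.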

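TTL for automatic node deletion (expireAfter)

Karpenter can set a TTL (time to live) on provisioned nodes so that nodes without workloads, or nodes that have reached their expiry time, are deleted automatically.

As an upgrade mechanism: node expiry can serve as an upgrade mechanism, since expired nodes are drained and replaced with nodes on the latest version.
How it works: once a node has lived for the duration set in the NodePool's spec.template.spec.expireAfter (spec.disruption.expireAfter in the older v1beta1 API), for example 720h, Karpenter marks the node as expired and starts the disruption process.
Security use: this is useful when nodes must be recycled periodically for security reasons.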

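Disruption budgets

There are situations where you need to control Karpenter's automated node replacement.

When mission-critical workloads are running and voluntary disruption must be avoided at certain times
When hundreds of nodes drift at once and you want to prevent them all from being replaced simultaneously
When you roll out image upgrades aggressively but want to block disruption during certain periods, while still using expiry to eventually force every user onto a compliant image

For this, spec.disruption.budgets on the NodePool rate-limits Karpenter's disruption.

Key characteristics

Pacing: you can slow down upgrades to nodes on a new AMI, or allow upgrades only at specific dates and times (schedule).
Limits: a budget caps the number of nodes disrupted at once (as a count or a percentage), optionally on a schedule.
Default: if no budget is defined, Karpenter allows 10% of nodes to be disrupted at a time.
Scope: budgets apply only to voluntary disruption caused by drift, emptiness, or consolidation.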
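A percentage budget is easier to reason about with numbers. As I understand the Karpenter documentation, the percentage is rounded up and nodes that are already being deleted or are not ready count against the budget; treat that formula, and the helper name below, as assumptions rather than a statement of Karpenter's exact behavior.

```shell
# Allowed disruptions for a percentage budget (assumed formula):
#   ceil(total * percent / 100) - deleting - not_ready, never below 0
allowed_disruptions() {
  total=$1
  percent=$2
  deleting=${3:-0}
  not_ready=${4:-0}
  ceil=$(( (total * percent + 99) / 100 ))
  allowed=$(( ceil - deleting - not_ready ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

allowed_disruptions 25 10       # 25 nodes with the default 10% budget
allowed_disruptions 25 10 2 1   # same pool, 2 nodes deleting and 1 not ready
```

The rounding matters for small pools: under this formula even a 5-node pool with a 10% budget still allows one node to be disrupted at a time.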

1. Example: preventing disruption during business hours

spec:
  disruption:
    budgets:
    - schedule: "0 9 * * mon-fri"
      duration: 8h
      nodes: "0"    # No disruption for 8 hours from 09:00, Monday to Friday
    - nodes: "10"   # Otherwise, allow up to 10 nodes to be disrupted at once

2. Per-reason budgets

spec:
  disruption:
    budgets:
    - nodes: "1"
      reasons:
      - Drifted        # On drift, replace only one node at a time
    - nodes: "100%"
      reasons:
      - Empty
      - Underutilized  # Empty or underutilized nodes may all be cleaned up at once

3. Blocking voluntary disruption

spec:
  disruption:
    budgets:
    - nodes: "0"  # Completely blocks Karpenter from voluntarily deleting nodes

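Check the current settings

With these settings, nodes are not consolidated for cost savings even when they have no pods (consolidateAfter is Never), but they are replaced automatically once 720h has passed.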

kubectl describe nodepool
..
Spec:
  Disruption:
    Budgets:
      Nodes:               10%  # at most 10% of all nodes may be disrupted at once
    Consolidate After:     Never # consolidation never triggers
    Consolidation Policy:  WhenEmpty # only nodes with no pods at all are candidates for removal
  Limits:
    Cpu:  100
  Template:
    Metadata:
      Labels:
        Env:   dev
        Team:  checkout
    Spec:
      Expire After:  720h # nodes expire 30 days (720 hours) after creation and are replaced with new ones

Node upgrade

kubectl get pods -n checkout -o wide
--
NAME                             READY   STATUS    RESTARTS   AGE   IP           NODE                                        NOMINATED NODE   READINESS GATES
checkout-9c674566c-m9qcm         1/1     Running   0          22h   10.0.18.64   ip-10-0-21-240.us-west-2.compute.internal   <none>           <none>
checkout-redis-97ff8589d-vr7dv   1/1     Running   0          22h   10.0.20.92   ip-10-0-21-240.us-west-2.compute.internal   <none>           <none>


kubectl get nodes -l team=checkout 
NAME                                        STATUS   ROLES    AGE   VERSION
ip-10-0-10-184.us-west-2.compute.internal   Ready    <none>   27s   v1.30.14-eks-f69f56f
ip-10-0-21-240.us-west-2.compute.internal   Ready    <none>   22h   v1.30.14-eks-f69f56f

Check the 1.31 AMI

 aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.31/amazon-linux-2023/x86_64/standard/recommended/image_id \
    --region ${AWS_REGION} --query "Parameter.Value" --output text
    
 >> 
 ami-02b487869ba8cadfd

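Update the NodePool and EC2NodeClass

When the EC2NodeClass is changed, the existing nodes are marked as Drifted
Setting a Drifted budget of nodes: "1" on the NodePool replaces the nodes one at a time

# Update the EC2NodeClass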
kubectl edit ec2nodeclass 
--
spec:
  amiFamily: AL2023
  amiSelectorTerms:
  - id: ami-02b487869ba8cadfd   # was ami-0f676a166352f02ab
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1
    
    
# Update the NodePool (change the budget)
kubectl edit nodepool
--  
budgets:
    - nodes: "1"
      reasons:
      - Drifted
      

# Check the logs
kubectl -n karpenter logs deployment/karpenter -c controller --tail=33

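In my case, the nodes were not replaced after the NodeClaims were initialized.

kubectl get nodeclaims -o custom-columns=NAME:.metadata.name,NODE:.status.nodeName,DRIFTED:.status.conditions[?(@.type=="Drifted")].status
NAME   NODE   DRIFTED
error: unrecognized identifier Drifted

This error usually means kubectl received Drifted without its double quotes, typically because of shell quoting. Wrapping the whole argument in single quotes keeps the filter intact.

kubectl get nodeclaims -o 'custom-columns=NAME:.metadata.name,NODE:.status.nodeName,DRIFTED:.status.conditions[?(@.type=="Drifted")].status'

I found that Argo CD's automated sync was reverting the change. The events are as follows.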

kubectl get events -n argocd --field-selector involvedObject.name=karpenter
...
9m55s       Normal   OperationStarted     application/karpenter   Initiated automated sync to 'd16f1ae0df3b83c7b80ea3f5462da36825404d33'
9m55s       Normal   ResourceUpdated      application/karpenter   Updated sync status: Synced -> OutOfSync
9m55s       Normal   OperationCompleted   application/karpenter   Partial sync operation to d16f1ae0df3b83c7b80ea3f5462da36825404d33 succeeded
9m55s       Normal   ResourceUpdated      application/karpenter   Updated sync status: OutOfSync -> Synced
5m35s       Normal   OperationStarted     application/karpenter   Initiated automated sync to 'd16f1ae0df3b83c7b80ea3f5462da36825404d33'
5m35s       Normal   ResourceUpdated      application/karpenter   Updated sync status: Synced -> OutOfSync
5m35s       Normal   OperationCompleted   application/karpenter   Partial sync operation to d16f1ae0df3b83c7b80ea3f5462da36825404d33 succeeded
5m35s       Normal   ResourceUpdated      application/karpenter   Updated sync status: OutOfSync -> Synced
4m16s       Normal   OperationStarted     application/karpenter   Initiated automated sync to 'd16f1ae0df3b83c7b80ea3f5462da36825404d33'
4m16s       Normal   ResourceUpdated      application/karpenter   Updated sync status: Synced -> OutOfSync
4m16s       Normal   OperationCompleted   application/karpenter   Partial sync operation to d16f1ae0df3b83c7b80ea3f5462da36825404d33 succeeded
4m16s       Normal   ResourceUpdated      application/karpenter   Updated sync status: OutOfSync -> Synced
2m22s       Normal   OperationStarted     application/karpenter   Initiated automated sync to 'd16f1ae0df3b83c7b80ea3f5462da36825404d33'
2m22s       Normal   ResourceUpdated      application/karpenter   Updated sync status: Synced -> OutOfSync
2m22s       Normal   OperationCompleted   application/karpenter   Partial sync operation to d16f1ae0df3b83c7b80ea3f5462da36825404d33 succeeded
2m22s       Normal   ResourceUpdated      application/karpenter   Updated sync status: OutOfSync -> Synced
99s         Normal   OperationStarted     application/karpenter   Initiated automated sync to 'd16f1ae0df3b83c7b80ea3f5462da36825404d33'
99s         Normal   ResourceUpdated      application/karpenter   Updated sync status: Synced -> OutOfSync
99s         Normal   OperationCompleted   application/karpenter   Partial sync operation to d16f1ae0df3b83c7b80ea3f5462da36825404d33 succeeded
99s         Normal   ResourceUpdated      application/karpenter   Updated sync status: OutOfSync -> Synced


To stop the reverts, I switched the application to manual sync.

argocd app set karpenter --sync-policy manual

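Fargate node upgrade

Fargate is a technology that provides on-demand compute capacity for containers. With Fargate, you do not need to provision, configure, or scale groups of virtual machines yourself to run containers. You also do not need to choose server types, decide when to scale node groups, or optimize cluster packing.

As a result, to upgrade Fargate nodes you simply restart the Kubernetes deployment so that the new pods are automatically scheduled onto the latest Kubernetes version.

# Check the pods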
kubectl get pods -n assets -o wide
---
NAME                      READY   STATUS    RESTARTS   AGE   IP          NODE                                              NOMINATED NODE   READINESS GATES
assets-784b5f5656-qcfwn   1/1     Running   0          32h   10.0.8.78   fargate-ip-10-0-8-78.us-west-2.compute.internal   <none>           <none>

# Check the node version
kubectl get node $(kubectl get pods -n assets -o jsonpath='{.items[0].spec.nodeName}') -o wide
---
NAME                                              STATUS   ROLES    AGE   VERSION                INTERNAL-IP   EXTERNAL-IP   OS-IMAGE   KERNEL-VERSION   CONTAINER-RUNTIME
fargate-ip-10-0-8-78.us-west-2.compute.internal   Ready    <none>   32h   v1.30.14-eks-d6694b8   10.0.8.78     <none>        Minimal    6.1.166          containerd://2.2.1+unknown

# Restart the deployment
kubectl rollout restart deployment assets -n assets

# Check the node version again
kubectl get node $(kubectl get pods -n assets -o jsonpath='{.items[0].spec.nodeName}') -o wide
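To confirm that a restarted pod really landed on an upgraded node, the minor version can be pulled out of the kubelet version string and compared with the target. This is a sketch with hypothetical helper names; the version string is the one from the output above, and 1.31 is the upgrade target used in this post.

```shell
# Extract "major.minor" from a kubelet version such as v1.30.14-eks-d6694b8.
kubelet_minor() {
  v=${1#v}            # drop the leading "v"
  major=${v%%.*}      # text before the first dot
  rest=${v#*.}        # text after the first dot
  minor=${rest%%.*}   # text before the next dot
  echo "${major}.${minor}"
}

# Succeeds when the node's minor version equals the target minor version.
node_matches_target() {
  [ "$(kubelet_minor "$1")" = "$2" ]
}

if node_matches_target "v1.30.14-eks-d6694b8" "1.31"; then
  echo "upgraded"
else
  echo "still on $(kubelet_minor "v1.30.14-eks-d6694b8")"
fi
```

On a live cluster, the version string could be read from the node's .status.nodeInfo.kubeletVersion field with kubectl get node and a jsonpath output.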

References

https://velero.io/docs/v1.14/customize-installation/

https://catalog.workshops.aws/eks-upgrades/en-US
