EKS Self-Managed Node Groups

Self-Managed Node Group in EKS to Handle Spot Instance Unavailability

USE CASE: Ensure that On-Demand Instances are automatically used if Spot Instances become unavailable.

In this post, I’ll explain how I created a self-managed node group in Amazon EKS (Elastic Kubernetes Service). My goal was to ensure that on-demand instances are automatically used if spot instances become unavailable. Let’s dive into the details!

Limitations of Managed Node Groups in EKS

When using Amazon EKS Managed Node Groups, AWS handles many of the heavy-lifting tasks involved in managing worker nodes for your Kubernetes cluster. While this abstraction simplifies node management, it also introduces certain limitations, particularly when you need more granular control over your infrastructure.

1. Lack of Access to the Underlying Auto Scaling Group (ASG)

One of the key limitations of EKS managed node groups is that you don’t have direct access to the Auto Scaling Group (ASG) backing your node group. AWS manages this ASG for you, and while it’s convenient for standard use cases, this abstraction restricts your ability to configure certain ASG settings directly. Here's what you cannot control:

  • Custom Scaling Policies: You can't modify scaling policies, such as adding scaling triggers based on custom CloudWatch metrics or adjusting scaling cooldown times (see the sketch after this list).
  • Capacity Rebalance: In a managed node group, you cannot configure Capacity Rebalancing, which would automatically replace spot instances if they are interrupted by AWS.
  • On-Demand and Spot Instance Allocation: Unlike self-managed node groups, you can’t directly configure mixed instance policies (i.e., adjusting how many spot or on-demand instances should be used) in the managed node group's ASG.
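
For comparison, here is a minimal sketch of the kind of custom scaling policy you can attach to a self-managed ASG but not to the ASG behind a managed node group. It assumes a self-managed group like the aws_autoscaling_group.this defined later in this post; the policy name, metric, queue name, and target value are purely illustrative.

resource "aws_autoscaling_policy" "queue_depth" {
  name                   = "scale-on-queue-depth"          # hypothetical policy name
  autoscaling_group_name = aws_autoscaling_group.this.name # the self-managed ASG defined later
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    target_value = 100 # illustrative target value

    # Scale on a custom CloudWatch metric instead of the built-in ASG metrics.
    customized_metric_specification {
      metric_name = "ApproximateNumberOfMessagesVisible"
      namespace   = "AWS/SQS"
      statistic   = "Average"

      metric_dimension {
        name  = "QueueName"
        value = "backend-jobs" # hypothetical queue
      }
    }
  }
}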

2. Limited Control Over ASG Overrides

Managed node groups provide only basic options for configuring instance types and limits. You cannot fine-tune instance overrides, which means less flexibility when specifying multiple instance types or controlling instance weighting to distribute workloads more effectively across varying instance sizes.

3. Lack of Direct Access to ASG Lifecycle Hooks

For more advanced lifecycle management, such as running scripts during instance launch or termination, EKS managed node groups don’t give you access to the underlying ASG lifecycle hooks. These hooks are crucial when you need to perform custom actions during these lifecycle events.
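
With a self-managed node group, these hooks are a single Terraform resource away. Below is a minimal sketch, assuming the aws_autoscaling_group.this defined later in this post; the hook name, timeout, and default result are illustrative.

resource "aws_autoscaling_lifecycle_hook" "node_termination" {
  name                   = "drain-before-terminate" # hypothetical hook name
  autoscaling_group_name = aws_autoscaling_group.this.name
  lifecycle_transition   = "autoscaling:EC2_INSTANCE_TERMINATING"
  heartbeat_timeout      = 300        # seconds the instance stays in Terminating:Wait
  default_result         = "CONTINUE" # proceed with termination if nothing completes the hook
}

Something watching the group's lifecycle events (a Lambda function or a node-drainer daemon, for example) would cordon and drain the node, then call complete-lifecycle-action before the heartbeat expires.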

4. No Control Over ASG Tags

In a self-managed node group, you can apply custom tags to your ASG, which is useful for cost allocation, logging, or monitoring. With managed node groups, this flexibility is lost because AWS manages the tags for you.

5. No Support for Advanced Networking or Instance Features

With managed node groups, you can’t configure certain advanced networking options like custom network interfaces or Elastic Fabric Adapter (EFA) for high-performance computing workloads. These options are typically available when you manage the underlying EC2 instances yourself.
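
In a self-managed setup the launch template is yours, so you can request these features directly. Here is a minimal sketch of an EFA-enabled launch template; the template name and instance type are illustrative, and EFA additionally requires an EFA-capable instance type (and usually a cluster placement group).

resource "aws_launch_template" "hpc_efa" {
  name     = "hpc-efa-launch-template" # hypothetical template for an HPC node group
  image_id = var.IMAGE_ID

  # EFA is only available on certain instance families (e.g. c5n, p4d, hpc6a).
  instance_type = "c5n.18xlarge"

  network_interfaces {
    device_index    = 0
    interface_type  = "efa"
    security_groups = var.SECURITY_GROUP_IDS
  }
}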

While EKS Managed Node Groups provide a high level of abstraction and ease of use, they come with trade-offs in terms of flexibility and configurability. For teams or workloads that require fine-tuned control over the underlying infrastructure, especially at the Auto Scaling Group level, self-managed node groups are a better option despite the additional operational overhead.

If you require custom configurations such as advanced scaling policies, mixed instance types, or direct ASG lifecycle management, you might find these limitations of managed node groups restrictive.

Why Self-Managed Node Groups?

The primary reason for creating a self-managed node group is to have greater control over the configuration and management of EC2 instances. Although spot instances are cost-effective, they are not guaranteed, and AWS can reclaim them at any time. By setting up a self-managed node group, I ensure that if spot instances are unavailable, on-demand instances are automatically utilized, maintaining the desired capacity of my EKS cluster.


Node Group Configuration

Below is the Terraform configuration that sets up the self-managed node group.

Scripts

1. Join Cluster Script

The cloud-init configuration below ensures that the EC2 instances launched into the node group come up with the required packages and join the Kubernetes cluster cleanly. In short, it:

  1. Updates packages and installs jq.
  2. Runs /etc/eks/bootstrap.sh to join the node to the cluster.
  3. Installs kubectl and writes a kubeconfig with aws eks update-kubeconfig.
  4. Creates, enables, and starts a one-shot systemd unit (init.service) that executes the second cloud-init part (the node-labelling script).

#cloud-config
package_update: true
package_upgrade: true
packages:
    - jq

runcmd:
    # Trace commands in the cloud-init log for easier debugging.
    - [ sh, -c, "set -o xtrace"]
    # Give the instance time to finish networking/IAM setup before bootstrapping.
    - [ sh, -c, "sleep 60"]
    # Join this node to the EKS cluster.
    - [ sh, -c, "/etc/eks/bootstrap.sh ${CLUSTER_NAME}"]
    # Install kubectl.
    - curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
    - curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl.sha256"
    - sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
    - [ sh, -c, "echo 'kubectl has been installed successfully.'"]
    # Write a kubeconfig so kubectl on the node can reach the cluster.
    - [ sh, -c, "aws eks update-kubeconfig --name ${CLUSTER_NAME} --region ${REGION}"]
    # Create a one-shot systemd unit that runs the second cloud-init part
    # (the node-labelling shell script rendered as part-002).
    - [ sh, -c, "echo '[Unit]' > /etc/systemd/system/init.service"]
    - [ sh, -c, "echo 'Description=Init Service' >> /etc/systemd/system/init.service"]
    - [ sh, -c, "echo '[Service]' >> /etc/systemd/system/init.service"]
    - [ sh, -c, "echo 'ExecStart=/bin/bash /var/lib/cloud/instance/scripts/part-002' >> /etc/systemd/system/init.service"]
    - [ sh, -c, "echo '' >> /etc/systemd/system/init.service"]
    - [ sh, -c, "echo '[Install]' >> /etc/systemd/system/init.service"]
    - [ sh, -c, "echo 'WantedBy=multi-user.target' >> /etc/systemd/system/init.service"]
    - [ sh, -c, "sudo systemctl daemon-reload"]
    - [ sh, -c, "sudo systemctl enable init.service"]
    - [ sh, -c, "sudo systemctl start init.service"]

output:
    all: '| tee -a /var/log/cloud-init-output.log'

2. User Data Script (Labelling Nodes for Affinity)

The user data script is responsible for configuring the nodes at startup. It does the following:

  1. Exports the KUBECONFIG path so kubectl can reach the Kubernetes API.
  2. Fetches the instance ID from instance metadata and resolves the node's private DNS name.
  3. Labels the node based on its node group.

locals {
  # Shell script (rendered as cloud-init part-002) that labels the node with
  # its node group so workloads can target it via affinity / node selectors.
  labelling_script = <<-EOF
    #!/bin/bash
    export KUBECONFIG=/root/.kube/config
    REGION=us-east-1
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    NODE_NAME=$(aws ec2 describe-instances --instance-ids $INSTANCE_ID --region $REGION --query 'Reservations[0].Instances[0].PrivateDnsName' --output text)
    /usr/local/bin/kubectl label nodes $NODE_NAME ${var.NODE_GROUP}=workload
  EOF
}

data "template_file" "template_userdata" {
  template = file("${path.module}/userdata/cloud-init.yaml")

  vars = {
    CLUSTER_NAME = var.CLUSTER_NAME
    REGION       = data.aws_region.current.name
  }
}

data "template_cloudinit_config" "userdata" {
  gzip          = true
  base64_encode = true

  # Part 1: the cloud-config above (bootstrap, kubectl install, init.service).
  part {
    filename     = "cloud-init.yaml"
    content_type = "text/cloud-config"
    content      = data.template_file.template_userdata.rendered
  }

  # Part 2: the labelling script, written to part-002 and run by init.service.
  part {
    content_type = "text/x-shellscript"
    content      = local.labelling_script
  }
}

resource "aws_iam_instance_profile" "this" {
  name = var.NODE_GROUP
  role = "EKS_NodeGroup_Role"
}

The labelling_script labels the nodes as soon as they are provisioned, making them ready for use in the cluster.
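
To put the label to work, a workload pins itself to these nodes. Below is a minimal sketch using the Terraform kubernetes provider, which this post otherwise doesn't use and which is assumed to be configured against the cluster; the Deployment name, app label, and container image are hypothetical, and the label key matches the NODE_GROUP value "backend" used later in this post.

resource "kubernetes_deployment" "backend" {
  metadata {
    name = "backend" # hypothetical workload
  }

  spec {
    replicas = 2

    selector {
      match_labels = { app = "backend" }
    }

    template {
      metadata {
        labels = { app = "backend" }
      }

      spec {
        # Require nodes carrying the label applied by labelling_script.
        affinity {
          node_affinity {
            required_during_scheduling_ignored_during_execution {
              node_selector_term {
                match_expressions {
                  key      = "backend" # the ${var.NODE_GROUP} label key
                  operator = "In"
                  values   = ["workload"]
                }
              }
            }
          }
        }

        container {
          name  = "app"
          image = "nginx:1.25" # placeholder image
        }
      }
    }
  }
}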

Launch Template

The launch template defines the EC2 instances that will be created as part of the node group. It includes details like:

  • AMI (image ID)
  • EBS volume size and type
  • IAM instance profile and security groups
  • User data and detailed monitoring

The instance types themselves are supplied through the ASG's mixed-instance overrides in the next section.

resource "aws_launch_template" "this" {
  name                   = "${var.NODE_GROUP}_launch_template"
  vpc_security_group_ids = var.SECURITY_GROUP_IDS

  iam_instance_profile {
    name = aws_iam_instance_profile.this.name
  }

  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size           = var.VOLUME_SIZE
      volume_type           = "gp2"
      delete_on_termination = true
    }
  }

  image_id  = var.IMAGE_ID
  user_data = data.template_cloudinit_config.userdata.rendered

  monitoring {
    enabled = var.ENABLE_MONITORING
  }

  tag_specifications {
    resource_type = "instance"
    tags = merge(
      var.COMMON_TAGS,
      var.TAGS,
      {
        Name                                            = "EKS-SELF-MANAGED-NODE-${var.NODE_GROUP}"
        "k8s.io/cluster-autoscaler/enabled"             = "true"
        "k8s.io/cluster-autoscaler/${var.CLUSTER_NAME}" = "true"
        "kubernetes.io/cluster/${var.CLUSTER_NAME}"     = "owned"
        NodeGroup                                       = var.NODE_GROUP
      }
    )
  }
}

This template makes sure that all instances launched as part of this node group are configured with the correct parameters.

Auto Scaling Group (ASG)

The Auto Scaling Group (ASG) is configured to handle both on-demand and spot instances using a mixed instance policy. This ensures that when spot instances are unavailable, the ASG automatically provisions on-demand instances to maintain the desired capacity.

resource "aws_autoscaling_group" "this" {
  name                      = "eks-${var.NODE_GROUP}-asg"
  capacity_rebalance        = true
  desired_capacity          = var.DESIRED_CAPACITY
  max_size                  = var.MAX_SIZE
  min_size                  = var.MIN_SIZE
  vpc_zone_identifier       = var.SUBNET_IDS
  health_check_type         = "EC2"
  health_check_grace_period = 300

  mixed_instances_policy {
    instances_distribution {
      on_demand_allocation_strategy            = "prioritized"
      on_demand_base_capacity                  = var.ON_DEMAND_BASE_CAPACITY
      on_demand_percentage_above_base_capacity = var.ON_DEMAND_PERCENTAGE_ABOVE_BASE
      spot_allocation_strategy                 = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.this.id
      }

      override {
        instance_type     = var.INSTANCE_TYPE_1_W1
        weighted_capacity = "1"
      }
      override {
        instance_type     = var.INSTANCE_TYPE_2_W1
        weighted_capacity = "1"
      }
      override {
        instance_type     = var.INSTANCE_TYPE_3_W1
        weighted_capacity = "1"
      }
      override {
        instance_type     = var.INSTANCE_TYPE_1_W2
        weighted_capacity = "2"
      }
      override {
        instance_type     = var.INSTANCE_TYPE_2_W2
        weighted_capacity = "2"
      }
      override {
        instance_type     = var.INSTANCE_TYPE_3_W2
        weighted_capacity = "2"
      }
    }
  }

  tag {
    key                 = "k8s.io/cluster-autoscaler/${var.CLUSTER_NAME}"
    value               = "owned"
    propagate_at_launch = true
  }

  tag {
    key                 = "k8s.io/cluster-autoscaler/enabled"
    value               = "true"
    propagate_at_launch = true
  }
}

The mixed_instances_policy allows multiple instance types to be used and controls how capacity is split between on-demand and spot instances, so the group is never entirely dependent on spot availability. For example, with ON_DEMAND_BASE_CAPACITY = 0 and ON_DEMAND_PERCENTAGE_ABOVE_BASE = 25, roughly 25% of the running capacity is on-demand and the remaining 75% is spot, while capacity_rebalance = true proactively replaces spot instances that receive an interruption notice.

Calling the Module

Finally, I call the module with specific parameters, including the instance types, cluster name, desired capacity, and more.

module "NODEGROUP" {
  source = "../../modules/NODEGROUP"

  CLUSTER_NAME                    = var.CLUSTER_NAME
  NODE_GROUP                      = "backend"
  SUBNET_IDS                      = data.terraform_remote_state.vpc.outputs.PUBLIC_SUBNET_IDS
  SECURITY_GROUP_IDS              = [data.terraform_remote_state.eks_sg.outputs.EKS_SG_ID_LT]
  VOLUME_SIZE                     = 20
  IMAGE_ID                        = "ami-0147d3ab35a9exxxx"
  INSTANCE_TYPE_1_W1              = "t3a.medium" # weight 1
  INSTANCE_TYPE_2_W1              = "t3.medium"  # weight 1
  INSTANCE_TYPE_3_W1              = "t2.medium"  # weight 1
  INSTANCE_TYPE_1_W2              = "t3.large"   # weight 2
  INSTANCE_TYPE_2_W2              = "t3a.large"  # weight 2
  INSTANCE_TYPE_3_W2              = "m5.large"   # weight 2 (x86_64, matching the shared AMI)
  MAX_SIZE                        = 40
  DESIRED_CAPACITY                = 30
  MIN_SIZE                        = 25
  ON_DEMAND_BASE_CAPACITY         = 0
  ON_DEMAND_PERCENTAGE_ABOVE_BASE = 25
  ENABLE_MONITORING               = true
  COMMON_TAGS                     = local.common_tags
  TAGS                            = var.TAGS
}

Variables

variable "NODE_GROUP" {
  type = string
}

variable "SECURITY_GROUP_IDS" {
  type = list(string)
}

variable "VOLUME_SIZE" {
  type = number
}

variable "IMAGE_ID" {
  type = string
}

variable "DESIRED_CAPACITY" {
  type = number
}

variable "MAX_SIZE" {
  type = number
}

variable "MIN_SIZE" {
  type = number
}

variable "INSTANCE_TYPE_1_W1" {
  type = string
}

variable "INSTANCE_TYPE_2_W1" {
  type = string
}

variable "INSTANCE_TYPE_3_W1" {
  type = string
}

variable "INSTANCE_TYPE_1_W2" {
  type = string
}

variable "INSTANCE_TYPE_2_W2" {
  type = string
}

variable "INSTANCE_TYPE_3_W2" {
  type = string
}

variable "SUBNET_IDS" {
  type = list(string)
}

variable "ON_DEMAND_BASE_CAPACITY" {
  type = number
}

variable "ON_DEMAND_PERCENTAGE_ABOVE_BASE" {
  type = number
}

variable "ENABLE_MONITORING" {
  type = bool
}

variable "CLUSTER_NAME" {
  type = string
}

variable "COMMON_TAGS" {
  type = map(string)
}

variable "TAGS" {
  type = map(string)
}