Critical Workloads on On-Demand Nodes
Optimizing Application High Availability
Use case: ensure that critical backend workloads run on On-Demand Instances.
If you’re looking to scale your application nodes while keeping costs in check, AWS Spot Instances might be your best bet. In this guide, we’ll explore how to leverage these instances effectively.
Understanding AWS Instance Types
AWS offers two main purchasing options for EC2 capacity: On-Demand and Spot.

- On-Demand Instances: The standard pay-as-you-go option and the most expensive. Capacity is available on demand and is never reclaimed from you.
- Spot Instances: Spare AWS capacity offered at steep discounts, often up to 90% off the On-Demand price. The trade-off is that AWS can reclaim these instances when it needs the capacity back, typically with about two minutes' notice.
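That two-minute notice is surfaced through the instance metadata service. As a rough sketch (this only works from inside an EC2 instance, and assumes IMDSv1 is reachable), a process on the node can poll for it:

```shell
# Poll for a Spot interruption notice. IMDS returns 404 until AWS
# schedules a reclaim, then a JSON body with the action and time.
while true; do
  if curl -s --fail http://169.254.169.254/latest/meta-data/spot/instance-action; then
    echo "Spot interruption scheduled - start draining this node"
    break
  fi
  sleep 5
done
```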
The Value of Node Groups
Integrating Spot Instances into your workload can lead to substantial cost savings. Here’s how to structure your node groups effectively:
- Critical Workloads: Always run critical applications on On-Demand instances to ensure reliability. This can be achieved with node affinity:
```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: eks.amazonaws.com/capacityType
          operator: In
          values:
          - ON_DEMAND
```
- StatefulSets: These should also be deployed on On-Demand instances, since rescheduling stateful pods off reclaimed Spot capacity is disruptive.
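As a minimal sketch of pinning a StatefulSet to On-Demand capacity, a plain nodeSelector is enough (the `db` names and image here are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      # Only schedule onto nodes carrying the On-Demand capacity label
      nodeSelector:
        eks.amazonaws.com/capacityType: ON_DEMAND
      containers:
      - name: db
        image: postgres:16
```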
- Diverse Spot Node Groups: Create multiple Spot node groups spanning different instance types and sizes to maximize the chance that some Spot capacity is available.
- Use Cluster Autoscaler: Implement the Cluster Autoscaler (CA) to facilitate automatic scaling of your nodes.
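As a sketch, these are the Cluster Autoscaler flags most relevant when running several similar Spot node groups (the `my-cluster` tag value is a placeholder for your cluster name):

```yaml
# Excerpt from the cluster-autoscaler Deployment's container spec
command:
- ./cluster-autoscaler
- --cloud-provider=aws
# Treat node groups with the same shape as one pool so scale-ups
# spread across them instead of favoring a single ASG
- --balance-similar-node-groups
- --expander=least-waste
# Discover ASGs by tag rather than listing them explicitly
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
```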
- Self-Managed Node Groups: If you're using self-managed node groups, you need to add the eks.amazonaws.com/capacityType label yourself. One way is to include the following in your Cloud-Init configuration:
```hcl
locals {
  labelling_script = <<-EOF
    #!/bin/bash
    export KUBECONFIG=/root/.kube/config
    REGION=us-east-1
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    NODE_NAME=$(aws ec2 describe-instances --instance-ids $INSTANCE_ID --region $REGION --query 'Reservations[0].Instances[0].PrivateDnsName' --output text)
    # instance-life-cycle returns "spot" or "on-demand"; convert to the
    # SPOT / ON_DEMAND values EKS uses for this label
    NODE_TYPE=$(curl -s --fail http://169.254.169.254/latest/meta-data/instance-life-cycle | tr 'a-z-' 'A-Z_')
    /usr/local/bin/kubectl label nodes $NODE_NAME eks.amazonaws.com/capacityType=$NODE_TYPE
  EOF
}
```
This step is crucial for the scheduler to recognize your instances correctly.
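One detail worth noting: the IMDS `instance-life-cycle` endpoint returns lowercase values (`spot`, `on-demand`), while the EKS label convention is uppercase with an underscore (`SPOT`, `ON_DEMAND`). A single `tr` invocation, shown here in isolation, handles the mapping:

```shell
# Map IMDS life-cycle values onto EKS capacityType label values:
# lowercase letters -> uppercase, hyphen -> underscore
echo "spot"      | tr 'a-z-' 'A-Z_'   # SPOT
echo "on-demand" | tr 'a-z-' 'A-Z_'   # ON_DEMAND
```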
Implementing a Real-World Architecture
Let’s consider a practical scenario: you have critical applications running in an On-Demand node group, while less critical cron jobs or monitoring tools operate in a Spot node group.
Proposed Architecture: If your application requires 8 replicas, aim for a distribution of 4-5 replicas in the On-Demand node group and 3-4 in the Spot node group. That way, if Spot instances are reclaimed, the On-Demand replicas can absorb the load until replacement Spot capacity arrives, keeping downtime close to zero.
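One way to approximate this split, as a sketch: a topology spread constraint over the capacity-type label keeps the two replica counts within one of each other, which for 8 replicas means a 4/4 or 5/3 split (the `app: backend` selector is illustrative):

```yaml
# Goes in the Deployment's pod template spec
topologySpreadConstraints:
- maxSkew: 1                                  # counts may differ by at most 1
  topologyKey: eks.amazonaws.com/capacityType # spread across ON_DEMAND vs SPOT
  whenUnsatisfiable: ScheduleAnyway           # prefer, don't block scheduling
  labelSelector:
    matchLabels:
      app: backend
```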
Step-by-Step Implementation
- Create Node Groups: Set up both On-Demand and Spot node groups in your AWS environment.
- Check Node Labels: After creating the node groups, you can verify the labels of a Spot node with this command:

```shell
kubectl describe no ip-100-45-51-226 | grep SPOT
```
You should see:
```
eks.amazonaws.com/capacityType=SPOT
```
For On-Demand nodes, it will display:
```
eks.amazonaws.com/capacityType=ON_DEMAND
```
- Deploy Your Application: Create a sample Nginx deployment configured so that the scheduler places roughly 40% of the pods on Spot instances and 60% on On-Demand instances. Adjust these percentages based on your specific needs.
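A sketch of such a deployment, using weighted preferred node affinity to bias placement roughly 60/40 toward On-Demand. Note that the weights influence per-pod scheduler scoring rather than enforcing an exact ratio:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          # Soft preferences: higher weight biases scoring toward On-Demand
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 60
            preference:
              matchExpressions:
              - key: eks.amazonaws.com/capacityType
                operator: In
                values:
                - ON_DEMAND
          - weight: 40
            preference:
              matchExpressions:
              - key: eks.amazonaws.com/capacityType
                operator: In
                values:
                - SPOT
      containers:
      - name: nginx
        image: nginx:1.27
        ports:
        - containerPort: 80
```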
- Monitor Pod Scheduling: After deploying your configuration, monitor how the pods are allocated. Ideally, you should see around 6 pods on On-Demand nodes and 4 pods on Spot nodes, as planned.
Note: The scheduler allocates pods based on a best-effort basis, so the exact ratio may fluctuate. However, you should achieve a distribution close to your target.
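To check the actual placement against your target, two commands are enough (these assume the `app: nginx` label and the capacity-type label used earlier):

```shell
# Show which node each pod landed on
kubectl get pods -l app=nginx -o wide

# Show each node's capacity type as an extra column,
# so you can cross-reference the node names above
kubectl get nodes -L eks.amazonaws.com/capacityType
```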
Conclusion
Using AWS Spot Instances effectively can lead to significant cost savings while ensuring your applications remain responsive and available. By structuring your node groups thoughtfully and monitoring pod allocations, you can optimize your cloud resources.