We use AWS EKS with the Karpenter autoscaler. Karpenter is cool because it scales the cluster based on pod resource requests: it looks at what pending pods actually ask for and launches a node that fits. For example, if I boot a pod that requests 7GB of RAM, it will deploy an m5.large (8GB), not a t3.micro (1GB).
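To make that concrete, here is a minimal sketch of such a pod; the pod name and image are just placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: memory-hungry        # hypothetical name, for illustration
spec:
  containers:
    - name: app
      image: nginx           # placeholder image
      resources:
        requests:
          memory: 7Gi        # Karpenter sizes the node to fit this request
          cpu: "1"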
Second, Karpenter can boot spot instances and fall back to on-demand when spot capacity runs out. When a spot instance gets a termination notice, Karpenter tries to boot a replacement in another AZ. If there is no spot capacity left anywhere, it can deploy on-demand until spot capacity returns.
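One caveat worth knowing: the on-demand fallback only works if the provisioner's requirements allow both capacity types. A sketch of the relevant requirement:

requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]   # Karpenter prefers spot, falls back to on-demand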
The unit520 default provisioner looks like this:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["m5", "c5", "t3"]
    - key: karpenter.k8s.aws/instance-size
      operator: In
      values: ["micro", "small", "medium", "large"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
  limits:
    resources:
      cpu: 1000
  providerRef:
    name: default
  consolidation:
    enabled: true
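The providerRef above points at an AWSNodeTemplate named default, which tells Karpenter which subnets and security groups to launch nodes into. A minimal sketch, assuming your VPC resources carry the usual karpenter.sh/discovery tag (the cluster name here is hypothetical):

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    karpenter.sh/discovery: my-cluster       # assumption: discovery tag on subnets
  securityGroupSelector:
    karpenter.sh/discovery: my-cluster       # assumption: discovery tag on security groups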
If I want a separate fleet of nodes that only runs specific workloads (for example, I don't want my Selenium instances on the same node as my Postgres pod), I can add labels and taints to the provisioner. Here is the provisioner I use to run load test workers:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: locust-workers
spec:
  requirements:
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["t3"]
    - key: karpenter.k8s.aws/instance-size
      operator: In
      values: ["micro", "small", "medium", "large"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
  taints:
    - key: locust
      value: worker
      effect: NoSchedule
  labels:
    locust: worker
  limits:
    resources:
      cpu: 100
  providerRef:
    name: default
  consolidation:
    enabled: true
This provisioner only launches t3 instances, and based on my Locust workers' resource requests it can deploy anything up to a t3.large, all on spot. Since the nodes are tainted, only workloads I explicitly configure to tolerate the taint will schedule there.
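For example, a Locust worker pod needs a toleration for the taint plus a nodeSelector for the label. A minimal sketch (the pod name is illustrative, and real Locust flags are omitted):

apiVersion: v1
kind: Pod
metadata:
  name: locust-worker        # hypothetical name
spec:
  tolerations:
    - key: locust
      operator: Equal
      value: worker
      effect: NoSchedule     # tolerate the provisioner's taint
  nodeSelector:
    locust: worker           # land on nodes carrying the provisioner's label
  containers:
    - name: worker
      image: locustio/locust # worker config flags omitted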