We use AWS EKS and the Karpenter autoscaler. Karpenter is cool because it can autoscale the cluster based on pod resource requests. For example, if I boot a pod that needs 7GB of RAM, it will deploy an m5.large, not a t4.micro.

Secondly, Karpenter can boot spot instances, or on-demand if there is no capacity. When spot termination occurs, karpenter will try to boot an instance in another AZ. If there is no capacity left at all, it can deploy on-demand until there is capacity.

the unit520 default provisioner looks like this:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["m5", "c5", "t4"]
    - key: karpenter.k8s.aws/instance-size
      operator: In
      values: ["micro", "small", "medium", "large"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
  limits:
    resources:
      cpu: 1000
  providerRef:
    name: default
  consolidation:
    enabled: true

If I wanted to have a separate fleet of nodes that only ran specific workloads (for example, I don’t want my selenium instances running on the same node as my postgres pod) then I can add labels and taints. Here is the nodegroup I use to run load test workers:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: locust-workers
spec:
  requirements:
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["t3"]
    - key: karpenter.k8s.aws/instance-size
      operator: In
      values: ["micro", "small", "medium", "large"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
  taints:
    - key: locust
      value: worker
      effect: NoSchedule
  labels:
    locust: worker
  limits:
    resources:
      cpu: 100
  providerRef:
    name: default
  consolidation:
    enabled: true

It should only run t3 instances, and based on my locust worker requests, it can deploy up to a .large instance on spot. Since the node is tainted, only workloads I explicitly configure to run there will schedule.