Kubernetes Resource Requests and Limits: A Practical Guide

Resource limits

A resource limit is the maximum amount of a resource (such as CPU or memory) that a container may use. No limits are set by default. Without limits, containers run unbounded and can consume all of a node's resources, which can eventually make the node unresponsive. The node itself needs memory and CPU for:

  • kubelet
  • container runtime
  • OS processes
  • networking components

Resource requests

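A resource request is the amount of a resource that Kubernetes reserves for a container and that the scheduler uses for placement. If a pod requests more than any node has available, it is not scheduled and stays Pending. If a container requests less than it actually needs, it will be scheduled but may perform poorly, and it is more likely to be evicted when the node comes under pressure. If a container requests more than it actually needs, the surplus is reserved but unused, which wastes capacity and leaves less room for other pods. Separately, a container that exceeds its memory limit is OOM-killed, and one that hits its CPU limit is throttled.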

Quality of service

Kubernetes uses the resource requests and limits of a pod's containers to assign the pod a Quality of Service (QoS) class. There are three QoS classes:

  • BestEffort: no container sets any CPU or memory request or limit. This is risky: under heavy load these pods can easily make nodes unresponsive.
  • Burstable: at least one container sets a request or limit, but the pod does not meet the Guaranteed criteria (for example, requests only, or requests lower than limits). You are guaranteed your request, but usage above it is not protected: CPU may be throttled, and the pod may be terminated under memory pressure.
  • Guaranteed: every container sets both CPU and memory, with request = limit. You get exactly what you ask for, and the pod cannot grow beyond it and endanger the node or its neighbors.

Under node resource pressure, the kubelet evicts pods roughly in this order: BestEffort first, then Burstable, and finally Guaranteed. Pods whose usage exceeds their requests are considered before pods that stay within them.
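The classification rules can be sketched in a few lines of Python. This is a simplified illustration, not the kubelet's implementation: the function name `qos_class` and the dict layout are hypothetical, and quantities are compared as plain strings (so "500m" and "0.5" would not be treated as equal, although Kubernetes treats them as the same quantity).

```python
def qos_class(containers):
    """Return the QoS class for a pod.

    `containers` is a list of dicts shaped like a container's
    `resources` stanza: {"requests": {...}, "limits": {...}}.
    Only cpu and memory are considered.
    """
    resources = ("cpu", "memory")
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for name in resources:
            if name in requests or name in limits:
                any_set = True
            limit = limits.get(name)
            # When only a limit is set, the request defaults to the limit.
            request = requests.get(name, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{}]))
print(qos_class([{"requests": {"cpu": "250m", "memory": "512Mi"},
                  "limits": {"memory": "512Mi"}}]))
```

Note the second call: memory request equals memory limit, but with no CPU limit the pod is still Burstable.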

Protect your nodes

A best practice is to configure kubeReserved and systemReserved in the kubelet configuration to reserve resources for the kubelet and system processes. This helps prevent the node from becoming unresponsive when containers consume too many resources.

Node Capacity
├─ System Reserved (systemReserved)
│   └─ OS daemons, sshd, systemd, etc.
├─ Kube Reserved (kubeReserved)
│   └─ kubelet, container runtime
├─ Eviction Threshold
│   └─ Buffer for eviction decisions
└─ Allocatable
    └─ Available for pods

Therefore allocatable memory for pods is:

Allocatable = Node Capacity - systemReserved - kubeReserved - evictionThreshold
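As a worked example of the formula, here is a minimal Python sketch. The 16 GiB node is a hypothetical value, the reservations reuse the 1Gi / 1Gi / 500Mi figures from the configuration examples in this post, and the function name `allocatable_mib` is made up for illustration.

```python
def allocatable_mib(capacity_mib, system_reserved_mib,
                    kube_reserved_mib, eviction_threshold_mib):
    """Memory left for pods, in MiB, after all node-level reservations."""
    allocatable = (capacity_mib - system_reserved_mib
                   - kube_reserved_mib - eviction_threshold_mib)
    if allocatable < 0:
        raise ValueError("reservations exceed node capacity")
    return allocatable


# Hypothetical 16 GiB node with 1 GiB systemReserved, 1 GiB kubeReserved
# and a 500 MiB hard eviction threshold.
print(allocatable_mib(16 * 1024, 1024, 1024, 500))  # 13836
```

On such a node, roughly 13.5 GiB of the 16 GiB is actually schedulable for pods.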

As a starting point for systemReserved, you can use this rule of thumb:

CPU:     max(200m, node_cpu * 0.05)
Memory:  max(512Mi, node_memory * 0.05)
Storage: max(5Gi, node_storage * 0.02)
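The rule of thumb above translates directly into code. The sketch below uses integer units (millicores, MiB, GiB) to avoid floating-point rounding, and rounds the percentage down; the function name `system_reserved` and the example node sizes are hypothetical.

```python
def system_reserved(node_cpu_millicores, node_memory_mib, node_storage_gib):
    """Apply the rule of thumb: 5% CPU, 5% memory, 2% storage,
    each with a fixed minimum."""
    return {
        "cpu_millicores": max(200, node_cpu_millicores * 5 // 100),
        "memory_mib": max(512, node_memory_mib * 5 // 100),
        "storage_gib": max(5, node_storage_gib * 2 // 100),
    }


# Hypothetical node: 8 CPUs, 32 GiB memory, 100 GiB disk.
print(system_reserved(8000, 32 * 1024, 100))
# Hypothetical small node: 2 CPUs, 4 GiB memory, 50 GiB disk.
print(system_reserved(2000, 4 * 1024, 50))
```

On the small node every value falls back to its minimum (200m, 512Mi, 5Gi), which is why the minimums matter most for small instance types.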

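For kubeReserved, a rough starting point is to scale with cluster size (the kubelet's actual overhead also depends on how many pods run on each node, so verify these values against measurements):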

# Small cluster (< 100 nodes)
cpu: "500m"
memory: "1Gi"

# Medium cluster (100-500 nodes)
cpu: "1000m"
memory: "2Gi"

# Large cluster (500+ nodes)
cpu: "2000m"
memory: "4Gi"

These settings live in the kubelet configuration, for example:

# /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Reserve for OS processes
systemReserved:
  cpu: "500m"
  memory: "1Gi"
# Reserve for Kubernetes components
kubeReserved:
  cpu: "500m"
  memory: "1Gi"

Another best practice is to set eviction thresholds, also in the kubelet configuration:

# /var/lib/kubelet/config.yaml
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "10%"
  imagefs.available: "15%"

This means:

  • If available memory drops below 500 MiB, the kubelet starts evicting pods.
  • If free space drops below 10% on the node filesystem or below 15% on the image filesystem, the kubelet also starts evicting pods.

You can also add softer thresholds:

# /var/lib/kubelet/config.yaml
evictionSoft:
  memory.available: "1Gi"
evictionSoftGracePeriod:
  memory.available: "1m"

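With this soft threshold, the kubelet starts evicting only after available memory has stayed below 1 GiB for the full one-minute grace period, and evicted pods can be terminated gracefully. At the hard threshold, by contrast, the kubelet kills pods immediately without graceful termination. Regardless of these settings, it is still a best practice to set resource requests and limits on your containers.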

How to set requests and limits

CPU is a compressible resource: when it runs short, containers are throttled rather than killed (unlike memory). The kube-scheduler uses CPU requests to decide which node a pod fits on, comparing them against each node's remaining allocatable capacity; limits are not considered during scheduling. You should always set CPU requests, but whether to set CPU limits is a heavily debated topic.

When a container hits its CPU limit, the kernel's CFS bandwidth control throttles it rather than killing it. The problem is that throttling is often unnecessary and surprisingly aggressive: the CFS quota is enforced per period (100 ms by default), so a bursty workload can be throttled even when the node's CPUs are otherwise idle. You end up with artificial latency spikes for no benefit. For that reason, a common recommendation is to avoid CPU limits where you can, to get the most performance out of your Kubernetes cluster.
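To see why per-period enforcement hurts bursty workloads, here is a deliberately simplified Python model of CFS quota accounting. It assumes the burst starts at a period boundary, the threads run perfectly in parallel, and nothing else competes for CPU; the function name `wall_time_ms` is hypothetical, and real kernel behavior is more nuanced.

```python
import math


def wall_time_ms(cpu_ms, limit_millicores, threads=1, period_ms=100):
    """Estimate wall-clock time to finish `cpu_ms` of CPU work under a
    CPU limit, assuming the quota is refilled once per period."""
    if cpu_ms <= 0:
        return 0.0
    # A limit of 500m with a 100 ms period means 50 ms of CPU per period.
    quota_ms = limit_millicores * period_ms / 1000
    # CPU time the container can actually burn in one period.
    cpu_per_period = min(quota_ms, threads * period_ms)
    # Periods in which the quota is fully used before the final one.
    full_periods = math.ceil(cpu_ms / cpu_per_period) - 1
    remaining = cpu_ms - full_periods * cpu_per_period
    return full_periods * period_ms + remaining / threads


# A request needing 200 ms of CPU spread over 4 threads, limit of 1 CPU.
print(wall_time_ms(200, 1000, threads=4))  # 125.0
```

Without a limit and with four idle cores, the same work would finish in about 50 ms (200 ms of CPU across 4 threads). With the limit, the container burns its 100 ms quota in the first 25 ms, sits throttled for the remaining 75 ms of the period, and only then finishes.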

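Memory is the opposite story. Memory is non-compressible: a container that exceeds its memory limit is OOM-killed, and a node that runs out of memory starts killing processes or evicting pods. So you almost always do want memory limits, with memory requests equal to limits on critical pods, to avoid an OOM cascade taking out a node. Note that the Guaranteed QoS class additionally requires CPU requests equal to CPU limits; a pod without CPU limits is Burstable even when its memory request and limit match.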

Practical defaults for most clusters:

  • CPU request: yes, sized to typical usage.
  • CPU limit: omit, or set very generously if you really need it.
  • Memory request and limit: yes, often equal for important workloads, especially in production.
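These defaults can be captured in a small helper that builds the `resources` stanza for a container spec. This is a minimal sketch: the function name `container_resources` and the example values "250m" and "512Mi" are hypothetical and should come from your own measurements.

```python
import json


def container_resources(cpu_request, memory):
    """Build a resources stanza following the defaults above:
    CPU request set, no CPU limit, memory request equal to memory limit."""
    return {
        "requests": {"cpu": cpu_request, "memory": memory},
        "limits": {"memory": memory},  # CPU limit intentionally omitted
    }


# Hypothetical values for illustration only.
print(json.dumps(container_resources("250m", "512Mi"), indent=2))
```

The printed structure maps one-to-one onto the `resources` field of a container in a pod manifest. A pod built this way lands in the Burstable QoS class, because it has no CPU limit.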

Considerations

  • Leave 20–30% headroom on the cluster for both CPU and memory.
  • Always measure usage and fine-tune the reservations.
  • Monitoring cluster health and resource usage is key to understanding whether reservations and requests/limits are set up correctly.
  • Consider using tools like the Vertical Pod Autoscaler in recommendation mode or Robusta KRR (Kubernetes Resource Recommender) to figure out proper values for requests and limits.