What are Kubernetes Taints and Tolerations? How do you use them to control Pod scheduling?
Kubernetes taints and tolerations work together to control where Pods are scheduled. A taint on a node repels Pods; a toleration on a Pod lets it be scheduled onto nodes whose taints it matches.
Taints
A taint is a key, an optional value, and an effect applied to a node. It prevents Pods from being scheduled to that node unless the Pod has a matching toleration.
Components of a Taint
Each taint consists of three parts:
- Key: The taint key (required)
- Value: The taint value (optional)
- Effect: The taint effect (required)
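To make the three parts concrete, here is a sketch of how a taint is stored on the Node object under `spec.taints`. The node name `node1` and the `dedicated=gpu` key and value are illustrative choices, not required names.

```yaml
# Excerpt of a Node object; node1 and dedicated=gpu are illustrative
apiVersion: v1
kind: Node
metadata:
  name: node1
spec:
  taints:
    - key: "dedicated"      # required
      value: "gpu"          # optional
      effect: "NoSchedule"  # required
```

In practice this field is usually set with `kubectl taint` rather than by editing the Node object directly.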
Taint Effect Types
- NoSchedule:
  - New Pods will not be scheduled to this node
  - Existing Pods are not affected
  - Suitable for dedicated nodes (such as GPU nodes)
- PreferNoSchedule:
  - The scheduler tries not to place new Pods on this node
  - But if no other nodes are available, scheduling may still occur
  - Suitable for soft restrictions
- NoExecute:
  - New Pods will not be scheduled to this node
  - Existing Pods without matching tolerations will be evicted
  - Suitable for node maintenance or failure scenarios
Adding Taints
```bash
# Add a NoSchedule taint
kubectl taint nodes node1 key=value:NoSchedule

# Add a NoExecute taint
kubectl taint nodes node1 key=value:NoExecute

# Add a taint without a value
kubectl taint nodes node1 key:NoSchedule
```
Viewing Taints
```bash
# View taints on a node
kubectl describe node node1 | grep Taint

# View taints on all nodes
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
```
Removing Taints
```bash
# Remove a specific taint
kubectl taint nodes node1 key:NoSchedule-

# Remove all taints with a specific key
kubectl taint nodes node1 key-
```
Tolerations
Tolerations are set in the Pod spec and allow the Pod to be scheduled to nodes with matching taints. A toleration permits placement on a tainted node; it does not force the Pod onto that node.
Components of a Toleration
Tolerations include the following fields:
- Key: The taint key to tolerate
- Operator: The operator (Equal or Exists; defaults to Equal)
- Value: The taint value to tolerate (used when Operator is Equal; leave it empty when Operator is Exists)
- Effect: The taint effect to tolerate
- TolerationSeconds: How long the Pod may stay on the node after a matching NoExecute taint is added (only applicable to NoExecute)
Toleration Operators
- Equal:
  - Both key and value must match
  - Value must be specified
- Exists:
  - Only the key needs to match
  - Value does not need to be specified
Adding Tolerations
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  tolerations:
    - key: "key"
      operator: "Equal"
      value: "value"
      effect: "NoSchedule"
  containers:
    - name: my-container
      image: nginx
```
Toleration Examples
- Tolerate a specific taint:
```yaml
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
```
- Tolerate all taints with a specific key:
```yaml
tolerations:
  - key: "dedicated"
    operator: "Exists"
```
- Tolerate all taints:
```yaml
tolerations:
  - operator: "Exists"
```
- Tolerate a NoExecute taint and set a toleration time:
```yaml
tolerations:
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300
```
Taint and Toleration Matching Rules
- Key Matching:
  - If the toleration key is empty and the operator is Exists, it matches all taints
  - A taint key is required, so a toleration with a non-empty key matches only taints that carry the same key
- Operator Matching:
  - Equal (the default): Both key and value must match
  - Exists: Only the key needs to match
- Effect Matching:
  - If the toleration effect is empty, it matches all effects
  - Otherwise, the effect must match
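As a worked illustration of these rules, assume a node carries the taint `dedicated=gpu:NoSchedule`. The fragment below lists several tolerations and notes in comments whether each one, taken on its own, matches that taint. The key and values are hypothetical.

```yaml
# Assumed node taint: dedicated=gpu:NoSchedule
tolerations:
  # Matches: key, value, and effect are all equal
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  # Matches: Exists ignores the value, and the empty effect matches every effect
  - key: "dedicated"
    operator: "Exists"
  # Does not match: the value differs
  - key: "dedicated"
    operator: "Equal"
    value: "cpu"
    effect: "NoSchedule"
  # Does not match: the effect differs
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoExecute"
  # Matches: an empty key with Exists tolerates every taint
  - operator: "Exists"
```

A Pod needs only one matching toleration per taint; the entries are listed together here purely for comparison.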
Common Use Cases
1. Dedicated Nodes
Add taints to nodes for specific purposes to ensure only specific Pods can be scheduled to these nodes.
```bash
# Add a taint to the GPU node
kubectl taint nodes gpu-node dedicated=gpu:NoSchedule
```
```yaml
# Only GPU Pods (those with this toleration) can be scheduled to the GPU node
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
  containers:
    - name: gpu-container
      image: nvidia/cuda:11.0.3-base-ubuntu20.04
```
2. Node Maintenance
Use a NoExecute taint to evict Pods from a node ahead of maintenance.
```bash
# Mark the node as under maintenance
kubectl taint nodes node1 maintenance:NoExecute
```
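A Pod that should outlast the maintenance taint above, or get extra time before eviction, would need a toleration along these lines. This is a sketch that assumes the `maintenance` key from the command above; the 600-second window is an arbitrary illustrative value.

```yaml
tolerations:
  - key: "maintenance"
    operator: "Exists"
    effect: "NoExecute"
    # The Pod stays for up to 600 seconds after the taint is added, then is evicted.
    # Omit tolerationSeconds to let the Pod stay for as long as the taint exists.
    tolerationSeconds: 600
```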
3. Special Hardware Nodes
Add taints to nodes with special hardware to ensure only Pods that need this hardware can be scheduled.
```bash
# Add a taint to the SSD node
kubectl taint nodes ssd-node disktype=ssd:NoSchedule
```
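For completeness, a Pod that is allowed onto the SSD node could carry a toleration like the one sketched here. The Pod name `ssd-pod`, the container name `app`, and the `nginx` image are placeholders chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ssd-pod              # hypothetical name
spec:
  tolerations:
    - key: "disktype"
      operator: "Equal"
      value: "ssd"
      effect: "NoSchedule"
  containers:
    - name: app              # hypothetical container
      image: nginx
```

The toleration only lets the Pod onto `ssd-node`; it does not stop the scheduler from placing it on an untainted node instead.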
4. Failed Nodes
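Kubernetes automatically adds NoExecute taints such as node.kubernetes.io/not-ready and node.kubernetes.io/unreachable to failed nodes, which evicts Pods that do not tolerate them or whose tolerationSeconds has elapsed.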
```yaml
# Pod tolerates node failure
tolerations:
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300
```
Taints and Tolerations vs Affinity
| Feature | Taints and Tolerations | Affinity |
|---|---|---|
| Target | Nodes and Pods | Nodes and Pods |
| Direction | Node rejects Pod | Pod selects node |
| Flexibility | Lower | Higher |
| Use Cases | Dedicated nodes, node maintenance | Performance optimization, high availability |
Best Practices
- Use Taints Reasonably: Avoid overusing taints, which may lead to scheduling failures
- Add Taints to Dedicated Nodes: Ensure only specific Pods can be scheduled to dedicated nodes
- Set Reasonable Toleration Times: Set reasonable toleration times for NoExecute taints to avoid frequent evictions
- Combine with Affinity: Combine taints/tolerations with affinity for more fine-grained scheduling control
- Monitor Node Status: Monitor node taint status and handle failed nodes in a timely manner
- Document Taint Policies: Record taint and toleration usage policies for team collaboration
- Test Toleration Configuration: Test toleration configuration in non-production environments to ensure correctness
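To illustrate combining tolerations with affinity, the following sketch pairs a toleration (so the Pod may run on the tainted node) with a required node affinity (so it runs nowhere else). It assumes the GPU node also carries a label `dedicated=gpu`, which a taint does not add by itself; the label could be applied with `kubectl label nodes gpu-node dedicated=gpu`. The Pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-only-pod         # hypothetical name
spec:
  # Allows the Pod onto nodes tainted dedicated=gpu:NoSchedule
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
  # Restricts the Pod to nodes labeled dedicated=gpu (assumed label)
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "dedicated"
                operator: In
                values:
                  - "gpu"
  containers:
    - name: gpu-app
      image: nvidia/cuda:11.0.3-base-ubuntu20.04
```

The taint keeps other Pods off the node, while the affinity keeps this Pod on it; neither mechanism alone gives both guarantees.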
Troubleshooting
- View Node Taints:
```bash
kubectl describe node <node-name>
```
- View Pod Tolerations:
```bash
kubectl describe pod <pod-name>
```
- Check Scheduling Failure Reasons:
```bash
kubectl describe pod <pod-name> | grep -A 10 Events
```
- View Scheduler Logs:
```bash
kubectl logs -n kube-system <scheduler-pod-name>
```
Example: Multi-Node Type Cluster
```yaml
# Master node
# (newer Kubernetes releases use node-role.kubernetes.io/control-plane for this taint)
apiVersion: v1
kind: Node
metadata:
  name: master-node
spec:
  taints:
    - key: "node-role.kubernetes.io/master"
      effect: "NoSchedule"
---
# GPU node
apiVersion: v1
kind: Node
metadata:
  name: gpu-node
spec:
  taints:
    - key: "dedicated"
      value: "gpu"
      effect: "NoSchedule"
---
# Normal Pod (no tolerations, so it can be scheduled only to untainted nodes)
apiVersion: v1
kind: Pod
metadata:
  name: normal-pod
spec:
  containers:
    - name: nginx
      image: nginx
---
# GPU Pod (tolerates the GPU taint, so it may run on gpu-node;
# the toleration alone does not keep it off normal nodes)
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
  containers:
    - name: gpu-app
      image: nvidia/cuda:11.0.3-base-ubuntu20.04
```