May 27, 18:31

What are Kubernetes Taints and Tolerations? How do you use them to control Pod scheduling?

Kubernetes Taints and Tolerations work together to control Pod scheduling. A taint on a node repels Pods that do not tolerate it; a toleration on a Pod allows, but does not require, that Pod to be scheduled onto nodes with matching taints.

Taints

Taints are applied to nodes. Depending on its effect, a taint keeps Pods without a matching toleration off the node, or evicts them from it.

Components of a Taint

Each taint consists of three parts:

Key: The taint key (required)
Value: The taint value (optional)
Effect: The taint effect (required)

Taint Effect Types

NoSchedule:
- New Pods without a matching toleration are not scheduled to this node
- Pods already running on the node are not affected
- Suitable for dedicated nodes (such as GPU nodes)
PreferNoSchedule:
- The scheduler tries to avoid placing Pods without a matching toleration on this node
- If no other suitable node is available, such Pods may still be scheduled there
- Suitable for soft restrictions
NoExecute:
- New Pods without a matching toleration are not scheduled to this node
- Running Pods without a matching toleration are evicted immediately; Pods whose matching toleration sets tolerationSeconds are evicted after that period
- Suitable for node maintenance or failure scenarios

Adding Taints

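```bash
# Add a NoSchedule taint
kubectl taint nodes node1 key=value:NoSchedule

# Add a NoExecute taint
kubectl taint nodes node1 key=value:NoExecute

# Add a taint without a value
# (if node1 already has a NoSchedule taint with this key, kubectl rejects it unless --overwrite is given)
kubectl taint nodes node1 key:NoSchedule
```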

Viewing Taints

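```bash
# View taints on a node (grep shows only the first line when the node has several taints)
kubectl describe node node1 | grep Taint

# Print every taint on a node as JSON
kubectl get node node1 -o jsonpath='{.spec.taints}'

# View taints on all nodes
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
```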

Removing Taints

```bash
# Remove a specific taint (key and effect; note the trailing "-")
kubectl taint nodes node1 key:NoSchedule-

# Remove all taints with a specific key
kubectl taint nodes node1 key-
```

Tolerations

Tolerations are set on Pods. A toleration allows, but does not require, the Pod to be scheduled onto nodes with matching taints.

Components of a Toleration

Tolerations include the following fields:

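Key: The taint key to tolerate (may be empty only when Operator is Exists, in which case it matches all keys)
Operator: Equal (the default) or Exists
Value: The taint value to tolerate (compared when Operator is Equal; must be empty when Operator is Exists)
Effect: The taint effect to tolerate (empty matches all effects)
TolerationSeconds: How long the Pod may stay bound to the node after a matching taint is added (applies only to NoExecute; if omitted, the Pod stays indefinitely)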

Toleration Operators

Equal (default):
- Both key and value must match
- Value should be specified (an empty value matches only a taint that has no value)
Exists:
- Only the key needs to match
- Value must not be specified
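The two operators side by side, as a minimal sketch reusing the `dedicated=gpu:NoSchedule` taint from this article. When `operator` is omitted it defaults to `Equal`. The value `fpga` is a hypothetical second value used only for contrast.

```yaml
tolerations:
# operator omitted: defaults to "Equal", so key AND value must match.
# Tolerates dedicated=gpu:NoSchedule, but not dedicated=fpga:NoSchedule (fpga is hypothetical).
- key: "dedicated"
  value: "gpu"
  effect: "NoSchedule"
# "Exists": only the key is compared, and value must be omitted.
# Tolerates dedicated=<any value>:NoSchedule, e.g. dedicated=gpu or dedicated=fpga.
- key: "dedicated"
  operator: "Exists"
  effect: "NoSchedule"
```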

Adding Tolerations

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
  containers:
  - name: my-container
    image: nginx
```

Toleration Examples

Tolerate a specific taint:

```yaml
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
```

Tolerate all taints with a specific key (any value, any effect):

```yaml
tolerations:
- key: "dedicated"
  operator: "Exists"
```

Tolerate all taints:

```yaml
tolerations:
- operator: "Exists"
```
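A toleration this broad is usually reserved for node-level agents that must run on every node, including tainted ones. A minimal sketch, assuming a DaemonSet is used for such an agent; the name `node-agent` and the `nginx` image are placeholders. Because it also tolerates NoExecute taints such as `node.kubernetes.io/not-ready`, the Pods are never evicted by taints.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent          # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      tolerations:
      # Empty key + Exists + no effect: tolerates every taint on every node
      - operator: "Exists"
      containers:
      - name: agent
        image: nginx        # placeholder image
```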

Tolerate a NoExecute taint for a limited time (here, stay on a not-ready node for 300 seconds before eviction):

```yaml
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
```

Taint and Toleration Matching Rules

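Key Matching:
- A toleration with an empty key must use the Exists operator; it then matches every taint key and value
- Every taint has a non-empty key, because the key is required
Operator Matching:
- Equal (the default): both key and value must match
- Exists: only the key needs to match
Effect Matching:
- If the toleration effect is empty, it matches all effects
- Otherwise, the effect must match
Multiple Taints:
- A node can carry several taints; a taint is ignored only if at least one of the Pod's tolerations matches it, and every remaining taint still applies with its own effect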
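To make the rules concrete, here is a hypothetical node carrying two taints and a Pod that tolerates only one of them. The Node is shown only to illustrate its `spec.taints`; Node objects are normally registered by the kubelet and tainted with `kubectl taint`, not applied by hand. The names `example-node` and `partial-toleration-pod` are made up for this sketch.

```yaml
# Hypothetical node with two taints
apiVersion: v1
kind: Node
metadata:
  name: example-node
spec:
  taints:
  - key: "dedicated"
    value: "gpu"
    effect: "NoSchedule"
  - key: "maintenance"
    effect: "NoExecute"
---
apiVersion: v1
kind: Pod
metadata:
  name: partial-toleration-pod
spec:
  tolerations:
  # Matches the first taint (same key, value, and effect), so it is ignored
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  # Nothing matches maintenance:NoExecute, so that taint still applies:
  # the Pod is not scheduled onto example-node, and would be evicted if
  # it were already running there.
  containers:
  - name: app
    image: nginx
```

Adding a second toleration such as `key: "maintenance"`, `operator: "Exists"`, `effect: "NoExecute"` would make both taints ignorable for this Pod.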

Common Use Cases

1. Dedicated Nodes

Add taints to nodes for specific purposes to ensure only specific Pods can be scheduled to these nodes.

```bash
# Add a taint to the GPU node
kubectl taint nodes gpu-node dedicated=gpu:NoSchedule
```

```yaml
# This Pod tolerates the GPU taint, so it is allowed on gpu-node;
# Pods without this toleration are kept off gpu-node
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: gpu-container
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
```
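The toleration only permits the GPU Pod to use `gpu-node`; it does not stop the scheduler from placing the Pod on an ordinary untainted node. A common pattern is to also label the node and select on that label. A minimal sketch, assuming the node carries a hypothetical label `dedicated=gpu` (for example added with `kubectl label nodes gpu-node dedicated=gpu`; the taint alone does not create a label):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  # Attraction: only nodes labeled dedicated=gpu are candidates
  # (assumes this label was added to gpu-node beforehand)
  nodeSelector:
    dedicated: "gpu"
  # Permission: lets the Pod onto nodes tainted dedicated=gpu:NoSchedule
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: gpu-container
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
```

The taint keeps other Pods off the node and the nodeSelector keeps this Pod on it, so together they dedicate the node in both directions.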

2. Node Maintenance

Use a NoExecute taint to evict Pods from a node before maintenance.

```bash
# Mark the node as under maintenance; Pods without a matching toleration are evicted
kubectl taint nodes node1 maintenance:NoExecute
```
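A Pod that should survive the maintenance window for a bounded time can tolerate the `maintenance` taint with `tolerationSeconds`. A minimal sketch; the 3600-second value is an arbitrary illustration, not a recommendation.

```yaml
tolerations:
# Stay on node1 for up to one hour after the maintenance taint is added,
# then get evicted
- key: "maintenance"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 3600
```

Omitting `tolerationSeconds` would let the Pod stay on the node for as long as the taint is present.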

3. Special Hardware Nodes

Add taints to nodes with special hardware to ensure only Pods that need this hardware can be scheduled.

```bash
# Add a taint to the SSD node
kubectl taint nodes ssd-node disktype=ssd:NoSchedule
```
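As with the GPU case, the taint keeps other workloads off `ssd-node` but does not steer SSD workloads onto it. The sketch below pairs the toleration with required node affinity. It assumes `ssd-node` also carries a node label `disktype=ssd` (a hypothetical label mirroring the taint); the Deployment name `ssd-app` and the replica count are illustrative.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ssd-app             # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ssd-app
  template:
    metadata:
      labels:
        app: ssd-app
    spec:
      # Permission: tolerate the disktype=ssd:NoSchedule taint
      tolerations:
      - key: "disktype"
        operator: "Equal"
        value: "ssd"
        effect: "NoSchedule"
      # Attraction: schedule only onto nodes labeled disktype=ssd (assumed label)
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "disktype"
                operator: "In"
                values: ["ssd"]
      containers:
      - name: app
        image: nginx
```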

4. Failed Nodes

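When a node becomes unhealthy, Kubernetes automatically taints it (for example with node.kubernetes.io/not-ready or node.kubernetes.io/unreachable, effect NoExecute), and Pods without a matching toleration are evicted. Unless a Pod sets these tolerations itself, Kubernetes adds the two shown below with tolerationSeconds: 300, so by default Pods are evicted about five minutes after their node fails. Set them explicitly to change that window.

```yaml
# Keep the Pod bound for 300 seconds after its node becomes not-ready or
# unreachable, then evict it (same values as the defaults Kubernetes adds)
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
```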

Taints and Tolerations vs Affinity

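| Feature | Taints and Tolerations | Affinity |
|---|---|---|
| Where configured | Taints on nodes, tolerations on Pods | Affinity rules on Pods |
| Direction | Node repels Pods that lack a matching toleration | Pod selects nodes (or co-locates with / avoids other Pods) |
| Flexibility | Lower: key, value, and effect matching only | Higher: label expressions (In, NotIn, Exists, DoesNotExist, Gt, Lt), required or preferred rules |
| Use cases | Dedicated nodes, node maintenance, node failure handling | Performance optimization, high availability |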

Best Practices

Use Taints Sparingly: Every taint on a node must be tolerated by the Pods meant to run there, so excessive taints can leave Pods Pending with no schedulable node
Add Taints to Dedicated Nodes: Ensure only specific Pods can be scheduled to dedicated nodes
Set Reasonable Toleration Times: Choose tolerationSeconds for NoExecute taints deliberately; too short a value causes evictions during brief node hiccups, while too long a value keeps Pods on an unhealthy node
Combine with Affinity: Combine taints/tolerations with affinity for more fine-grained scheduling control
Monitor Node Status: Monitor node taint status and handle failed nodes in a timely manner
Document Taint Policies: Record taint and toleration usage policies for team collaboration
Test Toleration Configuration: Test toleration configuration in non-production environments to ensure correctness

Troubleshooting

View Node Taints:

```bash
kubectl describe node <node-name>
```

View Pod Tolerations:

```bash
kubectl describe pod <pod-name>
```

Check Scheduling Failure Reasons:

```bash
kubectl describe pod <pod-name> | grep -A 10 Events
```

View Scheduler Logs:

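```bash
# On kubeadm clusters the scheduler runs as a static Pod named kube-scheduler-<node-name>;
# managed Kubernetes services often do not expose it
kubectl logs -n kube-system <scheduler-pod-name>
```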

Example: Multi-Node Type Cluster

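```yaml
# Control-plane node (Node objects are normally registered by the kubelet;
# they are shown here only to illustrate spec.taints).
# Older kubeadm clusters used the key node-role.kubernetes.io/master instead.
apiVersion: v1
kind: Node
metadata:
  name: control-plane-node
spec:
  taints:
  - key: "node-role.kubernetes.io/control-plane"
    effect: "NoSchedule"
---
# GPU node
apiVersion: v1
kind: Node
metadata:
  name: gpu-node
spec:
  taints:
  - key: "dedicated"
    value: "gpu"
    effect: "NoSchedule"
---
# Normal Pod (no tolerations, so it can only be scheduled to untainted nodes)
apiVersion: v1
kind: Pod
metadata:
  name: normal-pod
spec:
  containers:
  - name: nginx
    image: nginx
---
# GPU Pod (tolerates the GPU taint, so it may run on gpu-node; without a
# nodeSelector or node affinity it can also land on untainted nodes)
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: gpu-app
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
```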

Tags: Kubernetes