Pod scheduling

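This page describes how to configure the following, using the Operator:

  • Node selectors
  • Affinities and anti-affinities
  • Taints and tolerations
  • Resource labels and annotations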

These settings control how CockroachDB pods can be identified or scheduled onto worker nodes.

Note:

All kubectl steps should be performed in the namespace where you installed the Operator. By default, this is cockroach-operator-system.

Enable feature gates

The affinity and toleration rules are not yet fully supported. To enable them, download the Operator manifest and add the following line to the spec.containers.args field:

spec:
  containers:
  - args:
    - -feature-gates=TolerationRules=true,AffinityRules=true
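
For orientation, the fragment below sketches where that argument might sit in the downloaded manifest. It assumes the Operator runs as a standard Kubernetes Deployment, in which the container list is nested under spec.template.spec. The Deployment and container names are placeholders, so match them against your copy of the manifest rather than copying them from here.

```yaml
# Fragment only: unrelated Deployment fields are omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cockroach-operator            # placeholder; use the name in your manifest
  namespace: cockroach-operator-system
spec:
  template:
    spec:
      containers:
      - name: cockroach-operator      # placeholder container name
        args:
        # Keep any arguments that are already listed, then append the gates.
        - -feature-gates=TolerationRules=true,AffinityRules=true
```

After editing, re-apply the manifest so that the Operator pod restarts with both gates enabled.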

Node selectors

A pod with a node selector is scheduled only onto worker nodes whose labels (key-value pairs) match the selector.

Specify the labels in nodeSelector in the Operator's custom resource, which is used to deploy the cluster. If you specify multiple nodeSelector labels, the node must match all of them.

The following configuration causes CockroachDB pods to be scheduled onto worker nodes that have both the labels worker-pool-name=crdb-workers and kubernetes.io/arch=amd64:

spec:
  nodeSelector:
    worker-pool-name: crdb-workers
    kubernetes.io/arch: amd64

For an example of labeling nodes, see Scheduling CockroachDB onto labeled nodes.
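
To show where nodeSelector sits relative to the rest of the resource, here is a minimal sketch of a custom resource. The apiVersion, kind, and metadata.name values are assumptions based on the Operator's example manifest and the pod names used on this page; confirm them against the file you use to deploy the cluster.

```yaml
# Assumed header for the Operator's custom resource; verify against your manifest.
apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: cockroachdb
spec:
  # Other cluster settings (image, storage, node count) are omitted here.
  nodeSelector:
    # A node must carry both labels to be eligible.
    worker-pool-name: crdb-workers
    kubernetes.io/arch: amd64
```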

Affinities and anti-affinities

Note:

To use the affinity rules, first enable the feature gates.

A pod with a node affinity seeks out worker nodes that have matching labels. A pod with a pod affinity seeks out pods that have matching labels. A pod with a pod anti-affinity avoids pods that have matching labels.

Affinities and anti-affinities can be used together with operator fields to:

  • Require CockroachDB pods to be scheduled onto a labeled worker node.
  • Require CockroachDB pods to be co-located with labeled pods (e.g., on a node or region).
  • Prevent CockroachDB pods from being scheduled onto a labeled worker node.
  • Prevent CockroachDB pods from being co-located with labeled pods (e.g., on a node or region).

For an example, see Scheduling CockroachDB onto labeled nodes.

Add a node affinity

Specify node affinities in affinity.nodeAffinity in the Operator's custom resource, which is used to deploy the cluster. If you specify multiple matchExpressions labels, the node must match all of them. If you specify multiple values for a label, the node can match any of the values.

The following configuration requires that CockroachDB pods are scheduled onto worker nodes running either an intel or amd64 CPU, with a preference against worker nodes in the us-east4-b availability zone.

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - intel
            - amd64
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: NotIn
            values:
            - us-east4-b

The requiredDuringSchedulingIgnoredDuringExecution node affinity rule, using the In operator, requires CockroachDB pods to be scheduled onto nodes with either the matching label kubernetes.io/arch=intel or kubernetes.io/arch=amd64. It will not evict pods that are already running on nodes that do not match the affinity requirements.

The preferredDuringSchedulingIgnoredDuringExecution node affinity rule, using the NotIn operator and specified weight, discourages (but does not disallow) CockroachDB pods from being scheduled onto nodes with the label topology.kubernetes.io/zone=us-east4-b. This has an effect similar to that of a PreferNoSchedule taint.

For more context on how these rules work, see the Kubernetes documentation. The custom resource definition details the fields supported by the Operator.
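
The matching rules described above can be seen in a single term. In the sketch below, the two matchExpressions entries are combined with AND, while the values listed under one key are alternatives. The zone names are examples, and the disktype label is hypothetical: it is not set by Kubernetes or the Operator.

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # Expression 1: the zone label must be one of the listed values.
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - us-east4-a
            - us-east4-c
          # Expression 2: the node must also carry disktype=ssd (hypothetical label).
          - key: disktype
            operator: In
            values:
            - ssd
```

Under this rule, a node in us-east4-a labeled disktype=ssd qualifies, while a node in us-east4-a without the disktype label does not.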

Add a pod affinity or anti-affinity

Specify pod affinities and anti-affinities in affinity.podAffinity and affinity.podAntiAffinity in the Operator's custom resource, which is used to deploy the cluster. If you specify multiple matchExpressions labels, a pod must match all of them to be selected. If you specify multiple values for a label, the pod can match any of the values.

The following configuration attempts to schedule CockroachDB pods in the same zones as the pods that run our example load generator app. It also prevents two CockroachDB pods from being scheduled onto the same worker node.

spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - loadgen
          topologyKey: topology.kubernetes.io/zone
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/instance
            operator: In
            values:
            - cockroachdb
        topologyKey: kubernetes.io/hostname

The preferredDuringSchedulingIgnoredDuringExecution pod affinity rule, using the In operator and specified weight, encourages (but does not require) CockroachDB pods to be co-located with pods labeled app=loadgen already running in the same zone, as specified with topologyKey.

The requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity rule, using the In operator, requires CockroachDB pods not to be co-located on a worker node, as specified with topologyKey.

For more context on how these rules work, see the Kubernetes documentation. The custom resource definition details the fields supported by the Operator.
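
A required anti-affinity rule leaves a pod unscheduled (Pending) when no worker node satisfies it, for instance when there are fewer eligible nodes than CockroachDB pods. If that trade-off is a concern, a preferred rule is one possible alternative. The following is a sketch using the standard Kubernetes fields, not a recommendation from this page:

```yaml
spec:
  affinity:
    podAntiAffinity:
      # Soft rule: the scheduler tries to honor it but may fall back.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/instance
              operator: In
              values:
              - cockroachdb
          topologyKey: kubernetes.io/hostname
```

With this rule the scheduler spreads CockroachDB pods across worker nodes when it can, but it may still place two on one node, which weakens fault tolerance.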

Example: Scheduling CockroachDB onto labeled nodes

In this example, CockroachDB has not yet been deployed to a running Kubernetes cluster. We use a combination of node affinity and pod anti-affinity rules to schedule 3 CockroachDB pods onto 3 labeled worker nodes.

  1. List the worker nodes on the running Kubernetes cluster:

    kubectl get nodes
    
    NAME                                         STATUS   ROLES    AGE     VERSION
    gke-cockroachdb-default-pool-263138a5-kp3v   Ready    <none>   3m56s   v1.20.10-gke.301
    gke-cockroachdb-default-pool-263138a5-nn62   Ready    <none>   3m56s   v1.20.10-gke.301
    gke-cockroachdb-default-pool-41796213-75c9   Ready    <none>   3m56s   v1.20.10-gke.301
    gke-cockroachdb-default-pool-41796213-bw3z   Ready    <none>   3m54s   v1.20.10-gke.301
    gke-cockroachdb-default-pool-ccd74623-dghs   Ready    <none>   3m54s   v1.20.10-gke.301
    gke-cockroachdb-default-pool-ccd74623-p5mf   Ready    <none>   3m55s   v1.20.10-gke.301
    
  2. Add a node=crdb label to 3 of the running worker nodes.

    kubectl label nodes gke-cockroachdb-default-pool-263138a5-kp3v gke-cockroachdb-default-pool-41796213-75c9 gke-cockroachdb-default-pool-ccd74623-dghs node=crdb
    
    node/gke-cockroachdb-default-pool-263138a5-kp3v labeled
    node/gke-cockroachdb-default-pool-41796213-75c9 labeled
    node/gke-cockroachdb-default-pool-ccd74623-dghs labeled
    

    In this example, 6 GKE nodes are deployed in 3 node pools, and each node pool resides in a separate availability zone. To maintain an even distribution of CockroachDB pods as specified in our topology recommendations, each of the 3 labeled worker nodes must belong to a different node pool.

    Tip:

    This also ensures that the CockroachDB pods, which will be bound to persistent volumes in the same 3 availability zones, can be scheduled onto worker nodes in their respective zones.

  3. Add the following rules to the Operator's custom resource, which is used to deploy the cluster:

    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node
                operator: In
                values:
                - crdb
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/instance
                operator: In
                values:
                - cockroachdb
            topologyKey: kubernetes.io/hostname
    

    The nodeAffinity rule requires CockroachDB pods to be scheduled onto worker nodes with the label node=crdb. The podAntiAffinity rule requires CockroachDB pods not to be co-located on a worker node, as specified with topologyKey.

  4. Apply the settings to the cluster:

    kubectl apply -f example.yaml
    
  5. The CockroachDB pods will be deployed to the 3 labeled nodes. To observe this:

    kubectl get pods -o wide
    
    NAME                                 READY   STATUS    RESTARTS   AGE    IP           NODE                                         NOMINATED NODE   READINESS GATES
    cockroach-operator-bfdbfc9c7-tbpsw   1/1     Running   0          171m   10.32.2.4    gke-cockroachdb-default-pool-263138a5-kp3v   <none>           <none>
    cockroachdb-0                        1/1     Running   0          100s   10.32.4.10   gke-cockroachdb-default-pool-ccd74623-dghs   <none>           <none>
    cockroachdb-1                        1/1     Running   0          100s   10.32.2.6    gke-cockroachdb-default-pool-263138a5-kp3v   <none>           <none>
    cockroachdb-2                        1/1     Running   0          100s   10.32.0.5    gke-cockroachdb-default-pool-41796213-75c9   <none>           <none>
    

Taints and tolerations

Note:

To use the toleration rules, first enable the feature gates.

When a taint is added to a Kubernetes worker node, pods are prevented from being scheduled onto that node. A pod is exempt from this effect if it has a toleration that matches the taint.

Taints and tolerations are useful if you want to:

  • Prevent CockroachDB pods from being scheduled onto a tainted worker node.
  • Evict CockroachDB pods from a tainted worker node on which they are currently running.

For an example, see Evicting CockroachDB from a running worker node.

Add a toleration

Specify pod tolerations in the tolerations object of the Operator's custom resource, which is used to deploy the cluster.

The following toleration matches a taint with the specified key, value, and NoSchedule effect, using the Equal operator. A toleration that uses the Equal operator must include a value field:

spec:
  tolerations:
    - key: "test"
      operator: "Equal"
      value: "example"
      effect: "NoSchedule"

A NoSchedule taint on a node prevents pods from being scheduled onto the node. The matching toleration allows a pod to be scheduled onto the node. A NoSchedule toleration is therefore best included before deploying the cluster.
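
For reference, this is roughly how the corresponding taint appears on the worker node itself, for example in the output of kubectl get node {node-name} -o yaml. The node name is a placeholder, and unrelated fields are omitted:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: example-worker-node   # placeholder
spec:
  taints:
  # Matched by the toleration above: same key, value, and effect.
  - key: test
    value: example
    effect: NoSchedule
```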

Note:

A PreferNoSchedule taint discourages, but does not disallow, pods from being scheduled onto the node.

The following toleration matches every taint with the specified key and NoExecute effect, using the Exists operator. A toleration that uses the Exists operator must omit the value field:

spec:
  tolerations:
    - key: "test"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 3600

A NoExecute taint on a node prevents pods from being scheduled onto the node, and evicts pods from the node if they are already running on the node. The matching toleration allows a pod to be scheduled onto the node, and to continue running on the node if tolerationSeconds is not specified. If tolerationSeconds is specified, the pod is evicted after this number of seconds.
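
To make the difference concrete, the sketch below places the two variants side by side. The key names are hypothetical and chosen only to tell the entries apart:

```yaml
spec:
  tolerations:
    # No tolerationSeconds: the pod keeps running on a node tainted with
    # maintenance:NoExecute for as long as the taint remains.
    - key: "maintenance"
      operator: "Exists"
      effect: "NoExecute"
    # With tolerationSeconds: the pod is evicted 300 seconds after a
    # decommission:NoExecute taint is added to its node.
    - key: "decommission"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 300
```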

For more information on using taints and tolerations, see the Kubernetes documentation. The custom resource definition details the fields supported by the Operator.

Example: Evicting CockroachDB from a running worker node

In this example, CockroachDB has already been deployed on a Kubernetes cluster. We use the NoExecute effect to evict one of the CockroachDB pods from its worker node.

  1. List the worker nodes on the running Kubernetes cluster:

    kubectl get nodes
    
    NAME                                         STATUS   ROLES    AGE   VERSION
    gke-cockroachdb-default-pool-4e5ce539-68p5   Ready    <none>   56m   v1.20.9-gke.1001
    gke-cockroachdb-default-pool-4e5ce539-j1h1   Ready    <none>   56m   v1.20.9-gke.1001
    gke-cockroachdb-default-pool-95fde00d-173d   Ready    <none>   56m   v1.20.9-gke.1001
    gke-cockroachdb-default-pool-95fde00d-hw04   Ready    <none>   56m   v1.20.9-gke.1001
    gke-cockroachdb-default-pool-eb2b2889-q15v   Ready    <none>   56m   v1.20.9-gke.1001
    gke-cockroachdb-default-pool-eb2b2889-q704   Ready    <none>   56m   v1.20.9-gke.1001
    
  2. Add a taint to a running worker node:

    kubectl taint nodes gke-cockroachdb-default-pool-4e5ce539-j1h1 test=example:NoExecute
    
    node/gke-cockroachdb-default-pool-4e5ce539-j1h1 tainted
    
  3. Add a matching tolerations object to the Operator's custom resource, which was used to deploy the cluster:

    spec:
      tolerations:
        - key: "test"
          operator: "Exists"
          effect: "NoExecute"
    

    Because no tolerationSeconds is specified, CockroachDB will be evicted immediately from the tainted worker node.

  4. Apply the new settings to the cluster:

    kubectl apply -f example.yaml
    
  5. The CockroachDB pod running on the tainted node (in this case, cockroachdb-2) will be evicted and started on a different worker node. To observe this:

    kubectl get pods -o wide
    
    NAME                                 READY   STATUS    RESTARTS   AGE     IP           NODE                                         NOMINATED NODE   READINESS GATES
    cockroach-operator-c9fc6cb5c-bl6rs   1/1     Running   0          44m     10.32.2.4    gke-cockroachdb-default-pool-4e5ce539-68p5   <none>           <none>
    cockroachdb-0                        1/1     Running   0          9m21s   10.32.4.10   gke-cockroachdb-default-pool-95fde00d-173d   <none>           <none>
    cockroachdb-1                        1/1     Running   0          9m21s   10.32.2.6    gke-cockroachdb-default-pool-eb2b2889-q15v   <none>           <none>
    cockroachdb-2                        0/1     Running   0          6s      10.32.0.5    gke-cockroachdb-default-pool-4e5ce539-68p5   <none>           <none>
    

    cockroachdb-2 is now scheduled onto the gke-cockroachdb-default-pool-4e5ce539-68p5 node.

Resource labels and annotations

To assist in working with your cluster, you can add labels and annotations to your resources.

Specify labels in additionalLabels and annotations in additionalAnnotations in the Operator's custom resource, which is used to deploy the cluster:

spec:
  additionalLabels:
    app.kubernetes.io/version: v21.2.4
  additionalAnnotations:
    operator: https://github.com/cockroachdb/cockroach-operator/blob/master/install/operator.yaml

To verify that the labels and annotations were applied (to a pod, for example), run kubectl describe pod {pod-name}.

For more information about labels and annotations, see the Kubernetes documentation.
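
As a rough guide to what to look for, the same information viewed with kubectl get pod {pod-name} -o yaml would appear under metadata. This sketch assumes the configuration above and a pod named cockroachdb-0, and it omits the labels and annotations that Kubernetes and the Operator set on their own:

```yaml
metadata:
  name: cockroachdb-0
  labels:
    # From spec.additionalLabels
    app.kubernetes.io/version: v21.2.4
  annotations:
    # From spec.additionalAnnotations
    operator: https://github.com/cockroachdb/cockroach-operator/blob/master/install/operator.yaml
```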
