Profile applicability: Level 2
Nodes in a degraded state are an unknown quantity and so may pose a security risk.
Kubernetes Engine's node auto-repair feature helps you keep the nodes in the cluster in a healthy, running state. When enabled, Kubernetes Engine makes periodic checks on the health state of each node in the cluster. If a node fails consecutive health checks over an extended time period, Kubernetes Engine initiates a repair process for that node.
Note
Node auto-repair is enabled by default.

Impact

If multiple nodes require repair, Kubernetes Engine might repair them in parallel. Kubernetes Engine limits the number of concurrent repairs based on the size of the cluster (larger clusters have a higher limit) and the number of broken nodes in the cluster (the limit decreases if many nodes are broken).
Node auto-repair is not available on Alpha Clusters.

Audit

Using Google Cloud Console:
  1. Go to the Kubernetes Engine website.
  2. From the list of clusters, select the desired cluster. For each Node pool, view the Node pool Details pane and ensure that under the Management heading, Auto-repair is set to Enabled.
Using Command Line:
To check whether node auto-repair is enabled for an existing cluster's node pool, run:
gcloud container node-pools describe <node_pool_name> --cluster <cluster_name> \
  --zone <compute_zone> --format json | jq '.management'
Ensure that the autoRepair key in the output of the above command is set to true:
{ 
    "autoRepair": true 
}
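Because the check above must be repeated for each node pool, it can be wrapped in a small loop. The following is a minimal sketch, not part of the benchmark itself. The function names is_true and audit_autorepair are hypothetical, and the script assumes that gcloud's value(management.autoRepair) projection prints True when auto-repair is enabled and prints nothing or False otherwise.

```shell
#!/bin/sh
# Sketch: report the auto-repair status of every node pool in one cluster.
# is_true and audit_autorepair are illustrative names, not gcloud features.

# Succeeds when the argument is "true" in any letter case.
is_true() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    true) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: audit_autorepair <cluster_name> <compute_zone>
# Prints PASS or FAIL per node pool; returns non-zero if any pool fails.
audit_autorepair() {
  ar_cluster="$1"
  ar_zone="$2"
  ar_rc=0
  for ar_pool in $(gcloud container node-pools list \
      --cluster "$ar_cluster" --zone "$ar_zone" --format="value(name)"); do
    # Assumption: this prints "True" when enabled, empty or "False" otherwise.
    ar_value=$(gcloud container node-pools describe "$ar_pool" \
      --cluster "$ar_cluster" --zone "$ar_zone" \
      --format="value(management.autoRepair)")
    if is_true "$ar_value"; then
      echo "PASS $ar_pool"
    else
      echo "FAIL $ar_pool"
      ar_rc=1
    fi
  done
  return "$ar_rc"
}
```

After sourcing the script, run audit_autorepair <cluster_name> <compute_zone>. Any node pool reported as FAIL should be re-checked with the describe command above before remediation.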

Remediation

Using Google Cloud Console:
  1. Go to the Kubernetes Engine website.
  2. Select the Kubernetes cluster containing the node pool for which auto-repair is disabled.
  3. Select the Node pool by clicking on the name of the pool.
  4. Navigate to the Node pool details pane and click EDIT.
  5. Under the Management heading, check the Enable auto-repair box.
  6. Click SAVE.
  7. Repeat steps 2-6 for every cluster and node pool with auto-repair disabled.
Using Command Line:
To enable node auto-repair for an existing cluster's Node pool:
gcloud container node-pools update <node_pool_name> --cluster <cluster_name> \
  --zone <compute_zone> --enable-autorepair
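Where a cluster has several node pools, the update command above can be applied to each of them in turn. The sketch below is one possible way to do that under stated assumptions. The function name enable_autorepair_all_pools is hypothetical, and the script enables auto-repair on every node pool in the cluster, including pools where it is already enabled.

```shell
#!/bin/sh
# Sketch: enable node auto-repair on every node pool in one cluster.
# enable_autorepair_all_pools is an illustrative name, not a gcloud feature.

# Usage: enable_autorepair_all_pools <cluster_name> <compute_zone>
# Returns non-zero if any update command fails.
enable_autorepair_all_pools() {
  ea_cluster="$1"
  ea_zone="$2"
  ea_rc=0
  for ea_pool in $(gcloud container node-pools list \
      --cluster "$ea_cluster" --zone "$ea_zone" --format="value(name)"); do
    echo "Enabling auto-repair on node pool: $ea_pool"
    # Same command as in the remediation step, applied to each pool.
    gcloud container node-pools update "$ea_pool" \
      --cluster "$ea_cluster" --zone "$ea_zone" --enable-autorepair || ea_rc=1
  done
  return "$ea_rc"
}
```

After sourcing the script, run enable_autorepair_all_pools <cluster_name> <compute_zone>, then repeat the audit to confirm that autoRepair is set to true for each node pool.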