Profile applicability: Level 2
Nodes in a degraded state are an unknown quantity and so may pose a security risk.
Kubernetes Engine's node auto-repair feature helps you keep the nodes in the cluster
in a healthy, running state. When enabled, Kubernetes Engine makes periodic checks
on the health state of each node in the cluster. If a node fails consecutive health
checks over an extended time period, Kubernetes Engine initiates a repair process
for that node.
Note: Node auto-repair is enabled by default.
Impact
If multiple nodes require repair, Kubernetes Engine might repair them in parallel.
Kubernetes Engine limits the number of repairs depending on the size of the cluster (bigger
clusters have a higher limit) and the number of broken nodes in the cluster (the limit
decreases if many nodes are broken).
Node auto-repair is not available on Alpha Clusters.
Audit
Using Google Cloud Console:
- Go to the Kubernetes Engine website.
- From the list of clusters, select the desired cluster. For each Node pool, view the
Node pool Details pane and ensure that under the Management heading, Auto-repair is
set to Enabled.
Using Command Line:
To check whether node auto-repair is enabled for an existing cluster's node pool, run:
gcloud container node-pools describe <node_pool_name> --cluster <cluster_name> --zone <compute_zone> --format json | jq '.management'
Ensure that the autoRepair key in the output of the above command is set to true:
{ "autoRepair": true }
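The single-pool check above can be extended to every node pool in a cluster. The following is a minimal sketch, not an official procedure. It assumes a zonal cluster, a bash-compatible shell, and that --format 'value(management.autoRepair)' prints True for a pool with auto-repair enabled (the pattern below accepts either letter case). The function name audit_autorepair is hypothetical and not part of gcloud.

```shell
#!/usr/bin/env bash
# Sketch: report the auto-repair setting of every node pool in one cluster.
# audit_autorepair is a hypothetical helper name, not a gcloud command.
# Usage: audit_autorepair <cluster_name> <compute_zone>
audit_autorepair() {
  local cluster="$1" zone="$2" pool state rc=0

  # List the node pool names, one per line (names contain no whitespace).
  for pool in $(gcloud container node-pools list \
      --cluster "$cluster" --zone "$zone" --format 'value(name)'); do

    # Read only the management.autoRepair field of this pool.
    state="$(gcloud container node-pools describe "$pool" \
      --cluster "$cluster" --zone "$zone" \
      --format 'value(management.autoRepair)')"

    # Assumption: an enabled pool prints True; anything else
    # (False or an empty value) is treated as a failure.
    case "$state" in
      [Tt]rue) echo "PASS $pool" ;;
      *)       echo "FAIL $pool"; rc=1 ;;
    esac
  done

  # Non-zero status if at least one pool failed the check.
  return "$rc"
}
```

Calling audit_autorepair <cluster_name> <compute_zone> prints one PASS or FAIL line per node pool and returns a non-zero status if any pool fails, which makes it usable in a scheduled compliance check.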
Remediation
Using Google Cloud Console:
- Go to the Kubernetes Engine website.
- Select the Kubernetes cluster containing the node pool for which auto-repair is disabled.
- Select the Node pool by clicking on the name of the pool.
- Navigate to the Node pool details pane and click EDIT.
- Under the Management heading, check the Enable auto-repair box.
- Click SAVE.
- Repeat steps 2-6 for every cluster and node pool with auto-repair disabled.
Using Command Line:
To enable node auto-repair for an existing cluster's node pool, run:
gcloud container node-pools update <node_pool_name> --cluster <cluster_name> --zone <compute_zone> --enable-autorepair
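When several node pools in a cluster need remediation, the update command above can be applied in a loop. The following is a minimal sketch under the same assumptions as the documented commands (a zonal cluster addressed with --zone) plus a bash-compatible shell. The function name remediate_autorepair is hypothetical, and the check assumes that --format 'value(management.autoRepair)' prints True for a pool that is already compliant.

```shell
#!/usr/bin/env bash
# Sketch: enable auto-repair on every node pool in a cluster that does not
# already report it as enabled.
# remediate_autorepair is a hypothetical helper name, not a gcloud command.
# Usage: remediate_autorepair <cluster_name> <compute_zone>
remediate_autorepair() {
  local cluster="$1" zone="$2" pool state

  # Iterate over the node pool names (names contain no whitespace).
  for pool in $(gcloud container node-pools list \
      --cluster "$cluster" --zone "$zone" --format 'value(name)'); do

    # Read the current auto-repair setting of this pool.
    state="$(gcloud container node-pools describe "$pool" \
      --cluster "$cluster" --zone "$zone" \
      --format 'value(management.autoRepair)')"

    case "$state" in
      [Tt]rue)
        # Already compliant; leave the pool untouched.
        echo "skip $pool (already enabled)"
        ;;
      *)
        echo "enabling auto-repair on $pool"
        gcloud container node-pools update "$pool" \
          --cluster "$cluster" --zone "$zone" --enable-autorepair
        ;;
    esac
  done
}
```

Skipping pools that already report auto-repair as enabled keeps the run limited to pools that actually need the change, so the function can be re-run safely after a partial failure.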