High usage of the etcd database can cause cluster instability and resource shortages that prevent your Google Kubernetes Engine (GKE) clusters from scaling effectively.
Use this document to learn how to identify clusters where etcd usage is approaching its limit and find recommendations to free up space, helping to ensure that your cluster remains stable.
This information is important for Platform admins and operators responsible for maintaining the health and scalability of GKE clusters. For more information about the common roles and example tasks that we reference in Trusted Cloud by S3NS content, see Common GKE user roles and tasks.
This document covers troubleshooting cluster stability related to high etcd usage. If you experience a different scalability problem, one of the following documents might help:
- Cluster autoscaler issues:
  - For troubleshooting why new nodes aren't being added, see Troubleshoot cluster autoscaler not scaling up.
  - For troubleshooting why underutilized nodes aren't being removed, see Troubleshoot cluster autoscaler not scaling down.
- Horizontal Pod Autoscaler issues: for troubleshooting why your Horizontal Pod Autoscaler isn't working, see Troubleshoot horizontal Pod autoscaling.
- Autopilot scaling issues: for more information about Autopilot-specific issues, including those related to scaling, see Troubleshoot Autopilot clusters.
Identify clusters where etcd usage is approaching the limit
GKE provides insights and recommendations when etcd usage approaches its limit. You can find these insights and recommendations in the following ways:
- Use the Trusted Cloud console. Go to the Kubernetes clusters page. In the Notifications column for specific clusters, check for the Free up space to reduce risk of cluster instability recommendation.
- Use the gcloud CLI or the Recommender API, specifying the ETCD_DB_USAGE_APPROACHING_LIMIT recommender subtype. To query for this recommendation, run the following command:

  gcloud recommender recommendations list \
      --recommender=google.container.DiagnosisRecommender \
      --location LOCATION \
      --project PROJECT_ID \
      --format yaml \
      --filter="recommenderSubtype:ETCD_DB_USAGE_APPROACHING_LIMIT"

  Replace LOCATION with the location of your cluster and PROJECT_ID with your project ID.
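If your clusters run in several locations, the recommendations query has to be repeated once per location. The following Bash sketch automates that loop. It assumes that the gcloud CLI is installed and authenticated, and that the location to query for recommendations matches each cluster's location as reported by gcloud container clusters list. The function name list_etcd_recommendations is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: query the etcd usage recommendation once per location
# that hosts a GKE cluster in the given project. Assumes an authenticated
# gcloud CLI and that the recommender location matches each cluster location.
list_etcd_recommendations() {
  local project_id="$1"
  local location

  # "value(location)" prints one location per cluster; sort -u removes
  # duplicates so that each location is queried only once.
  for location in $(gcloud container clusters list \
      --project "$project_id" --format="value(location)" | sort -u); do
    echo "== ${location} =="
    gcloud recommender recommendations list \
      --recommender=google.container.DiagnosisRecommender \
      --location "$location" \
      --project "$project_id" \
      --format yaml \
      --filter="recommenderSubtype:ETCD_DB_USAGE_APPROACHING_LIMIT"
  done
}

# Usage (PROJECT_ID is a placeholder for your project ID):
# list_etcd_recommendations PROJECT_ID
```

The function prints a header line before each location's results so that you can tell which location a recommendation came from.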
To implement this recommendation, remove any unnecessary data from etcd to free up space. This might involve deleting old resources or moving large objects out of etcd. For more information, see Plan for large GKE clusters.
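To decide which objects to delete or move, it can help to find the largest ones first. The following Bash sketch ranks the objects of one namespaced kind, such as configmaps or secrets, by the byte size of their JSON representation. This is only a proxy: etcd stores objects in a different encoding, so the numbers are not the actual etcd usage. The function name largest_objects is hypothetical, and the sketch assumes that kubectl is configured for the cluster. Because it issues one request per object, it can be slow for kinds with many objects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rank objects of one namespaced kind (for example,
# configmaps or secrets) by the byte size of their JSON representation.
# JSON size only approximates what etcd stores; treat it as a rough guide.
largest_objects() {
  local kind="$1"      # resource kind, for example "configmaps"
  local top="${2:-10}" # how many objects to print (default 10)
  local ns name size

  # List every object as "namespace/name", one per line.
  kubectl get "$kind" --all-namespaces \
      -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}' |
    while IFS=/ read -r ns name; do
      # Fetch each object as JSON and count its bytes; </dev/null keeps
      # kubectl from reading the list that this loop consumes.
      size=$(( $(kubectl get "$kind" "$name" -n "$ns" -o json </dev/null | wc -c) ))
      printf '%s\t%s/%s/%s\n' "$size" "$kind" "$ns" "$name"
    done |
    sort -rn |
    head -n "$top"
}

# Usage: show the five largest ConfigMaps in the cluster.
# largest_objects configmaps 5
```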
What's next
If you can't find a solution to your problem in the documentation, see Get support for further help, including advice on the following topics:
- Opening a support case by contacting Cloud Customer Care.
- Getting support from the community by asking questions on StackOverflow and using the google-kubernetes-engine tag to search for similar issues. You can also join the #kubernetes-engine Slack channel for more community support.
- Opening bugs or feature requests by using the public issue tracker.