Load balancing issues in Google Kubernetes Engine (GKE) can lead to service
disruptions, such as HTTP 502 errors, or prevent access to applications.
Use this document to learn how to troubleshoot 502 errors from external
Ingress and how to use load balancer logs and diagnostic tools, such as
check-gke-ingress, to identify problems.
This information is important for Platform admins and operators, and for Application developers, who configure and maintain load-balanced services in GKE. For more information about the common roles and example tasks that we reference in Cloud de Confiance by S3NS content, see Common GKE user roles and tasks.
External Ingress produces HTTP 502 errors
Use the following guidance to troubleshoot HTTP 502 errors with external Ingress resources:
- Enable logs for each backend service associated with each GKE Service that is referenced by the Ingress.
- Use status details to identify causes for HTTP 502 responses. Status details that indicate the HTTP 502 response originated from the backend require troubleshooting within the serving Pods, not the load balancer.
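As a minimal sketch, the following commands show both steps for a backend service behind an external Application Load Balancer. BACKEND_SERVICE_NAME is a placeholder; GKE generates the actual backend service names.

```sh
# Enable request logging on the backend service that serves the Ingress.
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --global \
    --enable-logging \
    --logging-sample-rate=1.0

# Find recent HTTP 502 responses and print their statusDetails values,
# which indicate where the error originated.
gcloud logging read \
    'resource.type="http_load_balancer" AND httpRequest.status=502' \
    --limit=10 \
    --format='value(jsonPayload.statusDetails)'
```

For example, a value such as `backend_connection_closed_before_data_sent_to_client` points at the serving Pods rather than the load balancer itself.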
Unmanaged instance groups
You might experience HTTP 502 errors with external Ingress resources if your external Ingress uses unmanaged instance group backends. This issue occurs when all of the following conditions are met:
- The cluster has a large total number of nodes among all node pools.
- The serving Pods for one or more Services that are referenced by the Ingress are located on only a few nodes.
- Services referenced by the Ingress use `externalTrafficPolicy: Local`.
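To check the last condition, you can read the traffic policy directly from each Service. SERVICE_NAME and NAMESPACE are placeholders:

```sh
# Print the traffic policy of a Service referenced by the Ingress.
# An empty result means the default, externalTrafficPolicy: Cluster.
kubectl get service SERVICE_NAME \
    --namespace NAMESPACE \
    -o jsonpath='{.spec.externalTrafficPolicy}'
```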
To determine if your external Ingress uses unmanaged instance group backends, do the following:
1. Go to the Ingress page in the Cloud de Confiance console.
2. Click the name of your external Ingress.
3. Click the name of the Load balancer. The Load balancing details page displays.
4. Check the table in the Backend services section to determine if your external Ingress uses NEGs or instance groups.
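As an alternative to the console, the following sketch checks the backend type from the command line. BACKEND_SERVICE_NAME is a placeholder; one way to find the real name is the `ingress.kubernetes.io/backends` annotation on the Ingress object.

```sh
# List the backend service names and their health as recorded on the Ingress.
kubectl get ingress INGRESS_NAME \
    -o jsonpath='{.metadata.annotations.ingress\.kubernetes\.io/backends}'

# Inspect the backends: paths that contain "instanceGroups" indicate
# instance group backends; "networkEndpointGroups" indicates NEGs.
gcloud compute backend-services describe BACKEND_SERVICE_NAME \
    --global \
    --format='value(backends[].group)'
```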
To resolve this issue, use one of the following solutions:
- Use a VPC-native cluster.
- Use `externalTrafficPolicy: Cluster` for each Service referenced by the external Ingress. This solution causes you to lose the original client IP address in the packet's source.
- Use the `node.kubernetes.io/exclude-from-external-load-balancers=true` label. Add the label to the nodes or node pools that don't run any serving Pod for any Service referenced by any external Ingress or `LoadBalancer` Service in your cluster.
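The second and third solutions can be applied with kubectl, as in the following sketch; SERVICE_NAME, NAMESPACE, and NODE_NAME are placeholders:

```sh
# Switch a Service referenced by the Ingress to externalTrafficPolicy: Cluster.
kubectl patch service SERVICE_NAME \
    --namespace NAMESPACE \
    -p '{"spec": {"externalTrafficPolicy": "Cluster"}}'

# Exclude a node that runs no serving Pods from external load balancing.
kubectl label node NODE_NAME \
    node.kubernetes.io/exclude-from-external-load-balancers=true
```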
Use load balancer logs to troubleshoot
You can use internal passthrough Network Load Balancer logs and external passthrough Network Load Balancer logs to troubleshoot issues with load balancers and correlate traffic from load balancers to GKE resources.
Logs are aggregated per-connection and exported in near real time. Logs are generated for each GKE node involved in the data path of a LoadBalancer Service, for both ingress and egress traffic. Log entries include additional fields for GKE resources, such as:
- Cluster name
- Cluster location
- Service name
- Service namespace
- Pod name
- Pod namespace
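For example, logging for a passthrough Network Load Balancer is enabled on its backend service, and a broad full-text search can surface entries that mention your cluster. BACKEND_SERVICE_NAME, REGION, and CLUSTER_NAME are placeholders, and the search query is illustrative rather than a precise filter:

```sh
# Enable per-connection logging on the backend service that GKE created
# for the LoadBalancer Service.
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --region=REGION \
    --enable-logging

# Broad full-text search for recent log entries that mention the cluster.
gcloud logging read '"CLUSTER_NAME"' --limit=10 --format=json
```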
Use diagnostic tools to troubleshoot
The check-gke-ingress diagnostic tool inspects Ingress resources for common
misconfigurations. You can use the check-gke-ingress tool in the following
ways:
- Run the `gcpdiag` command-line tool on your cluster. Ingress results appear in the `gke/ERR/2023_004` check rule section.
- Use the `check-gke-ingress` tool alone or as a kubectl plugin by following the instructions in check-gke-ingress.
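As a sketch, a gcpdiag run might look like the following; PROJECT_ID is a placeholder and the report layout can vary between gcpdiag versions:

```sh
# Run the gcpdiag lint rules against the project and pull out the
# Ingress findings reported under rule gke/ERR/2023_004.
gcpdiag lint --project=PROJECT_ID | grep -A 3 'gke/ERR/2023_004'
```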