Load balancing issues in Google Kubernetes Engine (GKE) can lead to service
disruptions, such as HTTP 502 errors, or prevent access to applications.
Use this document to learn how to troubleshoot 502 errors from external
Ingress and how to use load balancer logs and diagnostic tools, such as
check-gke-ingress, to identify problems.
This information is important for Platform admins and operators, and for Application developers, who configure and maintain load-balanced services in GKE. For more information about the common roles and example tasks that we reference in Cloud de Confiance by S3NS content, see Common GKE user roles and tasks.
External Ingress produces HTTP 502 errors
Use the following guidance to troubleshoot HTTP 502 errors with external Ingress resources:
- Enable logs for each backend service associated with each GKE Service that is referenced by the Ingress.
- Use status details to identify causes for HTTP 502 responses. Status details that indicate the HTTP 502 response originated from the backend require troubleshooting within the serving Pods, not the load balancer.
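As a minimal sketch, the following commands show both steps for a backend service behind an external Application Load Balancer. BACKEND_SERVICE_NAME is a placeholder; GKE generates the actual backend service names.

```sh
# Enable request logging on the backend service that serves the Ingress.
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --global \
    --enable-logging \
    --logging-sample-rate=1.0

# Find recent HTTP 502 responses and print their statusDetails values,
# which indicate where the error originated.
gcloud logging read \
    'resource.type="http_load_balancer" AND httpRequest.status=502' \
    --limit=10 \
    --format='value(jsonPayload.statusDetails)'
```

For example, a value such as `backend_connection_closed_before_data_sent_to_client` points at the serving Pods rather than the load balancer itself.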
Unmanaged instance groups
You might experience HTTP 502 errors with external Ingress resources if your external Ingress uses unmanaged instance group backends. This issue occurs when all of the following conditions are met:
- The cluster has a large total number of nodes among all node pools.
- The serving Pods for one or more Services that are referenced by the Ingress are located on only a few nodes.
- Services referenced by the Ingress use `externalTrafficPolicy: Local`.
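To check the last condition, you can read the traffic policy directly from each Service. SERVICE_NAME and NAMESPACE are placeholders:

```sh
# Print the traffic policy of a Service referenced by the Ingress.
# An empty result means the default, externalTrafficPolicy: Cluster.
kubectl get service SERVICE_NAME \
    --namespace NAMESPACE \
    -o jsonpath='{.spec.externalTrafficPolicy}'
```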
To determine if your external Ingress uses unmanaged instance group backends, do the following:
1. Go to the Ingress page in the Cloud de Confiance console.
2. Click the name of your external Ingress.
3. Click the name of the Load balancer. The Load balancing details page displays.
4. Check the table in the Backend services section to determine if your external Ingress uses NEGs or instance groups.
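As an alternative to the console, the following sketch checks the backend type from the command line. BACKEND_SERVICE_NAME is a placeholder; one way to find the real name is the `ingress.kubernetes.io/backends` annotation on the Ingress object.

```sh
# List the backend service names and their health as recorded on the Ingress.
kubectl get ingress INGRESS_NAME \
    -o jsonpath='{.metadata.annotations.ingress\.kubernetes\.io/backends}'

# Inspect the backends: paths that contain "instanceGroups" indicate
# instance group backends; "networkEndpointGroups" indicates NEGs.
gcloud compute backend-services describe BACKEND_SERVICE_NAME \
    --global \
    --format='value(backends[].group)'
```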
To resolve this issue, use one of the following solutions:
- Use a VPC-native cluster.
- Use `externalTrafficPolicy: Cluster` for each Service referenced by the external Ingress. This solution causes you to lose the original client IP address in the packet's source.
- Use the `node.kubernetes.io/exclude-from-external-load-balancers=true` label. Add the label to the nodes or node pools that don't run any serving Pod for any Service referenced by any external Ingress or `LoadBalancer` Service in your cluster.
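The second and third solutions can be applied with kubectl, as in the following sketch; SERVICE_NAME, NAMESPACE, and NODE_NAME are placeholders:

```sh
# Switch a Service referenced by the Ingress to externalTrafficPolicy: Cluster.
kubectl patch service SERVICE_NAME \
    --namespace NAMESPACE \
    -p '{"spec": {"externalTrafficPolicy": "Cluster"}}'

# Exclude a node that runs no serving Pods from external load balancing.
kubectl label node NODE_NAME \
    node.kubernetes.io/exclude-from-external-load-balancers=true
```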
Use load balancer logs to troubleshoot
You can use internal passthrough Network Load Balancer logs and external passthrough Network Load Balancer logs to troubleshoot issues with load balancers and correlate traffic from load balancers to GKE resources.
Logs are aggregated per-connection and exported in near real time. Logs are generated for each GKE node involved in the data path of a LoadBalancer Service, for both ingress and egress traffic. Log entries include additional fields for GKE resources, such as:
- Cluster name
- Cluster location
- Service name
- Service namespace
- Pod name
- Pod namespace
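For example, logging for a passthrough Network Load Balancer is enabled on its backend service, and a broad full-text search can surface entries that mention your cluster. BACKEND_SERVICE_NAME, REGION, and CLUSTER_NAME are placeholders, and the search query is illustrative rather than a precise filter:

```sh
# Enable per-connection logging on the backend service that GKE created
# for the LoadBalancer Service.
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --region=REGION \
    --enable-logging

# Broad full-text search for recent log entries that mention the cluster.
gcloud logging read '"CLUSTER_NAME"' --limit=10 --format=json
```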
Use diagnostic tools to troubleshoot
The check-gke-ingress diagnostic tool inspects Ingress resources for common
misconfigurations. You can use the check-gke-ingress tool in the following
ways:
- Run the `gcpdiag` command-line tool on your cluster. Ingress results appear in the `gke/ERR/2023_004` check rule section.
- Use the `check-gke-ingress` tool alone or as a kubectl plugin by following the instructions in check-gke-ingress.
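As a sketch, a gcpdiag run might look like the following; PROJECT_ID is a placeholder and the report layout can vary between gcpdiag versions:

```sh
# Run the gcpdiag lint rules against the project and pull out the
# Ingress findings reported under rule gke/ERR/2023_004.
gcpdiag lint --project=PROJECT_ID | grep -A 3 'gke/ERR/2023_004'
```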