Troubleshoot load balancing in GKE
This page shows you how to resolve issues related to load balancing in Google Kubernetes Engine (GKE) clusters using Service, Ingress, or Gateway resources.

External Ingress produces HTTP 502 errors

Use the following guidance to troubleshoot HTTP 502 errors with external Ingress resources:

  1. Enable logging for each backend service associated with each GKE Service that the Ingress references.
  2. Use the status details in the log entries to identify the cause of the HTTP 502 responses. If the status details indicate that the HTTP 502 response originated from the backend, troubleshoot the serving Pods rather than the load balancer.
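As a sketch, you can enable logging on a backend service with the gcloud CLI. The backend service name `my-backend-service` is a placeholder; list your backend services first with `gcloud compute backend-services list`.

```shell
# Enable request logging on a backend service so that HTTP 502
# responses are recorded along with their status details.
# "my-backend-service" is a placeholder name.
gcloud compute backend-services update my-backend-service \
    --global \
    --enable-logging \
    --logging-sample-rate=1.0
```

A sample rate of 1.0 logs every request, which is useful while troubleshooting; lower the rate afterwards to reduce logging volume.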

Unmanaged instance groups

You might experience HTTP 502 errors with external Ingress resources if your external Ingress uses unmanaged instance group backends. This issue occurs when all of the following conditions are met:

  • The cluster has a large total number of nodes among all node pools.
  • The serving Pods for one or more Services that are referenced by the Ingress are located on only a few nodes.
  • Services referenced by the Ingress use externalTrafficPolicy: Local.
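To check the third condition, you can list the traffic policy of each Service in the cluster. This is a minimal sketch, assuming you have kubectl access to the cluster:

```shell
# List all Services with their type and external traffic policy.
# Services showing "Local" in the POLICY column match the condition.
kubectl get services --all-namespaces \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,TYPE:.spec.type,POLICY:.spec.externalTrafficPolicy'
```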

To determine if your external Ingress uses unmanaged instance group backends, do the following:

  1. Go to the Ingress page in the Trusted Cloud console.

  2. Click the name of your external Ingress.

  3. Click the name of the load balancer. The Load balancing details page appears.

  4. Check the table in the Backend services section to determine if your external Ingress uses NEGs or instance groups.
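Alternatively, you can inspect the backends from the command line. This is an illustrative command; `my-backend-service` is a placeholder name:

```shell
# List the backend groups of a backend service. URLs containing
# "instanceGroups" indicate instance group backends; URLs containing
# "networkEndpointGroups" indicate NEG backends.
# "my-backend-service" is a placeholder name.
gcloud compute backend-services describe my-backend-service \
    --global \
    --format="value(backends[].group)"
```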

To resolve this issue, use one of the following solutions:

  • Use a VPC-native cluster.
  • Use externalTrafficPolicy: Cluster for each Service referenced by the external Ingress. With this setting, the original client IP address is not preserved as the packet source.
  • Use the node.kubernetes.io/exclude-from-external-load-balancers=true label. Add the label to the nodes or node pools that don't run any serving Pod for any Service referenced by any external Ingress or LoadBalancer Service in your cluster.
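The second and third solutions can be applied with kubectl. The Service and node names below are placeholders:

```shell
# Switch a Service to externalTrafficPolicy: Cluster.
# "my-service" is a placeholder name.
kubectl patch service my-service \
    -p '{"spec": {"externalTrafficPolicy": "Cluster"}}'

# Exclude a node that runs no serving Pods from external load balancers.
# "my-node" is a placeholder name.
kubectl label node my-node \
    node.kubernetes.io/exclude-from-external-load-balancers=true
```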

Use load balancer logs to troubleshoot

You can use internal passthrough Network Load Balancer logs and external passthrough Network Load Balancer logs to troubleshoot issues with load balancers and correlate traffic from load balancers to GKE resources.

Logs are aggregated per connection and exported in near real time. Logs are generated for each GKE node involved in the data path of a LoadBalancer Service, for both ingress and egress traffic. Log entries include additional fields for GKE resources, such as:

  • Cluster name
  • Cluster location
  • Service name
  • Service namespace
  • Pod name
  • Pod namespace
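You can query these entries from the command line with `gcloud logging read`. The filter below is illustrative only: the field name `jsonPayload.cluster_name` and its exact path are assumptions, so confirm them against a real log entry in Logs Explorer before relying on this filter. `my-cluster` is a placeholder:

```shell
# Read recent load balancer log entries for a given GKE cluster.
# The field path in the filter is an assumption; verify it against
# an actual log entry before use. "my-cluster" is a placeholder.
gcloud logging read \
    'jsonPayload.cluster_name="my-cluster"' \
    --limit=10 \
    --format=json
```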

Use diagnostic tools to troubleshoot

The check-gke-ingress diagnostic tool inspects Ingress resources for common misconfigurations. You can use the check-gke-ingress tool in the following ways:

  • Run the gcpdiag command-line tool on your cluster. Ingress results appear in the check rule gke/ERR/2023_004 section.
  • Use the check-gke-ingress tool alone or as a kubectl plugin by following the instructions in check-gke-ingress.
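For example, you can run all gcpdiag lint checks against a project and then look for the gke/ERR/2023_004 rule in the output. `my-project` is a placeholder project ID:

```shell
# Run gcpdiag lint checks against a project; GKE Ingress findings
# appear under rule gke/ERR/2023_004.
# "my-project" is a placeholder project ID.
gcpdiag lint --project=my-project
```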