Some or all of the information on this page might not apply to Cloud de Confiance by S3NS. See Differences from Google Cloud for more details.

Request distribution for external Application Load Balancers

This document delves into the intricacies of how external Application Load Balancers load balancers handle connections, route traffic, and maintain session affinity.

How connections work

Cloud de Confiance's external Application Load Balancers—global and regional—streamline routing using distributed proxies (GFEs) or Envoy-managed subnets. With configurable timeouts, TLS termination, and built-in security, they ensure compliant, scalable application delivery worldwide or regionally.

Regional external Application Load Balancer connections

The regional external Application Load Balancer is a managed service implemented on the Envoy proxy. The regional external Application Load Balancer uses a shared subnet called a proxy-only subnet to provision a set of IP addresses that Google uses to run Envoy proxies on your behalf. The --purpose flag for this proxy-only subnet is set to REGIONAL_MANAGED_PROXY. All regional Envoy-based load balancers in a particular network and region share this subnet.

Clients use the load balancer's IP address and port to connect to the load balancer. Client requests are directed to the proxy-only subnet in the same region as the client. The load balancer terminates clients requests and then opens new connections from the proxy-only subnet to your backends. Therefore, packets sent from the load balancer have source IP addresses from the proxy-only subnet.

Depending on the backend service configuration, the protocol used by Envoy proxies to connect to your backends can be HTTP, HTTPS, or HTTP/2. If HTTP or HTTPS, the HTTP version is HTTP 1.1. HTTP keepalive is enabled by default, as specified in the HTTP 1.1 specification. The Envoy proxy sets both the client HTTP keepalive timeout and the backend keepalive timeout to a default value of 600 seconds each. You can update the client HTTP keepalive timeout but the backend keepalive timeout value is fixed. You can configure the request/response timeout by setting the backend service timeout. For more information, see timeouts and retries.

Client communications with the load balancer

Clients can communicate with the load balancer by using the HTTP/1.0, HTTP/1.1, HTTP/2, or HTTP/3 protocol.
When HTTPS is used, modern clients default to HTTP/2. This is controlled on the client, not on the HTTPS load balancer.
You cannot disable HTTP/2 by making a configuration change on the load balancer. However, you can configure some clients to use HTTP 1.1 instead of HTTP/2. For example, with curl, use the --http1.1 parameter.
External Application Load Balancers support the HTTP/1.1 100 Continue response.

For the complete list of protocols supported by external Application Load Balancer forwarding rules in each mode, see Load balancer features.

Source IP addresses for client packets

The source IP address for packets, as seen by the backends, is not the Cloud de Confiance external IP address of the load balancer. In other words, there are two TCP connections.

For the regional external Application Load Balancers:

Connection 1, from original client to the load balancer (proxy-only subnet):
- Source IP address: the original client (or external IP address if the client is behind a NAT gateway or a forward proxy).
- Destination IP address: your load balancer's IP address.
Connection 2, from the load balancer (proxy-only subnet) to the backend VM or endpoint:
- Source IP address: an IP address in the proxy-only subnet that is shared among all the Envoy-based load balancers deployed in the same region and network as the load balancer.
- Destination IP address: the internal IP address of the backend VM or container in the VPC network.

Special routing paths

Cloud de Confiance uses special routes not defined in your VPC network to route packets for the following types of traffic:

For health checks, except distributed Envoy health checks. For more information, see Paths for health checks.

Cloud de Confiance uses subnet routes for proxy-only subnets to route packets for the following types of traffic:

When using distributed Envoy health checks.

For regional external Application Load Balancers, Cloud de Confiance uses open-source Envoy proxies to terminate client requests to the load balancer. The load balancer terminates the TCP session and opens a new TCP session from the region's proxy- only subnet to your backend. Routes defined within your VPC network facilitate communication from Envoy proxies to your backends and from your backends to the Envoy proxies.

TLS termination

The following table summarizes how TLS termination is handled by external Application Load Balancers.

Load balancer mode	TLS termination
Regional external Application Load Balancer	TLS is terminated on Envoy proxies located in a proxy-only subnet in a region chosen by the user. Use this load balancer mode if you need geographic control over the region where TLS is terminated.

Timeouts and retries

External Application Load Balancers support the following types of timeouts for HTTP or HTTPS traffic:

Timeout type and description	Default values	Supports custom timeout values
Timeout type and description	Default values	Backend service timeout¹ A request and response timeout. Represents the maximum amount of time allowed between the load balancer sending the first byte of a request to the backend and the backend returning the last byte of the HTTP response to the load balancer. If the backend hasn't returned the entire HTTP response to the load balancer within this time limit, the remaining response data is dropped.	For all other backend types on a backend service: 30 seconds
Connection setup timeout The maximum time allowed for the proxy to establish a connection to the backend. A successful setup includes completing the TCP three-way handshake, and—if the backend protocol is HTTPS—completing the TLS handshake. For regional external Application Load Balancers, the load balancer's proxy is Envoy software.	4.5 seconds
Client TLS timeout The maximum amount of time that the load balancer's proxy waits for the TLS handshake to complete. For regional external Application Load Balancers, the load balancer's proxy is Envoy software.	10 seconds
Client HTTP keepalive timeout The maximum amount of time that the TCP connection between a client and the load balancer's proxy can be idle. (The same TCP connection might be used for multiple HTTP requests.) For regional external Application Load Balancers, the load balancer's proxy is Envoy software.	610 seconds
Backend HTTP keepalive timeout The maximum amount of time that the TCP connection between the load balancer's proxy and a backend can be idle. (The same TCP connection might be used for multiple HTTP requests.) For regional external Application Load Balancers, the load balancer's proxy is Envoy software.	For backend services: 10 minutes (600 seconds)

¹Not configurable for serverless NEG backends. Not configurable for backend buckets.

Backend service timeout

The configurable backend service timeout represents the maximum amount of time that the load balancer waits for your backend to process an HTTP request and return the corresponding HTTP response. Except for serverless NEGs, the default value for the backend service timeout is 30 seconds.

For example, if you want to download a 500-MB file, and the value of the backend service timeout is 90 seconds, the load balancer expects the backend to deliver the entire 500-MB file within 90 seconds. It is possible to configure the backend service timeout to be insufficient for the backend to send its complete HTTP response. In this situation, if the load balancer has at least received HTTP response headers from the backend, the load balancer returns the complete response headers and as much of the response body as it could obtain within the backend service timeout.

We recommend that you set the backend service timeout to the longest amount of time that you expect your backend to need in order to process an HTTP response. If the software running on your backend needs more time to process an HTTP request and return its entire response, we recommend that you increase the backend service timeout. For example, we recommend that you increase the timeout if you see HTTP 408 status code responses with jsonPayload.statusDetail client_timed_out errors.

The backend service timeout accepts values between 1 and 2,147,483,647 seconds; however, larger values aren't practical configuration options. Cloud de Confiance also doesn't guarantee that an underlying TCP connection can remain open for the entirety of the value of the backend service timeout. Client systems must implement retry logic instead of relying on a TCP connection to be open for long periods of time.

To configure the backend service timeout, use one of the following methods:

Console

Modify the Timeout field of the load balancer's backend service.

gcloud

Use the gcloud compute backend-services update command to modify the --timeout parameter of the backend service resource.

API

For a regional external Application Load Balancer, modify the timeoutSec parameter for the regionBackendServices resource.

Websocket connection timeouts aren't always the same as backend service timeouts. Websocket connection timeouts depend on the type of load balancer:

Load balancer mode	Default values	Timeout description for websockets
Regional external Application Load Balancer	backend service timeout: 30 seconds	Active websocket connections don't use the backend service timeout of the load balancer. Idle websocket connections are closed after the backend service times out. Cloud de Confiance periodically restarts or changes the number of serving Envoy software tasks. The longer the backend service timeout value is, the more likely it is that Envoy tasks restart or terminate TCP connections.

Load balancer mode

Default values

Timeout description for websockets

Regional external Application Load Balancer

backend service timeout: 30 seconds

Active websocket connections don't use the backend service timeout of the load balancer.

Idle websocket connections are closed after the backend service times out.

Cloud de Confiance periodically restarts or changes the number of serving Envoy software tasks. The longer the backend service timeout value is, the more likely it is that Envoy tasks restart or terminate TCP connections.

Regional external Application Load Balancers use the configured routeActions.timeout parameter of the URL maps and ignores the backend service timeout. When routeActions.timeout isn't configured, the value of the backend service timeout is used. When routeActions.timeout is supplied, the backend service timeout is ignored, and the value of routeActions.timeout is used as the request and response timeout instead.

Client HTTP keepalive timeout

The client HTTP keepalive timeout represents the maximum amount of time that a TCP connection can be idle between the (downstream) client and one of the following types of proxies:

For a regional external Application Load Balancer: an Envoy proxy

The client HTTP keepalive timeout represents the TCP idle timeout for the underlying TCP connections. The client HTTP keepalive timeout doesn't apply to websockets.

The default value for the client HTTP keepalive timeout is 610 seconds. For global and regional external Application Load Balancers, you can configure the client HTTP keepalive timeout with a value between 5 and 1200 seconds.

To configure the client HTTP keepalive timeout, use one of the following methods:

Console

Modify the HTTP keepalive timeout field of the load balancer's frontend configuration.

gcloud

For global external Application Load Balancers, use the gcloud compute target-http-proxies update command or the gcloud compute target-https-proxies update command to modify the --http-keep-alive-timeout-sec parameter of the target HTTP proxy or the target HTTPS proxy resource.

For a regional external Application Load Balancer, you cannot update the keepalive timeout parameter of a regional target HTTP(S) proxy directly. To update the keepalive timeout parameter of a regional target proxy, you need to do the following:

Create a new target proxy with the intended timeout settings.
Mirror all other settings from the current target proxy on the new one. For target HTTPS proxies, this includes linking any SSL certificates or certificate maps to the new target proxy.
Update the forwarding rules to point to the new target proxy.
Delete the previous target proxy.

API

For global external Application Load Balancers, modify the httpKeepAliveTimeoutSec parameter for the targetHttpProxies resource or the targetHttpsProxies resource.

Create a new target proxy with the intended timeout settings.
Mirror all other settings from the current target proxy on the new one. For target HTTPS proxies, this includes linking any SSL certificates or certificate maps to the new target proxy.
Update the forwarding rules to point to the new target proxy.
Delete the previous target proxy.

The load balancer's client HTTP keepalive timeout must be greater than the HTTP keepalive (TCP idle) timeout used by downstream clients or proxies. If a downstream client has a greater HTTP keepalive (TCP idle) timeout than the load balancer's client HTTP keepalive timeout, it's possible for a race condition to occur. From the perspective of a downstream client, an established TCP connection is permitted to be idle for longer than permitted by the load balancer. This means that the downstream client can send packets after the load balancer considers the TCP connection to be closed. When that happens, the load balancer responds with a TCP reset (RST) packet.

When the client HTTP keepalive timeout expires, either the GFE or the Envoy proxy sends a TCP FIN to the client to gracefully close the connection.

Backend HTTP keepalive timeout

External Application Load Balancers are proxies that use at least two TCP connections:

For a regional external Application Load Balancer, a first TCP connection exists between the (downstream) client and an Envoy proxy. The Envoy proxy then opens a second TCP connection to your backends.

The load balancer's secondary TCP connections might not get closed after each request; they can stay open to handle multiple HTTP requests and responses. The backend HTTP keepalive timeout defines the TCP idle timeout between the load balancer and your backends. The backend HTTP keepalive timeout doesn't apply to websockets.

The backend keepalive timeout is fixed at 10 minutes (600 seconds) and cannot be changed. This helps ensure that the load balancer maintains idle connections for at least 10 minutes. After this period, the load balancer can send termination packets to the backend at any time.

The load balancer's backend keepalive timeout must be less than the keepalive timeout used by software running on your backends. This avoids a race condition where the operating system of your backends might close TCP connections with a TCP reset (RST). Because the backend keepalive timeout for the load balancer isn't configurable, you must configure your backend software so that its HTTP keepalive (TCP idle) timeout value is greater than 600 seconds.

When the backend HTTP keepalive timeout expires, either the GFE or the Envoy proxy sends a TCP FIN to the backend VM to gracefully close the connection.

The following table lists the changes necessary to modify keepalive timeout values for common web server software.

Web server software	Parameter	Default setting	Recommended setting
Apache	KeepAliveTimeout	`KeepAliveTimeout 5`	`KeepAliveTimeout 620`
nginx	keepalive_timeout	`keepalive_timeout 75s;`	`keepalive_timeout 620s;`

Retries

Support for retry logic depends on the mode of the external Application Load Balancer.

Load balancer mode Retry logic

Regional external Application Load Balancer

Load balancer mode	Retry logic
Regional external Application Load Balancer	Configurable by using a retry policy in the URL map. The default number of retries (`numRetries`) is 1. The maximum number of retries that can be configured by using the retry policy is 25. The maximum configurable `perTryTimeout` is 24 hours. Without a retry policy, unsuccessful requests that have no HTTP body (for example, `GET` requests) that result in HTTP `502`, `503`, or `504` responses are retried once. HTTP `POST` requests aren't retried. Retried requests only generate one log entry for the final response.

Configurable by using a retry policy in the URL map. The default number of retries (numRetries) is 1. The maximum number of retries that can be configured by using the retry policy is 25. The maximum configurable perTryTimeout is 24 hours.

Without a retry policy, unsuccessful requests that have no HTTP body (for example, GET requests) that result in HTTP 502, 503, or 504 responses are retried once.

HTTP POST requests aren't retried.

Retried requests only generate one log entry for the final response.

The WebSocket protocol is supported with GKE Ingress.

Illegal request and response handling

The load balancer blocks both client requests and backend responses from reaching the backend or the client, respectively, for a number of reasons. Some reasons are strictly for HTTP/1.1 compliance and others are to avoid unexpected data being passed to or from the backends. None of the checks can be disabled.

The load balancer blocks the following requests for HTTP/1.1 compliance:

It cannot parse the first line of the request.
A header is missing the colon (:) delimiter.
Headers or the first line contain invalid characters.
The content length is not a valid number, or there are multiple content length headers.
There are multiple transfer encoding keys, or there are unrecognized transfer encoding values.
There's a non-chunked body and no content length specified.
Body chunks are unparseable. This is the only case where some data reaches the backend. The load balancer closes the connections to the client and backend when it receives an unparseable chunk.

Request handling

The load balancer blocks the request if any of the following are true:

The total size of request headers and the request URL exceeds the limit for the maximum request header size for external Application Load Balancers.
The request method does not allow a body, but the request has one.
The request contains an Upgrade header, and the Upgrade header is not used to enable WebSocket connections.
The HTTP version is unknown.

Response handling

The load balancer blocks the backend's response if any of the following are true:

The total size of response headers exceeds the limit for maximum response header size for external Application Load Balancers.
The HTTP version is unknown.

When handling both the request and response, the load balancer might remove or overwrite hop-by-hop headers in HTTP/1.1 before forwarding them to the intended destination.

Traffic distribution

When you add a backend instance group or NEG to a backend service, you specify a balancing mode, which defines a method measuring backend load and a target capacity. External Application Load Balancers support two balancing modes:

RATE, for instance groups or NEGs, is the target maximum number of requests (queries) per second (RPS, QPS). The target maximum RPS/QPS can be exceeded if all backends are at or above capacity.
UTILIZATION is the backend utilization of VMs in an instance group.

How traffic is distributed among backends depends on the mode of the load balancer.

Regional external Application Load Balancer

For regional external Application Load Balancers, traffic distribution is based on the load balancing mode and the load balancing locality policy.

The balancing mode determines the weight and fraction of traffic to send to each group (instance group or NEG). The load balancing locality policy (LocalityLbPolicy) determines how backends within the group are load balanced.

When a backend service receives traffic, it first directs traffic to a backend (instance group or NEG) according to the backend's balancing mode. After a backend is selected, traffic is then distributed among instances or endpoints in that backend group according to the load balancing locality policy.

For more information, see the following:

Session affinity

Session affinity, configured on the backend service of Application Load Balancers, provides a best-effort attempt to send requests from a particular client to the same backend as long as the number of healthy backend instances or endpoints remains constant, and as long as the previously selected backend instance or endpoint is not at capacity. The target capacity of the balancing mode determines when the backend is at capacity.

The following table outlines the different types of session affinity options supported for the different Application Load Balancers. In the section that follows, Types of session affinity, each session affinity type is discussed in further detail.

**Table:** Supported session affinity settings
Product	Session affinity options
Regional external Application Load Balancer	None (`NONE`) Client IP (`CLIENT_IP`) Generated cookie (`GENERATED_COOKIE`) Header field (`HEADER_FIELD`) HTTP cookie (`HTTP_COOKIE`) Stateful cookie-based affinity (`STRONG_COOKIE_AFFINITY`) Also note: The effective default value of the load balancing locality policy (`localityLbPolicy`) changes according to your session affinity settings. If session affinity is not configured—that is, if session affinity remains at the default value of `NONE`—then the default value for `localityLbPolicy` is `ROUND_ROBIN`. If session affinity is set to a value other than `NONE`, then the default value for `localityLbPolicy` is `MAGLEV`.

Keep the following in mind when configuring session affinity:

Don't rely on session affinity for authentication or security purposes. Session affinity, except for stateful cookie-based session affinity, can break whenever the number of serving and healthy backends changes. For more details, see Losing session affinity.
The default values of the --session-affinity and --subsetting-policy flags are both NONE, and only one of them at a time can be set to a different value.

Types of session affinity

The session affinity for external Application Load Balancers can be classified into one of the following categories:

Hash-based session affinity (NONE, CLIENT_IP)
HTTP header-based session affinity (HEADER_FIELD)
Cookie-based session affinity (GENERATED_COOKIE, HTTP_COOKIE, STRONG_COOKIE_AFFINITY)

Hash-based session affinity

For hash-based session affinity, the load balancer uses the consistent hashing algorithm to select an eligible backend. The session affinity setting determines which fields from the IP header are used to calculate the hash.

Hash-based session affinity can be of the following types:

None
Client IP affinity

None

A session affinity setting of NONE does not mean that there is no session affinity. It means that no session affinity option is explicitly configured.

Hashing is always performed to select a backend. And a session affinity setting of NONE means that the load balancer uses a 5-tuple hash to select a backend. The 5-tuple hash consists of the source IP address, the source port, the protocol, the destination IP address, and the destination port.

A session affinity of NONE is the default value.

Client IP affinity

Client IP session affinity (CLIENT_IP) is a 2-tuple hash created from the source and destination IP addresses of the packet. Client IP affinity forwards all requests from the same client IP address to the same backend, as long as that backend has capacity and remains healthy.

When you use client IP affinity, keep the following in mind:

The packet destination IP address is only the same as the load balancer forwarding rule's IP address if the packet is sent directly to the load balancer.
The packet source IP address might not match an IP address associated with the original client if the packet is processed by an intermediate NAT or proxy system before being delivered to a Cloud de Confiance load balancer. In situations where many clients share the same effective source IP address, some backend VMs might receive more connections or requests than others.

HTTP header-based session affinity

With header field affinity (HEADER_FIELD), requests are routed to the backends based on the value of the HTTP header in the consistentHash.httpHeaderName field of the backend service. To distribute requests across all available backends, each client needs to use a different HTTP header value.

Header field affinity is supported when the following conditions are true:

The load balancing locality policy is RING_HASH or MAGLEV.
The backend service's consistentHash specifies the name of the HTTP header (httpHeaderName).

Cookie-based session affinity can be of the following types:

Generated cookie affinity
HTTP cookie affinity
Stateful cookie-based session affinity

Generated cookie affinity

When you use generated cookie-based affinity (GENERATED_COOKIE), the load balancer includes an HTTP cookie in the Set-Cookie header in response to the initial HTTP request.

The name of the generated cookie varies depending on the type of the load balancer.

Product	Cookie name
Global external Application Load Balancers	`GCLB`
Classic Application Load Balancers	`GCLB`
Regional external Application Load Balancers	`GCILB`

The generated cookie's path attribute is always a forward slash (/), so it applies to all backend services on the same URL map, provided that the other backend services also use generated cookie affinity.

You can configure the cookie's time to live (TTL) value between 0 and 1,209,600 seconds (inclusive) by using the affinityCookieTtlSec backend service parameter. If affinityCookieTtlSec isn't specified, the default TTL value is 0.

When the client includes the generated session affinity cookie in the Cookie request header of HTTP requests, the load balancer directs those requests to the same backend instance or endpoint, as long as the session affinity cookie remains valid. This is done by mapping the cookie value to an index that references a specific backend instance or an endpoint, and by making sure that the generated cookie session affinity requirements are met.

To use generated cookie affinity, configure the following balancing mode and localityLbPolicy settings:

For backend instance groups, use the RATE balancing mode.
For the localityLbPolicy of the backend service, use either RING_HASH or MAGLEV. If you don't explicitly set the localityLbPolicy, the load balancer uses MAGLEV as an implied default.

For more information, see losing session affinity.

HTTP cookie affinity

When you use HTTP cookie-based affinity (HTTP_COOKIE), the load balancer includes an HTTP cookie in the Set-Cookie header in response to the initial HTTP request. You specify the name, path, and time to live (TTL) for the cookie.

All Application Load Balancers support HTTP cookie-based affinity.

You can configure the cookie's TTL values using seconds, fractions of a second (as nanoseconds), or both seconds plus fractions of a second (as nanoseconds) using the following backend service parameters and valid values:

consistentHash.httpCookie.ttl.seconds can be set to a value between 0 and 315576000000 (inclusive).
consistentHash.httpCookie.ttl.nanos can be set to a value between 0 and 999999999 (inclusive). Because the units are nanoseconds, 999999999 means .999999999 seconds.

If both consistentHash.httpCookie.ttl.seconds and consistentHash.httpCookie.ttl.nanos aren't specified, the value of the affinityCookieTtlSec backend service parameter is used instead. If affinityCookieTtlSec isn't specified, the default TTL value is 0.

When the client includes the HTTP session affinity cookie in the Cookie request header of HTTP requests, the load balancer directs those requests to the same backend instance or endpoint, as long as the session affinity cookie remains valid. This is done by mapping the cookie value to an index that references a specific backend instance or an endpoint, and by making sure that the generated cookie session affinity requirements are met.

To use HTTP cookie affinity, configure the following balancing mode and localityLbPolicy settings:

For backend instance groups, use the RATE balancing mode.
For the localityLbPolicy of the backend service, use either RING_HASH or MAGLEV. If you don't explicitly set the localityLbPolicy, the load balancer uses MAGLEV as an implied default.

For more information, see losing session affinity.

Stateful cookie-based session affinity

When you use stateful cookie-based affinity (STRONG_COOKIE_AFFINITY), the load balancer includes an HTTP cookie in the Set-Cookie header in response to the initial HTTP request. You specify the name, path, and time to live (TTL) for the cookie.

The following load balancers support stateful cookie-based affinity:

Regional external Application Load Balancers
Regional internal Application Load Balancers

You can configure the cookie's TTL values using seconds, fractions of a second (as nanoseconds), or both seconds plus fractions of a second (as nanoseconds). The duration represented by strongSessionAffinityCookie.ttl cannot be set to a value representing more than two weeks (1,209,600 seconds).

The value of the cookie identifies a selected backend instance or endpoint by encoding the selected instance or endpoint in the value itself. For as long as the cookie is valid, if the client includes the session affinity cookie in the Cookie request header of subsequent HTTP requests, the load balancer directs those requests to selected backend instance or endpoint.

Unlike other session affinity methods:

Stateful cookie-based affinity has no specific requirements for the balancing mode or for the load balancing locality policy (localityLbPolicy).
Stateful cookie-based affinity is not affected when autoscaling adds a new instance to a managed instance group.
Stateful cookie-based affinity is not affected when autoscaling removes an instance from a managed instance group unless the selected instance is removed.
Stateful cookie-based affinity is not affected when autohealing removes an instance from a managed instance group unless the selected instance is removed.

For more information, see losing session affinity.

Meaning of zero TTL for cookie-based affinities

All cookie-based session affinities, such as generated cookie affinity, HTTP cookie affinity, and stateful cookie-based affinity, have a TTL attribute.

A TTL of zero seconds means the load balancer does not assign an Expires attribute to the cookie. In this case, the client treats the cookie as a session cookie. The definition of a session varies depending on the client:

Some clients, like web browsers, retain the cookie for the entire browsing session. This means that the cookie persists across multiple requests until the application is closed.
Other clients treat a session as a single HTTP request, discarding the cookie immediately after.

Losing session affinity

All session affinity options require the following:

The selected backend instance or endpoint must remain configured as a backend. Session affinity can break when one of the following events occurs:
- You remove the selected instance from its instance group.
- Managed instance group autoscaling or autohealing removes the selected instance from its managed instance group.
- You remove the selected endpoint from its NEG.
- You remove the instance group or NEG that contains the selected instance or endpoint from the backend service.
The selected backend instance or endpoint must remain healthy. Session affinity can break when the selected instance or endpoint fails health checks.
For Global external Application Load Balancers and Classic Application Load Balancers, session affinity can break if a different first-layer Google Front End (GFE) is used for subsequent requests or connections after the change in routing path. A different first-layer GFE might be selected if the routing path from a client on the internet to Google changes between requests or connections.

Except for stateful cookie-based session affinity, all session affinity options have the following additional requirements:

The instance group or NEG that contains the selected instance or endpoint must not be full as defined by its target capacity. (For regional managed instance groups, the zonal component of the instance group that contains the selected instance must not be full.) Session affinity can break when the instance group or NEG is full and other instance groups or NEGs are not. Because fullness can change in unpredictable ways when using the UTILIZATION balancing mode, you should use the RATE or CONNECTION balancing mode to minimize situations when session affinity can break.
The total number of configured backend instances or endpoints must remain constant. When at least one of the following events occurs, the number of configured backend instances or endpoints changes, and session affinity can break:
- Adding new instances or endpoints:
  - You add instances to an existing instance group on the backend service.
  - Managed instance group autoscaling adds instances to a managed instance group on the backend service.
  - You add endpoints to an existing NEG on the backend service.
  - You add non-empty instance groups or NEGs to the backend service.
- Removing any instance or endpoint, not just the selected instance or endpoint:
  - You remove any instance from an instance group backend.
  - Managed instance group autoscaling or autohealing removes any instance from a managed instance group backend.
  - You remove any endpoint from a NEG backend.
  - You remove any existing, non-empty backend instance group or NEG from the backend service.
The total number of healthy backend instances or endpoints must remain constant. When at least one of the following events occurs, the number of healthy backend instances or endpoints changes, and session affinity can break:
- Any instance or endpoint passes its health check, transitioning from unhealthy to healthy.
- Any instance or endpoint fails its health check, transitioning from healthy to unhealthy or timeout.