Review service health and incidents

When your Google Kubernetes Engine (GKE) clusters or applications experience issues, it's crucial to quickly determine if the cause is internal or related to a wider Trusted Cloud by S3NS service disruption. Spending time on local debugging is inefficient if the root cause is a known platform incident.

Use this page to determine whether an issue with your GKE cluster is caused by a wider Trusted Cloud by S3NS service disruption. The following sources provide official status updates, personalized health events, and service incident insights:

  • Trusted Cloud by S3NS Service Health: status information for Trusted Cloud by S3NS services, by region.
  • Personalized Service Health: service disruptions relevant to your projects.
  • Service incident insights and recommendations: GKE clusters that are affected by an ongoing service incident.

This information is important for platform admins and operators, and for application developers, who are troubleshooting issues and need to understand whether those issues are linked to a broader Trusted Cloud by S3NS service health event. For more information about the common roles and example tasks that we reference in Trusted Cloud by S3NS content, see Common GKE user roles and tasks.

Review Trusted Cloud by S3NS service health

The Trusted Cloud by S3NS Service Health page provides status information about the services that are part of Trusted Cloud by S3NS.

To review incidents related to GKE, go to the Trusted Cloud by S3NS Service Health page.

Review Personalized Service Health

Personalized Service Health lets you identify Trusted Cloud by S3NS service disruptions that are relevant to your projects. These disruptions are called service health events, and information about them is available in the Trusted Cloud console and a variety of integration points.

To review incidents related to GKE that are relevant to your projects, view service health events in the Personalized Service Health dashboard in the Trusted Cloud console.

You can filter incidents by service, location, relevance, and status. The dashboard also provides incident details such as scope of impact, symptoms, workarounds, and resolution progress updates. To get started, see Quickstart: View service health events in the Trusted Cloud console.
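If you export service health events and want to apply the same kind of filtering outside the dashboard, the following Python sketch shows one way to do it. It is illustrative only. It assumes each event is a dictionary with the fields eventImpacts (each impact carrying product.productName and location.locationName), relevance, and state. These field names are modeled on the Service Health API's Event resource, but treat them as assumptions and check them against your own data. The function name filter_events is hypothetical.

```python
from typing import Iterable, List, Optional, Set


def filter_events(
    events: Iterable[dict],
    product: str,
    location: Optional[str] = None,
    relevance: Optional[Set[str]] = None,
    state: Optional[str] = None,
) -> List[dict]:
    """Keep events that match every filter that is not None.

    Assumed (not confirmed) field names: eventImpacts[].product.productName,
    eventImpacts[].location.locationName, relevance, and state.
    """
    matches = []
    for event in events:
        impacts = event.get("eventImpacts", [])
        # The product and location filters match if any single impact matches both.
        impact_ok = any(
            impact.get("product", {}).get("productName") == product
            and (
                location is None
                or impact.get("location", {}).get("locationName") == location
            )
            for impact in impacts
        )
        if not impact_ok:
            continue
        # Optional relevance filter, for example {"IMPACTED", "RELATED"}.
        if relevance is not None and event.get("relevance") not in relevance:
            continue
        # Optional status filter, for example "ACTIVE".
        if state is not None and event.get("state") != state:
            continue
        matches.append(event)
    return matches
```

For example, filter_events(events, "Google Kubernetes Engine", state="ACTIVE") keeps only active events that list GKE among their impacts. The product string is an assumed example value.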

Review service incident insights and recommendations

Service incident insights and recommendations let you identify GKE clusters that are impacted by an ongoing service incident.

To get service incident insights, view insights and recommendations for the GKE_RELIABILITY_INCIDENT subtype. You can get insights by using the Trusted Cloud console, the Google Cloud CLI, or the Recommender API. For more information, see View insights and recommendations.

Insights and recommendations include the following information:

  • Impacted cluster: a cluster that's impacted by the incident.
  • Incident name: an incident identifier for reference when you communicate with Cloud Customer Care.
  • Incident description: information about the incident from the incident response team.
  • Last effective time: the last time that information about the incident was updated.
  • Mitigation action: the mitigation that the incident response team recommends, if available.
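If you retrieve insights as JSON, for example by running gcloud recommender insights list with --format=json, you can post-process them to list the impacted clusters. The following Python sketch is an illustration under several assumptions. It assumes the input is a JSON array of Recommender Insight objects that use the camelCase fields insightSubtype, targetResources, description, lastRefreshTime, and name. It also assumes that lastRefreshTime corresponds to the "last effective time" shown in the console. The sketch doesn't parse the insight's content field, so it doesn't extract the incident name or mitigation action, because this page doesn't describe that field's layout. The function name summarize_incident_insights is hypothetical.

```python
import json
from typing import Any, Dict, List

# Insight subtype named on this page for GKE service incident insights.
INCIDENT_SUBTYPE = "GKE_RELIABILITY_INCIDENT"


def summarize_incident_insights(raw_json: str) -> List[Dict[str, Any]]:
    """Return one summary per GKE service incident insight.

    Assumes raw_json is a JSON array of insight objects with camelCase
    field names; insights of any other subtype are skipped.
    """
    summaries = []
    for insight in json.loads(raw_json):
        if insight.get("insightSubtype") != INCIDENT_SUBTYPE:
            continue
        summaries.append(
            {
                # Resource names of the impacted cluster(s).
                "clusters": list(insight.get("targetResources", [])),
                "description": insight.get("description", ""),
                # Assumed to match the console's "last effective time".
                "last_refresh_time": insight.get("lastRefreshTime"),
                "insight_name": insight.get("name"),
            }
        )
    return summaries
```

Filtering on the subtype locally keeps the sketch independent of server-side filter syntax. If your command already filters by subtype, the check simply passes every record through.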

The service incident insight remains visible until the Trusted Cloud by S3NS incident response team mitigates the incident and determines that the insight is no longer relevant. Expect a delay between when the incident is mitigated and no longer affects your resources, and when the insight is removed. If you implemented a workaround and no longer want to see the insight, you can dismiss it.

What's next