Clustering overview
Clustering is an unsupervised machine learning technique you can use to group
similar records together. It is useful when you want to understand what groups
or clusters exist in your data but don't have labeled data to train a model on.
For example, if you had unlabeled data about subway ticket purchases, you could
cluster that data by ticket purchase time to better understand which time
periods have the heaviest subway usage. For more information, see
What is clustering?
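To make the subway example concrete, the following minimal sketch groups purchase times with a one-dimensional k-means loop. The purchase hours, the starting centroids, and the kmeans_1d helper are all hypothetical and chosen only for illustration; this is not how any particular product implements clustering.

```python
def kmeans_1d(values, initial_centroids, max_iterations=20):
    """Group 1-D values around centroids, starting from caller-supplied centroids."""
    centroids = list(initial_centroids)
    groups = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: put each value in the group of its nearest centroid.
        groups = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            groups[nearest].append(value)
        # Update step: move each centroid to the mean of its group.
        # An empty group keeps its previous centroid.
        updated = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centroids)]
        if updated == centroids:
            break  # Centroids stopped moving, so the grouping is stable.
        centroids = updated
    return centroids, groups


# Hypothetical, unlabeled data: hour of day for each ticket purchase.
purchase_hours = [7.5, 8.0, 8.25, 8.5, 9.0, 12.0, 12.5, 13.0,
                  17.0, 17.5, 18.0, 18.25, 18.5]

centroids, groups = kmeans_1d(purchase_hours, initial_centroids=[6.0, 12.0, 20.0])
for center, group in zip(centroids, groups):
    print(f"around hour {center:.2f}: {len(group)} purchases")
```

With these made-up values, the purchases separate into a morning group, a midday group, and an evening group. The relative group sizes suggest which periods are busiest.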
K-means models are widely used to perform clustering. You can use k-means
models with the ML.PREDICT function to cluster data, or with the
ML.DETECT_ANOMALIES function to perform anomaly detection.
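Conceptually, clustering a record with a trained k-means model means finding its nearest centroid, and one simple way to use the same model for anomaly detection is to treat records that are far from every centroid as unusual. The sketch below illustrates that idea only. It is not the implementation of ML.PREDICT or ML.DETECT_ANOMALIES, and the centroids, points, and max_distance threshold are hypothetical.

```python
import math


def nearest_centroid(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]


def flag_anomalies(points, centroids, max_distance):
    """Mark a point as anomalous when its nearest centroid is farther than max_distance."""
    return [nearest_centroid(p, centroids)[1] > max_distance for p in points]


# Hypothetical centroids from an already-trained model, and new points to score.
centroids = [(1.0, 1.0), (8.0, 8.0)]
points = [(1.0, 2.0), (7.0, 8.0), (4.0, 12.0)]

assignments = [nearest_centroid(p, centroids)[0] for p in points]
anomalies = flag_anomalies(points, centroids, max_distance=3.0)
print(assignments)
print(anomalies)
```

Here the first two points sit close to a centroid, while the third is assigned to the second cluster but lies well outside the chosen distance threshold, so only the third is flagged.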
K-means models use centroid-based clustering to organize data into clusters.
To get information about a k-means model's centroids, you can use the
ML.CENTROIDS function.
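In centroid-based clustering, each cluster is summarized by a centroid: the per-feature average of the records assigned to it. As a rough illustration of what centroid information looks like, the sketch below computes those averages for a few already-assigned rows. The feature names, rows, and dictionary layout are hypothetical and do not represent the output schema of ML.CENTROIDS.

```python
from collections import defaultdict


def centroids_by_cluster(rows, feature_names):
    """Average each feature over the rows assigned to each cluster."""
    sums = defaultdict(lambda: [0.0] * len(feature_names))
    counts = defaultdict(int)
    for cluster_id, features in rows:
        counts[cluster_id] += 1
        for i, value in enumerate(features):
            sums[cluster_id][i] += value
    return {
        cluster_id: {
            name: total / counts[cluster_id]
            for name, total in zip(feature_names, sums[cluster_id])
        }
        for cluster_id in sorted(counts)
    }


# Hypothetical rows: (cluster id, (purchase hour, tickets bought)).
rows = [
    (1, (8.0, 2.0)),
    (1, (9.0, 4.0)),
    (2, (18.0, 3.0)),
    (2, (17.0, 5.0)),
    (2, (19.0, 4.0)),
]

result = centroids_by_cluster(rows, ["purchase_hour", "tickets"])
for cluster_id, centroid in result.items():
    print(cluster_id, centroid)
```

Reading centroids this way helps you interpret a cluster: each centroid describes the typical record in its group.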
Recommended knowledge
By using the default settings in the CREATE MODEL statements and the inference
functions, you can create and use a clustering model even without much ML
knowledge. However, having basic knowledge about ML development, and clustering
models in particular, helps you optimize both your data and your model to
deliver better results. We recommend using the following resources to develop
familiarity with ML techniques and processes:
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-07-02 UTC.