Anomaly detection overview
Anomaly detection is a data mining technique that you can use to identify data
deviations in a given dataset. For example, if the return rate for a given
product increases substantially from the baseline for that product, that might
indicate a product defect or potential fraud. You can use anomaly detection to
detect critical incidents, such as technical issues, or opportunities, such as
changes in consumer behavior.
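To make the return-rate example concrete, the following Python sketch flags recent return rates that sit far above a baseline. It uses a simple z-score rule, which is an illustrative stand-in and not the method used by any of the models mentioned on this page. The sample rates, the 3.0 threshold, and the function name are all assumptions.

```python
from statistics import mean, stdev


def flag_high_return_rates(baseline, recent, z_threshold=3.0):
    """Return (index, rate, z_score) for recent rates far above the baseline.

    Assumption: the baseline rates are roughly stable, so their mean and
    sample standard deviation describe "normal" behavior.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs >= 2 points
    if sigma == 0:
        raise ValueError("baseline has no variation; z-scores are undefined")
    flagged = []
    for i, rate in enumerate(recent):
        z = (rate - mu) / sigma
        if z > z_threshold:  # only unusually high return rates are flagged
            flagged.append((i, rate, round(z, 2)))
    return flagged


# Hypothetical daily return rates (fraction of units returned).
baseline_rates = [0.020, 0.022, 0.019, 0.021, 0.020, 0.023, 0.018]
recent_rates = [0.021, 0.024, 0.065]

anomalies = flag_high_return_rates(baseline_rates, recent_rates)
print(anomalies)
```

With this hypothetical data, only the last recent rate is far enough above the baseline to be flagged. A fixed threshold like this is easy to reason about but ignores trend and seasonality, which is one reason to prefer a trained model for real data.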
One challenge when you use anomaly detection is determining what counts as
anomalous data. If you have labeled data that identifies anomalies, you can
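perform anomaly detection by using the ML.PREDICT function with one of the
following supervised machine learning models:
- Linear and logistic regression models
- Boosted trees models
- Random forest models
- Deep neural network (DNN) models
- Wide & Deep models
- AutoML models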
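The supervised path might look roughly like the following Python sketch. It only assembles GoogleSQL text and doesn't call BigQuery, so submitting the statements still requires the console or a BigQuery client. The dataset, table, and column names (mydataset, labeled_returns, is_anomaly, and so on) are hypothetical, and logistic regression is used here only as one possible choice among the supported model types.

```python
def create_classifier_sql(model, training_table, label_col):
    """Build a CREATE MODEL statement for a logistic regression classifier.

    Assumption: `training_table` contains feature columns plus a label
    column (`label_col`) that marks which rows are anomalies.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'LOGISTIC_REG',\n"
        f"  input_label_cols = ['{label_col}']\n"
        ") AS\n"
        f"SELECT * FROM `{training_table}`"
    )


def predict_sql(model, serving_table):
    """Build an ML.PREDICT query that scores new rows with the trained model."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`, TABLE `{serving_table}`)"
    )


# Hypothetical names, for illustration only.
train_stmt = create_classifier_sql(
    "mydataset.return_classifier", "mydataset.labeled_returns", "is_anomaly"
)
score_stmt = predict_sql("mydataset.return_classifier", "mydataset.new_returns")
print(train_stmt)
print(score_stmt)
```

For a classifier, the ML.PREDICT output contains the input columns plus a predicted_<label> column, so with the hypothetical label above the prediction would appear as predicted_is_anomaly.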
If you aren't certain what counts as anomalous data, or you don't have labeled
data to train a model on, you can use unsupervised machine learning to perform
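anomaly detection. Use the ML.DETECT_ANOMALIES function with one of the
following models to detect anomalies in training data or new serving data:
- For time series data: ARIMA_PLUS and ARIMA_PLUS_XREG models
- For independent and identically distributed (IID) data: K-means,
  autoencoder, and principal component analysis (PCA) models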
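As a rough sketch of the unsupervised path, the following Python code assembles a CREATE MODEL statement for a K-means model and an ML.DETECT_ANOMALIES query over new serving data. It only builds GoogleSQL text and doesn't run anything against BigQuery. The dataset and table names, the cluster count, and the 0.02 contamination value are all hypothetical choices made for illustration.

```python
def create_kmeans_sql(model, training_table, num_clusters=4):
    """Build a CREATE MODEL statement for an unlabeled K-means model."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'KMEANS', num_clusters = {num_clusters}) AS\n"
        f"SELECT * FROM `{training_table}`"
    )


def detect_anomalies_sql(model, serving_table, contamination=0.02):
    """Build an ML.DETECT_ANOMALIES query over new serving data.

    `contamination` is the assumed proportion of anomalies in the data;
    the 0.02 default here is an arbitrary illustrative value.
    """
    return (
        "SELECT *\n"
        "FROM ML.DETECT_ANOMALIES(\n"
        f"  MODEL `{model}`,\n"
        f"  STRUCT({contamination} AS contamination),\n"
        f"  TABLE `{serving_table}`)"
    )


# Hypothetical names, for illustration only.
create_stmt = create_kmeans_sql("mydataset.returns_kmeans", "mydataset.returns")
detect_stmt = detect_anomalies_sql(
    "mydataset.returns_kmeans", "mydataset.new_returns"
)
print(create_stmt)
print(detect_stmt)
```

The query result includes an is_anomaly column for each input row. For time series models, the settings struct takes an anomaly_prob_threshold value in place of contamination; check the ML.DETECT_ANOMALIES reference for the arguments that apply to each model type.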
Recommended knowledge
By using the default settings in the CREATE MODEL statements and the inference
functions, you can create and use an anomaly detection
model even without much ML knowledge. However, having basic knowledge about
ML development helps you optimize both your data and your model to
deliver better results. We recommend using the following resources to develop
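familiarity with ML techniques and processes:
- Machine Learning Crash Course
  (https://developers.google.com/machine-learning/crash-course)
- Intro to Machine Learning
  (https://www.kaggle.com/learn/intro-to-machine-learning)
- Intermediate Machine Learning
  (https://www.kaggle.com/learn/intermediate-machine-learning)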
Last updated 2025-08-25 UTC.