Classification overview
A common use case for machine learning is classifying new data by using a model
trained on similar labeled data. For example, you might want to predict whether
an email is spam, or whether a customer product review is positive, negative, or
neutral.
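You can use any of the following models in combination with the ML.PREDICT function to perform classification:
- Logistic regression models: use logistic regression by setting the MODEL_TYPE option to LOGISTIC_REG.
- Boosted tree models: use a gradient boosted decision tree by setting the MODEL_TYPE option to BOOSTED_TREE_CLASSIFIER.
- Random forest models: use a random forest by setting the MODEL_TYPE option to RANDOM_FOREST_CLASSIFIER.
- Deep neural network (DNN) models: use a neural network by setting the MODEL_TYPE option to DNN_CLASSIFIER.
- Wide & Deep models: use wide & deep learning by setting the MODEL_TYPE option to DNN_LINEAR_COMBINED_CLASSIFIER.
- AutoML models: use an AutoML classification model by setting the MODEL_TYPE option to AUTOML_CLASSIFIER.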
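The classification model families and the MODEL_TYPE value that selects each one can be captured in a small lookup table. The following Python sketch is illustrative only. It just assembles SQL text and does not run anything. The helper name create_model_sql, the dictionary name, and the model, table, and label column names are hypothetical. Only MODEL_TYPE and INPUT_LABEL_COLS are set, so every other option keeps its default; some model types may need additional options in practice.

```python
# Hypothetical lookup table: classification model family -> MODEL_TYPE value
# that selects it in a BigQuery ML CREATE MODEL statement.
CLASSIFIER_MODEL_TYPES = {
    "logistic regression": "LOGISTIC_REG",
    "boosted tree": "BOOSTED_TREE_CLASSIFIER",
    "random forest": "RANDOM_FOREST_CLASSIFIER",
    "deep neural network": "DNN_CLASSIFIER",
    "wide & deep": "DNN_LINEAR_COMBINED_CLASSIFIER",
    "automl": "AUTOML_CLASSIFIER",
}


def create_model_sql(family: str, model: str, label_col: str, training_table: str) -> str:
    """Build a CREATE MODEL statement for one classification model family.

    `model`, `label_col`, and `training_table` are placeholder names. Only
    MODEL_TYPE and INPUT_LABEL_COLS are set; all other options use defaults.
    """
    try:
        model_type = CLASSIFIER_MODEL_TYPES[family.lower()]
    except KeyError:
        raise ValueError(f"unknown classification model family: {family!r}") from None
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (MODEL_TYPE = '{model_type}', INPUT_LABEL_COLS = ['{label_col}']) AS\n"
        f"SELECT * FROM `{training_table}`"
    )


# Example: a random forest that predicts review sentiment (hypothetical names).
print(create_model_sql("Random forest", "mydataset.review_model",
                       "sentiment", "mydataset.labeled_reviews"))
```

Switching to a different model family only changes the MODEL_TYPE value. The rest of the statement stays the same, which is why a lookup table is a convenient way to compare families on the same training data.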
Recommended knowledge
By using the default settings in the CREATE MODEL statements and the ML.PREDICT function, you can create and use a classification model even without much ML knowledge. However, a basic understanding of ML development helps you optimize both your data and your model for better results. We recommend the following resources for developing familiarity with ML techniques and processes:
- Machine Learning Crash Course (Google)
- Intro to Machine Learning (Kaggle)
- Intermediate Machine Learning (Kaggle)
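To illustrate relying on the default settings, the following Python sketch builds the two statements for the spam example from the introduction. The first is a CREATE MODEL statement that sets only the model type and the label column. The second is an ML.PREDICT query that applies the trained model to new, unlabeled rows. The sketch only assembles SQL text. You would still run the statements yourself, for example in the BigQuery console. The function name spam_classifier_sql and all model, table, and column names are hypothetical.

```python
def spam_classifier_sql(model: str, label_col: str,
                        training_table: str, new_data_table: str) -> tuple[str, str]:
    """Return (train_sql, predict_sql) for a default-settings logistic regression classifier.

    All names are placeholders; only MODEL_TYPE and INPUT_LABEL_COLS are set.
    """
    # Train on labeled rows; the selected columns other than the label are used as features.
    train_sql = (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (MODEL_TYPE = 'LOGISTIC_REG', INPUT_LABEL_COLS = ['{label_col}']) AS\n"
        f"SELECT * FROM `{training_table}`"
    )
    # Classify new rows by passing a query over them to ML.PREDICT with the trained model.
    predict_sql = (
        f"SELECT * FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT * FROM `{new_data_table}`))"
    )
    return train_sql, predict_sql


train_sql, predict_sql = spam_classifier_sql(
    "mydataset.spam_model", "is_spam", "mydataset.labeled_emails", "mydataset.new_emails")
print(train_sql)
print(predict_sql)
```

Because no other options are set, training uses the defaults for LOGISTIC_REG. Tuning those options is where the ML background from the resources listed above becomes useful.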
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-25 UTC.