Feature preprocessing overview
Feature preprocessing is one of the most important steps in the machine
learning lifecycle. It consists of creating features and cleaning the training
data. Creating features is also referred to as feature engineering.
BigQuery ML provides the following feature preprocessing techniques:
Automatic preprocessing. BigQuery ML performs automatic preprocessing during training. For more information, see Automatic feature preprocessing.
Manual preprocessing. You can use the TRANSFORM clause in the CREATE MODEL statement to define custom preprocessing using manual preprocessing functions. You can also use these functions outside of the TRANSFORM clause to process training data before creating the model.
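As a rough illustration of manual preprocessing, the following sketch defines a TRANSFORM clause in a CREATE MODEL statement and then calls one of the same functions in an ordinary query. The dataset name `mydataset`, the model name `penguin_model`, the choice of the public `bigquery-public-data.ml_datasets.penguins` table, and the particular functions (`ML.STANDARD_SCALER`, `ML.QUANTILE_BUCKETIZE`) are assumptions picked for the example, not something this page prescribes.

```sql
-- Hypothetical names: `mydataset.penguin_model` is a placeholder.
CREATE OR REPLACE MODEL `mydataset.penguin_model`
  TRANSFORM(
    -- Standardize a numerical feature (zero mean, unit variance).
    ML.STANDARD_SCALER(flipper_length_mm) OVER () AS flipper_length_scaled,
    -- Split a numerical feature into 4 quantile-based buckets.
    ML.QUANTILE_BUCKETIZE(culmen_length_mm, 4) OVER () AS culmen_length_bucket,
    -- Keep a categorical feature as-is in the TRANSFORM output.
    species,
    -- The label column must also appear in the TRANSFORM output.
    body_mass_g
  )
  OPTIONS (
    model_type = 'LINEAR_REG',
    input_label_cols = ['body_mass_g']
  )
AS
SELECT
  species,
  flipper_length_mm,
  culmen_length_mm,
  body_mass_g
FROM
  `bigquery-public-data.ml_datasets.penguins`
WHERE
  body_mass_g IS NOT NULL;

-- The same functions can also be called outside of TRANSFORM, for example
-- to inspect or materialize preprocessed training data before creating a model.
SELECT
  species,
  ML.STANDARD_SCALER(flipper_length_mm) OVER () AS flipper_length_scaled
FROM
  `bigquery-public-data.ml_datasets.penguins`;
```

Omitting the TRANSFORM clause from the first statement would leave preprocessing to the automatic behavior described above.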
You can use the ML.FEATURE_INFO function to retrieve the statistics of all input feature columns.
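A minimal sketch of such a call is shown here. It assumes a model has already been trained, and the name `mydataset.penguin_model` is a hypothetical placeholder. The result is expected to contain one row per input feature column, with summary statistics such as minimum, maximum, mean, standard deviation, category count, and null count.

```sql
-- Hypothetical model name; replace with an existing, trained model.
SELECT
  *
FROM
  ML.FEATURE_INFO(MODEL `mydataset.penguin_model`);
```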
Recommended knowledge
By using the default settings in the CREATE MODEL statements and the
inference functions, you can create and use BigQuery ML models
even without much ML knowledge. However, having basic knowledge about the
ML development lifecycle, such as feature engineering and model training,
helps you optimize both your data and your model to
deliver better results. We recommend using the following resources to develop
familiarity with ML techniques and processes:
What's next
Learn about feature serving in BigQuery ML.
Last updated 2025-08-07 UTC.