Reference documentation and code samples for the BigQuery API class Google::Cloud::Bigquery::Model.
Model
A model in BigQuery ML represents what an ML system has learned from the training data.
The following types of models are supported by BigQuery ML:
- Linear regression for forecasting; for example, the sales of an item on a given day. Labels are real-valued (they cannot be +/- infinity or NaN).
- Binary logistic regression for classification; for example, determining whether a customer will make a purchase. Labels must only have two possible values.
- Multiclass logistic regression for classification. These models can be used to predict multiple possible values such as whether an input is "low-value," "medium-value," or "high-value." Labels can have up to 50 unique values. In BigQuery ML, multiclass logistic regression training uses a multinomial classifier with a cross entropy loss function.
- K-means clustering for data segmentation (beta); for example, identifying customer segments. K-means is an unsupervised learning technique, so model training does not require labels or a split of the data into training and evaluation sets.
In BigQuery ML, a model can be used with data from multiple BigQuery datasets for training and for prediction.
Inherits
- Object
Example
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"
Methods
#created_at
def created_at() -> Time, nil
The time when this model was created.
- (Time, nil) — The creation time, or nil if the object is a reference (see #reference?).
#dataset_id
def dataset_id() -> String
The ID of the Dataset containing this model.
- (String) — The ID must contain only letters ([A-Za-z]), numbers ([0-9]), or underscores (_). The maximum length is 1,024 characters.
#delete
def delete() -> Boolean
Permanently deletes the model.
- (Boolean) — Returns true if the model was deleted.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.delete
#description
def description() -> String, nil
A user-friendly description of the model.
- (String, nil) — The description, or nil if the object is a reference (see #reference?).
#description=
def description=(new_description)
Updates the user-friendly description of the model.
If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
- new_description (String) — The new user-friendly description.
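For example, following the same setup used by the other examples on this page (the description string is illustrative):

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.description = "A model for forecasting daily sales"
```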
#encryption
def encryption() -> EncryptionConfiguration, nil
The EncryptionConfiguration object that represents the custom encryption method used to protect this model. If not set, Dataset#default_encryption is used.
Present only if this model is using custom encryption.
- (EncryptionConfiguration, nil) — The encryption configuration.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

encrypt_config = model.encryption
#encryption=
def encryption=(value)
Sets the EncryptionConfiguration object that represents the custom encryption method used to protect this model. If not set, Dataset#default_encryption is used.
Present only if this model is using custom encryption.
If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
- value (EncryptionConfiguration) — The new encryption config.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name
model.encryption = encrypt_config
#etag
def etag() -> String, nil
The ETag hash of the model.
- (String, nil) — The ETag hash, or nil if the object is a reference (see #reference?).
#exists?
def exists?(force: false) -> Boolean
Determines whether the model exists in the BigQuery service. The result is cached locally. To refresh state, set force to true.
- force (Boolean) (defaults to: false) — Force the latest resource representation to be retrieved from the BigQuery service when true. Otherwise the return value of this method will be memoized to reduce the number of API calls made to the BigQuery service. The default is false.
- (Boolean) — true when the model exists in the BigQuery service, false otherwise.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.exists? #=> true
#expires_at
def expires_at() -> Time, nil
The time when this model expires. If not present, the model will persist indefinitely. Expired models will be deleted and their storage reclaimed.
- (Time, nil) — The expiration time, or nil if not present or the object is a reference (see #reference?).
#expires_at=
def expires_at=(new_expires_at)
Updates the time when this model expires.
If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
- new_expires_at (Integer) — The new time when this model expires.
#extract
def extract(extract_url, format: nil, &block) { |job| ... } -> Boolean
Exports the model to Google Cloud Storage using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #extract_job.
The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method. If the model is a full resource representation (see #resource_full?), the location of the job will automatically be set to the location of the model.
- extract_url (String) — The Google Storage URI to which BigQuery should extract the model. This value should end in an object name prefix, since multiple objects will be exported.
- format (String) (defaults to: nil) — The exported file format. The default value is ml_tf_saved_model. The following values are supported:
  - ml_tf_saved_model - TensorFlow SavedModel
  - ml_xgboost_booster - XGBoost Booster
- (job) — a job configuration object
- job (Google::Cloud::Bigquery::ExtractJob::Updater) — a job configuration object for setting additional options.
- (Boolean) — Returns true if the extract operation succeeded.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.extract "gs://my-bucket/#{model.model_id}"
#extract_job
def extract_job(extract_url, format: nil, job_id: nil, prefix: nil, labels: nil) { |job| ... } -> Google::Cloud::Bigquery::ExtractJob
Exports the model to Google Cloud Storage asynchronously, immediately returning an ExtractJob that can be used to track the progress of the export job. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See also #extract.
The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method. If the model is a full resource representation (see #resource_full?), the location of the job will automatically be set to the location of the model.
- extract_url (String) — The Google Storage URI to which BigQuery should extract the model. This value should end in an object name prefix, since multiple objects will be exported.
- format (String) (defaults to: nil) — The exported file format. The default value is ml_tf_saved_model. The following values are supported:
  - ml_tf_saved_model - TensorFlow SavedModel
  - ml_xgboost_booster - XGBoost Booster
- job_id (String) (defaults to: nil) — A user-defined ID for the extract job. The ID must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used. See Generating a job ID.
- prefix (String) (defaults to: nil) — A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.
- labels (Hash) (defaults to: nil) — A hash of user-provided labels associated with the job. You can use these to organize and group your jobs.
The labels applied to a resource must meet the following requirements:
- Each resource can have multiple labels, up to a maximum of 64.
- Each label must be a key-value pair.
- Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
- Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
- The key portion of a label must be unique. However, you can use the same key with multiple resources.
- Keys must start with a lowercase letter or international character.
- (job) — a job configuration object
- job (Google::Cloud::Bigquery::ExtractJob::Updater) — a job configuration object for setting additional options.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

extract_job = model.extract_job "gs://my-bucket/#{model.model_id}"
extract_job.wait_until_done!
extract_job.done? #=> true
#feature_columns
def feature_columns() -> Array<StandardSql::Field>
The input feature columns that were used to train this model.
- (Array<StandardSql::Field>)
#label_columns
def label_columns() -> Array<StandardSql::Field>
The label columns that were used to train this model. The output columns of the model are these column names prefixed with "predicted_".
- (Array<StandardSql::Field>)
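For example, to list the column names used in training (a sketch following the setup used by the other examples on this page; StandardSql::Field exposes the column name via name):

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.feature_columns.each do |field|
  puts "feature: #{field.name}"
end
model.label_columns.each do |field|
  puts "label: #{field.name}"
end
```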
#labels
def labels() -> Hash<String, String>, nil
A hash of user-provided labels associated with this model. Labels are used to organize and group models. See Using Labels.
The returned hash is frozen and changes are not allowed. Use #labels= to replace the entire hash.
- (Hash<String, String>, nil) — A hash containing key/value pairs.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

labels = model.labels
#labels=
def labels=(new_labels)
Updates the hash of user-provided labels associated with this model. Labels are used to organize and group models. See Using Labels.
If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
- new_labels (Hash<String, String>) — A hash containing key/value pairs. The labels applied to a resource must meet the following requirements:
- Each resource can have multiple labels, up to a maximum of 64.
- Each label must be a key-value pair.
- Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
- Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
- The key portion of a label must be unique. However, you can use the same key with multiple resources.
- Keys must start with a lowercase letter or international character.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.labels = { "env" => "production" }
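The label requirements listed above can be checked client-side before assignment. The helper below is a hypothetical sketch, not part of the google-cloud-bigquery gem; it approximates "lowercase letter or international character" with the Unicode lowercase-letter class \p{Ll}:

```ruby
# Hypothetical client-side check (not part of the gem) for the label
# requirements listed above.
LABEL_KEY_RE   = /\A\p{Ll}[\p{Ll}0-9_\-]{0,62}\z/ # 1-63 chars, lowercase start
LABEL_VALUE_RE = /\A[\p{Ll}0-9_\-]{0,63}\z/       # 0-63 chars, may be empty

def valid_labels? labels
  return false if labels.size > 64 # at most 64 labels per resource
  labels.all? do |key, value|
    key.to_s.match?(LABEL_KEY_RE) && value.to_s.match?(LABEL_VALUE_RE)
  end
end

valid_labels?("env" => "production") # => true
valid_labels?("Env" => "production") # => false: key must start lowercase
```

If the hash passes, assign it with #labels= as usual.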
#location
def location() -> String, nil
The geographic location where the model should reside. Possible values include EU and US. The default value is US.
- (String, nil) — The location code.
#model_id
def model_id() -> String
A unique ID for this model.
- (String) — The ID must contain only letters ([A-Za-z]), numbers ([0-9]), or underscores (_). The maximum length is 1,024 characters.
#model_type
def model_type() -> String, nil
Type of the model resource. Expected to be one of the following:
- LINEAR_REGRESSION - Linear regression model.
- LOGISTIC_REGRESSION - Logistic regression based classification model.
- KMEANS - K-means clustering model (beta).
- TENSORFLOW - An imported TensorFlow model (beta).
- (String, nil) — The model type, or nil if the object is a reference (see #reference?).
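For example, to branch on the trained model's type (following the setup used by the other examples on this page):

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

case model.model_type
when "LINEAR_REGRESSION", "LOGISTIC_REGRESSION"
  puts "regression-style model"
when "KMEANS"
  puts "clustering model"
when "TENSORFLOW"
  puts "imported TensorFlow model"
end
```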
#modified_at
def modified_at() -> Time, nil
The date when this model was last modified.
- (Time, nil) — The last modified time, or nil if not present or the object is a reference (see #reference?).
#name
def name() -> String, nil
The name of the model.
- (String, nil) — The friendly name, or nil if the object is a reference (see #reference?).
#name=
def name=(new_name)
Updates the name of the model.
If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
- new_name (String) — The new friendly name.
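For example (the friendly name is illustrative, following the setup used by the other examples on this page):

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.name = "Daily sales forecast"
```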
#project_id
def project_id() -> String
The ID of the Project containing this model.
- (String) — The project ID.
#reference?
def reference?() -> Boolean
Whether the model was created without retrieving the resource representation from the BigQuery service.
- (Boolean) — true when the model is just a local reference object, false otherwise.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.reference? #=> true
model.reload!
model.reference? #=> false
#refresh!
def refresh!() -> Google::Cloud::Bigquery::Model
Reloads the model with current data from the BigQuery service.
- (Google::Cloud::Bigquery::Model) — Returns the reloaded model.
Skip retrieving the model from the service, then load it:
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.reference? #=> true
model.reload!
model.resource? #=> true
#reload!
def reload!() -> Google::Cloud::Bigquery::Model
Reloads the model with current data from the BigQuery service.
- (Google::Cloud::Bigquery::Model) — Returns the reloaded model.
Skip retrieving the model from the service, then load it:
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.reference? #=> true
model.reload!
model.resource? #=> true
#resource?
def resource?() -> Boolean
Whether the model was created with a resource representation from the BigQuery service.
- (Boolean) — true when the model was created with a resource representation, false otherwise.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.resource? #=> false
model.reload!
model.resource? #=> true
#resource_full?
def resource_full?() -> Boolean
Whether the model was created with a full resource representation from the BigQuery service.
- (Boolean) — true when the model was created with a full resource representation, false otherwise.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.resource_full? #=> true
#resource_partial?
def resource_partial?() -> Boolean
Whether the model was created with a partial resource representation from the BigQuery service by retrieval through Dataset#models. See Models: list response for the contents of the partial representation. Accessing any attribute outside of the partial representation will result in loading the full representation.
- (Boolean) — true when the model was created with a partial resource representation, false otherwise.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.models.first

model.resource_partial? #=> true
model.description # Loads the full resource.
model.resource_partial? #=> false
#training_runs
def training_runs() -> Array<Google::Cloud::Bigquery::Model::TrainingRun>
Information for all training runs, in increasing order of startTime.
- (Array<Google::Cloud::Bigquery::Model::TrainingRun>)
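A sketch iterating the runs; the start_time accessor is assumed here from the REST resource's startTime field, so check the TrainingRun reference for the exact attribute names:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.training_runs.each_with_index do |run, i|
  puts "run #{i}: started #{run.start_time}"
end
```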