Reference documentation and code samples for the BigQuery API class Google::Cloud::Bigquery::LoadJob.
LoadJob
A Job subclass representing a load operation that may be performed on a Table. A LoadJob instance is created when you call Table#load_job.
Inherits: Job
Example
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gcs_uri = "gs://my-bucket/file-name.csv" load_job = dataset.load_job "my_new_table", gcs_uri do |schema| schema.string "first_name", mode: :required schema.record "cities_lived", mode: :repeated do |nested_schema| nested_schema.string "place", mode: :required nested_schema.integer "number_of_years", mode: :required end end load_job.wait_until_done! load_job.done? #=> true
Methods
#allow_jagged_rows?
def allow_jagged_rows?() -> Boolean
Checks if the load operation accepts rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an error is returned. The default value is false. Only applicable to CSV; ignored for other formats.
- (Boolean) — true when jagged rows are allowed, false otherwise.
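For illustration, a minimal sketch of enabling this option when starting a load, assuming the jagged_rows keyword argument on Dataset#load_job; the bucket and table names are placeholders:

gcs_uri = "gs://my-bucket/file-name.csv"
load_job = dataset.load_job "my_table", gcs_uri, jagged_rows: true

load_job.wait_until_done!
load_job.allow_jagged_rows? #=> true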
#autodetect?
def autodetect?() -> Boolean
Checks if BigQuery should automatically infer the options and schema for CSV and JSON sources. The default is false.
- (Boolean) — true when autodetect is enabled, false otherwise.
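A sketch of loading with schema autodetection, assuming the autodetect and format keyword arguments on Dataset#load_job; names are placeholders:

load_job = dataset.load_job "my_table", "gs://my-bucket/data.json",
                            format: "json", autodetect: true

load_job.wait_until_done!
load_job.autodetect? #=> true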
#backup?
def backup?() -> Boolean
Checks if the source data is a Google Cloud Datastore backup.
- (Boolean) — true when the source format is DATASTORE_BACKUP, false otherwise.
#clustering?
def clustering?() -> Boolean
Checks if the destination table will be clustered.
See Updater#clustering_fields=, Table#clustering_fields and Table#clustering_fields=.
- (Boolean) — true when the table will be clustered, false otherwise.
#clustering_fields
def clustering_fields() -> Array<String>, nil
One or more fields on which the destination table should be clustered. When used with time-based partitioning, data in the table is first partitioned and subsequently clustered. The order of the returned fields determines the sort order of the data.
BigQuery supports clustering for both partitioned and non-partitioned tables.
See Updater#clustering_fields=, Table#clustering_fields and Table#clustering_fields=.
- (Array<String>, nil) — The clustering fields, or nil if the destination table will not be clustered.
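For illustration, a sketch of configuring clustering (together with time partitioning) through the updater yielded by Dataset#load_job; the table and field names are placeholders:

gcs_uri = "gs://my-bucket/file-name.csv"
load_job = dataset.load_job "my_table", gcs_uri do |job|
  job.time_partitioning_type  = "DAY"
  job.time_partitioning_field = "dob"
  job.schema do |schema|
    schema.timestamp "dob", mode: :required
    schema.string "first_name", mode: :required
    schema.string "last_name", mode: :required
  end
  job.clustering_fields = ["last_name", "first_name"]
end

load_job.wait_until_done!
load_job.clustering?       #=> true
load_job.clustering_fields #=> ["last_name", "first_name"]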
#csv?
def csv?() -> Boolean
Checks if the format of the source data is CSV. The default is true.
- (Boolean) — true when the source format is CSV, false otherwise.
#delimiter
def delimiter() -> String
The delimiter used between fields in the source data. The default is a comma (,).
- (String) — A string containing the character, such as ",".
#destination
def destination(view: nil) -> Table
The table into which the operation loads data. This is the table on which Table#load_job was invoked.
- view (String) (defaults to: nil) — Specifies the view that determines which table information is returned. By default, basic table information and storage statistics (STORAGE_STATS) are returned. Accepted values include :unspecified, :basic, :storage, and :full. For more information, see BigQuery Classes. The default value is the :unspecified view type.
- (Table) — A table instance.
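A small usage sketch, assuming a completed load job against the table from the class example:

table = load_job.destination
table.table_id #=> "my_new_table"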
#encryption
def encryption() -> Google::Cloud::BigQuery::EncryptionConfiguration
The encryption configuration of the destination table.
- (Google::Cloud::BigQuery::EncryptionConfiguration) — Custom encryption configuration (e.g., Cloud KMS keys).
#hive_partitioning?
def hive_partitioning?() -> Boolean
Checks if hive partitioning options are set.
- (Boolean) — true when hive partitioning options are set, false otherwise.
#hive_partitioning_mode
def hive_partitioning_mode() -> String, nil
The mode of hive partitioning to use when reading data. The following modes are supported:
- AUTO: automatically infer partition key name(s) and type(s).
- STRINGS: automatically infer partition key name(s). All types are interpreted as strings.
- CUSTOM: partition key schema is encoded in the source URI prefix.
- (String, nil) — The mode of hive partitioning, or nil if not set.
#hive_partitioning_source_uri_prefix
def hive_partitioning_source_uri_prefix() -> String, nil
The common prefix for all source URIs when hive partition detection is requested. The prefix must end immediately before the partition key encoding begins. For example, consider files following this data layout:
gs://bucket/path_to_table/dt=2019-01-01/country=BR/id=7/file.avro
gs://bucket/path_to_table/dt=2018-12-31/country=CA/id=3/file.avro
When hive partitioning is requested with either AUTO or STRINGS mode, the common prefix can be either gs://bucket/path_to_table or gs://bucket/path_to_table/ (the trailing slash does not matter).
- (String, nil) — The common prefix for all source URIs, or nil if not set.
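For illustration, a sketch of requesting hive partition detection through the updater, assuming Updater#format=, Updater#hive_partitioning_mode=, and Updater#hive_partitioning_source_uri_prefix=; the URIs are placeholders:

gcs_uri = "gs://my-bucket/hive-partitioned/*"
source_uri_prefix = "gs://my-bucket/hive-partitioned/"
load_job = dataset.load_job "my_table", gcs_uri do |job|
  job.format = :parquet
  job.hive_partitioning_mode = :auto
  job.hive_partitioning_source_uri_prefix = source_uri_prefix
end

load_job.wait_until_done!
load_job.hive_partitioning?    #=> true
load_job.hive_partitioning_mode #=> "AUTO"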
#ignore_unknown_values?
def ignore_unknown_values?() -> Boolean
Checks if the load operation allows extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned. The default is false.
- (Boolean) — true when unknown values are ignored, false otherwise.
#input_file_bytes
def input_file_bytes() -> Integer
The number of bytes of source data in the load job.
- (Integer) — The number of bytes.
#input_files
def input_files() -> Integer
The number of source data files in the load job.
- (Integer) — The number of source files.
#iso8859_1?
def iso8859_1?() -> Boolean
Checks if the character encoding of the data is ISO-8859-1.
- (Boolean) — true when the character encoding is ISO-8859-1, false otherwise.
#json?
def json?() -> Boolean
Checks if the format of the source data is newline-delimited JSON. The default is false.
- (Boolean) — true when the source format is NEWLINE_DELIMITED_JSON, false otherwise.
#max_bad_records
def max_bad_records() -> Integer
The maximum number of bad records that the load operation can ignore.
If the number of bad records exceeds this value, an error is returned.
The default value is 0, which requires that all records be valid.
- (Integer) — The maximum number of bad records.
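A sketch combining this setting with #ignore_unknown_values?, assuming the ignore_unknown and max_bad_records keyword arguments on Dataset#load_job:

load_job = dataset.load_job "my_table", "gs://my-bucket/file-name.csv",
                            ignore_unknown: true, max_bad_records: 10

load_job.wait_until_done!
load_job.ignore_unknown_values? #=> true
load_job.max_bad_records        #=> 10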
#null_marker
def null_marker() -> String
Specifies a string that represents a null value in a CSV file. For example, if you specify \N, BigQuery interprets \N as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.
- (String) — A string representing the null value in a CSV file.
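A sketch, assuming the null_marker keyword argument on Dataset#load_job (note the escaped backslash in the Ruby string literal):

load_job = dataset.load_job "my_table", "gs://my-bucket/file-name.csv",
                            null_marker: "\\N"

load_job.wait_until_done!
load_job.null_marker #=> "\\N"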
#orc?
def orc?() -> Boolean
Checks if the source format is ORC.
- (Boolean) — true when the source format is ORC, false otherwise.
#output_bytes
def output_bytes() -> Integer
The number of bytes that have been loaded into the table. While an import job is in the running state, this value may change.
- (Integer) — The number of bytes that have been loaded.
#output_rows
def output_rows() -> Integer
The number of rows that have been loaded into the table. While an import job is in the running state, this value may change.
- (Integer) — The number of rows that have been loaded.
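For illustration, inspecting a finished job's input and output statistics together (the values shown are hypothetical):

load_job.wait_until_done!
load_job.input_files      #=> 1
load_job.input_file_bytes #=> 2048
load_job.output_rows      #=> 100
load_job.output_bytes     #=> 4096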
#parquet?
def parquet?() -> Boolean
Checks if the source format is Parquet.
- (Boolean) — true when the source format is PARQUET, false otherwise.
#parquet_enable_list_inference?
def parquet_enable_list_inference?() -> Boolean, nil
Indicates whether to use schema inference specifically for the Parquet LIST logical type.
- (Boolean, nil) — The enable_list_inference value in Parquet options, or nil if Parquet options are not set.
#parquet_enum_as_string?
def parquet_enum_as_string?() -> Boolean, nil
Indicates whether to infer the Parquet ENUM logical type as STRING instead of BYTES by default.
- (Boolean, nil) — The enum_as_string value in Parquet options, or nil if Parquet options are not set.
#parquet_options?
def parquet_options?() -> Boolean
Checks if Parquet options are set.
- (Boolean) — true when Parquet options are set, false otherwise.
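A sketch of setting Parquet options through the updater, assuming Updater#parquet_enable_list_inference= and Updater#parquet_enum_as_string= are available in your gem version:

gcs_uri = "gs://my-bucket/file-name.parquet"
load_job = dataset.load_job "my_table", gcs_uri do |job|
  job.format = :parquet
  job.parquet_enable_list_inference = true
  job.parquet_enum_as_string = true
end

load_job.wait_until_done!
load_job.parquet_options? #=> true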
#quote
def quote() -> String
The value that is used to quote data sections in a CSV file. The default value is a double-quote ("). If your data does not contain quoted sections, the value should be an empty string. If your data contains quoted newline characters, #quoted_newlines? should return true.
- (String) — A string containing the character, such as "\"".
#quoted_newlines?
def quoted_newlines?() -> Boolean
Checks if quoted data sections may contain newline characters in a CSV file. The default is false.
- (Boolean) — true when quoted newlines are allowed, false otherwise.
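A combined sketch of the CSV formatting options above, assuming the delimiter, quote, quoted_newlines, and skip_leading keyword arguments on Dataset#load_job:

load_job = dataset.load_job "my_table", "gs://my-bucket/file-name.csv",
                            delimiter: "\t", quote: "'",
                            quoted_newlines: true, skip_leading: 1

load_job.wait_until_done!
load_job.delimiter         #=> "\t"
load_job.quoted_newlines?  #=> true
load_job.skip_leading_rows #=> 1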
#range_partitioning?
def range_partitioning?() -> Boolean
Checks if the destination table will be range partitioned. See Creating and using integer range partitioned tables.
- (Boolean) — true when the table will be range partitioned, false otherwise.
#range_partitioning_end
def range_partitioning_end() -> Integer, nil
The end of range partitioning, exclusive. See Creating and using integer range partitioned tables.
- (Integer, nil) — The end of range partitioning, exclusive, or nil if not range partitioned.
#range_partitioning_field
def range_partitioning_field() -> String, nil
The field on which the destination table will be range partitioned, if any. The field must be a top-level NULLABLE/REQUIRED field. The only supported type is INTEGER/INT64. See Creating and using integer range partitioned tables.
- (String, nil) — The partition field, if a field was configured, or nil if not range partitioned.
#range_partitioning_interval
def range_partitioning_interval() -> Integer, nil
The width of each interval. See Creating and using integer range partitioned tables.
- (Integer, nil) — The width of each interval, for data in range partitions, or nil if not range partitioned.
#range_partitioning_start
def range_partitioning_start() -> Integer, nil
The start of range partitioning, inclusive. See Creating and using integer range partitioned tables.
- (Integer, nil) — The start of range partitioning, inclusive, or nil if not range partitioned.
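For illustration, a sketch of configuring range partitioning through the updater; the field name and bounds are placeholders:

gcs_uri = "gs://my-bucket/file-name.csv"
load_job = dataset.load_job "my_table", gcs_uri do |job|
  job.schema do |schema|
    schema.integer "my_table_id", mode: :required
    schema.string "my_table_data", mode: :required
  end
  job.range_partitioning_field    = "my_table_id"
  job.range_partitioning_start    = 0
  job.range_partitioning_interval = 10
  job.range_partitioning_end      = 100
end

load_job.wait_until_done!
load_job.range_partitioning? #=> true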
#schema
def schema() -> Schema, nil
The schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.
The returned object is frozen and changes are not allowed. Use Table#schema to update the schema.
- (Schema, nil) — A schema object, or nil.
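A small read-only sketch (the returned schema is frozen); the field names match the class example above:

schema = load_job.schema
schema.fields.map(&:name) #=> ["first_name", "cities_lived"]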
#schema_update_options
def schema_update_options() -> Array<String>
Allows the schema of the destination table to be updated as a side effect of the load job if a schema is autodetected or supplied in the job configuration. Schema update options are supported in two cases: when the write disposition is WRITE_APPEND; and when the write disposition is WRITE_TRUNCATE and the destination table is a partition of a table, specified by partition decorators. For normal tables, WRITE_TRUNCATE will always overwrite the schema. One or more of the following values are specified:
- ALLOW_FIELD_ADDITION: allow adding a nullable field to the schema.
- ALLOW_FIELD_RELAXATION: allow relaxing a required field in the original schema to nullable.
- (Array<String>) — An array of strings.
#skip_leading_rows
def skip_leading_rows() -> Integer
The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is 0. This property is useful if you have header rows in the file that should be skipped.
- (Integer) — The number of header rows at the top of a CSV file to skip.
#sources
def sources()
The URI or URIs representing the Google Cloud Storage files from which the operation loads data.
#time_partitioning?
def time_partitioning?() -> Boolean
Checks if the destination table will be time partitioned. See Partitioned Tables.
- (Boolean) — true when the table will be time partitioned, false otherwise.
#time_partitioning_expiration
def time_partitioning_expiration() -> Integer, nil
The expiration for the destination table time partitions, if any, in seconds. See Partitioned Tables.
- (Integer, nil) — The expiration time, in seconds, for data in time partitions, or nil if not present.
#time_partitioning_field
def time_partitioning_field() -> String, nil
The field on which the destination table will be time partitioned, if any. If not set, the destination table will be time partitioned by the pseudo column _PARTITIONTIME; if set, the table will be time partitioned by this field. See Partitioned Tables.
- (String, nil) — The time partition field, if a field was configured, or nil if not time partitioned or not set (partitioned by the pseudo column _PARTITIONTIME).
#time_partitioning_require_filter?
def time_partitioning_require_filter?() -> Boolean
If set to true, queries over the destination table will be required to include a time partition filter that can be used for partition elimination. See Partitioned Tables.
- (Boolean) — true when a time partition filter will be required, false otherwise.
#time_partitioning_type
def time_partitioning_type() -> String, nil
The period for which the destination table will be time partitioned, if any. See Partitioned Tables.
- (String, nil) — The time partition type. The supported types are DAY, HOUR, MONTH, and YEAR, which will generate one partition per day, hour, month, and year, respectively; or nil if not present.
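For illustration, inspecting the time partitioning configuration of a finished job (the values shown are hypothetical):

load_job.time_partitioning?           #=> true
load_job.time_partitioning_type       #=> "DAY"
load_job.time_partitioning_field      #=> "dob"
load_job.time_partitioning_expiration #=> 86_400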
#utf8?
def utf8?() -> Boolean
Checks if the character encoding of the data is UTF-8. This is the default.
- (Boolean) — true when the character encoding is UTF-8, false otherwise.
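A sketch, assuming the encoding keyword argument on Dataset#load_job:

load_job = dataset.load_job "my_table", "gs://my-bucket/file-name.csv",
                            encoding: "ISO-8859-1"

load_job.wait_until_done!
load_job.iso8859_1? #=> true
load_job.utf8?      #=> false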