Optimize load jobs
The strategies and best practices described in this document help you optimize batch loading or streaming data into BigQuery to avoid reaching the limit for the number of load jobs per table, per day.
Because the limit for load jobs is fixed and can't be increased, you should optimize your load jobs by structuring your tables through methods such as table partitions, or by managing your loads through methods such as batch loading or streaming.
How table operations quotas work
The BigQuery limit for table modifications per table per day is fixed, regardless of whether a modification appends data, updates data, or truncates the table. This limit includes the combined total of all load jobs, copy jobs, and query jobs that append to or overwrite a destination table.
Load jobs have a refill rate. If you exceed the table operation limit or its
refill rate, load jobs fail with a quotaExceeded error. The
project-level limit for load jobs per day refills within a rolling 24-hour
period. When load jobs finish, your available quota decreases. The quota then
gradually refills over the next 24 hours. Failed load jobs still count toward
both per-table and per-project quotas. For more information about load job
limits, see Load jobs.
For partitioned tables, a separate limit for partitioned table modifications applies, replacing the standard table limit.
To stay within your daily table operation limits, spread operations over a
24-hour period. For example, if you need to perform 25 updates of 60 operations
each, run one batch of 60 operations about every 58 minutes so that the updates
are spread evenly across the day and you stay under the daily limit. To monitor
table updates, see BigQuery INFORMATION_SCHEMA views.
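To see how close a table is to the daily limit, you can count recent write jobs in the INFORMATION_SCHEMA.JOBS_BY_PROJECT view. The following sketch assumes the google-cloud-bigquery Python client library and jobs that run in the region-us location; adjust the region and filters for your environment.

# Sketch: count write jobs per destination table over the last 24 hours.
# Assumes the region-us location; results are approximate because query jobs
# that write to temporary result tables also appear here.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT
  destination_table.dataset_id AS dataset_id,
  destination_table.table_id AS table_id,
  COUNT(*) AS write_jobs_last_24h
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
  AND job_type IN ('LOAD', 'COPY', 'QUERY')
  AND destination_table.table_id IS NOT NULL
GROUP BY dataset_id, table_id
ORDER BY write_jobs_last_24h DESC
"""

for row in client.query(query).result():
    print(f"{row.dataset_id}.{row.table_id}: {row.write_jobs_last_24h} write jobs")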
Table operations excluded from quota
Updating table information (metadata) and using DML statements does not count toward your daily table modification limit. This exclusion applies to both standard and partitioned tables.
Your project can run an unlimited number of DML statements. DML statements previously counted toward the daily table modification limit (although they were not throttled when the limit was reached), but they no longer count.
Streaming inserts also modify tables, but their own specific quotas govern them.
Load strategies to avoid the table operations limit
To stay within BigQuery's daily table operation limit, consider these best practices:
- Perform fewer, larger writes instead of many small ones.
- Minimize separate write jobs to your final production table each day.
To use these best practices, batch or stream your data into BigQuery. Your choice of load method depends on whether you need high volumes of data to be available in real time. The following sections explain batch loading and data streaming in detail, including the tools and services you can use for each method.
Batch loading
To stay within the daily load limit per project for BigQuery, batch large amounts of data and load it with fewer jobs into BigQuery. The following sections describe several methods you can use to batch load your data.
Load more data for each job
Instead of sending data to BigQuery each time new information becomes available, collect and load it into BigQuery using a single large job.
For example, instead of running a separate load job for every few rows of data, you can wait until you accumulate several thousand rows of data in a file—for example, in a CSV or JSON file—and then run one load job to append all the data to a table. This action counts as one table operation, even though the job contains much more data. You can batch your files by using wildcards with your load job. Wildcards let you select batches of files in a directory to load multiple files in a single load job.
The following examples show how to use wildcards with the bq load command or
the SQL LOAD DATA statement.
bq
The following example shows a bq load command
that loads CSV data from Cloud Storage into a BigQuery table. To select more
than one source file, include a wildcard (*) in the Cloud Storage URI; every
file that matches the naming pattern is loaded into the BigQuery table in a
single job. The --autodetect flag automatically determines your table schema
from the source data in Cloud Storage.
bq load \
  --source_format=CSV \
  --autodetect \
  --project_id=PROJECT_ID \
  DATASET_NAME.TABLE_NAME \
  "gs://BUCKET_NAME/OBJECT_PATH_WILDCARD"
Replace the following:
- PROJECT_ID: the ID of your Cloud de Confiance project.
- DATASET_NAME: the name of the BigQuery dataset where you want to load the data.
- TABLE_NAME: the name of the BigQuery table where you want to load the data.
- BUCKET_NAME: the name of the Cloud Storage bucket that contains the source files.
- OBJECT_PATH_WILDCARD: the path to your CSV files in the Cloud Storage bucket. Include a wildcard (*) to match multiple files. For example, the string gs://my-bucket/path/to/data/my_prefix_*.csv uses the wildcard character * to load all files in gs://my-bucket/path/to/data/ that begin with my_prefix_ and end with .csv.
SQL
The following example shows how to use the SQL
LOAD DATA statement
to load CSV data from a Cloud Storage bucket into a
BigQuery table. To select more than one source file, include a
wildcard in the URI.
LOAD DATA INTO DATASET_NAME.TABLE_NAME
FROM FILES (
  format = 'SOURCE_FORMAT',
  uris = ['gs://BUCKET_NAME/OBJECT_PATH_WILDCARD']
);
Replace the following:
- DATASET_NAME: the name of the BigQuery dataset where you want to load the data.
- TABLE_NAME: the name of the BigQuery table where you want to load the data.
- SOURCE_FORMAT: the format of your source files, for example, CSV or JSON. In this example, use CSV.
- BUCKET_NAME: the name of the Cloud Storage bucket that contains the source files.
- OBJECT_PATH_WILDCARD: the path to your CSV files in the Cloud Storage bucket. Include a wildcard (*) to match multiple files. For example, the string gs://my-bucket/path/to/data/my_prefix_*.csv uses the wildcard character * to load all files in gs://my-bucket/path/to/data/ that begin with my_prefix_ and end with .csv.
For more information, see Load statements in GoogleSQL.
Batch load using the BigQuery Storage Write API
To load batch data into BigQuery, one option is to use the Storage Write API directly from your application with the Google API Client Libraries.
The Storage Write API optimizes data loading to stay within table
limits. For batch loading, use a PENDING stream rather than a COMMITTED
stream, which is intended for high-volume, real-time streaming. When you use a
PENDING stream, the API temporarily stores records until you commit the stream.
For a complete example of batch loading data using the Storage Write API, see Batch load data using the Storage Write API.
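As a sketch of that workflow, the following Python fragment uses the google-cloud-bigquery-storage client with a PENDING stream. The record_pb2 module is a hypothetical protocol buffer module generated to match your table schema; see the linked example for the complete, authoritative version.

# Sketch: batch load through a PENDING stream with the Python Storage Write
# API client. record_pb2 is a hypothetical protobuf module generated to
# match the destination table schema.
from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types, writer
from google.protobuf import descriptor_pb2

import record_pb2  # hypothetical generated protobuf module


def batch_load(project_id, dataset_id, table_id, rows):
    write_client = bigquery_storage_v1.BigQueryWriteClient()
    parent = write_client.table_path(project_id, dataset_id, table_id)

    # Create a PENDING stream: appended rows are buffered by the API until
    # the stream is finalized and committed.
    write_stream = types.WriteStream()
    write_stream.type_ = types.WriteStream.Type.PENDING
    write_stream = write_client.create_write_stream(
        parent=parent, write_stream=write_stream
    )

    # The request template carries the stream name and the writer schema.
    request_template = types.AppendRowsRequest()
    request_template.write_stream = write_stream.name
    proto_schema = types.ProtoSchema()
    proto_descriptor = descriptor_pb2.DescriptorProto()
    record_pb2.Record.DESCRIPTOR.CopyToProto(proto_descriptor)
    proto_schema.proto_descriptor = proto_descriptor
    proto_data = types.AppendRowsRequest.ProtoData()
    proto_data.writer_schema = proto_schema
    request_template.proto_rows = proto_data
    append_rows_stream = writer.AppendRowsStream(write_client, request_template)

    # Append the whole batch; very large batches can be split across several
    # append calls on the same stream.
    proto_rows = types.ProtoRows()
    for row in rows:
        proto_rows.serialized_rows.append(row.SerializeToString())
    request = types.AppendRowsRequest()
    proto_data = types.AppendRowsRequest.ProtoData()
    proto_data.rows = proto_rows
    request.proto_rows = proto_data
    append_rows_stream.send(request).result()

    # Finalize the stream, then commit it so that the batch becomes visible
    # in the table atomically.
    append_rows_stream.close()
    write_client.finalize_write_stream(name=write_stream.name)
    commit_request = types.BatchCommitWriteStreamsRequest()
    commit_request.parent = parent
    commit_request.write_streams = [write_stream.name]
    write_client.batch_commit_write_streams(commit_request)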
Batch load using Dataflow
If you want to read, transform, and write data into BigQuery
using data pipelines, you can use Dataflow. The data pipelines
that you create read from supported sources like Pub/Sub or
Apache Kafka. You can also create a Dataflow pipeline
using the BigQueryIO connector, which uses the Storage Write API
for high-performance data streaming and exactly-once semantics.
For information about using Dataflow to batch load data to BigQuery, see Write from Dataflow to BigQuery.
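For illustration, a minimal Apache Beam batch pipeline in Python could look like the following sketch. The bucket, table, and schema names are placeholders, and the FILE_LOADS write method shown here issues BigQuery load jobs; in recent Beam SDKs you can switch the method to STORAGE_WRITE_API if you prefer to bypass load jobs entirely.

# Sketch: Beam batch pipeline that reads CSV lines from Cloud Storage and
# writes them to BigQuery with load jobs (placeholder names and schema;
# assumes the files have no header row).
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_csv_line(line):
    name, score = line.split(",")
    return {"name": name, "score": int(score)}


options = PipelineOptions(
    project="my-project",
    region="us-central1",
    runner="DataflowRunner",
    temp_location="gs://my-bucket/temp",
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/path/to/data/my_prefix_*.csv")
        | "Parse" >> beam.Map(parse_csv_line)
        | "Write" >> beam.io.WriteToBigQuery(
            table="my-project:my_dataset.my_table",
            schema="name:STRING,score:INTEGER",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
        )
    )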
Data streaming
To load high volumes of data with frequent updates, we recommend that you stream your data into BigQuery. With data streaming, your client application continuously writes new data into BigQuery, which helps you avoid reaching the limit on the number of load jobs. The following sections describe several methods to stream your data into BigQuery.
Stream data using the Storage Write API
Use the Storage Write API to stream records in real time into BigQuery with minimal latency. The Storage Write API uses an efficient streaming protocol and offers advanced functionality like exactly-once delivery semantics, schema update detection, and streaming Change Data Capture (CDC) upserts. In addition, you can ingest up to 2 TiB per month at no cost.
For information about using the Storage Write API, see Streaming data using the Storage Write API.
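As an illustration, the fragment below sketches streaming through the API's default stream, which provides committed, at-least-once semantics and doesn't require the finalize and commit steps used for batch loads. It reuses the hypothetical record_pb2 protocol buffer module from the earlier batch sketch, and the default stream name format shown is an assumption based on the client samples.

# Sketch: stream records through the Storage Write API default stream.
from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types, writer
from google.protobuf import descriptor_pb2

import record_pb2  # hypothetical protobuf module matching the table schema

write_client = bigquery_storage_v1.BigQueryWriteClient()
parent = write_client.table_path("my-project", "my_dataset", "my_table")

# The default stream accepts appends without an explicit create, finalize,
# or commit step; rows become readable as soon as the append succeeds.
request_template = types.AppendRowsRequest()
request_template.write_stream = f"{parent}/_default"

proto_schema = types.ProtoSchema()
proto_descriptor = descriptor_pb2.DescriptorProto()
record_pb2.Record.DESCRIPTOR.CopyToProto(proto_descriptor)
proto_schema.proto_descriptor = proto_descriptor
proto_data = types.AppendRowsRequest.ProtoData()
proto_data.writer_schema = proto_schema
request_template.proto_rows = proto_data

append_rows_stream = writer.AppendRowsStream(write_client, request_template)

# Serialize and append a single record; production code would batch records.
row = record_pb2.Record(name="example", score=1)
proto_rows = types.ProtoRows()
proto_rows.serialized_rows.append(row.SerializeToString())
request = types.AppendRowsRequest()
proto_data = types.AppendRowsRequest.ProtoData()
proto_data.rows = proto_rows
request.proto_rows = proto_data
append_rows_stream.send(request).result()
append_rows_stream.close()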
Stream data using Dataflow
Use Dataflow to create data pipelines that read from
supported sources, for example, Pub/Sub or Apache
Kafka. These pipelines then transform and write the data to
BigQuery as a destination. You can create a Dataflow
pipeline using the BigQueryIO connector, which uses the
Storage Write API.
For information about using Dataflow to stream data to BigQuery, see Write from Dataflow to BigQuery.
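A streaming counterpart to the batch pipeline shown earlier might look like the following sketch, which reads JSON messages from a Pub/Sub subscription and writes them to BigQuery. The subscription, table, and schema are placeholders, and the STORAGE_WRITE_API method assumes a recent Apache Beam SDK.

# Sketch: streaming Beam pipeline from Pub/Sub to BigQuery (placeholder
# names; STORAGE_WRITE_API assumes a recent Apache Beam SDK).
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    project="my-project",
    region="us-central1",
    runner="DataflowRunner",
    temp_location="gs://my-bucket/temp",
    streaming=True,
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/my-subscription")
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="my-project:my_dataset.my_table",
            schema="name:STRING,score:INTEGER",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,
        )
    )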
Best practices to manage your tables for loading
In addition to batch loading or streaming data into BigQuery, manage your tables in the following ways to optimize them for data ingestion.
Use partitioned tables
Table partitioning is a powerful technique for managing large tables in BigQuery, especially when you need to perform frequent data loading operations. By dividing a table into smaller, more manageable segments based on a date, timestamp, or integer column, you can significantly improve table performance and cost-effectiveness.
The primary advantage of partitioning for data loading is that daily table operation quotas apply at the partition level rather than at the table level: a separate, higher limit on partition modifications replaces the standard table limit. This higher limit dramatically increases the number of load jobs you can run against a table each day before you reach a quota.
A common and highly effective strategy is to batch load your daily data. For
example, you can gather all of the day's data for 2025-09-18 in a temporary
staging table. Then, at the end of the day, you run a single job to load
this data into the specific partition for this day in your main production table.
Because BigQuery interacts only with the data for a single
partition, this approach keeps your data well organized and makes your
loading operations faster and less expensive.
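As a sketch of that end-of-day load, the following Python fragment uses the google-cloud-bigquery client library to run a single load job that appends a day's staged CSV files to a production table partitioned on an event_date column. All names and the partitioning column are placeholders.

# Sketch: one daily load job into a time-partitioned table (placeholder
# names; the table is partitioned on the event_date column, and BigQuery
# routes each row to its partition based on that column).
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="event_date",
    ),
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/staging/2025-09-18/*.csv",
    "my-project.my_dataset.my_production_table",
    job_config=job_config,
)
load_job.result()  # waits for the single daily load job to finish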
While partitioning is highly recommended for large, growing tables, it's best to avoid it if your partitions would be consistently smaller than 10 GB. For more information, see When to use partitioning.
To learn more about the different partitioning methods available, such as time-unit and integer-range partitioning, see Types of partitioned tables.
Take advantage of built-in exponential backoff, retry, and jitter
Built-in exponential backoff with retry is an error-handling method that helps
your application recover smoothly when an operation fails temporarily, for
example because of a rate limit error (rateLimitExceeded) or a brief network
problem (unavailable).
In a reliable system, the workers that take tasks from your client-side queue also use exponential backoff and retry when they call BigQuery, which creates two levels of protection.
For example, the official google-cloud-bigquery-storage library for Python
includes built-in retry logic with exponential backoff. This logic handles
temporary gRPC errors, for example, UNAVAILABLE. In most cases, you don't need
to write this retry code yourself. The client.append_rows() call handles these
retries automatically.
This built-in handling is a significant benefit of using the official client
libraries. You only need to handle errors that cannot be retried, for example,
INVALID_ARGUMENT, which typically indicates a schema mismatch.
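If you call BigQuery outside of the client libraries' built-in retries, a manual retry loop with capped exponential backoff and jitter is a common pattern. The following sketch wraps an arbitrary callable; the exception classes come from google-api-core and are typical retryable cases, and the delay values are illustrative only.

# Sketch: retry a callable with capped exponential backoff plus jitter.
import random
import time

from google.api_core import exceptions

# Errors worth retrying. Non-retryable errors such as InvalidArgument
# (for example, a schema mismatch) propagate to the caller immediately.
RETRYABLE = (
    exceptions.ResourceExhausted,   # quota or rate limit pressure
    exceptions.ServiceUnavailable,  # brief network or backend problems
)


def call_with_backoff(func, max_attempts=6, base_delay=1.0, max_delay=60.0):
    for attempt in range(max_attempts):
        try:
            return func()
        except RETRYABLE:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff capped at max_delay, plus random jitter so
            # that many clients do not all retry at the same instant.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.5))

For example, wrapping a send operation as call_with_backoff(lambda: append_rows_stream.send(request).result()) adds a second layer of protection on top of the client library's own retries.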