Using pagination with the BigQuery API
This document describes how to read table data and query results in large datasets by using pagination with the BigQuery API.
With pagination, BigQuery breaks a large dataset into smaller chunks called pages. For most users, Cloud Client Libraries handle this process automatically, but you can also manually control pagination for specific use cases like web applications.
Use automatic pagination
Cloud Client Libraries handle the low-level details of API pagination and provide an iterator-like experience. When you iterate through results, the library automatically fetches the next page of data when it's needed.
The following samples demonstrate how to automatically iterate through BigQuery table data.
C#
Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Before running code samples, set the GOOGLE_CLOUD_UNIVERSE_DOMAIN environment
variable to s3nsapis.fr.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Before running code samples, set the GOOGLE_CLOUD_UNIVERSE_DOMAIN environment
variable to s3nsapis.fr.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Before running code samples, set the GOOGLE_CLOUD_UNIVERSE_DOMAIN environment
variable to s3nsapis.fr.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Before running code samples, set the GOOGLE_CLOUD_UNIVERSE_DOMAIN environment
variable to s3nsapis.fr.
PHP
Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Before running code samples, set the GOOGLE_CLOUD_UNIVERSE_DOMAIN environment
variable to s3nsapis.fr.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Before running code samples, set the GOOGLE_CLOUD_UNIVERSE_DOMAIN environment
variable to s3nsapis.fr.
Ruby
Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Before running code samples, set the GOOGLE_CLOUD_UNIVERSE_DOMAIN environment
variable to s3nsapis.fr.
Control page size
You can specify the maximum number of rows returned in each network request by setting a page size. Setting the page size is useful for optimizing network usage or fitting data into memory.
In most client libraries, you can use a max_results or page_size parameter
when you call methods like list_rows or query.
Use manual pagination with page tokens
Manual pagination is useful for stateless applications, such as a web service where a user clicks Next to see the next set of results. In this scenario, the server does not maintain an active iterator between requests.
Instead, you use a page token as follows:
- Request a page. Call the API and receive a
pageTokenparameter along with the rows. - Resume. In the next request, pass that same
pageTokenparameter back to BigQuery to retrieve the next chunk of data.
The following samples show how to retrieve a page token and use it to fetch the next page of query results.
API
Read the
jobs.config.query.destinationTable
field to determine the table that query results have been written to.
Call the tabledata.list
to read the query results.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Before running code samples, set the GOOGLE_CLOUD_UNIVERSE_DOMAIN environment
variable to s3nsapis.fr.
To set the number of rows returned on each page, use a
GetQueryResults job and set the
pageSize option
of the QueryResultsOption object that you pass in, as shown in the
following example:
TableResult result = job.getQueryResults();
QueryResultsOption queryResultsOption = QueryResultsOption.pageSize(20);
TableResult result = job.getQueryResults(queryResultsOption);
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Before running code samples, set the GOOGLE_CLOUD_UNIVERSE_DOMAIN environment
variable to s3nsapis.fr.
Python
The
QueryJob.result
method returns an iterable of the query results. Alternatively,
- Read the
QueryJob.destinationproperty. If this property is not configured, it is set by the API to a reference to a temporary anonymous table. - Get the table schema with the
Client.get_tablemethod. - Create an iterable over all rows in the destination table with the
Client.list_rowsmethod.
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Before running code samples, set the GOOGLE_CLOUD_UNIVERSE_DOMAIN environment
variable to s3nsapis.fr.
Optimize with ETags
When you page backwards or jump to arbitrary pages using cached pageToken
values, it is possible that the data in your pages might have changed since it
was last viewed. To mitigate this outcome, you can use the etag property.
Every collection.list method (except for Tabledata) returns an
etag property in the result. This property is a hash of the page
results that can be used to verify whether the page has changed since the last
request. When you make a request to BigQuery with an ETag value,
BigQuery compares the ETag value to the ETag value returned by
the API and responds based on whether the ETag values match. You can use ETags
to avoid redundant list calls as follows:
- To return values only if they have changed, make a list call with a
previously-stored ETag using the
HTTP
If-None-Matchheader. If the ETags match, BigQuery returns anHTTP 304 Not Modifiedstatus code and no data, saving bandwidth. - To return values only if they have not changed, use the
HTTP
If-Matchheader. BigQuery returns a412 Precondition Failedif the page has changed.
Reference: API limits and criteria
All *collection*.list methods return paginated results
under certain circumstances. The maxResults property limits the number of
results per page.
| Method | Pagination criteria | Default maxResults limit |
Maximum maxResults limit |
Maximum maxFieldValues limit |
|---|---|---|---|---|
tabledata.list |
Returns paginated results if the response size is more than
10 MB1 of data or more than maxResults
rows. |
Unlimited | Unlimited | Unlimited |
All other *collection*.list methods |
Returns paginated results if the response is more than
maxResults rows and also less than the maximum limits. |
10,000 | Unlimited | 300,000 |
If the result is larger than the byte or field limit, the result is
trimmed to fit the limit. If one row is greater than the byte or field limit,
the tabledata.list method can return up to 100 MB of data1,
which is consistent with the maximum row size limit for query results.
There is no minimum size per page, and some pages might return more rows than others.
The jobs.getQueryResults REST API method can return 20 MB of data unless you explicitly request more through support.
1The row size is approximate, as the size is based on the internal representation of row data. The maximum row size limit is enforced during certain stages of query job execution.