Introduction to datasets

This page provides an overview of datasets in BigQuery.

Datasets

A dataset is contained within a specific project. Datasets are top-level containers that are used to organize and control access to your tables and views. A table or view must belong to a dataset, so you need to create at least one dataset before loading data into BigQuery. Use the format projectname.datasetname to fully qualify a dataset name when using GoogleSQL, or the format projectname:datasetname to fully qualify a dataset name when using the bq command-line tool.
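The two qualification formats differ only in the separator. The following minimal Python sketch illustrates this; the helper name `qualify_dataset` and the sample IDs `myproject` and `mydataset` are hypothetical and not part of any BigQuery library. It doesn't add the backtick quoting that GoogleSQL may require for some project IDs.

```python
def qualify_dataset(project: str, dataset: str, tool: str = "sql") -> str:
    """Return a fully qualified dataset name.

    tool="sql" gives project.dataset (GoogleSQL).
    tool="bq"  gives project:dataset (bq command-line tool).
    """
    separators = {"sql": ".", "bq": ":"}
    if tool not in separators:
        raise ValueError(f"unknown tool: {tool!r}")
    return f"{project}{separators[tool]}{dataset}"


print(qualify_dataset("myproject", "mydataset"))        # myproject.mydataset
print(qualify_dataset("myproject", "mydataset", "bq"))  # myproject:mydataset
```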

Location

You specify a location for storing your BigQuery data when you create a dataset. For a list of BigQuery dataset locations, see BigQuery locations. After you create the dataset, the location cannot be changed.

BigQuery processes queries in the same location as the dataset that contains the tables you're querying. BigQuery stores your data in the selected location.
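One way to set the location at creation time is the GoogleSQL `CREATE SCHEMA` statement with a `location` option. The sketch below only builds the statement text and doesn't send it to BigQuery; the helper name `create_dataset_ddl` and the sample IDs are assumptions made for illustration.

```python
def create_dataset_ddl(project: str, dataset: str, location: str) -> str:
    """Build a CREATE SCHEMA statement that sets the dataset location.

    The location is fixed once the dataset exists, so it is supplied here.
    """
    return (
        f"CREATE SCHEMA {project}.{dataset}\n"
        f"OPTIONS (location = '{location}')"
    )


# Hypothetical project and dataset IDs.
print(create_dataset_ddl("myproject", "mydataset", "US"))
```

Because the location can't be changed later, moving data to another location means creating a new dataset there.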

Data retention

Datasets use time travel in conjunction with the fail-safe period to retain deleted and modified data for a short time, in case you need to recover it. For more information, see Data retention with time travel and fail-safe.
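As an illustration of time travel, GoogleSQL lets a query read a table as it existed at an earlier point by using the `FOR SYSTEM_TIME AS OF` clause. The sketch below only assembles such a query as a string; the helper name `time_travel_query` and the table name are hypothetical, and the requested point in time must fall within the table's time travel window.

```python
def time_travel_query(table: str, hours_ago: int) -> str:
    """Build a query that reads `table` as it was `hours_ago` hours ago."""
    if hours_ago < 0:
        raise ValueError("hours_ago must be non-negative")
    return (
        "SELECT *\n"
        f"FROM `{table}`\n"
        "FOR SYSTEM_TIME AS OF "
        f"TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {hours_ago} HOUR)"
    )


# Hypothetical fully qualified table name.
print(time_travel_query("myproject.mydataset.mytable", 1))
```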

External datasets

In addition to BigQuery datasets, you can create external datasets, which are links to external data sources.

External datasets are also known as federated datasets; the two terms are used interchangeably.

Once created, an external dataset contains tables from the referenced external data source. Data from these tables isn't copied into BigQuery; it is queried from the source each time the tables are used. For more information, see Spanner federated queries.

Limitations

BigQuery datasets are subject to the following limitations:

  • The dataset location can only be set at creation time. After a dataset is created, its location cannot be changed.
  • All tables that are referenced in a query must be stored in datasets in the same location.
  • External datasets don't support table expiration, replicas, time travel, default collation, default rounding mode, or the option to enable or disable case-insensitive table names.
  • When you copy a table, the datasets that contain the source table and destination table must reside in the same location.
  • Dataset names must be unique for each project.
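The same-location rule for queries and table copies can be pictured as a check that runs before a job is submitted. The following Python sketch works on a plain in-memory mapping and doesn't call BigQuery; the function name `check_same_location` and the dataset names are hypothetical.

```python
from typing import Mapping, Sequence


def check_same_location(
    dataset_locations: Mapping[str, str], referenced: Sequence[str]
) -> str:
    """Return the location shared by all referenced datasets.

    Raises ValueError if no datasets are given or if they span
    more than one location.
    """
    if not referenced:
        raise ValueError("no datasets referenced")
    locations = {dataset_locations[name] for name in referenced}
    if len(locations) != 1:
        raise ValueError(f"datasets span multiple locations: {sorted(locations)}")
    return locations.pop()


# Hypothetical datasets and their locations.
locations = {"sales": "US", "logs": "US", "eu_sales": "EU"}
print(check_same_location(locations, ["sales", "logs"]))  # US
```

Passing `["sales", "eu_sales"]` raises `ValueError`, because those datasets are in different locations.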

Quotas

For more information on dataset quotas and limits, see Quotas and limits.

Security

To learn how to control access to datasets in BigQuery, see Controlling access to datasets. For information about data encryption, see Encryption at rest.

What's next