Introduction to datasets
This page provides an overview of datasets in BigQuery.
Datasets
A dataset is contained within a specific project. Datasets are top-level containers that are used to organize and control access to your tables and views. A table or view must belong to a dataset, so you need to create at least one dataset before loading data into BigQuery.
Use the format projectname.datasetname to fully qualify a dataset name when using GoogleSQL, or the format projectname:datasetname to fully qualify a dataset name when using the bq command-line tool.
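The two naming formats differ only in the separator. As a minimal sketch, the helper below (qualify_dataset is a hypothetical name, not part of any BigQuery library) builds either form from a project name and a dataset name:

```python
def qualify_dataset(project: str, dataset: str, tool: str = "sql") -> str:
    """Return a fully qualified dataset name.

    tool="sql" gives project.dataset (GoogleSQL).
    tool="bq"  gives project:dataset (bq command-line tool).
    """
    separators = {"sql": ".", "bq": ":"}
    if tool not in separators:
        raise ValueError(f"unknown tool: {tool!r}")
    return f"{project}{separators[tool]}{dataset}"


# Placeholder names, matching the formats described above.
print(qualify_dataset("projectname", "datasetname"))        # projectname.datasetname
print(qualify_dataset("projectname", "datasetname", "bq"))  # projectname:datasetname
```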
Location
You specify a location for storing your BigQuery data when you create a dataset. For a list of BigQuery dataset locations, see BigQuery locations. After you create the dataset, the location cannot be changed.
BigQuery processes queries in the same location as the dataset that contains the tables you're querying. BigQuery stores your data in the selected location.
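Because the location is fixed when the dataset is created, it has to be part of the creation request. One way to express this is the GoogleSQL CREATE SCHEMA statement with a location option. The sketch below only builds the statement text; the function name and the example values are assumptions, and it performs no validation or escaping of its inputs.

```python
def create_dataset_ddl(project: str, dataset: str, location: str) -> str:
    """Build a GoogleSQL statement that creates a dataset in a given location.

    Illustrative only: inputs are inserted as-is, with no escaping.
    """
    # Backticks quote the whole path so that project names containing
    # hyphens are accepted.
    return (
        f"CREATE SCHEMA `{project}.{dataset}`\n"
        f"OPTIONS (location = '{location}');"
    )


# Placeholder project, dataset, and location values.
print(create_dataset_ddl("projectname", "datasetname", "US"))
```

To place the data elsewhere later, you would create a new dataset in the target location rather than alter this one, since the location cannot be changed after creation.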
Data retention
Datasets use time travel in conjunction with the fail-safe period to retain deleted and modified data for a short time, in case you need to recover it. For more information, see Data retention with time travel and fail-safe.
External datasets
In addition to BigQuery datasets, you can create external datasets, which are links to external data sources.
Note that external datasets are also known as federated datasets; the two terms are used interchangeably.
Once created, an external dataset contains tables from the referenced external data source. Data from these tables isn't copied into BigQuery; it is queried from the source each time the tables are used. For more information, see Spanner federated queries.
Limitations
BigQuery datasets are subject to the following limitations:
- The dataset location can only be set at creation time. After a dataset is created, its location cannot be changed.
- All tables that are referenced in a query must be stored in datasets in the same location.
- External datasets don't support table expiration, replicas, time travel, default collation, default rounding mode, or the option to enable or disable case-insensitive table names.
- When you copy a table, the datasets that contain the source table and destination table must reside in the same location.
- Dataset names must be unique for each project.
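The same-location rule for queries and table copies can be checked before submitting a job. The following is a rough sketch under stated assumptions: the dataset-to-location mapping is supplied by the caller, and common_location is a hypothetical helper, not a BigQuery API.

```python
from typing import Dict, Iterable


def common_location(datasets: Iterable[str], locations: Dict[str, str]) -> str:
    """Return the single location shared by the given datasets.

    Raises ValueError if the datasets do not all share exactly one location.
    """
    found = {locations[name] for name in datasets}
    if len(found) != 1:
        raise ValueError(f"expected exactly one location, found: {sorted(found)}")
    return found.pop()


# Hypothetical dataset names and locations.
locations = {"sales": "US", "marketing": "US", "archive": "EU"}
print(common_location(["sales", "marketing"], locations))  # US
```

Calling common_location(["sales", "archive"], locations) raises ValueError, which mirrors the limitation that tables referenced together must live in datasets in the same location.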
Quotas
For more information on dataset quotas and limits, see Quotas and limits.
Security
To control access to datasets in BigQuery, see Controlling access to datasets. For information about data encryption, see Encryption at rest.
What's next
- For more information on creating datasets, see Creating datasets.
- For more information on assigning access controls to datasets, see Controlling access to datasets.