BigLake metastore is a unified, managed, serverless, and scalable metastore that
connects lakehouse data stored in Cloud Storage or BigQuery to
multiple runtimes, including open source runtimes (such as Apache Spark
and Apache Flink) and BigQuery.
BigLake metastore provides a single source of truth for managing metadata from
multiple engines. It supports key open source table formats, such as
Apache Iceberg, through BigLake Iceberg tables and
standard BigQuery tables. Additionally, BigLake metastore has
support for open APIs and an
Iceberg REST catalog
(Preview).
Use the following table to help determine where to start your
BigLake metastore journey:
| **Use case** | **Recommendation** |
| --- | --- |
| Open source engine needs to access data in Cloud Storage. | Explore the Iceberg REST catalog (Preview). |
| Open source engine needs interoperability with BigQuery. | Explore the BigLake metastore integration with open source engines (such as Spark) using the BigQuery custom Iceberg catalog plugin. |
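The first recommendation can be sketched as a Spark session configuration. This is a minimal sketch only: the endpoint URI and property names are assumptions based on standard Iceberg REST catalog configuration, and the catalog name and bucket are illustrative, so verify the exact values against the current BigLake documentation.

```python
# Build the spark.sql.catalog.* properties that attach a Spark session to the
# BigLake metastore Iceberg REST catalog (Preview). Endpoint URI and property
# names are assumed, not confirmed by this page.

def iceberg_rest_catalog_conf(catalog: str, warehouse: str) -> dict[str, str]:
    """Return the Spark properties for an Iceberg REST catalog named `catalog`."""
    p = f"spark.sql.catalog.{catalog}"
    return {
        p: "org.apache.iceberg.spark.SparkCatalog",
        f"{p}.type": "rest",
        # Assumed BigLake Iceberg REST catalog endpoint (Preview).
        f"{p}.uri": "https://biglake.googleapis.com/iceberg/v1beta/restcatalog",
        f"{p}.warehouse": warehouse,  # e.g. a gs:// bucket holding table data
    }

conf = iceberg_rest_catalog_conf("lakehouse", "gs://my-bucket/warehouse")
```

Each key/value pair would then be passed to `SparkSession.builder.config(...)` before the session is created.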
Benefits
BigLake metastore offers several advantages for data management
and analysis:
- **Serverless architecture.** BigLake metastore provides a serverless
  architecture, eliminating the need for server or cluster management. This
  helps reduce operational overhead, simplifies deployment, and allows for
  automatic scaling based on demand.
- **Engine interoperability.** BigLake metastore provides you with direct
  table access across open source engines (such as Spark and Flink) and
  BigQuery, allowing you to query open-format tables without additional
  configuration. For example, you can create a table in Spark and then
  query it directly in BigQuery. This helps streamline your analytics
  workflow and reduces the need for complex data movement or ETL processes.
- **Unified user experience.** BigLake metastore provides a unified workflow
  across BigQuery and open source engines. This unified experience means you
  can configure a Spark environment that's self-hosted or hosted by Dataproc
  through the Iceberg REST catalog (Preview), or you can configure a Spark
  environment in a BigQuery Studio notebook to do the same thing.
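The create-in-Spark, query-in-BigQuery flow can be sketched with the two statements involved. The `lakehouse.events` names are hypothetical, and the strings are only assembled here, not executed:

```python
# The Spark side: DDL you would hand to spark.sql(...) in a session attached
# to BigLake metastore. Catalog/namespace names are hypothetical.
SPARK_DDL = (
    "CREATE TABLE lakehouse.events (id BIGINT, payload STRING) "
    "USING iceberg"
)

def bigquery_read_sql(project: str, dataset: str = "lakehouse",
                      table: str = "events") -> str:
    """The BigQuery side: a query run directly against the Spark-created table."""
    return f"SELECT COUNT(*) AS n FROM `{project}.{dataset}.{table}`"
```

No export or copy step sits between the two: once the Spark job commits, BigQuery reads the same Iceberg metadata through the shared metastore.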
Table formats in BigLake metastore
BigLake supports several table types. Use the following table to help
select the format that best fits your use case:
|  | **External tables** | **BigLake Iceberg tables** | **BigLake Iceberg tables in BigQuery** | **Standard BigQuery tables** |
| --- | --- | --- | --- | --- |
| **Metastore** | External or self-hosted metastore | BigLake metastore | BigLake metastore | BigLake metastore |
| **Storage** | Cloud Storage / Amazon S3 / Azure | Cloud Storage | Cloud Storage | BigQuery |
| **Management** | Customer or third party | Google | Google (highly managed experience) | Google (most managed experience) |
| **Read / Write** | Open source engines (read/write); BigQuery (read only) | Open source engines (read/write); BigQuery (read only) | Open source engines (read only with Iceberg libraries, read/write interoperability with BigQuery Storage API); BigQuery (read/write) | Open source engines (read/write interoperability with BigQuery Storage API); BigQuery (read/write) |
| **Use cases** | Migrations, staging tables for BigQuery loads, self-management | Open lakehouse | Open lakehouse; enterprise-grade storage for analytics, streaming, and AI | Enterprise-grade storage for analytics, streaming, and AI |
Differences with BigLake metastore (classic)
BigLake metastore is the recommended metastore on Trusted Cloud by S3NS.
The core differences between BigLake metastore and BigLake metastore (classic)
include the following details:
- BigLake metastore (classic) is a standalone metastore service that is distinct
from BigQuery and only supports Iceberg
tables. It has a different three-part resource model.
BigLake metastore (classic) tables aren't automatically discovered from
BigQuery.
- Tables in BigLake metastore are accessible from multiple open source engines
and BigQuery. BigLake metastore supports direct integration with
Spark, which helps reduce redundancy when you store
metadata and run jobs. BigLake metastore also supports the
Iceberg REST catalog
(Preview), which connects lakehouse data
across multiple runtimes.
Limitations
The following limitations apply to tables in BigLake metastore:
- You can't create or modify BigLake metastore tables with DDL or DML
statements using the BigQuery engine. You can modify
BigLake metastore tables using the BigQuery API (with the bq command-line tool or
client libraries), but doing so risks making changes that are incompatible
with the external engine.
- BigLake metastore tables don't support renaming operations or
  `ALTER TABLE ... RENAME TO` Spark SQL statements.
- BigLake metastore tables are subject to the same
quotas and limits as standard tables.
- Query performance for BigLake metastore tables from the
BigQuery engine might be slow compared to querying data in a
standard BigQuery table. In general, the query performance for
a BigLake metastore table should be equivalent to reading the data directly
from Cloud Storage.
- A dry run of a query that uses a
BigLake metastore table might report a lower bound of 0 bytes of data, even
if rows are returned. This result occurs because the amount of data that is
processed from the table can't be determined until the actual query completes.
Running the query incurs a cost for processing this data.
- You can't reference a BigLake metastore table in a
wildcard table query.
- You can't use the `tabledata.list` method to retrieve data from
  BigLake metastore tables. Instead, save query results to a destination
  table, then use the `tabledata.list` method on that table.
- BigLake metastore tables don't support
clustering.
- BigLake metastore tables don't support
flexible column names.
- Table storage statistics aren't displayed for BigLake metastore tables.
- BigLake metastore doesn't support Iceberg views.
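The `tabledata.list` workaround above can be sketched with the BigQuery Python client library. This is a minimal sketch, assuming the `google-cloud-bigquery` package and application default credentials at runtime; the project, dataset, and table names are illustrative:

```python
def destination_table_id(project: str, dataset: str, table: str) -> str:
    """Fully qualified id for the destination table that will hold results."""
    return f"{project}.{dataset}.{table}"

def materialize_then_list(project: str, sql: str, dest_table_id: str) -> list[dict]:
    """Save query results to a destination table, then page its rows.

    client.list_rows() calls the tabledata.list method, which works on the
    destination table even though it can't be used on a metastore table.
    """
    from google.cloud import bigquery  # deferred: needs google-cloud-bigquery

    client = bigquery.Client(project=project)
    job_config = bigquery.QueryJobConfig(destination=dest_table_id)
    client.query(sql, job_config=job_config).result()  # wait for the table
    return [dict(row) for row in client.list_rows(dest_table_id)]
```

For example, `materialize_then_list("my-project", "SELECT * FROM my_dataset.my_metastore_table", destination_table_id("my-project", "scratch", "results_copy"))` would page the copied rows rather than the metastore table itself.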
What's next

- Use BigLake metastore with Dataproc
- Use BigLake metastore with Dataproc Serverless
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-26 UTC.