BigLake metastore is a unified, managed, serverless, and scalable metastore that
connects lakehouse data stored in Cloud Storage or BigQuery to
multiple runtimes, including open source runtimes (such as Apache Spark
and Apache Flink) and BigQuery.
BigLake metastore provides a single source of truth for managing metadata from
multiple engines. It supports key open source table formats, such as
Apache Iceberg, through BigLake Iceberg tables and
standard BigQuery tables. Additionally, BigLake metastore has
support for open APIs and an
Iceberg REST catalog
(Preview).
Use the following table to help determine where to start your
BigLake metastore journey:
| **Use case** | **Recommendation** |
| --- | --- |
| Open source engine needs to access data in Cloud Storage. | Explore the Iceberg REST catalog (Preview). |
| Open source engine needs interoperability with BigQuery. | Explore the BigLake metastore integration with open source engines (such as Spark) using the BigQuery custom Iceberg catalog plugin. |
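The first recommendation can be sketched as a Spark session configuration. This is a minimal sketch only: the endpoint URI and property names are assumptions based on standard Iceberg REST catalog configuration, and the catalog name and bucket are illustrative, so verify the exact values against the current BigLake documentation.

```python
# Build the spark.sql.catalog.* properties that attach a Spark session to the
# BigLake metastore Iceberg REST catalog (Preview). Endpoint URI and property
# names are assumed, not confirmed by this page.

def iceberg_rest_catalog_conf(catalog: str, warehouse: str) -> dict[str, str]:
    """Return the Spark properties for an Iceberg REST catalog named `catalog`."""
    p = f"spark.sql.catalog.{catalog}"
    return {
        p: "org.apache.iceberg.spark.SparkCatalog",
        f"{p}.type": "rest",
        # Assumed BigLake Iceberg REST catalog endpoint (Preview).
        f"{p}.uri": "https://biglake.googleapis.com/iceberg/v1beta/restcatalog",
        f"{p}.warehouse": warehouse,  # e.g. a gs:// bucket holding table data
    }

conf = iceberg_rest_catalog_conf("lakehouse", "gs://my-bucket/warehouse")
```

Each key/value pair would then be passed to `SparkSession.builder.config(...)` before the session is created.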
Benefits
BigLake metastore offers several advantages for data management
and analysis:
- **Serverless architecture.** BigLake metastore provides a serverless
  architecture, eliminating the need for server or cluster management. This
  helps reduce operational overhead, simplifies deployment, and allows for
  automatic scaling based on demand.
- **Engine interoperability.** BigLake metastore provides you with direct
  table access across open source engines (such as Spark and Flink) and
  BigQuery, allowing you to query open-format tables without additional
  configuration. For example, you can create a table in Spark and then
  query it directly in BigQuery. This helps streamline your analytics
  workflow and reduces the need for complex data movement or ETL processes.
- **Unified user experience.** BigLake metastore provides a unified workflow
  across BigQuery and open source engines. This unified experience means you
  can configure a Spark environment that's self-hosted or hosted by Dataproc
  through the Iceberg REST catalog (Preview), or you can configure a Spark
  environment in a BigQuery Studio notebook to do the same thing.
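The create-in-Spark, query-in-BigQuery flow can be sketched with the two statements involved. The `lakehouse.events` names are hypothetical, and the strings are only assembled here, not executed:

```python
# The Spark side: DDL you would hand to spark.sql(...) in a session attached
# to BigLake metastore. Catalog/namespace names are hypothetical.
SPARK_DDL = (
    "CREATE TABLE lakehouse.events (id BIGINT, payload STRING) "
    "USING iceberg"
)

def bigquery_read_sql(project: str, dataset: str = "lakehouse",
                      table: str = "events") -> str:
    """The BigQuery side: a query run directly against the Spark-created table."""
    return f"SELECT COUNT(*) AS n FROM `{project}.{dataset}.{table}`"
```

No export or copy step sits between the two: once the Spark job commits, BigQuery reads the same Iceberg metadata through the shared metastore.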
Table formats in BigLake metastore
BigLake supports several table types. Use the following table to help
select the format that best fits your use case:
|  | **External tables** | **BigLake Iceberg tables** | **BigLake Iceberg tables in BigQuery** | **Standard BigQuery tables** |
| --- | --- | --- | --- | --- |
| **Metastore** | External or self-hosted metastore | BigLake metastore | BigLake metastore | BigLake metastore |
| **Storage** | Cloud Storage / Amazon S3 / Azure | Cloud Storage | Cloud Storage | BigQuery |
| **Management** | Customer or third party | Google | Google (highly managed experience) | Google (most managed experience) |
| **Read / Write** | Open source engines (read/write); BigQuery (read only) | Open source engines (read/write); BigQuery (read only) | Open source engines (read only with Iceberg libraries, read/write interoperability with BigQuery Storage API); BigQuery (read/write) | Open source engines (read/write interoperability with BigQuery Storage API); BigQuery (read/write) |
| **Use cases** | Migrations, staging tables for BigQuery loads, self-management | Open lakehouse | Open lakehouse; enterprise-grade storage for analytics, streaming, and AI | Enterprise-grade storage for analytics, streaming, and AI |
Differences with BigLake metastore (classic)
BigLake metastore is the recommended metastore on Trusted Cloud by S3NS.
The core differences between BigLake metastore and BigLake metastore (classic)
include the following details:
- BigLake metastore (classic) is a standalone metastore service that is distinct
from BigQuery and only supports Iceberg
tables. It has a different three-part resource model.
BigLake metastore (classic) tables aren't automatically discovered from
BigQuery.
- Tables in BigLake metastore are accessible from multiple open source engines
and BigQuery. BigLake metastore supports direct integration with
Spark, which helps reduce redundancy when you store
metadata and run jobs. BigLake metastore also supports the
Iceberg REST catalog
(Preview), which connects lakehouse data
across multiple runtimes.
Limitations
The following limitations apply to tables in BigLake metastore:
- You can't create or modify BigLake metastore tables with DDL or DML
statements using the BigQuery engine. You can modify
BigLake metastore tables using the BigQuery API (with the bq command-line tool or
client libraries), but doing so risks making changes that are incompatible
with the external engine.
- BigLake metastore tables don't support renaming operations or
  `ALTER TABLE ... RENAME TO` Spark SQL statements.
- BigLake metastore tables are subject to the same
quotas and limits as standard tables.
- Query performance for BigLake metastore tables from the
BigQuery engine might be slow compared to querying data in a
standard BigQuery table. In general, the query performance for
a BigLake metastore table should be equivalent to reading the data directly
from Cloud Storage.
- A dry run of a query that uses a
BigLake metastore table might report a lower bound of 0 bytes of data, even
if rows are returned. This result occurs because the amount of data that is
processed from the table can't be determined until the actual query completes.
Running the query incurs a cost for processing this data.
- You can't reference a BigLake metastore table in a
wildcard table query.
- You can't use the `tabledata.list` method to retrieve data from
  BigLake metastore tables. Instead, save query results to a destination
  table, then use the `tabledata.list` method on that table.
- BigLake metastore tables don't support
clustering.
- BigLake metastore tables don't support
flexible column names.
- Table storage statistics aren't displayed for BigLake metastore tables.
- BigLake metastore doesn't support Iceberg views.
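The `tabledata.list` workaround above can be sketched with the BigQuery Python client library. This is a minimal sketch, assuming the `google-cloud-bigquery` package and application default credentials at runtime; the project, dataset, and table names are illustrative:

```python
def destination_table_id(project: str, dataset: str, table: str) -> str:
    """Fully qualified id for the destination table that will hold results."""
    return f"{project}.{dataset}.{table}"

def materialize_then_list(project: str, sql: str, dest_table_id: str) -> list[dict]:
    """Save query results to a destination table, then page its rows.

    client.list_rows() calls the tabledata.list method, which works on the
    destination table even though it can't be used on a metastore table.
    """
    from google.cloud import bigquery  # deferred: needs google-cloud-bigquery

    client = bigquery.Client(project=project)
    job_config = bigquery.QueryJobConfig(destination=dest_table_id)
    client.query(sql, job_config=job_config).result()  # wait for the table
    return [dict(row) for row in client.list_rows(dest_table_id)]
```

For example, `materialize_then_list("my-project", "SELECT * FROM my_dataset.my_metastore_table", destination_table_id("my-project", "scratch", "results_copy"))` would page the copied rows rather than the metastore table itself.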
What's next

- Use BigLake metastore with Dataproc
- Use BigLake metastore with Dataproc Serverless
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-26 UTC.