Optimal data and metadata formats for lakehouses

This document guides you through the optimal data and metadata formats as you
design your data lakehouse with BigLake.
A data lakehouse is a data architecture that combines the structure of a data
warehouse with the raw data flexibility of a data lake. This architecture
provides flexibility and scalability for a wide range of data use cases. The
Trusted Cloud by S3NS data lakehouse solution is called
BigLake, which connects Trusted Cloud and open
source services to create a unified interface for analytics and AI. A data
lakehouse that's built with BigLake consists of the following key
components:
- Storage capabilities: Cloud Storage or BigQuery, with
Apache Iceberg as the recommended open table format
- A metastore: BigLake metastore
- A query engine: BigQuery, Apache Spark,
Apache Flink, Trino, or other open source engines
- A tool for data writing and analytics: various BigQuery and
open source connections
BigLake packages all of these components in a single experience
with uniform governance. For more information on BigLake
architecture and innovations, see
BigLake evolved.
Select a metastore

For your metastore, we recommend using
BigLake metastore. BigLake metastore is a fully
managed and serverless metastore for your lakehouse on Trusted Cloud. It
provides a single source of truth for metadata from multiple sources and is
accessible from BigQuery and various open data processing
engines, removing the need to copy and synchronize metadata between different
repositories with customized tools. BigLake metastore is supported with
Dataplex Universal Catalog, which provides unified and fine-grained access controls
across all supported engines and enables end-to-end governance that includes
comprehensive lineage, data quality, and discoverability capabilities.
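To make this concrete, the following minimal sketch shows how an open source engine such as Apache Spark might attach to BigLake metastore through an Iceberg REST catalog interface. It uses standard Iceberg Spark catalog properties; the endpoint URL, project ID, and bucket name are placeholder assumptions, and authentication settings and the Iceberg Spark runtime JAR are omitted for brevity. Check the BigLake metastore documentation for the exact configuration in your environment.

```python
# Minimal sketch (assumptions noted inline): connect a Spark session to
# BigLake metastore through an Iceberg REST catalog. Requires the Iceberg
# Spark runtime JAR on the classpath; authentication settings are omitted.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("lakehouse-example")
    # Standard Iceberg catalog plugin, configured as a REST catalog.
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "rest")
    # Assumed BigLake metastore REST endpoint; verify against the docs.
    .config(
        "spark.sql.catalog.lakehouse.uri",
        "https://biglake.googleapis.com/iceberg/v1/restcatalog",
    )
    # Placeholder Cloud Storage warehouse and billing project header.
    .config("spark.sql.catalog.lakehouse.warehouse", "gs://my-lakehouse-bucket")
    .config("spark.sql.catalog.lakehouse.header.x-goog-user-project", "my-project")
    .getOrCreate()
)

# Tables registered in the metastore become addressable with ordinary
# Spark SQL once the catalog resolves.
spark.sql("SHOW NAMESPACES IN lakehouse").show()
```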
Select a table format

With BigLake metastore as the metastore for your open lakehouse, you have the
following choices for the format of your tables:
- Choose standard BigQuery tables for data managed in BigQuery.
These tables are fully managed by BigQuery and have the most
advanced data analytics and management features. You can still connect these
tables to BigLake metastore. Choose this option for
non-Iceberg tables.
- Choose BigLake Iceberg tables in BigQuery for a fully managed experience on BigQuery.
These tables are Iceberg tables that you create from
BigQuery and
store in Cloud Storage.
Like all tables that use BigLake metastore, they can be read by open source
engines or BigQuery. However, BigQuery is the
only engine that can directly write to them. Choose this option if you want
your extract, transform, and load (ETL) workflow to be managed by
BigQuery, as illustrated in the sketch that follows this list.
- Choose BigLake Iceberg tables for a semi-managed experience on Trusted Cloud.
These tables are Iceberg tables that you create from
open source engines and store in Cloud Storage. Like all tables that
use BigLake metastore, they can be read by open source engines or
BigQuery. However, the open source engine that created the
table is the only engine that can write to it. Choose this option if you want
your ETL workflow to be managed by the open source engine.
- Choose external tables for tables outside of BigLake metastore.
The data and metadata of these tables are completely self-managed; you rely
entirely on the capabilities of open table formats
(such as Iceberg, Apache Hudi, or
Delta Lake). BigQuery can only read
from these tables. Choose this option for data and metadata that you want to
manage on your own in a third-party catalog.
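As an illustration of the fully managed option described above (BigLake Iceberg tables in BigQuery), the following sketch uses the BigQuery client library for Python to create such a table with DDL. The project, dataset, connection, and bucket names are placeholder assumptions, and the exact DDL options available in your environment may differ, so treat this as a sketch rather than a definitive recipe.

```python
# Minimal sketch (names are placeholders): create a BigLake Iceberg table
# in BigQuery so that BigQuery manages writes while the data is stored
# in Cloud Storage in Iceberg format.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project ID

ddl = """
CREATE TABLE `my-project.my_dataset.orders` (
  order_id INT64,
  order_ts TIMESTAMP,
  amount NUMERIC
)
WITH CONNECTION `my-project.us.my_connection`  -- hypothetical Cloud resource connection
OPTIONS (
  file_format = 'PARQUET',
  table_format = 'ICEBERG',
  storage_uri = 'gs://my-lakehouse-bucket/orders'
);
"""

# Run the DDL and wait for the job to finish.
client.query(ddl).result()
```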
Use the following table to compare your table format options:
| | External tables | BigLake Iceberg tables | BigLake Iceberg tables in BigQuery | Standard BigQuery tables |
| --- | --- | --- | --- | --- |
| Metastore | External or self-hosted metastore | BigLake metastore | BigLake metastore | BigLake metastore |
| Storage | Cloud Storage / Amazon S3 / Azure | Cloud Storage | Cloud Storage | BigQuery |
| Management | Customer or third party | Google | Google (highly managed experience) | Google (most managed experience) |
| Read / Write | Open source engines (read/write); BigQuery (read only) | Open source engines (read/write); BigQuery (read only) | Open source engines (read only with Iceberg libraries, read/write interoperability with the BigQuery Storage API); BigQuery (read/write) | Open source engines (read/write interoperability with the BigQuery Storage API); BigQuery (read/write) |
| Use cases | Migrations, staging tables for BigQuery loads, self-management | Open lakehouse | Open lakehouse, enterprise-grade storage for analytics, streaming, and AI | Enterprise-grade storage for analytics, streaming, and AI |
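To show what the read/write interoperability rows look like in practice, the following sketch reads a BigQuery table from Spark with the spark-bigquery connector, which moves data through the BigQuery Storage API. The table name is a placeholder assumption, and the connector JAR must be available to the Spark session.

```python
# Minimal sketch (table name is a placeholder): read a BigQuery table from
# Spark via the spark-bigquery connector, which uses the BigQuery Storage
# Read API under the hood. Writes from Spark can go through the connector's
# direct write path in a similar way.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bigquery-read-example").getOrCreate()

df = spark.read.format("bigquery").load("my-project.my_dataset.orders")

df.printSchema()
df.show(5)
```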
What's next
- Learn more about BigLake metastore.
Choose this option if you want your extract, transform, and load (ETL) workflow to be managed by BigQuery.\n- **Choose [BigLake Iceberg tables](/bigquery/docs/blms-rest-catalog) for a semi-managed experience on Google Cloud.** These tables are Iceberg tables that you create from open source engines and store in Cloud Storage. Like all tables that use BigLake metastore, they can be read by open source engines or BigQuery. However, the open source engine that created the table is the only engine that can write to it. Choose this option if you want your ETL workflow to be managed by the open source engine.\n- **Choose [external tables](/bigquery/docs/iceberg-external-tables) for tables outside of BigLake metastore.** The data and metadata of these tables are completely self-managed, where you fully rely on the capabilities of open table formats (such as Iceberg, Apache Hudi, or Delta Lake). BigQuery only has the ability to read from these tables. Choose this option for data and metadata that you want to manage on your own in a third-party catalog.\n\nUse the following table to compare your table format options:\n\nWhat's next\n-----------\n\n- Learn more about [BigLake metastore](/bigquery/docs/about-blms)."]]