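Some or all of the information on this page might not apply to Trusted Cloud by S3NS.
BigLake metastore is a fully managed metastore for data analytics products on Trusted Cloud by S3NS. It provides a single source of truth for managing metadata from multiple sources. You can access the metastore from BigQuery and from various open data processing engines, which makes it useful for both data analysts and data engineers.
For example, you can use BigLake metastore as a catalog together with an open source query engine such as Apache Spark. Tables that you create with Spark can then be queried with BigQuery without syncing metadata.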
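As a rough sketch of the Spark setup described above, the following Python helper builds the Spark properties for an Iceberg catalog backed by BigLake metastore. The property names and the BigQueryMetastoreCatalog class reflect the BigQuery custom Iceberg catalog plugin as best understood here and are assumptions to verify against the current integration guide. The catalog name, project ID, location, and bucket are hypothetical placeholders.

```python
def biglake_catalog_conf(catalog, project, location, warehouse):
    """Return Spark properties for an Iceberg catalog backed by BigLake metastore.

    The property keys below are assumed from the BigQuery Iceberg catalog
    plugin; check them against the integration guide before relying on them.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Register the catalog name with Iceberg's Spark catalog wrapper.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Assumed plugin class that stores table metadata in BigLake metastore.
        f"{prefix}.catalog-impl": "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog",
        f"{prefix}.gcp_project": project,
        f"{prefix}.gcp_location": location,
        # Cloud Storage directory where table data and metadata files are written.
        f"{prefix}.warehouse": warehouse,
    }


# Hypothetical values, for illustration only.
conf = biglake_catalog_conf("blms", "my-project", "us", "gs://my-bucket/warehouse")
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

Each printed pair can be passed to spark-submit with --conf or set on a SparkSession builder. A table created through such a catalog in Spark would then be queryable from BigQuery without a separate metadata sync.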
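Benefits
BigLake metastore offers the following advantages for data management and analysis:
- Serverless architecture. BigLake metastore provides a serverless architecture, so there are no servers or clusters to manage. This helps reduce operational overhead, simplifies deployment, and allows automatic scaling based on demand.
- Engine interoperability. BigLake metastore lets you access tables directly in BigQuery, so you can query open-format tables stored in BigQuery without additional configuration. For example, you can create a table in Spark and then query it directly in BigQuery. This helps streamline your analytics workflow and reduces the need for complex data movement or ETL processes.
- Unified user experience. BigLake metastore provides a unified workflow across BigQuery and BigQuery Studio, so you can use Spark directly in both. For example:
  - First, you can create a table in Spark by using a BigQuery Studio notebook.
  - Next, you can query the same Spark table in the Trusted Cloud console.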
Supported integrations
You can use BigLake metastore with the Trusted Cloud console, the gcloud CLI, the BigQuery REST API, or the Iceberg REST API.
BigLake metastore supports the following integrations:
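Differences with BigLake metastore (classic)
BigLake metastore is the recommended metastore on Trusted Cloud by S3NS.
The core differences between BigLake metastore and BigLake metastore (classic) are as follows:
- BigLake metastore (classic) is a standalone metastore service that is distinct from BigQuery and supports only Iceberg tables. It has a different three-part resource model. BigLake metastore (classic) tables aren't automatically discovered from BigQuery.
- Tables in BigLake metastore are accessible from multiple open source engines and from BigQuery. BigLake metastore supports direct integration with Spark, which helps reduce redundancy when you store metadata and run jobs. BigLake metastore also supports the Iceberg REST catalog (Preview), which connects lakehouse data across multiple runtimes.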
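Limitations
The following limitations apply to BigLake metastore tables:
- You can't create or modify BigLake metastore tables with DDL or DML statements using the BigQuery engine. You can modify BigLake metastore tables using the BigQuery API (with the bq command-line tool or client libraries), but doing so risks making changes that are incompatible with the external engine.
- BigLake metastore tables don't support renaming operations or ALTER TABLE ... RENAME TO Spark SQL statements.
- BigLake metastore tables are subject to the same quotas and limits as standard tables.
- Queries on BigLake metastore tables from the BigQuery engine might be slower than queries on data in a standard BigQuery table. In general, the query performance for a BigLake metastore table should be equivalent to reading the data directly from Cloud Storage.
- A dry run of a query that uses a BigLake metastore table might report a lower bound of 0 bytes of data, even if rows are returned. This happens because the amount of data processed from the table can't be determined until the actual query completes. Running the query incurs a cost for processing this data.
- You can't reference a BigLake metastore table in a wildcard table query.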
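- You can't use the tabledata.list method to retrieve data from BigLake metastore tables. Instead, you can save query results to a destination table, and then use the tabledata.list method on that table.
- BigLake metastore tables don't support clustering.
- BigLake metastore tables don't support flexible column names.
- The display of table storage statistics for BigLake metastore tables isn't supported.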
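The tabledata.list workaround in the list above can be sketched with the BigQuery REST API as two steps: a jobs.insert request whose query configuration names a destination table, followed by a tabledata.list call on that destination. The helper below only builds the request body and the resource path and sends nothing over the network. The project, dataset, and table names are hypothetical, and the API host is left out because it depends on your environment.

```python
import json
from urllib.parse import quote


def destination_query_job(project, dataset, table, sql):
    """Build a jobs.insert request body that saves query results to a table."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                "useLegacySql": False,
                # Standard table that receives the query results.
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                # Overwrite the destination on every run.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }


def tabledata_list_path(project, dataset, table):
    """Return the REST path of tabledata.list for the given table."""
    return (
        f"/bigquery/v2/projects/{quote(project, safe='')}"
        f"/datasets/{quote(dataset, safe='')}"
        f"/tables/{quote(table, safe='')}/data"
    )


# Hypothetical names, for illustration only.
sql = "SELECT * FROM `my-project.lake_dataset.events` LIMIT 100"
body = destination_query_job("my-project", "scratch", "events_copy", sql)
print(json.dumps(body, indent=2))
print("GET", tabledata_list_path("my-project", "scratch", "events_copy"))
```

After the query job finishes, the GET request targets the standard destination table rather than the BigLake metastore table, which is what makes tabledata.list usable.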
What's next
- Use BigLake metastore with Dataproc
- Use BigLake metastore with Dataproc Serverless
Last updated (UTC): 2025-06-23.