To customize your BigLake metastore configuration, you can use the following
additional features:
- Apache Spark Iceberg procedures
- The filter option for unsupported tables
- BigQuery connection overrides
- Access control policies for BigLake metastore Iceberg tables
Use Iceberg Spark procedures
To use Iceberg Spark procedures, you must include the Iceberg SQL extensions
in your Spark configuration. For example, you can call a procedure to roll
back a table to a previous state.
Use interactive Spark-SQL to roll back to a previous state
You can use an Iceberg Spark procedure to roll back a table to a previous
state. The following example creates a table, modifies it, and then restores
an earlier snapshot:
1. Create a Spark table:
spark-sql \
--jars https://storage-download.googleapis.com/maven-central/maven2/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/1.6.1/iceberg-spark-runtime-3.5_2.12-1.6.1.jar,gs://spark-lib/bigquery/iceberg-bigquery-catalog-1.6.1-1.0.1-beta.jar \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.CATALOG_NAME.catalog-impl=org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog \
--conf spark.sql.catalog.CATALOG_NAME.gcp_project=PROJECT_ID \
--conf spark.sql.catalog.CATALOG_NAME.warehouse=WAREHOUSE_DIRECTORY
Replace the following:
CATALOG_NAME: the catalog name that references your Spark table.
PROJECT_ID: the ID of the Trusted Cloud project to use.
WAREHOUSE_DIRECTORY: the URI of the Cloud Storage folder where your data warehouse is stored.
Then, in the Spark SQL shell, create and populate a table:
USE `CATALOG_NAME`;
CREATE NAMESPACE NAMESPACE_NAME;
USE NAMESPACE NAMESPACE_NAME;
CREATE TABLE NAMESPACE_NAME.TABLE_NAME (id int, data string) USING ICEBERG LOCATION 'WAREHOUSE_DIRECTORY';
INSERT INTO NAMESPACE_NAME.TABLE_NAME VALUES (1, "first row");
DESCRIBE EXTENDED TABLE_NAME;
Replace the following:
NAMESPACE_NAME: the namespace name that references your Spark table.
TABLE_NAME: a table name that references your Spark table.
The output contains details about the table configuration:
...
Table Properties [current-snapshot-id=1659239298328512231,format=iceberg/parquet,format-version=2,write.parquet.compression-codec=zstd]
...
2. Alter the table, and then roll it back to the previously created
snapshot 1659239298328512231:
ALTER TABLE TABLE_NAME ADD COLUMNS (newDoubleCol double);
INSERT INTO TABLE_NAME VALUES (2, "second row", 2.5);
SELECT * FROM TABLE_NAME;
CALL CATALOG_NAME.system.set_current_snapshot('NAMESPACE_NAME.TABLE_NAME', SNAPSHOT_ID);
SELECT * FROM TABLE_NAME;
Replace the following:
SNAPSHOT_ID: the ID of the snapshot that you are rolling back to.
The output is similar to the following:
1 first row
Time taken: 0.997 seconds, Fetched 1 row(s)
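If you don't know the snapshot ID, you can list a table's snapshots through
Iceberg's standard snapshots metadata table. This sketch reuses the catalog,
namespace, and table names from the example above:
-- List each snapshot with its commit time and operation type.
SELECT committed_at, snapshot_id, operation
FROM CATALOG_NAME.NAMESPACE_NAME.TABLE_NAME.snapshots;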
Filter unsupported tables from table listing functions
When you use Spark SQL with the BigLake metastore
catalog, the SHOW TABLES
command shows all the tables in the specified
namespace, even those that aren't compatible with
Spark.
To display only supported tables, turn on the filter_unsupported_tables
option:
spark-sql \
--jars https://storage-download.googleapis.com/maven-central/maven2/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/1.6.1/iceberg-spark-runtime-3.5_2.12-1.6.1.jar,gs://spark-lib/bigquery/iceberg-bigquery-catalog-1.6.1-1.0.1-beta.jar \
--conf spark.sql.catalog.CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.CATALOG_NAME.catalog-impl=org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog \
--conf spark.sql.catalog.CATALOG_NAME.gcp_project=PROJECT_ID \
--conf spark.sql.catalog.CATALOG_NAME.gcp_location=LOCATION \
--conf spark.sql.catalog.CATALOG_NAME.warehouse=WAREHOUSE_DIRECTORY \
--conf spark.sql.catalog.CATALOG_NAME.filter_unsupported_tables="true"
Replace the following:
CATALOG_NAME: the name of the Spark catalog to use.
PROJECT_ID: the ID of the Trusted Cloud project to use.
LOCATION: the location of the BigQuery resources.
WAREHOUSE_DIRECTORY: the Cloud Storage folder to use as the data warehouse.
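With the filter enabled, a table listing returns only tables that Spark can
read. For example, using the catalog and namespace names from the earlier
examples:
-- Only Spark-compatible tables appear in the listing.
USE `CATALOG_NAME`;
SHOW TABLES IN NAMESPACE_NAME;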
Set a BigQuery connection override
You can use BigQuery connections to access data stored
outside of BigQuery, such as in Cloud Storage.
To set a BigQuery connection
override that provides access to a Cloud Storage bucket, complete
the following steps:
1. In your BigQuery project, create a new connection to your
Cloud Storage resource. This connection defines how
BigQuery accesses your data.
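For example, you can create a Cloud resource connection with the bq
command-line tool; this sketch assumes the bq CLI is installed and reuses the
placeholders defined later in this section:
bq mk --connection \
    --project_id=PROJECT_ID \
    --location=LOCATION \
    --connection_type=CLOUD_RESOURCE \
    CONNECTION_ID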
2. Grant the roles/bigquery.connectionUser role on the connection to the
user or service account that accesses the data.
Make sure that the connection resource shares the same location as the
target resources in BigQuery. For more information, see
Manage connections.
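As one way to grant this role programmatically, the following minimal sketch
uses the BigQuery Connection API Python client library
(google-cloud-bigquery-connection); the principal user:analyst@example.com is
a hypothetical example:
from google.cloud import bigquery_connection_v1

# Build the fully qualified resource name of the connection.
client = bigquery_connection_v1.ConnectionServiceClient()
resource = client.connection_path("PROJECT_ID", "LOCATION", "CONNECTION_ID")

# Read the current IAM policy, append a binding, and write the policy back.
policy = client.get_iam_policy(request={"resource": resource})
policy.bindings.add(
    role="roles/bigquery.connectionUser",
    members=["user:analyst@example.com"],  # hypothetical principal
)
client.set_iam_policy(request={"resource": resource, "policy": policy})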
3. Specify the connection in your Iceberg table with the bq_connection
property:
CREATE TABLE TABLE_NAME (id int, data string) USING ICEBERG LOCATION 'WAREHOUSE_DIRECTORY' TBLPROPERTIES ('bq_connection'='projects/PROJECT_ID/locations/LOCATION/connections/CONNECTION_ID');
Replace the following:
TABLE_NAME: a table name for your Spark table.
WAREHOUSE_DIRECTORY: the URI of the Cloud Storage bucket that stores your data.
PROJECT_ID: the ID of the Trusted Cloud project to use.
LOCATION: the location of the connection.
CONNECTION_ID: the ID of the connection.
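If the table already exists, you can set the same property with a standard
Spark SQL ALTER TABLE statement, assuming your setup permits changing this
property on an existing table:
ALTER TABLE TABLE_NAME SET TBLPROPERTIES ('bq_connection'='projects/PROJECT_ID/locations/LOCATION/connections/CONNECTION_ID');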
Set access control policies
You can enable fine-grained access control (FGAC) on BigLake metastore
Iceberg tables by configuring access control policies. You
can only set access control policies on tables that use a
BigQuery connection
override. You can set these
policies in the following ways:
- Column-level security
- Row-level security
- Data masking
After you configure your FGAC policies, you can query the table from
Spark using the following example:
from pyspark.sql import SparkSession

# Create a Spark session with the BigLake metastore Iceberg catalog.
spark = SparkSession.builder \
    .appName("BigLake Metastore Iceberg") \
    .config("spark.sql.catalog.CATALOG_NAME", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.CATALOG_NAME.catalog-impl", "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog") \
    .config("spark.sql.catalog.CATALOG_NAME.gcp_project", "PROJECT_ID") \
    .config("spark.sql.catalog.CATALOG_NAME.gcp_location", "LOCATION") \
    .config("spark.sql.catalog.CATALOG_NAME.warehouse", "WAREHOUSE_DIRECTORY") \
    .getOrCreate()

spark.sql("USE `CATALOG_NAME`;")

# Configure Spark to store temporary results for BigQuery reads.
spark.conf.set("viewsEnabled", "true")
spark.sql("CREATE NAMESPACE IF NOT EXISTS MATERIALIZATION_NAMESPACE")
spark.conf.set("materializationDataset", "MATERIALIZATION_NAMESPACE")

spark.sql("USE NAMESPACE DATASET_NAME;")

# Read through the BigQuery connector so that FGAC policies are enforced.
sql = """SELECT * FROM DATASET_NAME.ICEBERG_TABLE_NAME"""
df = spark.read.format("bigquery").load(sql)
df.show()
Replace the following:
CATALOG_NAME: the name of your catalog.
PROJECT_ID: the ID of the project that contains your BigQuery resources.
LOCATION: the location of the BigQuery resources.
WAREHOUSE_DIRECTORY: the URI of the Cloud Storage folder that contains your data warehouse.
MATERIALIZATION_NAMESPACE: the namespace where you want to store temporary results.
DATASET_NAME: the name of the dataset that contains the table that you are querying.
ICEBERG_TABLE_NAME: the name of the table that you are querying.
What's next
- Use BigLake metastore with Dataproc
- Use BigLake metastore with Dataproc Serverless