This document explains how to use Apache Spark stored procedures with
BigLake metastore.
Before you begin
- Enable billing for your Trusted Cloud project. Learn how to
  check if billing is enabled on a project.
- Enable the BigQuery and Dataflow APIs.

  Enable the APIs
- Optional: Learn more about the following:
  - Understand how BigLake metastore works and why you should use it.
  - Learn how BigQuery Spark stored procedures work and complete the
    before you begin tasks.
Required roles
To use Spark stored procedures, review the required roles for stored procedures
and grant the necessary roles.
To get the permissions that you need to use Spark and stored procedures with
BigLake metastore as a metadata store, ask your administrator to grant you the
following IAM roles:

- Create BigLake metastore tables in Spark:
  - BigQuery Data Editor (roles/bigquery.dataEditor) on the Spark Connection
    service account in the project
  - Storage Object Admin (roles/storage.objectAdmin) on the Spark Connection
    service account in the project
- Query BigLake metastore tables in BigQuery:
  - BigQuery Data Viewer (roles/bigquery.dataViewer) on the project
  - BigQuery User (roles/bigquery.user) on the project
  - Storage Object Viewer (roles/storage.objectViewer) on the project
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or
other predefined roles.
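As an illustration of the role grants above, the following sketch formats the equivalent `gcloud projects add-iam-policy-binding` commands. The service-account email and project ID are placeholder assumptions; substitute the service account that your Spark connection actually uses.

```python
# Sketch: format the gcloud commands that grant the roles listed above
# to the Spark connection's service account.
# The member email and project ID below are illustrative placeholders.
CONNECTION_SA = "serviceAccount:CONNECTION_SA@example.iam.gserviceaccount.com"

ROLE_BINDINGS = [
    ("roles/bigquery.dataEditor", CONNECTION_SA),
    ("roles/storage.objectAdmin", CONNECTION_SA),
]

def grant_commands(project_id, bindings):
    """Return one `gcloud projects add-iam-policy-binding` command per role."""
    return [
        f"gcloud projects add-iam-policy-binding {project_id} "
        f'--member="{member}" --role="{role}"'
        for role, member in bindings
    ]

for cmd in grant_commands("my-project", ROLE_BINDINGS):
    print(cmd)
```

Running the printed commands (with real values) applies the bindings at the project level, which matches the scope described in the role list.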
Create and run a stored procedure
The following example shows you how to create and run a stored procedure with
BigLake metastore.
1. Go to the BigQuery page.

   Go to BigQuery

2. In the query editor, add the following sample code for the CREATE PROCEDURE
   statement.
CREATE OR REPLACE PROCEDURE
`PROJECT_ID.BQ_DATASET_ID.PROCEDURE_NAME`()
WITH CONNECTION `PROJECT_ID.REGION.SPARK_CONNECTION_ID` OPTIONS (engine='SPARK',
runtime_version='1.1',
properties=[("spark.sql.catalog.CATALOG_NAME.warehouse",
"WAREHOUSE_DIRECTORY"),
("spark.sql.catalog.CATALOG_NAME.gcp_location",
"LOCATION"),
("spark.sql.catalog.CATALOG_NAME.gcp_project",
"PROJECT_ID"),
("spark.sql.catalog.CATALOG_NAME",
"org.apache.iceberg.spark.SparkCatalog"),
("spark.sql.catalog.CATALOG_NAME.catalog-impl",
"org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog"),
("spark.jars.packages",
"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1")],
jar_uris=["gs://spark-lib/bigquery/iceberg-bigquery-catalog-1.6.1-1.0.1-beta.jar"])
LANGUAGE python AS R"""
from pyspark.sql import SparkSession
spark = SparkSession \
.builder \
.appName("BigLake Metastore Iceberg") \
.getOrCreate()
spark.sql("USE CATALOG_NAME;")
spark.sql("CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME;")
spark.sql("USE NAMESPACE_NAME;")
spark.sql("CREATE TABLE TABLE_NAME (id int, data string) USING ICEBERG LOCATION 'WAREHOUSE_DIRECTORY'")
spark.sql("DESCRIBE TABLE_NAME;")
spark.sql("INSERT INTO TABLE_NAME VALUES (1, \"first row\");")
spark.sql("SELECT * FROM TABLE_NAME;")
spark.sql("ALTER TABLE TABLE_NAME ADD COLUMNS (newDoubleCol double);")
spark.sql("DESCRIBE TABLE_NAME;")
""";
CALL `PROJECT_ID.BQ_DATASET_ID.PROCEDURE_NAME`();
Replace the following:
- PROJECT_ID: the ID of your Trusted Cloud project.
- BQ_DATASET_ID: the ID of the dataset in BigQuery that contains the procedure.
- PROCEDURE_NAME: the name of the procedure that you're creating or replacing.
- REGION: the location of your Spark connection.
- LOCATION: the location of your BigQuery resources.
- SPARK_CONNECTION_ID: the ID of your Spark connection.
- CATALOG_NAME: the name of the catalog that you're using.
- WAREHOUSE_DIRECTORY: the URI of the Cloud Storage folder that contains your
  data warehouse.
- NAMESPACE_NAME: the namespace that you're using.
- TABLE_NAME: the name of the table that you're creating.
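The catalog-related `properties` pairs in the OPTIONS clause all share the `spark.sql.catalog.CATALOG_NAME` prefix, so they can be generated from the replacement values. A minimal sketch (the `catalog_properties` helper and all example values are hypothetical, not part of any Google API):

```python
# Sketch: build the Spark catalog `properties` pairs used in the
# CREATE PROCEDURE OPTIONS clause from the replacement values.
# Every value passed in below is an illustrative placeholder.
def catalog_properties(catalog_name, warehouse_directory, location, project_id):
    """Return (key, value) pairs mirroring the OPTIONS properties list."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return [
        (f"{prefix}.warehouse", warehouse_directory),
        (f"{prefix}.gcp_location", location),
        (f"{prefix}.gcp_project", project_id),
        (prefix, "org.apache.iceberg.spark.SparkCatalog"),
        (f"{prefix}.catalog-impl",
         "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog"),
        ("spark.jars.packages",
         "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1"),
    ]

props = catalog_properties(
    "my_catalog", "gs://my-bucket/warehouse", "us-central1", "my-project")
for key, value in props:
    print(f"{key} = {value}")
```

Keeping these settings in one place makes it easier to reuse the same catalog configuration across multiple stored procedures.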
What's next
- Set up optional BigLake metastore features.
- View and query tables from Spark in the BigQuery console.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-25 UTC.