Explore query results in notebooks

You can explore BigQuery query results by using Colab Enterprise notebooks in BigQuery.

In this tutorial, you query data from a BigQuery public dataset and explore the query results in a notebook.
Objectives
Create and run a query in BigQuery.
Explore query results in a notebook.
Costs
This tutorial uses a dataset available through the
Trusted Cloud by S3NS Public Datasets Program.
Google pays for the storage of these datasets and provides public access to the
data. You incur charges for the queries that you perform on the data. For
more information, see
BigQuery pricing.
Before you begin
1. In the Trusted Cloud console, on the project selector page, select or create a Trusted Cloud project.

   Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

2. Verify that billing is enabled for your Trusted Cloud project.

3. Enable the BigQuery API. For new projects, BigQuery is automatically enabled.
Set the default region for code assets
If this is the first time you are creating a code asset, you should set the
default region for code assets. You can't change the region for a code asset
after it is created.
All code assets in BigQuery Studio use the same default region.
To set the default region for code assets, follow these steps:

1. Go to the BigQuery page.

2. In the Explorer pane, find the project in which you have enabled code assets.

3. Click View actions next to the project, and then click Change my default code region.

4. For Region, select the region that you want to use for code assets.

5. Click Select.

For a list of supported regions, see BigQuery Studio locations.
Required permissions

To create and run notebooks, you need the following Identity and Access Management (IAM) roles:

BigQuery User (roles/bigquery.user)
Notebook Runtime User (roles/aiplatform.notebookRuntimeUser)
Code Creator (roles/dataform.codeCreator)

Open query results in a notebook

You can run a SQL query and then use a notebook to explore the data. This approach is useful if you want to modify the data in BigQuery before working with it, or if you need only a subset of the fields in the table.
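Because BigQuery DataFrames mirrors the pandas API, the "subset of the fields" idea translates directly to column selection. A minimal local sketch with plain pandas (the rows here are invented stand-ins, not the real penguins table):

```python
import pandas as pd

# Hypothetical sample rows standing in for the penguins table
rows = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Adelie"],
    "island": ["Torgersen", "Biscoe", "Dream"],
    "body_mass_g": [3750, 5200, 3800],
    "flipper_length_mm": [181, 217, 186],
})

# Keep only the fields needed for the analysis, mirroring a
# SELECT species, body_mass_g projection in SQL
subset = rows[["species", "body_mass_g"]]
print(list(subset.columns))  # ['species', 'body_mass_g']
```

Pushing the projection into the SQL query instead means only the needed columns leave BigQuery, which is why the query-first approach is useful for wide tables.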
1. In the Trusted Cloud console, go to the BigQuery page.

2. In the Type to search field, enter bigquery-public-data.

   If the project is not shown, enter bigquery in the search field, and then click Search to all projects to match the search string with the existing projects.

3. Select bigquery-public-data > ml_datasets > penguins.

4. For the penguins table, click View actions, and then click Query.

5. Add an asterisk (*) for field selection to the generated query, so that it reads like the following example:

   SELECT * FROM `bigquery-public-data.ml_datasets.penguins` LIMIT 1000;

6. Click Run.

7. In the Query results section, click Explore data, and then click Explore with Python notebook.
Prepare the notebook for use

Prepare the notebook for use by connecting to a runtime and setting application default values.

1. In the notebook header, click Connect to connect to the default runtime.

2. In the Setup code block, click Run cell.
Explore the data

1. To load the penguins data into a BigQuery DataFrame and show the results, click Run cell in the code block in the Result set loaded from BigQuery job as a DataFrame section.

2. To get descriptive metrics for the data, click Run cell in the code block in the Show descriptive statistics using describe() section.

3. Optional: Use other Python functions or packages to explore and analyze the data.
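As a hedged local illustration of the describe() and follow-up analysis cells: because BigQuery DataFrames implements the pandas API, the same calls work on an ordinary pandas DataFrame. The rows below are invented stand-ins, not the real dataset:

```python
import pandas as pd

# Invented stand-in for the penguins result set
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "body_mass_g": [3700.0, 3800.0, 5000.0, 5400.0],
})

# Descriptive statistics, as in the notebook's describe() cell
stats = df["body_mass_g"].describe()
print(stats["mean"])  # 4475.0

# Mean body mass per species, heaviest first
by_species = (
    df.groupby("species")["body_mass_g"]
    .mean()
    .sort_values(ascending=False)
)
print(by_species.index[0])  # Gentoo
```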
The following code sample shows how to use bigframes.pandas to analyze data, and bigframes.ml to create a linear regression model from penguins data in a BigQuery DataFrame:
import bigframes.pandas as bpd

# Load data from BigQuery
query_or_table = "bigquery-public-data.ml_datasets.penguins"
bq_df = bpd.read_gbq(query_or_table)

# Inspect one of the columns (or series) of the DataFrame:
bq_df["body_mass_g"]

# Compute the mean of this series:
average_body_mass = bq_df["body_mass_g"].mean()
print(f"average_body_mass: {average_body_mass}")

# Find the heaviest species using the groupby operation to calculate the
# mean body_mass_g:
(
    bq_df["body_mass_g"]
    .groupby(by=bq_df["species"])
    .mean()
    .sort_values(ascending=False)
    .head(10)
)

# Create the Linear Regression model
from bigframes.ml.linear_model import LinearRegression

# Filter down to the data we want to analyze
adelie_data = bq_df[bq_df.species == "Adelie Penguin (Pygoscelis adeliae)"]

# Drop the columns we don't care about
adelie_data = adelie_data.drop(columns=["species"])

# Drop rows with nulls to get our training data
training_data = adelie_data.dropna()

# Pick feature columns and label column
X = training_data[
    [
        "island",
        "culmen_length_mm",
        "culmen_depth_mm",
        "flipper_length_mm",
        "sex",
    ]
]
y = training_data[["body_mass_g"]]

model = LinearRegression(fit_intercept=False)
model.fit(X, y)
model.score(X, y)
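The fit_intercept=False model above is ordinary least squares through the origin. As a sketch of that idea outside BigQuery, the following NumPy snippet fits the same kind of model on synthetic data with known weights (not the penguins table), and recovers those weights with np.linalg.lstsq:

```python
import numpy as np

# Synthetic features and a label built from known weights (no intercept)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w  # exact linear relationship, so OLS recovers true_w

# Least squares with no intercept term, analogous to
# LinearRegression(fit_intercept=False)
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w, 3))  # approximately [2, -1, 0.5]
```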
Clean up
To avoid incurring charges to your Trusted Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
The easiest way to eliminate billing is to delete the Trusted Cloud project that you created for this tutorial.

1. In the Trusted Cloud console, go to the Manage resources page.

2. In the project list, select the project that you want to delete, and then click Delete.

3. In the dialog, type the project ID, and then click Shut down to delete the project.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-26 UTC."],[[["\u003cp\u003eYou can utilize Colab Enterprise notebooks within BigQuery to explore and analyze data from your query results.\u003c/p\u003e\n"],["\u003cp\u003eThis tutorial guides you through querying a public BigQuery dataset and then exploring its results within a notebook environment.\u003c/p\u003e\n"],["\u003cp\u003eBefore you can create and run notebooks, you must have a Google Cloud project with billing enabled, along with enabling the BigQuery API and BigQuery Studio, as well as having the necessary IAM roles.\u003c/p\u003e\n"],["\u003cp\u003eAfter running a SQL query, the data can be further explored by loading it into a BigQuery DataFrame, running descriptive statistics and implementing other python packages.\u003c/p\u003e\n"],["\u003cp\u003eWhen you are finished with this tutorial you can delete the project to avoid any further costs, or you can keep the project but delete the resources within it.\u003c/p\u003e\n"]]],[],null,["# Explore query results in notebooks\n\n*** ** * ** ***\n\nYou can explore BigQuery query results by using\n[Colab Enterprise notebooks](/colab/docs/introduction) in\nBigQuery.\n\nIn this tutorial, you query data from a\n[BigQuery public dataset](/bigquery/public-data)\nand explore the query results in a notebook.\n\nObjectives\n----------\n\n- Create and run a query in BigQuery.\n- Explore query results in a notebook.\n\nCosts\n-----\n\nThis tutorial uses a dataset available through the\n[Google Cloud Public Datasets 
Program](https://cloud.google.com/blog/products/data-analytics/big-data-analytics-in-the-cloud-with-free-public-datasets).\nGoogle pays for the storage of these datasets and provides public access to the\ndata. You incur charges for the queries that you perform on the data. For\nmore information, see\n[BigQuery pricing](/bigquery/pricing).\n\nBefore you begin\n----------------\n\n1. In the Google Cloud console, on the project selector page,\n select or create a Google Cloud project.\n\n | **Note**: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.\n\n [Go to project selector](https://console.cloud.google.com/projectselector2/home/dashboard)\n2.\n [Verify that billing is enabled for your Google Cloud project](/billing/docs/how-to/verify-billing-enabled#confirm_billing_is_enabled_on_a_project).\n\n3.\n\n\n Enable the BigQuery API.\n\n\n [Enable the API](https://console.cloud.google.com/flows/enableapi?apiid=bigquery)\n\n For new projects, BigQuery is automatically enabled.\n\nSet the default region for code assets\n--------------------------------------\n\nIf this is the first time you are creating a code asset, you should set the\ndefault region for code assets. You can't change the region for a code asset\nafter it is created.\n| **Note:** If you create a code asset and choose a different default region than the one you have been using for code assets---for example, choosing `us-west1` when you have been using `us-central1`---then that code asset and all code assets you create afterwards use that new region by default. Existing code assets continue to use the region they were assigned when they were created.\n\nAll code assets in BigQuery Studio use the same default region.\nTo set the default region for code assets, follow these steps:\n\n1. 
Go to the **BigQuery** page.\n\n [Go to BigQuery](https://console.cloud.google.com/bigquery)\n2. In the **Explorer** pane, find the project in which you have enabled code\n assets.\n\n3. Click more_vert\n **View actions** next to the project, and then click\n **Change my default code region**.\n\n4. For **Region**, select the region that you want to use for code assets.\n\n5. Click **Select**.\n\nFor a list of supported regions, see [BigQuery Studio locations](/bigquery/docs/locations#bqstudio-loc). \n\n### Required permissions\n\nTo create and run notebooks, you need the following Identity and Access Management (IAM)\nroles:\n\n- [BigQuery User (`roles/bigquery.user`)](/bigquery/docs/access-control#bigquery.user)\n- [Notebook Runtime User (`roles/aiplatform.notebookRuntimeUser`)](/vertex-ai/docs/general/access-control#aiplatform.notebookRuntimeUser)\n- [Code Creator (`roles/dataform.codeCreator`)](/dataform/docs/access-control#dataform.codeCreator)\n\nOpen query results in a notebook\n--------------------------------\n\nYou can run a SQL query and then use a notebook to explore the data. This\napproach is useful if you want to modify the data in BigQuery\nbefore working with it, or if you need only a subset of the fields in the table.\n\n1. In the Google Cloud console, go to the **BigQuery** page.\n\n [Go to BigQuery](https://console.cloud.google.com/bigquery)\n2. In the **Type to search** field, enter `bigquery-public-data`.\n\n If the project is not shown, enter `bigquery` in the search field, and then\n click **Search to all projects** to match the search string with the\n existing projects.\n3. Select **bigquery-public-data \\\u003e ml_datasets \\\u003e penguins**.\n\n4. For the **penguins** table,\n click more_vert **View actions** ,\n and then click **Query**.\n\n5. 
Add an asterisk (`*`) for field selection to the generated query, so that\n it reads like the following example:\n\n ```googlesql\n SELECT * FROM `bigquery-public-data.ml_datasets.penguins` LIMIT 1000;\n ```\n6. Click play_circle **Run**.\n\n7. In the **Query results** section, click **Explore data** , and then click\n **Explore with Python notebook**.\n\nPrepare the notebook for use\n----------------------------\n\nPrepare the notebook for use by connecting to a runtime and setting application\ndefault values.\n\n1. In the notebook header, click **Connect** to [connect to the default runtime](/bigquery/docs/create-notebooks#connect_to_the_default_runtime).\n2. In the **Setup** code block, click play_circle **Run cell**.\n\nExplore the data\n----------------\n\n1. To load the **penguins** data into a [BigQuery DataFrame](/bigquery/docs/reference/bigquery-dataframes) and show the results, click play_circle **Run cell** in the code block in the **Result set loaded from BigQuery job as a DataFrame** section.\n2. To get descriptive metrics for the data, click play_circle **Run cell** in the code block in the **Show descriptive statistics using describe()** section.\n3. 
Optional: Use other Python functions or packages to explore and analyze the data.\n\nThe following code sample shows using\n[`bigframes.pandas`](/bigquery/docs/use-bigquery-dataframes)\nto analyze data, and [`bigframes.ml`](/bigquery/docs/use-bigquery-dataframes#ml-capabilities)\nto create a linear regression model from **penguins** data in a\nBigQuery DataFrame: \n\n import bigframes.pandas as bpd\n\n # Load data from BigQuery\n query_or_table = \"bigquery-public-data.ml_datasets.penguins\"\n bq_df = bpd.read_gbq(query_or_table)\n\n # Inspect one of the columns (or series) of the DataFrame:\n bq_df[\"body_mass_g\"]\n\n # Compute the mean of this series:\n average_body_mass = bq_df[\"body_mass_g\"].mean()\n print(f\"average_body_mass: {average_body_mass}\")\n\n # Find the heaviest species using the groupby operation to calculate the\n # mean body_mass_g:\n (\n bq_df[\"body_mass_g\"]\n .groupby(by=bq_df[\"species\"])\n .mean()\n .sort_values(ascending=False)\n .head(10)\n )\n\n # Create the Linear Regression model\n from bigframes.ml.linear_model import LinearRegression\n\n # Filter down to the data we want to analyze\n adelie_data = bq_df[bq_df.species == \"Adelie Penguin (Pygoscelis adeliae)\"]\n\n # Drop the columns we don't care about\n adelie_data = adelie_data.drop(columns=[\"species\"])\n\n # Drop rows with nulls to get our training data\n training_data = adelie_data.dropna()\n\n # Pick feature columns and label column\n X = training_data[\n [\n \"island\",\n \"culmen_length_mm\",\n \"culmen_depth_mm\",\n \"flipper_length_mm\",\n \"sex\",\n ]\n ]\n y = training_data[[\"body_mass_g\"]]\n\n model = LinearRegression(fit_intercept=False)\n model.fit(X, y)\n model.score(X, y)\n\nClean up\n--------\n\n\nTo avoid incurring charges to your Google Cloud account for the resources used in this\ntutorial, either delete the project that contains the resources, or keep the project and\ndelete the individual resources.\n\nThe easiest way to eliminate billing is to delete 
the Google Cloud project\nthat you created for this tutorial.\n\n| **Caution** : Deleting a project has the following effects:\n|\n| - **Everything in the project is deleted.** If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.\n| - **Custom project IDs are lost.** When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an `appspot.com` URL, delete selected resources inside the project instead of deleting the whole project.\n|\n|\n| If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects\n| can help you avoid exceeding project quota limits.\n1. In the Google Cloud console, go to the **Manage resources** page.\n\n [Go to Manage resources](https://console.cloud.google.com/iam-admin/projects)\n2. In the project list, select the project that you want to delete, and then click **Delete**.\n3. In the dialog, type the project ID, and then click **Shut down** to delete the project.\n\nWhat's next\n-----------\n\n- Learn more about [creating notebooks in BigQuery](/bigquery/docs/create-notebooks).\n- Learn more about [exploring data with BigQuery DataFrames](/bigquery/docs/use-bigquery-dataframes)."]]