    import bigframes.pandas as bpd

    # Set BigQuery DataFrames options
    # Note: The project option is not required in all environments.
    # On BigQuery Studio, the project ID is automatically detected.
    bpd.options.bigquery.project = your_gcp_project_id

    # Use "partial" ordering mode to generate more efficient queries, but the
    # order of the rows in DataFrames may not be deterministic if you have not
    # explicitly sorted it. Some operations that depend on the order, such as
    # head(), will not function until you explicitly order the DataFrame. Set
    # the ordering mode to "strict" (default) for more pandas compatibility.
    bpd.options.bigquery.ordering_mode = "partial"

    # Create a DataFrame from a BigQuery table
    query_or_table = "bigquery-public-data.ml_datasets.penguins"
    df = bpd.read_gbq(query_or_table)

    # Efficiently preview the results using the .peek() method.
    df.peek()
Modify the bpd.options.bigquery.project = your_gcp_project_id line to
specify your Trusted Cloud project ID. For example,
bpd.options.bigquery.project = "myProjectID".
Run the code cell.
The code returns a DataFrame object with data about penguins.
Create a new code cell in the notebook and add the following code:
    # Use the DataFrame just as you would a pandas DataFrame, but calculations
    # happen in the BigQuery query engine instead of the local system.
    average_body_mass = df["body_mass_g"].mean()
    print(f"average_body_mass: {average_body_mass}")
Run the code cell.
The code calculates the average body mass of the penguins and prints it to the
Trusted Cloud console.
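As a rough local illustration of what this step computes, the following plain-Python sketch averages a column while skipping missing values, which is how a pandas-style mean() behaves by default. It does not use BigQuery DataFrames; the helper name mean_skipping_nulls and the sample values are hypothetical and are not taken from the penguins table.

```python
# Hypothetical body-mass values in grams; None stands in for a NULL cell.
body_mass_g = [3750.0, 3800.0, None, 3250.0, 3450.0]


def mean_skipping_nulls(values):
    """Average the non-null values; return None if every value is null."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


average_body_mass = mean_skipping_nulls(body_mass_g)
print(f"average_body_mass: {average_body_mass}")
```

In the notebook, the same aggregation runs inside the BigQuery query engine, so only the single result value is returned to the notebook.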
Create a new code cell in the notebook and add the following code:
    # Create the Linear Regression model
    from bigframes.ml.linear_model import LinearRegression

    # Filter down to the data we want to analyze
    adelie_data = df[df.species == "Adelie Penguin (Pygoscelis adeliae)"]

    # Drop the columns we don't care about
    adelie_data = adelie_data.drop(columns=["species"])

    # Drop rows with nulls to get our training data
    training_data = adelie_data.dropna()

    # Pick feature columns and label column
    X = training_data[
        [
            "island",
            "culmen_length_mm",
            "culmen_depth_mm",
            "flipper_length_mm",
            "sex",
        ]
    ]
    y = training_data[["body_mass_g"]]

    model = LinearRegression(fit_intercept=False)
    model.fit(X, y)
    model.score(X, y)
Run the code cell.
The code returns the model's evaluation metrics.
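To make the fit_intercept=False setting more concrete, here is a minimal plain-Python sketch of a one-feature least-squares fit that is forced through the origin, followed by an R² calculation (R² is a common regression metric and is assumed here to be representative of what an evaluation step reports). This is not how BigQuery trains the model; the function names and the sample measurements are hypothetical.

```python
def fit_no_intercept(x, y):
    """Least-squares slope for y ≈ slope * x (no intercept term)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical training data: flipper length (mm) and body mass (g).
flipper_length_mm = [180.0, 190.0, 200.0, 210.0]
body_mass_g = [3500.0, 3900.0, 4000.0, 4300.0]

slope = fit_no_intercept(flipper_length_mm, body_mass_g)
predictions = [slope * x for x in flipper_length_mm]
print(f"slope: {slope:.3f}, r2: {r2_score(body_mass_g, predictions):.3f}")
```

Because the fitted line must pass through the origin, the slope alone has to explain the labels. With an intercept, the model would have one more parameter to absorb a constant offset.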
Clean up
The easiest way to eliminate billing is to delete the project that you
created for the tutorial.
To delete the project:
Caution: Deleting a project has the following effects:
- Everything in the project is deleted. If you used an existing project for
  the tasks in this document, when you delete it, you also delete any other
  work you've done in the project.
- Custom project IDs are lost. When you created this project, you might have
  created a custom project ID that you want to use in the future. To preserve
  the URLs that use the project ID, such as an appspot.com URL, delete
  selected resources inside the project instead of deleting the whole project.
If you plan to explore multiple architectures, tutorials, or quickstarts,
reusing projects can help you avoid exceeding project quota limits.
In the Trusted Cloud console, go to the Manage resources page.
In the project list, select the project that you want to delete, and then
click Delete.
In the dialog, type the project ID, and then click Shut down to delete the
project.
What's next
- Continue learning how to [use BigQuery DataFrames](/bigquery/docs/use-bigquery-dataframes).
- Learn how to [visualize graphs using BigQuery DataFrames](/bigquery/docs/dataframes-visualizations).
- Learn how to [use a BigQuery DataFrames notebook](https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks/getting_started/getting_started_bq_dataframes.ipynb).
Last updated 2025-08-29 UTC.