BigQuery DataFrames is a set of open source Python libraries that let
you take advantage of BigQuery data processing by using familiar
Python APIs. BigQuery DataFrames provides a Pythonic DataFrame powered
by the BigQuery engine, and it implements the pandas and
scikit-learn APIs by pushing the processing down to BigQuery
through SQL conversion. This lets you use BigQuery to explore
and process terabytes of data, and also train machine learning (ML) models,
all with Python APIs.
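The following toy sketch in plain Python shows what "pushing the processing down to BigQuery through SQL conversion" means. `ToyFrame`, its methods, and the table name `my_dataset.sales` are hypothetical. They are not part of the BigQuery DataFrames API, and the real library has its own, far more complete compiler. The sketch shows only the principle: pandas-style calls are recorded and compiled into a single SQL statement, so the engine does the work instead of the client.

```python
from dataclasses import dataclass, replace
from typing import Tuple


# Hypothetical toy class: NOT the BigQuery DataFrames API.
@dataclass(frozen=True)
class ToyFrame:
    """Records pandas-style operations and compiles them into one SQL query."""

    table: str
    columns: Tuple[str, ...] = ()
    filters: Tuple[str, ...] = ()
    group_by: Tuple[str, ...] = ()
    aggregates: Tuple[str, ...] = ()

    def __getitem__(self, cols):
        # df["a"] or df[["a", "b"]] becomes a projection (SELECT list).
        cols = (cols,) if isinstance(cols, str) else tuple(cols)
        return replace(self, columns=cols)

    def query(self, predicate):
        # df.query("amount > 0") becomes a WHERE condition.
        return replace(self, filters=self.filters + (predicate,))

    def groupby_sum(self, by, col):
        # df.groupby(by)[col].sum() becomes GROUP BY plus SUM().
        return replace(
            self,
            columns=(by,),
            group_by=(by,),
            aggregates=self.aggregates + (f"SUM({col}) AS {col}",),
        )

    def to_sql(self):
        # Nothing is executed; the recorded steps are only translated to SQL.
        select = ", ".join(self.columns + self.aggregates) or "*"
        sql = f"SELECT {select} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self.filters)
        if self.group_by:
            sql += " GROUP BY " + ", ".join(self.group_by)
        return sql


df = ToyFrame("my_dataset.sales")
print(df.query("amount > 0").groupby_sum("region", "amount").to_sql())
# SELECT region, SUM(amount) AS amount FROM my_dataset.sales WHERE (amount > 0) GROUP BY region
```

Each call returns a new immutable frame. This means intermediate expressions can be reused without side effects, and the whole chain reaches the engine as one query.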
The following diagram describes the workflow of BigQuery DataFrames:
BigQuery DataFrames benefits
BigQuery DataFrames does the following:
- Offers more than 750 pandas and scikit-learn APIs implemented through
  transparent SQL conversion to BigQuery and BigQuery ML APIs.
- Defers the execution of queries for enhanced performance.
- Extends data transformations with user-defined Python functions to let
  you process data in Trusted Cloud by S3NS. These functions are
  automatically deployed as BigQuery remote functions.
- Integrates with Vertex AI to let you use Gemini models for text
  generation.
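A second toy model in plain Python illustrates deferred query execution. `FakeEngine` and `LazyFrame` are hypothetical stand-ins, not BigQuery DataFrames classes. Transformations are only recorded, and a single "query" is sent when results are requested with `collect()`. In BigQuery DataFrames, an operation that materializes results, such as `DataFrame.to_pandas()`, plays a comparable role. The sketch does not model exactly when the library runs queries.

```python
# Hypothetical toy classes: NOT the BigQuery DataFrames API.
class FakeEngine:
    """Stand-in for a query engine that counts how many queries it runs."""

    def __init__(self, rows):
        self.rows = rows
        self.queries_run = 0

    def run(self, steps):
        # One call here represents one query sent to the engine.
        self.queries_run += 1
        result = iter(self.rows)
        for step in steps:
            result = step(result)
        return list(result)


class LazyFrame:
    """Records transformations and runs them only when results are requested."""

    def __init__(self, engine, steps=()):
        self.engine = engine
        self.steps = tuple(steps)

    def _then(self, step):
        # Return a new frame with one more recorded step; nothing executes.
        return LazyFrame(self.engine, self.steps + (step,))

    def filter(self, predicate):
        return self._then(lambda rows: (r for r in rows if predicate(r)))

    def map(self, fn):
        return self._then(lambda rows: (fn(r) for r in rows))

    def collect(self):
        # Execution happens here, as a single "query" covering all steps.
        return self.engine.run(self.steps)


engine = FakeEngine([1, 2, 3, 4, 5])
pipeline = LazyFrame(engine).filter(lambda x: x % 2 == 1).map(lambda x: x * 10)
print(engine.queries_run)   # 0: building the pipeline ran nothing
print(pipeline.collect())   # [10, 30, 50]
print(engine.queries_run)   # 1
```

Deferring execution lets the whole chain be planned and sent as one unit, instead of one round trip per step.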
Quotas and limits
- BigQuery quotas apply to BigQuery DataFrames, including hardware,
  software, and network components.
- A subset of the pandas and scikit-learn APIs is supported. For more
  information, see Supported pandas APIs.
- You must explicitly clean up any automatically created Cloud Run
  functions as part of session cleanup. For more information, see
  Supported pandas APIs.
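The cleanup requirement is easiest to meet when cleanup runs even if your code fails. The following sketch assumes a hypothetical `ToySession` that records the functions it deploys and deletes them in `close()`. A `contextlib.contextmanager` wrapper guarantees that `close()` runs. The sketch makes no cloud calls. In BigQuery DataFrames itself, look for session-level cleanup such as `bigframes.pandas.close_session()`, and check the API reference for exactly which resources it deletes.

```python
import contextlib


# Hypothetical toy session: NOT the BigQuery DataFrames API, no cloud calls.
class ToySession:
    """Tracks resources a session creates so that they can be deleted later."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.resources = []  # Names of functions still deployed.
        self.deleted = []    # Names removed by close(), in deletion order.

    def create_function(self, name):
        # Stand-in for deploying a remote function backed by Cloud Run functions.
        full_name = f"{self.session_id}_{name}"
        self.resources.append(full_name)
        return full_name

    def close(self):
        # Delete every tracked resource, newest first.
        while self.resources:
            self.deleted.append(self.resources.pop())


@contextlib.contextmanager
def session_scope(session_id):
    session = ToySession(session_id)
    try:
        yield session
    finally:
        # Runs even if the body raises, so no function is left behind.
        session.close()


with session_scope("abc123") as session:
    session.create_function("clean_text")
    session.create_function("score")
print(session.deleted)  # ['abc123_score', 'abc123_clean_text']
```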
Pricing
- BigQuery DataFrames is a set of open source Python libraries available
  for download at no extra cost.
- BigQuery DataFrames uses BigQuery, Cloud Run functions, Vertex AI, and
  other Trusted Cloud by S3NS services, which incur their own costs.
- During regular usage, BigQuery DataFrames stores temporary data, such as
  intermediate results, in BigQuery tables. These tables persist for seven
  days by default, and you are charged for the data stored in them. The
  tables are created in the _anonymous_ dataset in the Trusted Cloud
  project that you specify in the bf.options.bigquery.project option.
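As a worked example of the seven-day default, the following sketch computes when a temporary table expires and whether it is still stored, and therefore still billed, at a given time. The helper names are hypothetical. The sketch assumes that the expiration falls exactly seven days after creation and that nothing, such as session cleanup, deletes the table earlier.

```python
from datetime import datetime, timedelta, timezone

# Default lifetime of temporary tables, as described above.
DEFAULT_TEMP_TABLE_TTL = timedelta(days=7)


def expires_at(created, ttl=DEFAULT_TEMP_TABLE_TTL):
    """Hypothetical helper: when a temporary table created at `created` expires."""
    return created + ttl


def still_stored(created, now, ttl=DEFAULT_TEMP_TABLE_TTL):
    """Hypothetical helper: True while the table exists and its storage is charged.

    Assumes nothing (such as session cleanup) deletes the table earlier.
    """
    return now < expires_at(created, ttl)


created = datetime(2025, 8, 1, 12, 0, tzinfo=timezone.utc)
print(expires_at(created))  # 2025-08-08 12:00:00+00:00
print(still_stored(created, created + timedelta(days=3)))  # True
```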
Last updated 2025-08-25 UTC.