This module integrates BigQuery built-in AI functions for use with Series/DataFrame objects, such as AI.GENERATE_BOOL: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-bool
Modules Functions
classify
classify(
input: typing.Union[
str,
bigframes.series.Series,
pandas.core.series.Series,
typing.List[
typing.Union[str, bigframes.series.Series, pandas.core.series.Series]
],
typing.Tuple[
typing.Union[str, bigframes.series.Series, pandas.core.series.Series], ...
],
],
categories: tuple[str, ...] | list[str],
*,
connection_id: str | None = None
) -> bigframes.series.Series
Classifies a given input into one of the specified categories. It always returns the provided category that best fits the input.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> df = bpd.DataFrame({'creature': ['Cat', 'Salmon']})
>>> df['type'] = bbq.ai.classify(df['creature'], ['Mammal', 'Fish'])
>>> df
creature type
0 Cat Mammal
1 Salmon Fish
<BLANKLINE>
[2 rows x 2 columns]
Parameters:

| Name | Type | Description |
|---|---|---|
| input | str \| Series \| List[str\|Series] \| Tuple[str\|Series, ...] | A mixture of Series and string literals that specifies the input to send to the model. The Series can be BigFrames Series or pandas Series. |
| categories | tuple[str, ...] \| list[str] | Categories to classify the input into. |
| connection_id | str, optional | Specifies the connection to use to communicate with the model. For example, |

Returns:

| Type | Description |
|---|---|
| bigframes.series.Series | A new series of strings. |
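The `input` argument mixes string literals with Series. One rough way to picture this is that each literal repeats on every row while each Series contributes its value for that row. The sketch below only illustrates that idea: `render_prompts` is a hypothetical helper, plain Python lists stand in for Series, and nothing here reflects how BigFrames or BigQuery actually assemble the request.

```python
from typing import List, Sequence, Union

# Hypothetical stand-in: a plain list plays the role of a Series here.
Part = Union[str, Sequence[str]]


def render_prompts(parts: Sequence[Part]) -> List[str]:
    """Combine string literals and column-like lists into one prompt per row."""
    columns = [p for p in parts if not isinstance(p, str)]
    if not columns:
        # Only literals were given, so there is a single prompt.
        return ["".join(parts)]
    n_rows = len(columns[0])
    if any(len(c) != n_rows for c in columns):
        raise ValueError("all column-like parts must have the same length")
    prompts = []
    for i in range(n_rows):
        # Literals repeat on every row; columns contribute their i-th value.
        prompts.append("".join(p if isinstance(p, str) else p[i] for p in parts))
    return prompts


prompts = render_prompts((["apple", "bear", "pear"], " is a ", ["fruit", "animal", "animal"]))
print(prompts)
```

The same mental model applies to the `prompt` argument of the other functions in this module.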
forecast
forecast(
df: bigframes.dataframe.DataFrame | pandas.core.frame.DataFrame,
*,
data_col: str,
timestamp_col: str,
model: str = "TimesFM 2.0",
id_cols: typing.Optional[typing.Iterable[str]] = None,
horizon: int = 10,
confidence_level: float = 0.95,
context_window: int | None = None
) -> bigframes.dataframe.DataFrame
Forecasts a time series over a future horizon, using Google Research's open source TimesFM model (https://github.com/google-research/timesfm).
Parameters:

| Name | Type | Description |
|---|---|---|
| df | DataFrame | The dataframe that contains the data that you want to forecast. It can be either a BigFrames DataFrame or a pandas DataFrame. If it's a pandas DataFrame, the global BigQuery session is used to load the data. |
| data_col | str | A str value that specifies the name of the data column. The data column contains the data to forecast. The data column must use one of the following data types: INT64, NUMERIC, or FLOAT64. |
| timestamp_col | str | A str value that specifies the name of the time points column. The time points column provides the time points used to generate the forecast. The time points column must use one of the following data types: TIMESTAMP, DATE, or DATETIME. |
| model | str, default "TimesFM 2.0" | A str value that specifies the name of the model. TimesFM 2.0 is the only supported value, and is the default value. |
| id_cols | Iterable[str], optional | An iterable of str values that specify the names of one or more ID columns. Each ID identifies a unique time series to forecast. Specify one or more values for this argument in order to forecast multiple time series using a single query. The columns that you specify must use one of the following data types: STRING, INT64, ARRAY |
| horizon | int, default 10 | An int value that specifies the number of time points to forecast. The default value is 10. The valid input range is [1, 10,000]. |
| confidence_level | float, default 0.95 | A FLOAT64 value that specifies the percentage of the future values that fall in the prediction interval. The default value is 0.95. The valid input range is [0, 1). |
| context_window | int, optional | An int value that specifies the context window length used by BigQuery ML's built-in TimesFM model. The context window length determines how many of the most recent data points from the input time series are used by the model. If you don't specify a value, the AI.FORECAST function automatically chooses the smallest possible context window length that is still large enough to cover the number of time series data points in your input data. |

Exceptions:

| Type | Description |
|---|---|
| ValueError | Raised when any column ID does not exist in the dataframe. |

Returns:

| Type | Description |
|---|---|
| DataFrame | A forecast DataFrame that matches the output of the BigQuery AI.FORECAST function. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-forecast |
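The documented constraints on `forecast` arguments can be checked before a query is issued. The following is a minimal sketch under stated assumptions: `check_forecast_args` is a hypothetical helper, not part of the library, and `columns` stands in for the dataframe's column names. The source only states that a ValueError is raised for a missing column; raising ValueError for out-of-range `horizon` or `confidence_level` is an assumption made here for illustration.

```python
from typing import Iterable, Optional, Sequence


def check_forecast_args(
    columns: Sequence[str],
    *,
    data_col: str,
    timestamp_col: str,
    id_cols: Optional[Iterable[str]] = None,
    horizon: int = 10,
    confidence_level: float = 0.95,
) -> None:
    """Raise ValueError if the arguments fall outside the documented constraints."""
    needed = [data_col, timestamp_col, *(id_cols or [])]
    missing = [c for c in needed if c not in columns]
    if missing:
        raise ValueError(f"columns not found in dataframe: {missing}")
    # Documented range for horizon: [1, 10,000].
    if not 1 <= horizon <= 10_000:
        raise ValueError("horizon must be in [1, 10,000]")
    # Documented range for confidence_level: [0, 1).
    if not 0 <= confidence_level < 1:
        raise ValueError("confidence_level must be in [0, 1)")


# Passes silently: every named column exists and the values are in range.
check_forecast_args(
    ["ts", "sales", "store"],
    data_col="sales",
    timestamp_col="ts",
    id_cols=["store"],
)
```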
generate
generate(
prompt: typing.Union[
str,
bigframes.series.Series,
pandas.core.series.Series,
typing.List[
typing.Union[str, bigframes.series.Series, pandas.core.series.Series]
],
typing.Tuple[
typing.Union[str, bigframes.series.Series, pandas.core.series.Series], ...
],
],
*,
connection_id: str | None = None,
endpoint: str | None = None,
request_type: typing.Literal["dedicated", "shared", "unspecified"] = "unspecified",
model_params: typing.Optional[typing.Mapping[typing.Any, typing.Any]] = None,
output_schema: typing.Optional[typing.Mapping[str, str]] = None
) -> bigframes.series.Series
Returns the AI analysis based on the prompt, which can be any combination of text and unstructured data.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> country = bpd.Series(["Japan", "Canada"])
>>> bbq.ai.generate(("What's the capital city of ", country, " one word only"))
0 {'result': 'Tokyo\n', 'full_response': '{"cand...
1 {'result': 'Ottawa\n', 'full_response': '{"can...
dtype: struct<result: string, full_response: extension<dbjson<JSONArrowType>>, status: string>[pyarrow]
>>> bbq.ai.generate(("What's the capital city of ", country, " one word only")).struct.field("result")
0 Tokyo\n
1 Ottawa\n
Name: result, dtype: string
You get structured output when the output_schema parameter is set:
>>> animals = bpd.Series(["Rabbit", "Spider"])
>>> bbq.ai.generate(animals, output_schema={"number_of_legs": "INT64", "is_herbivore": "BOOL"})
0 {'is_herbivore': True, 'number_of_legs': 4, 'f...
1 {'is_herbivore': False, 'number_of_legs': 8, '...
dtype: struct<is_herbivore: bool, number_of_legs: int64, full_response: extension<dbjson<JSONArrowType>>, status: string>[pyarrow]
Parameters:

| Name | Type | Description |
|---|---|---|
| prompt | str \| Series \| List[str\|Series] \| Tuple[str\|Series, ...] | A mixture of Series and string literals that specifies the prompt to send to the model. The Series can be BigFrames Series or pandas Series. |
| connection_id | str, optional | Specifies the connection to use to communicate with the model. For example, |
| endpoint | str, optional | Specifies the Vertex AI endpoint to use for the model. For example |
| request_type | Literal["dedicated", "shared", "unspecified"] | Specifies the type of inference request to send to the Gemini model. The request type determines what quota the request uses. "dedicated": the function only uses Provisioned Throughput quota, and returns the error "Provisioned throughput is not purchased or is not active" if Provisioned Throughput quota isn't available. "shared": the function only uses dynamic shared quota (DSQ), even if you have purchased Provisioned Throughput quota. "unspecified": if you haven't purchased Provisioned Throughput quota, the function uses DSQ quota; if you have, the function uses the Provisioned Throughput quota first, and any requests that exceed it overflow to DSQ quota. |
| model_params | Mapping[Any, Any] | Provides additional parameters to the model. The MODEL_PARAMS value must conform to the generateContent request body format. |
| output_schema | Mapping[str, str] | A mapping value that specifies the schema of the output, in the form {field_name: data_type}. Supported data types include |

Returns:

| Type | Description |
|---|---|
| bigframes.series.Series | A new struct Series with the result data. The struct contains the fields listed below. |

* "result": a STRING value containing the model's response to the prompt. The result is None if the request fails or is filtered by responsible AI. If you specify an output schema, result is replaced by your custom schema.
* "full_response": a JSON value containing the response from the projects.locations.endpoints.generateContent call to the model. The generated text is in the text element.
* "status": a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
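Because `status` is empty on success and `result` is None on failure, it can be useful to separate the two kinds of rows before using the results. The sketch below assumes the struct rows have already been converted to plain Python dicts; `split_by_status` is a hypothetical helper, and the sample rows, including the error text, are made up for illustration.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_by_status(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate rows whose status is empty (success) from rows reporting an error."""
    ok = [r for r in rows if not r.get("status")]
    failed = [r for r in rows if r.get("status")]
    return ok, failed


# Made-up rows shaped like the documented struct; full_response is omitted for brevity.
rows = [
    {"result": "Tokyo\n", "status": ""},
    {"result": None, "status": "hypothetical error message"},
]
ok, failed = split_by_status(rows)

# Strip the trailing newline that the model output may carry.
answers = [r["result"].strip() for r in ok]
print(answers)
```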
generate_bool
generate_bool(
prompt: typing.Union[
str,
bigframes.series.Series,
pandas.core.series.Series,
typing.List[
typing.Union[str, bigframes.series.Series, pandas.core.series.Series]
],
typing.Tuple[
typing.Union[str, bigframes.series.Series, pandas.core.series.Series], ...
],
],
*,
connection_id: str | None = None,
endpoint: str | None = None,
request_type: typing.Literal["dedicated", "shared", "unspecified"] = "unspecified",
model_params: typing.Optional[typing.Mapping[typing.Any, typing.Any]] = None
) -> bigframes.series.Series
Returns the AI analysis based on the prompt, which can be any combination of text and unstructured data.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> df = bpd.DataFrame({
... "col_1": ["apple", "bear", "pear"],
... "col_2": ["fruit", "animal", "animal"]
... })
>>> bbq.ai.generate_bool((df["col_1"], " is a ", df["col_2"]))
0 {'result': True, 'full_response': '{"candidate...
1 {'result': True, 'full_response': '{"candidate...
2 {'result': False, 'full_response': '{"candidat...
dtype: struct<result: bool, full_response: extension<dbjson<JSONArrowType>>, status: string>[pyarrow]
>>> bbq.ai.generate_bool((df["col_1"], " is a ", df["col_2"])).struct.field("result")
0 True
1 True
2 False
Name: result, dtype: boolean
Parameters:

| Name | Type | Description |
|---|---|---|
| prompt | str \| Series \| List[str\|Series] \| Tuple[str\|Series, ...] | A mixture of Series and string literals that specifies the prompt to send to the model. The Series can be BigFrames Series or pandas Series. |
| connection_id | str, optional | Specifies the connection to use to communicate with the model. For example, |
| endpoint | str, optional | Specifies the Vertex AI endpoint to use for the model. For example |
| request_type | Literal["dedicated", "shared", "unspecified"] | Specifies the type of inference request to send to the Gemini model. The request type determines what quota the request uses. "dedicated": the function only uses Provisioned Throughput quota, and returns the error "Provisioned throughput is not purchased or is not active" if Provisioned Throughput quota isn't available. "shared": the function only uses dynamic shared quota (DSQ), even if you have purchased Provisioned Throughput quota. "unspecified": if you haven't purchased Provisioned Throughput quota, the function uses DSQ quota; if you have, the function uses the Provisioned Throughput quota first, and any requests that exceed it overflow to DSQ quota. |
| model_params | Mapping[Any, Any] | Provides additional parameters to the model. The MODEL_PARAMS value must conform to the generateContent request body format. |

Returns:

| Type | Description |
|---|---|
| bigframes.series.Series | A new struct Series with the result data. The struct contains the fields listed below. |

* "result": a BOOL value containing the model's response to the prompt. The result is None if the request fails or is filtered by responsible AI.
* "full_response": a JSON value containing the response from the projects.locations.endpoints.generateContent call to the model. The generated text is in the text element.
* "status": a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
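The `request_type` rules amount to a small decision table. The sketch below restates them in code purely as a reading aid: `quota_order` and the quota labels are hypothetical names, the quota selection really happens on the service side, and raising RuntimeError is only a stand-in for the documented error message.

```python
from typing import List


def quota_order(request_type: str, has_provisioned_throughput: bool) -> List[str]:
    """Return the quotas a request may draw on, in order, per the request_type rules."""
    if request_type == "dedicated":
        if not has_provisioned_throughput:
            # Stand-in for the documented service error.
            raise RuntimeError("Provisioned throughput is not purchased or is not active")
        return ["provisioned_throughput"]
    if request_type == "shared":
        # DSQ only, even when Provisioned Throughput has been purchased.
        return ["dsq"]
    if request_type == "unspecified":
        if has_provisioned_throughput:
            # Provisioned Throughput first; overflow traffic goes to DSQ.
            return ["provisioned_throughput", "dsq"]
        return ["dsq"]
    raise ValueError(f"unknown request_type: {request_type!r}")


print(quota_order("unspecified", True))
```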
generate_double
generate_double(
prompt: typing.Union[
str,
bigframes.series.Series,
pandas.core.series.Series,
typing.List[
typing.Union[str, bigframes.series.Series, pandas.core.series.Series]
],
typing.Tuple[
typing.Union[str, bigframes.series.Series, pandas.core.series.Series], ...
],
],
*,
connection_id: str | None = None,
endpoint: str | None = None,
request_type: typing.Literal["dedicated", "shared", "unspecified"] = "unspecified",
model_params: typing.Optional[typing.Mapping[typing.Any, typing.Any]] = None
) -> bigframes.series.Series
Returns the AI analysis based on the prompt, which can be any combination of text and unstructured data.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> animal = bpd.Series(["Kangaroo", "Rabbit", "Spider"])
>>> bbq.ai.generate_double(("How many legs does a ", animal, " have?"))
0 {'result': 2.0, 'full_response': '{"candidates...
1 {'result': 4.0, 'full_response': '{"candidates...
2 {'result': 8.0, 'full_response': '{"candidates...
dtype: struct<result: double, full_response: extension<dbjson<JSONArrowType>>, status: string>[pyarrow]
>>> bbq.ai.generate_double(("How many legs does a ", animal, " have?")).struct.field("result")
0 2.0
1 4.0
2 8.0
Name: result, dtype: Float64
Parameters:

| Name | Type | Description |
|---|---|---|
| prompt | str \| Series \| List[str\|Series] \| Tuple[str\|Series, ...] | A mixture of Series and string literals that specifies the prompt to send to the model. The Series can be BigFrames Series or pandas Series. |
| connection_id | str, optional | Specifies the connection to use to communicate with the model. For example, |
| endpoint | str, optional | Specifies the Vertex AI endpoint to use for the model. For example |
| request_type | Literal["dedicated", "shared", "unspecified"] | Specifies the type of inference request to send to the Gemini model. The request type determines what quota the request uses. "dedicated": the function only uses Provisioned Throughput quota, and returns the error "Provisioned throughput is not purchased or is not active" if Provisioned Throughput quota isn't available. "shared": the function only uses dynamic shared quota (DSQ), even if you have purchased Provisioned Throughput quota. "unspecified": if you haven't purchased Provisioned Throughput quota, the function uses DSQ quota; if you have, the function uses the Provisioned Throughput quota first, and any requests that exceed it overflow to DSQ quota. |
| model_params | Mapping[Any, Any] | Provides additional parameters to the model. The MODEL_PARAMS value must conform to the generateContent request body format. |

Returns:

| Type | Description |
|---|---|
| bigframes.series.Series | A new struct Series with the result data. The struct contains the fields listed below. |

* "result": a DOUBLE value containing the model's response to the prompt. The result is None if the request fails or is filtered by responsible AI.
* "full_response": a JSON value containing the response from the projects.locations.endpoints.generateContent call to the model. The generated text is in the text element.
* "status": a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
generate_int
generate_int(
prompt: typing.Union[
str,
bigframes.series.Series,
pandas.core.series.Series,
typing.List[
typing.Union[str, bigframes.series.Series, pandas.core.series.Series]
],
typing.Tuple[
typing.Union[str, bigframes.series.Series, pandas.core.series.Series], ...
],
],
*,
connection_id: str | None = None,
endpoint: str | None = None,
request_type: typing.Literal["dedicated", "shared", "unspecified"] = "unspecified",
model_params: typing.Optional[typing.Mapping[typing.Any, typing.Any]] = None
) -> bigframes.series.Series
Returns the AI analysis based on the prompt, which can be any combination of text and unstructured data.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> animal = bpd.Series(["Kangaroo", "Rabbit", "Spider"])
>>> bbq.ai.generate_int(("How many legs does a ", animal, " have?"))
0 {'result': 2, 'full_response': '{"candidates":...
1 {'result': 4, 'full_response': '{"candidates":...
2 {'result': 8, 'full_response': '{"candidates":...
dtype: struct<result: int64, full_response: extension<dbjson<JSONArrowType>>, status: string>[pyarrow]
>>> bbq.ai.generate_int(("How many legs does a ", animal, " have?")).struct.field("result")
0 2
1 4
2 8
Name: result, dtype: Int64
Parameters:

| Name | Type | Description |
|---|---|---|
| prompt | str \| Series \| List[str\|Series] \| Tuple[str\|Series, ...] | A mixture of Series and string literals that specifies the prompt to send to the model. The Series can be BigFrames Series or pandas Series. |
| connection_id | str, optional | Specifies the connection to use to communicate with the model. For example, |
| endpoint | str, optional | Specifies the Vertex AI endpoint to use for the model. For example |
| request_type | Literal["dedicated", "shared", "unspecified"] | Specifies the type of inference request to send to the Gemini model. The request type determines what quota the request uses. "dedicated": the function only uses Provisioned Throughput quota, and returns the error "Provisioned throughput is not purchased or is not active" if Provisioned Throughput quota isn't available. "shared": the function only uses dynamic shared quota (DSQ), even if you have purchased Provisioned Throughput quota. "unspecified": if you haven't purchased Provisioned Throughput quota, the function uses DSQ quota; if you have, the function uses the Provisioned Throughput quota first, and any requests that exceed it overflow to DSQ quota. |
| model_params | Mapping[Any, Any] | Provides additional parameters to the model. The MODEL_PARAMS value must conform to the generateContent request body format. |

Returns:

| Type | Description |
|---|---|
| bigframes.series.Series | A new struct Series with the result data. The struct contains the fields listed below. |

* "result": an integer (INT64) value containing the model's response to the prompt. The result is None if the request fails or is filtered by responsible AI.
* "full_response": a JSON value containing the response from the projects.locations.endpoints.generateContent call to the model. The generated text is in the text element.
* "status": a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
if_
if_(
prompt: typing.Union[
str,
bigframes.series.Series,
pandas.core.series.Series,
typing.List[
typing.Union[str, bigframes.series.Series, pandas.core.series.Series]
],
typing.Tuple[
typing.Union[str, bigframes.series.Series, pandas.core.series.Series], ...
],
],
*,
connection_id: str | None = None
) -> bigframes.series.Series
Evaluates the prompt to True or False. Unlike ai.generate_bool(), this function applies an optimization so that not every row has to be evaluated by the LLM.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> us_state = bpd.Series(["Massachusetts", "Illinois", "Hawaii"])
>>> bbq.ai.if_((us_state, " has a city called Springfield"))
0 True
1 True
2 False
dtype: boolean
>>> us_state[bbq.ai.if_((us_state, " has a city called Springfield"))]
0 Massachusetts
1 Illinois
dtype: string
Parameters:

| Name | Type | Description |
|---|---|---|
| prompt | str \| Series \| List[str\|Series] \| Tuple[str\|Series, ...] | A mixture of Series and string literals that specifies the prompt to send to the model. The Series can be BigFrames Series or pandas Series. |
| connection_id | str, optional | Specifies the connection to use to communicate with the model. For example, |

Returns:

| Type | Description |
|---|---|
| bigframes.series.Series | A new series of bools. |
score
score(
prompt: typing.Union[
str,
bigframes.series.Series,
pandas.core.series.Series,
typing.List[
typing.Union[str, bigframes.series.Series, pandas.core.series.Series]
],
typing.Tuple[
typing.Union[str, bigframes.series.Series, pandas.core.series.Series], ...
],
],
*,
connection_id: str | None = None
) -> bigframes.series.Series
Computes a score based on rubrics described in natural language and returns it as a double value. The returned score has no fixed range. To get high-quality results, provide a scoring rubric with examples in the prompt.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> animal = bpd.Series(["Tiger", "Rabbit", "Blue Whale"])
>>> bbq.ai.score(("Rank the relative weights of ", animal, " on the scale from 1 to 3")) # doctest: +SKIP
0 2.0
1 1.0
2 3.0
dtype: Float64
Parameters:

| Name | Type | Description |
|---|---|---|
| prompt | str \| Series \| List[str\|Series] \| Tuple[str\|Series, ...] | A mixture of Series and string literals that specifies the prompt to send to the model. The Series can be BigFrames Series or pandas Series. |
| connection_id | str, optional | Specifies the connection to use to communicate with the model. For example, |

Returns:

| Type | Description |
|---|---|
| bigframes.series.Series | A new series of double (float) values. |
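Since the score has no fixed range, results from different prompts may not be directly comparable. One possible post-processing step, which is not something the library does for you, is to rescale the scores to [0, 1]. The sketch below assumes the scores have been pulled into a plain Python list in which None marks a missing value; `min_max_normalize` is a hypothetical helper.

```python
from typing import List, Optional


def min_max_normalize(scores: List[Optional[float]]) -> List[Optional[float]]:
    """Rescale scores to [0, 1], leaving missing values (None) untouched."""
    present = [s for s in scores if s is not None]
    if not present:
        return list(scores)
    lo, hi = min(present), max(present)
    if hi == lo:
        # All scores are equal: map them to 0.0 rather than divide by zero.
        return [None if s is None else 0.0 for s in scores]
    return [None if s is None else (s - lo) / (hi - lo) for s in scores]


normalized = min_max_normalize([2.0, 1.0, 3.0])
print(normalized)
```

Min-max scaling is only one choice; it is sensitive to outliers, so a rubric that pins the scale in the prompt itself is usually the simpler option.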