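Some or all of the information on this page might not apply to Cloud de Confiance by S3NS. For details, see the differences from Google Cloud.

Parse PDFs in a retrieval-augmented generation pipeline

This tutorial guides you through creating a retrieval-augmented generation (RAG) pipeline based on parsed PDF content.

PDF files, such as financial documents, are hard to use in RAG pipelines because of their complex structure and their mix of text, figures, and tables. This tutorial shows you how to combine BigQuery ML capabilities with the Document AI Layout Parser to build a RAG pipeline from key information extracted from a PDF file.

You can also work through this tutorial in a Colab Enterprise notebook.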

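Objectives

This tutorial covers the following tasks:

Create a Cloud Storage bucket and upload a sample PDF file.
Create a Cloud resource connection so that you can connect to Cloud Storage and Vertex AI from BigQuery.
Create an object table over the PDF file to make the PDF file available in BigQuery.
Create a Document AI processor that you can use to parse the PDF file.
Create a remote model that lets you use the Document AI API to access the document processor from BigQuery.
Use the remote model with the ML.PROCESS_DOCUMENT function to parse the PDF contents into chunks, and then write that content to a BigQuery table.
Extract the PDF content from the JSON data returned by the ML.PROCESS_DOCUMENT function, and then write that content to a BigQuery table.
Create a remote model that lets you use the Vertex AI text-embedding-005 embedding generation model from BigQuery.
Use the remote model with the AI.GENERATE_EMBEDDING function to generate embeddings from the parsed PDF content, and then write those embeddings to a BigQuery table. Embeddings are numerical representations of the PDF content that let you perform semantic search and retrieval on it.
Use the VECTOR_SEARCH function on the embeddings to identify semantically similar PDF content.
Create a remote model that lets you use a Gemini text generation model from BigQuery.
Perform retrieval-augmented generation (RAG) by using the remote model with the AI.GENERATE_TEXT function to generate text, with the vector search results augmenting the prompt input to improve the results.

Costs

In this document, you use the following billable components of Cloud de Confiance by S3NS: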

BigQuery: You incur costs for the data that you process in BigQuery.
Vertex AI: You incur costs for calls to Vertex AI models.
Document AI: You incur costs for calls to the Document AI API.
Cloud Storage: You incur costs for object storage in Cloud Storage.

For more information, see the pricing pages for each of these products.

Before you begin

In the Cloud de Confiance console, on the project selector page, select or create a Cloud de Confiance project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Verify that billing is enabled for your Cloud de Confiance project.
Enable the BigQuery, BigQuery Connection, Vertex AI, Document AI, and Cloud Storage APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission.

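Required roles

To run this tutorial, you need the following Identity and Access Management (IAM) roles:

Create Cloud Storage buckets and objects: Storage Admin (roles/storage.admin)
Create a document processor: Document AI Editor (roles/documentai.editor)
Create and use BigQuery datasets, connections, and models: BigQuery Admin (roles/bigquery.admin)
Grant permissions to the connection's service account: Project IAM Admin (roles/resourcemanager.projectIamAdmin)

These predefined roles contain the permissions required to perform the tasks in this document. The exact permissions that are required are listed in the following Required permissions section:

Required permissions

Create a dataset: bigquery.datasets.create
Create, delegate, and use a connection: bigquery.connections.*
Set the default connection: bigquery.config.*
Set service account permissions: resourcemanager.projects.getIamPolicy and resourcemanager.projects.setIamPolicy
Create an object table: bigquery.tables.create and bigquery.tables.update
Create Cloud Storage buckets and objects: storage.buckets.* and storage.objects.*
Create a model and run inference:
- bigquery.jobs.create
- bigquery.models.create
- bigquery.models.getData
- bigquery.models.updateData
- bigquery.models.updateMetadata
Create a document processor:
- documentai.processors.create
- documentai.processors.update
- documentai.processors.delete

You might also be able to get these permissions with custom roles or other predefined roles.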

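Create a dataset

Create a BigQuery dataset to store your ML model.

Console

In the Cloud de Confiance console, go to the BigQuery page.
In the Explorer pane, click your project name.
Click View actions > Create dataset.
On the Create dataset page, do the following:
- For Dataset ID, enter bqml_tutorial.
- For Location type, select Multi-region, and then select US (multiple regions in United States).
- Leave the remaining default settings as they are, and click Create dataset.

bq

To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.

Create a dataset named bqml_tutorial with the data location set to US and a description of BigQuery ML tutorial dataset: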
```
bq --location=US mk -d \
 --description "BigQuery ML tutorial dataset." \
 bqml_tutorial
```
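Instead of the --dataset flag, the command uses the -d shortcut. If you omit both -d and --dataset, the command defaults to creating a dataset.
Confirm that the dataset was created: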
```
bq ls
```

API

Call the datasets.insert method with a defined dataset resource.

{
  "datasetReference": {
     "datasetId": "bqml_tutorial"
  }
}

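BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.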

import google.cloud.bigquery

bqclient = google.cloud.bigquery.Client()
bqclient.create_dataset("bqml_tutorial", exists_ok=True)

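Create a connection

Create a Cloud resource connection and get the connection's service account. Create the connection in the same location as the dataset that you created in the previous step.

You can skip this step if you either have a default connection configured, or you have the BigQuery Admin role.

Select one of the following options:

Console

Go to the BigQuery page.
In the left pane, click Explorer. If you don't see the left pane, click Expand left pane to open the pane.
In the Explorer pane, expand your project name, and then click Connections.
On the Connections page, click Create connection.
For Connection type, choose Vertex AI remote models, remote functions, BigLake and Spanner (Cloud Resource).
In the Connection ID field, enter a name for your connection.
For Location type, select a location for your connection. The connection should be colocated with your other resources, such as datasets.
Click Create connection.
Click Go to connection.
In the Connection info pane, copy the service account ID for use in a later step.

bq

In a command-line environment, create a connection: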
```
bq mk --connection --location=REGION --project_id=PROJECT_ID \
    --connection_type=CLOUD_RESOURCE CONNECTION_ID
```
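The --project_id parameter overrides the default project.

Replace the following:
- REGION: your connection region
- PROJECT_ID: your Cloud de Confiance project ID
- CONNECTION_ID: an ID for your connection
When you create a connection resource, BigQuery creates a unique system service account and associates it with the connection.

Troubleshooting: If you get the following connection error, update the Google Cloud SDK: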
```
Flags parsing error: flag --connection_type=CLOUD_RESOURCE: value should be one of...
```

Retrieve and copy the service account ID for use in a later step:

bq show --connection PROJECT_ID.REGION.CONNECTION_ID

The output is similar to the following:

name                          properties
1234.REGION.CONNECTION_ID     {"serviceAccountId": "connection-1234-9u56h9@gcp-sa-bigquery-condel.s3ns-system.iam.gserviceaccount.com"}
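
If you script this step, the service account ID can be read out of the properties column. The following is a minimal sketch that assumes you have already captured the properties value as a JSON string. The function name service_account_member is hypothetical, and the sample value is copied from the example output above.

```python
import json

# Sample "properties" value copied from the example output above.
properties = (
    '{"serviceAccountId": '
    '"connection-1234-9u56h9@gcp-sa-bigquery-condel.s3ns-system.iam.gserviceaccount.com"}'
)


def service_account_member(properties_json: str) -> str:
    """Return the IAM member string for the connection's service account.

    Hypothetical helper: it only parses the JSON shown in the sample output.
    """
    service_account_id = json.loads(properties_json)["serviceAccountId"]
    return f"serviceAccount:{service_account_id}"


member = service_account_member(properties)
print(member)
```

The resulting string has the serviceAccount: prefix that IAM policy bindings expect for the member value.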

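Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

Before running code samples, set the GOOGLE_CLOUD_UNIVERSE_DOMAIN environment variable to s3nsapis.fr.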

import google.api_core.exceptions
from google.cloud import bigquery_connection_v1

client = bigquery_connection_v1.ConnectionServiceClient()


def create_connection(
    project_id: str,
    location: str,
    connection_id: str,
):
    """Creates a BigQuery connection to a Cloud Resource.

    Cloud Resource connection creates a service account which can then be
    granted access to other Google Cloud resources for federated queries.

    Args:
        project_id: The Google Cloud project ID.
        location: The location of the connection (for example, "us-central1").
        connection_id: The ID of the connection to create.
    """

    parent = client.common_location_path(project_id, location)

    connection = bigquery_connection_v1.Connection(
        friendly_name="Example Connection",
        description="A sample connection for a Cloud Resource.",
        cloud_resource=bigquery_connection_v1.CloudResourceProperties(),
    )

    try:
        created_connection = client.create_connection(
            parent=parent, connection_id=connection_id, connection=connection
        )
        print(f"Successfully created connection: {created_connection.name}")
        print(f"Friendly name: {created_connection.friendly_name}")
        print(
            f"Service Account: {created_connection.cloud_resource.service_account_id}"
        )

    except google.api_core.exceptions.AlreadyExists:
        print(f"Connection with ID '{connection_id}' already exists.")
        print("Please use a different connection ID.")
    except Exception as e:
        print(f"An unexpected error occurred while creating the connection: {e}")

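Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

Before running code samples, set the GOOGLE_CLOUD_UNIVERSE_DOMAIN environment variable to s3nsapis.fr.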

const {ConnectionServiceClient} =
  require('@google-cloud/bigquery-connection').v1;
const {status} = require('@grpc/grpc-js');

const client = new ConnectionServiceClient();

/**
 * Creates a new BigQuery connection to a Cloud Resource.
 *
 * A Cloud Resource connection creates a service account that can be granted access
 * to other Google Cloud resources.
 *
 * @param {string} projectId The Google Cloud project ID. for example, 'example-project-id'
 * @param {string} location The location of the project to create the connection in. for example, 'us-central1'
 * @param {string} connectionId The ID of the connection to create. for example, 'example-connection-id'
 */
async function createConnection(projectId, location, connectionId) {
  const parent = client.locationPath(projectId, location);

  const connection = {
    friendlyName: 'Example Connection',
    description: 'A sample connection for a Cloud Resource',
    // The service account for this cloudResource will be created by the API.
    // Its ID will be available in the response.
    cloudResource: {},
  };

  const request = {
    parent,
    connectionId,
    connection,
  };

  try {
    const [response] = await client.createConnection(request);

    console.log(`Successfully created connection: ${response.name}`);
    console.log(`Friendly name: ${response.friendlyName}`);

    console.log(`Service Account: ${response.cloudResource.serviceAccountId}`);
  } catch (err) {
    if (err.code === status.ALREADY_EXISTS) {
      console.log(`Connection '${connectionId}' already exists.`);
    } else {
      console.error(`Error creating connection: ${err.message}`);
    }
  }
}

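Terraform

Use the google_bigquery_connection resource.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The following example creates a Cloud resource connection named my_cloud_resource_connection in the US region: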


# This queries the provider for project information.
data "google_project" "default" {}

# This creates a cloud resource connection in the US region named my_cloud_resource_connection.
# Note: The cloud resource nested object has only one output field - serviceAccountId.
resource "google_bigquery_connection" "default" {
  connection_id = "my_cloud_resource_connection"
  project       = data.google_project.default.project_id
  location      = "US"
  cloud_resource {}
}

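To apply your Terraform configuration in a Cloud de Confiance project, complete the steps in the following sections.

Prepare Cloud Shell

Launch Cloud Shell.
Set the default Cloud de Confiance project where you want to apply your Terraform configurations.

You only need to run this command once per project, and you can run it in any directory.
```
export GOOGLE_CLOUD_PROJECT=PROJECT_ID
```
Environment variables are overridden if you set explicit values in the Terraform configuration file.

Prepare the directory

Each Terraform configuration file must have its own directory (also called a root module).

In Cloud Shell, create a directory and a new file within that directory. The filename must have the .tf extension, for example main.tf. In this tutorial, the file is referred to as main.tf.
```
mkdir DIRECTORY && cd DIRECTORY && touch main.tf
```
If you are following a tutorial, you can copy the sample code in each section or step.

Copy the sample code into the newly created main.tf.

Optionally, copy the code from GitHub. This is recommended when the Terraform snippet is part of an end-to-end solution.
Review and modify the sample parameters to apply to your environment.
Save your changes.
Initialize Terraform. You only need to do this once per directory.
```
terraform init
```
Optionally, to use the latest Google provider version, include the -upgrade option:
```
terraform init -upgrade
```

Apply the changes

Review the configuration and verify that the resources that Terraform is going to create or update match your expectations:
```
terraform plan
```
Make corrections to the configuration as necessary.
Apply the Terraform configuration by running the following command and entering yes at the prompt:
```
terraform apply
```
Wait until Terraform displays the "Apply complete!" message.
Open your Cloud de Confiance project to view the results. In the Cloud de Confiance console, navigate to your resources in the UI to make sure that Terraform has created or updated them.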

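Grant access to the service account

Select one of the following options:

Console

Go to the IAM & Admin page.
Click Grant access.

The Add principals dialog opens.
In the New principals field, enter the service account ID that you copied earlier.
In the Select a role field, select Document AI, and then select Document AI Viewer.
Click Add another role.
In the Select a role field, select Cloud Storage, and then select Storage Object Viewer.
Click Add another role.
In the Select a role field, select Vertex AI, and then select Vertex AI User.
Click Save.

gcloud

Use the gcloud projects add-iam-policy-binding command: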

gcloud projects add-iam-policy-binding 'PROJECT_NUMBER' --member='serviceAccount:MEMBER' --role='roles/documentai.viewer' --condition=None
gcloud projects add-iam-policy-binding 'PROJECT_NUMBER' --member='serviceAccount:MEMBER' --role='roles/storage.objectViewer' --condition=None
gcloud projects add-iam-policy-binding 'PROJECT_NUMBER' --member='serviceAccount:MEMBER' --role='roles/aiplatform.user' --condition=None

Replace the following:

PROJECT_NUMBER: your project number.
MEMBER: the service account ID that you copied earlier.
Upload the sample PDF to Cloud Storage

To upload the sample PDF to Cloud Storage, follow these steps:

Download the scf23.pdf sample PDF by going to https://www.federalreserve.gov/publications/files/scf23.pdf and clicking download.
Create a Cloud Storage bucket.
Upload the scf23.pdf file to the bucket.

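Create an object table

Create an object table over the PDF file in Cloud Storage:

In the Cloud de Confiance console, go to the BigQuery page.
In the query editor, run the following statement:
```
CREATE OR REPLACE EXTERNAL TABLE `bqml_tutorial.pdf`
WITH CONNECTION `LOCATION.CONNECTION_ID`
OPTIONS(
  object_metadata = 'SIMPLE',
  uris = ['gs://BUCKET/scf23.pdf']);
```
Replace the following:
- LOCATION: the connection location.
- CONNECTION_ID: the ID of your BigQuery connection.
  When you view the connection details in the Cloud de Confiance console, the CONNECTION_ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example projects/myproject/locations/connection_location/connections/myconnection.
- BUCKET: the Cloud Storage bucket that contains the scf23.pdf file. The full uris option value should look similar to ['gs://mybucket/scf23.pdf'].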
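
The LOCATION and CONNECTION_ID values can both be read from the fully qualified connection ID. The following is a minimal sketch of that substitution, using the example path from the text. The function names split_connection_path and object_table_ddl are hypothetical, and the sketch only renders the statement as a string.

```python
from typing import Tuple


def split_connection_path(path: str) -> Tuple[str, str]:
    """Return (location, connection_id) from a fully qualified connection ID."""
    parts = path.split("/")
    # Expected shape: projects/<project>/locations/<location>/connections/<id>
    if (
        len(parts) != 6
        or parts[0] != "projects"
        or parts[2] != "locations"
        or parts[4] != "connections"
    ):
        raise ValueError(f"Unexpected connection path: {path}")
    return parts[3], parts[5]


def object_table_ddl(connection_path: str, bucket: str) -> str:
    """Render the CREATE EXTERNAL TABLE statement used in this tutorial."""
    location, connection_id = split_connection_path(connection_path)
    return (
        "CREATE OR REPLACE EXTERNAL TABLE `bqml_tutorial.pdf`\n"
        f"WITH CONNECTION `{location}.{connection_id}`\n"
        "OPTIONS(\n"
        "  object_metadata = 'SIMPLE',\n"
        f"  uris = ['gs://{bucket}/scf23.pdf']);"
    )


example_path = "projects/myproject/locations/connection_location/connections/myconnection"
ddl = object_table_ddl(example_path, "mybucket")
print(ddl)
```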

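Create a document processor

Create a document processor based on the Layout Parser processor in the us multi-region.

Create the remote model for the document processor

Create a remote model to access the Document AI processor:

In the Cloud de Confiance console, go to the BigQuery page.
In the query editor, run the following statement:
```
CREATE OR REPLACE MODEL `bqml_tutorial.parser_model`
REMOTE WITH CONNECTION `LOCATION.CONNECTION_ID`
  OPTIONS(remote_service_type = 'CLOUD_AI_DOCUMENT_V1', document_processor = 'PROCESSOR_ID');
```
Replace the following:
- LOCATION: the connection location.
- CONNECTION_ID: the ID of your BigQuery connection.
  When you view the connection details in the Cloud de Confiance console, the CONNECTION_ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example projects/myproject/locations/connection_location/connections/myconnection.
- PROCESSOR_ID: the document processor ID. To find this value, view the processor details, and then look at the ID row in the Basic Information section.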

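Parse the PDF file into chunks

Use the document processor with the ML.PROCESS_DOCUMENT function to parse the PDF file into chunks, and then write that content to a table. The ML.PROCESS_DOCUMENT function returns the PDF chunks in JSON format.

In the Cloud de Confiance console, go to the BigQuery page.

In the query editor, run the following statement: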

CREATE OR REPLACE TABLE bqml_tutorial.chunked_pdf AS (
  SELECT * FROM ML.PROCESS_DOCUMENT(
  MODEL bqml_tutorial.parser_model,
  TABLE bqml_tutorial.pdf,
  PROCESS_OPTIONS => (JSON '{"layout_config": {"chunking_config": {"chunk_size": 250}}}')
  )
);
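
The PROCESS_OPTIONS argument is a JSON literal, so if you want to experiment with other chunk sizes it can help to generate it instead of editing nested braces by hand. The following is a minimal sketch. The helper name process_options is hypothetical, and it only reproduces the one option used in the statement above.

```python
import json


def process_options(chunk_size: int) -> str:
    """Return the JSON text for PROCESS_OPTIONS with the given chunk size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    options = {"layout_config": {"chunking_config": {"chunk_size": chunk_size}}}
    return json.dumps(options)


literal = process_options(250)
# Render the argument in the form used by the statement above.
print(f"PROCESS_OPTIONS => (JSON '{literal}')")
```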

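Parse the PDF chunk data into separate columns

Extract the PDF content and metadata information from the JSON data returned by the ML.PROCESS_DOCUMENT function, and then write that content to a table:

In the Cloud de Confiance console, go to the BigQuery page.

In the query editor, run the following statement to parse the PDF content: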

CREATE OR REPLACE TABLE bqml_tutorial.parsed_pdf AS (
SELECT
  uri,
  JSON_EXTRACT_SCALAR(json , '$.chunkId') AS id,
  JSON_EXTRACT_SCALAR(json , '$.content') AS content,
  JSON_EXTRACT_SCALAR(json , '$.pageFooters[0].text') AS page_footers_text,
  JSON_EXTRACT_SCALAR(json , '$.pageSpan.pageStart') AS page_span_start,
  JSON_EXTRACT_SCALAR(json , '$.pageSpan.pageEnd') AS page_span_end
FROM bqml_tutorial.chunked_pdf, UNNEST(JSON_EXTRACT_ARRAY(ml_process_document_result.chunkedDocument.chunks, '$')) json
);

In the query editor, run the following statement to view a subset of the parsed PDF content:

SELECT *
FROM `bqml_tutorial.parsed_pdf`
ORDER BY id
LIMIT 5;

The output is similar to the following:

+-----------------------------------+------+------------------------------------------------------------------------------------------------------+-------------------+-----------------+---------------+
|                uri                |  id  |                                                 content                                              | page_footers_text | page_span_start | page_span_end |
+-----------------------------------+------+------------------------------------------------------------------------------------------------------+-------------------+-----------------+---------------+
| gs://mybucket/scf23.pdf           | c1   | •BOARD OF OF FEDERAL GOVERN NOR RESERVE SYSTEM RESEARCH & ANALYSIS                                   | NULL              | 1               | 1             |
| gs://mybucket/scf23.pdf           | c10  | • In 2022, 20 percent of all families, 14 percent of families in the bottom half of the usual ...    | NULL              | 8               | 9             |
| gs://mybucket/scf23.pdf           | c100 | The SCF asks multiple questions intended to capture whether families are credit constrained, ...     | NULL              | 48              | 48            |
| gs://mybucket/scf23.pdf           | c101 | Bankruptcy behavior over the past five years is based on a series of retrospective questions ...     | NULL              | 48              | 48            |
| gs://mybucket/scf23.pdf           | c102 | # Percentiles of the Distributions of Income and Net Worth                                           | NULL              | 48              | 49            |
+-----------------------------------+------+------------------------------------------------------------------------------------------------------+-------------------+-----------------+---------------+
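
To make the JSON paths in the parsing query easier to follow, the following sketch applies the same field lookups to a small in-memory payload. The payload is hypothetical and heavily trimmed: it contains only the fields that the query reads, and the real function output has more. The function name flatten_chunks is also hypothetical.

```python
import json

# Hypothetical payload shaped like the fields that the parsing query reads.
sample_result = json.loads("""
{"chunkedDocument": {"chunks": [
  {"chunkId": "c10", "content": "second sample chunk",
   "pageSpan": {"pageStart": 8, "pageEnd": 9}},
  {"chunkId": "c2", "content": "first sample chunk",
   "pageSpan": {"pageStart": 1, "pageEnd": 1},
   "pageFooters": [{"text": "sample footer"}]}
]}}
""")


def flatten_chunks(uri, result):
    """Produce one row per chunk, mirroring the columns of parsed_pdf."""
    rows = []
    for chunk in result["chunkedDocument"]["chunks"]:
        footers = chunk.get("pageFooters") or [{}]
        span = chunk.get("pageSpan", {})
        rows.append({
            "uri": uri,
            "id": chunk.get("chunkId"),                   # $.chunkId
            "content": chunk.get("content"),              # $.content
            "page_footers_text": footers[0].get("text"),  # $.pageFooters[0].text
            # The SQL query returns these as strings; this sketch keeps the raw values.
            "page_span_start": span.get("pageStart"),     # $.pageSpan.pageStart
            "page_span_end": span.get("pageEnd"),         # $.pageSpan.pageEnd
        })
    return rows


rows = flatten_chunks("gs://mybucket/scf23.pdf", sample_result)

# String ordering, as ORDER BY id does on a STRING column.
by_string = sorted(row["id"] for row in rows)
# Numeric ordering, assuming IDs have the form "c<number>" as in the sample output.
by_number = sorted((row["id"] for row in rows), key=lambda cid: int(cid[1:]))
print(by_string, by_number)
```

The id column is a string, so ORDER BY id sorts it lexicographically. That is why the sample output lists c1, c10, and c100 before c2.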

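Create the remote model for embedding generation

Create a remote model that represents a hosted Vertex AI text embedding generation model:

In the Cloud de Confiance console, go to the BigQuery page.
In the query editor, run the following statement:
```
CREATE OR REPLACE MODEL `bqml_tutorial.embedding_model`
  REMOTE WITH CONNECTION `LOCATION.CONNECTION_ID`
  OPTIONS (ENDPOINT = 'text-embedding-005');
```
Replace the following:
- LOCATION: the connection location.
- CONNECTION_ID: the ID of your BigQuery connection.
  When you view the connection details in the Cloud de Confiance console, the CONNECTION_ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example projects/myproject/locations/connection_location/connections/myconnection.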

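Generate embeddings

Generate embeddings for the parsed PDF content, and then write them to a table:

In the Cloud de Confiance console, go to the BigQuery page.

In the query editor, run the following statement: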

CREATE OR REPLACE TABLE `bqml_tutorial.embeddings` AS
SELECT * FROM AI.GENERATE_EMBEDDING(
  MODEL `bqml_tutorial.embedding_model`,
  TABLE `bqml_tutorial.parsed_pdf`
);

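Run a vector search

Run a vector search against the parsed PDF content.

The following query takes text input, creates an embedding for that input by using the AI.GENERATE_EMBEDDING function, and then uses the VECTOR_SEARCH function to match the input embedding with the most similar PDF content embeddings. The results are the 10 PDF chunks that are most semantically similar to the input.

Go to the BigQuery page.

In the query editor, run the following SQL statement: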

SELECT query.query, base.id AS pdf_chunk_id, base.content, distance
FROM
  VECTOR_SEARCH( TABLE `bqml_tutorial.embeddings`,
    'embedding',
    (
    SELECT
      embedding,
      content AS query
    FROM
      AI.GENERATE_EMBEDDING( MODEL `bqml_tutorial.embedding_model`,
        ( SELECT 'Did the typical family net worth increase? If so, by how much?' AS content)
      )
    ),
    top_k => 10,
    OPTIONS => '{"fraction_lists_to_search": 0.01}')
ORDER BY distance DESC;

The output is similar to the following:

+-------------------------------------------------+--------------+------------------------------------------------------------------------------------------------------+---------------------+
|                query                            | pdf_chunk_id |                                                 content                                              | distance            |
+-------------------------------------------------+--------------+------------------------------------------------------------------------------------------------------+---------------------+
| Did the typical family net worth increase? ,... | c9           | ## Assets                                                                                            | 0.31113668174119469 |
|                                                 |              |                                                                                                      |                     |
|                                                 |              | The homeownership rate increased slightly between 2019 and 2022, to 66.1 percent. For ...            |                     |
+-------------------------------------------------+--------------+------------------------------------------------------------------------------------------------------+---------------------+
| Did the typical family net worth increase? ,... | c50          | # Box 3. Net Housing Wealth and Housing Affordability                                                | 0.30973592073929113 |
|                                                 |              |                                                                                                      |                     |
|                                                 |              | For families that own their primary residence ...                                                    |                     |
+-------------------------------------------------+--------------+------------------------------------------------------------------------------------------------------+---------------------+
| Did the typical family net worth increase? ,... | c50          | 3 In the 2019 SCF, a small portion of the data collection overlapped with early months of            | 0.29270064592817646 |
|                                                 |              | the COVID- ...                                                                                       |                     |
+-------------------------------------------------+--------------+------------------------------------------------------------------------------------------------------+---------------------+
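
The distance column is easier to interpret with a small worked example. The following sketch ranks a few toy three-dimensional vectors against a query vector by exhaustive comparison. It assumes Euclidean distance, so check the VECTOR_SEARCH reference for the distance type that your query actually uses. The vectors, chunk IDs, and the function name top_k are hypothetical, and real embeddings have many more dimensions.

```python
import math

# Toy embeddings keyed by hypothetical chunk IDs.
chunks = {
    "c1": [0.9, 0.1, 0.0],
    "c2": [0.1, 0.8, 0.1],
    "c3": [0.7, 0.3, 0.0],
}
query = [1.0, 0.0, 0.0]


def top_k(query_vector, candidates, k):
    """Return the k (chunk_id, distance) pairs closest to the query vector."""
    scored = [
        (chunk_id, math.dist(query_vector, vector))
        for chunk_id, vector in candidates.items()
    ]
    # A smaller distance means that the chunk is more similar to the query.
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


nearest = top_k(query, chunks, 2)
for chunk_id, distance in nearest:
    print(chunk_id, round(distance, 4))
```

In this sketch the closest chunk comes first. The tutorial query sorts with ORDER BY distance DESC, so within its 10 results the closest match appears last.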

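Create the remote model for text generation

Create a remote model that represents a hosted Vertex AI text generation model:

In the Cloud de Confiance console, go to the BigQuery page.
In the query editor, run the following statement:
```
CREATE OR REPLACE MODEL `bqml_tutorial.text_model`
  REMOTE WITH CONNECTION `LOCATION.CONNECTION_ID`
  OPTIONS (ENDPOINT = 'gemini-2.0-flash-001');
```
Replace the following:
- LOCATION: the connection location.
- CONNECTION_ID: the ID of your BigQuery connection.
  When you view the connection details in the Cloud de Confiance console, the CONNECTION_ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example projects/myproject/locations/connection_location/connections/myconnection.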

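Generate text augmented by vector search results

Perform a vector search on the embeddings to identify semantically similar PDF content, and then use the AI.GENERATE_TEXT function with the vector search results to augment the prompt input and improve the text generation results. In this example, the query uses information from the PDF chunks to answer a question about the change in family net worth over the past decade.

In the Cloud de Confiance console, go to the BigQuery page.

In the query editor, run the following statement: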

SELECT
  result AS generated
  FROM
  AI.GENERATE_TEXT( MODEL `bqml_tutorial.text_model`,
    (
    SELECT
    CONCAT( 'Did the typical family net worth change? How does this compare the SCF survey a decade earlier? Be concise and use the following context:',
    STRING_AGG(FORMAT("context: %s and reference: %s", base.content, base.uri), ',\n')) AS prompt,
    FROM
      VECTOR_SEARCH( TABLE
        `bqml_tutorial.embeddings`,
        'embedding',
        (
        SELECT
          embedding,
          content AS query
        FROM
          AI.GENERATE_EMBEDDING( MODEL `bqml_tutorial.embedding_model`,
            (
            SELECT
              'Did the typical family net worth change? How does this compare the SCF survey a decade earlier?' AS content
            )
          )
        ),
        top_k => 10,
        OPTIONS => '{"fraction_lists_to_search": 0.01}')
      ),
      STRUCT(512 AS max_output_tokens)
  );

The output is similar to the following:

+-------------------------------------------------------------------------------+
|               generated                                                       |
+-------------------------------------------------------------------------------+
| Between the 2019 and 2022 Survey of Consumer Finances (SCF), real median      |
| family net worth surged 37 percent to $192,900, and real mean net worth       |
| increased 23 percent to $1,063,700.  This represents the largest three-year   |
| increase in median net worth in the history of the modern SCF, exceeding the  |
| next largest by more than double.  In contrast, between 2010 and 2013, real   |
| median net worth decreased 2 percent, and real mean net worth remained        |
| unchanged.                                                                    |
+-------------------------------------------------------------------------------+
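
The prompt that the model receives is the question followed by every retrieved chunk, joined into one string. The following sketch mirrors the CONCAT, STRING_AGG, and FORMAT calls from the query so that you can see the shape of that string. The function name build_prompt and the two sample results are hypothetical, and the sketch does not call any model.

```python
QUESTION = (
    "Did the typical family net worth change? How does this compare the SCF "
    "survey a decade earlier? Be concise and use the following context:"
)


def build_prompt(question, results):
    """Join (content, uri) pairs the way the query's STRING_AGG and FORMAT do."""
    parts = [
        f"context: {content} and reference: {uri}"
        for content, uri in results
    ]
    # CONCAT in the query adds no separator between the question and the first part.
    return question + ",\n".join(parts)


# Hypothetical search results for illustration only.
results = [
    ("first sample chunk", "gs://mybucket/scf23.pdf"),
    ("second sample chunk", "gs://mybucket/scf23.pdf"),
]
prompt = build_prompt(QUESTION, results)
print(prompt)
```

Printing the assembled prompt is a quick way to check how much context the model is given and whether the chunks arrive in a useful order.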

Clean up

In the Cloud de Confiance console, go to the Manage resources page.
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.