QueryData 概览

上下文集是特定于数据库的信息集合,可让 QueryData 等工具生成高准确度的查询。上下文集包括模板、方面和值搜索,可帮助 QueryData 了解您的数据库架构和应用的业务逻辑。

支持以下数据库:

  • AlloyDB for PostgreSQL
  • Cloud SQL for MySQL
  • Cloud SQL for PostgreSQL
  • Spanner

什么是上下文集

为了构建有效的智能体应用,QueryData 等工具必须了解您的数据组织和业务逻辑。您以上下文集的形式提供此信息。

您可以在包含每种上下文类型的 JSON 对象的文件中定义上下文。您可以在 Gemini CLI 的帮助下创建这些上下文文件。然后,您将上下文的描述文件上传到在 Cloud de Confiance by S3NS控制台中创建的上下文集中。此流程可让 QueryData 等工具了解数据库的特定架构和应用的业务逻辑。

上下文的描述文件类似于以下内容:

代理上下文的描述文件类似于以下内容:

{
  "templates": [
    {
      "nl_query": "Count prague loan accounts",
      "sql": "SELECT COUNT(T1.account_id) FROM bird_dev_financial.account AS T1 INNER JOIN bird_dev_financial.loan AS T2 ON T1.account_id = T2.account_id INNER JOIN bird_dev_financial.district AS T3 ON T1.district_id = T3.district_id WHERE T3.\"A3\" = 'Prague'",
      "intent": "How many accounts associated with loans are located in the Prague region?",
      "manifest": "How many accounts associated with loans are located in a given city?",
      "parameterized": {
        "parameterized_intent": "How many accounts associated with loans are located in $1",
        "parameterized_sql": "SELECT COUNT(T1.account_id) FROM bird_dev_financial.account AS T1 INNER JOIN bird_dev_financial.loan AS T2 ON T1.account_id = T2.account_id INNER JOIN bird_dev_financial.district AS T3 ON T1.district_id = T3.district_id WHERE T3.\"A3\" = $1"
      }
    }
  ],
  "facets": [
    {
      "sql_snippet": "T.\"A11\" BETWEEN 6000 AND 10000",
      "intent": "Average salary between 6000 and 10000",
      "manifest": "Average salary between a given number and a given number",
      "parameterized": {
         "parameterized_intent": "Average salary between $1 and $2",
         "parameterized_sql_snippet": "T.\"A11\" BETWEEN $1 AND $2"
      }
    }
  ]
}

上下文集文件格式

上下文集文件包含一组精心挑选的 JSON 格式的模板和分面,可指导 QueryData 等工具将自然语言问题转换为针对特定数据库的查询。定义上下文可确保针对常见查询模式生成高准确度的 SQL。

确保上下文集准确无误,并全面涵盖预期应用查询,以最大限度地提高准确性。

可以在 us-central1us-east1europe-west4asia-southeast1 区域中创建上下文集。

查询模板

查询模板是一组精选的代表性自然语言问题,其中包含相应的 SQL 查询。它们还包含说明,以便为自然语言到 SQL 生成提供声明性依据。

查询模板对象类似于以下内容:

{
  "templates": [
    {
      "nl_query": "Count prague loan accounts",
      "sql": "SELECT COUNT(T1.account_id) FROM bird_dev_financial.account AS T1 INNER JOIN bird_dev_financial.loan AS T2 ON T1.account_id = T2.account_id INNER JOIN bird_dev_financial.district AS T3 ON T1.district_id = T3.district_id WHERE T3.\"A3\" = 'Prague'",
      "intent": "How many accounts associated with loans are located in the Prague region?",
      "manifest": "How many accounts associated with loans are located in a given city?",
      "parameterized": {
        "parameterized_intent": "How many accounts associated with loans are located in $1",
        "parameterized_sql": "SELECT COUNT(T1.account_id) FROM bird_dev_financial.account AS T1 INNER JOIN bird_dev_financial.loan AS T2 ON T1.account_id = T2.account_id INNER JOIN bird_dev_financial.district AS T3 ON T1.district_id = T3.district_id WHERE T3.\"A3\" = $1"
      }
    }
  ]
},
...

查询模板 JSON 对象的主要组成部分如下:

  • nl_query:QueryData 等工具可处理的自然语言查询示例。
  • sql:自然语言查询对应的 SQL 查询。
  • intent:自然语言查询的目标或用途。如果未设置,此值默认为自然语言查询。
  • manifest:一种泛化的自动生成意图形式。
  • parameterized_intent:一种自动生成的模板化意图,实体值已替换为参数。
  • parameterized_sql:与参数化 intent 对应的 SQL 查询的模板化自动生成形式。

查询分面

查询方面是一组精选的代表性自然语言条件,其中包含相应的 SQL 谓词。分面可管理过滤条件和条件,从而使查询模板能够执行分面搜索。

查询分面对象类似于以下内容:

{
...
"facets": [
    {
      "sql_snippet": "T.\"A11\" BETWEEN 6000 AND 10000",
      "intent": "Average salary between 6000 and 10000",
      "manifest": "Average salary between a given number and a given number",
      "parameterized": {
         "parameterized_intent": "Average salary between $1 and $2",
         "parameterized_sql_snippet": "T.\"A11\" BETWEEN $1 AND $2"
      }
    }
  ]
}

分面 JSON 对象的主要组成部分如下所示:

  • sql_snippet:SQL 代码段。
  • intent:SQL 谓词的说明。
  • manifest:一种泛化的自动生成意图形式。
  • parameterized_intent:一种自动生成的模板化意图,实体值已替换为参数。
  • parameterized_sql_snippet:与参数化 intent 对应的 sql_snippet 的模板化自动生成形式。

值搜索查询

[值搜索查询](/sql/docs/mysql/build-context-gemini-cli#generate-value-searches)是开发者定义的查询,使用匹配函数在数据库中查找值及其上下文。值关联功能会使用这些查询的结果来确定哪些表和列包含匹配的值,了解值的概念类型,并更正拼写错误。

QueryData API 使用值关联来更准确地将自然语言转换为 SQL。通过使用值搜索查询,API 可以根据数据库值更正拼写错误并解析值类型,从而提高转化准确性。

值关联可提高自然语言到 SQL 的转换准确性。例如,如果用户问“Are there any flights out of Heathrow?”,数据库可能会将机场名称存储为“London Heathrow”。如果不进行值关联,生成的 SQL 可能会按 WHERE name = 'Heathrow' 进行过滤,从而不返回任何结果。值搜索查询可引导代理将“希思罗”映射到正确的数据库值“伦敦希思罗”及其架构位置 (airports.name),从而确保生成的 SQL 准确无误。

以下是一个值搜索查询示例:

{
  ...
  "value_searches": [
    {
      "query": "SELECT $value as value, 'airports.iata' as columns, 'Airport IATA Code' as concept_type, 0 as distance, '{}'::text as context FROM \"airports\" T WHERE T.\"iata\" = $value",
      "concept_type": "Airport IATA Code",
      "description": "Exact match (Standard SQL) for 3-letter airport codes"
    },
    {
      "query": "WITH TrigramMetrics AS ( SELECT T.\"name\" AS original_value, (T.\"name\" <-> $value::text) AS normalized_dist FROM \"airports\" T WHERE T.\"name\" % $value::text ) SELECT original_value AS value, 'airports.name' AS columns, 'Airport Name' AS concept_type, normalized_dist AS distance, '{}'::text AS context FROM TrigramMetrics",
      "concept_type": "Airport Name",
      "description": "Fuzzy match using standard trigram for partial airport names"
    },
    {
      "query": "WITH SemanticMetrics AS ( SELECT T.\"city\" AS original_value, ( (google_ml.embedding('gemini-embedding-001', $value)::vector <=> google_ml.embedding('gemini-embedding-001', T.\"city\")::vector) / 2.0 ) AS normalized_dist FROM \"airports\" T WHERE T.\"city\" IS NOT NULL ) SELECT original_value AS value, 'airports.city' AS columns, 'Airport City' AS concept_type, normalized_dist AS distance, '{}'::text AS context FROM SemanticMetrics",
      "concept_type": "Airport City",
      "description": "Semantic search on string values for airport city names"
    }
  ]
}

值搜索 JSON 对象的主要组成部分如下:

  • query:一种参数化 SQL 语句,用于定义将值短语与数据库中某个表的一列中存储的值进行匹配的逻辑。结果集通常会投影匹配的值、架构位置、概念类型和归一化距离指标(介于 0 和 1 之间)。
  • concept_type:分配给值的语义标签,例如 districtloan_status。此标签有助于进行值关联,并最终帮助 QueryData API 等工具了解值短语在架构中的作用。它还有助于生成以值所对应的概念为目标的 SQL 语句,以及值短语所在的表和列。
  • description:搜索逻辑的说明。

限制

上下文集具有以下限制:

  • Cloud SQL for SQL Server 不受支持。
  • 数据库的上下文集仅支持模板、分面和值搜索。
  • 数据库的上下文集仅由 Conversational Analytics API 中的 QueryData 端点使用。

后续步骤