Some or all of the information on this page might not apply to Trusted Cloud by S3NS.
Choose a document processing function
=====================================
This document compares the document processing functions available in BigQuery ML, which are [`ML.GENERATE_TEXT`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-generate-text) and [`ML.PROCESS_DOCUMENT`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-process-document).

You can use the information in this document to help you decide which function to use in cases where the functions have overlapping capabilities.

At a high level, the difference between these functions is as follows:
- `ML.GENERATE_TEXT` is a good choice for performing natural language processing (NLP) tasks where some of the content resides in documents. This function offers the following benefits:

  - Lower costs
  - More language support
  - Faster throughput
  - Model tuning capability
  - Availability of multimodal models

  For examples of document processing tasks that work best with this approach, see [Explore document processing capabilities with the Gemini API](https://ai.google.dev/gemini-api/docs/document-processing).
- `ML.PROCESS_DOCUMENT` is a good choice for performing document processing tasks that require document parsing and a predefined, structured response.
Supported models
----------------

Supported models are as follows:

- `ML.GENERATE_TEXT`: you can use a subset of the Vertex AI [Gemini](/vertex-ai/generative-ai/docs/learn/models#gemini-models) models to generate text. For more information on supported models, see the [`ML.GENERATE_TEXT` syntax](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-generate-text#syntax).
- `ML.PROCESS_DOCUMENT`: you use the default model of the [Document AI API](/document-ai). Using the Document AI API gives you access to many different document processors, such as the invoice parser, layout parser, and form parser. You can use these document processors to work with PDF files that have many different structures.
Supported tasks
---------------

Supported tasks are as follows:

- `ML.GENERATE_TEXT`: you can perform any NLP task where the input is a document. For example, given a financial document for a company, you can retrieve document information by providing a prompt such as `What is the quarterly revenue for each division?`.
- `ML.PROCESS_DOCUMENT`: you can perform specialized document processing for different document types, such as invoices, tax forms, and financial statements. You can also perform document chunking. For more information on how to use the `ML.PROCESS_DOCUMENT` function for this task, see [Parse PDFs in a retrieval-augmented generation pipeline](/bigquery/docs/rag-pipeline-pdf).
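The two call patterns above can be sketched as follows. This is illustrative only: the model, table, and column names are hypothetical, and the output columns you select will depend on your model options and processor.

```sql
-- Sketch only: table, column, and model names are hypothetical placeholders.

-- NLP over document text with ML.GENERATE_TEXT: build a prompt per row.
SELECT ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `my_project.my_dataset.gemini_model`,
  (
    SELECT CONCAT(
      'What is the quarterly revenue for each division? ',
      document_text) AS prompt
    FROM `my_project.my_dataset.financial_docs`
  ),
  STRUCT(TRUE AS flatten_json_output));

-- Structured parsing of PDFs referenced by an object table
-- with ML.PROCESS_DOCUMENT:
SELECT *
FROM ML.PROCESS_DOCUMENT(
  MODEL `my_project.my_dataset.doc_parser`,
  TABLE `my_project.my_dataset.pdf_object_table`);
```

Note the difference in inputs: `ML.GENERATE_TEXT` takes a free-form `prompt` column you construct, while `ML.PROCESS_DOCUMENT` reads documents from an object table and returns the processor's predefined structured output.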
Pricing
-------

Pricing is as follows:

- `ML.GENERATE_TEXT`: for pricing of the Vertex AI models that you use with this function, see [Vertex AI pricing](/vertex-ai/generative-ai/pricing). Supervised tuning of supported models is charged at dollars per node hour. For more information, see [Vertex AI custom training pricing](/vertex-ai/pricing#custom-trained_models).
- `ML.PROCESS_DOCUMENT`: for pricing of the Cloud AI service that you use with this function, see [Document AI API pricing](/document-ai/pricing).
Supervised tuning
-----------------

Supervised tuning support is as follows:

- `ML.GENERATE_TEXT`: [supervised tuning](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-remote-model#supervised_tuning) is supported for some models.
- `ML.PROCESS_DOCUMENT`: supervised tuning isn't supported.
Queries per minute (QPM) limit
------------------------------

QPM limits are as follows:

- `ML.GENERATE_TEXT`: 60 QPM in the default `us-central1` region for `gemini-1.5-pro` models, and 200 QPM in the default `us-central1` region for `gemini-1.5-flash` models. For more information, see [Generative AI on Vertex AI quotas](/vertex-ai/generative-ai/docs/quotas).
- `ML.PROCESS_DOCUMENT`: 120 QPM per processor type, with an overall limit of 600 QPM per project. For more information, see the [Quotas list](/document-ai/quotas#quotas_list).

To increase your quota, see [Request a quota adjustment](/docs/quotas/help/request_increase).
Token limit
-----------

Token limits are as follows:

- `ML.GENERATE_TEXT`: 700 input tokens, and 8,196 output tokens.
- `ML.PROCESS_DOCUMENT`: no token limit. However, this function does have different page limits depending on the processor you use. For more information, see [Limits](/document-ai/limits).
Supported languages
-------------------

Supported languages are as follows:

- `ML.GENERATE_TEXT`: supports the same languages as [Gemini](/vertex-ai/generative-ai/docs/learn/models#languages-gemini).
- `ML.PROCESS_DOCUMENT`: language support depends on the document processor type; most processors support English only. For more information, see the [Processor list](/document-ai/docs/processors-list).
Region availability
-------------------

Region availability is as follows:

- `ML.GENERATE_TEXT`: available in all Generative AI for Vertex AI [regions](/vertex-ai/generative-ai/docs/learn/locations#available-regions).
- `ML.PROCESS_DOCUMENT`: available in the `EU` and `US` [multi-regions](/bigquery/docs/locations#multi-regions) for all processors. Some processors are also available in certain single regions. For more information, see [Regional and multi-regional support](/document-ai/docs/regions).
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-08-17 UTC.