Wrappers for Document AI Document type.
Classes
Document
Document(
shards: typing.List[google.cloud.documentai_v1.types.document.Document],
gcs_bucket_name: typing.Optional[str] = None,
gcs_prefix: typing.Optional[str] = None,
gcs_input_uri: typing.Optional[str] = None,
)
Represents a wrapped Document.
This class hides the complexity of working with the Document protobuf
response returned by the BatchProcessDocuments or ProcessDocument
methods and provides convenience methods for searching and
extracting information within the Document.
Module Functions
_apply_text_offset
_apply_text_offset(
documentai_object: typing.Union[typing.Dict[str, typing.Dict], typing.List],
text_offset: int,
) -> None
Applies a text offset to all text_segments in documentai_object.
| Parameter | Type | Description |
|---|---|---|
| documentai_object | object | Required. Document AI object to apply |
| text_offset | int | Required. Text offset to apply. From |
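As an illustration of what applying an offset to every text segment of a nested object can look like, here is a minimal sketch. It is not the library's implementation: the function name `apply_text_offset` is hypothetical, and the key names `text_segments`, `start_index`, and `end_index` are assumed to follow the Document AI text anchor layout.

```python
from typing import Any


def apply_text_offset(obj: Any, text_offset: int) -> None:
    """Shift every text segment in a nested dict/list structure, in place."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "text_segments" and isinstance(value, list):
                for segment in value:
                    # Assumption: a missing index is treated as 0.
                    segment["start_index"] = int(segment.get("start_index", 0)) + text_offset
                    segment["end_index"] = int(segment.get("end_index", 0)) + text_offset
            else:
                # Keep walking until a text_segments key is found.
                apply_text_offset(value, text_offset)
    elif isinstance(obj, list):
        for item in obj:
            apply_text_offset(item, text_offset)


# Hypothetical input: one layout with a single text segment.
layout = {"text_anchor": {"text_segments": [{"start_index": 0, "end_index": 5}]}}
apply_text_offset(layout, 100)
# layout["text_anchor"]["text_segments"][0] is now
# {"start_index": 100, "end_index": 105}
```

The walk mutates the object in place and returns nothing, which matches the `-> None` return type in the signature.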
_bigquery_column_name
_bigquery_column_name(input_string: str) -> str
Converts a string into a BigQuery column name. See https://cloud.google.com/bigquery/docs/schemas#column_names
| Parameter | Type | Description |
|---|---|---|
| input_string | str | Required. The string to convert. |
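To make the conversion concrete, the sketch below shows one possible way to normalize a string into a standard BigQuery column name. According to the BigQuery documentation linked above, such names contain only letters, digits, and underscores, start with a letter or underscore, and are at most 300 characters long. The function name `to_column_name` and the replace-with-underscore strategy are assumptions for illustration; the library's own substitution rules may differ.

```python
import re


def to_column_name(input_string: str) -> str:
    """Return a string that satisfies the standard BigQuery column-name rules."""
    # Replace every character that is not a letter, digit, or underscore.
    name = re.sub(r"[^A-Za-z0-9_]", "_", input_string.strip())
    # Column names must start with a letter or underscore.
    if not name or not re.match(r"[A-Za-z_]", name):
        name = "_" + name
    # Column names are limited to 300 characters.
    return name[:300]


print(to_column_name("Invoice #"))  # Invoice__
print(to_column_name("2nd line"))   # _2nd_line
```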
_dict_to_bigquery
_dict_to_bigquery(
dic: typing.Dict[str, typing.Union[str, typing.List[str]]],
dataset_name: str,
table_name: str,
project_id: typing.Optional[str],
) -> google.cloud.bigquery.job.load.LoadJob
Loads dictionary to a BigQuery table.
| Parameter | Type | Description |
|---|---|---|
| dic | Dict[str, Union[str, List[str]]] | Required. The dictionary to insert. |
| dataset_name | str | Required. Name of the BigQuery dataset. |
| table_name | str | Required. Name of the BigQuery table. |
| project_id | Optional[str] | Optional. Project ID containing the BigQuery table. If not passed, falls back to the default inferred from the environment. |

| Return type | Description |
|---|---|
| bigquery.job.LoadJob | The BigQuery LoadJob for adding the dictionary. |
_entities_from_shards
_entities_from_shards(
shards: typing.List[google.cloud.documentai_v1.types.document.Document],
) -> typing.List[google.cloud.documentai_toolbox.wrappers.entity.Entity]
Returns a list of Entities and Properties from a list of documentai.Document shards.
| Parameter | Type | Description |
|---|---|---|
| shards | List[google.cloud.documentai.Document] | Required. List of document shards. |

| Return type | Description |
|---|---|
| List[Entity] | A list of Entities. |
_get_batch_process_metadata
_get_batch_process_metadata(
operation_name: str, timeout: typing.Optional[float] = None
) -> google.cloud.documentai_v1.types.document_processor_service.BatchProcessMetadata
Gets BatchProcessMetadata from a batch_process_documents() long-running operation.
| Parameter | Type | Description |
|---|---|---|
| operation_name | str | Required. The fully qualified operation name for a |
| timeout | float | Optional. Default None. Time in seconds to wait for operation to complete. If None, will wait indefinitely. |

| Return type | Description |
|---|---|
| documentai.BatchProcessMetadata | Metadata from batch process. |
_get_shards
_get_shards(
gcs_bucket_name: str, gcs_prefix: str
) -> typing.List[google.cloud.documentai_v1.types.document.Document]
Returns a list of documentai.Document shards from a Cloud Storage folder.
| Parameter | Type | Description |
|---|---|---|
| gcs_bucket_name | str | Required. The name of the GCS bucket. Format: |
| gcs_prefix | str | Required. The prefix of the JSON files in the target_folder. Format: |

| Return type | Description |
|---|---|
| List[google.cloud.documentai.Document] | A list of documentai.Documents. |
_insert_into_dictionary_with_list
_insert_into_dictionary_with_list(
dic: typing.Dict[str, typing.Union[str, typing.List[str]]], key: str, value: str
) -> typing.Dict[str, typing.Union[str, typing.List[str]]]
Inserts value into a dictionary that can contain lists.
| Parameter | Type | Description |
|---|---|---|
| dic | Dict[str, Union[str, List[str]]] | Required. The dictionary to insert into. |
| key | str | Required. The key to be created or inserted into. |
| value | str | Required. The value to be inserted. |

| Return type | Description |
|---|---|
| Dict[str, Union[str, List[str]]] | The dictionary after adding the key-value pair. |
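The behavior described above (a key holds a single string until a second value arrives, at which point it becomes a list) can be sketched as follows. This is an illustrative stand-in, not the library's code: the name `insert_with_list` is hypothetical, and promoting a single string to a two-element list on the second insert is an assumption based on the documented types.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]


def insert_with_list(dic: Dict[str, Value], key: str, value: str) -> Dict[str, Value]:
    """Insert value under key, turning repeated keys into lists."""
    existing = dic.get(key)
    if existing is None:
        # First value for this key: store the plain string.
        dic[key] = value
    elif isinstance(existing, list):
        # Key already holds a list: append to it.
        existing.append(value)
    else:
        # Second value for this key: promote the string to a list.
        dic[key] = [existing, value]
    return dic


# Hypothetical field names, for illustration only.
fields: Dict[str, Value] = {}
insert_with_list(fields, "line_item", "Pen")
insert_with_list(fields, "line_item", "Paper")
insert_with_list(fields, "total", "12.50")
print(fields)  # {'line_item': ['Pen', 'Paper'], 'total': '12.50'}
```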
_pages_from_shards
_pages_from_shards(
shards: typing.List[google.cloud.documentai_v1.types.document.Document],
) -> typing.List[google.cloud.documentai_toolbox.wrappers.page.Page]
Returns a list of Pages from a list of documentai.Document shards.
| Parameter | Type | Description |
|---|---|---|
| shards | List[google.cloud.documentai.Document] | Required. List of document shards. |

| Return type | Description |
|---|---|
| List[Page] | A list of Pages. |