PySparkBatch(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A configuration for running an [Apache PySpark](https://spark.apache.org/docs/latest/api/python/getting_started/quickstart.html) batch workload.
Attributes

| Name | Type | Description |
|---|---|---|
| main_python_file_uri | str | Required. The HCFS URI of the main Python file to use as the Spark driver. Must be a .py file. |
| args | MutableSequence[str] | Optional. The arguments to pass to the driver. Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission. |
| python_file_uris | MutableSequence[str] | Optional. HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: .py, .egg, and .zip. |
| jar_file_uris | MutableSequence[str] | Optional. HCFS URIs of jar files to add to the classpath of the Spark driver and tasks. |
| file_uris | MutableSequence[str] | Optional. HCFS URIs of files to be placed in the working directory of each executor. |
| archive_uris | MutableSequence[str] | Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip. |
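
The constraints listed for these attributes can be checked before a batch is submitted. The following is a minimal sketch, not part of the library: `build_pyspark_batch_fields` is a hypothetical helper, the `gs://example-bucket/...` URIs are placeholders, and the `--conf` check covers only the one flag named in the `args` description, not every argument that can be set as a batch property.

```python
from typing import Dict, Sequence

# Supported extensions, copied from the attribute descriptions.
PYTHON_FILE_TYPES = (".py", ".egg", ".zip")
ARCHIVE_TYPES = (".jar", ".tar", ".tar.gz", ".tgz", ".zip")


def build_pyspark_batch_fields(
    main_python_file_uri: str,
    args: Sequence[str] = (),
    python_file_uris: Sequence[str] = (),
    archive_uris: Sequence[str] = (),
) -> Dict[str, object]:
    """Hypothetical helper: validate inputs and return PySparkBatch keyword arguments."""
    if not main_python_file_uri.endswith(".py"):
        raise ValueError("main_python_file_uri must be a .py file")

    for uri in python_file_uris:
        if not uri.endswith(PYTHON_FILE_TYPES):
            raise ValueError(f"unsupported Python file type: {uri}")

    for uri in archive_uris:
        if not uri.endswith(ARCHIVE_TYPES):
            raise ValueError(f"unsupported archive type: {uri}")

    # Illustrative check only: --conf is the single example the docs name.
    for arg in args:
        if arg == "--conf" or arg.startswith("--conf="):
            raise ValueError("set Spark configuration as batch properties, not in args")

    return {
        "main_python_file_uri": main_python_file_uri,
        "args": list(args),
        "python_file_uris": list(python_file_uris),
        "archive_uris": list(archive_uris),
    }


fields = build_pyspark_batch_fields(
    "gs://example-bucket/jobs/main.py",
    args=["--date", "2024-01-01"],
    python_file_uris=["gs://example-bucket/libs/helpers.zip"],
)

# Assuming the google-cloud-dataproc package is installed, the validated
# fields could then be unpacked into the message's keyword arguments:
#   from google.cloud import dataproc_v1
#   batch = dataproc_v1.PySparkBatch(**fields)
```

Keeping the validation separate from the message construction means it can run in environments where the client library is not installed, such as a lightweight CI check.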