A Flyte SDK (v2) version of this plugin is available as flyteplugins-spark.
Spark
flytekitplugins-spark
Flyte can execute Spark jobs natively on a Kubernetes cluster, managing the virtual cluster's lifecycle: spin-up and tear-down. It leverages the open-source Spark on K8s Operator and can be enabled without signing up for any service. This is like running a transient Spark cluster, a cluster spun up for a specific Spark job and torn down after completion.
pip install flytekitplugins-spark

Quick Start (example, may need adjustment)

from flytekit import task, workflow
from flytekitplugins.spark import new_spark_session

@task
def my_task() -> None:
    # "my-task" is an illustrative session name
    new_spark_session("my-task")

@workflow
def my_workflow() -> None:
    my_task()

Available Imports (12)
Connector that submits Spark jobs to Databricks.
from flytekitplugins.spark import DatabricksConnector
Configuration type for Spark.
extends dataclass — configuration or data structure for plugin setup
from flytekitplugins.spark import GenericSparkConf
Type transformer for PySpark ML ``PipelineModel`` objects.
extends TypeTransformer — converts python types to/from flyte-native types
from flytekitplugins.spark import PySparkPipelineModelTransformer
Implements how SparkDataFrame should be read using the ``open`` method of FlyteSchema.
from flytekitplugins.spark import SparkDataFrameSchemaReader
Implements how SparkDataFrame should be written using the ``open`` method of FlyteSchema.
from flytekitplugins.spark import SparkDataFrameSchemaWriter
Transforms Spark DataFrames to and from a Schema (typed/untyped).
extends TypeTransformer — converts python types to/from flyte-native types
from flytekitplugins.spark import SparkDataFrameTransformer
StructuredDataset handler that decodes Parquet into a Spark DataFrame.
from flytekitplugins.spark import ParquetToSparkDecodingHandler
StructuredDataset handler that encodes a Spark DataFrame to Parquet.
from flytekitplugins.spark import SparkToParquetEncodingHandler
Use this to configure a Databricks task (superseded by ``DatabricksV2``).
extends dataclass — configuration or data structure for plugin setup
from flytekitplugins.spark import Databricks
Use this to configure a Databricks task.
extends dataclass — configuration or data structure for plugin setup
from flytekitplugins.spark import DatabricksV2
Use this to configure a Spark task; tasks marked with this execute natively as Spark jobs on Kubernetes.
extends dataclass — configuration or data structure for plugin setup
from flytekitplugins.spark import Spark
Creates a new SparkSession; useful for local execution.
from flytekitplugins.spark import new_spark_session
Dependencies
Related Plugins
Dask
Flyte can execute Dask jobs natively on a Kubernetes cluster, which manages the virtual Dask cluster's lifecycle.
Kubeflow MPI
This plugin uses the Kubeflow MPI Operator and provides an extremely simplified interface for executing distributed training.