ML Training
Distributed training frameworks and compute engines · 10 plugins
Dask
Flytekit · flytekitplugins-dask
Flyte can execute Dask jobs natively on a Kubernetes cluster, which manages the virtual Dask cluster's lifecycle.
Dask
Flyte SDK (v2) · flyteplugins-dask
Flyte can execute Dask jobs natively on a Kubernetes cluster, which manages the virtual Dask cluster's lifecycle.
Kubeflow MPI
Flytekit · flytekitplugins-kfmpi
This plugin uses the Kubeflow MPI Operator and provides a simplified interface for running distributed training.
Kubeflow PyTorch
Flytekit · flytekitplugins-kfpytorch
This plugin uses the Kubeflow PyTorch Operator and provides a simplified interface for running distributed training with various PyTorch backends.
Kubeflow TensorFlow
Flytekit · flytekitplugins-kftensorflow
This plugin uses the Kubeflow TensorFlow Operator and provides a simplified interface for running distributed training with various TensorFlow backends.
PyTorch
Flyte SDK (v2) · flyteplugins-pytorch
Union can execute PyTorch distributed training jobs natively on a Kubernetes cluster, which manages the lifecycle of worker pods, rendezvous coordination, spin-up, and teardown. It leverages the open-source TorchElastic (torch.distributed.elastic) launcher and the Kubeflow PyTorch Operator, enabling fault-tolerant and elastic training across multiple nodes.
Ray
Flytekit · flytekitplugins-ray
The Flyte backend can be connected to Ray. Once enabled, the plugin lets you run Flyte tasks on a Ray cluster.
Ray
Flyte SDK (v2) · flyteplugins-ray
Union can execute Ray jobs natively on a Kubernetes cluster.
Spark
Flytekit · flytekitplugins-spark
Flyte can execute Spark jobs natively on a Kubernetes cluster, which manages a virtual cluster's lifecycle, spin-up, and teardown. It leverages the open-source Spark on K8s Operator and can be enabled without signing up for any service. This is like running a transient Spark cluster: one spun up for a specific Spark job and torn down after completion.
Spark
Flyte SDK (v2) · flyteplugins-spark
Union can execute Spark jobs natively on a Kubernetes cluster, which manages a virtual cluster's lifecycle, spin-up, and teardown. It leverages the open-source Spark on K8s Operator and can be enabled without signing up for any service. This is like running a transient Spark cluster: one spun up for a specific Spark job and torn down after completion.