pytorch
flyteplugins-pytorch
Union can execute PyTorch distributed training jobs natively on a Kubernetes cluster, managing the lifecycle of worker pods, rendezvous coordination, spin-up, and tear-down. It leverages the open-source TorchElastic launcher (torch.distributed.elastic) and the Kubeflow PyTorch Operator, enabling fault-tolerant and elastic training across multiple nodes.
pip install flyteplugins-pytorch

Quick Start (example, may need adjustment)
See full examples
from flytekit import task, workflow
from flyteplugins.pytorch import Elastic

config = Elastic(...)

@task(task_config=config)
def my_task() -> None:
    ...

@workflow
def my_workflow() -> None:
    my_task()

Available Imports (1)
Elastic defines the configuration for running a PyTorch elastic job using torch.distributed.
Extends dataclass: a configuration object for plugin setup.
from flyteplugins.pytorch import Elastic
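As a sketch, an Elastic configuration is attached to a task through the task decorator's task_config parameter. The field values below (nnodes, nproc_per_node) are assumptions for illustration; check the Elastic dataclass in your installed version for the exact field names and defaults.

```python
from flytekit import task
from flyteplugins.pytorch import Elastic

# Assumed fields, shown for illustration only: verify against the
# Elastic dataclass shipped with your version of the plugin.
config = Elastic(
    nnodes=2,           # number of worker nodes (assumed field)
    nproc_per_node=4,   # worker processes per node (assumed field)
)

@task(task_config=config)
def train() -> None:
    # At run time the elastic launcher starts nnodes * nproc_per_node
    # workers and populates the torch.distributed environment
    # (RANK, WORLD_SIZE, MASTER_ADDR, ...) for each of them.
    ...
```

When the workflow runs, the operator creates the worker pods, coordinates rendezvous among them, and restarts workers on failure within the configured limits.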
Dependencies
Related Plugins
Kubeflow PyTorch
This plugin uses the Kubeflow PyTorch Operator and provides an extremely simplified interface for executing distributed training using various PyTorch backends.
Dask
Flyte can execute dask jobs natively on a Kubernetes Cluster, which manages the virtual dask cluster's lifecycle.
Kubeflow MPI
This plugin uses the Kubeflow MPI Operator and provides an extremely simplified interface for executing distributed training.