Kubeflow PyTorch
flytekitplugins-kfpytorch
This plugin uses the Kubeflow PyTorch Operator and provides an extremely simplified interface for executing distributed training using various PyTorch backends.
pip install flytekitplugins-kfpytorch

Quick Start (example, may need adjustment)
See full examples
from flytekit import task, workflow
from flytekitplugins.kfpytorch import PyTorch, Worker

@task(task_config=PyTorch(worker=Worker(replicas=2)))
def my_task() -> None:
...
@workflow
def my_workflow() -> None:
my_task()

Available Imports (7)
CleanPodPolicy describes how to deal with pods when the job is finished.
extends Enum — enumeration of predefined options
from flytekitplugins.kfpytorch import CleanPodPolicy
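As a rough illustration (a stand-in enum, not the plugin's actual definition), the values mirror the Kubeflow training operator's `cleanPodPolicy` field:

```python
from enum import Enum

# Stand-in sketch of the semantics; the real class is
# flytekitplugins.kfpytorch.CleanPodPolicy.
class CleanPodPolicy(Enum):
    NONE = "None"        # keep every pod after the job finishes
    RUNNING = "Running"  # delete only pods still running
    ALL = "All"          # delete all pods, including completed ones

# e.g. chosen to control post-job cleanup on the cluster
print(CleanPodPolicy.RUNNING.value)
```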
Configuration for [`torch elastic training`](https://pytorch.org/docs/stable/elastic/run.html).
extends dataclass — configuration or data structure for plugin setup
from flytekitplugins.kfpytorch import Elastic
Configuration for master replica group.
extends dataclass — configuration or data structure for plugin setup
from flytekitplugins.kfpytorch import Master
Configuration for an executable [`PyTorch Job`](https://github.com/kubeflow/pytorch-operator).
extends dataclass — configuration or data structure for plugin setup
from flytekitplugins.kfpytorch import PyTorch
RestartPolicy describes how the replicas should be restarted.
extends Enum — enumeration of predefined options
from flytekitplugins.kfpytorch import RestartPolicy
RunPolicy describes some policy to apply to the execution of a kubeflow job.
extends dataclass — configuration or data structure for plugin setup
from flytekitplugins.kfpytorch import RunPolicy
Configuration for the worker replica group of a Kubeflow PyTorch job.
extends dataclass — configuration or data structure for plugin setup
from flytekitplugins.kfpytorch import Worker
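For elastic jobs, a minimal task configuration might look like the following. This is a sketch, not verified against every plugin version: `nnodes` and `nproc_per_node` mirror torchrun's `--nnodes`/`--nproc-per-node` flags, and the sizes are illustrative.

```python
from flytekit import task
from flytekitplugins.kfpytorch import Elastic

# Illustrative sizing: 2 nodes x 4 processes per node.
@task(task_config=Elastic(nnodes=2, nproc_per_node=4))
def train() -> None:
    ...
```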
Related Plugins
Kubeflow TensorFlow
This plugin uses the Kubeflow TensorFlow Operator and provides an extremely simplified interface for executing distributed training using various TensorFlow backends.
PyTorch
Union can execute PyTorch distributed training jobs natively on a Kubernetes cluster, managing the lifecycle of worker pods, rendezvous coordination, spin-up, and teardown. It leverages the open-source TorchElastic (torch.distributed.elastic) launcher and the Kubeflow PyTorch Operator, enabling fault-tolerant and elastic training across multiple nodes.
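The rendezvous coordination mentioned above boils down to environment variables the operator injects into each worker pod. The variable names below are the standard torch.distributed ones; the address and sizes are illustrative, not real values.

```python
import os

# Illustrative values only: the operator derives these from the
# PyTorchJob spec and the headless service it creates.
injected = {
    "MASTER_ADDR": "my-job-master-0",  # hypothetical pod/service name
    "MASTER_PORT": "29500",            # conventional torch.distributed port
    "WORLD_SIZE": "4",                 # total number of processes
    "RANK": "0",                       # this pod's global rank
}
os.environ.update(injected)

# torch.distributed.init_process_group() reads these variables when
# using the default "env://" init method.
print(os.environ["MASTER_ADDR"], os.environ["WORLD_SIZE"])
```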
Kubeflow MPI
This plugin uses the Kubeflow MPI Operator and provides an extremely simplified interface for executing distributed training.
Dask
Flyte can execute Dask jobs natively on a Kubernetes cluster, managing the virtual Dask cluster's lifecycle.