vLLM
flyteplugins-vllm
Flyte SDK (v2) | Model Serving | Tags: vllm, inference, llm, serving, gpu
Serve large language models using vLLM with Flyte Apps.
Install

pip install flyteplugins-vllm

Quick Start (example, may need adjustment)
from flytekit import task, workflow
from flyteplugins.vllm import DEFAULT_VLLM_IMAGE, VLLMAppEnvironment

# NOTE: DEFAULT_VLLM_IMAGE is, by its naming, most likely the plugin's default
# container image rather than a callable; pass it into the app environment
# configuration instead of instantiating it.
vllm_app = VLLMAppEnvironment(...)

@task
def my_task() -> None:
    ...

@workflow
def my_workflow() -> None:
    my_task()

Available Imports (2)
type: DEFAULT_VLLM_IMAGE
Configuration type for vLLM.
from flyteplugins.vllm import DEFAULT_VLLM_IMAGE
config: VLLMAppEnvironment
App environment backed by vLLM for serving large language models.
extends dataclass (configuration or data structure for plugin setup)
from flyteplugins.vllm import VLLMAppEnvironment
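A minimal sketch of how an app environment might be declared. The constructor parameters shown here (name, image, model) are illustrative assumptions, not confirmed against the plugin's API; check the plugin documentation for the actual signature.

```python
from flyteplugins.vllm import DEFAULT_VLLM_IMAGE, VLLMAppEnvironment

# Hypothetical configuration sketch; parameter names are assumptions.
# DEFAULT_VLLM_IMAGE is assumed to be the plugin-provided default
# serving image for vLLM.
vllm_app = VLLMAppEnvironment(
    name="my-llm-server",                # illustrative app name
    image=DEFAULT_VLLM_IMAGE,            # default vLLM container image
    model="Qwen/Qwen2.5-0.5B-Instruct",  # any model id vLLM can load
)
```

Once deployed, vLLM itself exposes an OpenAI-compatible HTTP API, so any OpenAI-style client pointed at the app's URL can query the served model.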
Related Plugins
SGLang
Serve large language models using SGLang with Flyte Apps.
Inference
Serve models natively in Flyte tasks using inference providers like NIM, Ollama, and others.
DGXC Lepton
A Flytekit plugin for deploying and managing AI inference endpoints on Lepton AI infrastructure within Flyte workflows.
OpenAI
The plugin currently features ChatGPT and Batch API connectors.
Package Info
Min Flyte SDK
Modules: 2
Downloads:
  Last day: 58
  Last week: 265
  Last month: 893