VLLMAppEnvironment
Package: flyteplugins.vllm
App environment backed by vLLM for serving large language models.
This environment sets up a vLLM server with the specified model and configuration.
```python
class VLLMAppEnvironment(
    name: str,
    depends_on: List[Environment],
    pod_template: Optional[Union[str, PodTemplate]],
    description: Optional[str],
    secrets: Optional[SecretRequest],
    env_vars: Optional[Dict[str, str]],
    resources: Optional[Resources],
    interruptible: bool,
    args: *args,
    command: Optional[Union[List[str], str]],
    requires_auth: bool,
    scaling: Scaling,
    domain: Domain | None,
    links: List[Link],
    include: List[str],
    parameters: List[Parameter],
    cluster_pool: str,
    image: str | Image | Literal['auto'],
    type: str,
    port: int | Port,
    extra_args: str | list[str],
    model_path: str | RunOutput,
    model_hf_path: str,
    model_id: str,
    stream_model: bool,
)
```

| Parameter | Type | Description |
|---|---|---|
| name | str | The name of the application. |
| depends_on | List[Environment] | |
| pod_template | Optional[Union[str, PodTemplate]] | |
| description | Optional[str] | |
| secrets | Optional[SecretRequest] | Secrets that are requested for the application. |
| env_vars | Optional[Dict[str, str]] | Environment variables to set for the application. |
| resources | Optional[Resources] | |
| interruptible | bool | |
| args | *args | |
| command | Optional[Union[List[str], str]] | |
| requires_auth | bool | Whether the public URL requires authentication. |
| scaling | Scaling | Scaling configuration for the app environment. |
| domain | Domain \| None | Domain to use for the app. |
| links | List[Link] | |
| include | List[str] | |
| parameters | List[Parameter] | |
| cluster_pool | str | The target cluster pool where the app should be deployed. |
| image | str \| Image \| Literal['auto'] | |
| type | str | Type of app. |
| port | int \| Port | Port the application listens on. Defaults to 8000 for vLLM. |
| extra_args | str \| list[str] | Extra args to pass to vllm serve. See https://docs.vllm.ai/en/stable/configuration/engine_args or run vllm serve --help for details. |
| model_path | str \| RunOutput | Remote path to the model (e.g., an s3:// URI). |
| model_hf_path | str | Hugging Face path to the model (e.g., Qwen/Qwen3-0.6B). |
| model_id | str | Model id exposed by vLLM. |
| stream_model | bool | Set to True to stream the model from blob storage directly to the GPU. If False, the model is downloaded to the local file system first and then loaded onto the GPU. |
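A minimal usage sketch, assuming `VLLMAppEnvironment` is imported from the `flyteplugins.vllm` package documented above; the app name, model id, and extra args below are illustrative, and all other parameters keep their defaults:

```python
from flyteplugins.vllm import VLLMAppEnvironment

# Hypothetical vLLM serving environment; adjust the values for your deployment.
vllm_app = VLLMAppEnvironment(
    name="qwen-serving",                     # illustrative app name
    model_hf_path="Qwen/Qwen3-0.6B",         # Hugging Face model path (see table above)
    model_id="qwen3-0-6b",                   # model id exposed by the vLLM server
    port=8000,                               # vLLM's default port
    extra_args=["--max-model-len", "4096"],  # forwarded to `vllm serve`
    stream_model=True,                       # stream weights from blob storage to the GPU
    requires_auth=True,                      # public URL requires authentication
)
```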
Properties
| Property | Type | Description |
|---|---|---|
| endpoint | None | |
Methods
| Method | Description |
|---|---|
| add_dependency() | Add a dependency to the environment. |
| clone_with() | |
| container_args() | Return the container arguments for vLLM. |
| container_cmd() | |
| get_port() | |
| on_shutdown() | Decorator to define the shutdown function for the app environment. |
| on_startup() | Decorator to define the startup function for the app environment. |
| server() | Decorator to define the server function for the app environment. |
add_dependency()
```python
def add_dependency(
    env: Environment,
)
```

Add a dependency to the environment.

| Parameter | Type | Description |
|---|---|---|
| env | Environment | |
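A hedged usage sketch, reusing the hypothetical `vllm_app` from the constructor example; `other_env` stands in for any other `Environment` defined elsewhere:

```python
# Register another environment as a dependency of the vLLM app.
vllm_app.add_dependency(other_env)
```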
clone_with()
```python
def clone_with(
    name: str,
    image: Optional[Union[str, Image, Literal['auto']]],
    resources: Optional[Resources],
    env_vars: Optional[dict[str, str]],
    secrets: Optional[SecretRequest],
    depends_on: Optional[list[Environment]],
    description: Optional[str],
    interruptible: Optional[bool],
    kwargs: **kwargs,
) -> VLLMAppEnvironment
```

| Parameter | Type | Description |
|---|---|---|
| name | str | |
| image | Optional[Union[str, Image, Literal['auto']]] | |
| resources | Optional[Resources] | |
| env_vars | Optional[dict[str, str]] | |
| secrets | Optional[SecretRequest] | |
| depends_on | Optional[list[Environment]] | |
| description | Optional[str] | |
| interruptible | Optional[bool] | |
| kwargs | **kwargs | |
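A hedged sketch of deriving a variant of the hypothetical `vllm_app` without modifying the original; the overridden name and environment variable are illustrative:

```python
# Clone the environment under a new name with an extra environment variable;
# all other settings are inherited from `vllm_app`.
debug_app = vllm_app.clone_with(
    name="qwen-serving-debug",
    env_vars={"VLLM_LOGGING_LEVEL": "DEBUG"},  # illustrative override
)
```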
container_args()
```python
def container_args(
    serialization_context: SerializationContext,
) -> list[str]
```

Return the container arguments for vLLM.

| Parameter | Type | Description |
|---|---|---|
| serialization_context | SerializationContext | |
container_cmd()
```python
def container_cmd(
    serialize_context: SerializationContext,
    parameter_overrides: list[Parameter] | None,
) -> List[str]
```

| Parameter | Type | Description |
|---|---|---|
| serialize_context | SerializationContext | |
| parameter_overrides | list[Parameter] \| None | |
get_port()
```python
def get_port()
```

on_shutdown()
```python
def on_shutdown(
    fn: Callable[..., None],
) -> Callable[..., None]
```

Decorator to define the shutdown function for the app environment.

This function is called after the server function is called.

The decorated function can be a sync or async function, and accepts input parameters based on the Parameters defined in the AppEnvironment definition.

| Parameter | Type | Description |
|---|---|---|
| fn | Callable[..., None] | |
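A hedged sketch of the decorator shape, reusing the hypothetical `vllm_app` from the constructor example:

```python
@vllm_app.on_shutdown
async def cleanup():
    # Illustrative teardown: flush logs, close clients, release resources.
    ...
```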
on_startup()
```python
def on_startup(
    fn: Callable[..., None],
) -> Callable[..., None]
```

Decorator to define the startup function for the app environment.

This function is called before the server function is called.

The decorated function can be a sync or async function, and accepts input parameters based on the Parameters defined in the AppEnvironment definition.

| Parameter | Type | Description |
|---|---|---|
| fn | Callable[..., None] | |
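A hedged sketch, again against the hypothetical `vllm_app`:

```python
@vllm_app.on_startup
async def warm_up():
    # Illustrative setup: validate configuration or pre-fetch assets before serving starts.
    ...
```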
server()
```python
def server(
    fn: Callable[..., None],
) -> Callable[..., None]
```

Decorator to define the server function for the app environment.

The decorated function can be a sync or async function, and accepts input parameters based on the Parameters defined in the AppEnvironment definition.

| Parameter | Type | Description |
|---|---|---|
| fn | Callable[..., None] | |
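A hedged sketch of the decorator shape only. Note that VLLMAppEnvironment already builds its own vLLM container arguments (see container_args()), so whether a custom server function replaces or augments that default is SDK-specific:

```python
@vllm_app.server
async def serve():
    # Illustrative entrypoint; actual behaviour for vLLM environments depends on the SDK.
    ...
```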