SGLangAppEnvironment
Package: flyteplugins.sglang
App environment backed by SGLang for serving large language models.
This environment sets up an SGLang server with the specified model and configuration.
class SGLangAppEnvironment(
name: str,
depends_on: List[Environment],
pod_template: Optional[Union[str, PodTemplate]],
description: Optional[str],
secrets: Optional[SecretRequest],
env_vars: Optional[Dict[str, str]],
resources: Optional[Resources],
interruptible: bool,
args: *args,
command: Optional[Union[List[str], str]],
requires_auth: bool,
scaling: Scaling,
domain: Domain | None,
links: List[Link],
include: List[str],
parameters: List[Parameter],
cluster_pool: str,
image: str | Image | Literal['auto'],
type: str,
port: int | Port,
extra_args: str | list[str],
model_path: str | RunOutput,
model_hf_path: str,
model_id: str,
stream_model: bool,
)

| Parameter | Type | Description |
|---|---|---|
| `name` | `str` | The name of the application. |
| `depends_on` | `List[Environment]` | |
| `pod_template` | `Optional[Union[str, PodTemplate]]` | |
| `description` | `Optional[str]` | |
| `secrets` | `Optional[SecretRequest]` | Secrets requested for the application. |
| `env_vars` | `Optional[Dict[str, str]]` | Environment variables to set for the application. |
| `resources` | `Optional[Resources]` | |
| `interruptible` | `bool` | |
| `args` | `*args` | |
| `command` | `Optional[Union[List[str], str]]` | |
| `requires_auth` | `bool` | Whether the public URL requires authentication. |
| `scaling` | `Scaling` | Scaling configuration for the app environment. |
| `domain` | `Domain \| None` | Domain to use for the app. |
| `links` | `List[Link]` | |
| `include` | `List[str]` | |
| `parameters` | `List[Parameter]` | |
| `cluster_pool` | `str` | The target cluster_pool where the app should be deployed. |
| `image` | `str \| Image \| Literal['auto']` | |
| `type` | `str` | Type of app. |
| `port` | `int \| Port` | Port the application listens on. Defaults to 8000 for SGLang. |
| `extra_args` | `str \| list[str]` | Extra args to pass to `python -m sglang.launch_server`. See https://docs.sglang.io/advanced_features/server_arguments.html for details. |
| `model_path` | `str \| RunOutput` | Remote path to model (e.g., s3 |
| `model_hf_path` | `str` | Hugging Face path to model (e.g., Qwen/Qwen3-0.6B). |
| `model_id` | `str` | Model id that is exposed by SGLang. |
| `stream_model` | `bool` | Set to True to stream the model from the blob store directly to the GPU. If False, the model is first downloaded to the local file system and then loaded into the GPU. |

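As an illustration of how these parameters fit together, the sketch below collects one plausible set of keyword arguments and constructs the environment only if the plugin is importable. The parameter names come from the signature above. The values (the app name, the model id, and the `--context-length` server argument) are illustrative assumptions, and because the signature lists no defaults, which arguments may be omitted is an assumption as well.

```python
from typing import Any, Dict, Optional

# Keyword arguments named after the documented constructor parameters.
# All values are illustrative assumptions, not recommended settings.
sglang_kwargs: Dict[str, Any] = {
    "name": "qwen3-sglang",                      # hypothetical app name
    "model_hf_path": "Qwen/Qwen3-0.6B",          # example path from the parameter table
    "model_id": "qwen3-0.6b",                    # hypothetical id exposed by SGLang
    "port": 8000,                                # documented default for SGLang
    "extra_args": ["--context-length", "8192"],  # forwarded to sglang.launch_server
    "stream_model": False,                       # download to local disk, then load
    "requires_auth": True,
}


def build_env(kwargs: Dict[str, Any]) -> Optional[Any]:
    """Construct the environment, or return None if the plugin is not installed."""
    try:
        from flyteplugins.sglang import SGLangAppEnvironment
    except ImportError:
        return None
    # Assumes every parameter left out of `kwargs` has a usable default.
    return SGLangAppEnvironment(**kwargs)
```

Keeping the arguments in a plain dictionary makes it easy to reuse the same configuration across environments; passing them directly as keyword arguments works equally well.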
Properties
| Property | Type | Description |
|---|---|---|
| `endpoint` | `None` | |

Methods
| Method | Description |
|---|---|
| `add_dependency()` | Add a dependency to the environment. |
| `clone_with()` | |
| `container_args()` | Return the container arguments for SGLang. |
| `container_cmd()` | |
| `get_port()` | |
| `on_shutdown()` | Decorator to define the shutdown function for the app environment. |
| `on_startup()` | Decorator to define the startup function for the app environment. |
| `server()` | Decorator to define the server function for the app environment. |

add_dependency()
def add_dependency(
env: Environment,
)

Add a dependency to the environment.

| Parameter | Type | Description |
|---|---|---|
| `env` | `Environment` | |

clone_with()
def clone_with(
name: str,
image: Optional[Union[str, Image, Literal['auto']]],
resources: Optional[Resources],
env_vars: Optional[dict[str, str]],
secrets: Optional[SecretRequest],
depends_on: Optional[list[Environment]],
description: Optional[str],
interruptible: Optional[bool],
kwargs: **kwargs,
) -> SGLangAppEnvironment

| Parameter | Type | Description |
|---|---|---|
| `name` | `str` | |
| `image` | `Optional[Union[str, Image, Literal['auto']]]` | |
| `resources` | `Optional[Resources]` | |
| `env_vars` | `Optional[dict[str, str]]` | |
| `secrets` | `Optional[SecretRequest]` | |
| `depends_on` | `Optional[list[Environment]]` | |
| `description` | `Optional[str]` | |
| `interruptible` | `Optional[bool]` | |
| `kwargs` | `**kwargs` | |

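The reference gives no description for `clone_with()`, only its signature and return type. A reasonable reading, and it is only an assumption, is copy-with-overrides: return a new environment under a new name, keeping every field that is not overridden. The sketch below shows that pattern with a hypothetical stand-in dataclass (`EnvSketch`); it does not use or reproduce the plugin's implementation.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict


@dataclass(frozen=True)
class EnvSketch:
    """Hypothetical stand-in for an app environment; NOT the flyteplugins class."""

    name: str
    image: str = "auto"
    env_vars: Dict[str, str] = field(default_factory=dict)
    interruptible: bool = False

    def clone_with(self, name: str, **overrides: Any) -> "EnvSketch":
        # Assumed semantics: copy every field, then apply the overrides.
        return replace(self, name=name, **overrides)


base = EnvSketch(name="sglang-prod", env_vars={"LOG_LEVEL": "info"})
staging = base.clone_with("sglang-staging", interruptible=True)
```

Here `staging` differs from `base` only in `name` and `interruptible`, and `base` itself is left unchanged.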
container_args()
def container_args(
serialization_context: SerializationContext,
) -> list[str]

Return the container arguments for SGLang.

| Parameter | Type | Description |
|---|---|---|
| `serialization_context` | `SerializationContext` | |

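To make the relationship between the constructor parameters and the resulting command line more concrete, here is a minimal sketch of how such an argument list could be assembled. The helper `sketch_container_args` is hypothetical and is not the plugin's implementation; the mapping of `model_id` to `--served-model-name` and the shell-style splitting of a string `extra_args` are assumptions.

```python
import shlex
from typing import List, Union


def sketch_container_args(
    model_path: str,
    model_id: str,
    port: int = 8000,
    extra_args: Union[str, List[str]] = "",
) -> List[str]:
    """Hypothetical helper: build an argument list for sglang.launch_server."""
    # Normalize extra_args: a string is split shell-style, a list is copied.
    if isinstance(extra_args, str):
        extra = shlex.split(extra_args)
    else:
        extra = list(extra_args)
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--served-model-name", model_id,  # assumed mapping for model_id
        "--port", str(port),
        *extra,
    ]


args_from_str = sketch_container_args(
    "Qwen/Qwen3-0.6B", "qwen3", extra_args="--context-length 8192"
)
args_from_list = sketch_container_args(
    "Qwen/Qwen3-0.6B", "qwen3", extra_args=["--context-length", "8192"]
)
```

Under these assumptions, both forms of `extra_args` yield the same argument list, which is why the parameter can accept either a string or a list.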
container_cmd()
def container_cmd(
serialize_context: SerializationContext,
parameter_overrides: list[Parameter] | None,
) -> List[str]

| Parameter | Type | Description |
|---|---|---|
| `serialize_context` | `SerializationContext` | |
| `parameter_overrides` | `list[Parameter] \| None` | |

get_port()
def get_port()
on_shutdown()
def on_shutdown(
fn: Callable[..., None],
) -> Callable[..., None]

Decorator to define the shutdown function for the app environment.
This function is called after the server function is called.
The decorated function can be sync or async, and accepts input parameters based on the Parameters defined in the AppEnvironment definition.

| Parameter | Type | Description |
|---|---|---|
| `fn` | `Callable[..., None]` | |

on_startup()
def on_startup(
fn: Callable[..., None],
) -> Callable[..., None]

Decorator to define the startup function for the app environment.
This function is called before the server function is called.
The decorated function can be sync or async, and accepts input parameters based on the Parameters defined in the AppEnvironment definition.

| Parameter | Type | Description |
|---|---|---|
| `fn` | `Callable[..., None]` | |

server()
def server(
fn: Callable[..., None],
) -> Callable[..., None]

Decorator to define the server function for the app environment.
The decorated function can be sync or async, and accepts input parameters based on the Parameters defined in the AppEnvironment definition.

| Parameter | Type | Description |
|---|---|---|
| `fn` | `Callable[..., None]` | |

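The three decorators `on_startup()`, `server()`, and `on_shutdown()` describe a lifecycle: startup runs before the server function, shutdown runs after it, and each hook may be sync or async. The sketch below reproduces that ordering with a hypothetical stand-in class (`LifecycleSketch`) so it can run without the plugin installed. It is not the plugin's implementation; running the shutdown hook even when the server function raises is an assumption, and parameter passing is omitted for brevity.

```python
import asyncio
import inspect
from typing import Any, Callable, List, Optional


class LifecycleSketch:
    """Hypothetical stand-in that mimics the three decorators; NOT the flyteplugins class."""

    def __init__(self) -> None:
        self._startup: Optional[Callable[..., Any]] = None
        self._server: Optional[Callable[..., Any]] = None
        self._shutdown: Optional[Callable[..., Any]] = None

    def on_startup(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        self._startup = fn
        return fn

    def server(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        self._server = fn
        return fn

    def on_shutdown(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        self._shutdown = fn
        return fn

    @staticmethod
    async def _call(fn: Optional[Callable[..., Any]]) -> None:
        # Accept both sync and async hooks: await only if the call returned an awaitable.
        if fn is None:
            return
        result = fn()
        if inspect.isawaitable(result):
            await result

    def run(self) -> None:
        async def _main() -> None:
            await self._call(self._startup)
            try:
                await self._call(self._server)
            finally:
                # Assumption: shutdown runs even if the server function raises.
                await self._call(self._shutdown)

        asyncio.run(_main())


events: List[str] = []
app = LifecycleSketch()


@app.on_startup
def load_resources() -> None:
    events.append("startup")


@app.server
async def serve() -> None:
    events.append("server")


@app.on_shutdown
def release_resources() -> None:
    events.append("shutdown")


app.run()
```

Each decorator returns the function unchanged, so the hooks remain directly callable in unit tests.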