2.1.9
VLLMShardArgs
Package: flyte.prefetch
Arguments for sharding a model using vLLM.
Parameters
class VLLMShardArgs(
tensor_parallel_size: int,
dtype: str,
trust_remote_code: bool,
max_model_len: int | None,
file_pattern: str | None,
max_file_size: int,
)

Create a new model by parsing and validating input data from keyword arguments.

Raises `ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

| Parameter | Type | Description |
|---|---|---|
| tensor_parallel_size | int | Number of tensor parallel workers. |
| dtype | str | Data type for model weights. |
| trust_remote_code | bool | Whether to trust remote code from HuggingFace. |
| max_model_len | int \| None | Maximum model context length. |
| file_pattern | str \| None | Pattern for sharded weight files. |
| max_file_size | int | Maximum size for each sharded file. |

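
As a rough illustration of the fields listed above, the following sketch builds the keyword arguments and, if `flyte` happens to be installed, passes them to `VLLMShardArgs`. The values are illustrative assumptions rather than documented defaults, and treating `max_file_size` as a byte count is also an assumption, since the unit is not stated here.

```python
# Keyword arguments matching the documented fields.
# All values are illustrative assumptions, not library defaults.
shard_kwargs = {
    "tensor_parallel_size": 2,
    "dtype": "bfloat16",
    "trust_remote_code": False,
    "max_model_len": None,          # None: no explicit context-length limit given
    "file_pattern": None,           # None: no explicit shard file pattern given
    "max_file_size": 5 * 1024**3,   # assumed to be bytes (5 GiB); unit not stated in this page
}

try:
    from flyte.prefetch import VLLMShardArgs
except ImportError:
    VLLMShardArgs = None  # flyte is not installed; the dict above still shows the shape

if VLLMShardArgs is not None:
    # Invalid input is reported as a ValidationError, as described above.
    shard_args = VLLMShardArgs(**shard_kwargs)
```
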
Methods
| Method | Description |
|---|---|
| get_vllm_args() | Get arguments dict for vLLM LLM constructor. |

get_vllm_args()
def get_vllm_args(
model_path: str,
) -> dict[str, Any]

Get arguments dict for vLLM LLM constructor.

| Parameter | Type | Description |
|---|---|---|
| model_path | str | |
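
This page does not show how `get_vllm_args()` builds its result, so the following is only a minimal sketch of one plausible mapping. `ShardArgsSketch` is a hypothetical stand-in written with the standard library, not the `flyte` implementation. It assumes the returned dict carries the keyword arguments accepted by the vLLM `LLM` constructor (`model`, `tensor_parallel_size`, `dtype`, `trust_remote_code`, `max_model_len`), and that the sharding-only fields are left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ShardArgsSketch:
    """Hypothetical stand-in mirroring the VLLMShardArgs fields."""

    tensor_parallel_size: int
    dtype: str
    trust_remote_code: bool
    max_model_len: int | None
    file_pattern: str | None
    max_file_size: int

    def get_vllm_args(self, model_path: str) -> dict[str, Any]:
        # Assumed mapping: only fields the vLLM LLM constructor accepts are included.
        args: dict[str, Any] = {
            "model": model_path,
            "tensor_parallel_size": self.tensor_parallel_size,
            "dtype": self.dtype,
            "trust_remote_code": self.trust_remote_code,
        }
        # Assumption: an unset context length is omitted so vLLM uses the model's own limit.
        if self.max_model_len is not None:
            args["max_model_len"] = self.max_model_len
        return args


sketch = ShardArgsSketch(
    tensor_parallel_size=2,
    dtype="auto",
    trust_remote_code=False,
    max_model_len=4096,
    file_pattern=None,
    max_file_size=5 * 1024**3,
)
llm_kwargs = sketch.get_vllm_args("/path/to/model")  # placeholder path
```

With vLLM installed, a dict of this shape could be unpacked into the constructor as `LLM(**llm_kwargs)`. Whether the real method includes or omits any particular key is not confirmed by this page.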