vllm.v1.core.sched.output ¶
CachedRequestData dataclass ¶
Source code in vllm/v1/core/sched/output.py
_req_id_to_num_output_tokens cached property ¶
Cache mapping of req_id to num_output_tokens for O(1) lookup.
This cached property is safe because CachedRequestData instances are created fresh in each scheduling iteration and are not mutated while the iteration's details are computed.
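The caching pattern described here can be sketched with `functools.cached_property`. This is a minimal illustration, not the vLLM implementation: `CachedRequestSketch` and `output_tokens_for` are hypothetical names, and the sketch assumes that `req_ids` and `num_output_tokens` are index-aligned lists, which the constructor signature suggests but does not state.

```python
from dataclasses import dataclass
from functools import cached_property


@dataclass
class CachedRequestSketch:
    """Hypothetical stand-in, not the real CachedRequestData."""

    req_ids: list[str]
    num_output_tokens: list[int]

    @cached_property
    def _req_id_to_num_output_tokens(self) -> dict[str, int]:
        # Built once on first access, then stored in the instance __dict__.
        # Assumes the two lists are index-aligned.
        return dict(zip(self.req_ids, self.num_output_tokens))

    def output_tokens_for(self, req_id: str) -> int:
        # Hypothetical helper: average-case O(1) dict lookup instead of
        # scanning the list with req_ids.index(req_id).
        return self._req_id_to_num_output_tokens[req_id]


data = CachedRequestSketch(req_ids=["a", "b"], num_output_tokens=[3, 7])
first = data.output_tokens_for("b")

# Mutating a field after the first access leaves the cached mapping stale,
# which is why the property depends on instances not being mutated.
data.num_output_tokens[1] = 8
stale = data.output_tokens_for("b")
```

In this sketch `stale` still reflects the value seen when the mapping was first built, so the cache is only trustworthy when each instance is treated as read-only after construction.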
__init__ ¶
__init__(
req_ids: list[str],
resumed_req_ids: set[str],
new_token_ids: list[list[int]],
all_token_ids: dict[str, list[int]],
new_block_ids: list[tuple[list[int], ...] | None],
num_computed_tokens: list[int],
num_output_tokens: list[int],
) -> None
anon_repr ¶
anon_repr() -> str
is_context_phase ¶
make_empty classmethod ¶
make_empty() -> CachedRequestData
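The `make_empty()` signature says only that the classmethod takes no arguments and returns a `CachedRequestData`. A plausible reading, sketched below as an assumption rather than the actual implementation, is a factory that fills every container field from the `__init__` signature with an empty value. `CachedRequestSketch` is a hypothetical stand-in class.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CachedRequestSketch:
    """Hypothetical stand-in that reuses the field names from __init__."""

    req_ids: list[str]
    resumed_req_ids: set[str]
    new_token_ids: list[list[int]]
    all_token_ids: dict[str, list[int]]
    new_block_ids: list[tuple[list[int], ...] | None]
    num_computed_tokens: list[int]
    num_output_tokens: list[int]

    @classmethod
    def make_empty(cls) -> CachedRequestSketch:
        # Assumption: "empty" means every container field has no entries.
        # Fresh containers are created on each call, so instances never
        # share mutable state.
        return cls(
            req_ids=[],
            resumed_req_ids=set(),
            new_token_ids=[],
            all_token_ids={},
            new_block_ids=[],
            num_computed_tokens=[],
            num_output_tokens=[],
        )


empty = CachedRequestSketch.make_empty()
other = CachedRequestSketch.make_empty()
```

Building new containers inside the classmethod, rather than reusing module-level defaults, keeps one empty instance from being affected if another is later appended to.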
GrammarOutput dataclass ¶
NewRequestData dataclass ¶
__init__ ¶
__init__(
req_id: str,
prompt_token_ids: list[int] | None,
mm_features: list[MultiModalFeatureSpec],
sampling_params: SamplingParams | None,
pooling_params: PoolingParams | None,
block_ids: tuple[list[int], ...],
num_computed_tokens: int,
lora_request: LoRARequest | None,
prompt_embeds: Tensor | None = None,
prefill_token_ids: list[int] | None = None,
) -> None
__repr__ ¶
__repr__() -> str
anon_repr ¶
anon_repr() -> str
from_request classmethod ¶
from_request(
request: Request,
block_ids: tuple[list[int], ...],
prefill_token_ids: list[int] | None = None,
) -> NewRequestData
SchedulerOutput dataclass ¶
ec_connector_metadata class-attribute instance-attribute ¶
ec_connector_metadata: ECConnectorMetadata | None = None
has_structured_output_requests class-attribute instance-attribute ¶
has_structured_output_requests: bool = False
kv_connector_metadata class-attribute instance-attribute ¶
kv_connector_metadata: KVConnectorMetadata | None = None
num_invalid_spec_tokens class-attribute instance-attribute ¶
pending_structured_output_tokens class-attribute instance-attribute ¶
pending_structured_output_tokens: bool = False
scheduled_spec_decode_tokens instance-attribute ¶
__init__ ¶
__init__(
scheduled_new_reqs: list[NewRequestData],
scheduled_cached_reqs: CachedRequestData,
num_scheduled_tokens: dict[str, int],
total_num_scheduled_tokens: int,
scheduled_spec_decode_tokens: dict[str, list[int]],
scheduled_encoder_inputs: dict[str, list[int]],
num_common_prefix_blocks: list[int],
finished_req_ids: set[str],
free_encoder_mm_hashes: list[str],
preempted_req_ids: set[str] | None = None,
has_structured_output_requests: bool = False,
pending_structured_output_tokens: bool = False,
num_invalid_spec_tokens: dict[str, int] | None = None,
kv_connector_metadata: KVConnectorMetadata | None = None,
ec_connector_metadata: ECConnectorMetadata | None = None,
) -> None
make_empty classmethod ¶
make_empty() -> SchedulerOutput