vllm.v1.attention.backends.short_conv_attn
ShortConvAttentionBackend
Bases: AttentionBackend
Source code in vllm/v1/attention/backends/short_conv_attn.py
ShortConvAttentionMetadata dataclass
Bases: BaseMambaAttentionMetadata
Source code in vllm/v1/attention/backends/short_conv_attn.py
__init__
__init__(
num_prefills: int,
num_prefill_tokens: int,
num_decodes: int,
num_decode_tokens: int,
num_reqs: int,
has_initial_states_p: Tensor | None,
query_start_loc_p: Tensor | None,
num_computed_tokens_p: Tensor | None,
state_indices_tensor: Tensor,
block_idx_last_scheduled_token: Tensor | None,
block_idx_first_scheduled_token_p: Tensor | None,
block_idx_last_computed_token: Tensor | None,
nums_dict: dict | None = None,
batch_ptr: Tensor | None = None,
token_chunk_offset_ptr: Tensor | None = None,
) -> None
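The signature above can be illustrated with a minimal standalone sketch. This is not a real vLLM call: the class below is a hypothetical stand-in that mirrors the documented fields, with a plain Python list substituting for torch.Tensor so the example stays self-contained. In actual use, the builder (below) constructs this metadata; direct instantiation here is for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in alias: in vLLM these fields hold torch.Tensor objects;
# a list is used here so the sketch runs without torch installed.
Tensor = list


@dataclass
class ShortConvAttentionMetadataSketch:
    # Fields mirror the documented __init__ signature one-to-one.
    num_prefills: int
    num_prefill_tokens: int
    num_decodes: int
    num_decode_tokens: int
    num_reqs: int
    has_initial_states_p: Optional[Tensor]
    query_start_loc_p: Optional[Tensor]
    num_computed_tokens_p: Optional[Tensor]
    state_indices_tensor: Tensor
    block_idx_last_scheduled_token: Optional[Tensor]
    block_idx_first_scheduled_token_p: Optional[Tensor]
    block_idx_last_computed_token: Optional[Tensor]
    nums_dict: Optional[dict] = None
    batch_ptr: Optional[Tensor] = None
    token_chunk_offset_ptr: Optional[Tensor] = None


# Hypothetical decode-only batch of 4 requests: with no prefills,
# the prefill-side fields (suffix "_p") are left as None.
meta = ShortConvAttentionMetadataSketch(
    num_prefills=0,
    num_prefill_tokens=0,
    num_decodes=4,
    num_decode_tokens=4,
    num_reqs=4,
    has_initial_states_p=None,
    query_start_loc_p=None,
    num_computed_tokens_p=None,
    state_indices_tensor=[0, 1, 2, 3],
    block_idx_last_scheduled_token=None,
    block_idx_first_scheduled_token_p=None,
    block_idx_last_computed_token=None,
)
print(meta.num_reqs)
```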
ShortConvAttentionMetadataBuilder
Bases: BaseMambaAttentionMetadataBuilder[ShortConvAttentionMetadata]