vllm.v1.attention.backends.mamba1_attn ¶
Mamba1AttentionBackend ¶
Bases: AttentionBackend
Source code in vllm/v1/attention/backends/mamba1_attn.py
Mamba1AttentionMetadata dataclass ¶
Bases: BaseMambaAttentionMetadata
Source code in vllm/v1/attention/backends/mamba1_attn.py
__init__ ¶
__init__(
num_prefills: int,
num_prefill_tokens: int,
num_decodes: int,
num_decode_tokens: int,
num_reqs: int,
has_initial_states_p: Tensor | None,
query_start_loc_p: Tensor | None,
num_computed_tokens_p: Tensor | None,
state_indices_tensor: Tensor,
block_idx_last_scheduled_token: Tensor | None,
block_idx_first_scheduled_token_p: Tensor | None,
block_idx_last_computed_token: Tensor | None,
nums_dict: dict | None = None,
batch_ptr: Tensor | None = None,
token_chunk_offset_ptr: Tensor | None = None,
) -> None
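As a hedged illustration of the fields above, the sketch below constructs the metadata dataclass by hand for a hypothetical pure-decode batch (no prefill tokens, so the prefill-related fields, which appear to carry the `_p` suffix, are left as `None`). The tensor shapes and values are illustrative assumptions, not output of the real builder.

```python
import torch

from vllm.v1.attention.backends.mamba1_attn import Mamba1AttentionMetadata

# Hypothetical pure-decode batch of 4 requests, one new token each.
num_decodes = 4

metadata = Mamba1AttentionMetadata(
    num_prefills=0,
    num_prefill_tokens=0,
    num_decodes=num_decodes,
    num_decode_tokens=num_decodes,      # one scheduled token per decode request
    num_reqs=num_decodes,
    has_initial_states_p=None,          # prefill-only fields unused in this sketch
    query_start_loc_p=None,
    num_computed_tokens_p=None,
    # Illustrative mapping of each request to a Mamba state slot.
    state_indices_tensor=torch.arange(num_decodes, dtype=torch.int32),
    block_idx_last_scheduled_token=None,
    block_idx_first_scheduled_token_p=None,
    block_idx_last_computed_token=None,
    # nums_dict, batch_ptr, and token_chunk_offset_ptr default to None.
)
```

In normal operation this object is produced by Mamba1AttentionMetadataBuilder rather than constructed manually; the example only shows which fields are required and which are optional.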
Mamba1AttentionMetadataBuilder ¶
Bases: BaseMambaAttentionMetadataBuilder[Mamba1AttentionMetadata]