vllm.v1.attention.ops

Modules:

| Name | Description |
| --- | --- |
| `chunked_prefill_paged_decode` | |
| `common` | |
| `flashmla` | |
| `merge_attn_states` | |
| `paged_attn` | |
| `pallas_kv_cache_update` | |
| `prefix_prefill` | |
| `rocm_aiter_mla_sparse` | |
| `triton_decode_attention` | Memory-efficient attention for decoding. |
| `triton_merge_attn_states` | |
| `triton_prefill_attention` | Memory-efficient attention for prefill. |
| `triton_reshape_and_cache_flash` | |
| `triton_unified_attention` | |
| `vit_attn_wrappers` | Ops for ViT attention, compatible with `torch.compile`. |
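To illustrate the technique behind the `merge_attn_states` / `triton_merge_attn_states` ops listed above: flash-decoding-style kernels compute attention over separate KV-cache chunks and then combine the partial outputs using each chunk's log-sum-exp statistic. Below is a minimal NumPy sketch of that merge; the function name, signature, and shapes here are illustrative, not vLLM's actual API.

```python
import numpy as np

def merge_attn_states(o1, lse1, o2, lse2):
    """Combine two partial attention outputs via their log-sum-exp stats.

    o*: (heads, head_dim) outputs, each softmax-normalized over its own
        KV chunk; lse*: (heads,) log-sum-exp of that chunk's scores.
    """
    m = np.maximum(lse1, lse2)            # shift for numerical stability
    w1 = np.exp(lse1 - m)                 # relative softmax mass of chunk 1
    w2 = np.exp(lse2 - m)                 # relative softmax mass of chunk 2
    out = (o1 * w1[:, None] + o2 * w2[:, None]) / (w1 + w2)[:, None]
    lse = m + np.log(w1 + w2)             # combined log-sum-exp
    return out, lse

# Check: attention over a split KV cache merges back to full attention.
rng = np.random.default_rng(0)
heads, seq, dim = 2, 8, 4
q = rng.normal(size=(heads, dim))
k = rng.normal(size=(heads, seq, dim))
v = rng.normal(size=(heads, seq, dim))
scores = np.einsum("hd,hnd->hn", q, k)

def attend(s, vv):
    # Softmax over scores s, then weighted sum of values vv; also return LSE.
    m = s.max(-1, keepdims=True)
    e = np.exp(s - m)
    o = np.einsum("hn,hnd->hd", e / e.sum(-1, keepdims=True), vv)
    return o, m.squeeze(-1) + np.log(e.sum(-1))

full, _ = attend(scores, v)
o1, l1 = attend(scores[:, :4], v[:, :4])
o2, l2 = attend(scores[:, 4:], v[:, 4:])
merged, _ = merge_attn_states(o1, l1, o2, l2)
assert np.allclose(merged, full)
```

Because the per-chunk softmax denominators are recoverable from the LSE values, chunks can be processed in parallel and merged pairwise in any order, which is what makes this decomposition useful for paged decoding.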