vllm.compilation.rocm_aiter_fusion ¶
AiterFusedAddRMSFp8GroupQuantPattern ¶
Bases: AiterRMSNormQuantPattern
This pattern fuses aiter rms_norm_with_add & group fp8 quant custom ops into an aiter rms_norm_with_add_group_fp8_quant op.
Source code in vllm/compilation/rocm_aiter_fusion.py
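To make the fused computation concrete, the following is a minimal pure-Python sketch of what a fused "add + RMSNorm + group quantization" step computes on a single token. It is not the AITER kernel: the function names, the `FP8_MAX` value of 448.0 (the float8_e4m3fn maximum; ROCm float8_e4m3fnuz devices use a smaller range), and the omission of rounding to the FP8 grid are all assumptions made for illustration.

```python
import math

# Assumed FP8 range: 448.0 is the largest finite float8_e4m3fn value.
# Devices that use float8_e4m3fnuz have a smaller range (240.0).
FP8_MAX = 448.0


def fused_add_rms_norm(x, residual, weight, epsilon):
    """Add the residual, then RMS-normalize the sum.

    Returns (normalized values, updated residual).
    """
    summed = [a + b for a, b in zip(x, residual)]
    mean_sq = sum(v * v for v in summed) / len(summed)
    inv_rms = 1.0 / math.sqrt(mean_sq + epsilon)
    normed = [v * inv_rms * w for v, w in zip(summed, weight)]
    return normed, summed


def group_quant(values, group_size, fp8_max=FP8_MAX):
    """Scale each contiguous group so its largest magnitude maps to fp8_max."""
    quantized, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        amax = max(abs(v) for v in group)
        scale = amax / fp8_max if amax > 0 else 1.0
        scales.append(scale)
        # Rounding to the FP8 grid is omitted; only scaling and clamping are shown.
        quantized.extend(max(-fp8_max, min(fp8_max, v / scale)) for v in group)
    return quantized, scales


def fused_add_rms_norm_group_quant(x, residual, weight, epsilon, group_size):
    """Reference for the fused op: one call instead of two separate ops."""
    normed, new_residual = fused_add_rms_norm(x, residual, weight, epsilon)
    quantized, scales = group_quant(normed, group_size)
    return quantized, scales, new_residual
```

Multiplying each quantized value by its group scale recovers the normalized values up to floating-point error, which is a convenient check that the two-op and fused formulations agree.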
__init__ ¶
__init__(
epsilon: float,
quant_dtype: dtype,
group_shape: GroupShape,
match_aiter_quant: bool = True,
symmetric: bool = True,
) -> None
Source code in vllm/compilation/rocm_aiter_fusion.py
register ¶
Source code in vllm/compilation/rocm_aiter_fusion.py
AiterFusedAddRMSNormDynamicQuantPattern ¶
Bases: AiterRMSNormQuantPattern
AITER RMSNorm Fused Add + Dynamic Quantization pattern.
Source code in vllm/compilation/rocm_aiter_fusion.py
__init__ ¶
__init__(
epsilon: float,
quant_dtype: dtype,
match_aiter_quant: bool = True,
group_shape: GroupShape = PER_TOKEN,
symmetric: bool = True,
) -> None
Source code in vllm/compilation/rocm_aiter_fusion.py
register ¶
Source code in vllm/compilation/rocm_aiter_fusion.py
AiterRMSFp8GroupQuantPattern ¶
Bases: AiterRMSNormQuantPattern
This pattern fuses aiter rms_norm & group fp8 quant custom ops into an aiter rms_norm_group_fp8_quant op.
Source code in vllm/compilation/rocm_aiter_fusion.py
__init__ ¶
__init__(
epsilon: float,
quant_dtype: dtype,
group_shape: GroupShape,
match_aiter_quant: bool = True,
symmetric: bool = True,
) -> None
Source code in vllm/compilation/rocm_aiter_fusion.py
register ¶
Source code in vllm/compilation/rocm_aiter_fusion.py
AiterRMSNormDynamicQuantPattern ¶
Bases: AiterRMSNormQuantPattern
AITER RMSNorm + Dynamic Quantization pattern.
Source code in vllm/compilation/rocm_aiter_fusion.py
__init__ ¶
__init__(
epsilon: float,
quant_dtype: dtype,
match_aiter_quant: bool = True,
group_shape: GroupShape = PER_TOKEN,
symmetric: bool = True,
) -> None
Source code in vllm/compilation/rocm_aiter_fusion.py
register ¶
Source code in vllm/compilation/rocm_aiter_fusion.py
AiterRMSNormQuantPattern ¶
Source code in vllm/compilation/rocm_aiter_fusion.py
quant_matcher instance-attribute ¶
quant_matcher = MatcherQuantFP8(
quant, match_rocm_aiter=match_aiter_quant
)
rmsnorm_matcher instance-attribute ¶
rmsnorm_matcher = (
MatcherRMSNorm(epsilon, match_rocm_aiter=True)
if not fused_add
else MatcherFusedAddRMSNorm(
epsilon, match_rocm_aiter=True
)
)
__init__ ¶
__init__(
epsilon: float,
key: FusedRMSQuantKey,
match_aiter_quant: bool = True,
)
Source code in vllm/compilation/rocm_aiter_fusion.py
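The attribute definitions above suggest that the base class picks its RMSNorm matcher according to whether the key requests a fused residual add. The sketch below mirrors that selection with stand-in classes. `ToyKey`, `ToyRMSNormMatcher`, `ToyFusedAddRMSNormMatcher`, and `ToyPattern` are hypothetical names, and reading `fused_add` from the key is an assumption, not something this page confirms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyKey:
    """Hypothetical stand-in for FusedRMSQuantKey."""
    quant: str
    fused_add: bool


class ToyRMSNormMatcher:
    """Hypothetical stand-in for MatcherRMSNorm."""
    def __init__(self, epsilon):
        self.epsilon = epsilon


class ToyFusedAddRMSNormMatcher:
    """Hypothetical stand-in for MatcherFusedAddRMSNorm."""
    def __init__(self, epsilon):
        self.epsilon = epsilon


class ToyPattern:
    """Selects the matcher once, at construction time."""
    def __init__(self, epsilon, key):
        self.epsilon = epsilon
        # Assumption: the fused_add flag comes from the key.
        self.rmsnorm_matcher = (
            ToyRMSNormMatcher(epsilon)
            if not key.fused_add
            else ToyFusedAddRMSNormMatcher(epsilon)
        )
```

Choosing the matcher in the base constructor lets each subclass differ only in the key it builds, rather than repeating the conditional.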
AiterSiluMulFp8GroupQuantPattern ¶
Bases: ActivationQuantPattern
This pattern fuses aiter silu_and_mul & group fp8 quant custom ops into an aiter silu_and_mul_group_fp8_quant op.
Source code in vllm/compilation/rocm_aiter_fusion.py
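As a rough numerical reference for what this fusion combines, the sketch below applies silu_and_mul to one token and then quantizes the result per group. It is a pure-Python illustration rather than the AITER op: the function names, the default `fp8_max` of 448.0 (float8_e4m3fn; float8_e4m3fnuz uses a smaller range), and the absence of real FP8 rounding are assumptions.

```python
import math


def silu_and_mul(x):
    """SiLU on the first half of x, multiplied elementwise by the second half."""
    d = len(x) // 2
    gate, up = x[:d], x[d:]
    return [g / (1.0 + math.exp(-g)) * u for g, u in zip(gate, up)]


def silu_mul_group_quant(x, group_size, fp8_max=448.0):
    """Reference for the fused op: activation followed by per-group scaling."""
    act = silu_and_mul(x)
    quantized, scales = [], []
    for start in range(0, len(act), group_size):
        group = act[start:start + group_size]
        amax = max(abs(v) for v in group)
        # Guard against an all-zero group to avoid dividing by zero.
        scale = amax / fp8_max if amax > 0 else 1.0
        scales.append(scale)
        # Only scaling and clamping are modelled, not rounding to FP8.
        quantized.extend(max(-fp8_max, min(fp8_max, v / scale)) for v in group)
    return quantized, scales
```

The output has half as many elements as the input, so the group size applies to the post-activation width.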
FUSED_SILU_MUL_QUANT_OP class-attribute instance-attribute ¶
__init__ ¶
get_inputs ¶
register ¶
Source code in vllm/compilation/rocm_aiter_fusion.py
RocmAiterRMSNormFusionPass ¶
Bases: VllmPatternMatcherPass
This pass fuses aiter rms_norm & vllm/aiter quant custom ops into a fused rms_norm_quant op. It also supports fused_add_rms_norm.
Source code in vllm/compilation/rocm_aiter_fusion.py
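The idea behind such a pass can be illustrated with a toy rewrite over a flat list of op names. This is only a sketch of the concept: the real pass operates on a traced graph through the torch pattern matcher, and the op names and the `FUSION_RULES` table below are hypothetical.

```python
# Hypothetical op names, chosen only for this illustration.
FUSION_RULES = {
    ("rms_norm", "fp8_quant"): "rms_norm_quant",
    ("fused_add_rms_norm", "fp8_quant"): "fused_add_rms_norm_quant",
}


def fuse_ops(ops):
    """Replace each adjacent (norm, quant) pair with its fused op name."""
    fused = []
    i = 0
    while i < len(ops):
        pair = tuple(ops[i:i + 2])
        if pair in FUSION_RULES:
            fused.append(FUSION_RULES[pair])
            i += 2  # both ops of the pair are consumed
        else:
            fused.append(ops[i])
            i += 1
    return fused
```

For example, `fuse_ops(["rms_norm", "fp8_quant", "matmul"])` collapses the first two entries into a single fused entry and leaves `"matmul"` untouched.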
patterns instance-attribute ¶
patterns: PatternMatcherPass = PatternMatcherPass(
pass_name="rocm_aiter_rms_norm_quant_fusion_pass"
)
__init__ ¶
__init__(config: VllmConfig) -> None
Source code in vllm/compilation/rocm_aiter_fusion.py
RocmAiterSiluMulFp8GroupQuantFusionPass ¶
Bases: VllmPatternMatcherPass
This pass fuses a pre-defined set of custom ops into fused ops. It uses the torch pattern matcher to find the patterns and replace them.
Because patterns can only be registered once, the pass is a singleton. This will be addressed in a future version of PyTorch: https://github.com/pytorch/pytorch/pull/139321#issuecomment-2452354980
Source code in vllm/compilation/rocm_aiter_fusion.py
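The register-once constraint described above can be sketched with a plain Python singleton. This is one common way to get that behaviour and is not necessarily how vLLM implements it; `RegisterOncePass` and its `pattern_names` argument are hypothetical.

```python
class RegisterOncePass:
    """Toy singleton: patterns are recorded only on the first construction."""

    _instance = None

    def __new__(cls, *args, **kwargs):
        # Reuse the one existing instance instead of creating another.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._registered = False
        return cls._instance

    def __init__(self, pattern_names):
        # __init__ runs on every call, so skip registration after the first.
        if self._registered:
            return
        self.patterns = list(pattern_names)
        self._registered = True
```

Constructing the class a second time returns the same object and leaves the originally registered patterns in place, which is the property a register-once pass needs.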
AITER_GROUP_FP8_QUANT_OP class-attribute instance-attribute ¶
QUANT_OPS class-attribute instance-attribute ¶
QUANT_OPS = [
AITER_GROUP_FP8_QUANT_OP,
TRITON_GROUP_FP8_QUANT_OP,
]
patterns instance-attribute ¶
patterns: PatternMatcherPass = PatternMatcherPass(
pass_name="rocm_aiter_silu_mul_fp8_group_quant_fusion_pass"
)
__init__ ¶
__init__(config: VllmConfig) -> None