vllm.config.model_arch ¶
ModelArchitectureConfig ¶
Configuration for the model architecture that is required by the vLLM runtime.
Source code in vllm/config/model_arch.py
architectures instance-attribute ¶
List of model architecture class names (e.g., ['LlamaForCausalLM']). It may be None after calling vllm_config.with_hf_config(config.text_config).
derived_max_model_len_and_key instance-attribute ¶
The maximum model length and the HF config key it was derived from.
is_deepseek_mla instance-attribute ¶
is_deepseek_mla: bool
Whether the model is a DeepSeek MLA model.
quantization_config instance-attribute ¶
Quantization configuration dictionary containing quantization parameters.
text_model_type instance-attribute ¶
text_model_type: str | None
Text model type identifier (e.g., 'llama4_text').
total_num_attention_heads instance-attribute ¶
total_num_attention_heads: int
Total number of attention heads in the model.
total_num_hidden_layers instance-attribute ¶
total_num_hidden_layers: int
Total number of hidden layers in the model.