vllm.model_executor.models.glmasr_utils ¶
_as_list_chunk_counts ¶
_calculate_conv_output_length ¶
```python
_calculate_conv_output_length(
    input_length: Tensor,
    padding: int,
    kernel_size: int,
    stride: int,
) -> Tensor
```
Calculate the Conv1d output length using the standard formula `floor((L + 2 * padding - kernel_size) / stride) + 1` (dilation = 1).
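A minimal sketch of that formula (assuming dilation = 1, the PyTorch `Conv1d` default):

```python
import torch
from torch import Tensor

def conv_output_length(
    input_length: Tensor, padding: int, kernel_size: int, stride: int
) -> Tensor:
    # Standard Conv1d length formula with dilation = 1:
    # L_out = floor((L_in + 2 * padding - kernel_size) / stride) + 1
    return (input_length + 2 * padding - kernel_size) // stride + 1

# A Whisper-style conv stack (kernel 3, strides 1 then 2), chosen purely for illustration:
lengths = torch.tensor([3000, 1500])
lengths = conv_output_length(lengths, padding=1, kernel_size=3, stride=1)
lengths = conv_output_length(lengths, padding=1, kernel_size=3, stride=2)
print(lengths)  # tensor([1500,  750])
```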
_extract_mask_for_item ¶
```python
_extract_mask_for_item(
    feature_attention_mask: Tensor | list[Tensor],
    chunk_counts: Tensor | list[int] | None,
    item_idx: int,
) -> Tensor
```
Extract attention mask for a specific audio item.
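The actual slicing logic lives in the source; a plausible sketch, assuming `chunk_counts[i]` is the number of mask rows belonging to item `i` (all details here are assumptions):

```python
import torch
from torch import Tensor

def extract_mask_for_item(
    feature_attention_mask: Tensor | list[Tensor],
    chunk_counts: Tensor | list[int] | None,
    item_idx: int,
) -> Tensor:
    if chunk_counts is None:
        # One mask row per item: plain indexing is enough.
        return feature_attention_mask[item_idx]
    counts = [int(c) for c in chunk_counts]
    start = sum(counts[:item_idx])   # rows consumed by earlier items
    end = start + counts[item_idx]   # rows belonging to this item
    chunks = feature_attention_mask[start:end]
    if isinstance(chunks, list):
        chunks = torch.stack(chunks)
    return chunks
```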
_flatten_audio_features_by_length ¶
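This helper carries no docstring; judging from the name alone, a hypothetical sketch of the usual pattern (drop padding, concatenate the valid frames):

```python
import torch
from torch import Tensor

def flatten_audio_features_by_length(features: Tensor, lengths: Tensor) -> Tensor:
    # Padded [batch, T, D] features -> [sum(lengths), D] of valid frames only.
    return torch.cat(
        [feat[:length] for feat, length in zip(features, lengths)], dim=0
    )
```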
_get_audio_output_lengths_for_tower ¶
```python
_get_audio_output_lengths_for_tower(
    audio_tower: Module,
    audio_lengths: Tensor,
    merge_factor: int,
    conv_params: list[tuple[int, int, int]],
) -> Tensor
```
Calculate the output lengths after audio processing.
The output length accounts for:

1. Convolution layers (downsampling)
2. Merge factor (further downsampling during projection)

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| audio_tower | Module | The audio encoder module | required |
| audio_lengths | Tensor | Input feature lengths [batch_size] | required |
| merge_factor | int | Factor for merging adjacent features | required |
| conv_params | list[tuple[int, int, int]] | List of (padding, kernel_size, stride) for each conv layer | required |

Returns:

| Type | Description |
|---|---|
| Tensor | Output lengths after all processing [batch_size] |
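Putting both stages together, a minimal sketch (floor division for the merge step is an assumption; the source may round differently):

```python
import torch
from torch import Tensor

def audio_output_lengths(
    audio_lengths: Tensor,
    merge_factor: int,
    conv_params: list[tuple[int, int, int]],
) -> Tensor:
    lengths = audio_lengths
    # Stage 1: each conv layer downsamples via the standard length formula.
    for padding, kernel_size, stride in conv_params:
        lengths = (lengths + 2 * padding - kernel_size) // stride + 1
    # Stage 2: merging adjacent frames during projection divides again.
    return lengths // merge_factor

print(audio_output_lengths(torch.tensor([3000]), 2, [(1, 3, 1), (1, 3, 2)]))
# tensor([750])
```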
_get_audio_output_lengths_from_lengths ¶
```python
_get_audio_output_lengths_from_lengths(
    audio_lengths: Tensor,
    merge_factor: int,
    conv_params: list[tuple[int, int, int]],
) -> Tensor
```
_get_audio_output_lengths_from_mask ¶
```python
_get_audio_output_lengths_from_mask(
    mask: Tensor,
    merge_factor: int,
    conv_params: list[tuple[int, int, int]],
) -> Tensor
```
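Neither variant carries a docstring; presumably they differ only in how per-item lengths are obtained. A sketch, assuming the mask counts valid frames along its last dimension and reusing `audio_output_lengths` from the sketch above:

```python
import torch
from torch import Tensor

def output_lengths_from_mask(
    mask: Tensor,
    merge_factor: int,
    conv_params: list[tuple[int, int, int]],
) -> Tensor:
    # Valid (unmasked) frames per item, then the shared length pipeline.
    audio_lengths = mask.sum(dim=-1).to(torch.long)
    return audio_output_lengths(audio_lengths, merge_factor, conv_params)
```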
_get_num_features_for_item ¶
```python
_get_num_features_for_item(
    feature_attention_mask: Tensor | None,
    chunk_counts: Tensor | list[int] | None,
    item_idx: int,
    audio_embeds: list[Tensor] | None,
    merge_factor: int,
    conv_params: list[tuple[int, int, int]],
) -> int
```
Get number of features for a specific audio item.
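A plausible dispatch, reusing the sketches above; the precedence of `audio_embeds` over the mask is an assumption:

```python
from torch import Tensor

def num_features_for_item(
    feature_attention_mask: Tensor | None,
    chunk_counts: Tensor | list[int] | None,
    item_idx: int,
    audio_embeds: list[Tensor] | None,
    merge_factor: int,
    conv_params: list[tuple[int, int, int]],
) -> int:
    if audio_embeds is not None:
        # Embeddings already computed: the feature count is their frame count.
        return audio_embeds[item_idx].shape[0]
    # Otherwise derive it from this item's mask via the length pipeline.
    mask = extract_mask_for_item(feature_attention_mask, chunk_counts, item_idx)
    return int(output_lengths_from_mask(mask, merge_factor, conv_params).sum())
```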
_group_audio_embeddings ¶
```python
_group_audio_embeddings(
    chunk_embeddings: Sequence[Tensor],
    chunk_counts: Sequence[int],
) -> tuple[Tensor, ...]
```
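A sketch of one way to regroup per-chunk embeddings into per-item tensors (concatenation along the frame axis is an assumption):

```python
from collections.abc import Sequence

import torch
from torch import Tensor

def group_audio_embeddings(
    chunk_embeddings: Sequence[Tensor],
    chunk_counts: Sequence[int],
) -> tuple[Tensor, ...]:
    grouped: list[Tensor] = []
    start = 0
    for count in chunk_counts:
        # All chunks of one audio item are concatenated back together.
        grouped.append(torch.cat(list(chunk_embeddings[start : start + count]), dim=0))
        start += count
    return tuple(grouped)
```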
_normalize_chunk_counts ¶
_normalize_to_tensor ¶
Convert mask to tensor, handling both list and tensor formats.
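A minimal sketch of such a normalizer (equal-length masks are assumed; ragged inputs would need padding):

```python
import torch
from torch import Tensor

def normalize_to_tensor(mask: Tensor | list[Tensor]) -> Tensor:
    if isinstance(mask, Tensor):
        return mask
    # A list of equal-length 1-D masks stacks into a single [batch, T] tensor.
    return torch.stack(list(mask), dim=0)
```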