# KServe
vLLM can be deployed with KServe on Kubernetes for highly scalable distributed model serving.
You can serve models either through KServe's Hugging Face serving runtime or through the LLMInferenceService custom resource, which builds on llm-d; a minimal manifest sketch for the first path follows.
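As a sketch of the Hugging Face runtime path, the `InferenceService` below assumes a placeholder service name, model ID, and GPU count; the `huggingface` model format selects KServe's Hugging Face serving runtime, which uses vLLM as its backend for supported models.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llama3          # placeholder service name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface           # selects KServe's Hugging Face runtime (vLLM backend)
      args:
        - --model_name=llama3       # name exposed on the inference endpoint
        - --model_id=meta-llama/Meta-Llama-3-8B-Instruct  # example model; swap in your own
      resources:
        limits:
          nvidia.com/gpu: "1"       # adjust to your model's memory footprint
        requests:
          nvidia.com/gpu: "1"
```

Apply the manifest with `kubectl apply -f` and query the generated endpoint once the service reports Ready. The llm-d path is configured analogously by creating an `LLMInferenceService` resource instead; refer to the KServe documentation for its schema.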