# KServe
vLLM can be deployed with KServe on Kubernetes for highly scalable distributed model serving.
You can serve models either through KServe's Hugging Face serving runtime or through the LLMInferenceService custom resource, which builds on llm-d; a minimal manifest sketch for the first path follows.
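As a sketch of the Hugging Face runtime path, the `InferenceService` below assumes a placeholder service name, model ID, and GPU count; the `huggingface` model format selects KServe's Hugging Face serving runtime, which uses vLLM as its backend for supported models.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llama3          # placeholder service name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface           # selects KServe's Hugging Face runtime (vLLM backend)
      args:
        - --model_name=llama3       # name exposed on the inference endpoint
        - --model_id=meta-llama/Meta-Llama-3-8B-Instruct  # example model; swap in your own
      resources:
        limits:
          nvidia.com/gpu: "1"       # adjust to your model's memory footprint
        requests:
          nvidia.com/gpu: "1"
```

Apply the manifest with `kubectl apply -f` and query the generated endpoint once the service reports Ready. The llm-d path is configured analogously by creating an `LLMInferenceService` resource instead; refer to the KServe documentation for its schema.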