# vLLM

vLLM is a fast, easy-to-use library for LLM inference, designed for high-throughput, memory-efficient serving.
## Prerequisites
Install vLLM and start serving a model:
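A minimal sketch: install with pip and start an OpenAI-compatible server with the `vllm serve` CLI. The model id here matches this integration's default; substitute any vLLM-supported model.

```bash
# Install vLLM (a CUDA-capable GPU is required for most models)
pip install vllm

# Serve a model; this starts an OpenAI-compatible server on port 8000
vllm serve microsoft/DialoGPT-medium
```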
The default vLLM server URL is `http://localhost:8000`.

## Example
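A basic agent, as a minimal sketch assuming the import path `agno.models.vllm` for the `VLLM` class described in the Params table below:

```python
from agno.agent import Agent
from agno.models.vllm import VLLM

# Connect to the local vLLM server; the id should match the model
# you passed to `vllm serve`.
agent = Agent(
    model=VLLM(id="microsoft/DialoGPT-medium"),
    markdown=True,
)

agent.print_response("Share a two-sentence horror story.")
```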
## Advanced Usage
### With Tools
vLLM models work seamlessly with Agno tools:
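A sketch of `with_tools.py`, again assuming the `agno.models.vllm` import path. `DuckDuckGoTools` stands in for any Agno toolkit, and the `Qwen/Qwen2.5-7B-Instruct` id is illustrative; tool calls require serving a model that supports function calling.

```python with_tools.py
from agno.agent import Agent
from agno.models.vllm import VLLM
from agno.tools.duckduckgo import DuckDuckGoTools

# Give the agent a web-search tool; the model decides when to call it.
agent = Agent(
    model=VLLM(id="Qwen/Qwen2.5-7B-Instruct"),
    tools=[DuckDuckGoTools()],
    markdown=True,
)

agent.print_response("What's the latest vLLM release?")
```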
View more examples here.
## Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| `id` | `str` | `"microsoft/DialoGPT-medium"` | The id of the model to use with vLLM. |
| `name` | `str` | `"vLLM"` | The name of the model. |
| `provider` | `str` | `"vLLM"` | The provider of the model. |
| `api_key` | `Optional[str]` | `None` | The API key (usually not needed for local vLLM). |
| `base_url` | `str` | `"http://localhost:8000/v1"` | The base URL of the vLLM server's OpenAI-compatible endpoint. |
`VLLM` is a subclass of the `Model` class and has access to the same params.
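As a sketch, the vLLM-specific params map directly onto constructor arguments; the values shown are the defaults from the table above, and `api_key` is only needed if your server was started with one.

```python
from agno.models.vllm import VLLM

model = VLLM(
    id="microsoft/DialoGPT-medium",
    name="vLLM",
    provider="vLLM",
    api_key=None,  # usually not needed for a local server
    base_url="http://localhost:8000/v1",
)
```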