# vLLM

vLLM is a fast, easy-to-use library for LLM inference, designed for high-throughput, memory-efficient serving.
## Prerequisites
Install vLLM and start serving a model:
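A minimal sketch: install with pip and start an OpenAI-compatible server with the `vllm serve` CLI. The model id here matches this integration's default; substitute any vLLM-supported model.

```bash
# Install vLLM (a CUDA-capable GPU is required for most models)
pip install vllm

# Serve a model; this starts an OpenAI-compatible server on port 8000
vllm serve microsoft/DialoGPT-medium
```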
The default vLLM server URL is `http://localhost:8000`.

## Example
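A basic agent, as a minimal sketch assuming the import path `agno.models.vllm` for the `VLLM` class described in the Params table below:

```python
from agno.agent import Agent
from agno.models.vllm import VLLM

# Connect to the local vLLM server; the id should match the model
# you passed to `vllm serve`.
agent = Agent(
    model=VLLM(id="microsoft/DialoGPT-medium"),
    markdown=True,
)

agent.print_response("Share a two-sentence horror story.")
```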
## Advanced Usage
### With Tools
vLLM models work seamlessly with Agno tools:
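A sketch of `with_tools.py`, again assuming the `agno.models.vllm` import path. `DuckDuckGoTools` stands in for any Agno toolkit, and the `Qwen/Qwen2.5-7B-Instruct` id is illustrative; tool calls require serving a model that supports function calling.

```python with_tools.py
from agno.agent import Agent
from agno.models.vllm import VLLM
from agno.tools.duckduckgo import DuckDuckGoTools

# Give the agent a web-search tool; the model decides when to call it.
agent = Agent(
    model=VLLM(id="Qwen/Qwen2.5-7B-Instruct"),
    tools=[DuckDuckGoTools()],
    markdown=True,
)

agent.print_response("What's the latest vLLM release?")
```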
View more examples here.
## Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| `id` | `str` | `"microsoft/DialoGPT-medium"` | The id of the model to use with vLLM. |
| `name` | `str` | `"vLLM"` | The name of the model. |
| `provider` | `str` | `"vLLM"` | The provider of the model. |
| `api_key` | `Optional[str]` | `None` | The API key (usually not needed for local vLLM). |
| `base_url` | `str` | `"http://localhost:8000/v1"` | The base URL of the vLLM server's OpenAI-compatible endpoint. |
`VLLM` is a subclass of the `Model` class and has access to the same params.
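As a sketch, the vLLM-specific params map directly onto constructor arguments; the values shown are the defaults from the table above, and `api_key` is only needed if your server was started with one.

```python
from agno.models.vllm import VLLM

model = VLLM(
    id="microsoft/DialoGPT-medium",
    name="vLLM",
    provider="vLLM",
    api_key=None,  # usually not needed for a local server
    base_url="http://localhost:8000/v1",
)
```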