Code
cookbook/11_models/llama_cpp/basic_stream.py
from typing import Iterator  # noqa

from agno.agent import Agent, RunOutputEvent  # noqa
from agno.models.llama_cpp import LlamaCpp

agent = Agent(model=LlamaCpp(id="ggml-org/gpt-oss-20b-GGUF"), markdown=True)

# Get the response in a variable
# run_response: Iterator[RunOutputEvent] = agent.run("Share a 2 sentence horror story", stream=True)
# for chunk in run_response:
#     print(chunk.content)

# Print the response in the terminal
agent.print_response("Share a 2 sentence horror story", stream=True)
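The commented-out lines show how to capture the stream in a variable instead of printing it: iterate the events and accumulate each chunk's content. As a plain-Python sketch of that accumulation pattern (the `Chunk` dataclass and `fake_stream` generator below are hypothetical stand-ins for `RunOutputEvent` and `agent.run(..., stream=True)`, not part of the agno API):

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Chunk:
    # Stand-in for a RunOutputEvent carrying a piece of the response
    content: str


def fake_stream() -> Iterator[Chunk]:
    # Stand-in for agent.run("...", stream=True): yields the response
    # incrementally, one chunk at a time
    for piece in ["The door creaked open. ", "No one was home."]:
        yield Chunk(content=piece)


# Accumulate the streamed chunks into the full response text
story = "".join(chunk.content for chunk in fake_stream())
print(story)  # The door creaked open. No one was home.
```

With a real agent, the same join over `run_response` collects the full response while still letting you act on each chunk as it arrives.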
Usage
Set up your virtual environment
uv venv --python 3.12
source .venv/bin/activate
Install llama.cpp
Follow the llama.cpp installation guide, then start the server:
llama-server -hf ggml-org/gpt-oss-20b-GGUF --ctx-size 0 --jinja -ub 2048 -b 2048
Run Agent
python cookbook/11_models/llama_cpp/basic_stream.py