Document chunking splits a document into smaller chunks along its structure, such as paragraphs and sections, instead of cutting it at fixed character counts. Because chunk boundaries follow natural breaks in the text, this strategy is useful when you want to process large documents while keeping each chunk's meaning and context intact.
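To make the idea concrete, here is a simplified sketch in plain Python. It is not Agno's implementation. It assumes blank lines mark paragraph boundaries and greedily packs whole paragraphs into chunks that stay under a character limit. The function name `chunk_by_paragraphs` is hypothetical.

```python
def chunk_by_paragraphs(text: str, chunk_size: int) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most chunk_size characters.

    Hypothetical illustration only: blank lines are assumed to separate
    paragraphs, and a single paragraph longer than chunk_size is kept
    whole rather than split in the middle.
    """
    # Treat blank lines as paragraph boundaries and drop empty pieces.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in paragraphs:
        # Length this paragraph would add, including the "\n\n" separator.
        added = len(para) + (2 if current else 0)
        if current and current_len + added > chunk_size:
            # Close the current chunk at a paragraph boundary.
            chunks.append("\n\n".join(current))
            current, current_len = [para], len(para)
        else:
            current.append(para)
            current_len += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks


doc = "Intro paragraph.\n\nIngredients list.\n\nStep one.\n\nStep two."
for chunk in chunk_by_paragraphs(doc, chunk_size=40):
    print(repr(chunk))
# 'Intro paragraph.\n\nIngredients list.'
# 'Step one.\n\nStep two.'
```

The design choice is that a boundary only ever falls between paragraphs, so no paragraph is cut in half. The trade-off is that chunk lengths vary, and in this sketch an oversized paragraph produces an oversized chunk.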
Create a Python file named `document_chunking.py`:
```python
import asyncio

from agno.agent import Agent
from agno.knowledge.chunking.document import DocumentChunking
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.reader.pdf_reader import PDFReader
from agno.vectordb.pgvector import PgVector

# Matches the PgVector container from the "Run PgVector" step.
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# Knowledge base backed by a PgVector table.
knowledge = Knowledge(
    vector_db=PgVector(table_name="recipes_document_chunking", db_url=db_url),
)

# Download the PDF, split it with DocumentChunking, and store the chunks in PgVector.
asyncio.run(
    knowledge.ainsert(
        url="https://agno-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
        reader=PDFReader(
            name="Document Chunking Reader",
            chunking_strategy=DocumentChunking(),
        ),
    )
)

# Let the agent search the knowledge base when answering.
agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

agent.print_response("How to make Thai curry?", markdown=True)
```
Set up your virtual environment
```shell
uv venv --python 3.12
source .venv/bin/activate
```
Install dependencies
```shell
uv pip install -U agno sqlalchemy psycopg pgvector pypdf openai
```
Run PgVector
```shell
docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  agno/pgvector:16
```
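The `db_url` in the script has to agree with the container settings: user and password `ai`, database `ai`, and host port `5532` (published from the container's `5432`). As an optional sanity check, this standard-library sketch parses the URL and compares its parts against those values. The helper name `check_db_url` is hypothetical and not part of Agno.

```python
from urllib.parse import urlsplit


def check_db_url(db_url: str, user: str, password: str, db: str, host_port: int) -> list[str]:
    """Return a list of mismatches between db_url and the expected container settings.

    Hypothetical helper for illustration; an empty list means everything matches.
    """
    parts = urlsplit(db_url)
    problems = []
    if parts.username != user:
        problems.append(f"user: {parts.username!r} != {user!r}")
    if parts.password != password:
        problems.append("password does not match")
    if parts.port != host_port:
        problems.append(f"port: {parts.port} != {host_port}")
    # The path has the form "/<database>"; strip the leading slash.
    database = parts.path.lstrip("/")
    if database != db:
        problems.append(f"database: {database!r} != {db!r}")
    return problems


# Expected values come from POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB,
# and the host side of "-p 5532:5432" in the docker run command.
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"
print(check_db_url(db_url, user="ai", password="ai", db="ai", host_port=5532))  # []
```

A common slip is putting the container port `5432` in the URL; from the host, the script has to connect through the published port `5532`.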
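Run the script

The default embedder and model both call the OpenAI API, so export your API key first:

```shell
export OPENAI_API_KEY=<your-api-key>
python document_chunking.py
```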
Document Chunking Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| `chunk_size` | `int` | `5000` | The maximum size of each chunk, in characters. |
| `overlap` | `int` | `0` | The number of characters to overlap between consecutive chunks. |
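To illustrate what a character overlap does, here is a sketch that assumes each chunk after the first is prefixed with the last `overlap` characters of the previous chunk. This is one plausible reading of the parameter, not a description of Agno's internals, and `add_overlap` is a hypothetical helper.

```python
def add_overlap(chunks: list[str], overlap: int) -> list[str]:
    """Prefix each chunk after the first with the tail of the previous chunk.

    Hypothetical illustration of a character overlap; with overlap=0
    (the default in the table) the chunks are returned unchanged.
    """
    if overlap <= 0:
        return list(chunks)
    result = [chunks[0]] if chunks else []
    for prev, curr in zip(chunks, chunks[1:]):
        # Carry over the last `overlap` characters of the original previous chunk.
        result.append(prev[-overlap:] + curr)
    return result


chunks = ["Boil the coconut milk.", "Add the curry paste."]
print(add_overlap(chunks, overlap=0))  # ['Boil the coconut milk.', 'Add the curry paste.']
print(add_overlap(chunks, overlap=5))  # ['Boil the coconut milk.', 'milk.Add the curry paste.']
```

Overlap repeats a little context at each boundary so a retrieved chunk is less likely to start mid-thought; the cost is larger, partly duplicated chunks in the vector store. Using the parameter names from the table, a configured strategy in the script would look like `DocumentChunking(chunk_size=2000, overlap=200)`, where the values are arbitrary examples.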