Most knowledge bases work great with Agno’s defaults. But if you’re seeing slow searches, memory issues, or poor results, a few strategic changes can make a big difference.

When to Optimize

Don’t prematurely optimize. Focus on performance when you notice:
  • Slow search - Queries taking more than 2-3 seconds
  • Memory issues - Out of memory errors during content loading
  • Poor results - Search returning irrelevant chunks or missing obvious matches
  • Slow loading - Content processing taking unusually long
If things are working fine, stick with the defaults and focus on building your application.

The 80/20 of Performance

These five changes give you the biggest performance boost for the least effort:

1. Pick the Right Vector Database

Your database choice has the biggest impact on performance at scale:
from agno.vectordb.lancedb import LanceDb
from agno.vectordb.pgvector import PgVector

# Development: Fast, local, zero setup
dev_db = LanceDb(
    table_name="dev_knowledge",
    uri="./local_db"
)

# Production: Scalable, battle-tested
prod_db = PgVector(
    table_name="prod_knowledge",
    db_url="postgresql+psycopg://user:pass@db:5432/knowledge"
)
Guidelines:
  • LanceDB for development and testing (no setup required)
  • PgVector for production (scales to roughly 1M documents, or when you need SQL features)
  • Pinecone for managed services (no ops overhead, auto-scaling)
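If you go the managed route, Pinecone plugs in the same way. A minimal sketch, assuming the import path and constructor arguments below (they follow Agno's naming conventions, but verify against your installed version); the index name, dimension, and region are illustrative placeholders:
import os

from agno.vectordb.pineconedb import PineconeDb

# Managed option: Pinecone handles scaling and operations for you.
# Index name, dimension, and serverless spec are placeholder assumptions.
managed_db = PineconeDb(
    name="prod-knowledge",
    dimension=1536,  # must match your embedder's output size
    metric="cosine",
    spec={"serverless": {"cloud": "aws", "region": "us-east-1"}},
    api_key=os.environ["PINECONE_API_KEY"],
)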

2. Skip Already-Processed Files

The single biggest speed-up for re-running your ingestion:
# Skip files you've already processed
knowledge.add_content(
    path="large_document.pdf",
    skip_if_exists=True,  # Don't reprocess existing files
    upsert=False          # Don't update existing
)

# For batch loading
knowledge.add_contents(
    paths=["docs/", "policies/"],
    skip_if_exists=True,
    include=["*.pdf", "*.md"],
    exclude=["*temp*", "*draft*"]
)

3. Use Metadata Filters

Narrow searches before vector comparison for faster, more accurate results:
# Slow: Search everything
results = knowledge.search("deployment process", max_results=10)

# Fast: Filter first, then search
results = knowledge.search(
    query="deployment process",
    max_results=10,
    filters={"department": "engineering", "type": "procedure"}
)

# Validate your filters to catch typos
valid_filters, invalid_keys = knowledge.validate_filters({
    "department": "engineering",
    "invalid_key": "value"  # This gets flagged
})

4. Match Chunking Strategy to Your Content

Different strategies have different performance characteristics:
Strategy   | Speed  | Quality | Best For
-----------|--------|---------|-------------------------------------
Fixed Size | Fast   | Good    | Uniform content, when speed matters
Semantic   | Slower | Best    | Complex docs, when quality matters
Recursive  | Fast   | Good    | Structured docs, good balance
from agno.knowledge.chunking.fixed import FixedSizeChunking
from agno.knowledge.chunking.semantic import SemanticChunking

# Fast processing for simple content
fast_chunking = FixedSizeChunking(
    chunk_size=800,
    overlap=80
)

# Better quality for complex content (but slower)
quality_chunking = SemanticChunking(
    chunk_size=1200,
    similarity_threshold=0.5
)
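The recursive strategy from the table sits between these two. A sketch, assuming its import path and parameters mirror the other chunkers (check your Agno version):
from agno.knowledge.chunking.recursive import RecursiveChunking

# Balanced option: split on structural boundaries (paragraphs, sentences)
# before falling back to size limits. Parameter values are illustrative.
balanced_chunking = RecursiveChunking(
    chunk_size=1000,
    overlap=100
)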
Learn more about choosing chunking strategies.

5. Use Async for Batch Operations

Process multiple items concurrently:
import asyncio

async def load_knowledge_efficiently():
    # Load multiple content sources in parallel
    tasks = [
        knowledge.add_content_async(path="docs/hr/"),
        knowledge.add_content_async(path="docs/engineering/"),
        knowledge.add_content_async(url="https://company.com/api-docs"),
    ]
    await asyncio.gather(*tasks)

asyncio.run(load_knowledge_efficiently())
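With many sources, an unbounded gather can hammer your embedding API's rate limits. A plain-asyncio sketch that caps concurrency (the paths and limit are illustrative):
import asyncio

async def load_with_limit(paths, max_concurrent=3):
    # Cap in-flight loads so the embedding API isn't overwhelmed
    semaphore = asyncio.Semaphore(max_concurrent)

    async def load_one(path):
        async with semaphore:
            await knowledge.add_content_async(path=path)

    await asyncio.gather(*(load_one(p) for p in paths))

asyncio.run(load_with_limit(["docs/hr/", "docs/engineering/", "docs/legal/"]))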

Common Performance Pitfalls

Issue: Search Returns Irrelevant Results

What’s happening: Chunks are too large, too small, or the chunking strategy doesn’t match your content.
Quick fixes:
  1. Check your chunking strategy - try semantic chunking for better context
  2. Verify content actually loaded: knowledge.get_content_status(content_id)
  3. Increase max_results to see if relevant results are just ranked lower
  4. Add metadata filters to narrow the search scope
# Debug search quality
results = knowledge.search("your query", max_results=10)
if not results:
    content_list, count = knowledge.get_content()
    print(f"Total content items: {count}")
    
    # Check for failed content
    for content in content_list[:5]:
        status, message = knowledge.get_content_status(content.id)
        print(f"{content.name}: {status}")

Issue: Content Loading is Slow

What’s happening: You’re processing large files without batching, or running semantic chunking on a huge dataset.
Quick fixes:
  1. Use skip_if_exists=True to avoid reprocessing
  2. Switch to fixed-size chunking for faster processing
  3. Process in batches instead of all at once
  4. Use file filters to only process what you need
# Batch processing for large datasets
import os

def load_content_in_batches(knowledge, content_dir, batch_size=10):
    files = [f for f in os.listdir(content_dir) if f.endswith('.pdf')]
    
    for i in range(0, len(files), batch_size):
        batch_files = files[i:i+batch_size]
        print(f"Processing batch {i//batch_size + 1}")
        
        for file in batch_files:
            knowledge.add_content(
                path=os.path.join(content_dir, file),
                skip_if_exists=True
            )

Issue: Running Out of Memory

What’s happening: You’re loading too many large files at once, or your chunk sizes are too big.
Quick fixes:
  1. Process content in smaller batches (see code above)
  2. Reduce chunk size in your chunking strategy
  3. Use include and exclude patterns to limit what gets processed
  4. Clear old/outdated content regularly with knowledge.remove_content_by_id() (see the cleanup sketch below)
# Process only what you need
knowledge.add_contents(
    paths=["large_dataset/"],
    include=["*.pdf"],       # Only PDFs
    exclude=["*backup*"],    # Skip backups
    skip_if_exists=True,
    metadata={"batch": "current"}
)
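For the cleanup step, a sketch that prunes items by the metadata tag set above (it assumes content items expose metadata as a dict; verify on your version):
# Remove content tagged with an outdated batch label
content_list, _ = knowledge.get_content()
for content in content_list:
    if (content.metadata or {}).get("batch") == "old":  # metadata access pattern is an assumption
        knowledge.remove_content_by_id(content.id)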

Advanced Optimizations

Once you’ve applied the quick wins above, consider these for further improvements.

Enable Hybrid Search

Combine vector and keyword search for better results:
from agno.vectordb.pgvector import PgVector, SearchType

vector_db = PgVector(
    table_name="knowledge",
    db_url="postgresql+psycopg://user:pass@localhost:5432/db",
    search_type=SearchType.hybrid  # Vector + keyword search
)

Add Reranking

Improve result quality by reranking with Cohere:
from agno.knowledge.reranker.cohere import CohereReranker

vector_db = PgVector(
    table_name="knowledge",
    db_url="postgresql+psycopg://user:pass@localhost:5432/db",
    reranker=CohereReranker(
        model="rerank-multilingual-v3.0",
        top_n=10
    )
)

Optimize Embedder Dimensions

Reduce dimensions for faster search (with slight quality trade-off):
from agno.knowledge.embedder.openai import OpenAIEmbedder

# Smaller dimensions = faster search, lower cost
embedder = OpenAIEmbedder(
    id="text-embedding-3-large",
    dimensions=1024  # Instead of full 3072
)
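To put the reduced-dimension embedder to work, wire it into your vector database. A sketch, assuming PgVector accepts an embedder parameter (confirm against your Agno version):
from agno.vectordb.pgvector import PgVector

vector_db = PgVector(
    table_name="knowledge",
    db_url="postgresql+psycopg://user:pass@localhost:5432/db",
    embedder=embedder  # the embedder= parameter is an assumption
)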

Monitoring Performance

Keep an eye on these metrics:
# Check content processing status
content_list, total_count = knowledge.get_content()

failed = [c for c in content_list if c.status == "failed"]
if failed:
    print(f"Failed items: {len(failed)}")
    for content in failed:
        status, message = knowledge.get_content_status(content.id)
        print(f"  {content.name}: {message}")

# Time your searches
import time

start = time.time()
results = knowledge.search("test query", max_results=5)
elapsed = time.time() - start
print(f"Search took {elapsed:.2f} seconds")

Next Steps

Start simple, optimize when needed. Agno’s defaults work well for most use cases. Profile your application to find actual bottlenecks before spending time on optimization.