Most knowledge bases work great with Agno’s defaults. But if you’re seeing slow searches, memory issues, or poor results, a few strategic changes can make a big difference.

When to Optimize

Don’t prematurely optimize. Focus on performance when you notice:
  • Slow search - Queries taking more than 2-3 seconds
  • Memory issues - Out of memory errors during content loading
  • Poor results - Search returning irrelevant chunks or missing obvious matches
  • Slow loading - Content processing taking unusually long
If things are working fine, stick with the defaults and focus on building your application.

The 80/20 of Performance

These five changes give you the biggest performance boost for the least effort:

1. Pick the Right Vector Database

Your database choice has the biggest impact on performance at scale:
from agno.vectordb.lancedb import LanceDb
from agno.vectordb.pgvector import PgVector

# Development: Fast, local, zero setup
dev_db = LanceDb(
    table_name="dev_knowledge",
    uri="./local_db"
)

# Production: Scalable, battle-tested
prod_db = PgVector(
    table_name="prod_knowledge",
    db_url="postgresql+psycopg://user:pass@db:5432/knowledge"
)
Guidelines:
  • LanceDB for development and testing (no setup required)
  • PgVector for production (scales to roughly 1M documents, or when you need SQL features)
  • Pinecone for managed services (no ops overhead, auto-scaling)
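If you go the managed route, Pinecone plugs in the same way. A minimal sketch, assuming the import path and constructor arguments below (they follow Agno's naming conventions, but verify against your installed version); the index name, dimension, and region are illustrative placeholders:
import os

from agno.vectordb.pineconedb import PineconeDb

# Managed option: Pinecone handles scaling and operations for you.
# Index name, dimension, and serverless spec are placeholder assumptions.
managed_db = PineconeDb(
    name="prod-knowledge",
    dimension=1536,  # must match your embedder's output size
    metric="cosine",
    spec={"serverless": {"cloud": "aws", "region": "us-east-1"}},
    api_key=os.environ["PINECONE_API_KEY"],
)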

2. Skip Already-Processed Files

The single biggest speed-up for re-running your ingestion:
# Skip files you've already processed
knowledge.add_content(
    path="large_document.pdf",
    skip_if_exists=True,  # Don't reprocess existing files
    upsert=False          # Don't update existing
)

# For batch loading
knowledge.add_contents(
    paths=["docs/", "policies/"],
    skip_if_exists=True,
    include=["*.pdf", "*.md"],
    exclude=["*temp*", "*draft*"]
)

3. Use Metadata Filters

Narrow searches before vector comparison for faster, more accurate results:
# Slow: Search everything
results = knowledge.search("deployment process", max_results=10)

# Fast: Filter first, then search
results = knowledge.search(
    query="deployment process",
    max_results=10,
    filters={"department": "engineering", "type": "procedure"}
)

# Validate your filters to catch typos
valid_filters, invalid_keys = knowledge.validate_filters({
    "department": "engineering",
    "invalid_key": "value"  # This gets flagged
})

4. Match Chunking Strategy to Your Content

Different strategies have different performance characteristics:
Strategy   | Speed  | Quality | Best For
-----------|--------|---------|-------------------------------------
Fixed Size | Fast   | Good    | Uniform content, when speed matters
Semantic   | Slower | Best    | Complex docs, when quality matters
Recursive  | Fast   | Good    | Structured docs, good balance
from agno.knowledge.chunking.fixed import FixedSizeChunking
from agno.knowledge.chunking.semantic import SemanticChunking

# Fast processing for simple content
fast_chunking = FixedSizeChunking(
    chunk_size=800,
    overlap=80
)

# Better quality for complex content (but slower)
quality_chunking = SemanticChunking(
    chunk_size=1200,
    similarity_threshold=0.5
)
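The recursive strategy from the table sits between these two. A sketch, assuming its import path and parameters mirror the other chunkers (check your Agno version):
from agno.knowledge.chunking.recursive import RecursiveChunking

# Balanced option: split on structural boundaries (paragraphs, sentences)
# before falling back to size limits. Parameter values are illustrative.
balanced_chunking = RecursiveChunking(
    chunk_size=1000,
    overlap=100
)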
Learn more about choosing chunking strategies.

5. Use Async for Batch Operations

Process multiple items concurrently:
import asyncio

async def load_knowledge_efficiently():
    # Load multiple content sources in parallel
    tasks = [
        knowledge.add_content_async(path="docs/hr/"),
        knowledge.add_content_async(path="docs/engineering/"),
        knowledge.add_content_async(url="https://company.com/api-docs"),
    ]
    await asyncio.gather(*tasks)

asyncio.run(load_knowledge_efficiently())
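With many sources, an unbounded gather can hammer your embedding API's rate limits. A plain-asyncio sketch that caps concurrency (the paths and limit are illustrative):
import asyncio

async def load_with_limit(paths, max_concurrent=3):
    # Cap in-flight loads so the embedding API isn't overwhelmed
    semaphore = asyncio.Semaphore(max_concurrent)

    async def load_one(path):
        async with semaphore:
            await knowledge.add_content_async(path=path)

    await asyncio.gather(*(load_one(p) for p in paths))

asyncio.run(load_with_limit(["docs/hr/", "docs/engineering/", "docs/legal/"]))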

Common Performance Pitfalls

Issue: Search Returns Irrelevant Results

What’s happening: Chunks are too large, too small, or the chunking strategy doesn’t match your content.
Quick fixes:
  1. Check your chunking strategy - try semantic chunking for better context
  2. Verify content actually loaded: knowledge.get_content_status(content_id)
  3. Increase max_results to see if relevant results are just ranked lower
  4. Add metadata filters to narrow the search scope
# Debug search quality
results = knowledge.search("your query", max_results=10)
if not results:
    content_list, count = knowledge.get_content()
    print(f"Total content items: {count}")
    
    # Check for failed content
    for content in content_list[:5]:
        status, message = knowledge.get_content_status(content.id)
        print(f"{content.name}: {status}")

Issue: Content Loading is Slow

What’s happening: You’re processing large files without batching, or running semantic chunking on a huge dataset.
Quick fixes:
  1. Use skip_if_exists=True to avoid reprocessing
  2. Switch to fixed-size chunking for faster processing
  3. Process in batches instead of all at once
  4. Use file filters to only process what you need
# Batch processing for large datasets
import os

def load_content_in_batches(knowledge, content_dir, batch_size=10):
    files = [f for f in os.listdir(content_dir) if f.endswith('.pdf')]
    
    for i in range(0, len(files), batch_size):
        batch_files = files[i:i+batch_size]
        print(f"Processing batch {i//batch_size + 1}")
        
        for file in batch_files:
            knowledge.add_content(
                path=os.path.join(content_dir, file),
                skip_if_exists=True
            )

Issue: Running Out of Memory

What’s happening: You’re loading too many large files at once, or your chunk sizes are too big.
Quick fixes:
  1. Process content in smaller batches (see code above)
  2. Reduce chunk size in your chunking strategy
  3. Use include and exclude patterns to limit what gets processed
  4. Clear old/outdated content regularly with knowledge.remove_content_by_id() (see the cleanup sketch below)
# Process only what you need
knowledge.add_contents(
    paths=["large_dataset/"],
    include=["*.pdf"],       # Only PDFs
    exclude=["*backup*"],    # Skip backups
    skip_if_exists=True,
    metadata={"batch": "current"}
)
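For the cleanup step, a sketch that prunes items by the metadata tag set above (it assumes content items expose metadata as a dict; verify on your version):
# Remove content tagged with an outdated batch label
content_list, _ = knowledge.get_content()
for content in content_list:
    if (content.metadata or {}).get("batch") == "old":  # metadata access pattern is an assumption
        knowledge.remove_content_by_id(content.id)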

Advanced Optimizations

Once you’ve applied the quick wins above, consider these for further improvements.

Enable Hybrid Search

Combine vector and keyword search for better results:
from agno.vectordb.pgvector import PgVector, SearchType

vector_db = PgVector(
    table_name="knowledge",
    db_url="postgresql+psycopg://user:pass@localhost:5432/db",
    search_type=SearchType.hybrid  # Vector + keyword search
)

Add Reranking

Improve result quality by reranking with Cohere:
from agno.knowledge.reranker.cohere import CohereReranker

vector_db = PgVector(
    table_name="knowledge",
    db_url="postgresql+psycopg://user:pass@localhost:5432/db",
    reranker=CohereReranker(
        model="rerank-multilingual-v3.0",
        top_n=10
    )
)

Optimize Embedder Dimensions

Reduce dimensions for faster search (with slight quality trade-off):
from agno.knowledge.embedder.openai import OpenAIEmbedder

# Smaller dimensions = faster search, lower cost
embedder = OpenAIEmbedder(
    id="text-embedding-3-large",
    dimensions=1024  # Instead of full 3072
)
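To put the reduced-dimension embedder to work, wire it into your vector database. A sketch, assuming PgVector accepts an embedder parameter (confirm against your Agno version):
from agno.vectordb.pgvector import PgVector

vector_db = PgVector(
    table_name="knowledge",
    db_url="postgresql+psycopg://user:pass@localhost:5432/db",
    embedder=embedder  # the embedder= parameter is an assumption
)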

Monitoring Performance

Keep an eye on these metrics:
# Check content processing status
content_list, total_count = knowledge.get_content()

failed = [c for c in content_list if c.status == "failed"]
if failed:
    print(f"Failed items: {len(failed)}")
    for content in failed:
        status, message = knowledge.get_content_status(content.id)
        print(f"  {content.name}: {message}")

# Time your searches
import time

start = time.time()
results = knowledge.search("test query", max_results=5)
elapsed = time.time() - start
print(f"Search took {elapsed:.2f} seconds")

Next Steps

Start simple, optimize when needed. Agno’s defaults work well for most use cases. Profile your application to find actual bottlenecks before spending time on optimization.