Most knowledge bases work great with Agno’s defaults. But if you’re seeing slow searches, memory issues, or poor results, a few strategic changes can make a big difference.
When to Optimize
Don’t prematurely optimize. Focus on performance when you notice:
- Slow search - Queries taking more than 2-3 seconds
- Memory issues - Out of memory errors during content loading
- Poor results - Search returning irrelevant chunks or missing obvious matches
- Slow loading - Content processing taking unusually long
If things are working fine, stick with the defaults and focus on building your application.
Quick Wins
These five changes give you the biggest performance boost for the least effort:
1. Pick the Right Vector Database
Your database choice has the biggest impact on performance at scale:
```python
from agno.vectordb.lancedb import LanceDb
from agno.vectordb.pgvector import PgVector

# Development: Fast, local, zero setup
dev_db = LanceDb(
    table_name="dev_knowledge",
    uri="./local_db"
)

# Production: Scalable, battle-tested
prod_db = PgVector(
    table_name="prod_knowledge",
    db_url="postgresql+psycopg://user:pass@db:5432/knowledge"
)
```
Guidelines:
- LanceDB for development and testing (no setup required)
- PgVector for production (up to ~1M documents, or when you need SQL features)
- Pinecone for managed services (no ops overhead, auto-scaling); see the sketch below
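If you go the managed route, a minimal Pinecone setup might look like the sketch below. Treat the module path and constructor parameters (`name`, `dimension`, `metric`, `spec`) as assumptions to verify against your installed Agno version:

```python
import os

from agno.vectordb.pineconedb import PineconeDb

# Managed, auto-scaling option (parameters are assumptions to verify)
managed_db = PineconeDb(
    name="prod_knowledge",
    dimension=1536,  # Must match your embedder's output size
    metric="cosine",
    spec={"serverless": {"cloud": "aws", "region": "us-east-1"}},
    api_key=os.getenv("PINECONE_API_KEY"),
)
```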
2. Skip Already-Processed Files
The single biggest speed-up for re-running your ingestion:
```python
# Skip files you've already processed
knowledge.add_content(
    path="large_document.pdf",
    skip_if_exists=True,  # Don't reprocess existing files
    upsert=False          # Don't update existing
)

# For batch loading
knowledge.add_contents(
    paths=["docs/", "policies/"],
    skip_if_exists=True,
    include=["*.pdf", "*.md"],
    exclude=["*temp*", "*draft*"]
)
```
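A note on the flags: `skip_if_exists=True` skips files the knowledge base has already seen, while `upsert=True` updates the stored version instead. When a source file has changed and you want the new content to replace the old, use `upsert` rather than skipping.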
3. Use Metadata Filters
Narrow searches before vector comparison for faster, more accurate results:
```python
# Slow: Search everything
results = knowledge.search("deployment process", max_results=10)

# Fast: Filter first, then search
results = knowledge.search(
    query="deployment process",
    max_results=10,
    filters={"department": "engineering", "type": "procedure"}
)

# Validate your filters to catch typos
valid_filters, invalid_keys = knowledge.validate_filters({
    "department": "engineering",
    "invalid_key": "value"  # This gets flagged
})
```
4. Match Chunking Strategy to Your Content
Different strategies have different performance characteristics:
| Strategy | Speed | Quality | Best For |
|---|---|---|---|
| Fixed Size | Fast | Good | Uniform content, when speed matters |
| Semantic | Slower | Best | Complex docs, when quality matters |
| Recursive | Fast | Good | Structured docs, good balance |
```python
from agno.knowledge.chunking.fixed import FixedSizeChunking
from agno.knowledge.chunking.semantic import SemanticChunking

# Fast processing for simple content
fast_chunking = FixedSizeChunking(
    chunk_size=800,
    overlap=80
)

# Better quality for complex content (but slower)
quality_chunking = SemanticChunking(
    chunk_size=1200,
    similarity_threshold=0.5
)
```
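The strategy takes effect when it is attached to the reader that parses your content. A hedged sketch, assuming a `PDFReader` that accepts a `chunking_strategy` parameter as in recent Agno versions:

```python
from agno.knowledge.reader.pdf_reader import PDFReader

# Reader import path and chunking_strategy parameter are assumptions
# to verify against your Agno version
knowledge.add_content(
    path="complex_report.pdf",
    reader=PDFReader(chunking_strategy=quality_chunking),
    skip_if_exists=True
)
```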
Learn more about choosing chunking strategies.
5. Use Async for Batch Operations
Process multiple items concurrently:
```python
import asyncio

async def load_knowledge_efficiently():
    # Load multiple content sources in parallel
    tasks = [
        knowledge.add_content_async(path="docs/hr/"),
        knowledge.add_content_async(path="docs/engineering/"),
        knowledge.add_content_async(url="https://company.com/api-docs"),
    ]
    await asyncio.gather(*tasks)

asyncio.run(load_knowledge_efficiently())
```
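With many sources, an unbounded gather can hammer your embedding provider's rate limits. A minimal sketch that bounds concurrency with asyncio.Semaphore (the directory names and the limit of 3 are illustrative):

```python
import asyncio

async def load_with_limit(paths, limit=3):
    # Cap concurrency so embedding-API rate limits aren't tripped
    semaphore = asyncio.Semaphore(limit)

    async def load_one(path):
        async with semaphore:
            await knowledge.add_content_async(path=path)

    await asyncio.gather(*(load_one(p) for p in paths))

asyncio.run(load_with_limit(["docs/hr/", "docs/engineering/", "docs/legal/"]))
```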
Troubleshooting Common Issues
Issue: Search Returns Irrelevant Results
What’s happening: Chunks are too large or too small, or the chunking strategy doesn’t match your content.
Quick fixes:
- Check your chunking strategy - try semantic chunking for better context
- Verify content actually loaded: `knowledge.get_content_status(content_id)`
- Increase `max_results` to see if relevant results are just ranked lower
- Add metadata filters to narrow the search scope
```python
# Debug search quality
results = knowledge.search("your query", max_results=10)
if not results:
    content_list, count = knowledge.get_content()
    print(f"Total content items: {count}")

    # Check for failed content
    for content in content_list[:5]:
        status, message = knowledge.get_content_status(content.id)
        print(f"{content.name}: {status}")
```
Issue: Content Loading is Slow
What’s happening: Processing large files without batching, or using semantic chunking on huge datasets.
Quick fixes:
- Use `skip_if_exists=True` to avoid reprocessing
- Switch to fixed-size chunking for faster processing
- Process in batches instead of all at once
- Use file filters to only process what you need
```python
# Batch processing for large datasets
import os

def load_content_in_batches(knowledge, content_dir, batch_size=10):
    files = [f for f in os.listdir(content_dir) if f.endswith('.pdf')]

    for i in range(0, len(files), batch_size):
        batch_files = files[i:i+batch_size]
        print(f"Processing batch {i//batch_size + 1}")
        for file in batch_files:
            knowledge.add_content(
                path=os.path.join(content_dir, file),
                skip_if_exists=True
            )
```
Issue: Running Out of Memory
What’s happening: Loading too many large files at once, or chunk sizes are too large.
Quick fixes:
- Process content in smaller batches (see code above)
- Reduce chunk size in your chunking strategy
- Use `include` and `exclude` patterns to limit what gets processed
- Clear old/outdated content regularly with `knowledge.remove_content_by_id()`
```python
# Process only what you need
knowledge.add_contents(
    paths=["large_dataset/"],
    include=["*.pdf"],     # Only PDFs
    exclude=["*backup*"],  # Skip backups
    skip_if_exists=True,
    metadata={"batch": "current"}
)
```
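For the regular cleanup mentioned above, here is a sketch of a pruning pass built on `remove_content_by_id()`; that content items expose their metadata as a dict attribute is an assumption to verify against your Agno version:

```python
# Hypothetical pruning pass: drop items tagged with a stale batch label
content_list, _ = knowledge.get_content()
for content in content_list:
    # Assumes each item exposes a metadata dict; check your Agno version
    if (content.metadata or {}).get("batch") == "old":
        knowledge.remove_content_by_id(content.id)
```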
Advanced Optimizations
Once you’ve applied the quick wins above, consider these for further improvements:
Use Hybrid Search
Combine vector and keyword search for better results:
```python
from agno.vectordb.pgvector import PgVector, SearchType

vector_db = PgVector(
    table_name="knowledge",
    db_url="postgresql+psycopg://user:pass@localhost:5432/db",
    search_type=SearchType.hybrid  # Vector + keyword search
)
```
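Hybrid search helps most when queries contain exact tokens that embeddings tend to blur, such as error codes, product SKUs, or function names: the keyword side catches the literal match while the vector side still handles paraphrases.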
Add Reranking
Improve result quality by reranking with Cohere:
```python
from agno.knowledge.reranker.cohere import CohereReranker
from agno.vectordb.pgvector import PgVector

vector_db = PgVector(
    table_name="knowledge",
    db_url="postgresql+psycopg://user:pass@localhost:5432/db",
    reranker=CohereReranker(
        model="rerank-multilingual-v3.0",
        top_n=10
    )
)
```
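Keep in mind that reranking adds an extra model call to every search, so expect somewhat higher latency and per-query cost; it pays off most when your top results are close in vector score but uneven in actual relevance.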
Optimize Embedder Dimensions
Reduce dimensions for faster search (with slight quality trade-off):
```python
from agno.knowledge.embedder.openai import OpenAIEmbedder

# Smaller dimensions = faster search, lower cost
embedder = OpenAIEmbedder(
    id="text-embedding-3-large",
    dimensions=1024  # Instead of the full 3072
)
```
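Shortened embeddings are a feature of OpenAI's text-embedding-3 model family; older embedding models don't accept a `dimensions` parameter. Also note that changing dimensions later means re-embedding all existing content, since stored vectors must match the table's dimension.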
Monitor Performance
Keep an eye on these metrics:
```python
# Check content processing status
content_list, total_count = knowledge.get_content()
failed = [c for c in content_list if c.status == "failed"]

if failed:
    print(f"Failed items: {len(failed)}")
    for content in failed:
        status, message = knowledge.get_content_status(content.id)
        print(f"  {content.name}: {message}")

# Time your searches
import time

start = time.time()
results = knowledge.search("test query", max_results=5)
elapsed = time.time() - start
print(f"Search took {elapsed:.2f} seconds")
```
Next Steps
Start simple, optimize when needed. Agno’s defaults work well for most use cases. Profile your application to find actual bottlenecks before spending time on optimization.