Embedders turn text, images, and other data into vectors (lists of numbers) that capture meaning. Those vectors make it easy to store and search information semantically—so you find content by intent and context, not just exact keywords. If you’re building features like retrieval-augmented generation (RAG), semantic search, question answering over docs, or long-term memory for agents, embedders are the foundation that makes it all work.

Why use embedders?

  • Better recall than keywords: They understand meaning, so “How do I reset my passcode?” finds docs mentioning “change PIN”.
  • Ground LLMs in your data: Provide the model with trusted, domain-specific context at answer time.
  • Scale to large knowledge bases: Vectors enable fast similarity search across thousands or millions of chunks.
  • Multilingual retrieval: Many embedders map different languages to the same semantic space.

When to use embedders

Use embedders when you need any of the following:
  • RAG and context injection: Supply relevant snippets to your agent before responding.
  • Semantic search: Let users query by meaning across product docs, wikis, tickets, or chats.
  • Deduplication and clustering: Group similar content or avoid repeating the same info.
  • Personal and team memory: Store summaries and facts for later recall by agents.
You probably don’t need embedders when your dataset is tiny (a handful of pages) and simple keyword search already works well.

How it works in Agno

Agno uses OpenAIEmbedder as the default, but you can swap in any supported embedder. When you add content to a knowledge base, the embedder converts each chunk into a vector and stores it in your vector database. Later, when an agent searches, it embeds the query and finds the most similar vectors. Here’s a basic setup:
from agno.agent import Agent
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.pgvector import PgVector
from agno.knowledge.embedder.openai import OpenAIEmbedder

# Create knowledge base with embedder
knowledge = Knowledge(
    vector_db=PgVector(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        table_name="my_embeddings",
        embedder=OpenAIEmbedder(),  # Default embedder
    ),
    max_results=2,  # Return top 2 most relevant chunks
)

# Add content - gets embedded automatically
knowledge.add_content(
    text_content="The sky is blue during the day and dark at night."
)

# Agent can now search this knowledge
agent = Agent(knowledge=knowledge, search_knowledge=True)
agent.print_response("What color is the sky?")

Choosing an embedder

Pick based on your constraints:
  • Hosted vs local: Prefer local (e.g., Ollama, FastEmbed) for offline use or strict data residency; hosted (OpenAI, Gemini, Voyage) for best quality and convenience. A local-embedder sketch follows this list.
  • Latency and cost: Smaller models are cheaper/faster; larger models often retrieve better.
  • Language support: Ensure your embedder supports the languages you expect.
  • Dimension compatibility: Match your vector DB’s expected embedding size if it’s fixed.
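
For example, if data residency rules out hosted APIs, you can swap in a local embedder. A minimal sketch, assuming OllamaEmbedder is exposed at agno.knowledge.embedder.ollama (mirroring the OpenAI import path above) and that a local Ollama server is serving an embedding model such as nomic-embed-text:

from agno.knowledge.embedder.ollama import OllamaEmbedder
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.pgvector import PgVector

# Local embedder: vectors are computed on your machine, no API key needed
knowledge = Knowledge(
    vector_db=PgVector(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        table_name="local_embeddings",  # illustrative table name
        embedder=OllamaEmbedder(id="nomic-embed-text", dimensions=768),  # assumed params, mirroring OpenAIEmbedder
    ),
)

Whatever you choose, the dimensions value must match both what the model actually produces and what your vector table was created with (the dimension-compatibility point above).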

Quick Comparison

| Embedder    | Type         | Best For                           | Cost   | Performance |
|-------------|--------------|------------------------------------|--------|-------------|
| OpenAI      | Hosted       | General use, proven quality        | $$     | Excellent   |
| Ollama      | Local        | Privacy, offline, no API costs     | Free   | Good        |
| Voyage AI   | Hosted       | Specialized retrieval tasks        | $$$    | Excellent   |
| Gemini      | Hosted       | Google ecosystem, multilingual     | $$     | Excellent   |
| FastEmbed   | Local        | Fast local embeddings              | Free   | Good        |
| HuggingFace | Local/Hosted | Open source models, customization  | Free/$ | Variable    |

Supported embedders

The following embedders are supported:

Best Practices

  • Chunk your content wisely: Split long docs into 300–1,000 token chunks with 10–20% overlap. This balances context preservation with retrieval precision (a sketch follows this list).
  • Store rich metadata: Include titles, source URLs, timestamps, and permissions with each chunk. This enables filtering and better context in responses.
  • Test your retrieval quality: Run a small set of test queries to check that the right chunks come back. Adjust your chunking strategy or embedder if needed.
  • Re-embed when you change models: If you switch embedders, you must re-embed all your content; vectors from different models aren’t compatible.
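
As a rough illustration of the chunk-size and overlap guidance above (plain Python, using word counts as a stand-in for tokens; this is not Agno's built-in chunking API, just the idea):

def chunk_text(text: str, chunk_size: int = 800, overlap: int = 120) -> list[str]:
    """Split text into overlapping chunks; sizes are in words as a rough token proxy."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        piece = words[start:start + chunk_size]
        if piece:
            chunks.append(" ".join(piece))
        if start + chunk_size >= len(words):
            break
    return chunks

Each chunk becomes one vector in your store: smaller chunks retrieve more precisely, larger ones preserve more surrounding context.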

Batch Embeddings

Many embedding providers support processing multiple texts in a single API call, known as batch embedding. This approach reduces the number of API requests, helps avoid rate limits, and significantly improves performance when processing large amounts of text. To enable batch processing, set the enable_batch flag to True when configuring your embedder. The batch_size parameter controls the number of texts sent per batch.
from agno.knowledge.embedder.openai import OpenAIEmbedder

embedder = OpenAIEmbedder(
    id="text-embedding-3-small",
    dimensions=1536,
    enable_batch=True,  # embed multiple texts per API call
    batch_size=100,     # number of texts sent in each batch
)
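
The batch-enabled embedder is used like any other: pass it to your vector database and add content as in the first example (the table name below is illustrative).

from agno.knowledge.knowledge import Knowledge
from agno.vectordb.pgvector import PgVector

knowledge = Knowledge(
    vector_db=PgVector(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        table_name="my_embeddings",
        embedder=embedder,  # the batch-enabled OpenAIEmbedder defined above
    ),
)
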
The following embedders currently support batching: