Implementing a Custom Retriever

In some cases, you may need complete control over how your agent retrieves information from the knowledge base. This can be achieved by implementing a custom retriever function. A custom retriever allows you to define the logic for searching and retrieving documents from your vector database.

Setup

Follow the instructions in the Qdrant Setup Guide to install Qdrant locally. Here is a guide to get API keys: Qdrant API Keys.

Example: Custom Retriever for a PDF Knowledge Base

Below is a detailed example of how to implement a custom retriever function using the agno library. This example demonstrates how to set up a knowledge base with PDF documents, define a custom retriever, and use it with an agent.

from typing import Optional
from agno.agent import Agent
from agno.embedder.openai import OpenAIEmbedder
from agno.knowledge.pdf_url import PDFUrlKnowledgeBase
from agno.vectordb.qdrant import Qdrant
from qdrant_client import QdrantClient

# ---------------------------------------------------------
# This section loads the knowledge base. Skip if your knowledge base was populated elsewhere.
# Define the embedder
embedder = OpenAIEmbedder(id="text-embedding-3-small")
# Initialize vector database connection
vector_db = Qdrant(collection="thai-recipes", url="http://localhost:6333", embedder=embedder)
# Load the knowledge base
knowledge_base = PDFUrlKnowledgeBase(
    urls=["https://agno-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"],
    vector_db=vector_db,
)

# Load the knowledge base
knowledge_base.load(recreate=True)  # Comment out after first run
# Knowledge base is now loaded
# ---------------------------------------------------------

# Define the custom retriever
# This is the function that the agent will use to retrieve documents
def retriever(
    query: str, agent: Optional[Agent] = None, num_documents: int = 5, **kwargs
) -> Optional[list[dict]]:
    """
    Custom retriever function to search the vector database for relevant documents.

    Args:
        query (str): The search query string
        agent (Agent): The agent instance making the query
        num_documents (int): Number of documents to retrieve (default: 5)
        **kwargs: Additional keyword arguments

    Returns:
        Optional[list[dict]]: List of retrieved documents or None if search fails
    """
    try:
        qdrant_client = QdrantClient(url="http://localhost:6333")
        query_embedding = embedder.get_embedding(query)
        results = qdrant_client.query_points(
            collection_name="thai-recipes",
            query=query_embedding,
            limit=num_documents,
        )
        results_dict = results.model_dump()
        if "points" in results_dict:
            return results_dict["points"]
        else:
            return None
    except Exception as e:
        print(f"Error during vector database search: {str(e)}")
        return None

def main():
    """Main function to demonstrate agent usage."""
    # Initialize agent with custom retriever
    # Remember to set search_knowledge=True to use agentic_rag or add_reference=True for traditional RAG
    # search_knowledge=True is default when you add a knowledge base but is needed here
    agent = Agent(
        retriever=retriever,
        search_knowledge=True,
        instructions="Search the knowledge base for information",
        show_tool_calls=True,
    )

    # Example query
    query = "List down the ingredients to make Massaman Gai"
    agent.print_response(query, markdown=True)

if __name__ == "__main__":
    main()

Asynchronous Implementation

import asyncio
from typing import Optional
from agno.agent import Agent
from agno.embedder.openai import OpenAIEmbedder
from agno.knowledge.pdf_url import PDFUrlKnowledgeBase
from agno.vectordb.qdrant import Qdrant
from qdrant_client import AsyncQdrantClient

# ---------------------------------------------------------
# Knowledge base setup (same as synchronous example)
embedder = OpenAIEmbedder(id="text-embedding-3-small")
vector_db = Qdrant(collection="thai-recipes", url="http://localhost:6333", embedder=embedder)
knowledge_base = PDFUrlKnowledgeBase(
    urls=["https://agno-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"],
    vector_db=vector_db,
)
# ---------------------------------------------------------

# Define the custom async retriever
async def retriever(
    query: str, agent: Optional[Agent] = None, num_documents: int = 5, **kwargs
) -> Optional[list[dict]]:
    """
    Custom async retriever function to search the vector database for relevant documents.
    """
    try:
        qdrant_client = AsyncQdrantClient(path="tmp/qdrant")
        query_embedding = embedder.get_embedding(query)
        results = await qdrant_client.query_points(
            collection_name="thai-recipes",
            query=query_embedding,
            limit=num_documents,
        )
        results_dict = results.model_dump()
        if "points" in results_dict:
            return results_dict["points"]
        else:
            return None
    except Exception as e:
        print(f"Error during vector database search: {str(e)}")
        return None

async def main():
    """Async main function to demonstrate agent usage."""
    agent = Agent(
        retriever=retriever,
        search_knowledge=True,
        instructions="Search the knowledge base for information",
        show_tool_calls=True,
    )

    # Load the knowledge base (uncomment for first run)
    await knowledge_base.aload(recreate=True)

    # Example query
    query = "List down the ingredients to make Massaman Gai"
    await agent.aprint_response(query, markdown=True)

if __name__ == "__main__":
    asyncio.run(main())

Explanation

Embedder and Vector Database Setup: We start by defining an embedder and initializing a connection to a vector database. This setup is crucial for converting queries into embeddings and storing them in the database.
Loading the Knowledge Base: The knowledge base is loaded with PDF documents. This step involves converting the documents into embeddings and storing them in the vector database.
Custom Retriever Function: The retriever function is defined to handle the retrieval of documents. It takes a query, converts it into an embedding, and searches the vector database for relevant documents.
Agent Initialization: An agent is initialized with the custom retriever. The agent uses this retriever to search the knowledge base and retrieve information.
Example Query: The main function demonstrates how to use the agent to perform a query and retrieve information from the knowledge base.

Developer Resources

View Sync Retriever
View Async Retriever

Introduction

Concepts

Other

How to

Implementing a Custom Retriever

Setup

Example: Custom Retriever for a PDF Knowledge Base

Asynchronous Implementation

Explanation

Developer Resources

Introduction

Concepts

Other

How to

​Setup

​Example: Custom Retriever for a PDF Knowledge Base

​Asynchronous Implementation

​Explanation

​Developer Resources

Setup

Example: Custom Retriever for a PDF Knowledge Base

Asynchronous Implementation

Explanation

Developer Resources