The vLLM Embedder provides high-performance embedding inference with support for both local and remote deployment modes. In local mode, models are downloaded from HuggingFace and run in-process.

Usage

Local Mode

You can load models directly with the vLLM library, without hosting them on a separate server.
vllm_embedder.py
from agno.knowledge.embedder.vllm import VLLMEmbedder
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.pgvector import PgVector

# Get embeddings directly
embeddings = VLLMEmbedder(
    id="intfloat/e5-mistral-7b-instruct",
    dimensions=4096,
    enforce_eager=True,
    vllm_kwargs={
        "disable_sliding_window": True,
        "max_model_len": 4096,
    },
).get_embedding("The quick brown fox jumps over the lazy dog.")

print(f"Embeddings: {embeddings[:5]}")
print(f"Dimensions: {len(embeddings)}")

# Use with Knowledge
knowledge = Knowledge(
    vector_db=PgVector(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        table_name="vllm_embeddings",
        embedder=VLLMEmbedder(
            id="intfloat/e5-mistral-7b-instruct",
            dimensions=4096,
            enforce_eager=True,
            vllm_kwargs={
                "disable_sliding_window": True,
                "max_model_len": 4096,
            },
        ),
    ),
    max_results=2,
)
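Once you have embedding vectors, a common next step is to compare them. As a minimal, self-contained illustration (using small dummy vectors rather than real 4096-dimensional model output), cosine similarity between two embeddings can be computed like this:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Dummy 3-dimensional vectors standing in for real embeddings
v1 = [0.1, 0.2, 0.3]
v2 = [0.1, 0.2, 0.3]
print(round(cosine_similarity(v1, v2), 4))  # → 1.0 (identical vectors)
```

This is the same notion of similarity a vector database such as PgVector uses when ranking `max_results` matches.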

Remote Mode

You can connect to a running vLLM server via an OpenAI-compatible API.
vllm_embedder_remote.py
from agno.knowledge.embedder.vllm import VLLMEmbedder
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.pgvector import PgVector

# Remote mode (for production deployments)
knowledge_remote = Knowledge(
    vector_db=PgVector(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        table_name="vllm_embeddings_remote",
        embedder=VLLMEmbedder(
            id="intfloat/e5-mistral-7b-instruct",
            dimensions=4096,
            base_url="http://localhost:8000/v1",  # Example endpoint for local development
            api_key="your-api-key",  # Optional
        ),
    ),
    max_results=2,
)
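In remote mode the embedder speaks the OpenAI-compatible embeddings protocol that the vLLM server exposes. As a rough sketch of what goes over the wire (the field names follow the OpenAI embeddings API; the model name matches the snippet above), a request body looks like this:

```python
import json

# Hypothetical request payload in the OpenAI embeddings API shape,
# which an OpenAI-compatible vLLM server accepts at /v1/embeddings.
payload = {
    "model": "intfloat/e5-mistral-7b-instruct",
    "input": ["The quick brown fox jumps over the lazy dog."],
}

body = json.dumps(payload)
print(body)
```

The `client_params` and `request_params` options in the table below let you tune the OpenAI client and per-request fields that wrap this payload.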

Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `id` | `str` | `"intfloat/e5-mistral-7b-instruct"` | Model identifier (HuggingFace model name) |
| `dimensions` | `int` | `4096` | Embedding vector dimensions |
| `base_url` | `Optional[str]` | `None` | Remote vLLM server URL (enables remote mode) |
| `api_key` | `Optional[str]` | `getenv("VLLM_API_KEY")` | API key for remote server authentication |
| `enable_batch` | `bool` | `False` | Enable batch processing for multiple texts |
| `batch_size` | `int` | `10` | Number of texts to process per batch |
| `enforce_eager` | `bool` | `True` | Use eager execution mode (local mode) |
| `vllm_kwargs` | `Optional[Dict[str, Any]]` | `None` | Additional vLLM engine parameters (local mode) |
| `request_params` | `Optional[Dict[str, Any]]` | `None` | Additional request parameters (remote mode) |
| `client_params` | `Optional[Dict[str, Any]]` | `None` | OpenAI client configuration (remote mode) |
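To make `enable_batch` and `batch_size` concrete, here is a minimal sketch (not the library's internal implementation) of the chunking behavior those parameters describe: texts are grouped into batches of at most `batch_size` items, and each batch is embedded in a single call.

```python
def chunk_texts(texts, batch_size=10):
    """Split texts into consecutive batches of at most batch_size items."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

# 25 documents with batch_size=10 yields batches of 10, 10, and 5
batches = chunk_texts([f"doc {n}" for n in range(25)], batch_size=10)
print([len(b) for b in batches])  # → [10, 10, 5]
```

Batching reduces per-request overhead when embedding large document sets, at the cost of slightly higher latency per batch.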

Developer Resources