The vLLM Embedder provides high-performance embedding inference with support for both local and remote deployment modes. It can load models directly for local inference or connect to a remote vLLM server via an OpenAI-compatible API.

Usage

```python
from agno.knowledge.embedder.vllm import VLLMEmbedder
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.pgvector import PgVector

# Local mode: the model is loaded in-process for direct inference
embedder = VLLMEmbedder(
    id="intfloat/e5-mistral-7b-instruct",
    dimensions=4096,
    enforce_eager=True,
    vllm_kwargs={
        "disable_sliding_window": True,
        "max_model_len": 4096,
    },
)

# Use with Knowledge
knowledge = Knowledge(
    vector_db=PgVector(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        table_name="vllm_embeddings",
        embedder=embedder,
    ),
)
```
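In remote mode, the embedder calls a running vLLM server through its OpenAI-compatible API instead of loading the model in-process; setting `base_url` enables this mode. A minimal sketch, assuming a server is already serving the model (the URL and API key below are illustrative placeholders):

```python
from agno.knowledge.embedder.vllm import VLLMEmbedder

# Remote mode: base_url points at a vLLM server's OpenAI-compatible
# endpoint, so no model weights are loaded locally.
embedder = VLLMEmbedder(
    id="intfloat/e5-mistral-7b-instruct",
    dimensions=4096,
    base_url="http://localhost:8000/v1",  # illustrative server URL
    api_key="your-vllm-api-key",          # or set the VLLM_API_KEY env var
)
```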

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `id` | `str` | `"intfloat/e5-mistral-7b-instruct"` | Model identifier (Hugging Face model name) |
| `dimensions` | `int` | `4096` | Embedding vector dimensions |
| `base_url` | `Optional[str]` | `None` | Remote vLLM server URL (enables remote mode) |
| `api_key` | `Optional[str]` | `getenv("VLLM_API_KEY")` | API key for remote server authentication |
| `enable_batch` | `bool` | `False` | Enable batch processing for multiple texts |
| `batch_size` | `int` | `10` | Number of texts to process per batch |
| `enforce_eager` | `bool` | `True` | Use eager execution mode (local mode) |
| `vllm_kwargs` | `Optional[Dict[str, Any]]` | `None` | Additional vLLM engine parameters (local mode) |
| `request_params` | `Optional[Dict[str, Any]]` | `None` | Additional request parameters (remote mode) |
| `client_params` | `Optional[Dict[str, Any]]` | `None` | OpenAI client configuration (remote mode) |
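When embedding large document sets, `enable_batch` groups texts into requests of `batch_size` rather than issuing one request per text. A minimal sketch using the parameters above (the batch size chosen here is illustrative):

```python
from agno.knowledge.embedder.vllm import VLLMEmbedder

# Batch mode: texts are embedded in groups of batch_size, reducing
# per-request overhead when processing many documents.
embedder = VLLMEmbedder(
    id="intfloat/e5-mistral-7b-instruct",
    dimensions=4096,
    enable_batch=True,
    batch_size=32,  # illustrative; the default is 10
)
```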

Developer Resources