The JinaEmbedder class embeds text into vectors using the Jina AI API. It supports several embedding encodings (float, base64, and int8) and late chunking, which improves the processing of long documents. A Jina AI API key is required.

Setup

Set your JINA_API_KEY environment variable.
export JINA_API_KEY="xxx"
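
Before constructing an embedder, it can help to confirm that the variable is actually visible to the Python process. The following is a minimal sketch; `require_jina_api_key` is a hypothetical helper written for illustration and is not part of agno.

```python
import os
from typing import Mapping, Optional


def require_jina_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Jina API key, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    This helper is hypothetical and not part of agno.
    """
    env = os.environ if env is None else env
    key = env.get("JINA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "JINA_API_KEY is not set; export it before creating a JinaEmbedder."
        )
    return key


# Example with an explicit mapping instead of the real environment
print(require_jina_api_key({"JINA_API_KEY": "xxx"}))
```

Calling `require_jina_api_key()` with no argument checks the real environment and fails early with a readable message instead of a later authentication error.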

Run PgVector

docker run -d \
    -e POSTGRES_DB=ai \
    -e POSTGRES_USER=ai \
    -e POSTGRES_PASSWORD=ai \
    -e PGDATA=/var/lib/postgresql/data/pgdata \
    -v pgvolume:/var/lib/postgresql/data \
    -p 5532:5432 \
    --name pgvector \
    agnohq/pgvector:16

Usage

cookbook/embedders/jina_embedder.py
from agno.agent import AgentKnowledge
from agno.vectordb.pgvector import PgVector
from agno.embedder.jina import JinaEmbedder

# Basic usage - automatically loads from JINA_API_KEY environment variable
embeddings = JinaEmbedder().get_embedding(
    "The quick brown fox jumps over the lazy dog."
)

# Print the embeddings and their dimensions
print(f"Embeddings: {embeddings[:5]}")
print(f"Dimensions: {len(embeddings)}")

# Custom configuration with late chunking for long documents
custom_embedder = JinaEmbedder(
    dimensions=1024,
    late_chunking=True,  # Improved processing for long documents
    timeout=30.0,  # Request timeout in seconds
)

# Get embedding with usage information
embedding, usage = custom_embedder.get_embedding_and_usage(
    "Advanced text processing with Jina embeddings and late chunking."
)
print(f"Embedding dimensions: {len(embedding)}")
if usage:
    print(f"Usage info: {usage}")

# Use an embedder in a knowledge base
knowledge_base = AgentKnowledge(
    vector_db=PgVector(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        table_name="jina_embeddings",
        embedder=JinaEmbedder(
            late_chunking=True,  # Better handling of long documents
            timeout=30.0,  # Configure request timeout
        ),
    ),
    num_documents=2,
)
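
One quick way to sanity-check vectors returned by `get_embedding` is to compare two of them with cosine similarity. The sketch below uses short hand-written lists as stand-ins for real embeddings so that it runs without an API key; `cosine_similarity` is a hypothetical helper, not an agno function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the output of JinaEmbedder().get_embedding(...)
same_direction = cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```

With real embeddings, both inputs would be the lists returned by `get_embedding`, and they should come from embedders configured with the same `dimensions`.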

Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| id | str | "jina-embeddings-v3" | The model ID used for generating embeddings. |
| dimensions | int | 1024 | The dimensionality of the embeddings generated by the model. |
| embedding_type | Literal['float', 'base64', 'int8'] | "float" | The format in which the embeddings are encoded. Options are "float", "base64", or "int8". |
| late_chunking | bool | False | Whether to use late chunking for improved processing of long documents. |
| user | Optional[str] | - | The user associated with the API request. |
| api_key | Optional[str] | - | The API key used for authenticating requests. |
| base_url | str | "https://api.jina.ai/v1/embeddings" | The base URL for the API endpoint. |
| headers | Optional[Dict[str, str]] | - | Additional headers to include in the API request. |
| request_params | Optional[Dict[str, Any]] | - | Additional parameters to include in the API request. |
| timeout | Optional[float] | - | Request timeout in seconds. |
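
To illustrate how these parameters might combine into a single HTTP request, here is a rough sketch that only builds the headers and JSON body and sends nothing. The field names `model` and `input`, the Bearer authorization header, and the merge order of `headers` and `request_params` are assumptions made for illustration; `build_request` is a hypothetical function and does not reproduce agno's internals.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_request(
    texts: List[str],
    api_key: str,
    id: str = "jina-embeddings-v3",
    dimensions: int = 1024,
    embedding_type: str = "float",
    late_chunking: bool = False,
    user: Optional[str] = None,
    headers: Optional[Dict[str, str]] = None,
    request_params: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Build (headers, payload) for an embeddings request (hypothetical sketch)."""
    # Assumed body layout: "model" and "input" are illustrative field names.
    payload: Dict[str, Any] = {
        "model": id,
        "input": texts,
        "dimensions": dimensions,
        "embedding_type": embedding_type,
        "late_chunking": late_chunking,
    }
    if user is not None:
        payload["user"] = user
    if request_params:
        payload.update(request_params)  # assumed: extra params override defaults

    # Assumed auth scheme: Bearer token in the Authorization header.
    request_headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if headers:
        request_headers.update(headers)  # assumed: caller headers take precedence
    return request_headers, payload


request_headers, payload = build_request(
    ["The quick brown fox jumps over the lazy dog."],
    api_key="xxx",
    late_chunking=True,
)
print(payload["model"], payload["late_chunking"])
```

In this sketch the pair would be posted to `base_url` with the configured `timeout`; the real class handles that step itself.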

Developer Resources