The `JinaEmbedder` class embeds text into vectors using the Jina AI API. See the Jina AI website to get started.
Get a Jina AI API key, then either pass it as `api_key` or set it in the `JINA_API_KEY` environment variable.
Usage
jina_embedder.py
Advanced Usage
Params
Parameter | Type | Default | Description |
---|---|---|---|
id | str | "jina-embeddings-v3" | The model ID to use for generating embeddings. |
dimensions | int | 1024 | The number of dimensions for the embedding vectors. |
embedding_type | Literal["float", "base64", "int8"] | "float" | The format type of the returned embeddings. |
late_chunking | bool | False | Whether to enable late chunking optimization. |
user | Optional[str] | None | User identifier for tracking purposes. |
api_key | Optional[str] | JINA_API_KEY env var | The Jina AI API key. If not provided, it is read from the JINA_API_KEY environment variable. |
base_url | str | "https://api.jina.ai/v1/embeddings" | The endpoint URL for the Jina embeddings API. |
headers | Optional[Dict[str, str]] | None | Additional headers to include in API requests. |
request_params | Optional[Dict[str, Any]] | None | Additional parameters to include in the API request body. |
timeout | Optional[float] | None | Timeout in seconds for API requests. |
enable_batch | bool | False | Enable batch processing to reduce the number of API calls and help avoid rate limits. |
batch_size | int | 100 | Number of texts to send in each API call when batch processing is enabled. |
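To show how these parameters might map onto a single HTTP request, here is a minimal stdlib-only sketch. It assumes the Jina embeddings endpoint accepts a JSON body with `model`, `input`, `dimensions`, `embedding_type`, and `late_chunking` fields and a `Bearer` token in the `Authorization` header; the `user` body field is also an assumption. `build_jina_request` is a hypothetical helper, not part of `JinaEmbedder`, and it only assembles the request without sending it.

```python
import json
import os
from typing import Any, Dict, List, Optional


def build_jina_request(
    texts: List[str],
    model_id: str = "jina-embeddings-v3",
    dimensions: int = 1024,
    embedding_type: str = "float",
    late_chunking: bool = False,
    user: Optional[str] = None,
    api_key: Optional[str] = None,
    base_url: str = "https://api.jina.ai/v1/embeddings",
    headers: Optional[Dict[str, str]] = None,
    request_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble URL, headers, and JSON body for one embeddings call."""
    # Fall back to the JINA_API_KEY environment variable, mirroring the api_key default.
    key = api_key or os.getenv("JINA_API_KEY")
    if not key:
        raise ValueError("Set api_key or the JINA_API_KEY environment variable")

    # Assumed request body shape for the Jina embeddings endpoint.
    body: Dict[str, Any] = {
        "model": model_id,
        "input": texts,
        "dimensions": dimensions,
        "embedding_type": embedding_type,
        "late_chunking": late_chunking,
    }
    if user is not None:
        body["user"] = user  # assumed field name for the user identifier
    # request_params is merged last, so it can add or override body fields.
    if request_params:
        body.update(request_params)

    # Caller-supplied headers are layered on top of the defaults.
    all_headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    if headers:
        all_headers.update(headers)

    return {"url": base_url, "headers": all_headers, "data": json.dumps(body)}


if __name__ == "__main__":
    req = build_jina_request(["hello world"], api_key="test-key", dimensions=256)
    print(req["headers"]["Authorization"])  # Bearer test-key
    print(json.loads(req["data"])["dimensions"])  # 256
```

Merging `request_params` after the named fields is a choice made in this sketch so callers can extend or override the body; the actual class may merge them in a different order.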
Features
- Async Support: Full async/await support for better performance in concurrent applications
- Batch Processing: Efficient batch processing of multiple texts with configurable batch size
- Late Chunking: Support for Jina’s late chunking optimization technique
- Flexible Output: Multiple embedding formats (float, base64, int8)
- Usage Tracking: Get detailed usage information for API calls
- Error Handling: Robust error handling with fallback mechanisms
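The batch processing and flexible-output features can be illustrated with two small, hypothetical helpers. `batched` splits a list of texts into chunks of at most `batch_size`, so 250 texts with the default `batch_size` of 100 would go out in three requests instead of one request per text (assuming texts are otherwise sent individually). `decode_base64_embedding` assumes a `"base64"` embedding encodes little-endian float32 values; that wire format is an assumption here, so verify it against the Jina API reference before relying on it.

```python
import base64
import struct
from typing import Iterator, List


def batched(texts: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def decode_base64_embedding(encoded: str) -> List[float]:
    """Decode a base64 string into floats, ASSUMING little-endian float32 packing."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4  # 4 bytes per float32 value
    return list(struct.unpack(f"<{count}f", raw))


if __name__ == "__main__":
    texts = [f"doc {i}" for i in range(250)]
    print([len(b) for b in batched(texts)])  # [100, 100, 50]

    # Round-trip: pack floats the way the sketch assumes, then decode them.
    packed = base64.b64encode(struct.pack("<3f", 0.5, -1.0, 2.0)).decode("ascii")
    print(decode_base64_embedding(packed))  # [0.5, -1.0, 2.0]
```

Base64 output trades a decoding step for a smaller, text-safe payload compared with a JSON array of floats.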