## Usage

### Local Mode
You can load local models directly using the vLLM library, without needing to host a model on a server.
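Below is a minimal sketch of local-mode usage. The import path `agno.embedder.vllm.VLLMEmbedder` and the `get_embedding` method follow the common embedder interface and are assumptions; adjust them to match your installation.

```python
from agno.embedder.vllm import VLLMEmbedder  # import path is an assumption

# Local mode: no base_url, so the model is loaded in-process via the vLLM library.
embedder = VLLMEmbedder(
    id="intfloat/e5-mistral-7b-instruct",
    dimensions=4096,
    enforce_eager=True,  # eager execution; set False to allow CUDA graph capture
)

# get_embedding returns a list of floats with `dimensions` entries.
embedding = embedder.get_embedding("The quick brown fox jumps over the lazy dog.")
print(f"Embedding dimensions: {len(embedding)}")
```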
### Remote Mode
You can connect to a running vLLM server via its OpenAI-compatible API.
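A sketch of remote-mode usage follows, under the same import-path assumption as above. Setting `base_url` is what switches the embedder into remote mode; the URL shown is a placeholder for your own server.

```python
from os import getenv

from agno.embedder.vllm import VLLMEmbedder  # import path is an assumption

# Start a vLLM server first, e.g.:
#   vllm serve intfloat/e5-mistral-7b-instruct
embedder = VLLMEmbedder(
    id="intfloat/e5-mistral-7b-instruct",
    base_url="http://localhost:8000/v1",  # your server's OpenAI-compatible endpoint
    api_key=getenv("VLLM_API_KEY"),
)

embedding = embedder.get_embedding("The quick brown fox jumps over the lazy dog.")
```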
## Params
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `id` | `str` | `"intfloat/e5-mistral-7b-instruct"` | Model identifier (Hugging Face model name) |
| `dimensions` | `int` | `4096` | Dimensionality of the embedding vectors |
| `base_url` | `Optional[str]` | `None` | URL of a remote vLLM server; setting this enables remote mode |
| `api_key` | `Optional[str]` | `getenv("VLLM_API_KEY")` | API key for authenticating with the remote server |
| `enable_batch` | `bool` | `False` | Enable batch processing for multiple texts |
| `batch_size` | `int` | `10` | Number of texts to process per batch |
| `enforce_eager` | `bool` | `True` | Use eager execution mode (local mode) |
| `vllm_kwargs` | `Optional[Dict[str, Any]]` | `None` | Additional vLLM engine parameters (local mode) |
| `request_params` | `Optional[Dict[str, Any]]` | `None` | Additional request parameters (remote mode) |
| `client_params` | `Optional[Dict[str, Any]]` | `None` | OpenAI client configuration (remote mode) |
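As a hedged illustration of how these parameters compose (same import-path assumption as the sketches above): batching is controlled by `enable_batch` and `batch_size`, while local-mode engine options pass through `vllm_kwargs`.

```python
from agno.embedder.vllm import VLLMEmbedder  # import path is an assumption

# Local mode with batching and an extra engine option.
embedder = VLLMEmbedder(
    id="intfloat/e5-mistral-7b-instruct",
    enable_batch=True,  # embed texts in groups instead of one at a time
    batch_size=32,      # override the default of 10
    vllm_kwargs={"gpu_memory_utilization": 0.8},  # forwarded to the vLLM engine
)
```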
## Developer Resources
- View Cookbook