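```python
import asyncio

from agno.agent import Agent
from agno.models.vllm import VLLM

# The model id must match the model served by `vllm serve`
agent = Agent(model=VLLM(id="Qwen/Qwen2.5-7B-Instruct"), markdown=True)

# Run the agent asynchronously and stream the response to the terminal as it is generated
asyncio.run(agent.aprint_response("Share a 2 sentence horror story", stream=True))
```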
Set up your virtual environment
```bash
uv venv --python 3.12
source .venv/bin/activate
```
Install Libraries
```bash
uv pip install -U agno openai vllm
```
Start vLLM server
```bash
vllm serve Qwen/Qwen2.5-7B-Instruct \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --dtype float16 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.9
```
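Loading the model can take a while, and the agent will fail if it runs before the server is listening. The following is a minimal readiness check, assuming vLLM's default address `http://localhost:8000/v1` (adjust it if you pass `--host` or `--port` to `vllm serve`). It polls the OpenAI-compatible `/v1/models` endpoint using only the Python standard library. `wait_for_vllm` and its parameters are hypothetical helpers for illustration, not part of Agno or vLLM.

```python
import json
import time
import urllib.error
import urllib.request


def wait_for_vllm(
    base_url: str = "http://localhost:8000/v1",  # assumed default vLLM address
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll the OpenAI-compatible /models endpoint until it answers or `timeout` expires.

    Returns True once the server responds with HTTP 200, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    url = base_url.rstrip("/") + "/models"
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    payload = json.load(resp)
                    # Each entry in "data" describes one model the server is serving
                    served = [m.get("id") for m in payload.get("data", [])]
                    print("vLLM is up, serving:", served)
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            # Connection refused, timed out, or non-JSON body: the server is not ready yet
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, call `wait_for_vllm()` at the top of the agent script and only proceed to `asyncio.run(...)` when it returns `True`.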
Run Agent
```bash
python cookbook/11_models/vllm/async_basic_stream.py
```