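from agno.agent import Agent
from agno.models.vllm import vLLM

# The model is served by a local vLLM server, which must be running first (see "Start vLLM server").
agent = Agent(
    model=vLLM(id="Qwen/Qwen2.5-7B-Instruct", top_k=20, enable_thinking=False),
    markdown=True,
)

# stream=True prints the response incrementally as it is generated.
agent.print_response("Share a 2 sentence horror story", stream=True)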
Create a virtual environment
Open the Terminal and create a Python virtual environment.
python3 -m venv .venv
source .venv/bin/activate
Install Libraries
pip install -U agno openai vllm
Start vLLM server
vllm serve Qwen/Qwen2.5-7B-Instruct \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --dtype float16 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.9
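The server can take a while to load the model, so it can help to confirm it is reachable before running the agent. The following is a minimal sketch, assuming the server exposes vLLM's OpenAI-compatible API at its default address, http://localhost:8000/v1 (adjust the URL if you passed --host or --port). The helper names parse_model_ids and list_served_models are illustrative and not part of Agno or vLLM.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_served_models(
    base_url: str = "http://localhost:8000/v1",  # assumed default vLLM address
    timeout: float = 2.0,
) -> list[str]:
    """Return the model ids the server reports, or [] if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return parse_model_ids(json.load(resp))
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers a response body that is not valid JSON.
        return []


if __name__ == "__main__":
    models = list_served_models()
    if models:
        print("vLLM is serving:", ", ".join(models))
    else:
        print("No vLLM server reachable yet; it may still be loading the model.")
```

If the list is empty, wait for the vllm serve process to finish starting and try again. Once the model id used in the agent appears in the list, the agent script should be able to connect.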
Run Agent
python cookbook/models/vllm/basic_stream.py