Async Agent with Streaming

Code
Usage

Code

cookbook/11_models/vllm/async_basic_stream.py

import asyncio

from agno.agent import Agent
from agno.models.vllm import VLLM

agent = Agent(model=VLLM(id="Qwen/Qwen2.5-7B-Instruct"), markdown=True)
asyncio.run(agent.aprint_response("Share a 2 sentence horror story", stream=True))

Usage

Set up your virtual environment

uv venv --python 3.12
source .venv/bin/activate

Install Libraries

uv pip install -U agno openai vllm

Start vLLM server

vllm serve Qwen/Qwen2.5-7B-Instruct \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --dtype float16 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.9

Run Agent

python cookbook/11_models/vllm/async_basic_stream.py

Async Agent Code Generation

⌘I

Get Started

Basics

Advanced

Production

Providers

Other

Additional Resources

Async Agent with Streaming

Code

Usage

Get Started

Basics

Advanced

Production

Providers

Other

Additional Resources

​Code

​Usage

Code

Usage