Cerebras

Cerebras Inference provides high-speed, low-latency AI model inference powered by Cerebras Wafer-Scale Engines and CS-3 systems. Agno integrates directly with the Cerebras Python SDK, allowing you to use state-of-the-art Llama models with a simple interface.

Prerequisites

To use Cerebras with Agno, you need to:

  1. Install the required packages:

    pip install agno cerebras-cloud-sdk
    
  2. Set your API key: The Cerebras SDK expects your API key to be available as an environment variable (a quick sanity check is sketched after this list):

    export CEREBRAS_API_KEY=your_api_key_here
    
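Before wiring the key into Agno, you can sanity-check it by calling the Cerebras SDK directly. The following is a minimal sketch, assuming the cerebras-cloud-sdk package installed above; the model ID is one from the Supported Models table below:

from cerebras.cloud.sdk import Cerebras

# The client picks up CEREBRAS_API_KEY from the environment by default
client = Cerebras()

response = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.choices[0].message.content)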

Basic Usage

Here’s how to use a Cerebras model with Agno:

from agno.agent import Agent
from agno.models.cerebras import Cerebras

agent = Agent(
    model=Cerebras(id="llama-4-scout-17b-16e-instruct"),
    markdown=True,
)

# Print the response in the terminal
agent.print_response("write a two sentence horror story")
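print_response also accepts a stream flag, so you can watch tokens arrive as they are generated rather than waiting for the full reply, which is where Cerebras's low latency is most visible:

# Stream the response token-by-token instead of printing it all at once
agent.print_response("write a two sentence horror story", stream=True)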

Supported Models

Cerebras currently supports the following models (see the Cerebras documentation for the latest list):

| Model Name | Model ID | Parameters | Knowledge Cutoff |
|---|---|---|---|
| Llama 4 Scout | llama-4-scout-17b-16e-instruct | 109 billion | August 2024 |
| Llama 3.1 8B | llama3.1-8b | 8 billion | March 2023 |
| Llama 3.3 70B | llama-3.3-70b | 70 billion | December 2023 |
| DeepSeek R1 Distill Llama 70B* | deepseek-r1-distill-llama-70b | 70 billion | December 2023 |

* DeepSeek R1 Distill Llama 70B is available in private preview.
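Any ID from the table maps directly to the id parameter of the Cerebras model class. For example, to run Llama 3.3 70B instead of the model used earlier (the prompt here is only illustrative):

from agno.agent import Agent
from agno.models.cerebras import Cerebras

# Any model ID from the table above can be used here
agent = Agent(
    model=Cerebras(id="llama-3.3-70b"),
    markdown=True,
)
agent.print_response("Summarize wafer-scale inference in one sentence.")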

Configuration Options

The Cerebras class accepts the following parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| id | str | Model identifier (e.g., "llama-4-scout-17b-16e-instruct") | Required |
| name | str | Display name for the model | "Cerebras" |
| provider | str | Provider name | "Cerebras" |
| api_key | Optional[str] | API key (falls back to the CEREBRAS_API_KEY env var) | None |
| max_tokens | Optional[int] | Maximum tokens in the response | None |
| temperature | float | Sampling temperature | 0.7 |
| top_p | float | Top-p sampling value | 1.0 |
| request_params | Optional[Dict[str, Any]] | Additional request parameters | None |
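The sketch below combines several of these parameters; the values are illustrative rather than recommendations. Passing api_key explicitly overrides the environment variable fallback.

from agno.agent import Agent
from agno.models.cerebras import Cerebras

agent = Agent(
    model=Cerebras(
        id="llama-4-scout-17b-16e-instruct",
        temperature=0.2,   # lower than the 0.7 default for more deterministic output
        max_tokens=512,    # cap the response length
        top_p=0.9,         # nucleus sampling threshold
    ),
    markdown=True,
)
agent.print_response("Explain wafer-scale inference in two sentences.")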

Resources

SDK Examples

  • View more examples here.