Run Large Language Models locally with Ollama

Ollama is a fantastic tool for running models locally.

Ollama supports multiple open-source models. See the library here.

We recommend experimenting to find the best-suited model for your use-case. Here are some general recommendations:

  • llama3.3 models are good for most basic use-cases.
  • qwen models perform particularly well with tool use.
  • deepseek-r1 models have strong reasoning capabilities.
  • phi4 models are powerful despite their small size.

Set up a model

Install Ollama and run a model using:

```shell
ollama run llama3.1
```

This gives you an interactive session with the model.

Alternatively, to download the model for use in an Agno agent:

```shell
ollama pull llama3.1
```

Example

After you have the model locally, use the Ollama model class to access it.
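Below is a minimal sketch of an agent backed by a local Ollama model. The import paths and the Agent interface are assumed from Agno's Python API and may differ slightly across versions.

```python
from agno.agent import Agent
from agno.models.ollama import Ollama

# Point the agent at a locally pulled model (the id matches the name used with `ollama pull`)
agent = Agent(
    model=Ollama(id="llama3.1"),
    markdown=True,
)

# Run a quick prompt against the local model
agent.print_response("Explain what a vector database is in two sentences.")
```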

View more examples here.

Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `id` | `str` | `"llama3.1"` | The ID of the model to use. |
| `name` | `str` | `"Ollama"` | The name of the model. |
| `provider` | `str` | `"Ollama"` | The provider of the model. |
| `format` | `Optional[Any]` | `None` | The format of the response. |
| `options` | `Optional[Any]` | `None` | Additional options to pass to the model. |
| `keep_alive` | `Optional[Union[float, str]]` | `None` | The keep-alive time for the model. |
| `request_params` | `Optional[Dict[str, Any]]` | `None` | Additional parameters to pass to the request. |
| `host` | `Optional[str]` | `None` | The host to connect to. |
| `timeout` | `Optional[Any]` | `None` | The timeout for the connection. |
| `client_params` | `Optional[Dict[str, Any]]` | `None` | Additional parameters to pass to the client. |
| `client` | `Optional[OllamaClient]` | `None` | A pre-configured instance of the Ollama client. |
| `async_client` | `Optional[AsyncOllamaClient]` | `None` | A pre-configured instance of the asynchronous Ollama client. |
| `structured_outputs` | `bool` | `False` | Whether to use structured outputs with this model. |
| `supports_structured_outputs` | `bool` | `True` | Whether the model supports structured outputs. |

Ollama is a subclass of the Model class and has access to the same params.
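As an illustration, here is a sketch of how a few of these params might be passed when constructing the model. The parameter names (`id`, `host`, `timeout`, `options`) come from the table above; the values shown are placeholders, and `http://localhost:11434` is simply Ollama's default local server address.

```python
from agno.agent import Agent
from agno.models.ollama import Ollama

# Configure the model using some of the params listed above
model = Ollama(
    id="llama3.1",
    host="http://localhost:11434",  # local Ollama server
    timeout=60,                     # seconds to wait for a response
    options={"temperature": 0.2},   # extra options forwarded to Ollama
)

agent = Agent(model=model)
agent.print_response("Summarize the benefits of running models locally.")
```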