AI tools like chatbots and content generators are everywhere. But usually, they run online using cloud services. What if you could run those smart AI models directly on your own computer, just like running a regular app? That’s what Ollama helps you do.
In this blog, you’ll learn how to set it up, use it in different ways (like with terminal, code, or API), change some basic settings, and know what it can and can't do.
Ollama is software that lets you run large, powerful AI models on your own machine without relying on the internet. It handles downloading and running the models and lets you chat with them or get responses from them, much like ChatGPT.
You can interact with it through simple commands, through code (Python and other languages), or via its API. That makes it great for experimenting with AI locally and building your own projects, while keeping everything private and easy to manage.
Ollama is aimed at developers and AI enthusiasts who are interested in running large language models locally, so they have more control and flexibility, and less dependency on cloud services.
Installing Ollama is quick and straightforward. It works on macOS, Windows, and Linux.
On macOS, you can install it with Homebrew:
brew install ollama
Alternatively, go to https://ollama.com/download and choose the right installer for your operating system.
After installation, open a terminal and test it by running:
ollama --version
If the version number appears, Ollama is successfully installed and ready to use.
Ollama provides a simple command-line interface to help you manage and interact with language models on your local machine. Below are some of the most commonly used commands:
| Command | Description |
| --- | --- |
| `ollama run <model>` | Starts and runs a specified model |
| `ollama list` | Displays a list of installed models |
| `ollama pull <model>` | Downloads a model from the Ollama library |
| `ollama create <name> -f Modelfile` | Creates a custom model using a Modelfile |
| `ollama serve` | Starts the Ollama API server |
| `ollama stop <model>` | Stops a running model |
These commands form the foundation of how you interact with Ollama through the terminal, making it easy to manage and use models locally.
Ollama also exposes a REST API so you can access and work with models from code. Its endpoints let you generate text, manage models, create embeddings, and more, all running locally on your machine.
| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | /api/generate | Text generation |
| POST | /api/chat | Chat-style message handling |
| POST | /api/create | Create custom models |
| GET | /api/tags | List available (installed) models |
| DELETE | /api/delete | Delete a model |
| POST | /api/pull | Download a model |
| POST | /api/push | Upload a model |
| POST | /api/embed | Generate embeddings |
| GET | /api/ps | List running model processes |
| POST | /api/embeddings | Legacy embeddings endpoint (superseded by /api/embed) |
| GET | /api/version | Get the current Ollama version |
These endpoints will allow you to easily incorporate Ollama into your apps, tools, or workflows with just a few HTTP requests.
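As a quick sketch, here's how you might query two of these endpoints from Python, assuming the Ollama server is running locally on its default port (11434):

```python
import requests

BASE_URL = "http://localhost:11434"  # default address of the local Ollama server

# Check which version of Ollama is running
version = requests.get(f"{BASE_URL}/api/version").json()
print("Ollama version:", version["version"])

# List the models that are currently installed
tags = requests.get(f"{BASE_URL}/api/tags").json()
for model in tags.get("models", []):
    print("Installed model:", model["name"])
```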
Suggested Read: What are Embedding Models in Machine Learning?
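If you'd like to try the embeddings endpoint from the table above, here's a minimal sketch. It assumes you have already pulled an embedding model; nomic-embed-text is used here purely as an example name.

```python
import requests

payload = {
    "model": "nomic-embed-text",  # example embedding model; swap in whichever one you have installed
    "input": "Ollama runs language models locally."
}

# /api/embed returns a list of embedding vectors, one per input
response = requests.post("http://localhost:11434/api/embed", json=payload)
embeddings = response.json()["embeddings"]
print("Vector length:", len(embeddings[0]))
```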
Before using Ollama through code or APIs, you first need to install and run a supported model. Here's how to get started:
ollama pull llama3.2
ollama run llama3.2
Now you can start chatting with the model directly in the terminal.
If you want to use Python without any third-party SDKs like OpenAI's, you can make direct HTTP requests to Ollama's local server. Here’s how to do it:
```python
import requests

url = "http://localhost:11434/api/generate"

payload = {
    "model": "llama3.2",
    "prompt": "What is LLM?",
    "stream": False,
    # Generation settings go under "options" in Ollama's native API
    "options": {
        "temperature": 0.2,
        "top_p": 0.7,
        "top_k": 30,
        "repeat_penalty": 1.1,
        "num_predict": 100  # maximum number of tokens to generate
    }
}

response = requests.post(url, json=payload)

if response.status_code == 200:
    result = response.json()
    print(result["response"])
else:
    print("Error:", response.status_code, response.text)
```
This setup lets you use Ollama as a local LLM server and test different model behaviors with real API calls. If you don't want the defaults, you can change the model name or tweak parameters such as temperature, top_p, and repeat_penalty in the script to influence how the model responds.
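The example above sets "stream": False to receive the whole answer in one response. If you'd rather print tokens as they arrive, here's a rough sketch of streaming: with "stream": True, Ollama returns one JSON object per line, each carrying a piece of the response.

```python
import json
import requests

url = "http://localhost:11434/api/generate"
payload = {"model": "llama3.2", "prompt": "What is LLM?", "stream": True}

# With streaming enabled, the server sends newline-delimited JSON objects
with requests.post(url, json=payload, stream=True) as response:
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()
```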
Ollama is designed to follow the OpenAI API format (like ChatGPT). This means you can use Ollama as a local drop-in replacement in apps or tools that were originally built for those services, without changing much of your code.
In this blog we’ll explore how to use OpenAI-compatible code and tools, since it's one of the most widely supported and easiest ways to get started with local LLMs.
```python
from openai import OpenAI

# Point the OpenAI client at the local Ollama server
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # required by the client, but not actually used by Ollama
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"}
    ],
    temperature=0.7,
    top_p=0.9,
    max_tokens=1000
    # Ollama-specific options such as top_k, repeat_penalty, and num_ctx are not
    # part of the OpenAI API; set them via the native /api endpoints or a Modelfile.
)

print(response.choices[0].message.content)
```
Make sure the model is installed and running:
ollama run llama3.2
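If you want responses to appear incrementally here as well, the OpenAI client supports streaming. A minimal sketch, assuming the same local setup as above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# stream=True returns chunks as the model generates them
stream = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain what Ollama does in one sentence."}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```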
While Ollama is a powerful tool for running local LLMs, it does come with a few limitations to keep in mind: larger models need a significant amount of memory (and ideally a GPU) to run smoothly, and it doesn't offer every advanced feature you'd find in cloud-based services.
These limitations don't affect most local development or testing needs, but they're important to be aware of depending on your use case.
Ollama is easy-to-deploy, high-performing software for running AI language models locally on your desktop or server, with no cloud required. You can use it from the command line, through its API, or from code, and it follows the same structure as widely used tools like ChatGPT.
It may need a fair amount of memory and doesn't support every advanced feature, but it's great for learning, testing, and building local AI projects. If you value privacy, control, and offline access to AI, Ollama is a good place to begin.