Running Models with Ollama: A Step-by-Step Guide
Ollama-Powered Python Applications

Introduction
If you're looking for an easy way to run and test large language models (LLMs) without setting up full infrastructure, Ollama is a great option. This guide walks you through installing Ollama, running models, and integrating them into applications, with a focus on Meta's Llama2 models, among others.
What is Ollama?
Ollama is an open-source tool for running language models locally or on your own server. It provides a cost-effective alternative to commercial APIs, and since Meta's Llama2 models are licensed for commercial use, it is also a solid starting point for further training on custom datasets.
Key Links:
GitHub Repository: Ollama on GitHub
Official Website: Ollama
Installing Ollama on Windows
Ollama is compatible with Windows, Mac, and Linux. This section provides installation steps specifically for Windows 10.
Steps to Install:
Download the Installer: Ollama Download
Install Ollama: Follow the on-screen instructions.
Change Model Save Path (Optional):
If space is limited on your C: drive, change the default model directory by setting an environment variable. Navigate to Advanced system settings > Environment Variables and create a new variable:
Variable: OLLAMA_MODELS
Value: D:\your_directory\models
Launch Ollama: If not automatically running, search for Ollama in Windows programs and launch it.
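To confirm the environment variable took effect, a quick Python check can resolve the directory Ollama will use. This is a minimal sketch; the fallback path (~/.ollama/models) is an assumption based on Ollama's usual layout and may differ by platform:

```python
import os

def ollama_models_dir():
    """Return the directory Ollama will use for model storage.

    Falls back to the conventional default (~/.ollama/models) when
    OLLAMA_MODELS is not set -- the exact default may vary by platform.
    """
    custom = os.environ.get("OLLAMA_MODELS")
    if custom:
        return custom
    return os.path.join(os.path.expanduser("~"), ".ollama", "models")

print(ollama_models_dir())
```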
Running Ollama from Command Line (CMD)
Once installed, Ollama can be used via the command line.
Key Commands:
Check available models:
ollama list
Get details of a model:
ollama show --modelfile llama2:7b
Remove a model:
ollama rm llama2:7b
Start the server:
ollama serve
Access the local REST API:
http://localhost:11434/api/
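The same information that `ollama list` prints is also exposed over the REST API at /api/tags, which returns JSON with a `models` array. A minimal sketch of extracting model names from such a response; the sample payload below is illustrative rather than a live server reply:

```python
import json

def list_model_names(tags_json):
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_json.get("models", [])]

# Illustrative response shape; a live call would fetch
# http://localhost:11434/api/tags instead.
sample = json.loads('{"models": [{"name": "llama2:7b"}, {"name": "gemma:7b"}]}')
print(list_model_names(sample))  # ['llama2:7b', 'gemma:7b']
```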
Downloading Models Locally
Ollama provides access to various models through its model library. Before downloading a model, ensure your system has enough memory.
Recommended Models:
Llama2 Models (Meta)
Standard:
ollama pull llama2
Uncensored:
ollama pull llama2-uncensored:7b
Chat Model:
ollama pull llama2:7b-chat
Gemma (Google)
ollama pull gemma:7b
LLaVa (Multimodal Model)
ollama pull llava
Choosing Models for Different Purposes
Ollama offers models for various tasks, including:
Conversational AI: Llama2, Gemma, Falcon, OpenChat
Multimodal Models (Image + Text): LLaVa, Bakllava
Coding & Developer Support: CodeLlama, Dolphin-Mistral, Dolphin-Mixtral
Running Models in Command Line
To run a model with a prompt, use:
ollama run llama2:7b "your prompt"
Multimodal models allow additional inputs, such as image files or paths.
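Under the hood, `ollama run` corresponds to the /api/generate endpoint, which takes a JSON body with `model` and `prompt` fields; multimodal models additionally accept a list of base64-encoded images in an `images` field. A minimal sketch of building such a payload (field names follow Ollama's API; the model names are just examples):

```python
import json

def generate_payload(model, prompt, images=None, stream=False):
    """Build the JSON body for a POST to /api/generate."""
    body = {"model": model, "prompt": prompt, "stream": stream}
    if images:
        # Base64-encoded image data, only meaningful for multimodal models.
        body["images"] = images
    return json.dumps(body)

print(generate_payload("llama2:7b", "your prompt"))
print(generate_payload("llava", "Describe this image", images=["<base64 data>"]))
```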
CPU-Friendly Quantized Models
Quantization reduces model size and enables LLMs to run on less powerful hardware without significant accuracy loss.
Ollama supports quantized models natively, simplifying their implementation.
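As a rough rule of thumb, a model's weight footprint is its parameter count times the bits per weight, so quantizing from 16-bit down to 4-bit cuts memory to about a quarter. A back-of-the-envelope sketch (this ignores activation memory and runtime overhead):

```python
def approx_weight_gb(params_billion, bits_per_weight):
    """Approximate weight memory in GB: parameters x bits / 8 bits-per-byte."""
    return params_billion * bits_per_weight / 8

# A 7B model: ~14 GB at 16-bit, ~3.5 GB with 4-bit quantization.
print(approx_weight_gb(7, 16))  # 14.0
print(approx_weight_gb(7, 4))   # 3.5
```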
Integrating Models from Other Sources
If your required model is not available in Ollama’s library, you can integrate external models (e.g., from HuggingFace).
Example: Adding a Custom Model
Download a quantized model (e.g., Medicine Chat GGUF).
Create a file named Modelfile with the following content:
FROM D:\...\medicine-chat.Q4_0.gguf
# PARAMETER temperature 0.6
# SYSTEM "You are a helpful medicine assistant."
Run the command:
ollama create model_name -f Modelfile
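When integrating several external models, it can help to generate the Modelfile text programmatically. A minimal sketch; FROM, PARAMETER, and SYSTEM are Ollama Modelfile directives, and the GGUF path here is a placeholder for whatever file you downloaded:

```python
def build_modelfile(gguf_path, temperature=None, system=None):
    """Compose Modelfile text from a GGUF path and optional settings."""
    lines = [f"FROM {gguf_path}"]
    if temperature is not None:
        lines.append(f"PARAMETER temperature {temperature}")
    if system:
        lines.append(f'SYSTEM "{system}"')
    return "\n".join(lines)

text = build_modelfile("medicine-chat.Q4_0.gguf", temperature=0.6,
                       system="You are a helpful medicine assistant.")
print(text)
# Write this text to a file named Modelfile, then register it with:
#   ollama create model_name -f Modelfile
```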
Ollama-Powered Python Applications
Ollama runs in the background as a REST API, making it easy to integrate with applications using Python frameworks like FastAPI, Flask, and Django.
Installation:
pip install ollama
Generating Embeddings in Python:
import ollama
embedding = ollama.embeddings(model="llama2:7b", prompt="Hello Ollama!")
Using cURL for API Calls:
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama2:7b",
  "prompt": "Here is an article about llamas..."
}'
Integration with LangChain:
from langchain_community.embeddings import OllamaEmbeddings
embed = OllamaEmbeddings(model="llama2:7b")
embedding = embed.embed_query("Hello Ollama!")
For API documentation, visit: Ollama API
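Embedding vectors like the ones above are typically compared with cosine similarity, e.g. for retrieval over documents. A minimal, dependency-free sketch using made-up vectors in place of real Ollama embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# In practice, a and b would be embedding vectors returned by Ollama.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```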
Conclusion
This guide provides a step-by-step overview of using Ollama to run language models locally, avoiding costly API subscriptions. It covers installation, running models, downloading different model types, leveraging quantized models, and integrating external models. Additionally, Ollama-powered Python applications simplify AI development workflows, making it easier for developers to deploy AI solutions efficiently.

