Running Models with Ollama: A Step-by-Step Guide
Ollama-Powered Python Applications

Introduction
If you're looking for an easy way to run and test large language models (LLMs) without setting up full infrastructure, Ollama is a great option. This guide walks you through installing Ollama, running models, and integrating them into applications, with a focus on Meta's Llama2 models, among others.
What is Ollama?
Ollama is an open-source tool for running language models locally or on your own server. It provides a cost-effective alternative to commercial APIs, and since Meta's Llama2 models are licensed for commercial use, it is also a solid starting point for further training on custom datasets.
Key Links:
GitHub Repository: Ollama on GitHub
Official Website: Ollama
Installing Ollama on Windows
Ollama is compatible with Windows, Mac, and Linux. This section provides installation steps specifically for Windows 10.
Steps to Install:
Download the Installer: Ollama Download
Install Ollama: Follow the on-screen instructions.
Change Model Save Path (Optional):
If space is limited on your C: drive, change the default model directory by setting an environment variable. Navigate to Advanced system settings > Environment Variables and create a new variable:
Variable: OLLAMA_MODELS
Value: D:\your_directory\models
Launch Ollama: If not automatically running, search for Ollama in Windows programs and launch it.
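To confirm the environment variable took effect, a quick Python check can resolve the directory Ollama will use. This is a minimal sketch; the fallback path (~/.ollama/models) is an assumption based on Ollama's usual layout and may differ by platform:

```python
import os

def ollama_models_dir():
    """Return the directory Ollama will use for model storage.

    Falls back to the conventional default (~/.ollama/models) when
    OLLAMA_MODELS is not set -- the exact default may vary by platform.
    """
    custom = os.environ.get("OLLAMA_MODELS")
    if custom:
        return custom
    return os.path.join(os.path.expanduser("~"), ".ollama", "models")

print(ollama_models_dir())
```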
Running Ollama from Command Line (CMD)
Once installed, Ollama can be used via the command line.
Key Commands:
Check available models:
ollama list
Get details of a model:
ollama show --modelfile llama2:7b
Remove a model:
ollama rm llama2:7b
Start the server:
ollama serve
Access the local REST API:
http://localhost:11434/api/
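The same information that `ollama list` prints is also exposed over the REST API at /api/tags, which returns JSON with a `models` array. A minimal sketch of extracting model names from such a response; the sample payload below is illustrative rather than a live server reply:

```python
import json

def list_model_names(tags_json):
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_json.get("models", [])]

# Illustrative response shape; a live call would fetch
# http://localhost:11434/api/tags instead.
sample = json.loads('{"models": [{"name": "llama2:7b"}, {"name": "gemma:7b"}]}')
print(list_model_names(sample))  # ['llama2:7b', 'gemma:7b']
```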
Downloading Models Locally
Ollama provides access to various models through its model library. Before downloading a model, ensure your system has enough memory.
Recommended Models:
Llama2 Models (Meta)
Standard:
ollama pull llama2
Uncensored:
ollama pull llama2-uncensored:7b
Chat Model:
ollama pull llama2:7b-chat
Gemma (Google)
ollama pull gemma:7b
LLaVa (Multimodal Model)
ollama pull llava
Choosing Models for Different Purposes
Ollama offers models for various tasks, including:
Conversational AI: Llama2, Gemma, Falcon, OpenChat
Multimodal Models (Image + Text): LLaVa, Bakllava
Coding & Developer Support: CodeLlama, Dolphin-Mistral, Dolphin-Mixtral
Running Models in Command Line
To run a model with a prompt, use:
ollama run llama2:7b "your prompt"
Multimodal models allow additional inputs, such as image files or paths.
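Under the hood, `ollama run` corresponds to the /api/generate endpoint, which takes a JSON body with `model` and `prompt` fields; multimodal models additionally accept a list of base64-encoded images in an `images` field. A minimal sketch of building such a payload (field names follow Ollama's API; the model names are just examples):

```python
import json

def generate_payload(model, prompt, images=None, stream=False):
    """Build the JSON body for a POST to /api/generate."""
    body = {"model": model, "prompt": prompt, "stream": stream}
    if images:
        # Base64-encoded image data, only meaningful for multimodal models.
        body["images"] = images
    return json.dumps(body)

print(generate_payload("llama2:7b", "your prompt"))
print(generate_payload("llava", "Describe this image", images=["<base64 data>"]))
```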
CPU-Friendly Quantized Models
Quantization reduces model size and enables LLMs to run on less powerful hardware without significant accuracy loss.
Ollama supports quantized models natively, simplifying their implementation.
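As a rough rule of thumb, a model's weight footprint is its parameter count times the bits per weight, so quantizing from 16-bit down to 4-bit cuts memory to about a quarter. A back-of-the-envelope sketch (this ignores activation memory and runtime overhead):

```python
def approx_weight_gb(params_billion, bits_per_weight):
    """Approximate weight memory in GB: parameters x bits / 8 bits-per-byte."""
    return params_billion * bits_per_weight / 8

# A 7B model: ~14 GB at 16-bit, ~3.5 GB with 4-bit quantization.
print(approx_weight_gb(7, 16))  # 14.0
print(approx_weight_gb(7, 4))   # 3.5
```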
Integrating Models from Other Sources
If your required model is not available in Ollama’s library, you can integrate external models (e.g., from HuggingFace).
Example: Adding a Custom Model
Download a quantized model (e.g., Medicine Chat GGUF).
Create a file named Modelfile with the following content:
FROM D:\...\medicine-chat.Q4_0.gguf
# PARAMETER temperature 0.6
# SYSTEM "You are a helpful medicine assistant."
Run the command:
ollama create model_name -f Modelfile
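When integrating several external models, it can help to generate the Modelfile text programmatically. A minimal sketch; FROM, PARAMETER, and SYSTEM are Ollama Modelfile directives, and the GGUF path here is a placeholder for whatever file you downloaded:

```python
def build_modelfile(gguf_path, temperature=None, system=None):
    """Compose Modelfile text from a GGUF path and optional settings."""
    lines = [f"FROM {gguf_path}"]
    if temperature is not None:
        lines.append(f"PARAMETER temperature {temperature}")
    if system:
        lines.append(f'SYSTEM "{system}"')
    return "\n".join(lines)

text = build_modelfile("medicine-chat.Q4_0.gguf", temperature=0.6,
                       system="You are a helpful medicine assistant.")
print(text)
# Write this text to a file named Modelfile, then register it with:
#   ollama create model_name -f Modelfile
```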
Ollama-Powered Python Applications
Ollama runs in the background as a REST API, making it easy to integrate with applications using Python frameworks like FastAPI, Flask, and Django.
Installation:
pip install ollama
Generating Embeddings in Python:
import ollama
embedding = ollama.embeddings(model="llama2:7b", prompt="Hello Ollama!")
Using cURL for API Calls:
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama2:7b",
  "prompt": "Here is an article about llamas..."
}'
Integration with LangChain:
from langchain_community.embeddings import OllamaEmbeddings
embed = OllamaEmbeddings(model="llama2:7b")
embedding = embed.embed_query("Hello Ollama!")
For API documentation, visit: Ollama API
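Embedding vectors like the ones above are typically compared with cosine similarity, e.g. for retrieval over documents. A minimal, dependency-free sketch using made-up vectors in place of real Ollama embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# In practice, a and b would be embedding vectors returned by Ollama.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```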
Conclusion
This guide provides a step-by-step overview of using Ollama to run language models locally, avoiding costly API subscriptions. It covers installation, running models, downloading different model types, leveraging quantized models, and integrating external models. Additionally, Ollama-powered Python applications simplify AI development workflows, making it easier for developers to deploy AI solutions efficiently.

