Embeddings Explained: What They Are and How to Use Them with Python and Ollama (2026)
Embeddings are one of the most useful building blocks in modern AI applications, yet they are often explained poorly. This guide starts from zero — what an embedding actually is, why it works, and how to put it to use in Python using Ollama (free, local) or OpenAI.
By the end you will have working code for semantic search, duplicate detection, and storing embeddings in PostgreSQL with pgvector.
What Are Embeddings?
An embedding is a list of numbers (a vector) that represents the meaning of a piece of text. Texts with similar meanings get similar vectors.
"Python is a programming language" → [0.12, -0.45, 0.89, 0.03, ...]
"Python is used for coding" → [0.11, -0.43, 0.91, 0.02, ...] # very similar
"The Eiffel Tower is in Paris" → [-0.67, 0.23, -0.12, 0.78, ...] # very different
A typical embedding model produces a vector with hundreds or thousands of dimensions. You never interpret individual numbers — what matters is the geometric relationship between vectors. Texts that mean similar things cluster together in that high-dimensional space.
Use cases:
- Semantic search (find similar documents)
- Recommendation systems
- Duplicate detection
- Text classification
- RAG (Retrieval-Augmented Generation)
Embeddings are not the same as word counts or TF-IDF. Those methods care about which words appear; embeddings care about meaning. "automobile" and "car" are very different strings, but their embeddings will be nearly identical, as the quick preview below shows.
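As a preview, using the embed() and cosine_similarity() helpers built later in this guide:
# Uses embed() and cosine_similarity() from the sections below.
print("automobile" == "car")  # False: the strings share almost nothing
print(cosine_similarity(embed("automobile"), embed("car")))  # high score (exact value varies by model)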
Generate Embeddings with Ollama (Free, Local)
Ollama runs open-source models on your machine. The nomic-embed-text model is fast, produces 768-dimensional vectors, and requires no API key.
ollama pull nomic-embed-text   # a fast, widely used general-purpose embedding model
Call it via HTTP:
import requests

def embed(text: str) -> list[float]:
    response = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
    )
    response.raise_for_status()  # fail loudly if the Ollama server is not running
    return response.json()["embedding"]
vector = embed("Python is a programming language")
print(f"Dimensions: {len(vector)}") # 768
print(f"First 5 values: {vector[:5]}")
Or Use the Official Ollama Python Library
pip install ollama
import ollama

response = ollama.embeddings(
    model="nomic-embed-text",
    prompt="Python is a programming language",
)
vector = response["embedding"]
The library handles connection pooling and is cleaner for production code. Under the hood it calls the same local HTTP API.
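If Ollama runs on a non-default host or port, the library also exposes a client object (the address shown here is the default, as an assumption):
import ollama

client = ollama.Client(host="http://localhost:11434")
vector = client.embeddings(model="nomic-embed-text", prompt="hello")["embedding"]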
Generate with OpenAI
If you prefer a cloud model or need higher-quality embeddings for a critical application:
from openai import OpenAI
client = OpenAI()
def embed_openai(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",  # 1536 dimensions, very cheap
        input=text,
    )
    return response.data[0].embedding
text-embedding-3-small costs roughly $0.02 per million tokens — extremely cheap. text-embedding-3-large (3072 dimensions) is more accurate for tasks where quality matters more than cost.
The rest of this guide works with either provider. Just swap embed() for embed_openai() and adjust the vector dimension where needed.
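For example, a simple toggle (USE_OPENAI is a hypothetical environment variable, not part of either SDK):
import os

# Hypothetical pattern: pick a provider once, then call embed_fn() everywhere.
embed_fn = embed_openai if os.getenv("USE_OPENAI") else embed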
Cosine Similarity: Measure How Similar Two Texts Are
The standard way to compare two embeddings is cosine similarity. It measures the angle between two vectors: 1.0 means identical direction (same meaning), 0.0 means perpendicular (unrelated), and -1.0 means opposite.
import numpy as np
def cosine_similarity(a: list[float], b: list[float]) -> float:
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
v1 = embed("Python is a programming language")
v2 = embed("Python is used for coding")
v3 = embed("The Eiffel Tower is in Paris")
print(cosine_similarity(v1, v2)) # ~0.95 (very similar)
print(cosine_similarity(v1, v3)) # ~0.20 (unrelated)
Install NumPy if needed:
pip install numpy
Cosine similarity is usually preferred over Euclidean distance for text embeddings because it compares only the direction of the vectors, ignoring their magnitude. In practice that means a short sentence and a long paragraph about the same topic can still score high.
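A quick way to see why: scaling a vector changes its Euclidean distance to the original but not its cosine similarity (illustrative values only):
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = 10 * v  # same direction, ten times the magnitude

print(np.linalg.norm(v - w))                # ~33.7: Euclidean distance treats these as far apart
print(cosine_similarity(list(v), list(w)))  # 1.0: the angle between them is zero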
Semantic Search: Find Similar Documents
With a small document set, you can do semantic search entirely in memory using NumPy:
import numpy as np
documents = [
    "Python is a programming language",
    "Docker containers run applications",
    "Machine learning uses neural networks",
    "Linux is an open-source operating system",
    "Flask is a Python web framework",
]
# Pre-compute embeddings for all documents
doc_embeddings = [embed(doc) for doc in documents]
def search(query: str, top_k: int = 3) -> list[tuple[str, float]]:
    query_embedding = embed(query)
    scores = [
        (doc, cosine_similarity(query_embedding, doc_emb))
        for doc, doc_emb in zip(documents, doc_embeddings)
    ]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:top_k]
results = search("web development with Python")
for doc, score in results:
    print(f"{score:.3f}: {doc}")

# Example output (exact scores vary by model):
# 0.891: Flask is a Python web framework
# 0.743: Python is a programming language
# 0.312: Docker containers run applications
The key insight: you pre-compute document embeddings once and store them. At query time, you only embed the query (one call), then compute similarity against all stored vectors. This is fast enough for thousands of documents in pure Python.
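If the same texts get embedded repeatedly, for example across script runs or repeated queries, a simple in-memory cache avoids redundant model calls. A minimal sketch:
embedding_cache: dict[str, list[float]] = {}

def embed_cached(text: str) -> list[float]:
    # Exact-match cache: identical text reuses the stored vector instead of calling the model.
    if text not in embedding_cache:
        embedding_cache[text] = embed(text)
    return embedding_cache[text]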
Keyword Search vs Semantic Search
The difference becomes obvious when the query uses different words than the documents:
# Keyword search: misses synonyms and related concepts
def keyword_search(query, docs):
    return [d for d in docs if query.lower() in d.lower()]

print(keyword_search("ML", documents))  # []: no document contains the literal string "ML"

# Semantic search: understands meaning
results = search("ML")
print(results[0])  # top hit: "Machine learning uses neural networks" (exact score varies by model)
Semantic search also handles misspellings, synonyms, and paraphrases gracefully — anything the embedding model learned during training.
Store Embeddings in NumPy (Small Scale)
For up to tens of thousands of documents, NumPy arrays stored on disk are sufficient:
import numpy as np
import pickle
# Save
matrix = np.array(doc_embeddings)
np.save("embeddings.npy", matrix)
with open("documents.pkl", "wb") as f:
pickle.dump(documents, f)
# Load
matrix = np.load("embeddings.npy")
with open("documents.pkl", "rb") as f:
documents = pickle.load(f)
For vectorized similarity search over the whole matrix at once (much faster than a Python loop):
def fast_search(query: str, matrix: np.ndarray, documents: list[str], top_k: int = 3):
    q = np.array(embed(query))
    # Dot product of query with every row; normalise both sides
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(q)
    similarities = matrix @ q / norms
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return [(documents[i], float(similarities[i])) for i in top_indices]
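Usage mirrors the earlier search() function (the query string is just an example):
matrix = np.load("embeddings.npy")
for doc, score in fast_search("open-source operating systems", matrix, documents):
    print(f"{score:.3f}: {doc}")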
Store in PostgreSQL with pgvector (Production)
For production systems with millions of documents, you need a vector database. pgvector is a PostgreSQL extension that adds vector types and approximate nearest-neighbour search — no separate infrastructure required if you already use Postgres.
-- Install extension
CREATE EXTENSION vector;
-- Create table
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(768)  -- match your model's dimensions
);
-- Create index for fast search
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops);
import psycopg2
import json
conn = psycopg2.connect("postgresql://user:pass@localhost/mydb")
cur = conn.cursor()
def store_document(content: str):
    embedding = embed(content)
    cur.execute(
        "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
        (content, json.dumps(embedding)),
    )
    conn.commit()

def semantic_search(query: str, limit: int = 5):
    query_embedding = embed(query)
    cur.execute(
        """
        SELECT content, 1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT %s
        """,
        (json.dumps(query_embedding), json.dumps(query_embedding), limit),
    )
    return cur.fetchall()
The <=> operator computes cosine distance, and 1 - distance converts it to a similarity score where 1.0 is most similar. The ivfflat index performs approximate nearest-neighbour search, which keeps queries fast even with millions of rows; for best recall, create the index after bulk-loading your data rather than on an empty table.
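Because ivfflat is approximate, pgvector exposes a probes setting to trade speed for recall (10 here is an arbitrary example value; tune it against your data):
# More probes = better recall but slower queries (pgvector's ivfflat.probes setting).
cur.execute("SET ivfflat.probes = 10")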
Install the Python driver:
pip install psycopg2-binary
Batch Embed for Efficiency
Embedding one document at a time is slow when processing large datasets. The /api/embeddings endpoint accepts a single prompt per request, so the helper below still makes one call per text; the grouping mainly gives you a convenient place to add progress reporting or parallelism, and mirrors the genuinely batched OpenAI version further down:
def batch_embed(texts: list[str], batch_size: int = 32) -> list[list[float]]:
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        for text in batch:  # still one HTTP request per text (see note above)
            embeddings.append(embed(text))
    return embeddings
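Newer Ollama releases also expose a batch endpoint (/api/embed, surfaced as ollama.embed() in recent versions of the Python library) that accepts a list of inputs in one call. A minimal sketch, assuming a recent Ollama version:
import ollama

# Assumes a recent Ollama release with the batch /api/embed endpoint.
response = ollama.embed(model="nomic-embed-text", input=["first text", "second text"])
vectors = response["embeddings"]  # one vector per input, in order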
With the OpenAI API you can pass a list directly to client.embeddings.create(input=batch) and get all embeddings in one network round-trip, which is significantly faster:
def batch_embed_openai(texts: list[str], batch_size: int = 100) -> list[list[float]]:
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=batch,
        )
        embeddings.extend([item.embedding for item in response.data])
    return embeddings
Practical: Deduplicate Documents
Embeddings make it easy to find near-duplicate content even when the wording differs:
def find_duplicates(documents: list[str], threshold: float = 0.95) -> list[tuple[int, int, float]]:
    embeddings = batch_embed(documents)
    duplicates = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):  # compare every pair once
            sim = cosine_similarity(embeddings[i], embeddings[j])
            if sim > threshold:
                duplicates.append((i, j, sim))
    return duplicates
docs = [
    "How do I install Python on Ubuntu?",
    "Steps to install Python on Ubuntu Linux",
    "What is the capital of France?",
]

dupes = find_duplicates(docs, threshold=0.90)
for i, j, sim in dupes:
    print(f"Duplicate ({sim:.2f}): '{docs[i]}' and '{docs[j]}'")
# Example output (exact score varies by model):
# Duplicate (0.94): 'How do I install Python on Ubuntu?' and 'Steps to install Python on Ubuntu Linux'
This is useful for cleaning training data, deduplicating knowledge bases, or flagging repeated support tickets.
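The nested loop above compares every pair in Python, which gets slow beyond a few thousand documents. The same pairwise comparison can be done as a single matrix product; a sketch using NumPy:
import numpy as np

E = np.array(batch_embed(documents))
E = E / np.linalg.norm(E, axis=1, keepdims=True)  # normalise each row to unit length
S = E @ E.T                                       # S[i, j] = cosine similarity of docs i and j
pairs = np.argwhere(np.triu(S, k=1) > 0.95)       # upper triangle only: each pair once, no self-matches
This still needs O(n^2) memory for the similarity matrix, so for very large collections use the pgvector approach instead.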
Choosing the Right Model
| Model | Dimensions | Cost | Best for |
|---|---|---|---|
| nomic-embed-text (Ollama) | 768 | Free (local) | Local dev, private data |
| text-embedding-3-small (OpenAI) | 1536 | Very cheap | Production, cloud |
| text-embedding-3-large (OpenAI) | 3072 | Cheap | High-accuracy retrieval |
| mxbai-embed-large (Ollama) | 1024 | Free (local) | Better quality local option |
For most applications, nomic-embed-text with Ollama is the right starting point: zero cost, runs offline, and good enough for semantic search over tens of thousands of documents. Switch to an OpenAI model when you need higher retrieval quality or are deploying somewhere that running a local model is impractical.
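Whichever model you choose, verify its output dimension before wiring it into storage, since the vector(N) column in the pgvector table must match it exactly:
dim = len(embed("dimension check"))
print(dim)  # e.g. 768 for nomic-embed-text, 1536 for text-embedding-3-small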