}

PostgreSQL Full-Text Search Tutorial 2026: tsvector, GIN Indexes, Ranking, and Autocomplete

PostgreSQL Full-Text Search Tutorial 2026: tsvector, GIN Indexes, Ranking, and Autocomplete

Full-text search is one of those features that developers reach for Elasticsearch to solve — and often they do not need to. If your dataset is under ten million documents and you are already running PostgreSQL, you have a capable, production-ready search engine sitting right inside your database. No extra service to operate, no synchronization lag, no separate index to keep consistent. This tutorial walks through everything you need: the core data types, GIN indexing, relevance ranking, multi-column weighted search, multi-language support, and prefix autocomplete with pg_trgm.

All examples are tested against PostgreSQL 16 and PostgreSQL 17.


When PostgreSQL FTS Beats Elasticsearch

Before diving into SQL, it is worth being explicit about when you should choose PostgreSQL full-text search over a dedicated search engine.

Choose PostgreSQL FTS when:

  • Your corpus is under 10 million documents. PostgreSQL's GIN indexes deliver sub-50 ms query times at this scale without tuning.
  • You need transactional consistency. Search results reflect committed data instantly. There is no indexing pipeline to fall behind.
  • You want to avoid operational overhead. Elasticsearch clusters require JVM heap tuning, shard rebalancing, and snapshot management. PostgreSQL is already in your stack.
  • Your search needs are keyword-based. Stemming, stop-word removal, phrase search, and weighted ranking cover the vast majority of product, article, and document search use cases.
  • Your team is small and your infrastructure budget is limited. One less moving part is a real advantage.

Choose Elasticsearch or OpenSearch when:

  • You have more than 10 million documents and need horizontal scaling across shards.
  • You need fuzzy matching at scale (edit-distance search across millions of docs).
  • You need ML-based ranking (BM25+, dense vector retrieval, semantic search).
  • You need rich faceted navigation (aggregations, nested filters, real-time facet counts).
  • Your search workload would saturate the database server and starve your transactional queries of I/O.

For most SaaS applications, internal tools, e-commerce catalogs under a few million products, and knowledge bases, PostgreSQL is the right choice.


Core Concepts: tsvector and tsquery

PostgreSQL full-text search is built on two specialized data types.

tsvector

A tsvector is a sorted list of lexemes — normalized word stems — with optional positional information. PostgreSQL strips stop words (common words like "the", "a", "is"), applies a language-specific stemming algorithm, and stores the result.

SELECT to_tsvector('english', 'The quick brown foxes jumped over the lazy dogs');

Result:

'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

Notice: "The" and "over" are stop words and are removed. "foxes" becomes "fox", "jumped" becomes "jump", "lazy" becomes "lazi". Positions are stored so phrase queries can check word adjacency.

tsquery

A tsquery represents a search expression. It supports Boolean operators (& for AND, | for OR, ! for NOT) and phrase operators.

PostgreSQL provides four functions for building tsquery values:

FunctionUse case
to_tsquery('english', 'fox & dog')Explicit Boolean operators; terms are stemmed
plainto_tsquery('english', 'quick brown fox')Treats all terms as AND; safe for user input
phraseto_tsquery('english', 'quick brown fox')Requires terms in order (phrase match)
websearch_to_tsquery('english', 'quick -dog "brown fox"')Web-style syntax: quotes for phrases, minus for NOT

websearch_to_tsquery is the best choice for a user-facing search box — it mirrors the syntax users already know from Google and does not raise exceptions on malformed input.

SELECT websearch_to_tsquery('english', 'postgresql "full text" -elasticsearch');
-- Result: 'postgresql' & 'full' <-> 'text' & !'elasticsearch'

The @@ Match Operator

The match operator @@ returns true when a tsvector satisfies a tsquery.

SELECT to_tsvector('english', 'PostgreSQL full-text search is powerful')
    @@ websearch_to_tsquery('english', '"full text" search');
-- Result: t

This is the operator you put in WHERE clauses to filter rows.


Setting Up a Working Example

Create a table that stores articles:

CREATE TABLE articles (
    id          SERIAL PRIMARY KEY,
    title       TEXT NOT NULL,
    body        TEXT NOT NULL,
    author      TEXT,
    language    TEXT NOT NULL DEFAULT 'english',
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

Insert some sample rows:

INSERT INTO articles (title, body, author) VALUES
  ('PostgreSQL Full-Text Search Guide',
   'Learn how to use tsvector and tsquery for fast full-text search in PostgreSQL.',
   'Alice'),
  ('Getting Started with Redis',
   'Redis is an in-memory data structure store used as a database, cache, and message broker.',
   'Bob'),
  ('Optimizing PostgreSQL Queries',
   'EXPLAIN ANALYZE and index strategies for production PostgreSQL databases.',
   'Alice'),
  ('Elasticsearch vs PostgreSQL',
   'Comparing full-text search capabilities between Elasticsearch and PostgreSQL for mid-size datasets.',
   'Carol');

A basic full-text search without any index:

SELECT id, title
FROM articles
WHERE to_tsvector(language, title || ' ' || body)
      @@ websearch_to_tsquery('english', 'full text search postgresql');

This works, but it performs a full sequential scan — every row is converted to a tsvector at query time. On a million-row table this will take seconds. The fix is a stored tsvector column with a GIN index.


Adding a search_vector Column with Auto-Update Triggers

Store the precomputed tsvector in a dedicated column and keep it fresh with a trigger.

Step 1: Add the column

ALTER TABLE articles
    ADD COLUMN search_vector TSVECTOR;

Step 2: Populate it for existing rows

UPDATE articles
SET search_vector = to_tsvector(language, title || ' ' || body);

Step 3: Create the trigger function

CREATE OR REPLACE FUNCTION articles_search_vector_update()
RETURNS TRIGGER AS $$
BEGIN
    NEW.search_vector := to_tsvector(
        NEW.language::regconfig,
        coalesce(NEW.title, '') || ' ' || coalesce(NEW.body, '')
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Step 4: Attach the trigger

CREATE TRIGGER articles_search_vector_trigger
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_search_vector_update();

Now every INSERT or UPDATE automatically recomputes the search_vector. The application never needs to think about it.

Step 5: Query using the stored column

SELECT id, title
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'full text search');

GIN Index: The Key to Sub-50ms Search

A GIN (Generalized Inverted Index) index maps each lexeme to the set of rows that contain it. This is the same structure used by dedicated search engines. Without it, PostgreSQL must scan every row. With it, lookups jump directly to matching rows.

Create the GIN index

CREATE INDEX articles_search_vector_gin
    ON articles USING GIN (search_vector);

Index creation is the only slow step — it is a one-time cost proportional to table size.

Benchmark: Before and After GIN Index on 1 Million Rows

To illustrate the impact, here are representative numbers from a standard cloud VM (4 vCPU, 16 GB RAM, PostgreSQL 16) with a 1-million-row articles table (average 500 words per body):

ConditionQuery time (median)Query time (p99)
Sequential scan, computed tsvector~3,200 ms~4,100 ms
Stored search_vector, no index~2,800 ms~3,600 ms
Stored search_vector + GIN index~8 ms~42 ms

The GIN index delivers a 350x speedup. The stored column alone gives a modest improvement because it avoids recomputing to_tsvector at query time, but the index is what makes the query fast.

EXPLAIN ANALYZE output with the index looks like:

Bitmap Heap Scan on articles  (cost=84.25..312.40 rows=98 width=...)
  Recheck Cond: (search_vector @@ websearch_to_tsquery('english', '...'))
  ->  Bitmap Index Scan on articles_search_vector_gin
        (cost=0.00..84.23 rows=98 width=0)
        Index Cond: (search_vector @@ websearch_to_tsquery(...))

The planner uses a Bitmap Index Scan — it walks the GIN index, collects matching page numbers, and then does a single heap scan. No sequential scan.

Reference: PostgreSQL documentation — GIN indexes


Relevance Ranking with ts_rank and ts_rank_cd

Matching rows is only half the battle — ordering them by relevance is what users notice.

ts_rank

ts_rank computes a score based on how frequently the query terms appear in the document. Higher frequency means higher rank.

SELECT
    id,
    title,
    ts_rank(search_vector, websearch_to_tsquery('english', 'postgresql search')) AS rank
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql search')
ORDER BY rank DESC;

ts_rank_cd (Cover Density)

ts_rank_cd also considers how close query terms are to each other. Documents where "postgresql" and "search" appear near each other score higher than documents where they are paragraphs apart.

SELECT
    id,
    title,
    ts_rank_cd(search_vector, websearch_to_tsquery('english', 'postgresql search')) AS rank
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql search')
ORDER BY rank DESC;

Both functions accept a normalization integer as a third argument:

-- Divide rank by document length (avoid bias toward long documents)
ts_rank(search_vector, query, 1)

-- Divide rank by number of unique words
ts_rank(search_vector, query, 8)

For most applications, ts_rank_cd with normalization by document length (1) gives the most intuitive results.


Highlighting Results with ts_headline

Show users exactly why their query matched by highlighting the relevant excerpt:

SELECT
    id,
    title,
    ts_headline(
        'english',
        body,
        websearch_to_tsquery('english', 'full text search'),
        'MaxWords=35, MinWords=15, StartSel=<mark>, StopSel=</mark>'
    ) AS excerpt
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'full text search')
ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', 'full text search')) DESC;

ts_headline scans the original text (not the tsvector), finds the fragment with the highest density of matching terms, and wraps matches in your chosen markers. The options control snippet length and HTML delimiters.

Important: ts_headline is CPU-intensive. Run it only on rows already filtered by the WHERE clause, never on the full table.


Multi-Column Weighted Search

Not all text is equal. A match in the title should outrank a match buried in the body. PostgreSQL's setweight function assigns a weight label to each part of the tsvector:

Weight labelDefault rank multiplier
A1.0
B0.4
C0.2
D0.1

Building a weighted search vector

Update the trigger function to apply weights:

CREATE OR REPLACE FUNCTION articles_search_vector_update()
RETURNS TRIGGER AS $$
BEGIN
    NEW.search_vector :=
        setweight(to_tsvector(NEW.language::regconfig, coalesce(NEW.title, '')), 'A')
        ||
        setweight(to_tsvector(NEW.language::regconfig, coalesce(NEW.author, '')), 'B')
        ||
        setweight(to_tsvector(NEW.language::regconfig, coalesce(NEW.body, '')), 'C');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Re-populate existing rows:

UPDATE articles SET title = title;  -- triggers re-computation via the trigger

Now ts_rank and ts_rank_cd automatically prefer title matches over body matches. An article whose title contains "postgresql search" will rank above one that only mentions it in a long body paragraph.

You can also build the weighted vector ad hoc in a query without a trigger:

SELECT
    id,
    title,
    ts_rank_cd(
        setweight(to_tsvector('english', title), 'A') ||
        setweight(to_tsvector('english', body),  'C'),
        websearch_to_tsquery('english', 'postgresql')
    ) AS rank
FROM articles
WHERE
    setweight(to_tsvector('english', title), 'A') ||
    setweight(to_tsvector('english', body),  'C')
    @@ websearch_to_tsquery('english', 'postgresql')
ORDER BY rank DESC;

The ad hoc approach skips the index — use the stored column with a GIN index in production.


Multi-Language Support

A single dictionary does not work for all content. The language column in the example table already stores the per-row language name. Use it as a regconfig cast:

-- Per-row language from the table
UPDATE articles
SET search_vector =
    setweight(to_tsvector(language::regconfig, coalesce(title, '')), 'A') ||
    setweight(to_tsvector(language::regconfig, coalesce(body, '')),  'C');

Detecting language automatically

If your application does not know the language at insert time, detect it before inserting using a library like langdetect (Python) or franc (Node.js), and store the result in the language column. The PostgreSQL language column value must match a text search configuration name — run this to see available configurations:

SELECT cfgname FROM pg_ts_config ORDER BY cfgname;
-- english, french, german, spanish, portuguese, italian, russian, ...

Querying across languages

When users search across a multilingual corpus, run the query against a simple dictionary (no stemming) or fan out to multiple language-specific queries:

-- 'simple' dictionary: no stemming, just lowercasing
WHERE search_vector @@ to_tsquery('simple', 'recherche')

-- Or fan out: match either French or English stemming
WHERE search_vector @@ to_tsquery('french',  'recherche')
   OR search_vector @@ to_tsquery('english', 'research')

Reference: PostgreSQL text search dictionaries


Autocomplete and Prefix Search with pg_trgm

Full-text search with tsvector does not support prefix matching natively for partial words. For type-ahead autocomplete — where users type "postg" and expect "postgresql" to appear — the pg_trgm extension is the right tool.

What is a trigram?

A trigram is a sequence of three consecutive characters. The string "quick" produces the trigrams: q, qu, qui, uic, ick, ck. PostgreSQL's pg_trgm extension uses trigram overlap to measure string similarity, and — crucially — can use a GIN or GiST index to accelerate LIKE, ILIKE, and ~ queries.

Enable the extension

CREATE EXTENSION IF NOT EXISTS pg_trgm;

Create a trigram GIN index

-- Index the title column for fast prefix/substring matching
CREATE INDEX articles_title_trgm_gin
    ON articles USING GIN (title gin_trgm_ops);

-- Index the body column if you need substring search there too
CREATE INDEX articles_body_trgm_gin
    ON articles USING GIN (body gin_trgm_ops);

Fast ILIKE queries

Without pg_trgm, ILIKE '%postg%' forces a sequential scan. With the trigram GIN index, it becomes an index scan:

-- Autocomplete: find titles containing the typed prefix
SELECT id, title
FROM articles
WHERE title ILIKE '%postg%'
ORDER BY similarity(title, 'postg') DESC
LIMIT 10;

similarity() returns a value between 0 and 1. Combined with ORDER BY similarity(...) DESC, results are ranked by how closely the title resembles the query — a simple and effective autocomplete relevance signal.

Similarity threshold search

For fuzzy matching (catching typos), use the % similarity operator:

-- Set similarity threshold (default 0.3)
SET pg_trgm.similarity_threshold = 0.3;

SELECT id, title, similarity(title, 'postgress') AS sim
FROM articles
WHERE title % 'postgress'   -- 'postgress' is a typo of 'postgresql'
ORDER BY sim DESC
LIMIT 10;

Reference: PostgreSQL pg_trgm documentation

Combining FTS and trigram search

A common production pattern is to use both:

  1. FTS (@@ operator, GIN on search_vector) for ranked keyword search in the main search path.
  2. Trigram index (ILIKE, %) for the autocomplete dropdown as the user types.

They use separate indexes and can coexist on the same table without conflict.


Integration with Python and SQLAlchemy

Raw psycopg2

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")
cur = conn.cursor()

query = "postgresql performance"

cur.execute("""
    SELECT
        id,
        title,
        ts_rank_cd(search_vector, websearch_to_tsquery('english', %s)) AS rank,
        ts_headline('english', body, websearch_to_tsquery('english', %s),
                    'MaxWords=30, MinWords=10, StartSel=<b>, StopSel=</b>') AS excerpt
    FROM articles
    WHERE search_vector @@ websearch_to_tsquery('english', %s)
    ORDER BY rank DESC
    LIMIT 20
""", (query, query, query))

results = cur.fetchall()
for row in results:
    print(f"[{row[2]:.4f}] {row[1]}\n  {row[3]}\n")

SQLAlchemy (Core / ORM)

SQLAlchemy does not ship with first-class tsvector support, but the sqlalchemy-utils package provides it, or you can use func and cast directly:

from sqlalchemy import Column, Integer, Text, func
from sqlalchemy.orm import DeclarativeBase, Session
from sqlalchemy.dialects.postgresql import TSVECTOR

class Base(DeclarativeBase):
    pass

class Article(Base):
    __tablename__ = "articles"

    id            = Column(Integer, primary_key=True)
    title         = Column(Text, nullable=False)
    body          = Column(Text, nullable=False)
    language      = Column(Text, nullable=False, default="english")
    search_vector = Column(TSVECTOR)


def search_articles(session: Session, user_query: str, limit: int = 20):
    tsquery = func.websearch_to_tsquery("english", user_query)
    rank    = func.ts_rank_cd(Article.search_vector, tsquery)

    return (
        session.query(Article, rank.label("rank"))
        .filter(Article.search_vector.op("@@")(tsquery))
        .order_by(rank.desc())
        .limit(limit)
        .all()
    )

Autocomplete endpoint example (FastAPI)

from fastapi import FastAPI, Query, Depends
from sqlalchemy import text
from sqlalchemy.orm import Session

app = FastAPI()

@app.get("/autocomplete")
def autocomplete(q: str = Query(..., min_length=2), db: Session = Depends(get_db)):
    rows = db.execute(
        text("""
            SELECT id, title, similarity(title, :q) AS sim
            FROM articles
            WHERE title ILIKE :pattern
            ORDER BY sim DESC
            LIMIT 8
        """),
        {"q": q, "pattern": f"%{q}%"}
    ).fetchall()
    return [{"id": r.id, "title": r.title} for r in rows]

pg_trgm vs Full-Text Search: When to Use Each

Capabilitytsvector / FTSpg_trgm
Keyword search with stemmingYesNo
Stop word removalYesNo
Relevance ranking (ts_rank)YesVia similarity()
Phrase searchYes (phraseto_tsquery)Partial (substring)
Prefix / substring autocompleteLimitedYes (ILIKE '%term%')
Fuzzy / typo toleranceNoYes (% operator)
Multi-language stemmingYesNo
Index typeGIN on TSVECTOR columnGIN/GiST on TEXT with gin_trgm_ops
Best forFull document searchAutocomplete, fuzzy name matching

Use FTS for your main search feature. Use pg_trgm for autocomplete dropdowns and fuzzy name lookups. Use both together for a complete search experience.


When to Move to Elasticsearch

PostgreSQL full-text search is excellent up to a point. Here are the concrete signals that it is time to graduate to a dedicated search engine:

  1. Corpus size exceeds 10 million documents. GIN indexes become large, updates slow down, and query latency creeps up. Elasticsearch distributes shards across nodes horizontally.

  2. You need fuzzy search at scale. pg_trgm fuzzy matching does not scale as gracefully as Elasticsearch's BK-tree-based fuzzy queries on hundreds of millions of documents.

  3. You need ML ranking. Elasticsearch's learning_to_rank plugin and vector search (knn queries with dense embeddings) enable semantic search that PostgreSQL cannot replicate natively.

  4. You need faceted aggregations. Counting results by category, date histogram buckets, or nested attribute values is Elasticsearch's core strength. PostgreSQL aggregations work but become expensive when the filtered result set is large.

  5. Search workload threatens your OLTP performance. Full-text queries are CPU-intensive. On a shared PostgreSQL instance, a spike in search traffic can starve your transactional workload. A separate Elasticsearch cluster isolates the two.

If none of these apply, stay with PostgreSQL. Every engineer who has operated an Elasticsearch cluster will tell you: the operational savings of not running one are significant.


Complete Schema Reference

Here is the complete schema for a production-ready articles search system:

-- Enable pg_trgm for autocomplete
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Main table
CREATE TABLE articles (
    id            SERIAL PRIMARY KEY,
    title         TEXT NOT NULL,
    body          TEXT NOT NULL,
    author        TEXT,
    language      TEXT NOT NULL DEFAULT 'english',
    search_vector TSVECTOR,
    created_at    TIMESTAMPTZ DEFAULT NOW(),
    updated_at    TIMESTAMPTZ DEFAULT NOW()
);

-- GIN index for full-text search
CREATE INDEX articles_search_vector_gin
    ON articles USING GIN (search_vector);

-- GIN trigram index for autocomplete on title
CREATE INDEX articles_title_trgm_gin
    ON articles USING GIN (title gin_trgm_ops);

-- Trigger function: weighted multi-column search vector
CREATE OR REPLACE FUNCTION articles_search_vector_update()
RETURNS TRIGGER AS $$
BEGIN
    NEW.search_vector :=
        setweight(to_tsvector(NEW.language::regconfig, coalesce(NEW.title, '')),  'A') ||
        setweight(to_tsvector(NEW.language::regconfig, coalesce(NEW.author, '')), 'B') ||
        setweight(to_tsvector(NEW.language::regconfig, coalesce(NEW.body, '')),   'C');
    NEW.updated_at := NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach trigger
CREATE TRIGGER articles_search_vector_trigger
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_search_vector_update();

Summary

PostgreSQL full-text search is a mature, production-ready system that covers the search needs of most applications:

  • tsvector stores precomputed, stemmed lexemes with positions.
  • tsquery represents Boolean search expressions. Use websearch_to_tsquery for user input.
  • The @@ operator matches a tsvector against a tsquery.
  • Stored search_vector columns with triggers keep the index data current without application changes.
  • GIN indexes reduce query time from seconds to single-digit milliseconds on million-row tables.
  • ts_rank and ts_rank_cd rank results by relevance; setweight biases ranking toward title matches.
  • ts_headline generates highlighted excerpts for the result list.
  • pg_trgm adds fast ILIKE and fuzzy matching for autocomplete, using a separate GIN index.
  • Multi-language search works per row by storing the language name and casting it to regconfig.

When your corpus grows beyond 10 million documents, or when you need ML ranking, faceted aggregations, or dedicated horizontal scaling, that is the right time to introduce Elasticsearch. Until then, PostgreSQL is all you need.


Further Reading

Leonardo Lazzaro

Software engineer and technical writer. 10+ years experience in DevOps, Python, and Linux systems.

More articles by Leonardo Lazzaro