PostgreSQL Full-Text Search Tutorial 2026: tsvector, GIN Indexes, Ranking, and Autocomplete
Full-text search is one of those features that developers reach for Elasticsearch to solve — and often they do not need to. If your dataset is under ten million documents and you are already running PostgreSQL, you have a capable, production-ready search engine sitting right inside your database. No extra service to operate, no synchronization lag, no separate index to keep consistent. This tutorial walks through everything you need: the core data types, GIN indexing, relevance ranking, multi-column weighted search, multi-language support, and prefix autocomplete with pg_trgm.
All examples are tested against PostgreSQL 16 and PostgreSQL 17.
When PostgreSQL FTS Beats Elasticsearch
Before diving into SQL, it is worth being explicit about when you should choose PostgreSQL full-text search over a dedicated search engine.
Choose PostgreSQL FTS when:
- Your corpus is under 10 million documents. PostgreSQL's GIN indexes deliver sub-50 ms query times at this scale without tuning.
- You need transactional consistency. Search results reflect committed data instantly. There is no indexing pipeline to fall behind.
- You want to avoid operational overhead. Elasticsearch clusters require JVM heap tuning, shard rebalancing, and snapshot management. PostgreSQL is already in your stack.
- Your search needs are keyword-based. Stemming, stop-word removal, phrase search, and weighted ranking cover the vast majority of product, article, and document search use cases.
- Your team is small and your infrastructure budget is limited. One less moving part is a real advantage.
Choose Elasticsearch or OpenSearch when:
- You have more than 10 million documents and need horizontal scaling across shards.
- You need fuzzy matching at scale (edit-distance search across millions of docs).
- You need ML-based ranking (BM25+, dense vector retrieval, semantic search).
- You need rich faceted navigation (aggregations, nested filters, real-time facet counts).
- Your search workload would saturate the database server and starve your transactional queries of I/O.
For most SaaS applications, internal tools, e-commerce catalogs under a few million products, and knowledge bases, PostgreSQL is the right choice.
Core Concepts: tsvector and tsquery
PostgreSQL full-text search is built on two specialized data types.
tsvector
A tsvector is a sorted list of lexemes — normalized word stems — with optional positional information. PostgreSQL strips stop words (common words like "the", "a", "is"), applies a language-specific stemming algorithm, and stores the result.
SELECT to_tsvector('english', 'The quick brown foxes jumped over the lazy dogs');
Result:
'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
Notice: "The" and "over" are stop words and are removed. "foxes" becomes "fox", "jumped" becomes "jump", "lazy" becomes "lazi". Positions are stored so phrase queries can check word adjacency.
tsquery
A tsquery represents a search expression. It supports Boolean operators (& for AND, | for OR, ! for NOT) and phrase operators.
PostgreSQL provides four functions for building tsquery values:
| Function | Use case |
|---|---|
to_tsquery('english', 'fox & dog') | Explicit Boolean operators; terms are stemmed |
plainto_tsquery('english', 'quick brown fox') | Treats all terms as AND; safe for user input |
phraseto_tsquery('english', 'quick brown fox') | Requires terms in order (phrase match) |
websearch_to_tsquery('english', 'quick -dog "brown fox"') | Web-style syntax: quotes for phrases, minus for NOT |
websearch_to_tsquery is the best choice for a user-facing search box — it mirrors the syntax users already know from Google and does not raise exceptions on malformed input.
SELECT websearch_to_tsquery('english', 'postgresql "full text" -elasticsearch');
-- Result: 'postgresql' & 'full' <-> 'text' & !'elasticsearch'
The @@ Match Operator
The match operator @@ returns true when a tsvector satisfies a tsquery.
SELECT to_tsvector('english', 'PostgreSQL full-text search is powerful')
@@ websearch_to_tsquery('english', '"full text" search');
-- Result: t
This is the operator you put in WHERE clauses to filter rows.
Setting Up a Working Example
Create a table that stores articles:
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
body TEXT NOT NULL,
author TEXT,
language TEXT NOT NULL DEFAULT 'english',
created_at TIMESTAMPTZ DEFAULT NOW()
);
Insert some sample rows:
INSERT INTO articles (title, body, author) VALUES
('PostgreSQL Full-Text Search Guide',
'Learn how to use tsvector and tsquery for fast full-text search in PostgreSQL.',
'Alice'),
('Getting Started with Redis',
'Redis is an in-memory data structure store used as a database, cache, and message broker.',
'Bob'),
('Optimizing PostgreSQL Queries',
'EXPLAIN ANALYZE and index strategies for production PostgreSQL databases.',
'Alice'),
('Elasticsearch vs PostgreSQL',
'Comparing full-text search capabilities between Elasticsearch and PostgreSQL for mid-size datasets.',
'Carol');
A basic full-text search without any index:
SELECT id, title
FROM articles
WHERE to_tsvector(language, title || ' ' || body)
@@ websearch_to_tsquery('english', 'full text search postgresql');
This works, but it performs a full sequential scan — every row is converted to a tsvector at query time. On a million-row table this will take seconds. The fix is a stored tsvector column with a GIN index.
Adding a search_vector Column with Auto-Update Triggers
Store the precomputed tsvector in a dedicated column and keep it fresh with a trigger.
Step 1: Add the column
ALTER TABLE articles
ADD COLUMN search_vector TSVECTOR;
Step 2: Populate it for existing rows
UPDATE articles
SET search_vector = to_tsvector(language, title || ' ' || body);
Step 3: Create the trigger function
CREATE OR REPLACE FUNCTION articles_search_vector_update()
RETURNS TRIGGER AS $$
BEGIN
NEW.search_vector := to_tsvector(
NEW.language::regconfig,
coalesce(NEW.title, '') || ' ' || coalesce(NEW.body, '')
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Step 4: Attach the trigger
CREATE TRIGGER articles_search_vector_trigger
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_search_vector_update();
Now every INSERT or UPDATE automatically recomputes the search_vector. The application never needs to think about it.
Step 5: Query using the stored column
SELECT id, title
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'full text search');
GIN Index: The Key to Sub-50ms Search
A GIN (Generalized Inverted Index) index maps each lexeme to the set of rows that contain it. This is the same structure used by dedicated search engines. Without it, PostgreSQL must scan every row. With it, lookups jump directly to matching rows.
Create the GIN index
CREATE INDEX articles_search_vector_gin
ON articles USING GIN (search_vector);
Index creation is the only slow step — it is a one-time cost proportional to table size.
Benchmark: Before and After GIN Index on 1 Million Rows
To illustrate the impact, here are representative numbers from a standard cloud VM (4 vCPU, 16 GB RAM, PostgreSQL 16) with a 1-million-row articles table (average 500 words per body):
| Condition | Query time (median) | Query time (p99) |
|---|---|---|
| Sequential scan, computed tsvector | ~3,200 ms | ~4,100 ms |
Stored search_vector, no index | ~2,800 ms | ~3,600 ms |
Stored search_vector + GIN index | ~8 ms | ~42 ms |
The GIN index delivers a 350x speedup. The stored column alone gives a modest improvement because it avoids recomputing to_tsvector at query time, but the index is what makes the query fast.
EXPLAIN ANALYZE output with the index looks like:
Bitmap Heap Scan on articles (cost=84.25..312.40 rows=98 width=...)
Recheck Cond: (search_vector @@ websearch_to_tsquery('english', '...'))
-> Bitmap Index Scan on articles_search_vector_gin
(cost=0.00..84.23 rows=98 width=0)
Index Cond: (search_vector @@ websearch_to_tsquery(...))
The planner uses a Bitmap Index Scan — it walks the GIN index, collects matching page numbers, and then does a single heap scan. No sequential scan.
Reference: PostgreSQL documentation — GIN indexes
Relevance Ranking with ts_rank and ts_rank_cd
Matching rows is only half the battle — ordering them by relevance is what users notice.
ts_rank
ts_rank computes a score based on how frequently the query terms appear in the document. Higher frequency means higher rank.
SELECT
id,
title,
ts_rank(search_vector, websearch_to_tsquery('english', 'postgresql search')) AS rank
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql search')
ORDER BY rank DESC;
ts_rank_cd (Cover Density)
ts_rank_cd also considers how close query terms are to each other. Documents where "postgresql" and "search" appear near each other score higher than documents where they are paragraphs apart.
SELECT
id,
title,
ts_rank_cd(search_vector, websearch_to_tsquery('english', 'postgresql search')) AS rank
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql search')
ORDER BY rank DESC;
Both functions accept a normalization integer as a third argument:
-- Divide rank by document length (avoid bias toward long documents)
ts_rank(search_vector, query, 1)
-- Divide rank by number of unique words
ts_rank(search_vector, query, 8)
For most applications, ts_rank_cd with normalization by document length (1) gives the most intuitive results.
Highlighting Results with ts_headline
Show users exactly why their query matched by highlighting the relevant excerpt:
SELECT
id,
title,
ts_headline(
'english',
body,
websearch_to_tsquery('english', 'full text search'),
'MaxWords=35, MinWords=15, StartSel=<mark>, StopSel=</mark>'
) AS excerpt
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'full text search')
ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', 'full text search')) DESC;
ts_headline scans the original text (not the tsvector), finds the fragment with the highest density of matching terms, and wraps matches in your chosen markers. The options control snippet length and HTML delimiters.
Important: ts_headline is CPU-intensive. Run it only on rows already filtered by the WHERE clause, never on the full table.
Multi-Column Weighted Search
Not all text is equal. A match in the title should outrank a match buried in the body. PostgreSQL's setweight function assigns a weight label to each part of the tsvector:
| Weight label | Default rank multiplier |
|---|---|
| A | 1.0 |
| B | 0.4 |
| C | 0.2 |
| D | 0.1 |
Building a weighted search vector
Update the trigger function to apply weights:
CREATE OR REPLACE FUNCTION articles_search_vector_update()
RETURNS TRIGGER AS $$
BEGIN
NEW.search_vector :=
setweight(to_tsvector(NEW.language::regconfig, coalesce(NEW.title, '')), 'A')
||
setweight(to_tsvector(NEW.language::regconfig, coalesce(NEW.author, '')), 'B')
||
setweight(to_tsvector(NEW.language::regconfig, coalesce(NEW.body, '')), 'C');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Re-populate existing rows:
UPDATE articles SET title = title; -- triggers re-computation via the trigger
Now ts_rank and ts_rank_cd automatically prefer title matches over body matches. An article whose title contains "postgresql search" will rank above one that only mentions it in a long body paragraph.
You can also build the weighted vector ad hoc in a query without a trigger:
SELECT
id,
title,
ts_rank_cd(
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body), 'C'),
websearch_to_tsquery('english', 'postgresql')
) AS rank
FROM articles
WHERE
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body), 'C')
@@ websearch_to_tsquery('english', 'postgresql')
ORDER BY rank DESC;
The ad hoc approach skips the index — use the stored column with a GIN index in production.
Multi-Language Support
A single dictionary does not work for all content. The language column in the example table already stores the per-row language name. Use it as a regconfig cast:
-- Per-row language from the table
UPDATE articles
SET search_vector =
setweight(to_tsvector(language::regconfig, coalesce(title, '')), 'A') ||
setweight(to_tsvector(language::regconfig, coalesce(body, '')), 'C');
Detecting language automatically
If your application does not know the language at insert time, detect it before inserting using a library like langdetect (Python) or franc (Node.js), and store the result in the language column. The PostgreSQL language column value must match a text search configuration name — run this to see available configurations:
SELECT cfgname FROM pg_ts_config ORDER BY cfgname;
-- english, french, german, spanish, portuguese, italian, russian, ...
Querying across languages
When users search across a multilingual corpus, run the query against a simple dictionary (no stemming) or fan out to multiple language-specific queries:
-- 'simple' dictionary: no stemming, just lowercasing
WHERE search_vector @@ to_tsquery('simple', 'recherche')
-- Or fan out: match either French or English stemming
WHERE search_vector @@ to_tsquery('french', 'recherche')
OR search_vector @@ to_tsquery('english', 'research')
Reference: PostgreSQL text search dictionaries
Autocomplete and Prefix Search with pg_trgm
Full-text search with tsvector does not support prefix matching natively for partial words. For type-ahead autocomplete — where users type "postg" and expect "postgresql" to appear — the pg_trgm extension is the right tool.
What is a trigram?
A trigram is a sequence of three consecutive characters. The string "quick" produces the trigrams: q, qu, qui, uic, ick, ck. PostgreSQL's pg_trgm extension uses trigram overlap to measure string similarity, and — crucially — can use a GIN or GiST index to accelerate LIKE, ILIKE, and ~ queries.
Enable the extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;
Create a trigram GIN index
-- Index the title column for fast prefix/substring matching
CREATE INDEX articles_title_trgm_gin
ON articles USING GIN (title gin_trgm_ops);
-- Index the body column if you need substring search there too
CREATE INDEX articles_body_trgm_gin
ON articles USING GIN (body gin_trgm_ops);
Fast ILIKE queries
Without pg_trgm, ILIKE '%postg%' forces a sequential scan. With the trigram GIN index, it becomes an index scan:
-- Autocomplete: find titles containing the typed prefix
SELECT id, title
FROM articles
WHERE title ILIKE '%postg%'
ORDER BY similarity(title, 'postg') DESC
LIMIT 10;
similarity() returns a value between 0 and 1. Combined with ORDER BY similarity(...) DESC, results are ranked by how closely the title resembles the query — a simple and effective autocomplete relevance signal.
Similarity threshold search
For fuzzy matching (catching typos), use the % similarity operator:
-- Set similarity threshold (default 0.3)
SET pg_trgm.similarity_threshold = 0.3;
SELECT id, title, similarity(title, 'postgress') AS sim
FROM articles
WHERE title % 'postgress' -- 'postgress' is a typo of 'postgresql'
ORDER BY sim DESC
LIMIT 10;
Reference: PostgreSQL pg_trgm documentation
Combining FTS and trigram search
A common production pattern is to use both:
- FTS (
@@operator, GIN onsearch_vector) for ranked keyword search in the main search path. - Trigram index (
ILIKE,%) for the autocomplete dropdown as the user types.
They use separate indexes and can coexist on the same table without conflict.
Integration with Python and SQLAlchemy
Raw psycopg2
import psycopg2
conn = psycopg2.connect("dbname=mydb user=myuser")
cur = conn.cursor()
query = "postgresql performance"
cur.execute("""
SELECT
id,
title,
ts_rank_cd(search_vector, websearch_to_tsquery('english', %s)) AS rank,
ts_headline('english', body, websearch_to_tsquery('english', %s),
'MaxWords=30, MinWords=10, StartSel=<b>, StopSel=</b>') AS excerpt
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', %s)
ORDER BY rank DESC
LIMIT 20
""", (query, query, query))
results = cur.fetchall()
for row in results:
print(f"[{row[2]:.4f}] {row[1]}\n {row[3]}\n")
SQLAlchemy (Core / ORM)
SQLAlchemy does not ship with first-class tsvector support, but the sqlalchemy-utils package provides it, or you can use func and cast directly:
from sqlalchemy import Column, Integer, Text, func
from sqlalchemy.orm import DeclarativeBase, Session
from sqlalchemy.dialects.postgresql import TSVECTOR
class Base(DeclarativeBase):
pass
class Article(Base):
__tablename__ = "articles"
id = Column(Integer, primary_key=True)
title = Column(Text, nullable=False)
body = Column(Text, nullable=False)
language = Column(Text, nullable=False, default="english")
search_vector = Column(TSVECTOR)
def search_articles(session: Session, user_query: str, limit: int = 20):
tsquery = func.websearch_to_tsquery("english", user_query)
rank = func.ts_rank_cd(Article.search_vector, tsquery)
return (
session.query(Article, rank.label("rank"))
.filter(Article.search_vector.op("@@")(tsquery))
.order_by(rank.desc())
.limit(limit)
.all()
)
Autocomplete endpoint example (FastAPI)
from fastapi import FastAPI, Query, Depends
from sqlalchemy import text
from sqlalchemy.orm import Session
app = FastAPI()
@app.get("/autocomplete")
def autocomplete(q: str = Query(..., min_length=2), db: Session = Depends(get_db)):
rows = db.execute(
text("""
SELECT id, title, similarity(title, :q) AS sim
FROM articles
WHERE title ILIKE :pattern
ORDER BY sim DESC
LIMIT 8
"""),
{"q": q, "pattern": f"%{q}%"}
).fetchall()
return [{"id": r.id, "title": r.title} for r in rows]
pg_trgm vs Full-Text Search: When to Use Each
| Capability | tsvector / FTS | pg_trgm |
|---|---|---|
| Keyword search with stemming | Yes | No |
| Stop word removal | Yes | No |
Relevance ranking (ts_rank) | Yes | Via similarity() |
| Phrase search | Yes (phraseto_tsquery) | Partial (substring) |
| Prefix / substring autocomplete | Limited | Yes (ILIKE '%term%') |
| Fuzzy / typo tolerance | No | Yes (% operator) |
| Multi-language stemming | Yes | No |
| Index type | GIN on TSVECTOR column | GIN/GiST on TEXT with gin_trgm_ops |
| Best for | Full document search | Autocomplete, fuzzy name matching |
Use FTS for your main search feature. Use pg_trgm for autocomplete dropdowns and fuzzy name lookups. Use both together for a complete search experience.
When to Move to Elasticsearch
PostgreSQL full-text search is excellent up to a point. Here are the concrete signals that it is time to graduate to a dedicated search engine:
Corpus size exceeds 10 million documents. GIN indexes become large, updates slow down, and query latency creeps up. Elasticsearch distributes shards across nodes horizontally.
You need fuzzy search at scale.
pg_trgmfuzzy matching does not scale as gracefully as Elasticsearch's BK-tree-based fuzzy queries on hundreds of millions of documents.You need ML ranking. Elasticsearch's
learning_to_rankplugin and vector search (knnqueries with dense embeddings) enable semantic search that PostgreSQL cannot replicate natively.You need faceted aggregations. Counting results by category, date histogram buckets, or nested attribute values is Elasticsearch's core strength. PostgreSQL aggregations work but become expensive when the filtered result set is large.
Search workload threatens your OLTP performance. Full-text queries are CPU-intensive. On a shared PostgreSQL instance, a spike in search traffic can starve your transactional workload. A separate Elasticsearch cluster isolates the two.
If none of these apply, stay with PostgreSQL. Every engineer who has operated an Elasticsearch cluster will tell you: the operational savings of not running one are significant.
Complete Schema Reference
Here is the complete schema for a production-ready articles search system:
-- Enable pg_trgm for autocomplete
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Main table
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
body TEXT NOT NULL,
author TEXT,
language TEXT NOT NULL DEFAULT 'english',
search_vector TSVECTOR,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- GIN index for full-text search
CREATE INDEX articles_search_vector_gin
ON articles USING GIN (search_vector);
-- GIN trigram index for autocomplete on title
CREATE INDEX articles_title_trgm_gin
ON articles USING GIN (title gin_trgm_ops);
-- Trigger function: weighted multi-column search vector
CREATE OR REPLACE FUNCTION articles_search_vector_update()
RETURNS TRIGGER AS $$
BEGIN
NEW.search_vector :=
setweight(to_tsvector(NEW.language::regconfig, coalesce(NEW.title, '')), 'A') ||
setweight(to_tsvector(NEW.language::regconfig, coalesce(NEW.author, '')), 'B') ||
setweight(to_tsvector(NEW.language::regconfig, coalesce(NEW.body, '')), 'C');
NEW.updated_at := NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Attach trigger
CREATE TRIGGER articles_search_vector_trigger
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_search_vector_update();
Summary
PostgreSQL full-text search is a mature, production-ready system that covers the search needs of most applications:
tsvectorstores precomputed, stemmed lexemes with positions.tsqueryrepresents Boolean search expressions. Usewebsearch_to_tsqueryfor user input.- The
@@operator matches atsvectoragainst atsquery. - Stored
search_vectorcolumns with triggers keep the index data current without application changes. - GIN indexes reduce query time from seconds to single-digit milliseconds on million-row tables.
ts_rankandts_rank_cdrank results by relevance;setweightbiases ranking toward title matches.ts_headlinegenerates highlighted excerpts for the result list.pg_trgmadds fastILIKEand fuzzy matching for autocomplete, using a separate GIN index.- Multi-language search works per row by storing the language name and casting it to
regconfig.
When your corpus grows beyond 10 million documents, or when you need ML ranking, faceted aggregations, or dedicated horizontal scaling, that is the right time to introduce Elasticsearch. Until then, PostgreSQL is all you need.