PostgreSQL EXPLAIN ANALYZE: Fix Slow Queries with Proper Indexes (2026)

PostgreSQL has held the top spot as the most admired database management system in the Stack Overflow Developer Survey for several consecutive years. Part of what earns that reputation is a world-class query planner and a rich set of tools for understanding exactly what it is doing. The most important of those tools is EXPLAIN ANALYZE — the command that reveals not just what the planner intends to do, but what it actually did when it ran.

This tutorial covers reading query plans from first principles, choosing the right index type for each access pattern, and working through a realistic case study that takes a 30-second query down to 50 milliseconds.

EXPLAIN vs EXPLAIN ANALYZE

EXPLAIN shows the query plan the planner intends to use without executing the query. It is completely free to run — you can call it on an UPDATE or DELETE without touching any data.

EXPLAIN SELECT * FROM orders WHERE user_id = 42;

EXPLAIN ANALYZE actually executes the query and reports both the planner's estimates and the real measurements:

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;

Adding the BUFFERS option reports shared-buffer hits and reads for each node, which is invaluable for understanding I/O behaviour:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 42;

For queries that modify data, wrap them in a transaction you roll back so the analysis runs without side effects:

BEGIN;
EXPLAIN ANALYZE DELETE FROM sessions WHERE expires_at < NOW();
ROLLBACK;

Reading EXPLAIN ANALYZE Output

A typical output looks like this:

Seq Scan on orders  (cost=0.00..18340.00 rows=450000 width=64)
                    (actual time=0.042..312.741 rows=450000 loops=1)
Planning Time: 0.8 ms
Execution Time: 342.5 ms

Each node in the plan carries two sets of numbers: the planner's estimates and the actual measurements:

  • cost=startup..total — the planner's estimate. Startup cost is the time before the first row is returned; total cost is the time to return all rows. These are in abstract "cost units", not milliseconds.
  • rows — how many rows the planner expects this node to return.
  • actual time=startup..total — real elapsed time in milliseconds, measured during execution.
  • loops — how many times this node was executed (relevant for nested loops and subqueries). The actual time shown is per loop; multiply by loops for the total, as the example below shows.
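
For instance, the inner side of a nested loop might report something like this (an illustrative fragment with hypothetical numbers, not output from a real table):

Index Scan using orders_user_id_idx on orders  (cost=0.43..8.45 rows=3 width=64)
                                               (actual time=0.012..0.014 rows=3 loops=120)

The node ran 120 times, so its total contribution is roughly 0.014 × 120 ≈ 1.7 ms, and it returned about 3 × 120 = 360 rows overall.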

The key comparison to make is between the estimated rows and the actual rows. A large discrepancy (say, the planner expected 100 rows but got 450,000) means table statistics are stale — run ANALYZE tablename to refresh them.
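
A minimal sketch of refreshing statistics, using orders as a stand-in table name. Raising a column's statistics target before re-analyzing can also help when a column's value distribution is skewed:

ANALYZE orders;

-- For a skewed column, collect a larger sample before re-analyzing:
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;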

Seq Scan means PostgreSQL read every page of the table in order. For small tables or queries returning most rows, this is correct. For a query returning a handful of rows from a million-row table, it is a red flag.

Index Scan means PostgreSQL used an index to locate specific rows, then fetched them from the heap. Much faster for selective queries.

Index Only Scan means all needed columns were in the index — no heap access at all. Fastest for covered queries.

Bitmap Heap Scan is a two-phase approach: build a bitmap of matching pages from the index, then fetch those pages. Good for moderately selective queries returning many rows.
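
A quick way to check whether an index plan would beat the sequential scan the planner chose is to disable seq scans for the current session only. This is a diagnostic trick, not a production setting:

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;
RESET enable_seqscan;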

Finding Slow Queries Before They Bite You

Do not wait for a user to complain. Enable pg_stat_statements to track query statistics across all sessions:

-- In postgresql.conf:
-- shared_preload_libraries = 'pg_stat_statements'

-- After restarting PostgreSQL:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Then find your worst offenders by total time:

SELECT query,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

For immediate logging of slow queries, set in postgresql.conf:

log_min_duration_statement = 1000

This logs any query taking more than 1000 ms, with its full text and duration, to the PostgreSQL log. Lower the threshold as you fix issues.
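
If you have superuser access, the same threshold can be changed without editing the file and picked up with a reload:

ALTER SYSTEM SET log_min_duration_statement = 500;
SELECT pg_reload_conf();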

B-tree Indexes: The Default Workhorse

A B-tree index is what PostgreSQL creates when you write CREATE INDEX with no further specification. It supports equality, range queries, and ORDER BY on a single column or a set of columns.

CREATE INDEX ON users (email);

This immediately speeds up:

SELECT * FROM users WHERE email = '[email protected]';
SELECT * FROM users WHERE email LIKE 'alice%';  -- prefix match (see caveat below)

PostgreSQL will use the index for =, <, <=, >, >=, BETWEEN, and IN predicates on the indexed column. A prefix LIKE can also use it, but only when the index uses the C collation or the text_pattern_ops operator class; on a database with a default locale, create it as CREATE INDEX ON users (email text_pattern_ops).

B-tree indexes are not helpful when the query has no selective predicate (e.g., returning 90% of the table), or when the indexed column has very low cardinality (e.g., a boolean column — an index on a boolean is almost never useful).
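
Because a B-tree stores keys in sorted order, it can also satisfy an ORDER BY on the indexed column without a separate sort step. A sketch using the index above:

-- Can walk the index in order and stop after 10 rows:
SELECT * FROM users WHERE email >= 'alice' ORDER BY email LIMIT 10;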

Composite Indexes: Column Order Matters

A composite index covers multiple columns. The order of columns determines which queries benefit from it.

CREATE INDEX ON orders (user_id, created_at);

This index helps:

  • WHERE user_id = 42 (leftmost column alone)
  • WHERE user_id = 42 AND created_at > '2026-01-01' (both columns)

It does not help:

  • WHERE created_at > '2026-01-01' (range column without the leading equality column)

The general rule: put equality columns first, range columns last. The planner can use a leading prefix of a composite index but cannot skip columns to reach a later one.
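
As a sketch of that rule: if the frequent query filters status with equality and created_at with a range, put status first:

CREATE INDEX ON orders (status, created_at);
-- Served efficiently: WHERE status = 'pending' AND created_at > '2026-01-01'
-- Not served well:    WHERE created_at > '2026-01-01' on its own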

Partial Indexes: Index Only What You Query

A partial index covers only the rows that satisfy a WHERE clause. This makes the index smaller, faster to scan, and more selective:

CREATE INDEX ON orders (created_at) WHERE status = 'pending';

If your application constantly queries for pending orders — and pending orders are a small fraction of the total — this index is dramatically more efficient than a full index on created_at. The index only contains rows where status = 'pending', so it stays small even as the orders table grows to millions of completed rows.

Partial indexes are one of the most underused features in PostgreSQL. Any time you have a frequently queried subset of a table, consider one.
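
To see how much space a partial index saves, compare sizes with pg_relation_size (the index names here are hypothetical):

SELECT pg_size_pretty(pg_relation_size('orders_created_at_idx'))         AS full_index,
       pg_size_pretty(pg_relation_size('orders_created_at_pending_idx')) AS partial_index;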

Unique Indexes

A unique index enforces a uniqueness constraint while simultaneously speeding up lookups. PostgreSQL creates one automatically for PRIMARY KEY and UNIQUE constraints, but you can also create one explicitly:

CREATE UNIQUE INDEX ON users (email);

This enforces the same rule as declaring the column UNIQUE. On a live table, build it with CREATE UNIQUE INDEX CONCURRENTLY so the build does not block concurrent writes.
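
A sketch of the live-table workflow, building the index without blocking writes and then optionally promoting it to a named constraint:

CREATE UNIQUE INDEX CONCURRENTLY users_email_key ON users (email);

-- Optional: attach the index to the table as a formal constraint
ALTER TABLE users ADD CONSTRAINT users_email_key UNIQUE USING INDEX users_email_key;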

GIN Indexes: Full-Text Search and JSONB

B-tree indexes work on scalar values. For full-text search vectors and JSONB documents, you need a GIN (Generalized Inverted Index).

For full-text search:

ALTER TABLE articles ADD COLUMN search_vector tsvector;
UPDATE articles SET search_vector = to_tsvector('english', title || ' ' || body);
CREATE INDEX ON articles USING GIN (search_vector);

Now this query uses the index:

SELECT * FROM articles WHERE search_vector @@ to_tsquery('english', 'wireguard & vpn');

For JSONB containment queries:

CREATE INDEX ON events USING GIN (payload);
SELECT * FROM events WHERE payload @> '{"type": "login"}';

The @> (containment) and ? (key existence) operators on JSONB use GIN indexes. Regular B-tree indexes on JSONB expressions only help for equality on extracted scalar values.
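
If you only ever run @> containment queries, the non-default jsonb_path_ops operator class produces a smaller and faster GIN index; the trade-off is that it does not support the ? key-existence operator:

CREATE INDEX ON events USING GIN (payload jsonb_path_ops);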

BRIN Indexes: Tiny Indexes for Huge Append-Only Tables

A BRIN (Block Range INdex) stores the minimum and maximum value of a column for each range of physical pages. The index is tiny — often just a few kilobytes for a multi-gigabyte table — but it is only useful when column values are correlated with physical storage order.

CREATE INDEX ON sensor_readings USING BRIN (recorded_at);

This is ideal for time-series tables where rows are inserted in timestamp order. A query for a recent time range can skip the vast majority of pages using the BRIN index. It is not appropriate for tables where the column values are randomly distributed across pages.
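
You can check whether a column is physically ordered enough for BRIN by inspecting its correlation statistic in pg_stats; values close to 1.0 or -1.0 mean block ranges will prune well:

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'sensor_readings' AND attname = 'recorded_at';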

Expression Indexes: Index on a Computed Value

If your queries transform a column before comparing it, an index on the raw column will not help. Create an index on the expression instead:

CREATE INDEX ON users (LOWER(email));

Now this query uses the index:

SELECT * FROM users WHERE LOWER(email) = '[email protected]';

Without the expression index, PostgreSQL would do a sequential scan even if there were a B-tree index on email, because LOWER(email) is not the same as email from the planner's perspective.
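
The same mechanism can enforce case-insensitive uniqueness, a common requirement for email columns. A sketch (the index name is illustrative):

CREATE UNIQUE INDEX CONCURRENTLY users_email_lower_idx ON users (LOWER(email));
-- Queries must use the identical expression, e.g. WHERE LOWER(email) = LOWER($1)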

VACUUM, ANALYZE, and Table Bloat

PostgreSQL uses MVCC (Multi-Version Concurrency Control). When a row is updated or deleted, the old version is not immediately removed — it is marked as dead. These dead tuples accumulate as table bloat, slowing sequential scans and wasting disk space.

VACUUM removes dead tuples and returns space to the free space map (though not necessarily to the OS). VACUUM FULL reclaims disk space but locks the table exclusively and should be used sparingly.

ANALYZE updates the table statistics that the planner uses to estimate row counts. Without fresh statistics, the planner makes poor decisions.

Run both together:

VACUUM ANALYZE orders;

PostgreSQL's autovacuum daemon runs this automatically, but it may lag behind on very busy tables. Key autovacuum parameters to tune in postgresql.conf:

autovacuum_vacuum_scale_factor = 0.01   -- vacuum when 1% of rows are dead (default 20%)
autovacuum_analyze_scale_factor = 0.005 -- analyze when 0.5% of rows change (default 10%)

Lower thresholds mean more frequent autovacuum, which keeps bloat and stale statistics under control on large, frequently updated tables.
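
The same thresholds can be set per table, which is usually safer than lowering the global defaults, and pg_stat_user_tables shows which tables are accumulating dead tuples:

ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.01);

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;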

Connection Pooling: Why Raw Connections Kill Performance

PostgreSQL spawns a new OS process for each client connection. At 20–50 connections this is fine. At 500–1000 connections, the overhead of process scheduling and memory consumption degrades performance for everyone, and connection establishment time dominates query latency.

The solution is PgBouncer, a lightweight connection pooler. In transaction mode, a server connection is held only for the duration of a transaction, then returned to the pool:

[databases]
mydb = host=localhost dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25

With PgBouncer in front, 1000 application connections share 25 real PostgreSQL connections. Query throughput increases dramatically because the database is not thrashing between hundreds of OS processes.
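
Applications then connect to PgBouncer's port (6432 by default) instead of PostgreSQL's 5432; nothing else about the connection changes. With the database defined above:

psql -h localhost -p 6432 mydb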

Finding Unused Indexes

Every index has a write cost — every INSERT, UPDATE, and DELETE must update all indexes on the table. Unused indexes waste I/O and slow down writes. Find them:

SELECT schemaname,
       relname AS tablename,
       indexrelname AS indexname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

Any index with idx_scan = 0 has never been used since statistics were last reset. Two cautions before dropping: indexes that back UNIQUE or PRIMARY KEY constraints enforce correctness even if they are never scanned, and SELECT pg_stat_reset() clears all statistics counters for the current database. Confirm by resetting stats and watching a few days of production traffic, then drop with:

DROP INDEX CONCURRENTLY indexname;

CONCURRENTLY drops the index without taking a lock that blocks writes on the table. Note that DROP INDEX CONCURRENTLY cannot run inside a transaction block.

Case Study: 30-Second Query to 50ms

Here is a realistic scenario. An e-commerce platform's order history page takes 30 seconds to load for users with many orders.

Step 1: Identify the query with pg_stat_statements

SELECT query, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 5;

The slow query is:

SELECT * FROM orders WHERE customer_id = $1 AND status = 'delivered' ORDER BY created_at DESC LIMIT 20;

Step 2: Run EXPLAIN ANALYZE

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'delivered'
ORDER BY created_at DESC LIMIT 20;

Output (abbreviated):

Seq Scan on orders  (cost=0.00..52000.00 rows=18 width=128)
                    (actual time=0.05..29840.12 rows=20 loops=1)
  Filter: ((customer_id = 123) AND (status = 'delivered'))
  Rows Removed by Filter: 2199980
Execution Time: 29841.3 ms

The executor scanned all 2.2 million rows and discarded nearly all of them. There is no index on customer_id.

Step 3: Add the index

CREATE INDEX CONCURRENTLY ON orders (customer_id, created_at DESC) WHERE status = 'delivered';

This composite partial index covers customer_id (equality), created_at (sort order), and filters out non-delivered rows entirely.

Step 4: Run EXPLAIN ANALYZE again

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'delivered'
ORDER BY created_at DESC LIMIT 20;

Output:

Index Scan using orders_customer_id_created_at_idx on orders
  (cost=0.56..85.40 rows=20 width=128)
  (actual time=0.032..0.481 rows=20 loops=1)
Execution Time: 0.52 ms

The query now uses an Index Scan, reads exactly 20 rows, and completes in under 1 millisecond. End-to-end with application overhead: 50ms.

The improvement came from three decisions made together: using a composite index to satisfy both the filter and the sort, making it a partial index to keep it small and selective, and including the DESC sort direction to avoid a separate sort step.

Conclusion

Slow PostgreSQL queries are almost always fixable. The workflow is consistent: find the slow query via pg_stat_statements or the slow query log, run EXPLAIN (ANALYZE, BUFFERS) to read the actual execution plan, identify the scan type and row count estimates, add the right index for the access pattern, and measure the improvement. The index types covered here — B-tree, composite, partial, unique, GIN, BRIN, and expression — cover the vast majority of real-world performance problems. Combine them with regular VACUUM ANALYZE, connection pooling via PgBouncer, and periodic cleanup of unused indexes, and PostgreSQL will handle enormous workloads with ease.

Leonardo Lazzaro

Software engineer and technical writer. 10+ years experience in DevOps, Python, and Linux systems.
