}

PostgreSQL: Create Database with UTF-8 Encoding [Complete Guide]

PostgreSQL: Create Database with UTF-8 Encoding

Creating a PostgreSQL database with UTF-8 encoding is essential for applications that need to support international characters, emojis, or any non-ASCII text. This comprehensive guide covers how to create databases with UTF-8 encoding, fix common encoding errors, and convert existing databases from SQL_ASCII to UTF-8.

PostgreSQL Logo

Quick Reference: Create UTF-8 Database

-- Create database with UTF-8 encoding
CREATE DATABASE mydb
    WITH 
    ENCODING = 'UTF8'
    LC_COLLATE = 'en_US.UTF-8'
    LC_CTYPE = 'en_US.UTF-8'
    TEMPLATE = template0;

Or from command line:

createdb -E UTF8 -l en_US.UTF-8 -T template0 mydb

Understanding PostgreSQL Encoding

What is Database Encoding?

Database encoding determines how text data is stored in PostgreSQL:

  • UTF-8: Universal encoding supporting all Unicode characters (recommended)
  • SQL_ASCII: No encoding validation, stores raw bytes (problematic)
  • LATIN1: Western European characters only
  • Other: Various regional encodings

Why UTF-8 Matters

Without proper UTF-8 encoding, you may encounter:

  • UnicodeDecodeError in Python applications
  • Encoding::InvalidByteSequenceError in Ruby
  • Garbled text when displaying international characters
  • Data corruption when inserting emojis or special characters

Check Current Database Encoding

Via SQL Query

-- Show encoding for current database
SHOW SERVER_ENCODING;

-- Show encoding for all databases
SELECT datname, pg_encoding_to_char(encoding) 
FROM pg_database;

Via Command Line

sudo -u postgres psql -c "SHOW SERVER_ENCODING" your_database

If the output shows SQL_ASCII, you need to convert to UTF-8.

Creating a New Database with UTF-8

Method 1: Using CREATE DATABASE

CREATE DATABASE myapp_production
    WITH 
    OWNER = myapp_user
    ENCODING = 'UTF8'
    LC_COLLATE = 'en_US.UTF-8'
    LC_CTYPE = 'en_US.UTF-8'
    TABLESPACE = pg_default
    CONNECTION LIMIT = -1
    TEMPLATE = template0;

Important: Use TEMPLATE = template0 to override the default template's encoding.

Method 2: Using createdb Command

# Basic UTF-8 database
createdb -E UTF8 -T template0 mydb

# With full options
createdb \
    --encoding=UTF8 \
    --locale=en_US.UTF-8 \
    --template=template0 \
    --owner=myuser \
    mydb

Method 3: With Specific Locale

For other languages:

-- German
CREATE DATABASE german_db
    ENCODING = 'UTF8'
    LC_COLLATE = 'de_DE.UTF-8'
    LC_CTYPE = 'de_DE.UTF-8'
    TEMPLATE = template0;

-- Japanese
CREATE DATABASE japanese_db
    ENCODING = 'UTF8'
    LC_COLLATE = 'ja_JP.UTF-8'
    LC_CTYPE = 'ja_JP.UTF-8'
    TEMPLATE = template0;

-- French
CREATE DATABASE french_db
    ENCODING = 'UTF8'
    LC_COLLATE = 'fr_FR.UTF-8'
    LC_CTYPE = 'fr_FR.UTF-8'
    TEMPLATE = template0;

Common Error: Template Encoding Incompatibility

When creating a UTF-8 database, you might see:

ERROR: new encoding (UTF8) is incompatible with the encoding 
of the template database (SQL_ASCII)
HINT: Use the same encoding as in the template database, 
or use template0 as template.

Solution: Use template0

Always specify TEMPLATE = template0 when creating databases with non-default encoding:

CREATE DATABASE mydb 
    ENCODING = 'UTF8' 
    TEMPLATE = template0;

Better Solution: Fix template1 Encoding

Fix the default template database so all new databases are UTF-8:

-- Connect as superuser
\c postgres

-- Update template1 settings
UPDATE pg_database SET datistemplate = FALSE WHERE datname = 'template1';

-- Drop and recreate template1
DROP DATABASE template1;

CREATE DATABASE template1 
    WITH TEMPLATE = template0 
    ENCODING = 'UTF8'
    LC_COLLATE = 'en_US.UTF-8'
    LC_CTYPE = 'en_US.UTF-8';

-- Mark as template again
UPDATE pg_database SET datistemplate = TRUE WHERE datname = 'template1';

-- Vacuum freeze the new template
\c template1
VACUUM FREEZE;

Now all new databases will default to UTF-8.

Converting Existing Database to UTF-8

Warning: This process requires downtime. Always backup first!

Step 1: Create Backup

# Dump the database
pg_dump -U postgres -Fc mydb > mydb_backup.dump

# Or plain SQL for manual inspection
pg_dump -U postgres mydb > mydb_backup.sql

Step 2: Drop and Recreate Database

# Drop the old database
dropdb -U postgres mydb

# Create with UTF-8
createdb -U postgres -E UTF8 -T template0 mydb

Step 3: Restore Data

# Restore from custom format
pg_restore -U postgres -d mydb mydb_backup.dump

# Or from plain SQL
psql -U postgres -d mydb < mydb_backup.sql

Handling Encoding Conversion Errors

If you get encoding errors during restore:

# Force client encoding during restore
PGCLIENTENCODING=UTF8 pg_restore -U postgres -d mydb mydb_backup.dump

# Or with psql
psql -U postgres -d mydb -c "SET client_encoding TO 'UTF8';" < mydb_backup.sql

For problematic data, you may need to clean it:

# Convert file encoding (Linux)
iconv -f SQL_ASCII -t UTF-8 mydb_backup.sql > mydb_utf8.sql

# Remove invalid UTF-8 sequences
iconv -f UTF-8 -t UTF-8//IGNORE mydb_backup.sql > mydb_clean.sql

PostgreSQL Cluster: Reinitialize with UTF-8

For fresh installations or when you can rebuild the entire cluster:

# Stop PostgreSQL
sudo systemctl stop postgresql

# Drop the cluster (DESTROYS ALL DATA!)
sudo pg_dropcluster --stop 15 main

# Create new cluster with UTF-8
sudo pg_createcluster --locale=en_US.UTF-8 --start 15 main

# Verify encoding
sudo -u postgres psql -c "SHOW SERVER_ENCODING"

Replace 15 with your PostgreSQL version.

Framework-Specific Issues

Ruby on Rails

When running rake db:create with encoding issues:

# config/database.yml
production:
  adapter: postgresql
  encoding: unicode
  database: myapp_production
  template: template0

Django

In settings.py:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',
        'OPTIONS': {
            'client_encoding': 'UTF8',
        },
    }
}

Node.js (pg)

const { Pool } = require('pg');

const pool = new Pool({
  database: 'mydb',
  client_encoding: 'UTF8',
});

Verify Encoding After Changes

-- Check database encoding
SELECT 
    datname,
    pg_encoding_to_char(encoding) as encoding,
    datcollate,
    datctype
FROM pg_database
WHERE datname = 'mydb';

-- Test inserting Unicode
INSERT INTO test_table (text_column) VALUES ('Hello δΈ–η•Œ 🌍');
SELECT * FROM test_table;

Common Locale Values

Locale Language/Region
en_US.UTF-8 English (US)
en_GB.UTF-8 English (UK)
de_DE.UTF-8 German
fr_FR.UTF-8 French
es_ES.UTF-8 Spanish
pt_BR.UTF-8 Portuguese (Brazil)
ja_JP.UTF-8 Japanese
zh_CN.UTF-8 Chinese (Simplified)
ko_KR.UTF-8 Korean
C.UTF-8 POSIX (no locale-specific rules)

Troubleshooting

"Locale not found" Error

Install the locale on your system:

# Debian/Ubuntu
sudo locale-gen en_US.UTF-8
sudo update-locale

# RHEL/CentOS
sudo localedef -i en_US -f UTF-8 en_US.UTF-8

Performance Considerations

UTF-8 uses variable-width encoding (1-4 bytes per character). For ASCII-only data, this has minimal impact. For heavily international text, consider:

  • Using C.UTF-8 locale for faster sorting (no locale-specific rules)
  • Appropriate indexes for text search
-- Use C collation for faster, byte-based sorting
CREATE INDEX idx_name ON users (name COLLATE "C");

Related PostgreSQL Guides

Conclusion

Creating PostgreSQL databases with UTF-8 encoding is essential for modern applications. The key points to remember:

  1. Always use TEMPLATE = template0 when specifying encoding
  2. Fix template1 for consistent defaults across all new databases
  3. Backup before converting existing databases
  4. Verify encoding after any changes

With proper UTF-8 encoding, your PostgreSQL database will correctly handle any international text, emojis, or special characters your application needs to store.