PostgreSQL: Create Database with UTF-8 Encoding
Creating a PostgreSQL database with UTF-8 encoding is essential for applications that need to support international characters, emojis, or any non-ASCII text. This comprehensive guide covers how to create databases with UTF-8 encoding, fix common encoding errors, and convert existing databases from SQL_ASCII to UTF-8.

Quick Reference: Create UTF-8 Database
-- Create database with UTF-8 encoding
CREATE DATABASE mydb
WITH
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
TEMPLATE = template0;
Or from command line:
createdb -E UTF8 -l en_US.UTF-8 -T template0 mydb
Understanding PostgreSQL Encoding
What is Database Encoding?
Database encoding determines how text data is stored in PostgreSQL:
- UTF-8: Universal encoding supporting all Unicode characters (recommended)
- SQL_ASCII: No encoding validation, stores raw bytes (problematic)
- LATIN1: Western European characters only
- Other: Various regional encodings
Why UTF-8 Matters
Without proper UTF-8 encoding, you may encounter:
UnicodeDecodeErrorin Python applicationsEncoding::InvalidByteSequenceErrorin Ruby- Garbled text when displaying international characters
- Data corruption when inserting emojis or special characters
Check Current Database Encoding
Via SQL Query
-- Show encoding for current database
SHOW SERVER_ENCODING;
-- Show encoding for all databases
SELECT datname, pg_encoding_to_char(encoding)
FROM pg_database;
Via Command Line
sudo -u postgres psql -c "SHOW SERVER_ENCODING" your_database
If the output shows SQL_ASCII, you need to convert to UTF-8.
Creating a New Database with UTF-8
Method 1: Using CREATE DATABASE
CREATE DATABASE myapp_production
WITH
OWNER = myapp_user
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
TABLESPACE = pg_default
CONNECTION LIMIT = -1
TEMPLATE = template0;
Important: Use TEMPLATE = template0 to override the default template's encoding.
Method 2: Using createdb Command
# Basic UTF-8 database
createdb -E UTF8 -T template0 mydb
# With full options
createdb \
--encoding=UTF8 \
--locale=en_US.UTF-8 \
--template=template0 \
--owner=myuser \
mydb
Method 3: With Specific Locale
For other languages:
-- German
CREATE DATABASE german_db
ENCODING = 'UTF8'
LC_COLLATE = 'de_DE.UTF-8'
LC_CTYPE = 'de_DE.UTF-8'
TEMPLATE = template0;
-- Japanese
CREATE DATABASE japanese_db
ENCODING = 'UTF8'
LC_COLLATE = 'ja_JP.UTF-8'
LC_CTYPE = 'ja_JP.UTF-8'
TEMPLATE = template0;
-- French
CREATE DATABASE french_db
ENCODING = 'UTF8'
LC_COLLATE = 'fr_FR.UTF-8'
LC_CTYPE = 'fr_FR.UTF-8'
TEMPLATE = template0;
Common Error: Template Encoding Incompatibility
When creating a UTF-8 database, you might see:
ERROR: new encoding (UTF8) is incompatible with the encoding
of the template database (SQL_ASCII)
HINT: Use the same encoding as in the template database,
or use template0 as template.
Solution: Use template0
Always specify TEMPLATE = template0 when creating databases with non-default encoding:
CREATE DATABASE mydb
ENCODING = 'UTF8'
TEMPLATE = template0;
Better Solution: Fix template1 Encoding
Fix the default template database so all new databases are UTF-8:
-- Connect as superuser
\c postgres
-- Update template1 settings
UPDATE pg_database SET datistemplate = FALSE WHERE datname = 'template1';
-- Drop and recreate template1
DROP DATABASE template1;
CREATE DATABASE template1
WITH TEMPLATE = template0
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8';
-- Mark as template again
UPDATE pg_database SET datistemplate = TRUE WHERE datname = 'template1';
-- Vacuum freeze the new template
\c template1
VACUUM FREEZE;
Now all new databases will default to UTF-8.
Converting Existing Database to UTF-8
Warning: This process requires downtime. Always backup first!
Step 1: Create Backup
# Dump the database
pg_dump -U postgres -Fc mydb > mydb_backup.dump
# Or plain SQL for manual inspection
pg_dump -U postgres mydb > mydb_backup.sql
Step 2: Drop and Recreate Database
# Drop the old database
dropdb -U postgres mydb
# Create with UTF-8
createdb -U postgres -E UTF8 -T template0 mydb
Step 3: Restore Data
# Restore from custom format
pg_restore -U postgres -d mydb mydb_backup.dump
# Or from plain SQL
psql -U postgres -d mydb < mydb_backup.sql
Handling Encoding Conversion Errors
If you get encoding errors during restore:
# Force client encoding during restore
PGCLIENTENCODING=UTF8 pg_restore -U postgres -d mydb mydb_backup.dump
# Or with psql
psql -U postgres -d mydb -c "SET client_encoding TO 'UTF8';" < mydb_backup.sql
For problematic data, you may need to clean it:
# Convert file encoding (Linux)
iconv -f SQL_ASCII -t UTF-8 mydb_backup.sql > mydb_utf8.sql
# Remove invalid UTF-8 sequences
iconv -f UTF-8 -t UTF-8//IGNORE mydb_backup.sql > mydb_clean.sql
PostgreSQL Cluster: Reinitialize with UTF-8
For fresh installations or when you can rebuild the entire cluster:
# Stop PostgreSQL
sudo systemctl stop postgresql
# Drop the cluster (DESTROYS ALL DATA!)
sudo pg_dropcluster --stop 15 main
# Create new cluster with UTF-8
sudo pg_createcluster --locale=en_US.UTF-8 --start 15 main
# Verify encoding
sudo -u postgres psql -c "SHOW SERVER_ENCODING"
Replace 15 with your PostgreSQL version.
Framework-Specific Issues
Ruby on Rails
When running rake db:create with encoding issues:
# config/database.yml
production:
adapter: postgresql
encoding: unicode
database: myapp_production
template: template0
Django
In settings.py:
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql',
'NAME': 'mydb',
'OPTIONS': {
'client_encoding': 'UTF8',
},
}
}
Node.js (pg)
const { Pool } = require('pg');
const pool = new Pool({
database: 'mydb',
client_encoding: 'UTF8',
});
Verify Encoding After Changes
-- Check database encoding
SELECT
datname,
pg_encoding_to_char(encoding) as encoding,
datcollate,
datctype
FROM pg_database
WHERE datname = 'mydb';
-- Test inserting Unicode
INSERT INTO test_table (text_column) VALUES ('Hello δΈη π');
SELECT * FROM test_table;
Common Locale Values
| Locale | Language/Region |
|---|---|
| en_US.UTF-8 | English (US) |
| en_GB.UTF-8 | English (UK) |
| de_DE.UTF-8 | German |
| fr_FR.UTF-8 | French |
| es_ES.UTF-8 | Spanish |
| pt_BR.UTF-8 | Portuguese (Brazil) |
| ja_JP.UTF-8 | Japanese |
| zh_CN.UTF-8 | Chinese (Simplified) |
| ko_KR.UTF-8 | Korean |
| C.UTF-8 | POSIX (no locale-specific rules) |
Troubleshooting
"Locale not found" Error
Install the locale on your system:
# Debian/Ubuntu
sudo locale-gen en_US.UTF-8
sudo update-locale
# RHEL/CentOS
sudo localedef -i en_US -f UTF-8 en_US.UTF-8
Performance Considerations
UTF-8 uses variable-width encoding (1-4 bytes per character). For ASCII-only data, this has minimal impact. For heavily international text, consider:
- Using
C.UTF-8locale for faster sorting (no locale-specific rules) - Appropriate indexes for text search
-- Use C collation for faster, byte-based sorting
CREATE INDEX idx_name ON users (name COLLATE "C");
Related PostgreSQL Guides
- PostgreSQL Query Analysis and Optimization
- PostgreSQL Configuration Tuning
- Moving PostgreSQL Data Directory
- Upgrade PostgreSQL to Version 10
- Using Go with PostgreSQL
Conclusion
Creating PostgreSQL databases with UTF-8 encoding is essential for modern applications. The key points to remember:
- Always use
TEMPLATE = template0when specifying encoding - Fix
template1for consistent defaults across all new databases - Backup before converting existing databases
- Verify encoding after any changes
With proper UTF-8 encoding, your PostgreSQL database will correctly handle any international text, emojis, or special characters your application needs to store.