Fast bulk insert with sqlalchemy

Introduction

SQLAlchemy supports two ways of bulk operations by using the session. One way is using objects and the other way is using python dictionaries.

Using these bulk methods of the session will bypass the normal unit of work mechanics of state, relationship and attribute management.

Using the bulk_insert_mappings or bulk_update_mappings will save you the overhead that SQLAlchemy objects have, like high memory usage.

We will see the two ways of bulk operation in this tutorial. With this tutorial, you will learn how to insert a fast record with SQLAlchemy.

Finally, we show a way to bypass the ORM and use the database engine directly.

Using python dicts to insert data with SQLAlchemy

One of the fastest and easy ways to insert/update a lot of registries into the database using SQLAlchemy is by using the bulk_insert_mappings. With bulk_insert_mappings you will have fast bulk inserts into the database.

Suppose we have the following model:

from sqlalchemy.orm import relationship
from sqlalchemy import (
    Column, 
    Integer, 
    String, 
    ForeignKey,
)

class User(Base):
    __tablename__ = ‘user’
    id = Column(Integer, primary_key=True)
    username = Column(String, unique=True)
    role_id = Column(Integer, ForeignKey(‘role.id’))
    role = relationship(‘Role’, backref=’users’)

class Role(Base):
    __tablename__ = ‘role’
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)

Then you python dictionary keys should match the model attributes. For example in the next code we insert 4 users:

users_to_insert = [dict(username="john"), dict(username="mary"), dict(username="susan")]
s.bulk_insert_mappings(User, users_to_insert)

If you have relationships you can use ids in a dictionary to reference the relationship.

users_to_insert = [dict(username="john"), dict(username="mary"), dict(username="susan")]
s.bulk_insert_mappings(User, users_to_insert)

Loading a csv very fast with SQLALchemy

Here is a full example using a bulk insert with SQLAlchemy to load a csv.

import csv
from sqlalchemy import create_engine
from sqlalchemy.orm.session import sessionmaker

conn_string = ‘sqlite:///test.sqlite3’
engine = create_engine(conn_string)
Base.metadata.create_all(engine) # here we create all tables
Session = sessionmaker(bind=engine)
session = Session()

with open(‘million_users.csv’, ‘r’) as csv_file:
    csv_reader = csv.reader(csv_file)

buffer = []
for row in reader:
    buffer.append({
        ‘username’: row[0]
    })
    if len(buffer) % 10000 == 0:
        session.bulk_insert_mappings(buffer)
        buffer = []

session.bulk_insert_mappings(buffer)

The code above will insert 10000 rows every time the buffer has 9999 items. The last line is to insert the last elements of the buffer. You can increase or lower the 10000 buffer size to play with memory usage and database performance.

Bulk operations with SQLAlchemy objects

The bulk operation with SQL Alchemy is very similar to the previous one, but in this case, we use objects defined in your models.

from models import User
users_to_insert = [User(username="john"), User(username="mary"), User(username="susan")]
s.bulk_save_objects(users_to_insert)
s.commit()

As you can see in the example we switched the dict to the User and the list to insert/update has User instances. User instance uses a lot more memory than a python dictionary. We don’t have a separate method for insert or update when using objects.

Bulk insert without session

In this section, we show you how to do super fast bulk inserts using the engine directly.

conn_string = ‘sqlite:///test.sqlite3’
engine = create_engine(conn_string)
users = [User(username="john"), User(username="mary"), User(username="susan")]
engine.execute(User.__table__.insert(), users)

As you can see in the code above, we are not using any session and instead, we use the engine directly. This will skip any ORM checks and SQLAlchemy session will not track this newly created objects (unless you refresh the session).