Build a Python App with CockroachDB and SQLAlchemy

Warning:
Cockroach Labs will stop providing Assistance Support for this version on November 12, 2021. Prior to that date, upgrade to a more recent version to continue receiving support. For more details, see the Release Support Policy.
Tip:

Take our newest Cockroach University course, CockroachDB for Python Developers.

This tutorial shows you how to build a simple Python application with CockroachDB and the SQLAlchemy ORM. For the CockroachDB back-end, you'll use a temporary local cluster.

Note:

The example code on this page uses Python 3.

Warning:

SQLAlchemy relies on the existence of foreign keys to generate JOIN expressions from your application code. If you remove foreign keys from your schema, SQLAlchemy will not generate joins for you. As a workaround, you can create a "custom foreign condition" by adding a relationship field to your table objects, or do the equivalent work in your application.
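
As a rough illustration of this workaround, the following sketch declares a relationship with an explicit join condition between two tables that have no foreign key constraint. The Customer class, the customer_id column, and the accounts relationship name are hypothetical and are not part of this tutorial's schema. The sketch assumes SQLAlchemy is already installed (see Step 1).

```python
from sqlalchemy import Column, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import configure_mappers, relationship

Base = declarative_base()


class Customer(Base):
    """Hypothetical parent table, used only for this illustration."""
    __tablename__ = 'customers'
    id = Column(Integer, primary_key=True)

    # There is no ForeignKey on Account.customer_id, so the join condition
    # is spelled out by hand. foreign() marks the column that plays the
    # foreign key role in the relationship.
    accounts = relationship(
        'Account',
        primaryjoin='Customer.id == foreign(Account.customer_id)',
    )


class Account(Base):
    """Hypothetical child table with a plain integer column, no ForeignKey."""
    __tablename__ = 'accounts'
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer)
    balance = Column(Integer)


# Resolve the string-based join condition now, so that a mistake in it
# surfaces immediately rather than on first query.
configure_mappers()
```

With a relationship like this in place, SQLAlchemy has enough information to generate the join between the two tables even though the schema itself declares no foreign key.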

Step 1. Install SQLAlchemy

To install SQLAlchemy, as well as a CockroachDB Python package that accounts for some differences between CockroachDB and PostgreSQL, run the following command:

$ pip install sqlalchemy sqlalchemy-cockroachdb psycopg2
Tip:

Instead of psycopg2, you can install an alternative package that provides the psycopg Python module, such as psycopg2-binary.

For other ways to install SQLAlchemy, see the official documentation.

Step 2. Start CockroachDB

  1. If you haven't already, download the CockroachDB binary.
  2. Run the cockroach demo command:

    $ cockroach demo \
    --empty
    

    This starts a temporary, in-memory cluster and opens an interactive SQL shell to the cluster.

  3. Take note of the (sql/tcp) connection string in the SQL shell welcome text:

    # Connection parameters:
    #   (console) http://127.0.0.1:61009
    #   (sql)     postgres://root:admin@?host=%2Fvar%2Ffolders%2Fk1%2Fr048yqpd7_9337rgxm9vb_gw0000gn%2FT%2Fdemo255013852&port=26257
    #   (sql/tcp) postgres://root:admin@127.0.0.1:61011?sslmode=require    
    

    You will use it in your application code later.

Step 3. Create a database

  1. In the SQL shell, create the bank database that your application will use:

    > CREATE DATABASE bank;
    
  2. Create a SQL user for your app:

    > CREATE USER <username> WITH PASSWORD <password>;
    

    Take note of the username and password. You will use them in your application code later.

  3. Give the user the necessary permissions:

    > GRANT ALL ON DATABASE bank TO <username>;
    

Step 4. Run the Python code

The code below uses SQLAlchemy to map Python objects and methods to SQL operations.

You can run this script as many times as you want; on each run, the script will create some new accounts and shuffle money around between randomly selected accounts.

Specifically, the script:

  1. Reads in existing account IDs (if any) from the bank database.
  2. Creates additional accounts with randomly generated IDs. Then, it adds a bit of money to each new account.
  3. Chooses two accounts at random and takes half of the money from the first and deposits it into the second.

It does all of the above using the practices we recommend for using SQLAlchemy with CockroachDB, which are listed in the Best practices section below.

Note:

You must use the cockroachdb:// prefix in the URL passed to sqlalchemy.create_engine to make sure the cockroachdb dialect is used. Using the postgres:// URL prefix to connect to your CockroachDB cluster will not work.

Get the code

Copy the code below, or download it directly, and save it as example.py.

"""This module performs the following steps sequentially:
    1. Reads in existing account IDs (if any) from the bank database.
    2. Creates additional accounts with randomly generated IDs. Then, it adds a bit of money to each new account.
    3. Chooses two accounts at random and takes half of the money from the first and deposits it into the second.
"""

import random
from math import floor
from sqlalchemy import create_engine, Column, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy_cockroachdb import run_transaction

Base = declarative_base()



class Account(Base):
    """The Account class corresponds to the "accounts" database table.
    """
    __tablename__ = 'accounts'
    id = Column(Integer, primary_key=True)
    balance = Column(Integer)


# Create an engine to communicate with the database. The
# "cockroachdb://" prefix for the engine URL indicates that we are
# connecting to CockroachDB using the 'cockroachdb' dialect.
# For more information, see
# https://github.com/cockroachdb/sqlalchemy-cockroachdb.

engine = create_engine(
    # For cockroach demo:
    'cockroachdb://<username>:<password>@<hostname>:<port>/bank?sslmode=require',
    echo=True                   # Log SQL queries to stdout
)

# Automatically create the "accounts" table based on the Account class.
Base.metadata.create_all(engine)


# Store the account IDs we create for later use.

seen_account_ids = set()


# The code below generates random IDs for new accounts.

def create_random_accounts(sess, num):
    """Create N new accounts with random IDs and random account balances.
    Note that since this is a demo, we do not do any work to ensure the
    new IDs do not collide with existing IDs.
    """
    new_accounts = []
    while num > 0:
        billion = 1000000000
        new_id = floor(random.random()*billion)
        seen_account_ids.add(new_id)
        new_accounts.append(
            Account(
                id=new_id,
                balance=floor(random.random()*1000000)
            )
        )
        num = num - 1
    sess.add_all(new_accounts)


run_transaction(sessionmaker(bind=engine),
                lambda s: create_random_accounts(s, 100))


def get_random_account_id():
    """ Helper function for getting random existing account IDs.
    """
    random_id = random.choice(tuple(seen_account_ids))
    return random_id


def transfer_funds_randomly(session):
    """Transfer money randomly between accounts (during SESSION).
    Cuts a randomly selected account's balance in half, and gives the
    other half to some other randomly selected account.
    """
    source_id = get_random_account_id()
    sink_id = get_random_account_id()

    source = session.query(Account).filter_by(id=source_id).one()
    amount = floor(source.balance/2)

    # Check balance of the first account.
    if source.balance < amount:
        raise ValueError("Insufficient funds")

    source.balance -= amount
    session.query(Account).filter_by(id=sink_id).update(
        {"balance": (Account.balance + amount)}
    )


# Run the transfer inside a transaction.

run_transaction(sessionmaker(bind=engine), transfer_funds_randomly)

Update the connection parameters

In the create_engine() function, update the connection string as follows:

  • Replace <username> and <password> with the SQL username and password that you created earlier.
  • Replace <hostname> and <port> with the hostname and port in the (sql/tcp) connection string from the SQL shell welcome text.
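
If you would rather derive the engine URL than edit it by hand, the substitutions above can be sketched with the standard library alone. The to_cockroachdb_url helper and the maxroach/s3cret credentials below are hypothetical; the input is the (sql/tcp) string shown in the example welcome text.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_cockroachdb_url(sql_tcp_url, username, password, database='bank'):
    """Build a cockroachdb:// engine URL from a (sql/tcp) connection string.

    Hypothetical helper: it keeps the host, port, and query options from
    the original string, and swaps in the scheme, credentials, and database.
    """
    parts = urlsplit(sql_tcp_url)
    # Percent-encode the credentials in case they contain reserved characters.
    credentials = f"{quote(username, safe='')}:{quote(password, safe='')}"
    netloc = f"{credentials}@{parts.hostname}:{parts.port}"
    return urlunsplit(('cockroachdb', netloc, f'/{database}', parts.query, ''))


url = to_cockroachdb_url(
    'postgres://root:admin@127.0.0.1:61011?sslmode=require',
    'maxroach',     # hypothetical SQL user created in Step 3
    's3cret',       # hypothetical password
)
print(url)
```

The resulting string can be passed to create_engine() in place of the placeholder URL.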

Run the code

$ python3 example.py

The output should look something like the following:

2020-10-11 16:49:48,048 INFO sqlalchemy.engine.base.Engine select current_schema()
2020-10-11 16:49:48,048 INFO sqlalchemy.engine.base.Engine {}
2020-10-11 16:49:48,076 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2020-10-11 16:49:48,076 INFO sqlalchemy.engine.base.Engine {}
2020-10-11 16:49:48,077 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2020-10-11 16:49:48,077 INFO sqlalchemy.engine.base.Engine {}
2020-10-11 16:49:48,078 INFO sqlalchemy.engine.base.Engine select version()
2020-10-11 16:49:48,078 INFO sqlalchemy.engine.base.Engine {}
2020-10-11 16:49:48,079 INFO sqlalchemy.engine.base.Engine SELECT table_name FROM information_schema.tables WHERE table_schema=%s
2020-10-11 16:49:48,079 INFO sqlalchemy.engine.base.Engine ('public',)
2020-10-11 16:49:48,096 INFO sqlalchemy.engine.base.Engine
CREATE TABLE accounts (
    id SERIAL NOT NULL,
    balance INTEGER,
    PRIMARY KEY (id)
)


2020-10-11 16:49:48,096 INFO sqlalchemy.engine.base.Engine {}
2020-10-11 16:49:48,135 INFO sqlalchemy.engine.base.Engine COMMIT
2020-10-11 16:49:48,137 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2020-10-11 16:49:48,138 INFO sqlalchemy.engine.base.Engine SAVEPOINT cockroach_restart
2020-10-11 16:49:48,138 INFO sqlalchemy.engine.base.Engine {}
2020-10-11 16:49:48,145 INFO sqlalchemy.engine.base.Engine INSERT INTO accounts (id, balance) VALUES (%(id)s, %(balance)s)
2020-10-11 16:49:48,145 INFO sqlalchemy.engine.base.Engine ({'id': 114550846, 'balance': 521920}, {'id': 959765825, 'balance': 107843}, {'id': 992234225, 'balance': 743056}, {'id': 524035239, 'balance': 883288}, {'id': 338833325, 'balance': 390589}, {'id': 298479318, 'balance': 878646}, {'id': 173609938, 'balance': 262413}, {'id': 678216195, 'balance': 791789}  ... displaying 10 of 100 total bound parameter sets ...  {'id': 531287362, 'balance': 589865}, {'id': 521940595, 'balance': 103451})
2020-10-11 16:49:48,266 INFO sqlalchemy.engine.base.Engine RELEASE SAVEPOINT cockroach_restart
2020-10-11 16:49:48,266 INFO sqlalchemy.engine.base.Engine {}
2020-10-11 16:49:48,268 INFO sqlalchemy.engine.base.Engine COMMIT
2020-10-11 16:49:48,269 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2020-10-11 16:49:48,269 INFO sqlalchemy.engine.base.Engine SAVEPOINT cockroach_restart
2020-10-11 16:49:48,269 INFO sqlalchemy.engine.base.Engine {}
2020-10-11 16:49:48,271 INFO sqlalchemy.engine.base.Engine SELECT accounts.id AS accounts_id, accounts.balance AS accounts_balance
FROM accounts
WHERE accounts.id = %(id_1)s
2020-10-11 16:49:48,271 INFO sqlalchemy.engine.base.Engine {'id_1': 721940623}
2020-10-11 16:49:48,273 INFO sqlalchemy.engine.base.Engine UPDATE accounts SET balance=%(balance)s WHERE accounts.id = %(accounts_id)s
2020-10-11 16:49:48,273 INFO sqlalchemy.engine.base.Engine {'balance': 50080, 'accounts_id': 721940623}
2020-10-11 16:49:48,275 INFO sqlalchemy.engine.base.Engine UPDATE accounts SET balance=(accounts.balance + %(balance_1)s) WHERE accounts.id = %(id_1)s
2020-10-11 16:49:48,275 INFO sqlalchemy.engine.base.Engine {'balance_1': 50080, 'id_1': 984244739}
2020-10-11 16:49:48,294 INFO sqlalchemy.engine.base.Engine RELEASE SAVEPOINT cockroach_restart
2020-10-11 16:49:48,294 INFO sqlalchemy.engine.base.Engine {}
2020-10-11 16:49:48,295 INFO sqlalchemy.engine.base.Engine COMMIT

Back in the terminal where the SQL shell is running, verify that the table and rows were created successfully:

> SELECT COUNT(*) FROM bank.accounts;
 count
-------
   100
(1 row)

Best practices

Use the run_transaction function

We strongly recommend using the sqlalchemy_cockroachdb.run_transaction() function as shown in the code samples on this page. This abstracts the details of transaction retries away from your application code. Transaction retries are more frequent in CockroachDB than in some other databases because we use optimistic concurrency control rather than locking. Because of this, a CockroachDB transaction may have to be tried more than once before it can commit. This is part of how we ensure that our transaction ordering guarantees meet the ANSI SERIALIZABLE isolation level.

In addition to the above, using run_transaction has the following benefits:

  • Because it must be passed a sqlalchemy.orm.session.sessionmaker object (not a session), it ensures that a new session is created exclusively for use by the callback, which protects you from accidentally reusing objects via any sessions created outside the transaction.
  • It abstracts away the client-side transaction retry logic from your application, which keeps your application code portable across different databases. For example, the sample code given on this page works identically when run against Postgres (modulo changes to the prefix and port number in the connection string).

For more information about how transactions (and retries) work, see Transactions.
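
To give a feel for what run_transaction spares you from writing, here is a simplified, database-free sketch of a client-side retry loop. It is not the library's implementation: RetryableError is a hypothetical stand-in for a driver error carrying SQLSTATE 40001, and run_with_retries and flaky_transfer are names invented for this illustration.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical stand-in for a driver error with SQLSTATE 40001."""


def run_with_retries(callback, max_retries=5, base_delay=0.01):
    """Call callback() until it succeeds or max_retries attempts have failed."""
    for attempt in range(1, max_retries + 1):
        try:
            # A real client would begin a transaction before this call
            # and commit it after the callback returns.
            return callback()
        except RetryableError:
            if attempt == max_retries:
                raise
            # A real client would roll back here. Sleep with randomized
            # exponential backoff before trying again.
            time.sleep(base_delay * (2 ** attempt) * random.random())


attempts = []


def flaky_transfer():
    """Simulated transaction body that fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RetryableError('restart transaction')
    return 'committed'


result = run_with_retries(flaky_transfer)
```

Because the callback may run more than once, it should not have side effects outside the database that would be harmful if repeated.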

Avoid mutations of session and/or transaction state inside run_transaction()

Avoiding such mutations is in line with the recommendations of the SQLAlchemy FAQs, which state (with emphasis added by the original author) that

As a general rule, the application should manage the lifecycle of the session externally to functions that deal with specific data. This is a fundamental separation of concerns which keeps data-specific operations agnostic of the context in which they access and manipulate that data.

and

Keep the lifecycle of the session (and usually the transaction) separate and external.

In keeping with the above recommendations from the official docs, we strongly recommend avoiding any explicit mutations of the transaction state inside the callback passed to run_transaction, since that will lead to breakage. Specifically, do not make calls to the following functions from inside run_transaction:

  • sqlalchemy.orm.Session.commit() (or other variants of commit()): This is not necessary because sqlalchemy_cockroachdb.run_transaction handles the savepoint/commit logic for you.
  • sqlalchemy.orm.Session.rollback() (or other variants of rollback()): This is not necessary because sqlalchemy_cockroachdb.run_transaction handles the commit/rollback logic for you.
  • Session.flush(): This will not work as expected with CockroachDB because CockroachDB does not support nested transactions, which are necessary for Session.flush() to work properly. If the call to Session.flush() encounters an error and aborts, it will try to roll back. This will not be allowed by the currently-executing CockroachDB transaction created by run_transaction(), and will result in an error message like the following: sqlalchemy.orm.exc.DetachedInstanceError: Instance <FooModel at 0x12345678> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: http://sqlalche.me/e/bhk3).

Break up large transactions into smaller units of work

If you see an error message like transaction is too large to complete; try splitting into pieces, you are trying to commit too much data in a single transaction. As described in our Cluster Settings docs, the size limit for transactions is defined by the kv.transaction.max_intents_bytes setting, which defaults to 256 KiB. Although this setting can be changed by an admin, we strongly recommend against it in most cases.

Instead, we recommend breaking your transaction into smaller units of work (or "chunks"). A pattern that works for inserting large numbers of objects using run_transaction to handle retries automatically for you is shown below.

from sqlalchemy import create_engine, Column, Float, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy_cockroachdb import run_transaction
from random import random

Base = declarative_base()

# The code below assumes you have run the following SQL statements.

# CREATE DATABASE pointstore;

# USE pointstore;

# CREATE TABLE points (
#     id INT PRIMARY KEY DEFAULT unique_rowid(),
#     x FLOAT NOT NULL,
#     y FLOAT NOT NULL,
#     z FLOAT NOT NULL
# );

engine = create_engine(
    # For cockroach demo:
    'cockroachdb://<username>:<password>@<hostname>:<port>/pointstore?sslmode=require',
    echo=True                   # Log SQL queries to stdout
)


class Point(Base):
    __tablename__ = 'points'
    id = Column(Integer, primary_key=True)
    x = Column(Float)
    y = Column(Float)
    z = Column(Float)


def add_points(num_points):
    chunk_size = 1000        # Tune this based on object sizes.

    def add_points_helper(sess, chunk, num_points):
        points = []
        for i in range(chunk, min(chunk + chunk_size, num_points)):
            points.append(
                Point(x=random()*1024, y=random()*1024, z=random()*1024)
            )
        sess.bulk_save_objects(points)

    for chunk in range(0, num_points, chunk_size):
        run_transaction(
            sessionmaker(bind=engine),
            lambda s: add_points_helper(
                s, chunk, min(chunk + chunk_size, num_points)
            )
        )


add_points(10000)
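
To make the chunk arithmetic in the example above easier to follow, here is a standalone sketch of the ranges that such a loop walks through. chunk_bounds is a hypothetical helper, not part of SQLAlchemy or sqlalchemy-cockroachdb.

```python
def chunk_bounds(total, chunk_size):
    """Yield (start, stop) pairs that cover range(total) in chunk_size pieces.

    The final pair is shorter when total is not a multiple of chunk_size.
    """
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


bounds = list(chunk_bounds(2500, 1000))
print(bounds)   # [(0, 1000), (1000, 2000), (2000, 2500)]
```

Each pair corresponds to one call to run_transaction, so a retry only repeats the work of a single chunk rather than the whole data set.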

Use IMPORT to read in large data sets

If you are trying to get a large data set into CockroachDB all at once (a bulk import), avoid writing client-side code that uses an ORM and use the IMPORT statement instead. It is much faster and more efficient than making a series of INSERTs and UPDATEs such as are generated by calls to session.bulk_save_objects().

For more information about importing data from Postgres, see Migrate from Postgres.

For more information about importing data from MySQL, see Migrate from MySQL.
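
One way to prepare data for a bulk import, rather than inserting it through the ORM, is to write it to a CSV file first. The sketch below does this with Python's standard csv module. The write_points_csv helper and the three-column x, y, z layout are assumptions made for illustration, and the exact IMPORT statement to use depends on your CockroachDB version.

```python
import csv
import io
import random


def write_points_csv(out, num_points, seed=0):
    """Write num_points rows of random x, y, z values to a file-like object."""
    rng = random.Random(seed)       # Seeded so the output is repeatable.
    writer = csv.writer(out)
    for _ in range(num_points):
        writer.writerow([rng.random() * 1024 for _ in range(3)])
    return num_points


# An in-memory buffer keeps the sketch self-contained. In practice you
# would open a real file in a location the cluster can read from.
buffer = io.StringIO()
write_points_csv(buffer, 5)

# Read the data back to confirm its shape.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
```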

Prefer the query builder

In general, we recommend using the query-builder APIs of SQLAlchemy (e.g., Engine.execute()) in your application over the Session/ORM APIs if at all possible. That way, you know exactly what SQL is being generated and sent to CockroachDB, which has the following benefits:

  • It's easier to debug your SQL queries and make sure they are working as expected.
  • You can more easily tune SQL query performance by issuing different statements, creating and/or using different indexes, etc. For more information, see SQL Performance Best Practices.
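
As a small illustration of this recommendation, the sketch below builds the deposit half of the transfer with SQLAlchemy's table and expression objects and prints the SQL before anything is sent to the database. The build_deposit helper is hypothetical, and the table definition simply mirrors the accounts table used earlier on this page.

```python
from sqlalchemy import Column, Integer, MetaData, Table

metadata = MetaData()

# Same shape as the "accounts" table created by the tutorial script.
accounts = Table(
    'accounts', metadata,
    Column('id', Integer, primary_key=True),
    Column('balance', Integer),
)


def build_deposit(account_id, amount):
    """Return an UPDATE statement that adds amount to one account's balance."""
    return (
        accounts.update()
        .where(accounts.c.id == account_id)
        .values(balance=accounts.c.balance + amount)
    )


stmt = build_deposit(984244739, 50080)

# Rendering the statement as a string needs no database connection, so
# you can inspect the SQL before executing it.
sql = str(stmt)
print(sql)
```

To run such a statement you would pass it to a connection's execute() method, for example inside a with engine.begin() block.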

See also

You might also be interested in the following pages:
