Build a Simple CRUD Python App with CockroachDB and SQLAlchemy

Tip:

Take our newest Cockroach University course, CockroachDB for Python Developers.

This tutorial shows you how build a simple CRUD Python application with CockroachDB and the SQLAlchemy ORM.

Step 1. Install SQLAlchemy

To install SQLAlchemy, as well as a CockroachDB Python package that accounts for some differences between CockroachDB and PostgreSQL, run the following command:

icon/buttons/copy
$ pip install sqlalchemy sqlalchemy-cockroachdb psycopg2
Tip:

You can substitute psycopg2 for other alternatives that include the psycopg python package.

For other ways to install SQLAlchemy, see the official documentation.

Step 2. Start CockroachDB

Create a free cluster

  1. If you haven't already, sign up for a CockroachCloud account.
  2. Log in to your CockroachCloud account.
  3. On the Clusters page, click Create Cluster.
  4. On the Create your cluster page, select CockroachCloud Free.

    Note:

    This cluster will be free forever.

  5. (Optional) Select a cloud provider (GCP or AWS) in the Additional configuration section.

  6. (Optional) Select a region in the Additional configuration section. For optimal performance, select the cloud provider region closest to the region in which you are running your application.

  7. Click Create your free cluster.

Your cluster will be created in approximately 20-30 seconds.

Set up your cluster connection

  1. Navigate to the cluster's SQL Users page, and create a new user, with a new password.

  2. Navigate to the Cluster Overview page, select Connect, and, under the Connection String tab, download the cluster certificate.

  3. Take note of the connection string provided. You'll use it to connect to the database later in this tutorial.

  1. If you haven't already, download the CockroachDB binary.
  2. Run the cockroach demo command:

    icon/buttons/copy
    $ cockroach demo \
    --no-example-database
    

    This starts a temporary, in-memory cluster and opens an interactive SQL shell to the cluster. Any changes to the database will not persist after the cluster is stopped.

  3. Take note of the (sql) connection string in the SQL shell welcome text:

    # Connection parameters:
    #   (webui)    http://127.0.0.1:8080/demologin?password=demo76950&username=demo
    #   (sql)      postgres://demo:demo76950@127.0.0.1:26257?sslmode=require
    #   (sql/unix) postgres://demo:demo76950@?host=%2Fvar%2Ffolders%2Fc8%2Fb_q93vjj0ybfz0fz0z8vy9zc0000gp%2FT%2Fdemo070856957&port=26257
    

    You'll use this connection string to connect to the database later in this tutorial.

Step 3. Get the code

Clone the code's GitHub repo:

icon/buttons/copy
$ git clone https://github.com/cockroachlabs/example-app-python-sqlalchemy/

The project has the following directory structure:

├── README.md
├── dbinit.sql
├── main.py
└── models.py

The dbinit.sql file initializes the database schema that the application uses:

icon/buttons/copy
SET sql_safe_updates = FALSE;

USE defaultdb;
DROP DATABASE IF EXISTS bank CASCADE;
CREATE DATABASE IF NOT EXISTS bank;

USE bank;

CREATE TABLE accounts (
    id UUID PRIMARY KEY,
    balance INT8
);

The models.py uses SQLAlchemy to map the Accounts table to a Python object:

icon/buttons/copy
from sqlalchemy import Column, Integer
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Account(Base):
    """The Account class corresponds to the "accounts" database table.
    """
    __tablename__ = 'accounts'
    id = Column(UUID(as_uuid=True), primary_key=True)
    balance = Column(Integer)

The main.py uses SQLAlchemy to map Python methods to SQL operations:

icon/buttons/copy
"""This simple CRUD application performs the following operations sequentially:
    1. Initializes a SQL database and table, using the cockroach sql CLI and a .sql file.
    2. Creates 100 new accounts with randomly generated IDs and randomly-computed balance amounts.
    3. Chooses two accounts at random and takes half of the money from the first and deposits it
     into the second.
    4. Chooses five accounts at random and deletes them.
"""

from argparse import ArgumentParser
from math import floor
import os
import random
import uuid

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy_cockroachdb import run_transaction

from models import Account

# The code below inserts new accounts.


def create_accounts(session, num):
    """Create N new accounts with random account IDs and account balances.
    """
    print("Creating new accounts...")
    new_accounts = []
    while num > 0:
        account_id = uuid.uuid4()
        account_balance = floor(random.random()*1_000_000)
        new_accounts.append(Account(id=account_id, balance=account_balance))
        seen_account_ids.append(account_id)
        print("Created new account with id {0} and balance {1}.".format(
            account_id, account_balance))
        num = num - 1
    session.add_all(new_accounts)


def transfer_funds_randomly(session, one, two):
    """Transfer money between two accounts.
    """
    source = session.query(Account).filter(Account.id == one).first()
    dest = session.query(Account).filter(Account.id == two).first()
    print("Random account balances:\nAccount {0}: {1}\nAccount {2}: {3}".format(
        one, source.balance, two, dest.balance))

    amount = floor(source.balance/2)
    print("Transferring {0} from account {1} to account {2}...".format(
        amount, one, two))

    # Check balance of the first account.
    if source.balance < amount:
        raise "Insufficient funds in account {0}".format(one)
    else:
        source.balance -= amount
        dest.balance += amount

    print("Transfer complete.\nNew balances:\nAccount {0}: {1}\nAccount {2}: {3}".format(
        one, source.balance, two, dest.balance))


def delete_accounts(session, num):
    """Delete N existing accounts, at random.
    """
    print("Deleting existing accounts...")
    delete_ids = []
    while num > 0:
        delete_id = random.choice(seen_account_ids)
        delete_ids.append(delete_id)
        seen_account_ids.remove(delete_id)
        num = num - 1

    accounts = session.query(Account).filter(Account.id.in_(delete_ids)).all()

    for account in accounts:
        print("Deleted account {0}.".format(account.id))
        session.delete(account)


def parse_cmdline():
    parser = ArgumentParser()
    parser.add_argument("url", help="Enter your node\'s connection string\n")
    parser.add_argument("-n", "--no-init", dest="no_init",
                        action="store_true", help="Do not initialize the database")
    opt = parser.parse_args()
    return opt


if __name__ == '__main__':

    opt = parse_cmdline()
    conn_string = opt.url
    # For cockroach demo:
    # postgres://demo:<demo_password>@127.0.0.1:26257?sslmode=require
    # For CockroachCloud:
    # postgres://<username>:<password>@<globalhost>:26257/<cluster_name>.defaultdb?sslmode=verify-full&sslrootcert=<certs_dir>/<ca.crt>
    try:
        db_uri = os.path.expandvars(conn_string)

        if not opt.no_init:
            print("Initializing the bank database...")
            os.system(
                'cockroach sql --url \'{0}\' -f dbinit.sql'.format(db_uri))
            print("Database initialized.")

        psycopg_uri = db_uri.replace(
            'postgresql://', 'cockroachdb://').replace(
                'postgres://', 'cockroachdb://').replace(
                    '26257?', '26257/bank?')
        # The "cockroachdb://" prefix for the engine URL indicates that we are
        # connecting to CockroachDB using the 'cockroachdb' dialect.
        # For more information, see
        # https://github.com/cockroachdb/sqlalchemy-cockroachdb.
        engine = create_engine(psycopg_uri)
    except Exception as e:
        print('Failed to connect to database.')
        print('{0}'.format(e))

    seen_account_ids = []

    run_transaction(sessionmaker(bind=engine),
                    lambda s: create_accounts(s, 100))

    from_id = random.choice(seen_account_ids)
    to_id = random.choice([id for id in seen_account_ids if id != from_id])

    run_transaction(sessionmaker(bind=engine),
                    lambda s: transfer_funds_randomly(s, from_id, to_id))

    run_transaction(sessionmaker(bind=engine), lambda s: delete_accounts(s, 5))

main.py also executes the main method of the program.

Step 4. Run the code

To run the app, pass the connection string for your cluster to main.py:

icon/buttons/copy
$ python3 main.py '<connection_string>'

Where <connection_string> is the (sql) connection URL provided in the demo cluster's SQL shell welcome text.

Where <connection_string> is the connection string provided in the Connection info window of the CockroachCloud Console.

Note that you need to provide a SQL user password in order to securely connect to a CockroachCloud cluster. The connection string should have a placeholder for the password (<ENTER-PASSWORD>).

The application will format the connection string to fit the CockroachDB SQLAlchemy dialect requirements. It will then initialize the database with the DDL SQL statements in the dbinit.sql. After the table is initialized, the app performs some simple row inserts, updates, and deletes.

The output should look something like the following:

Initializing the bank database...
SET

Time: 0ms

DROP DATABASE

Time: 29ms

CREATE DATABASE

Time: 7ms

SET

Time: 7ms

CREATE TABLE

Time: 3ms

Database initialized.
Creating new accounts...
Created new account with id 3a8b74c8-6a05-4247-9c60-24b46e3a88fd and balance 248835.
Created new account with id c3985926-5b77-4c6d-a73d-7c0d4b2a51e7 and balance 781972.
...
Created new account with id 7b41386c-11d3-465e-a2a0-56e0dcd2e7db and balance 984387.
Random account balances:
Account 7ad14d02-217f-48ca-a53c-2c3a2528a0d9: 800795
Account 4040aeba-7194-4f29-b8e5-a27ed4c7a297: 149861
Transferring 400397 from account 7ad14d02-217f-48ca-a53c-2c3a2528a0d9 to account 4040aeba-7194-4f29-b8e5-a27ed4c7a297...
Transfer complete.
New balances:
Account 7ad14d02-217f-48ca-a53c-2c3a2528a0d9: 400398
Account 4040aeba-7194-4f29-b8e5-a27ed4c7a297: 550258
Deleting existing accounts...
Deleted account 41247e24-6210-4032-b622-c10b3c7222de.
Deleted account 502450e4-6daa-4ced-869c-4dff62dc52de.
Deleted account 6ff06ef0-423a-4b08-8b87-48af2221bc18.
Deleted account a1acb134-950c-4882-9ac7-6d6fbdaaaee1.
Deleted account e4f33c55-7230-4080-b5ac-5dde8a7ae41d.

In a SQL shell connected to the cluster, you can verify that the rows were inserted, updated, and deleted successfully:

icon/buttons/copy
> SELECT COUNT(*) FROM bank.accounts;
  count
---------
     95
(1 row)

Best practices

Use the run_transaction function

We strongly recommend using the sqlalchemy_cockroachdb.run_transaction() function as shown in the code samples on this page. This abstracts the details of transaction retries away from your application code. Transaction retries are more frequent in CockroachDB than in some other databases because we use optimistic concurrency control rather than locking. Because of this, a CockroachDB transaction may have to be tried more than once before it can commit. This is part of how we ensure that our transaction ordering guarantees meet the ANSI SERIALIZABLE isolation level.

In addition to the above, using run_transaction has the following benefits:

  • Because it must be passed a sqlalchemy.orm.session.sessionmaker object (not a session), it ensures that a new session is created exclusively for use by the callback, which protects you from accidentally reusing objects via any sessions created outside the transaction.
  • It abstracts away the client-side transaction retry logic from your application, which keeps your application code portable across different databases. For example, the sample code given on this page works identically when run against Postgres (modulo changes to the prefix and port number in the connection string).

For more information about how transactions (and retries) work, see Transactions.

Avoid mutations of session and/or transaction state inside run_transaction()

In general, this is in line with the recommendations of the SQLAlchemy FAQs, which state (with emphasis added by the original author) that

As a general rule, the application should manage the lifecycle of the session externally to functions that deal with specific data. This is a fundamental separation of concerns which keeps data-specific operations agnostic of the context in which they access and manipulate that data.

and

Keep the lifecycle of the session (and usually the transaction) separate and external.

In keeping with the above recommendations from the official docs, we strongly recommend avoiding any explicit mutations of the transaction state inside the callback passed to run_transaction, since that will lead to breakage. Specifically, do not make calls to the following functions from inside run_transaction:

  • sqlalchemy.orm.Session.commit() (or other variants of commit()): This is not necessary because cockroachdb.sqlalchemy.run_transaction handles the savepoint/commit logic for you.
  • sqlalchemy.orm.Session.rollback() (or other variants of rollback()): This is not necessary because cockroachdb.sqlalchemy.run_transaction handles the commit/rollback logic for you.
  • Session.flush(): This will not work as expected with CockroachDB because CockroachDB does not support nested transactions, which are necessary for Session.flush() to work properly. If the call to Session.flush() encounters an error and aborts, it will try to rollback. This will not be allowed by the currently-executing CockroachDB transaction created by run_transaction(), and will result in an error message like the following: sqlalchemy.orm.exc.DetachedInstanceError: Instance <FooModel at 0x12345678> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: http://sqlalche.me/e/bhk3).

Break up large transactions into smaller units of work

If you see an error message like transaction is too large to complete; try splitting into pieces, you are trying to commit too much data in a single transaction. As described in our Cluster Settings docs, the size limit for transactions is defined by the kv.transaction.max_intents_bytes setting, which defaults to 256 KiB. Although this setting can be changed by an admin, we strongly recommend against it in most cases.

Instead, we recommend breaking your transaction into smaller units of work (or "chunks"). A pattern that works for inserting large numbers of objects using run_transaction to handle retries automatically for you is shown below.

from sqlalchemy import create_engine, Column, Float, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from cockroachdb.sqlalchemy import run_transaction
from random import random

Base = declarative_base()

# The code below assumes you have run the following SQL statements.

# CREATE DATABASE pointstore;

# USE pointstore;

# CREATE TABLE points (
#     id INT PRIMARY KEY DEFAULT unique_rowid(),
#     x FLOAT NOT NULL,
#     y FLOAT NOT NULL,
#     z FLOAT NOT NULL
# );

engine = create_engine(
    # For cockroach demo:
    'cockroachdb://<username>:<password>@<hostname>:<port>/bank?sslmode=require',
    echo=True                   # Log SQL queries to stdout
)


class Point(Base):
    __tablename__ = 'points'
    id = Column(Integer, primary_key=True)
    x = Column(Float)
    y = Column(Float)
    z = Column(Float)


def add_points(num_points):
    chunk_size = 1000        # Tune this based on object sizes.

    def add_points_helper(sess, chunk, num_points):
        points = []
        for i in range(chunk, min(chunk + chunk_size, num_points)):
            points.append(
                Point(x=random()*1024, y=random()*1024, z=random()*1024)
            )
        sess.bulk_save_objects(points)

    for chunk in range(0, num_points, chunk_size):
        run_transaction(
            sessionmaker(bind=engine),
            lambda s: add_points_helper(
                s, chunk, min(chunk + chunk_size, num_points)
            )
        )


add_points(10000)

Use IMPORT to read in large data sets

If you are trying to get a large data set into CockroachDB all at once (a bulk import), avoid writing client-side code that uses an ORM and use the IMPORT statement instead. It is much faster and more efficient than making a series of INSERTs and UPDATEs such as are generated by calls to session.bulk_save_objects().

For more information about importing data from Postgres, see Migrate from Postgres.

For more information about importing data from MySQL, see Migrate from MySQL.

Prefer the query builder

In general, we recommend using the query-builder APIs of SQLAlchemy (e.g., Engine.execute()) in your application over the Session/ORM APIs if at all possible. That way, you know exactly what SQL is being generated and sent to CockroachDB, which has the following benefits:

  • It's easier to debug your SQL queries and make sure they are working as expected.
  • You can more easily tune SQL query performance by issuing different statements, creating and/or using different indexes, etc. For more information, see SQL Performance Best Practices.

Joins without foreign keys

SQLAlchemy relies on the existence of foreign keys to generate JOIN expressions from your application code. If you remove foreign keys from your schema, SQLAlchemy will not generate joins for you. As a workaround, you can create a "custom foreign condition" by adding a relationship field to your table objects, or do the equivalent work in your application.

See also

You might also be interested in the following pages:

YesYes NoNo