How we built a CockroachDB dialect for Hibernate

This post was originally published in 2017, upon announcing that CockroachDB’s support for Hibernate was in beta. Today, we’re excited to announce some big news: the CockroachDB dialect for Hibernate is officially available! 🥳

Hibernate now offers first-class support for CockroachDB. You can read more about the dialect, and our journey to get there, in this blog post.

We’re working hard to make CockroachDB scalable and robust, but simple to use. One way we’ve approached this is by adding support for existing object-relational mappers (ORMs), which make it fast and easy to develop applications that interact with CockroachDB. To determine which ORMs to support first, we asked customers and users of our community Slack, conducted developer surveys, and performed other research. Hibernate came up most often, likely because of Java’s large developer community and Hibernate’s popularity within that world. This post is primarily about our Hibernate compatibility, but a bunch of the SQL compatibility work we initially did helped pave the path toward other ORM compatibility as well.

Phase 1 (2017): Bootstrapping ORM Support

When we first started working on this project, we had an ambitious goal of getting support for five ORMs off the ground. Hibernate was at the top of our list, and there were two potential paths forward:

Implement the SQL features needed for Hibernate to function. This was feasible, because Cockroach supports the Postgres wire protocol and the majority of its syntax. Because many ORMs support PostgreSQL, this option involved adding SQL features that ORMs need but CockroachDB was lacking.
Create a CockroachDB adapter for each ORM. Though SQL has a set of standards, each commonly used database has significant deviations from standards and/or proprietary SQL features. This requires ORMs to have an adapter for each database it connects to. So, option 2 was to create an extension that performed ORM-specific tasks using features CockroachDB already supports.

In 2017, we wanted to release Hibernate support quickly. Plus, after some investigation, we discovered significant overlap in the SQL features other ORMs need. So, we went with option #1 at first. At the time, we decided that leveraging prebuilt adapters by fitting the interface they expected would be less effort in the long run than creating a custom adapter for each individual ORM or tool.

So, what are those features? ORMs issue two basic types of queries. The first set consists of CRUD (CREATE, READ, UPDATE, DELETE) queries. This was relatively straightforward given our existing implementations of those SQL statements. Minor issues existed, such as incompatible date and time formats, but were simple to fix.

The vast majority of the work involved supporting schema synchronization.

Schema Synchronization

One of the important responsibilities of an ORM is making sure that the tables, columns, and indices in the database are consistent with the models as defined by the code. For example, consider this Hibernate ORM model:

private static final SessionFactory sessionFactory = new Configuration() .configure("hibernate.cfg.xml") .addAnnotatedClass(Account.class) .buildSessionFactory(); @Entity @Table(name="accounts") public static class Account { @Id @Column(name="id") public long id; @Column(name="balance") public long balance; }

During startup, Hibernate performs the following actions:

Creates the accounts table if it doesn’t exist. Hibernate “knows” about the table because of the function call addAnnotatedClass(Account.class).
Adds the id column to accounts if it doesn’t exist and makes it the primary key for accounts.
Adds the balance column to accounts if it doesn’t exist.

Hibernate knows how it wants the accounts* table to be structured based on the @Table, @Column, and @Id annotations. Now, it needs to know if accounts already exists and, if it does, what its current schema is.

Schema Synchronization: pg_catalog vs. information_schema

Schema synchronization makes the database schema consistent with the models (e.g. Account in the above example). To do this, Hibernate must retrieve the database schema through the mechanisms provided by the database.

information_schema is a standard set of views that SQL defines to expose the structure of a database for introspection by clients. Unfortunately, the ORMs we’ve tested do not use this when connected to PostgreSQL. Instead, they use the relations in the PostgreSQL-specific pg_catalog schema. So, we’ve implemented the subset of pg_catalog views needed for Hibernate. This allows a Hibernate app to connect to CockroachDB as though it were PostgreSQL.

This process was relatively straightforward, but involved significant changes to SQL execution, the type system, and a variety of other code. It also exercised parts of our SQL engine that were previously underused. For example, consider this query that Hibernate issues while running a sample app:

SELECT NULL::text AS PKTABLE_CAT, pkn.nspname AS PKTABLE_SCHEM, [15 lines omitted] CASE con.confdeltype WHEN 'c' THEN 0 WHEN 'n' THEN 2 WHEN 'd' THEN 4 WHEN 'r' THEN 1 WHEN 'a' THEN 3 ELSE NULL END AS DELETE_RULE, [8 lines omitted] FROM pg_catalog.pg_namespace pkn, pg_catalog.pg_class pkc, pg_catalog.pg_attribute pka, pg_catalog.pg_namespace fkn, pg_catalog.pg_class fkc, pg_catalog.pg_attribute fka, pg_catalog.pg_constraint con, pg_catalog.generate_series(1, 32) pos(n), pg_catalog.pg_depend dep, pg_catalog.pg_class pkic WHERE pkn.oid = pkc.relnamespace AND pkc.oid = pka.attrelid AND pka.attnum = con.confkey[pos.n] AND con.confrelid = pkc.oid AND fkn.oid = fkc.relnamespace AND fkc.oid = fka.attrelid AND fka.attnum = con.conkey[pos.n] AND con.conrelid = fkc.oid AND con.contype = 'f' AND con.oid = dep.objid AND pkic.oid = dep.refobjid AND pkic.relkind = 'i' AND dep.classid = 'pg_constraint'::regclass::oid AND dep.refclassid = 'pg_class'::regclass::oid AND fkn.nspname = 'public' AND fkc.relname = 'orders' ORDER BY pkn.nspname, pkc.relname, con.conname, pos.n

It’s reasonable for you to have looked at this query for a few minutes and still not understand what it’s trying to do. This query retrieves the foreign keys for the orders table in the public schema, using the relations in pg_catalog.

Hibernate is not unique in using pg_catalog instead of information_schema. Several other ORMs, including the other four we prioritized in our 2017 round of ORM compatibility use pg_catalog as well. Fortunately, we also confirmed that these ORMs mostly use the same parts of pg_catalog that Hibernate does.

Phase 2 (2020): Building a Cockroach/Hibernate Dialect

We’ve improved CockroachDB’s Postgres compatibility a lot since that initial beta support for Hibernate. Even so, we wanted to ensure that Hibernate has first-class support for CockroachDB, and that we could work around some of the lesser-used Postgres compatibility issues that aren’t coming up on our roadmap.

That’s why, in parallel to our SQL compatibility improvements, we decided to create our own Hibernate dialect. A CockroachDB Hibernate dialect allows us to provide a better developer experience by introducing CockroachDB-specific semantics. Compared to the Postgres dialect, the CockroachDB dialect has automatic handling for missing BLOB storage and Identity column support.

We worked with the Hibernate maintainers, and the CockroachDB dialect is part of the official Hibernate release as of v5.4.19. Going forward, our team will also have the opportunity to further extend the CockroachDB dialect with CockroachDB specific features. And because the dialect is in the main Hibernate repo, it’ll be maintained in perpetuity, and the integration into Hibernate’s main release helps us provide better ongoing support. Thank you to the Hibernate team for all their hard work on helping us get this merged!

Next Steps for ORM Compatibility

In addition to Hibernate, we’ve also implemented support in CockroachDB for ActiveRecord (Ruby), SQLAlchemy (Python), GORM (Go), and TypeORM (Typescript). A full list of ORMs and other third-party database tools that CockroachDB supports is available here. Want to use another ORM? We may already be working on it. You can first search for the ORM in our issue tracker. If you find a relevant issue, please voice your support with a GitHub reaction.

Pull requests to add ORM-related features are greatly appreciated. CockroachDB Core is open-source and benefits significantly from community pull requests. If that’s not feasible, please file an issue that includes a test script or unit test that uses the ORM. It’s easier to prioritize support if we don’t first have to invest in understanding it and its incompatibilities.

If you want to build an application with Hibernate ORM and CockroachDB, get started today by building a Java app with Hibernate.