Migrate a PostgreSQL database

Warning:
CockroachDB v22.1 is no longer supported. For more details, see the Release Support Policy.
Warning:

The instructions on this page require updates. For updated guidance, see the current documentation.

This page has instructions for migrating data from PostgreSQL to CockroachDB using IMPORT's support for reading pg_dump files.

The examples pull real data from Amazon S3. They use the employees data set that is also used in the MySQL docs. The data was imported to PostgreSQL using pgloader, and then modified for use here as explained below.

Tip:

For best practices for optimizing import performance in CockroachDB, see Import Performance Best Practices.

Pre-migration considerations

Primary keys

PostgreSQL and CockroachDB have different best practices surrounding primary keys on tables. While it's common to see sequences and auto-incrementing primary keys in PostgreSQL, these features can cause hotspots within your cluster when reading or writing large amounts of data. Cockroach Labs recommends that you use multi-column primary keys or the UUID datatype for primary key columns.

If you are working with a table that must be indexed on sequential keys, consider using hash-sharded indexes. We recommend doing thorough performance testing with and without hash-sharded indexes to see which works best for your application.

For further information, see Unique ID best practices and 3 Basic Rules for Choosing Indexes.
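
As a minimal sketch of these recommendations in CockroachDB SQL, using hypothetical users and events tables (not part of the employees data set):

-- Multi-column primary key with a UUID component instead of a sequence:
> CREATE TABLE users (
      id UUID NOT NULL DEFAULT gen_random_uuid(),
      city STRING NOT NULL,
      name STRING,
      PRIMARY KEY (city, id)
  );

-- Hash-sharded secondary index on a sequentially increasing column:
> CREATE TABLE events (
      id UUID NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
      ts TIMESTAMPTZ NOT NULL,
      payload JSONB,
      INDEX (ts) USING HASH
  );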

Step 1. Dump the PostgreSQL database

There are several ways to dump data from PostgreSQL to be imported into CockroachDB: you can dump the entire database at once or dump one table at a time, as described below.

The import will fail if the dump file contains functions or type definitions. In addition to calling pg_dump as shown below, you may need to edit the dump file to remove functions and data types.

Also, note that CockroachDB's IMPORT does not support automatically importing data from PostgreSQL's non-public schemas. As a workaround, you can edit the dump file to change the table and schema names in the CREATE TABLE statements.
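
For example, one way to move tables from a hypothetical acme schema into the public schema before importing is a search-and-replace pass over the dump file. This is only a sketch; review the result, since the schema name could also appear in other statements or in the data itself:

$ sed -e 's/acme\./public./g' -e '/^CREATE SCHEMA acme/d' /tmp/employees-full.sql > /tmp/employees-edited.sql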

Dump the entire database

Most users will want to import their entire PostgreSQL database all at once, as shown below in Import a full database dump.

To dump the entire database, run the pg_dump command shown below.

$ pg_dump employees > /tmp/employees-full.sql

For this data set, the PostgreSQL dump file required the following edits, which have already been performed on the files used in the examples below:

  • The type of the employees.gender column in the CREATE TABLE statement had to be changed from employees.employees_gender to STRING since PostgreSQL represented the employee's gender using a CREATE TYPE statement that is not supported by CockroachDB. A sketch of this edit appears below this list.

  • A CREATE TYPE employee ... statement needed to be removed.
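
The type change described in the first edit looks roughly like the following adjustment to the gender column in the dump file's CREATE TABLE statement (exact formatting depends on your pg_dump version):

-- Before: relies on a CREATE TYPE that the import cannot process
gender employees.employees_gender NOT NULL,

-- After:
gender STRING NOT NULL,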

If you only want to import one table from a database dump, see Import a table from a full database dump below.

Dump one table at a time

To dump the employees table from a PostgreSQL database also named employees, run the pg_dump command shown below. You can import this table using the instructions in Import a table from a table dump below.

$ pg_dump -t employees  employees > /tmp/employees.sql

For this data set, the PostgreSQL dump file required the following edits, which have already been performed on the files used in the examples below.

  • The type of the employees.gender column in the CREATE TABLE statement had to be changed from employees.employees_gender to STRING since PostgreSQL represented the employee's gender using a CREATE TYPE statement that is not supported by CockroachDB.

Step 2. Host the files where the cluster can access them

Each node in the CockroachDB cluster needs to have access to the files being imported. There are several ways for the cluster to access the data; for more information on the types of storage IMPORT can pull from, see the IMPORT documentation.

Tip:

We strongly recommend using cloud storage such as Amazon S3 or Google Cloud to host the data files you want to import.
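
For example, a minimal sketch of compressing the dump file and copying it to a hypothetical S3 bucket with the AWS CLI (the bucket name is a placeholder; IMPORT can read gzip-compressed files directly):

$ gzip /tmp/employees-full.sql
$ aws s3 cp /tmp/employees-full.sql.gz s3://your-bucket/employees-full.sql.gz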

Step 3. Import the PostgreSQL dump file

You can choose from several variants of the IMPORT statement, depending on whether you want to import a full database or a single table; each variant is shown in the sections below.

Note that all of the IMPORT statements in this section pull real data from Amazon S3 and will kick off background import jobs that you can monitor with SHOW JOBS.
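
For example, to check the progress of import jobs:

> SELECT job_id, status, fraction_completed FROM [SHOW JOBS] WHERE job_type = 'IMPORT';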

Import a full database dump

This example assumes you dumped the entire database.

The IMPORT statement below reads the data and DDL statements (including existing foreign key relationships) from the full database dump file.

> IMPORT PGDUMP 'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees-full.sql.gz' WITH ignore_unsupported_statements;
       job_id       |  status   | fraction_completed |  rows  | index_entries | system_records |  bytes
--------------------+-----------+--------------------+--------+---------------+----------------+----------
 381845110403104769 | succeeded |                  1 | 300024 |             0 |              0 | 11534293
(1 row)

Import a table from a full database dump

This example assumes you dumped the entire database.

IMPORT can import one table's data from a full database dump. It reads the data and applies any CREATE TABLE statements from the file.

> CREATE DATABASE IF NOT EXISTS employees;
> USE employees;
> IMPORT TABLE employees FROM PGDUMP 'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees-full.sql.gz' WITH ignore_unsupported_statements;
       job_id       |  status   | fraction_completed |  rows  | index_entries | system_records |  bytes
--------------------+-----------+--------------------+--------+---------------+----------------+----------
 383839294913871873 | succeeded |                  1 | 300024 |             0 |              0 | 11534293
(1 row)

Import a table from a table dump

The simplest way to import a table dump is to run IMPORT. It reads the table data and any CREATE TABLE statements from the file:

> CREATE DATABASE IF NOT EXISTS employees;
> USE employees;
> IMPORT PGDUMP 'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees.sql.gz' WITH ignore_unsupported_statements;
       job_id       |  status   | fraction_completed |  rows  | index_entries | system_records |  bytes
--------------------+-----------+--------------------+--------+---------------+----------------+----------
 383855569817436161 | succeeded |                  1 | 300024 |             0 |              0 | 11534293
(1 row)

Configuration options

The following options are available to IMPORT PGDUMP:

Max row size

The max_row_size option overrides the default limit on line size (default: 0.5MB). You may need to increase this value if your PostgreSQL dump file has extremely long lines, for example as part of a COPY statement.

Example usage:

> IMPORT TABLE employees FROM PGDUMP ('s3://your-external-storage/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456') WITH max_row_size = '5MB';

Row limit

The row_limit option determines the number of rows to import. If you are importing one table, setting row_limit = 'n' will import the first n rows of the table. If you are importing an entire database, this option will import the first n rows from each table in the dump file. It is useful for finding errors quickly before executing a more time- and resource-consuming import.

Example usage:

> IMPORT TABLE employees FROM PGDUMP 's3://your-external-storage/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456' WITH row_limit = '10';

Ignore unsupported statements

The ignore_unsupported_statements option specifies whether the import will ignore unsupported statements in the PGDUMP file. Default: false.

If ignore_unsupported_statements is omitted, the import will fail if it encounters a statement that is unsupported by CockroachDB. Use ignore_unsupported_statements with log_ignored_statements to log unsupported statements.

Example usage:

> IMPORT TABLE employees FROM PGDUMP 's3://your-external-storage/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456' WITH ignore_unsupported_statements;

Log unsupported statements

The log_ignored_statements option is used with the ignore_unsupported_statements option to log unsupported statements in the PGDUMP file to a specified destination file.

Example usage:

> IMPORT TABLE employees FROM PGDUMP 's3://your-external-storage/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456' WITH ignore_unsupported_statements, log_ignored_statements='userfile://defaultdb.public.userfiles_root/unsupported-statements.log';

Skip foreign keys

By default, IMPORT PGDUMP supports foreign keys (skip_foreign_keys defaults to false). Add the skip_foreign_keys option to speed up data import by ignoring foreign key constraints in the dump file's DDL. This also lets you import individual tables that would otherwise fail due to dependencies on other tables.

Note:

The most common dependency issues are caused by unsatisfied foreign key relationships. You can avoid these issues by adding the skip_foreign_keys option to your IMPORT statement as needed. For more information, see the list of import options.

For example, if you get the error message pq: there is no unique constraint matching given keys for referenced table tablename, use IMPORT ... WITH skip_foreign_keys.

Example usage:

> IMPORT TABLE employees FROM PGDUMP ('s3://your-external-storage/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456') WITH skip_foreign_keys;

Foreign key constraints can be added by using ALTER TABLE ... ADD CONSTRAINT commands after importing the data.
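
For example, a sketch of re-adding one of the employees data set's foreign keys after an import that used skip_foreign_keys (adjust table and column names to match your schema):

> ALTER TABLE salaries ADD CONSTRAINT fk_salaries_emp_no FOREIGN KEY (emp_no) REFERENCES employees (emp_no);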
