Import Performance Best Practices

Cockroach Labs will stop providing Assistance Support for v22.1 on November 24, 2023. Prior to that date, upgrade to a more recent version to continue receiving support. For more details, see the Release Support Policy.

This page provides best practices for optimizing import performance in CockroachDB.

Import speed primarily depends on the amount of data that you want to import. However, two main factors can have a large impact on how long an import takes:

  • How you split your data into files
  • The import format
If the import size is small, then you do not need to do anything to optimize performance. In this case, the import should run quickly, regardless of the settings.

Split your data into multiple files

Splitting the import data into multiple files can have a large impact on the import performance. The following formats support multi-file import using IMPORT INTO:

  • CSV
  • AVRO

For these formats, we recommend splitting your data into as many files as there are nodes.

For example, if you have a 3-node cluster, split your data into 3 files, create your table, and import into that table:

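CREATE TABLE customers (id UUID PRIMARY KEY, name TEXT, INDEX name_idx(name));
IMPORT INTO customers (id, name)
    CSV DATA (
      '{URL OF customers-1.csv}', '{URL OF customers-2.csv}', '{URL OF customers-3.csv}'
    ); -- placeholder URLs: replace with the locations of your three files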

CockroachDB imports the files that you give it and does not split them further. For example, if you import one large file containing all of your data, CockroachDB processes that file on one node, even if more nodes are available. However, if you import two files (and your cluster has at least two nodes), two nodes each process one file in parallel. This is why splitting your data into as many files as you have nodes can dramatically decrease import time.


If you split the data into more files than you have nodes, it will not have a large impact on performance.

File storage during import

During an import, all of the features of IMPORT that interact with external file storage assume that every node has exactly the same view of that storage. In other words, to import from a file, every node needs the same access to that file.

Choose a performant import format

Import formats do not have the same performance because of the way they are processed. Below, import formats are listed from fastest to slowest:

  1. CSV or DELIMITED DATA (both have about the same import performance)
  2. AVRO

We recommend formatting your import files as CSV, DELIMITED DATA, or AVRO. These formats can be processed in parallel by multiple threads, which increases performance. To import in these formats, use IMPORT INTO.
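As a sketch of the IMPORT INTO syntax for the other two parallel formats, the statements below load the customers table created earlier on this page. The file URLs are placeholders, and the pipe delimiter is an arbitrary choice for illustration; replace both with values that match your own files.

```sql
-- DELIMITED DATA: fields_terminated_by sets the field separator.
-- The pipe ('|') is only an illustrative choice.
IMPORT INTO customers (id, name)
    DELIMITED DATA (
      '{URL OF customers-1.txt}',  -- placeholder URL
      '{URL OF customers-2.txt}'   -- placeholder URL
    )
    WITH fields_terminated_by = '|';

-- AVRO DATA: the Avro records supply the values for the listed columns.
IMPORT INTO customers (id, name)
    AVRO DATA (
      '{URL OF customers-1.avro}',  -- placeholder URL
      '{URL OF customers-2.avro}'   -- placeholder URL
    );
```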

MYSQLDUMP and PGDUMP run a single thread to parse their data, and therefore have substantially slower performance.

MYSQLDUMP and PGDUMP are two examples of "bundled" data: the dump file contains both the table schema and the data to import. These formats are the slowest to import, with PGDUMP being the slower of the two. This is because CockroachDB must first load the whole file, read it to get the schema, create the table with that schema, and only then import the data. To speed up bundled-data imports, see Import the schema separately from the data.


As of v22.1, certain IMPORT TABLE statements that define the table schema inline are not supported. See Import — Considerations for more details. To import data into a new table, use CREATE TABLE followed by IMPORT INTO. For an example, read Import into a new table from a CSV file.

Import the schema separately from the data

For single-table MYSQLDUMP or PGDUMP imports, split your dump data into two files:

  1. A SQL file containing the table schema
  2. A CSV file containing the table data

Then, import the schema-only file:

> IMPORT TABLE customers
    FROM PGDUMP '{URL OF SCHEMA FILE}' -- placeholder: location of the schema-only SQL file
    WITH ignore_unsupported_statements;

And use the IMPORT INTO statement to import the CSV data into the newly created table:

> IMPORT INTO customers (id, name)
    CSV DATA ('{URL OF DATA FILE}'); -- placeholder: location of the CSV data file

This method has the added benefit of surfacing potential issues with the import sooner: you do not have to wait for CockroachDB to load both the schema and the data just to find an error in the schema.

Import into a schema with secondary indexes

When importing data into a table with secondary indexes, the import job will ingest the table data and required secondary index data concurrently. This may result in a longer import time compared to a table without secondary indexes. However, this typically adds less time to the initial import than following it with a separate pass to add the indexes. As a result, importing tables with their secondary indexes is the default workflow, suitable for most import jobs.

However, for large imports, it may be preferable to remove the secondary indexes from the schema, perform the import, and then re-create the indexes separately. This gives you more visibility into the progress of each step and lets you retry each step independently.
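A minimal sketch of this index-last workflow follows, assuming a fresh database, the customers table and name_idx index from the earlier example, and placeholder file URLs. The final query filters SHOW JOBS to watch the import and index-creation jobs; the job type labels used in the filter are an assumption and may differ between CockroachDB versions.

```sql
-- 1. Create the table without the secondary index.
CREATE TABLE customers (id UUID PRIMARY KEY, name TEXT);

-- 2. Import the data; only the primary index is written here.
IMPORT INTO customers (id, name)
    CSV DATA (
      '{URL OF customers-1.csv}',  -- placeholder URL
      '{URL OF customers-2.csv}',  -- placeholder URL
      '{URL OF customers-3.csv}'   -- placeholder URL
    );

-- 3. Re-create the secondary index as its own step,
--    which can be monitored and retried independently.
CREATE INDEX name_idx ON customers (name);

-- From another session: check the progress of the import and index jobs.
-- The job_type values listed here are assumed labels for these job kinds.
SELECT job_id, job_type, status, fraction_completed
  FROM [SHOW JOBS]
 WHERE job_type IN ('IMPORT', 'SCHEMA CHANGE', 'NEW SCHEMA CHANGE');
```

Both IMPORT INTO and CREATE INDEX return only after their jobs finish, so the progress query is most useful when run from a second SQL session.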

Data type sizes

Above a certain size, values of many data types, such as STRING, DECIMAL, ARRAY, BYTES, and JSONB, may run into performance issues due to write amplification. See each data type's documentation for its recommended size limits.
