Migrate from CSV

Warning:
CockroachDB v21.1 is no longer supported. For more details, see the Release Support Policy.

This page has instructions for migrating data from CSV files into CockroachDB using IMPORT.

The examples below pull real data from Amazon S3. They use the employees data set that is also used in the MySQL docs, dumped as a set of CSV files.

Tip:

For best practices for optimizing import performance in CockroachDB, see Import Performance Best Practices.

Step 1. Export data to CSV

Refer to your database's documentation for instructions on exporting data to CSV. A sketch of what a MySQL export might look like follows the list below.

You will need to export one CSV file per table, with the following requirements:

  • Files must be in valid CSV format, with the caveat that the delimiter must be a single character. To use a character other than comma (such as a tab), set a custom delimiter using the delimiter option.
  • Files must be UTF-8 encoded.
  • If one of the following characters appears in a field, the field must be enclosed by double quotes:
    • delimiter (, by default)
    • double quote (")
    • newline (\n)
    • carriage return (\r)
  • If double quotes are used to enclose fields, then a double quote appearing inside a field must be escaped by preceding it with another double quote. For example: "aaa","b""bb","ccc"
  • If a column is of type BYTES, it can either be a valid UTF-8 string or a hex-encoded byte literal beginning with \x. For example, a field whose value should be the bytes 1, 2 would be written as \x0102.
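
For example, if you are exporting from MySQL (the source of the employees data set used on this page), a statement along the following lines writes a CSV file on the MySQL server. This is a minimal sketch, not a definitive recipe: the output path is an assumption, and MySQL escapes embedded double quotes with a backslash by default rather than doubling them, so verify any fields that contain quotes:

> SELECT emp_no, birth_date, first_name, last_name, gender, hire_date
  INTO OUTFILE '/tmp/employees.csv' -- hypothetical server-side path; adjust for your setup
  CHARACTER SET utf8mb4
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM employees;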

Step 2. Host the files where the cluster can access them

Each node in the CockroachDB cluster needs to have access to the files being imported. There are several ways for the cluster to access the data; for more information on the types of storage IMPORT can pull from, see the IMPORT documentation on supported file locations.

Tip:

We strongly recommend using cloud storage such as Amazon S3 or Google Cloud to host the data files you want to import.
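
The examples in the next step reference files by URL. For cloud storage, credentials are typically passed as query parameters on that URL; in the following form, the bucket name and keys are placeholders, not working values:

s3://acme-co/employees.csv?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456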

Step 3. Import the CSV

You will need to write an IMPORT TABLE statement that matches the schema of the table data you're importing.

For example, to import the data from employees.csv into an employees table, issue the following statement:

> IMPORT TABLE employees (
    emp_no INT PRIMARY KEY,
    birth_date DATE NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING NOT NULL,
    gender STRING NOT NULL,
    hire_date DATE NOT NULL
  ) CSV DATA ('https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz');
       job_id       |  status   | fraction_completed |  rows  | index_entries | system_records |  bytes   
--------------------+-----------+--------------------+--------+---------------+----------------+----------
 381866942129111041 | succeeded |                  1 | 300024 |             0 |              0 | 13258389
(1 row)

Repeat the above for each CSV file you want to import.
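
To spot-check an import, you can count the rows in the new table; the result should match the rows column in the job output above (300024 for employees):

> SELECT count(*) FROM employees;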

Note:

As of v21.2, IMPORT TABLE will be deprecated. We recommend using CREATE TABLE followed by IMPORT INTO to import data into a new table. For an example, read Import into a new table from a CSV file, or see the sketch below.
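
As a minimal sketch of that recommended pattern, using the same schema and file as the example above, create the table first and then import into it:

> CREATE TABLE employees (
    emp_no INT PRIMARY KEY,
    birth_date DATE NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING NOT NULL,
    gender STRING NOT NULL,
    hire_date DATE NOT NULL
  );

> IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
  CSV DATA ('https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz');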

To import data into an existing table, use IMPORT INTO.

Note:

The column order in your schema must match the column order in the file being imported.

Note:

You will need to run ALTER TABLE ... ADD CONSTRAINT to add any foreign key relationships.
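
For example, to relate employees to a departments table (the departments table and dept_no column are illustrative here; they are not part of the schema above):

> ALTER TABLE employees ADD CONSTRAINT fk_dept_no FOREIGN KEY (dept_no) REFERENCES departments (dept_no);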

Configuration Options

The following options are available to IMPORT ... CSV:

Column delimiter

The delimiter option sets the Unicode character that marks where each column ends. Default: ,.

Example usage:

> IMPORT TABLE employees (
    emp_no INT PRIMARY KEY,
    birth_date DATE NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING NOT NULL,
    gender STRING NOT NULL,
    hire_date DATE NOT NULL
  )
  CSV DATA ('s3://acme-co/employees.csv?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456')
        WITH delimiter = e'\t';

Comment syntax

The comment option determines which Unicode character marks the rows in the data to be skipped.

Example usage:

> IMPORT TABLE employees (
    emp_no INT PRIMARY KEY,
    birth_date DATE NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING NOT NULL,
    gender STRING NOT NULL,
    hire_date DATE NOT NULL
  )
  CSV DATA ('s3://acme-co/employees.csv?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456')
        WITH comment = '#';

Skip header rows

The skip option determines the number of header rows to skip when importing a file.

Example usage:

> IMPORT TABLE employees (
    emp_no INT PRIMARY KEY,
    birth_date DATE NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING NOT NULL,
    gender STRING NOT NULL,
    hire_date DATE NOT NULL
  )
  CSV DATA ('s3://acme-co/employees.csv?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456')
        WITH skip = '2';

Row limit

New in v21.1: The row_limit option limits the number of rows imported from the file. It is useful for finding errors quickly before executing a more time- and resource-consuming import.

Example usage:

> IMPORT TABLE employees (
    emp_no INT PRIMARY KEY,
    birth_date DATE NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING NOT NULL,
    gender STRING NOT NULL,
    hire_date DATE NOT NULL
  )
  CSV DATA ('s3://acme-co/employees.csv?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456')
        WITH row_limit = '10';

Null strings

The nullif option defines which string should be converted to NULL.

Example usage:

> IMPORT TABLE employees (
    emp_no INT PRIMARY KEY,
    birth_date DATE NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING NOT NULL,
    gender STRING NOT NULL,
    hire_date DATE NOT NULL
  )
  CSV DATA ('s3://acme-co/employees.csv?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456')
        WITH nullif = '';

File compression

The compress option defines which decompression codec should be used on the CSV file to be imported. Options include:

  • gzip: Uses the gzip algorithm to decompress the file.
  • bzip: Uses the bzip algorithm to decompress the file.
  • none: Disables decompression.
  • auto: Default. Guesses based on file extension ('none' for .csv, 'gzip' for .gz, 'bzip' for .bz and .bz2).

Example usage:

> IMPORT TABLE employees (
    emp_no INT PRIMARY KEY,
    birth_date DATE NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING NOT NULL,
    gender STRING NOT NULL,
    hire_date DATE NOT NULL
  )
  CSV DATA ('s3://acme-co/employees.csv.gz?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456')
        WITH compress = 'gzip';
