The `IMPORT` statement imports tabular data (e.g., CSVs) into a single table.
`IMPORT` is an experimental feature. To enable it, run:

```sql
> SET CLUSTER SETTING experimental.importcsv.enabled = true;
```
|Term|Definition|
|----|----------|
|Import file|The tabular data file you want to import.|
|Processing node|The single node processing the `IMPORT` statement.|
|Temp directory|A location where the processing node can store data from the import file it converts to CockroachDB-compatible key-value data. This directory must be available to all nodes using the same address (i.e., cannot use the processing node's local file storage).|
Because importing data is a complex task, it can be useful to have a high-level understanding of the process.
- A single node receives the `IMPORT` request and becomes the processing node.
- The processing node streams the contents of the import file, converting its contents into CockroachDB-compatible key-value data.
- As the key-value data is generated, the node stores it in the temp directory.
- Once the entire import file has been converted to key-value data, relevant nodes import key-value data from the temp directory.
After the import has completed, you should also delete the files from your temp directory.
Before using `IMPORT`, you should have:
- The schema of the table you want to import.
- The tabular data you want to import (e.g., CSV), preferably hosted on cloud storage.
- A location to store data before it is fully imported into all your nodes (referred to in this document as a "temp" directory). This location must be accessible to all nodes using the same address (i.e., cannot use a node's local file storage).
For ease of use, we recommend using cloud storage. However, if that isn't readily available to you, we also have a guide on easily creating your own file server.
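Putting these pieces together, the general shape of an `IMPORT` statement looks like the following sketch. The container name, file names, and credentials are placeholders, not real values:

```sql
> IMPORT TABLE customers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT
)
CSV DATA ('azure://example-bucket/customers.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=example')
WITH temp = 'azure://example-bucket/temp/?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=example';
```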
You can only import a single table at a time.
The `IMPORT` statement must include a `CREATE TABLE` statement (representing the schema of the data you want to import) using one of the following methods:
- A reference to a file that contains a `CREATE TABLE` statement
- An inline `CREATE TABLE` statement
We also recommend specifying all secondary indexes you want to use in the `CREATE TABLE` statement. It is possible to add secondary indexes later, but it is significantly faster to specify them during import.
When importing tables, be mindful of the following rules, because `IMPORT` creates only a single table, which must not already exist:
- Objects that the imported table depends on must already exist
- Objects that depend on the imported table can only be created after the import completes
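For example, if you want to import the table into a database that does not yet exist, create the database first; conversely, create objects such as views on the imported table only after the import finishes. The database name below is hypothetical:

```sql
> CREATE DATABASE import_db;

> IMPORT TABLE import_db.customers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT
)
CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
WITH temp = 'azure://acme-co/temp/?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co';
```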
Operational Requirements & Concerns
Because `IMPORT` has a number of moving parts, there are several operational concerns when executing the statement, the most important of which is ensuring that the processing node can complete the entire task.
Choose Node to Process Request
Because of `IMPORT`'s current implementation, the entire task is executed on a single node. If your deployment is not entirely symmetric, sending the request to a random node might have undesirable effects. Instead, we recommend bypassing any load balancers, connecting to a machine directly, and running the `IMPORT` statement on it.
It's important to note, though, that after the single machine creates the CockroachDB-compatible key-value data, the process of importing the data is distributed among nodes in the cluster.
Future versions of `IMPORT` will let you distribute the entire process among many nodes.
Available Storage Requirements
The node's first-listed/default `store` directory must have available storage equal to or greater than the size of the file you're importing.
When running `cockroach start`, if you set `--max-disk-temp-storage`, it must also be greater than the size of the file you're importing.
For example, if you're importing approximately 10GiB of data, the node that ends up running the `IMPORT` command must have at least 10GiB of available storage in its `store` directory.
Import File Location
You can store the tabular data you want to import using either a node's local storage or remote cloud storage (Amazon S3, Google Cloud Platform, etc.).
For simplicity's sake, we highly recommend using cloud/remote storage for the data you want to import.
However, if you do want to store the file locally to import it, there are a number of things to understand.
Importing Data From Local Storage
Because cloud storage is easier to use with the `IMPORT` process, we recommend using it instead of local file storage.
If you do not have access to cloud storage, you can easily create a file server using this guide.
Because CockroachDB is designed as a distributed system, the ergonomics of local file storage require some understanding to use successfully. Though we do not recommend this process, if you do want to use a locally stored file, this procedure is likely to cause you the fewest headaches:
1. Ensure the node you want to use has available storage space at least 2x the size of the data you want to import: 1x for the file itself, and 1x for the converted key-value data. For example, if you want to import 10GiB of data, your node needs 20GiB of available storage.
2. Upload the tabular data file to a single node, and then connect to that node.
3. Execute the `IMPORT` statement on that node, importing the locally stored file with the `temp` option set.
However, the "temp" directory you choose must use a location available to all nodes in the cluster (i.e., you cannot use local file storage). You will need to use either cloud storage, a custom HTTP server, or NFS connected to all nodes in the cluster.
To distribute the data you want to import to all nodes in your cluster, the `IMPORT` process requires that the CockroachDB-compatible key-value data be stored in a location accessible to all nodes in the cluster using the same address. To achieve this, you can use:
- Cloud storage, such as Amazon S3 or Google Cloud Platform
- Network file storage mounted to every node
- HTTP file server
The temp directory must have at least as much storage space as the size of the data you want to import.
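For example, you could serve the import file from a custom HTTP file server while using cloud storage for the temp directory. The file server address and credentials below are placeholders:

```sql
> IMPORT TABLE customers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT
)
CSV DATA ('http://fileserver.example.com:8080/customer-import-data.csv')
WITH temp = 'azure://acme-co/temp/?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co';
```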
Temp Directory Cleanup
After completing the `IMPORT` process, you must manually remove the key-value data stored in the temp directory.
Table Users and Privileges
Imported tables are treated as new tables, so you must `GRANT` privileges to them.
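For example, to let a hypothetical `analyst` user read from the newly imported `customers` table:

```sql
> GRANT SELECT ON TABLE customers TO analyst;
```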
`IMPORT` uses a single node to convert your tabular data into key-value data, which means the node's CPU and RAM will be partially consumed by the `IMPORT` task in addition to serving normal traffic.
Later steps of the import process distribute work among many nodes and have less impact on the nodes' resources.
Only the `root` user can run `IMPORT`.
|Parameter|Description|
|---------|-----------|
|`table_name`|The name of the table you want to import/create.|
|`create_table_file`|The URL of a plain text file containing the `CREATE TABLE` statement you want to use.|
|`table_elem_list`|The table definition you want to use (see this example for syntax).|
|`file_to_import`|The URL of the file you want to import.|
|`WITH kv_option`|Control your import's behavior with these options. The `temp` option (which represents the temp directory's URL) is required.|
Import File & Temp Directory URLs
URLs for the file you want to import and your temp directory must use the format `[scheme]://[host]/[path]?[parameters]` (e.g., `azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co`).
- **Google Cloud:** GCS connections use Google's default authentication strategy and support only instance auth.
- **File system location:** Because CockroachDB is a distributed system, you cannot meaningfully store files "locally" on nodes. The entire file must be stored in a single location, so attempts to store files locally must point to an NFS drive to be useful.
- **NFS:** If using NFS for your temp directory, each node in the cluster must have access to the NFS using the same URL.
You can control the `IMPORT` process's behavior using any of the following key-value pairs as a `kv_option`:
`temp`

A directory accessible by all nodes, which is used to store the CockroachDB-compatible key-value data before all nodes import the data.

- **Value:** The URL of the temp directory
`delimiter`

If not using comma as your column delimiter, you can specify another Unicode character as the delimiter.

- **Value:** The Unicode character that delimits columns in your rows
- **Example:** To use tab-delimited values: `delimiter = e'\t'`
`comment`

Do not import rows that begin with this character.

- **Value:** The Unicode character that identifies rows to skip
`nullif`

Convert values to SQL `NULL` if they match the specified string.

- **Value:** The string that should be converted to `NULL`
- **Example:** To use empty columns as `NULL`: `nullif = ''`
Use Create Table Statement from a File
```sql
> IMPORT TABLE customers
CREATE USING 'azure://acme-co/customer-create-table.sql?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co'
CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
WITH temp = 'azure://acme-co/temp/?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co';
```
Use Create Table Statement from a Statement
```sql
> IMPORT TABLE customers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT,
    INDEX name_idx (name)
)
CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
WITH temp = 'azure://acme-co/temp/?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co';
```
Import a Tab-Separated File
```sql
> IMPORT TABLE customers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT,
    INDEX name_idx (name)
)
CSV DATA ('azure://acme-co/customer-import-data.tsc?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
WITH temp = 'azure://acme-co/temp/?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co',
     delimiter = e'\t';
```
Skip Commented Lines
```sql
> IMPORT TABLE customers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT,
    INDEX name_idx (name)
)
CSV DATA ('azure://acme-co/customer-import-data.tsc?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
WITH temp = 'azure://acme-co/temp/?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co',
     comment = '#';
```
Use Blank Characters as NULL
```sql
> IMPORT TABLE customers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT,
    INDEX name_idx (name)
)
CSV DATA ('azure://acme-co/customer-import-data.tsc?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
WITH temp = 'azure://acme-co/temp/?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co',
     nullif = '';
```