IMPORT statement imports tabular data (e.g. CSVs) into a single table.
SET CLUSTER SETTING experimental.importcsv.enabled = true
|Import file||The tabular data file you want to import.|
|Processing node||The single node processing the
|Temp directory||A location where the processing node can store data from the import file it converts to CockroachDB-compatible key-value data.
This directory must be available to all nodes using the same address (i.e., cannot use the processing node's local file storage).
Because importing data is a complex task, it can be useful to have a high-level understanding of the process.
- A single node receives the
IMPORTrequest, which becomes the processing node.
- The processing node streams the contents of the import file, converting its contents into CockroachDB-compatible key-value data.
- As the key-value data is generated, the node stores it in the temp directory.
- Once the entire import file has been converted to key-value data, relevant nodes import key-value data from the temp directory.
After the import has completed, you should also delete the files from your temp directory.
IMPORT, you should have:
- The schema of the table you want to import.
- The tabular data you want to import (e.g., CSV), preferably hosted on cloud storage.
A location to store data before it is fully imported into all your nodes (referred to in this document as a "temp" directory). This location must be accessible to all nodes using the same address (i.e. cannot use a node's local file storage).
For ease of use, we recommend using cloud storage. However, if that isn't readily available to you, we also have a guide on easily creating your own file server.
You can only import a single table at a time.
IMPORT statement must include a
CREATE TABLE statement (representing the schema of the data you want to import) using one of the following methods:
- A reference to a file that contains a
- An inline
We also recommend all secondary indexes you want to use in the
CREATE TABLE statement. It is possible to add secondary indexes later, but it is significantly faster to specify them during import.
When importing tables, you must be mindful of the following rules because
IMPORT only creates single tables which must not already exist:
- Objects that the imported table depends on must already exist
- Objects that depend on the imported table can only be created after the import completes
Operational Requirements & Concerns
IMPORT has a number of moving parts, there are a number of operational concerns in executing the statement, the most important of which is ensuring that the processing node can execute
Choose Node to Process Request
IMPORT's current implementation, the entire task is executed on a single node. If your deployment is not entirely symmetric, sending the request to a random node might have undesirable effects. Instead, we recommend bypassing any load balancers, connecting to a machine directly, and running the
IMPORT statement on it.
It's important to note, though, that after the single machine creates the CockroachDB-compatible key-value data, the process of importing the data is distributed among nodes in the cluster.
IMPORTwill let you distribute the entire process among many nodes.
Available Storage Requirements
The node's first-listed/default
store directory must have enough available storage equal to or greater than the size of the file you're importing.
cockroach start, if you set
--max-disk-temp-storage, it must also be greater than the size of the file you're importing.
For example, if you're importing approximately 10GiB of data, the node that ends up running the
IMPORT command must have at least 10GiB of available storage in its
Import File Location
You can store the tabular data you want to import using either a node's local storage or remote cloud storage (Amazon S3, Google Cloud Platform, etc.).
For simplicity's sake, we highly recommend using cloud/remote storage for the data you want to import.
However, if you do want to store the file locally to import it, there are a number of things to understand.
Importing Data From Local Storage
IMPORTprocess, we recommend using it instead of local file storage.
If you do not have access to cloud storage, you can easily create a file server using this guide.
Because CockroachDB is designed as a distributed system, the ergonomics of local file storage require some understanding to use successfully. Though we don't recommend this process, if you do want to use a locally stored file, this procedure is likely to cause you the fewest headaches:
Ensure the node you want to use has available storage space at least 2x the size of the data you want to import; 1x for the file itself, and 1x for the converted key-value data.
For example, if you want to import 10GiB of data, your node needs 20GiB of available storage.
Upload the tabular data file to a single node, and then connect to that node.
IMPORTstatement, importing to the locally stored file with the
However, the "temp" directory you choose must use a location available to all nodes in the cluster (i.e., you cannot use local file storage). You will need to use either cloud storage, a custom HTTP server, or NFS connected to all nodes in the cluster.
To distribute the data you want to import to all nodes in your cluster, the
IMPORT process requires the CockroachDB-compatible key-value data be stored in a location that is accessible to all nodes in the cluster using the same address. To achieve this you can use:
- Cloud storage, such as Amazon S3 or Google Cloud Platform
- Network file storage mounted to every node
- HTTP file server
The temp directory must have at least as much storage space as the size of the data you want to import.
Temp Directory Cleanup
After completing the
IMPORT process, you must manually remove the key-value data stored in the temp directory.
Table Users and Privileges
Imported tables are treated as new tables, so you must
GRANT privileges to them.
IMPORT uses a single node to convert your tabular data into key-value data, which means the node's CPU and RAM will be partially consumed by the
IMPORT task in addition to serving normal traffic.
Later steps of the import process distribute work among many nodes and have less impact on the nodes' resources.
root user can run
|table_name||The name of the table you want to import/create.|
|create_table_file||The URL of a plain text file containing the
|table_elem_list||The table definition you want to use (see this example for syntax).|
|file_to_import||The URL of the file you want to import.|
||Control your import's behavior with these options. The temp option (which represents the temp directory's URL) is required.|
Import File & Temp Directory URLs
URLs for the file you want to import and your temp directory must use the following format:
|Google Cloud 1||
||File system location||N/A|
1 GCS connections use Google's default authentication strategy.
2 Because CockroachDB is a distributed system, you cannot meaningfully store backups "locally" on nodes. The entire backup file must be stored in a single location, so attempts to store backups locally must point to an NFS drive to be useful.
1 Only supports instance auth.
3 If using NFS for your temp directory, each node in the cluster must have access to the NFS using the same URL.
You can control the
IMPORT process's behavior using any of the following key-value pairs as a
A directory accessible by all nodes, which is used to store the CockroachDB-compatible key-value data before all nodes import the data.
|Value||The URL of the temp directory|
If not using comma as your column delimiter, you can specify another Unicode character as the delimiter.
|Value||The unicode character that delimits columns in your rows|
|Example||To use tab-delimited values:
Do not import rows that begin with this character.
|Value||The unicode character that identifies rows to skip|
Convert values to SQL NULL if they match the specified string.
|Value||The string that should be converted to NULL|
|Example||To use empty columns as NULL:
Use Create Table Statement from a File
> IMPORT TABLE customers CREATE USING 'azure://acme-co/customer-create-table.sql?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co' CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co') WITH temp = 'azure://acme-co/temp/?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co' ;
Use Create Table Statement from a Statement
> IMPORT TABLE customers ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT, INDEX name_idx (name) ) CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co') WITH temp = 'azure://acme-co/temp/?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co' ;
Import a Tab-Separated File
> IMPORT TABLE customers ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT, INDEX name_idx (name) ) CSV DATA ('azure://acme-co/customer-import-data.tsc?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co') WITH temp = 'azure://acme-co/temp/?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co', delimiter = e'\t' ;
Skip Commented Lines
> IMPORT TABLE customers ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT, INDEX name_idx (name) ) CSV DATA ('azure://acme-co/customer-import-data.tsc?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co') WITH temp = 'azure://acme-co/temp/?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co', comment = '#' ;
Use Blank Characters as NULL
> IMPORT TABLE customers ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT, INDEX name_idx (name) ) CSV DATA ('azure://acme-co/customer-import-data.tsc?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co') WITH temp = 'azure://acme-co/temp/?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co', nullif = '' ;