IMPORT statement imports tabular data (e.g., CSVs) into a single table.
IMPORT, you should have:
- The schema of the table you want to import.
The tabular data you want to import (e.g., CSV), preferably hosted on cloud storage. This location must be accessible to all nodes using the same address. This means that you cannot use a node's local file storage.
For ease of use, we recommend using cloud storage. However, if that isn't readily available to you, we also have a guide on easily creating your own file server.
You can only import a single table at a time.
You can specify the target database in the table name in the
IMPORT statement. If it's not specified there, the active database in the SQL session is used.
IMPORT statement must include a
CREATE TABLE statement (representing the schema of the data you want to import) using one of the following methods:
- A reference to a file that contains a
- An inline
We also recommend all secondary indexes you want to use in the
CREATE TABLE statement. It is possible to add secondary indexes later, but it is significantly faster to specify them during import.
- Files must be UTF-8 encoded.
- If the delimiter (
,by default), a double quote (
"), newline (
\n), or carriage return (
\r) appears in a field, the field must be enclosed by double quotes.
- If double quotes are used to enclose fields, then a double quote appearing inside a field must be escaped by preceding it with another double quote. For example:
- If a column is of type
BYTES, it can either be a valid UTF-8 string or a hex-encoded byte literal beginning with
\x. For example, a field whose value should be the bytes
2would be written as
When importing tables, you must be mindful of the following rules because
IMPORT only creates single tables which must not already exist:
- Objects that the imported table depends on must already exist
- Objects that depend on the imported table can only be created after the import completes
Available Storage Requirements
Each node in the cluster is assigned an equal part of the converted CSV data, and so must have enough temp space to store it. In addition, data is persisted as a normal table, and so there must also be enough space to hold the final, replicated data. The node's first-listed/default
store directory must have enough available storage to hold its portion of the data.
cockroach start, if you set
--max-disk-temp-storage, it must also be greater than the portion of the data a node will store in temp space.
Import File Location
You can store the tabular data you want to import using remote cloud storage (Amazon S3, Google Cloud Platform, etc.). Alternatively, you can use an HTTP server accessible from all nodes.
For simplicity's sake, it's strongly recommended to use cloud/remote storage for the data you want to import. Local files are supported; however, they must be accessible identically from all nodes in the cluster.
Table Users and Privileges
Imported tables are treated as new tables, so you must
GRANT privileges to them.
All nodes are used during tabular data conversion into key-value data, which means all nodes' CPU and RAM will be partially consumed by the
IMPORT task in addition to serving normal traffic.
Viewing and Controlling Import Jobs
After CockroachDB successfully initiates an import, it registers the import as a job, which you can view with
`IMPORT`job will cause it to restart from the beginning.
IMPORTstatement cannot be used within a transaction.
root user can run
||The name of the table you want to import/create.|
||The URL of a plain text file containing the
||The table definition you want to use (see this example for syntax).|
||The URL of the file you want to import.|
||Control your import's behavior with these options.|
Import File URLs
URLs for the files you want to import must use the following format:
|Google Cloud 1||
||File system location||N/A|
|S3-compatible services 4||
1 If the
implicit, all GCS connections use Google's default authentication strategy. If the
cloudstorage.gs.default.keycluster setting must be set to the contents of a service account file which will be used during authentication. If the
AUTHparameter is not specified, the
cloudstorage.gs.default.keysetting will be used if it is non-empty, otherwise the
implicitbehavior is used.
2 You can easily create your own HTTP server with Caddy or nginx. A custom root CA can be appended to the system's default CAs by setting the
cloudstorage.http.custom_cacluster setting, which will be used when verifying certificates from HTTPS URLs.
3 The file system backup location on the NFS drive is relative to the path specified by the
--external-io-dirflag set while starting the node. If the flag is set to
disabled, then imports from local directories and NFS drives are disabled.
4 A custom root CA can be appended to the system's default CAs by setting the
cloudstorage.http.custom_cacluster setting, which will be used when verifying certificates from an S3-compatible service.
You can control the
IMPORT process's behavior using any of the following key-value pairs as a
If not using comma as your column delimiter, you can specify another Unicode character as the delimiter.
|Value||The unicode character that delimits columns in your rows|
|Example||To use tab-delimited values:
Do not import rows that begin with this character.
|Value||The unicode character that identifies rows to skip|
Convert values to SQL NULL if they match the specified string.
|Value||The string that should be converted to NULL|
|Example||To use empty columns as NULL:
Use Create Table Statement from a File
> IMPORT TABLE customers CREATE USING 'azure://acme-co/customer-create-table.sql?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co' CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co') ;
Use Create Table Statement from a Statement
> IMPORT TABLE customers ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT, INDEX name_idx (name) ) CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co') ;
Import a Tab-Separated File
> IMPORT TABLE customers ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT, INDEX name_idx (name) ) CSV DATA ('azure://acme-co/customer-import-data.tsc?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co') WITH delimiter = e'\t' ;
Skip Commented Lines
> IMPORT TABLE customers ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT, INDEX name_idx (name) ) CSV DATA ('azure://acme-co/customer-import-data.tsc?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co') WITH comment = '#' ;
Use Blank Characters as NULL
> IMPORT TABLE customers ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT, INDEX name_idx (name) ) CSV DATA ('azure://acme-co/customer-import-data.tsc?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co') WITH nullif = '' ;
IMPORT can sometimes fail with a "context canceled" error, or can restart itself many times without ever finishing. If this is happening, it is likely due to a high amount of disk contention. This can be mitigated by setting the
kv.bulk_io_write.max_rate cluster setting to a value below your max disk write speed. For example, to set it to 10MB/s, execute:
> SET CLUSTER SETTING kv.bulk_io_write.max_rate = '10MB';