Export Data with Changefeeds

On this page Carat arrow pointing down

When you create an Enterprise changefeed, you can include the initial_scan = 'only' option to specify that the changefeed should only complete a table scan. The changefeed emits messages for the table scan and then the job completes with a succeeded status. As a result, you can create a changefeed with initial_scan = 'only' to EXPORT data out of your database.

You can also schedule a changefeed to use a changefeed initial scan for exporting data on a regular cadence.

The benefits of using changefeeds for this use case instead of export, include:

Changefeeds emit the same CSV format as EXPORT. In v22.1, changefeeds emitted CSV data that wrapped some values in single quotes, which were not wrapped when exporting data with the EXPORT statement.

Message formats

By default, changefeeds emit messages in JSON format. You can use a different format by creating a changefeed with the format option and specifying one of the following:

  • json
  • csv
  • avro
  • parquet

Examples

Export data with a changefeed

To create a changefeed that will only complete an initial scan of a table(s), run the following:

icon/buttons/copy
CREATE CHANGEFEED FOR TABLE movr.users INTO '{scheme}://{host}:{port}?{query_parameters}' WITH initial_scan = 'only', format=csv;

Or, use CDC queries to filter the data that your changefeed emits:

icon/buttons/copy
CREATE CHANGEFEED INTO '{scheme}://{host}:{port}?{query_parameters}'
  WITH initial_scan = 'only', format=csv AS SELECT name, city FROM movr.users;

The job will return a job ID once it has started. You can use SHOW CHANGEFEED JOBS to check on the status:

icon/buttons/copy
SHOW CHANGEFEED JOB {job ID};

When the scan has completed you will find the output shows succeeded in the status field.

Create a scheduled changefeed to export filtered data

This example creates a nightly export of some filtered table data with a scheduled changefeed that will run just after midnight every night. The changefeed uses CDC queries to query the table and filter the data it will send to the sink:

icon/buttons/copy
CREATE SCHEDULE sf_skateboard FOR CHANGEFEED INTO 'external://cloud-sink' WITH format=csv
  AS SELECT current_location, id, type, status FROM vehicles
  WHERE city = 'san francisco' AND type = 'skateboard'
  RECURRING '1 0 * * *' WITH SCHEDULE OPTIONS on_execution_failure=retry, on_previous_running=start;

The schedule options control the schedule's behavior:

  • If it runs into a failure, on_execution_failure=retry will ensure that the schedule retries the changefeed immediately.
  • If the previous scheduled changefeed is still running, on_previous_running=start will start a new changefeed at the defined cadence.

Video Demo

For a demo on how to use changefeeds to export large amounts of data, watch the following video:

See also


Yes No
On this page

Yes No