UPSERT

On this page

Warning:

As of May 16, 2023, CockroachDB v21.2 is no longer supported. For more details, refer to the Release Support Policy.

The UPSERT statement inserts rows in cases where specified values do not violate uniqueness constraints and updates rows in cases where values do violate uniqueness constraints. UPSERT considers uniqueness only for primary key columns.

`UPSERT` vs. `INSERT ON CONFLICT`

Assuming that columns a and b are the primary key, the following UPSERT and INSERT ON CONFLICT statements are equivalent:

> UPSERT INTO t (a, b, c) VALUES (1, 2, 3);

> INSERT INTO t (a, b, c)
    VALUES (1, 2, 3)
    ON CONFLICT (a, b)
    DO UPDATE SET c = excluded.c;

If your statement considers uniqueness for columns other than primary key columns, you must use INSERT ON CONFLICT. For an example, see the Upsert that fails (conflict on non-primary key).

When inserting or updating all columns of a table, and the table has no secondary indexes, Cockroach Labs recommends using an UPSERT statement instead of the equivalent INSERT ON CONFLICT statement. Whereas INSERT ON CONFLICT always performs a read to determine the necessary writes, the UPSERT statement writes without reading, making it faster. This may be particularly useful if you are using a simple SQL table of two columns to simulate direct KV access. In this case, be sure to use the UPSERT statement.

For tables with secondary indexes, there is no performance difference between UPSERT and INSERT ON CONFLICT.

To learn more about how to perform and when to use an upsert in CockroachDB, PostgreSQL, and MySQL, see Upsert in SQL: What is an Upsert, and When Should You Use One?.

Considerations

An UPSERT statement affecting a proper subset of columns behaves differently depending on whether or not you specify the target columns in the statement.
- If you specify target columns (e.g., UPSERT INTO accounts (id, name) VALUES (2, 'b2');), the values of columns that do not have new values in the UPSERT statement will not be updated.
- If you do not specify the target columns (e.g., UPSERT INTO accounts VALUES (2, 'b2');), the value of columns that do not have new values in the UPSERT statement will be updated to their default values.
For examples, see Upsert a proper subset of columns.
A single multi-row UPSERT statement is faster than multiple single-row UPSERT statements. Whenever possible, use multi-row UPSERT instead of multiple single-row UPSERT statements.
If the input data contains duplicates, see Import data containing duplicate rows using DISTINCT ON.

Required privileges

The user must have the INSERT, SELECT and UPDATE privileges on the table.

Synopsis

Parameters

Parameter	Description
`common_table_expr`	See Common Table Expressions.
`table_name`	The name of the table.
`AS table_alias_name`	An alias for the table name. When an alias is provided, it completely hides the actual table name.
`column_name`	The name of a column to populate during the insert.
`select_stmt`	A selection query. Each value must match the data type of its column. Also, if column names are listed after `INTO`, values must be in corresponding order; otherwise, they must follow the declared order of the columns in the table.
`DEFAULT VALUES`	To fill all columns with their default values, use `DEFAULT VALUES` in place of `select_stmt`. To fill a specific column with its default value, leave the value out of the `select_stmt` or use `DEFAULT` at the appropriate position.
`RETURNING target_list`	Return values based on rows inserted, where `target_list` can be specific column names from the table, `*` for all columns, or computations using scalar expressions. Within a transaction, use `RETURNING NOTHING` to return nothing in the response, not even the number of rows affected.

How `UPSERT` transforms into `INSERT ON CONFLICT`

UPSERT considers uniqueness only for primary key columns. For example, assuming that columns a and b are the primary key, the following UPSERT and INSERT ON CONFLICT statements are equivalent:

> UPSERT INTO t (a, b, c) VALUES (1, 2, 3);

> INSERT INTO t (a, b, c)
    VALUES (1, 2, 3)
    ON CONFLICT (a, b)
    DO UPDATE SET c = excluded.c;

INSERT ON CONFLICT is more flexible and can be used to consider uniqueness for columns not in the primary key. For more details, see the Upsert that Fails (Conflict on Non-Primary Key) example below.

Examples

Upsert a row (no conflict)

In this example, the id column is the primary key. Because the inserted id value does not conflict with the id value of any existing row, the UPSERT statement inserts a new row into the table.

> SELECT * FROM accounts;

+----+----------+
| id | balance  |
+----+----------+
|  1 |  10000.5 |
|  2 | 20000.75 |
+----+----------+

> UPSERT INTO accounts (id, balance) VALUES (3, 6325.20);

> SELECT * FROM accounts;

+----+----------+
| id | balance  |
+----+----------+
|  1 |  10000.5 |
|  2 | 20000.75 |
|  3 |   6325.2 |
+----+----------+

Upsert multiple rows

In this example, the UPSERT statement inserts multiple rows into the table.

> SELECT * FROM accounts;

+----+----------+
| id | balance  |
+----+----------+
|  1 |  10000.5 |
|  2 | 20000.75 |
|  3 |   6325.2 |
+----+----------+

> UPSERT INTO accounts (id, balance) VALUES (4, 1970.4), (5, 2532.9), (6, 4473.0);

> SELECT * FROM accounts;

+----+----------+
| id | balance  |
+----+----------+
|  1 |  10000.5 |
|  2 | 20000.75 |
|  3 |   6325.2 |
|  4 |   1970.4 |
|  5 |   2532.9 |
|  6 |   4473.0 |
+----+----------+

Upsert that updates a row (conflict on primary key)

In this example, the id column is the primary key. Because the inserted id value is not unique, the UPSERT statement updates the row with the new balance.

> SELECT * FROM accounts;

+----+----------+
| id | balance  |
+----+----------+
|  1 |  10000.5 |
|  2 | 20000.75 |
|  3 |   6325.2 |
|  4 |   1970.4 |
|  5 |   2532.9 |
|  6 |   4473.0 |
+----+----------+

> UPSERT INTO accounts (id, balance) VALUES (3, 7500.83);

> SELECT * FROM accounts;

+----+----------+
| id | balance  |
+----+----------+
|  1 |  10000.5 |
|  2 | 20000.75 |
|  3 |  7500.83 |
|  4 |   1970.4 |
|  5 |   2532.9 |
|  6 |   4473.0 |
+----+----------+

Upsert that fails (conflict on non-primary key)

UPSERT will not update rows when the uniqueness conflict is on columns not in the primary key. In this example, the a column is the primary key, but the b column also has the UNIQUE constraint. Because the inserted b value is not unique, the UPSERT fails.

> SELECT * FROM unique_test;

+---+---+
| a | b |
+---+---+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
+---+---+

> UPSERT INTO unique_test VALUES (4, 1);

pq: duplicate key value (b)=(1) violates unique constraint "unique_test_b_key"

In such a case, you would need to use the INSERT ON CONFLICT statement to specify the b column as the column with the UNIQUE constraint.

> INSERT INTO unique_test VALUES (4, 1) ON CONFLICT (b) DO UPDATE SET a = excluded.a;

> SELECT * FROM unique_test;

+---+---+
| a | b |
+---+---+
| 2 | 2 |
| 3 | 3 |
| 4 | 1 |
+---+---+

Upsert a proper subset of columns

> CREATE TABLE accounts (
    id INT PRIMARY KEY,
    name STRING,
    balance DECIMAL(10, 2) DEFAULT 0
);

> INSERT INTO accounts (id, name, balance) VALUES
    (1, 'a1', 10000.5),
    (2, 'b1', 20000.75),
    (3, 'c1',  6325.2);

> SELECT * FROM accounts;

+----+------+----------+
| id | name | balance  |
+----+------+----------+
|  1 |   a1 | 10000.50 |
|  2 |   b1 | 20000.75 |
|  3 |   c1 |  6325.20 |
+----+------+----------+

Upserting a proper subset of columns without specifying the column names will write the default values of the unspecified columns when there is a conflict on the primary key. The account with id of 1 has a balance of 0 (the column's default value) after the UPSERT:

> UPSERT INTO accounts VALUES (1, 'a2');

> SELECT * FROM accounts;

+----+------+----------+
| id | name | balance  |
+----+------+----------+
|  1 |   a2 |     0.00 |
|  2 |   b1 | 20000.75 |
|  3 |   c1 |  6325.20 |
+----+------+----------+

If the target column names are included in the UPSERT, then the subset of columns without values will not change when there is a conflict on the primary key. The balance of the account with id of 2 is unchanged after the UPSERT:

> UPSERT INTO accounts (id, name) VALUES (2, 'b2');

> SELECT * FROM accounts;

+----+------+----------+
| id | name | balance  |
+----+------+----------+
|  1 |   a2 |     0.00 |
|  2 |   b2 | 20000.75 |
|  3 |   c1 |  6325.20 |
+----+------+----------+

Import data containing duplicate rows using `DISTINCT ON`

If the input data to insert/update contains duplicate rows, you must use DISTINCT ON to ensure there is only one row for each value of the primary key.

For example:

> WITH
    -- the following data contains duplicates on the conflict column "id":
    inputrows AS (VALUES (8, 130), (8, 140))

  UPSERT INTO accounts (id, balance)
    (SELECT DISTINCT ON(id) id, balance FROM inputrows); -- de-duplicate the input rows

The DISTINCT ON clause does not guarantee which of the duplicates is considered. To force the selection of a particular duplicate, use an ORDER BY clause:

> WITH
    -- the following data contains duplicates on the conflict column "id":
    inputrows AS (VALUES (8, 130), (8, 140))

  UPSERT INTO accounts (id, balance)
    (SELECT DISTINCT ON(id) id, balance
     FROM inputrows
     ORDER BY balance); -- pick the lowest balance as value to update in each account

Note:

Using DISTINCT ON incurs a performance cost to search and eliminate duplicates. For best performance, avoid using it when the input is known to not contain duplicates.

Limit the size of rows

To help you avoid failures arising from misbehaving applications that bloat the size of rows, you can specify the behavior when a row or individual column family larger than a specified size is written to the database. Use the cluster settings sql.guardrails.max_row_size_log to discover large rows and sql.guardrails.max_row_size_err to reject large rows.

When you write a row that exceeds sql.guardrails.max_row_size_log:

INSERT, UPSERT, UPDATE, CREATE TABLE AS, CREATE INDEX, ALTER TABLE, ALTER INDEX, IMPORT, or RESTORE statements will log a LargeRow to the SQL_PERF channel.
SELECT, DELETE, TRUNCATE, and DROP are not affected.

When you write a row that exceeds sql.guardrails.max_row_size_err:

INSERT, UPSERT, and UPDATE statements will fail with a code 54000 (program_limit_exceeded) error.
CREATE TABLE AS, CREATE INDEX, ALTER TABLE, ALTER INDEX, IMPORT, and RESTORE statements will log a LargeRowInternal event to the SQL_INTERNAL_PERF channel.
SELECT, DELETE, TRUNCATE, and DROP are not affected.

You cannot update existing rows that violate the limit unless the update shrinks the size of the row below the limit. You can select, delete, alter, back up, and restore such rows. We recommend using the accompanying setting sql.guardrails.max_row_size_log in conjunction with SELECT pg_column_size() queries to detect and fix any existing large rows before lowering sql.guardrails.max_row_size_err.

Cockroach
University

Docs Hub

UPSERT

`UPSERT` vs. `INSERT ON CONFLICT`

Considerations

Required privileges

Synopsis

Parameters

How `UPSERT` transforms into `INSERT ON CONFLICT`

Examples

Upsert a row (no conflict)

Upsert multiple rows

Upsert that updates a row (conflict on primary key)

Upsert that fails (conflict on non-primary key)

Upsert a proper subset of columns

Import data containing duplicate rows using `DISTINCT ON`

Limit the size of rows

See also

Cockroach University

Docs Hub

Cockroach University

Docs Hub

UPSERT

UPSERT vs. INSERT ON CONFLICT

Considerations

Required privileges

Synopsis

Parameters

How UPSERT transforms into INSERT ON CONFLICT

Examples

Upsert a row (no conflict)

Upsert multiple rows

Upsert that updates a row (conflict on primary key)

Upsert that fails (conflict on non-primary key)

Upsert a proper subset of columns

Import data containing duplicate rows using DISTINCT ON

Limit the size of rows

See also

Cockroach
University

Cockroach
University

`UPSERT` vs. `INSERT ON CONFLICT`

How `UPSERT` transforms into `INSERT ON CONFLICT`

Import data containing duplicate rows using `DISTINCT ON`