Parallel Statement Execution

On this page Carat arrow pointing down
Warning:
CockroachDB v2.0 is no longer supported. For more details, see the Release Support Policy.

CockroachDB supports parallel execution of independent INSERT, UPDATE, UPSERT, and DELETE statements within a single transaction. Executing statements in parallel helps reduce aggregate latency and improve performance.

Why Use Parallel Statement Execution

SQL engines traditionally execute the SQL statements in a transaction sequentially. The server executes each statement to completion and sends the return value of each statement to the client. Only after the client receives the return value of a statement, it sends the next SQL statement to be executed.

In the case of a traditional single-node SQL database, statements are executed on the single machine, and so the execution does not result in any communication latency. However, in the case of a distributed and replicated database like CockroachDB, execution of statements can span multiple nodes. The coordination between nodes results in communication latency. Executing SQL statements sequentially results in higher cumulative latency.

With parallel statement execution, however, multiple SQL statements within a transaction are executed at the same time, thereby reducing the aggregate latency.

How Parallel Statement Execution Works

Let's understand how sequential and parallel execution works in the following scenario:

  • Suppose we want to update a user's last name, favorite movie, and favorite song on a social networking application.
  • The database has three tables that need to be updated: users, favorite_movies, and favorite_songs.

Then the traditional transaction to update the user's information is as follows:

> BEGIN;
> UPDATE users SET last_name = 'Smith' WHERE id = 1;
> UPDATE favorite_movies SET movies = 'The Matrix' WHERE user_id = 1;
> UPDATE favorite_songs SET songs = 'All this time' WHERE user_id = 1;
> COMMIT;

While executing the SQL statements in the transaction sequentially, the server sends a return value after executing a statement. The client can send the next statement to be executed only after it receives the return value of the previous statement. This is often described as a "conversational API," as demonstrated by the following conceptual diagram:

CockroachDB Parallel Statement Execution

The SQL statements in our sample scenario can be executed in parallel since they are independent of each other. To execute statements in parallel, the client should be able to send the next statement to be executed without waiting for the return value of the earlier statement. In CockroachDB, on appending the RETURNING NOTHING clause with SQL statements, the server sends an acknowledgment immediately, instead of waiting to complete the statement execution and sending the return value to the client. The client sends the next statement to be executed on receiving the acknowledgment. This allows CockroachDB to execute the statements in parallel. The statements are executed in parallel until CockroachDB encounters a barrier statement. A barrier statement is any statement without the RETURNING NOTHING clause. The server executes a barrier statement sequentially.

In our sample scenario, the transaction would be as follows:

> BEGIN;
> UPDATE users SET last_name = 'Smith' WHERE id = 1 RETURNING NOTHING;
> UPDATE favorite_movies SET movies = 'The Matrix' WHERE user_id = 1 RETURNING NOTHING;
> UPDATE favorite_songs SET songs = 'All this time' WHERE user_id = 1 RETURNING NOTHING;
> COMMIT;

In this case, because the UPDATE statements within the transaction are independent of each other, they can be executed in parallel without affecting the results. The COMMIT statement is the barrier statement and is executed sequentially. A barrier statement is executed only after all the parallel statements preceding it have finished executing.

The following conceptual diagram shows how the transaction is executed sequentially and in parallel. The diagram also shows how executing statements in parallel reduces the aggregate latency.

CockroachDB Parallel Statement Execution

Perceived delay in execution of barrier statements

As stated earlier, the server executes a barrier statement only after all the preceding parallel statements have finished executing. So it may seem as if the barrier statement is taking longer to execute, but it is waiting on the parallel statements. Even then, the total time required for parallel execution of statements followed by the sequential execution of the barrier statement should be less than the time required for the sequential execution of all statements.

Referring to the previous diagram, the server executes all UPDATE statements to completion before executing COMMIT. Hence it might seem as if COMMIT is taking longer to execute, but it is, in fact, waiting on the UPDATE statements to finish executing.

Error message mismatch

With sequential execution, as soon as an error happens, the transaction is aborted and an error message is sent to the client. However, with parallel execution, the message is sent not when the error is encountered but after the next barrier statement. This can result in the client receiving an error message that doesn't match the statement being executed. The following diagram illustrates this concept:

CockroachDB Parallel Statement Execution Error Mismatch

RETURNING NOTHING clause appended to dependent statements

If two consecutive statements are not independent, and yet a RETURNING NOTHING clause is added to the statements, CockroachDB detects the dependence and executes the statements sequentially. This means that you can use the RETURNING NOTHING clause with SQL statements without worrying about their dependence.

Revising our sample scenario, suppose we want to create a new user on the social networking app. We need to create entries for the last name of the user, their favorite movie, and favorite song. We need to insert entries into three tables: users, favorite_movies, and favorite_songs. The transaction would be as follows:

> BEGIN;
> INSERT INTO users VALUES last_name = 'Pavlo' WHERE id = 2 RETURNING NOTHING;
> INSERT INTO favorite_movies VALUES movies = 'Godfather' WHERE user_id = 2 RETURNING NOTHING;
> INSERT INTO facvorite_songs VALUES songs = 'Remember' WHERE user_id = 2 RETURNING NOTHING;
> COMMIT;

In this case, the second and third INSERT statements are dependent on the first INSERT statement because the movies and songs tables both have a foreign key constraint on the users table. So even though we append the RETURNING NOTHING clause to the first statement, CockroachDB executes the first statement sequentially. After the first statement is executed to completion, the second and third INSERT statements are executed in parallel. The following conceptual diagram shows how the transaction is executed in sequential and parallel modes:

CockroachDB Parallel Statement Hybrid Execution

When to Use Parallel Statement Execution

SQL statements within a single transaction can be executed in parallel if the statements are independent. CockroachDB considers SQL statements within a single transaction to be independent if their execution can be safely reordered without affecting their results.

For example, the following statements are considered independent since reordering the statements does not affect the results:

> INSERT INTO a VALUES (100);
> INSERT INTO b VALUES (100);
> INSERT INTO a VALUES (100);
> INSERT INTO a VALUES (200);

The following pairs of statements are dependent since reordering them will affect their results:

> UPDATE a SET b = 2 WHERE y = 1;
> UPDATE a SET b = 3 WHERE y = 1;
> UPDATE a SET y = true  WHERE y = false;
> UPDATE a SET y = false WHERE y = true;
Note:
Parallel statement execution in CockroachDB is different than parallel query execution in PostgreSQL. For PostgreSQL, parallel query execution refers to “creating multiple query processes that divide the workload of a single SQL statement and executing them in parallel”. For CockroachDB’s parallel statement execution, an individual SQL statement is not divided into processes. Instead, multiple independent SQL statements within a single transaction are executed in parallel.

See Also


Yes No
On this page

Yes No