Transactions
An RDBMS must be able to group SQL statements so that they are either all committed, which means they are applied to the database, or all rolled back, which means they are undone. A transaction is a logical, atomic unit of work that contains one or more SQL statements.
An illustration of the need for transactions is a funds transfer from a savings account to a checking account. The transfer consists of the following separate operations:
- Decrease the savings account.
- Increase the checking account.
- Record the transaction in the transaction journal.
PostgreSQL Database guarantees that all three operations succeed or fail as a unit. For example, if a hardware failure prevents a statement in the transaction from executing, then the other statements must be rolled back.
Transactions are one of the features that sets PostgreSQL Database apart from a file system. If you perform an atomic operation that updates several files, and if the system fails halfway through, then the files will not be consistent. In contrast, a transaction moves an Postgres database from one consistent state to another. The basic principle of a transaction is “all or nothing”: an atomic operation succeeds or fails as a whole.
A transaction is a logical, atomic unit of work that contains one or more SQL statements. A transaction groups SQL statements so that they are either all committed, which means they are applied to the database, or all rolled back, which means they are undone from the database. PostgreSQL Database assigns every transaction a unique identifier called a transaction ID.
All PostgreSQL transactions comply with the basic properties of a database transaction, known as ACID properties. ACID is an acronym for the following:
-
Atomicity
All tasks of a transaction are performed or none of them are. There are no partial transactions. For example, if a transaction starts updating 100 rows, but the system fails after 20 updates, then the database rolls back the changes to these 20 rows.
-
Consistency
The transaction takes the database from one consistent state to another consistent state. For example, in a banking transaction that debits a savings account and credits a checking account, a failure must not cause the database to credit only one account, which would lead to inconsistent data.
-
Isolation
The effect of a transaction is not visible to other transactions until the transaction is committed. For example, one user updating the hr.employees table does not see the uncommitted changes to employees made concurrently by another user. Thus, it appears to users as if transactions are executing serially.
-
Durability
Changes made by committed transactions are permanent. After a transaction completes, the database ensures through its recovery mechanisms that changes from the transaction are not lost.
The use of transactions is one of the most important ways that a database management system differs from a file system.
To illustrate the concept of a transaction, consider a banking database. When a customer transfers money from a savings account to a checking account, the transaction must consist of three separate operations:
-
Decrement the savings account
-
Increment the checking account
-
Record the transaction in the transaction journal
PostgreSQL must allow for two situations. If all three SQL statements maintain the accounts in proper balance, then the effects of the transaction can be applied to the database. However, if a problem such as insufficient funds, invalid account number, or a hardware failure prevents one or two of the statements in the transaction from completing, then the database must roll back the entire transaction so that the balance of all accounts is correct.
Figure 1 illustrates a banking transaction. The first statement subtracts $500 from savings account 3209. The second statement adds $500 to checking account 3208. The third statement inserts a record of the transfer into the journal table. The final statement commits the transaction.
A database transaction consists of one or more statements that together constitute an atomic change to the database. The statements in it can be either Data Manipulation Language (DML) statements or Data Definition Language (DDL) statements.
A transaction has a beginning and an end.
In PostgreSQL, a transaction is set up by surrounding the SQL commands of the transaction with BEGIN
and COMMIT
commands. So our banking transaction would actually look like:
BEGIN;
UPDATE accounts SET balance = balance - 100.00
WHERE name = 'Alice';
-- etc etc
COMMIT;
If, partway through the transaction, we decide we do not want to commit (perhaps we just noticed that Alice’s balance went negative), we can issue the command ROLLBACK
instead of COMMIT
, and all our updates so far will be canceled.
PostgreSQL actually treats every SQL statement as being executed within a transaction. If you do not issue a BEGIN
command, then each individual statement has an implicit BEGIN
and (if successful) COMMIT
wrapped around it. A group of statements surrounded by BEGIN
and COMMIT
is sometimes called a transaction block.
Some client libraries issueBEGIN
andCOMMIT
commands automatically, so that you might get the effect of transaction blocks without asking. Check the documentation for the interface you are using.
When a transaction begins, Redrock Postgres assigns the transaction to an available undo segment to record the undo entries for the new transaction. A transaction ID is not allocated until an undo segment and transaction table slot are allocated, which occurs during the first DML/DDL statement. A transaction ID is unique to a transaction and represents the undo segment number, transaction slot number, and sequence number.
The following example execute an UPDATE
statement to begin a transaction and queries for details about the transaction:
BEGIN;
UPDATE hr.employees SET salary = salary;
SELECT xid, pg_xact_status(xid) AS status
FROM pg_current_xact_id_if_assigned() AS xid;
xid | status
----------+-------------
(5,16,1) | in progress
A transaction ends when any of the following actions occurs:
-
A user issues a
COMMIT
orROLLBACK
statement without aSAVEPOINT
clause.In a commit, a user explicitly or implicitly requested that the changes in the transaction be made permanent. Changes made by the transaction are permanent and visible to other users only after a transaction commits. The transaction shown in Figure 1 ends with a commit.
-
A user exits normally from most PostgreSQL Database utilities and tools, causing the current transaction to be implicitly committed. The commit behavior when a user disconnects is application-dependent and configurable.
Note: Applications should always explicitly commit or rollback transactions before program termination.
-
A client process terminates abnormally, causing the transaction to be implicitly rolled back using metadata stored in the transaction table and the undo.
After one transaction ends, the next executable SQL statement automatically starts the following transaction. The following example executes an UPDATE
to start a transaction, ends the transaction with a ROLLBACK
statement, and then executes an UPDATE
to start a new transaction (note that the transaction IDs are different):
BEGIN;
UPDATE hr.employees SET salary = salary;
SELECT xid, pg_xact_status(xid) AS status
FROM pg_current_xact_id_if_assigned() AS xid;
xid | status
----------+-------------
(6,16,1) | in progress
ROLLBACK;
SELECT pg_current_xact_id_if_assigned() AS xid;
xid
-----
BEGIN;
UPDATE hr.employees SET last_name = last_name;
SELECT xid, pg_xact_status(xid) AS status
FROM pg_current_xact_id_if_assigned() AS xid;
xid | status
----------+-------------
(7,16,1) | in progress
Redrock Postgres supports statement-level atomicity, which means that a SQL statement is an atomic unit of work and either completely succeeds or completely fails.
A successful statement is different from a committed transaction. A single SQL statement executes successfully if the database parses and runs it without error as an atomic unit, as when all rows are changed in a multirow update.
If a SQL statement causes an error during execution, then it is not successful and so all effects of the statement are rolled back. This operation is a statement-level rollback. This operation has the following characteristics:
-
A SQL statement that does not succeed causes the loss only of work it would have performed itself.
The unsuccessful statement does not cause the loss of any work that preceded it in the current transaction. For example, if the execution of the second UPDATE statement in Figure 1 causes an error and is rolled back, then the work performed by the first UPDATE statement is not rolled back. The first UPDATE statement can be committed or rolled back explicitly by the user.
-
The effect of the rollback is as if the statement had never been run.
Any side effects of an atomic statement, for example, triggers invoked upon execution of the statement, are considered part of the atomic statement. Either all work generated as part of the atomic statement succeeds or none does.
An example of an error causing a statement-level rollback is an attempt to insert a duplicate primary key. Single SQL statements involved in a deadlock, which is competition for the same data, can also cause a statement-level rollback. However, errors discovered during SQL statement parsing, such as a syntax error, have not yet been run and so do not cause a statement-level rollback.
A logical timestamp (logicaltime
) is a logical, internal time stamp used by Redrock Postgres Database. logicaltimes order events that occur within the database, which is necessary to satisfy the ACID properties of a transaction. Redrock Postgres uses logicaltimes to mark the logicaltime
before which all changes are known to be on disk so that recovery avoids applying unnecessary redo. The database also uses logicaltimes to mark the point at which no redo exists for a set of data so that recovery can stop.
logicaltimes occur in a monotonically increasing sequence. Redrock Postgres can use an logicaltime
like a clock because an observed logicaltime
indicates a logical point in time and repeated observations return equal or greater values. If one event has a lower logicaltime
than another event, then it occurred at an earlier time with respect to the database. Several events may share the same logicaltime
, which means that they occurred at the same time with respect to the database.
Every transaction has an logicaltime
. For example, if a transaction updates a row, then the database records the logicaltime at which this update occurred. Other modifications in this transaction have the same logicaltime
. When a transaction commits, the database records an logicaltime
for this commit.
Redrock Postgres increments logicaltimes in the shared memory area. When a transaction modifies data, the database writes a new logicaltime
to the undo assigned to the transaction. The log writer process then writes the commit record of the transaction immediately to the online redo log. The commit record has the unique logicaltime
of the transaction. Redrock Postgres also uses logicaltimes as part of its instance recovery and media recovery mechanisms.
Transaction control is the management of changes made by SQL statements and the grouping of SQL statements into transactions. In general, application designers are concerned with transaction control so that work is accomplished in logical units and data is kept consistent.
Transaction control involves using the following statements:
BEGIN
initiates a transaction block, that is, all statements after aBEGIN
command will be executed in a single transaction until an explicitCOMMIT
orROLLBACK
is given.- The
COMMIT
statement ends the current transaction and makes all changes performed in the transaction permanent.COMMIT
also erases all savepoints in the transaction and releases transaction locks. - The
ROLLBACK
statement reverses the work done in the current transaction; it causes all data changes since the lastCOMMIT
orROLLBACK
to be discarded. TheROLLBACK TO SAVEPOINT
statement undoes the changes since the last savepoint but does not end the entire transaction. - The
SAVEPOINT
statement identifies a point in a transaction to which you can later roll back.
The session in Table 1 illustrates the basic concepts of transaction control.
Table 1 Transaction Control
Time | Session | Explanation |
---|---|---|
t0 | BEGIN; | This statement begins a transaction block. |
t1 | UPDATE employees SET salary = 7000 WHERE last_name = ‘Banda’; | This statement updates the salary for Banda to 7000. |
t2 | SAVEPOINT after_banda_sal; | This statement creates a savepoint named after_banda_sal , enabling changes in this transaction to be rolled back to this point. |
t3 | UPDATE employees SET salary = 12000 WHERE last_name = ‘Greene’; | This statement updates the salary for Greene to 12000. |
t4 | SAVEPOINT after_greene_sal; | This statement creates a savepoint named after_greene_sal , enabling changes in this transaction to be rolled back to this point. |
t5 | ROLLBACK TO SAVEPOINT after_banda_sal; | This statement rolls back the transaction to t2, undoing the update to Greene’s salary at t3. The transaction has not ended. |
t6 | UPDATE employees SET salary = 11000 WHERE last_name = ‘Greene’; | This statement updates the salary for Greene to 11000 in transaction. |
t7 | ROLLBACK; | This statement rolls back all changes in transaction, ending the transaction. |
t8 | BEGIN; | This statement begins a new transaction block. |
t9 | UPDATE employees SET salary = 7050 WHERE last_name = ‘Banda’; | This statement updates the salary for Banda to 7050. |
t10 | UPDATE employees SET salary = 10950 WHERE last_name = ‘Greene’; | This statement updates the salary for Greene to 10950. |
t11 | COMMIT; | This statement commits all changes made in transaction, ending the transaction. The commit guarantees that the changes are saved in the online redo log files. |
An active transaction has started but not yet committed or rolled back. In Table 1, the first statement to modify data in the transaction is the update to Banda’s salary. From the successful execution of this update until the ROLLBACK
statement ends the transaction, the transaction is active.
Data changes made by a transaction are temporary until the transaction is committed or rolled back. Before the transaction ends, the state of the data is as follows:
-
Redrock Postgres has generated undo data information in the shared buffers.
The undo data contains the old data values changed by the SQL statements of the transaction.
-
Redrock Postgres has generated redo in the shared WAL buffers.
The redo log record contains the change to the data block and the change to the undo block.
-
Changes have been made to the database pages in shared buffers.
The data changes for a committed transaction, stored in the database pages in shared buffers, are not necessarily written immediately to the data files by the database writer (bgwriter). The disk write can happen before or after the commit.
-
The rows affected by the data change are locked.
Other users cannot change the data in the affected rows, nor can they see the uncommitted changes.
A savepoint is a user-declared intermediate marker within the context of a transaction. Internally, this marker resolves to an undo record position. Savepoints divide a long transaction into smaller parts.
If you use savepoints in a long transaction, then you have the option later of rolling back work performed before the current point in the transaction but after a declared savepoint within the transaction. Thus, if you make an error, you do not need to resubmit every statement. Table 1 creates savepoint after_banda_sal
so that the update to the Greene salary can be rolled back to this savepoint.
A rollback to a savepoint in an uncommitted transaction means undoing any changes made after the specified savepoint, but it does not mean a rollback of the transaction itself. When a transaction is rolled back to a savepoint, as when the ROLLBACK TO SAVEPOINT after_banda_sal
is run in Table 1, the following occurs:
-
Redrock Postgres rolls back only the statements run after the savepoint.
In Table 1, the
ROLLBACK TO SAVEPOINT
causes theUPDATE
for Greene to be rolled back, but not theUPDATE
for Banda. -
Redrock Postgres preserves the savepoint specified in the
ROLLBACK TO SAVEPOINT
statement, but all subsequent savepoints are lost.In Table 1, the
ROLLBACK TO SAVEPOINT
causes theafter_greene_sal
savepoint to be lost. -
Redrock Postgres releases all table and row locks acquired after the specified savepoint but retains all data locks acquired previous to the savepoint.
The transaction remains active and can be continued.
A rollback of an uncommitted transaction undoes any changes to data that have been performed by SQL statements within the transaction. After a transaction has been rolled back, the effects of the work done in the transaction no longer exist.
In rolling back an entire transaction, without referencing any savepoints, Redrock Postgres performs the following actions:
-
Undoes all changes made by all the SQL statements in the transaction by using the corresponding undo segments
The transaction table entry for every active transaction contains a pointer to all the undo data (in reverse order of application) for the transaction. The database reads the data from the undo segment, reverses the operation, and then marks the undo entry as applied. Thus, if a transaction inserts a row, then a rollback deletes it. If a transaction updates a row, then a rollback reverses the update. If a transaction deletes a row, then a rollback reinserts it. In Table 1, the
ROLLBACK
reverses the updates to the salaries of Greene and Banda. -
Releases all the locks of data held by the transaction
-
Erases all savepoints in the transaction
In Table 1, the
ROLLBACK
deletes the savepointafter_banda_sal
. Theafter_greene_sal
savepoint was removed by theROLLBACK TO SAVEPOINT
statement. -
Ends the transaction
In Table 1, the
ROLLBACK
leaves the database in the same state as it was after the initialCOMMIT
was executed.
The duration of a rollback is a function of the amount of data modified.
A commit ends the current transaction and makes permanent all changes performed in the transaction. In Table 1, a second transaction begins with BEGIN
statement and ends with an explicit COMMIT
statement. The changes that resulted from the two UPDATE
statements are now made permanent.
When a transaction commits, the following actions occur:
-
A logical timestamp (
logicaltime
) is generated for theCOMMIT
.The internal transaction table for the associated undo tablespace records that the transaction has committed. The corresponding unique
logicaltime
of the transaction is assigned and recorded in the transaction table. -
The session process writes remaining WAL records in the WAL buffers to the WAL log and writes the transaction
logicaltime
to the WAL log. This atomic event constitutes the commit of the transaction. -
Redrock Postgres releases locks held on rows and tables.
Users who were enqueued waiting on locks held by the uncommitted transaction are allowed to proceed with their work.
-
Redrock Postgres deletes savepoints.
In Table 1, no savepoints existed in the transaction so no savepoints were erased.
-
Redrock Postgres performs a commit cleanout.
If modified blocks containing data from the committed transaction are still in the shared buffers, and if no other session is modifying them, then the database removes lock-related transaction information from the blocks. Ideally, the
COMMIT
cleans out the blocks so that a subsequentSELECT
does not have to perform this task.Note: Because a block cleanout generates redo, a query may generate redo and thus cause blocks to be written during the next checkpoint.
-
Redrock Postgres marks the transaction complete.
After a transaction commits, users can view the changes.
Typically, a commit is a fast operation, regardless of the transaction size. The speed of a commit does not increase with the size of the data modified in the transaction. The lengthiest part of the commit is the physical disk I/O occured in WAL writes. However, the amount of time spent in transaction commit is reduced because walwriter
has been incrementally writing the contents of the WAL buffer in the background.
The default behavior is for session to write redo to the WAL log synchronously and for transactions to wait for the buffered redo to be on disk before returning a commit to the user. However, for lower transaction commit latency, application developers can specify that redo be written asynchronously so that transactions need not wait for the redo to be on disk and can return from the COMMIT
call immediately.