This section discusses the processes in a Redrock Postgres database.
In database jargon, PostgreSQL uses a client/server model. A PostgreSQL session consists of the following cooperating processes (programs):
- A server process, which manages the database files, accepts connections to the database from client applications, and performs database actions on behalf of the clients. The database server program is called postgres.
- The user’s client (frontend) application that wants to perform database operations. Client applications can be very diverse in nature: a client could be a text-oriented tool, a graphical application, a web server that accesses the database to display web pages, or a specialized database maintenance tool. Some client applications are supplied with the PostgreSQL distribution; most are developed by users.
As is typical of client/server applications, the client and the server can be on different hosts. In that case they communicate over a TCP/IP network connection. You should keep this in mind, because the files that can be accessed on a client machine might not be accessible (or might only be accessible using a different file name) on the database server machine.
The PostgreSQL server can handle multiple concurrent connections from clients. To achieve this it starts (“forks”) a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original
postgres process. Thus, the supervisor server process is always running, waiting for client connections, whereas client and associated server processes come and go. (All of this is of course invisible to the user. We only mention it here for completeness.)
Redrock Postgres supports a multithreaded model, which enables Postgres processes to execute as operating system threads. When Redrock Postgres is installed, the database runs in process mode. You must set the
threaded_execution parameter to
on to run the database in threaded mode. In threaded mode, some background processes on UNIX and Linux run as processes containing one thread, whereas the remaining Postgres processes run as threads within processes. Thus, a “Postgres process” is not always equivalent to an “operating system process.”
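Assuming the parameter name given above, switching an installed database from the default process mode to threaded mode could look like the following configuration fragment (a server restart would presumably be required for such a change to take effect):

```
# postgresql.conf -- run Postgres processes as threads (Redrock Postgres)
# threaded_execution is the parameter named in this section;
# after installation the database runs in process mode (off).
threaded_execution = on
```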
Table 1 shows a list of background processes. Each of these processes is responsible for a specific portion of the system’s functions.
Table 1. Background processes
| Process | Description |
| --- | --- |
| background writer | In this process, dirty pages in the shared buffer pool are gradually written to persistent storage (e.g., HDD, SSD) on a regular basis. |
| checkpointer | In this process, the checkpoint process is performed. |
| startup | When the database server is started, this process checks the control file and WAL files. If the database crashed, it launches crash recovery. |
| logical replication launcher | This process starts logical replication worker processes for enabled subscriptions. |
| WAL writer | This process periodically writes and flushes the WAL data in the WAL buffer to persistent storage. |
| statistics collector | In this process, statistics information, such as for pg_stat_activity and pg_stat_database, is collected. |
| logger | This process writes error messages into log files. |
| archiver | In this process, WAL archiving is executed. |
There is a separate server process called the background writer, whose function is to issue writes of “dirty” (new or modified) shared buffers. When the number of clean shared buffers appears to be insufficient, the background writer writes some dirty buffers to the file system and marks them as clean. This reduces the likelihood that server processes handling user queries will be unable to find clean buffers and have to write dirty buffers themselves. However, the background writer does cause a net overall increase in I/O load, because while a repeatedly-dirtied page might otherwise be written only once per checkpoint interval, the background writer might write it several times as it is dirtied in the same interval. Several parameters, such as bgwriter_delay and bgwriter_lru_maxpages, can be used to tune the behavior for local needs.
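As a sketch, the background-writer knobs from stock PostgreSQL are shown below with their usual default values:

```
# postgresql.conf -- background writer tuning (defaults shown)
bgwriter_delay = 200ms          # sleep between background writer activity rounds
bgwriter_lru_maxpages = 100     # max dirty buffers written per round (0 disables)
bgwriter_lru_multiplier = 2.0   # clean ahead by this multiple of recent buffer demand
```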
The checkpointer process handles all checkpoints. Checkpoints are automatically dispatched after a certain amount of time has elapsed since the last one, and it can be signaled to perform requested checkpoints as well. (The GUC parameter that mandates a checkpoint every so many WAL segments is implemented by having backends signal when they fill WAL segments; the checkpointer itself doesn’t watch for the condition.)
Checkpoints are points in the sequence of transactions at which it is guaranteed that the heap and index data files have been updated with all information written before that checkpoint. At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file. (The change records were previously flushed to the WAL files.) In the event of a crash, the crash recovery procedure looks at the latest checkpoint record to determine the point in the log (known as the redo record) from which it should start the REDO operation. Any changes made to data files before that point are guaranteed to be already on disk. Hence, after a checkpoint, log segments preceding the one containing the redo record are no longer needed and can be recycled or removed. (When WAL archiving is being done, the log segments must be archived before being recycled or removed.)
The checkpoint requirement of flushing all dirty data pages to disk can cause a significant I/O load. For this reason, checkpoint activity is throttled so that I/O begins at checkpoint start and completes before the next checkpoint is due to start; this minimizes performance degradation during checkpoints.
The server’s checkpointer process automatically performs a checkpoint every so often. A checkpoint is begun every checkpoint_timeout seconds, or if max_wal_size is about to be exceeded, whichever comes first. The default settings are 5 minutes and 1 GB, respectively. If no WAL has been written since the previous checkpoint, new checkpoints will be skipped even if
checkpoint_timeout has passed. (If WAL archiving is being used and you want to put a lower limit on how often files are archived in order to bound potential data loss, you should adjust the archive_timeout parameter rather than the checkpoint parameters.) It is also possible to force a checkpoint by using the SQL command CHECKPOINT.
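For example, an immediate checkpoint can be requested from any session with appropriate privileges:

```sql
-- Force an immediate checkpoint rather than waiting for the
-- timeout or WAL-size trigger.
CHECKPOINT;
```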
Reducing checkpoint_timeout and/or max_wal_size causes checkpoints to occur more often. This allows faster after-crash recovery, since less work will need to be redone. However, one must balance this against the increased cost of flushing dirty data pages more often. If full_page_writes is set (as is the default), there is another factor to consider. To ensure data page consistency, the first modification of a data page after each checkpoint results in logging the entire page content. In that case, a smaller checkpoint interval increases the volume of output to the WAL log, partially negating the goal of using a smaller interval, and in any case causing more disk I/O.
Checkpoints are fairly expensive, first because they require writing out all currently dirty buffers, and second because they result in extra subsequent WAL traffic as discussed above. It is therefore wise to set the checkpointing parameters high enough so that checkpoints don’t happen too often. As a simple sanity check on your checkpointing parameters, you can set the checkpoint_warning parameter. If checkpoints happen closer together than
checkpoint_warning seconds, a message will be output to the server log recommending increasing
max_wal_size. Occasional appearance of such a message is not cause for alarm, but if it appears often then the checkpoint control parameters should be increased. Bulk operations such as large
COPY transfers might cause a number of such warnings to appear if you have not set
max_wal_size high enough.
To avoid flooding the I/O system with a burst of page writes, writing dirty buffers during a checkpoint is spread over a period of time. That period is controlled by checkpoint_completion_target, which is given as a fraction of the checkpoint interval. The I/O rate is adjusted so that the checkpoint finishes when the given fraction of
checkpoint_timeout seconds have elapsed, or before
max_wal_size is exceeded, whichever is sooner. With the default value of 0.5, PostgreSQL can be expected to complete each checkpoint in about half the time before the next checkpoint starts. On a system that’s very close to maximum I/O throughput during normal operation, you might want to increase
checkpoint_completion_target to reduce the I/O load from checkpoints. The disadvantage of this is that prolonging checkpoints affects recovery time, because more WAL segments will need to be kept around for possible use in recovery. Although
checkpoint_completion_target can be set as high as 1.0, it is best to keep it less than that (perhaps 0.9 at most) since checkpoints include some other activities besides writing dirty buffers. A setting of 1.0 is quite likely to result in checkpoints not being completed on time, which would result in performance loss due to unexpected variation in the number of WAL segments needed.
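Pulling the advice above together, a checkpoint-tuning sketch for a write-heavy server might look like the following (the values are illustrative, not recommendations):

```
# postgresql.conf -- checkpoint tuning (illustrative values)
checkpoint_timeout = 15min            # default 5min; larger = fewer timed checkpoints
max_wal_size = 4GB                    # default 1GB; larger = fewer WAL-driven checkpoints
checkpoint_completion_target = 0.9    # spread writes over 90% of the interval (max advised)
checkpoint_warning = 30s              # log a warning if checkpoints come closer than this
```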
PostgreSQL has an optional but highly recommended feature called autoanalyze, whose purpose is to automate the execution of
ANALYZE commands. In the default configuration, autoanalyzing is enabled and the related configuration parameters are appropriately set.
The “autoanalyze daemon” actually consists of multiple processes. There is a persistent daemon process, called the autoanalyze launcher, which is in charge of starting autoanalyze worker processes for all databases. The launcher will distribute the work across time, attempting to start one worker within each database every autoanalyze_naptime seconds. (Therefore, if the installation has N databases, a new worker will be launched every
autoanalyze_naptime / N seconds.) A maximum of autoanalyze_max_workers worker processes are allowed to run at the same time. If there are more than
autoanalyze_max_workers databases to be processed, the next database will be processed as soon as the first worker finishes. Each worker process will check each table within its database and execute
ANALYZE as needed. log_autoanalyze_min_duration can be set to monitor autoanalyze workers’ activity.
There is no limit on how many workers might be in a single database, but workers do try to avoid repeating work that has already been done by other workers. Note that the number of running workers does not count towards max_connections or superuser_reserved_connections limits.
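The launcher and worker behavior described above is governed by global settings; the fragment below is a sketch using the parameter names from this section, with illustrative values (the units shown, seconds for the naptime and milliseconds for the duration threshold, are assumptions patterned on the analogous stock PostgreSQL parameters):

```
# postgresql.conf -- autoanalyze daemon settings (illustrative values)
autoanalyze_naptime = 60              # seconds between worker launches per database
autoanalyze_max_workers = 3           # max concurrent autoanalyze workers
log_autoanalyze_min_duration = 1000   # log autoanalyze runs taking >= 1s (ms)
```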
If the total number of tuples inserted, updated, or deleted since the last
ANALYZE exceeds the “analyze threshold”, the table is analyzed. The analyze threshold is defined as:
analyze threshold = analyze base threshold + analyze scale factor * number of tuples
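For instance, assuming a base threshold of 50 and a scale factor of 0.1 (hypothetical values), a table holding 10,000 tuples would be analyzed once the change count exceeds:

```
analyze threshold = 50 + 0.1 * 10000
                  = 1050 tuples inserted, updated, or deleted
```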
Temporary tables cannot be accessed by autoanalyze. Therefore, appropriate ANALYZE operations should be performed on them via session SQL commands.
The default thresholds and scale factors are taken from
postgresql.conf, but it is possible to override them (and many other autoanalyze control parameters) on a per-table basis; see Storage Parameters for more information. If a setting has been changed via a table’s storage parameters, that value is used when processing that table; otherwise the global settings are used. See Automatic Analyzing Parameters for more details on the global settings.
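As a sketch, a per-table override uses the storage-parameter syntax; the parameter names below are assumptions patterned on the global settings, so consult the Storage Parameters reference for the exact spellings:

```sql
-- Hypothetical per-table override of the global autoanalyze settings:
-- analyze this table after ~2% of its tuples change, plus a fixed base of 500.
ALTER TABLE measurements
    SET (analyze_scale_factor = 0.02,
         analyze_threshold = 500);
```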
The “logical replication daemon” actually consists of multiple processes. There is a persistent daemon process, called the logical replication launcher, which is in charge of starting logical replication worker processes for all databases. Immediately after the database instance starts up, the launcher starts a worker process to replicate for every enabled subscription. While the instance is running, the launcher checks for newly created subscriptions, attempting to start one worker to replicate for each new subscription within each database. A maximum of max_logical_replication_workers worker processes are allowed to run at the same time. If there are more than
max_logical_replication_workers subscriptions to be processed, the remaining subscriptions will not be processed while no worker processes are free.
The subscriber also requires that
max_replication_slots be set to configure how many replication origins can be tracked. In this case it should be set to at least the number of subscriptions that will be added to the subscriber.
max_logical_replication_workers must be set to at least the number of subscriptions, again plus some reserve for the table synchronization. Additionally the
max_worker_processes may need to be adjusted to accommodate replication workers, at least (max_logical_replication_workers +
1). Note that some extensions and parallel queries also take worker slots from max_worker_processes.
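Putting these sizing rules together for a subscriber with, say, four subscriptions, a configuration sketch might be:

```
# postgresql.conf (subscriber) -- sizing for 4 subscriptions (illustrative)
max_replication_slots = 6              # >= number of subscriptions, plus reserve
max_logical_replication_workers = 6    # >= subscriptions + table-sync reserve
max_worker_processes = 12              # >= max_logical_replication_workers + 1,
                                       # plus slots for extensions / parallel query
```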
The “undo maintenance daemon” actually consists of multiple processes. There is a persistent daemon process, called the undo launcher, which is in charge of starting undo worker processes for all databases. Immediately after the database instance starts up, the launcher starts worker processes to maintain the active undo segments in the instance. While the instance is running, the launcher will distribute the work across time, attempting to start one worker to maintain active undo segments within each database every 60 minutes. A maximum of 5 worker processes are allowed to run at the same time. If there are more than 5 undo segments to be processed, the next undo segment will be processed as soon as the first worker finishes. Each worker process checks the transaction table in each undo segment and executes
ROLLBACK when it finds an orphaned transaction.
At the same time, the undo maintenance daemon is also responsible for cleaning up the database Recycle Bin. The launcher starts worker processes to clean up dropped objects (such as tables, indexes, constraints, types) in the Recycle Bin.
There is a separate server process called the WAL writer, which attempts to keep regular backends from having to write out (and fsync) WAL pages. It also guarantees that transaction commit records that weren’t synced to disk immediately upon commit (i.e., were “asynchronously committed”) will reach disk within a knowable time, which, as it happens, is at most three times the
wal_writer_delay cycle time.
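With the stock parameter and its usual default, the asynchronous-commit flush guarantee works out as follows:

```
# postgresql.conf -- WAL writer cycle (default shown)
wal_writer_delay = 200ms    # sleep between WAL writer flush cycles
# asynchronous commit records reach disk within at most
# 3 x wal_writer_delay = 600ms
```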
Note that, as with the background writer for shared buffers, regular backends are still empowered to issue WAL writes and fsyncs when the WAL writer doesn’t keep up. This means that the WAL writer is not an essential process and can shut down quickly when requested.