This section discusses the processes in a Redrock Postgres database.
In database jargon, PostgreSQL uses a client/server model. A PostgreSQL session consists of the following cooperating processes (programs):
- A server process, which manages the database files, accepts connections to the database from client applications, and performs database actions on behalf of the clients. The database server program is called postgres.
- The user’s client (frontend) application that wants to perform database operations. Client applications can be very diverse in nature: a client could be a text-oriented tool, a graphical application, a web server that accesses the database to display web pages, or a specialized database maintenance tool. Some client applications are supplied with the PostgreSQL distribution; most are developed by users.
As is typical of client/server applications, the client and the server can be on different hosts. In that case they communicate over a TCP/IP network connection. You should keep this in mind, because the files that can be accessed on a client machine might not be accessible (or might only be accessible using a different file name) on the database server machine.
The PostgreSQL server can handle multiple concurrent connections from clients. To achieve this it starts (“forks”) a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original
postgres process. Thus, the supervisor server process is always running, waiting for client connections, whereas client and associated server processes come and go. (All of this is of course invisible to the user. We only mention it here for completeness.)
Redrock Postgres supports a multithreaded model, which enables Postgres processes to execute as operating system threads. When Redrock Postgres is installed, the database runs in process mode. You must set the
threaded_execution parameter to
on to run the database in threaded mode. In threaded mode, some background processes on UNIX and Linux run as processes containing one thread, whereas the remaining Postgres processes run as threads within processes. Thus, a “Postgres process” is not always equivalent to an “operating system process.”
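Assuming the parameter name given above, switching an installed database from the default process mode to threaded mode could look like the following configuration fragment (a server restart would presumably be required for such a change to take effect):

```
# postgresql.conf -- run Postgres processes as threads (Redrock Postgres)
# threaded_execution is the parameter named in this section;
# after installation the database runs in process mode (off).
threaded_execution = on
```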
Table 1 shows a list of background processes. Each of these processes is responsible for a specific portion of the system’s functions.
Table 1. Background processes
| Process | Description |
| --- | --- |
| background writer | In this process, dirty pages in the shared buffer pool are gradually written to persistent storage (e.g., HDD, SSD) on a regular basis. |
| checkpointer | In this process, the checkpoint process is performed. |
| startup | When the database server is started, this process checks the control file and WAL files. If the database crashed, it launches crash recovery. |
| logical replication launcher | This process starts logical replication worker processes for enabled subscriptions. |
| WAL writer | This process periodically writes and flushes the WAL data in the WAL buffer to persistent storage. |
| statistics collector | In this process, statistics information, such as for pg_stat_activity and pg_stat_database, is collected. |
| logger | This process writes error messages into log files. |
| archiver | In this process, WAL archiving is executed. |
There is a separate server process called the background writer, whose function is to issue writes of “dirty” (new or modified) shared buffers. When the number of clean shared buffers appears to be insufficient, the background writer writes some dirty buffers to the file system and marks them as clean. This reduces the likelihood that server processes handling user queries will be unable to find clean buffers and have to write dirty buffers themselves. However, the background writer does cause a net overall increase in I/O load, because while a repeatedly-dirtied page might otherwise be written only once per checkpoint interval, the background writer might write it several times as it is dirtied in the same interval. Several parameters, such as bgwriter_delay and bgwriter_lru_maxpages, can be used to tune the behavior for local needs.
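As a sketch, the background-writer knobs from stock PostgreSQL are shown below with their usual default values:

```
# postgresql.conf -- background writer tuning (defaults shown)
bgwriter_delay = 200ms          # sleep between background writer activity rounds
bgwriter_lru_maxpages = 100     # max dirty buffers written per round (0 disables)
bgwriter_lru_multiplier = 2.0   # clean ahead by this multiple of recent buffer demand
```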
The checkpointer process handles all checkpoints. Checkpoints are automatically dispatched after a certain amount of time has elapsed since the last one, and it can be signaled to perform requested checkpoints as well. (The GUC parameter that mandates a checkpoint every so many WAL segments is implemented by having backends signal when they fill WAL segments; the checkpointer itself doesn’t watch for the condition.)
Checkpoints are points in the sequence of transactions at which it is guaranteed that the heap and index data files have been updated with all information written before that checkpoint. At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file. (The change records were previously flushed to the WAL files.) In the event of a crash, the crash recovery procedure looks at the latest checkpoint record to determine the point in the log (known as the redo record) from which it should start the REDO operation. Any changes made to data files before that point are guaranteed to be already on disk. Hence, after a checkpoint, log segments preceding the one containing the redo record are no longer needed and can be recycled or removed. (When WAL archiving is being done, the log segments must be archived before being recycled or removed.)
The checkpoint requirement of flushing all dirty data pages to disk can cause a significant I/O load. For this reason, checkpoint activity is throttled so that I/O begins at checkpoint start and completes before the next checkpoint is due to start; this minimizes performance degradation during checkpoints.
The server’s checkpointer process automatically performs a checkpoint every so often. A checkpoint is begun every checkpoint_timeout seconds, or if max_wal_size is about to be exceeded, whichever comes first. The default settings are 5 minutes and 1 GB, respectively. If no WAL has been written since the previous checkpoint, new checkpoints will be skipped even if
checkpoint_timeout has passed. (If WAL archiving is being used and you want to put a lower limit on how often files are archived in order to bound potential data loss, you should adjust the archive_timeout parameter rather than the checkpoint parameters.) It is also possible to force a checkpoint by using the SQL command CHECKPOINT.
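For example, an immediate checkpoint can be requested from any session with appropriate privileges:

```sql
-- Force an immediate checkpoint rather than waiting for the
-- timeout or WAL-size trigger.
CHECKPOINT;
```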
Reducing checkpoint_timeout and/or max_wal_size causes checkpoints to occur more often. This allows faster after-crash recovery, since less work will need to be redone. However, one must balance this against the increased cost of flushing dirty data pages more often. If full_page_writes is set (as is the default), there is another factor to consider. To ensure data page consistency, the first modification of a data page after each checkpoint results in logging the entire page content. In that case, a smaller checkpoint interval increases the volume of output to the WAL log, partially negating the goal of using a smaller interval, and in any case causing more disk I/O.
Checkpoints are fairly expensive, first because they require writing out all currently dirty buffers, and second because they result in extra subsequent WAL traffic as discussed above. It is therefore wise to set the checkpointing parameters high enough so that checkpoints don’t happen too often. As a simple sanity check on your checkpointing parameters, you can set the checkpoint_warning parameter. If checkpoints happen closer together than
checkpoint_warning seconds, a message will be output to the server log recommending increasing
max_wal_size. Occasional appearance of such a message is not cause for alarm, but if it appears often then the checkpoint control parameters should be increased. Bulk operations such as large
COPY transfers might cause a number of such warnings to appear if you have not set
max_wal_size high enough.
To avoid flooding the I/O system with a burst of page writes, writing dirty buffers during a checkpoint is spread over a period of time. That period is controlled by checkpoint_completion_target, which is given as a fraction of the checkpoint interval. The I/O rate is adjusted so that the checkpoint finishes when the given fraction of
checkpoint_timeout seconds have elapsed, or before
max_wal_size is exceeded, whichever is sooner. With the default value of 0.5, PostgreSQL can be expected to complete each checkpoint in about half the time before the next checkpoint starts. On a system that’s very close to maximum I/O throughput during normal operation, you might want to increase
checkpoint_completion_target to reduce the I/O load from checkpoints. The disadvantage of this is that prolonging checkpoints affects recovery time, because more WAL segments will need to be kept around for possible use in recovery. Although
checkpoint_completion_target can be set as high as 1.0, it is best to keep it less than that (perhaps 0.9 at most) since checkpoints include some other activities besides writing dirty buffers. A setting of 1.0 is quite likely to result in checkpoints not being completed on time, which would result in performance loss due to unexpected variation in the number of WAL segments needed.
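Pulling the advice above together, a checkpoint-tuning sketch for a write-heavy server might look like the following (the values are illustrative, not recommendations):

```
# postgresql.conf -- checkpoint tuning (illustrative values)
checkpoint_timeout = 15min            # default 5min; larger = fewer timed checkpoints
max_wal_size = 4GB                    # default 1GB; larger = fewer WAL-driven checkpoints
checkpoint_completion_target = 0.9    # spread writes over 90% of the interval (max advised)
checkpoint_warning = 30s              # log a warning if checkpoints come closer than this
```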
PostgreSQL has an optional but highly recommended feature called autoanalyze, whose purpose is to automate the execution of
ANALYZE commands. In the default configuration, autoanalyzing is enabled and the related configuration parameters are appropriately set.
The “autoanalyze daemon” actually consists of multiple processes. There is a persistent daemon process, called the autoanalyze launcher, which is in charge of starting autoanalyze worker processes for all databases. The launcher will distribute the work across time, attempting to start one worker within each database every autoanalyze_naptime seconds. (Therefore, if the installation has N databases, a new worker will be launched every
autoanalyze_naptime / N seconds.) A maximum of autoanalyze_max_workers worker processes are allowed to run at the same time. If there are more than
autoanalyze_max_workers databases to be processed, the next database will be processed as soon as the first worker finishes. Each worker process will check each table within its database and execute
ANALYZE as needed. log_autoanalyze_min_duration can be set to monitor autoanalyze workers’ activity.
There is no limit on how many workers might be in a single database, but workers do try to avoid repeating work that has already been done by other workers. Note that the number of running workers does not count towards max_connections or superuser_reserved_connections limits.
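The launcher and worker behavior described above is governed by global settings; the fragment below is a sketch using the parameter names from this section, with illustrative values (the units shown, seconds for the naptime and milliseconds for the duration threshold, are assumptions patterned on the analogous stock PostgreSQL parameters):

```
# postgresql.conf -- autoanalyze daemon settings (illustrative values)
autoanalyze_naptime = 60              # seconds between worker launches per database
autoanalyze_max_workers = 3           # max concurrent autoanalyze workers
log_autoanalyze_min_duration = 1000   # log autoanalyze runs taking >= 1s (ms)
```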
If the total number of tuples inserted, updated, or deleted since the last
ANALYZE exceeds the “analyze threshold”, the table is analyzed. The analyze threshold is defined as:
analyze threshold = analyze base threshold + analyze scale factor * number of tuples
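For instance, assuming a base threshold of 50 and a scale factor of 0.1 (hypothetical values), a table holding 10,000 tuples would be analyzed once the change count exceeds:

```
analyze threshold = 50 + 0.1 * 10000
                  = 1050 tuples inserted, updated, or deleted
```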
Temporary tables cannot be accessed by autoanalyze. Therefore, appropriate ANALYZE operations should be performed on them via session SQL commands.
The default thresholds and scale factors are taken from
postgresql.conf, but it is possible to override them (and many other autoanalyze control parameters) on a per-table basis; see Storage Parameters for more information. If a setting has been changed via a table’s storage parameters, that value is used when processing that table; otherwise the global settings are used. See Automatic Analyzing Parameters for more details on the global settings.
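As a sketch, a per-table override uses the storage-parameter syntax; the parameter names below are assumptions patterned on the global settings, so consult the Storage Parameters reference for the exact spellings:

```sql
-- Hypothetical per-table override of the global autoanalyze settings:
-- analyze this table after ~2% of its tuples change, plus a fixed base of 500.
ALTER TABLE measurements
    SET (analyze_scale_factor = 0.02,
         analyze_threshold = 500);
```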
The “logical replication daemon” actually consists of multiple processes. There is a persistent daemon process, called the logical replication launcher, which is in charge of starting logical replication worker processes for all databases. Immediately after the database instance starts up, the launcher starts a worker process to replicate for every enabled subscription. While the instance is running, the launcher checks for newly created subscriptions, attempting to start one worker to replicate for each new subscription within each database. A maximum of max_logical_replication_workers worker processes are allowed to run at the same time. If there are more than
max_logical_replication_workers subscriptions to be processed, the remaining subscriptions will not be processed while no worker processes are free.
The subscriber also requires that
max_replication_slots be set to configure how many replication origins can be tracked. In this case it should be set to at least the number of subscriptions that will be added to the subscriber.
max_logical_replication_workers must be set to at least the number of subscriptions, again plus some reserve for the table synchronization. Additionally the
max_worker_processes may need to be adjusted to accommodate replication workers, at least (max_logical_replication_workers +
1). Note that some extensions and parallel queries also take worker slots from max_worker_processes.
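Putting these sizing rules together for a subscriber with, say, four subscriptions, a configuration sketch might be:

```
# postgresql.conf (subscriber) -- sizing for 4 subscriptions (illustrative)
max_replication_slots = 6              # >= number of subscriptions, plus reserve
max_logical_replication_workers = 6    # >= subscriptions + table-sync reserve
max_worker_processes = 12              # >= max_logical_replication_workers + 1,
                                       # plus slots for extensions / parallel query
```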
The “undo maintenance daemon” actually consists of multiple processes. There is a persistent daemon process, called the undo launcher, which is in charge of starting undo worker processes for all databases. Immediately after the database instance starts up, the launcher starts worker processes to maintain the active undo segments in the instance. While the instance is running, the launcher will distribute the work across time, attempting to start one worker to maintain active undo segments within each database every 60 minutes. A maximum of 5 worker processes are allowed to run at the same time. If there are more than 5 undo segments to be processed, the next undo segment will be processed as soon as the first worker finishes. Each worker process checks the transaction table in each undo segment and executes
ROLLBACK when it finds an orphaned transaction.
At the same time, the undo maintenance daemon is also responsible for cleaning up the database Recycle Bin. The launcher starts worker processes to clean up dropped objects (such as tables, indexes, constraints, types) in the Recycle Bin.
There is a separate server process called the WAL writer, which attempts to keep regular backends from having to write out (and fsync) WAL pages. It also guarantees that transaction commit records that weren’t synced to disk immediately upon commit (i.e., were “asynchronously committed”) will reach disk within a knowable time, which, as it happens, is at most three times the
wal_writer_delay cycle time.
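With the stock parameter and its usual default, the asynchronous-commit flush guarantee works out as follows:

```
# postgresql.conf -- WAL writer cycle (default shown)
wal_writer_delay = 200ms    # sleep between WAL writer flush cycles
# asynchronous commit records reach disk within at most
# 3 x wal_writer_delay = 600ms
```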
Note that, as with the background writer for shared buffers, regular backends are still empowered to issue WAL writes and fsyncs when the WAL writer doesn’t keep up. This means that the WAL writer is not an essential process and can shut down quickly when requested.