CPU

This event occurs when a process is active in CPU or is waiting for CPU.

Context

The central processing unit (CPU) is the component of a computer that runs instructions. For example, CPU instructions perform arithmetic operations and exchange data in memory. If a query increases the number of instructions that it performs through the database engine, the time spent running the query increases. CPU scheduling is giving CPU time to a process. Scheduling is orchestrated by the kernel of the operating system.

How to tell when this wait occurs

This CPU wait event indicates that a backend process is active in CPU or is waiting for CPU. You know that it’s occurring when a query shows the following information:

The pg_stat_activity.state column has the value active.
The wait_event_type and wait_event columns in pg_stat_activity are both null.

To see the backend processes that are using or waiting on CPU, run the following query.

SELECT * FROM pg_stat_activity
  WHERE state = 'active'
    AND wait_event_type IS NULL
    AND wait_event IS NULL;

Likely cause of CPU scheduling

From an operating system perspective, the CPU is active when it isn’t running the idle process. The CPU is active while it performs a computation, but it’s also active when it waits on memory I/O. This type of I/O dominates a typical database workload.

Likely causes of increased waits

When the CPU wait event occurs more than normal, possibly indicating a performance problem, typical causes include the following.

Likely causes of sudden spikes

The most likely causes of sudden spikes are as follows:

Your application has opened too many simultaneous connections to the database. This scenario is known as a “connection storm.”
Your application workload changed in any of the following ways:
- New queries
- An increase in the size of your dataset
- Index maintenance or creation
- New functions
- New operators
- An increase in parallel query execution
Your query execution plans have changed. In some cases, a change can cause an increase in buffers. For example, the query is now using a sequential scan when it previously used an index. In this case, the queries need more CPU to accomplish the same goal.

Likely causes of long-term high frequency

The most likely causes of events that recur over a long period:

Too many backend processes are running concurrently on CPU. These processes can be parallel workers.
Queries are performing suboptimally because they need a large number of buffers.

Corner cases

If none of the likely causes turn out to be actual causes, the following situations might be occurring:

The CPU is swapping processes in and out.
CPU context switching has increased.
PostgreSQL code is missing wait events.

Actions

If the CPU wait event dominates database activity, it doesn’t necessarily indicate a performance problem. Respond to this event only when performance degrades.

Investigate whether the database is causing the CPU increase

Use system management tools (such as the top command for Linux, Task Manager for Windows, etc.) to check the CPU usage of the database process, and investigate whether the database is causing the CPU to increase.

Determine whether the number of connections increased

Your action depends on whether the number of connections increased or decreased during the period of increased CPU wait events. To know the current number of connections, run the following query.

SELECT count(*) FROM pg_stat_activity;

The connections increased

If the number of connections went up, compare the number of backend processes consuming CPU to the number of CPUs on the host. The following scenarios are possible:

The number of backend processes consuming CPU is less than the number of CPUs on the host.

In this case, the number of connections isn’t an issue. However, you might still try to reduce CPU utilization.
The number of backend processes consuming CPU is greater than the number of CPUs on the host.

In this case, consider the following options:
- Decrease the number of backend processes connected to your database. For example, implement a connection pooling solution such as pgbouncer.
- Upgrade your host hardware specifications to get a higher number of CPUs.

The connections didn’t increase

You can use pg_stat_statements or turning on log_min_duration_statement, to look for a correlation between top SQL statements and CPU usage. The following scenarios are possible:

CPU usage and top SQL statements are correlated.

In this case, find the top SQL statements that are linked to the CPU usage, and look for plan changes. You can use either of the following techniques:
- Examine statements that are linked to the CPU usage to see whether they can use less CPU. Run an EXPLAIN command, explain the plans manually and compare them to the expected execution plan. And focus on the plan nodes that have the most impact.
- Look for an increase in block hits per second and local block hits per second.
CPU usage and top SQL statements aren’t correlated.

In this case, determine whether any of the following occurs:
- The application is rapidly connecting to and disconnecting from the database.
  
  Diagnose this behavior by turning on log_connections and log_disconnections, then analyzing the PostgreSQL logs.
- The OS is overloaded.
  
  In this case, the backend processes are consuming CPU for a longer time than usual. If the operating system is overloaded, you can use system management tools to look at the process list and the percentage of CPU consumed by each process.

Respond to workload changes

If your workload has changed, look for the following types of changes:

New queries

Check whether the new queries are expected. If so, ensure that their execution plans and the number of executions per second are expected.
An increase in the size of the data set

Determine whether partitioning, if it’s not already implemented, might help. This strategy might reduce the number of pages that a query needs to retrieve.
Index maintenance or creation

Check whether the schedule for the maintenance is expected. A best practice is to schedule maintenance activities outside of peak activities.
New functions

Check whether these functions perform as expected during testing. Specifically, check whether the number of executions per second is expected.
New operators

Check whether they perform as expected during the testing.
An increase in running parallel queries

Determine whether any of the following situations has occurred:The relations or indexes involved have suddenly grown in size so that they differ significantly from min_parallel_table_scan_size or min_parallel_index_scan_size.Recent changes have been made to parallel_setup_cost or parallel_tuple_cost.Recent changes have been made to max_parallel_workers or max_parallel_workers_per_gather.