CPU
This event occurs when a process is active in CPU or is waiting for CPU.
The central processing unit (CPU) is the component of a computer that runs instructions. For example, CPU instructions perform arithmetic operations and exchange data in memory. If a query increases the number of instructions that it performs through the database engine, the time spent running the query increases. CPU scheduling is giving CPU time to a process. Scheduling is orchestrated by the kernel of the operating system.
This CPU
wait event indicates that a backend process is active in CPU or is waiting for CPU. You know that it’s occurring when a query shows the following information:
- The
pg_stat_activity.state
column has the valueactive
. - The
wait_event_type
andwait_event
columns inpg_stat_activity
are bothnull
.
To see the backend processes that are using or waiting on CPU, run the following query.
SELECT * FROM pg_stat_activity
WHERE state = 'active'
AND wait_event_type IS NULL
AND wait_event IS NULL;
From an operating system perspective, the CPU is active when it isn’t running the idle process. The CPU is active while it performs a computation, but it’s also active when it waits on memory I/O. This type of I/O dominates a typical database workload.
When the CPU wait event occurs more than normal, possibly indicating a performance problem, typical causes include the following.
The most likely causes of sudden spikes are as follows:
- Your application has opened too many simultaneous connections to the database. This scenario is known as a “connection storm.”
- Your application workload changed in any of the following ways:
- New queries
- An increase in the size of your dataset
- Index maintenance or creation
- New functions
- New operators
- An increase in parallel query execution
- Your query execution plans have changed. In some cases, a change can cause an increase in buffers. For example, the query is now using a sequential scan when it previously used an index. In this case, the queries need more CPU to accomplish the same goal.
The most likely causes of events that recur over a long period:
- Too many backend processes are running concurrently on CPU. These processes can be parallel workers.
- Queries are performing suboptimally because they need a large number of buffers.
If none of the likely causes turn out to be actual causes, the following situations might be occurring:
- The CPU is swapping processes in and out.
- CPU context switching has increased.
- PostgreSQL code is missing wait events.
If the CPU
wait event dominates database activity, it doesn’t necessarily indicate a performance problem. Respond to this event only when performance degrades.
Use system management tools (such as the top command for Linux, Task Manager for Windows, etc.) to check the CPU usage of the database process, and investigate whether the database is causing the CPU to increase.
Your action depends on whether the number of connections increased or decreased during the period of increased CPU wait events. To know the current number of connections, run the following query.
SELECT count(*) FROM pg_stat_activity;
If the number of connections went up, compare the number of backend processes consuming CPU to the number of CPUs on the host. The following scenarios are possible:
-
The number of backend processes consuming CPU is less than the number of CPUs on the host.
In this case, the number of connections isn’t an issue. However, you might still try to reduce CPU utilization.
-
The number of backend processes consuming CPU is greater than the number of CPUs on the host.
In this case, consider the following options:
- Decrease the number of backend processes connected to your database. For example, implement a connection pooling solution such as pgbouncer.
- Upgrade your host hardware specifications to get a higher number of CPUs.
You can use pg_stat_statements
or turning on log_min_duration_statement
, to look for a correlation between top SQL statements and CPU usage. The following scenarios are possible:
-
CPU usage and top SQL statements are correlated.
In this case, find the top SQL statements that are linked to the CPU usage, and look for plan changes. You can use either of the following techniques:
- Examine statements that are linked to the CPU usage to see whether they can use less CPU. Run an
EXPLAIN
command, explain the plans manually and compare them to the expected execution plan. And focus on the plan nodes that have the most impact. - Look for an increase in block hits per second and local block hits per second.
- Examine statements that are linked to the CPU usage to see whether they can use less CPU. Run an
-
CPU usage and top SQL statements aren’t correlated.
In this case, determine whether any of the following occurs:
-
The application is rapidly connecting to and disconnecting from the database.
Diagnose this behavior by turning on
log_connections
andlog_disconnections
, then analyzing the PostgreSQL logs. -
The OS is overloaded.
In this case, the backend processes are consuming CPU for a longer time than usual. If the operating system is overloaded, you can use system management tools to look at the process list and the percentage of CPU consumed by each process.
-
If your workload has changed, look for the following types of changes:
-
New queries
Check whether the new queries are expected. If so, ensure that their execution plans and the number of executions per second are expected.
-
An increase in the size of the data set
Determine whether partitioning, if it’s not already implemented, might help. This strategy might reduce the number of pages that a query needs to retrieve.
-
Index maintenance or creation
Check whether the schedule for the maintenance is expected. A best practice is to schedule maintenance activities outside of peak activities.
-
New functions
Check whether these functions perform as expected during testing. Specifically, check whether the number of executions per second is expected.
-
New operators
Check whether they perform as expected during the testing.
-
An increase in running parallel queries
Determine whether any of the following situations has occurred:The relations or indexes involved have suddenly grown in size so that they differ significantly from
min_parallel_table_scan_size
ormin_parallel_index_scan_size
.Recent changes have been made toparallel_setup_cost
orparallel_tuple_cost
.Recent changes have been made tomax_parallel_workers
ormax_parallel_workers_per_gather
.