Redrock Postgres Documentation
Home Toggle Dark/Light/Auto mode Toggle Dark/Light/Auto mode Toggle Dark/Light/Auto mode Back to homepage

Using EXPLAIN

PostgreSQL devises a query plan for each query it receives. Choosing the right plan to match the query structure and the properties of the data is absolutely critical for good performance, so the system includes a complex planner that tries to choose good plans. You can use the EXPLAIN command to see what query plan the planner creates for any query.

EXPLAIN Basics

The structure of a query plan is a tree of plan nodes. Nodes at the bottom level of the tree are scan nodes: they return raw rows from a table. There are different types of scan nodes for different table access methods: sequential scans, index scans, and bitmap index scans. There are also non-table row sources, such as VALUES clauses and set-returning functions in FROM, which have their own scan node types. If the query requires joining, aggregation, sorting, or other operations on the raw rows, then there will be additional nodes above the scan nodes to perform these operations. Again, there is usually more than one possible way to do these operations, so different node types can appear here too. The output of EXPLAIN has one line for each node in the plan tree, showing the basic node type and information. Additional lines might appear, indented from the node’s summary line, to show additional properties of the node.

Here is a trivial example, just to show what the output looks like:

EXPLAIN (costs off)
  SELECT * FROM t WHERE id < 1000
    ORDER BY id LIMIT 10;

            QUERY PLAN
-----------------------------------
 Limit
   ->  Sort
         Sort Key: id
         ->  Seq Scan on t
               Filter: (id < 1000)

For details of specific plan nodes, see Query Plan.

Cost Estimation

The SQL command EXPLAIN has a parameter COSTS that controls the cost estimate of the query plan node to be displayed in the output. It includes information on the estimated startup and total cost of each plan node, as well as the estimated number of rows and the estimated width of each row. This parameter defaults to TRUE. For example:

EXPLAIN
  SELECT * FROM t WHERE id < 1000
    ORDER BY id LIMIT 10;

                             QUERY PLAN
---------------------------------------------------------------------
 Limit  (cost=22939.84..22939.87 rows=10 width=69)
   ->  Sort  (cost=22939.84..22942.48 rows=1057 width=69)
         Sort Key: id
         ->  Seq Scan on t  (cost=0.00..22917.00 rows=1057 width=69)
               Filter: (id < 1000)

Format of output: Plan Node (cost=Startup Cost..Total Cost rows=Plan Rows width=Plan Width)。

Startup Cost

The estimated cost of returning the first row.

This is in an arbitrary unit, with the aim of being correlated to the startup time.

Total Cost

The estimated cost of returning all rows, by the operation and its descendents.

The Postgres query planner often has several different ways it could resolve the same query. It calculates a cost — which is hopefully correlated with the amount of time taken — for each potential plan, and then picks the one with the smallest cost. It’s worth bearing in mind that the costs are unit-free — they’re not designed to convert into time, or disk reads. They’re just supposed to be bigger for slower operations, and smaller for faster ones.

If the planner thinks it can finish early due to a parent Limit operation, the total cost of the Limit operation will reflect this, but the total cost of its child operations will not.

Plan Rows

The number of rows, per-loop, that the planner expects to be returned by the operation.

Poor row estimates can lead to suboptimal query planning. They can often be improved by running ANALYZE (not the EXPLAIN parameter), by increasing statistics, or by allowing Postgres to understand correlation between columns using multivariate statistics.

If the planner thinks it can finish early due to a Limit operation, the plan rows of the Limit operation will reflect this, but the plan rows of its child operations will not.

Plan Width

The estimated average size of each row returned by the operation, in bytes.

Execution Statistics

The SQL command EXPLAIN has a parameter ANALYZE that carrys out the command and show actual run times and other statistics. This parameter defaults to FALSE.

By running the command for real, we can see which parts of the query really were expensive, and can compare these to their estimated values. For example:

EXPLAIN (analyze)
  SELECT * FROM t WHERE id < 1000
    ORDER BY id LIMIT 10;

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=22939.84..22939.87 rows=10 width=69) (actual time=55.361..55.362 rows=10 loops=1)
   ->  Sort  (cost=22939.84..22942.48 rows=1057 width=69) (actual time=55.360..55.360 rows=10 loops=1)
         Sort Key: id
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Seq Scan on t  (cost=0.00..22917.00 rows=1057 width=69) (actual time=0.020..55.297 rows=999 loops=1)
               Filter: (id < 1000)
               Rows Removed by Filter: 999001
 Planning time: 0.061 ms
 Execution time: 55.385 ms

Format of output: (actual time=Actual Startup Time..Actual Total Time rows=Actual Rows loops=Actual Loops)。

Actual Startup Time

The amount of time, in milliseconds, it takes to get the first row out of the operation.

Sometimes, this is very close to the setup time — for example on a sequential scan which returns all the rows in a table.

Other times, though, it’s more or less the total time. For example, in order to return the first row of a sort, you have to sort all of the rows to work out which one comes first.

Actual Total Time

The actual amount of time in milliseconds spent on this operation and all of its children. It’s a per-loop average, rounded to the nearest thousandth of a millisecond.

Working out individual operation timings, especially where CTEs and SubPlans are concerned, can be tricky.

Actual Rows

The number of rows returned by the operation per loop.

The important thing to notice about this is that it’s an average of all the loops executed, rounded to the nearest integer. This means that while “Actual Loops” multiplied by “Actual Rows” is a pretty good approximation for total number of rows returned most of the time, it can be off by as much as half of the number of loops. Usually it’s just slightly confusing as you see a row or two appear or disappear between operations, but on operations with lots of loops there is potential for serious miscalculation.

Poor row estimates, where Actual Rows differs substantially from Plan Rows, can lead to suboptimal query planning. Estimates can often be improved by running ANALYZE (not the EXPLAIN parameter), by increasing statistics, or by allowing Postgres to understand correlation between columns using multivariate statistics.

Actual Loops

The number of times the operation is executed. For a lot of operations it will have a value of one, but when it is not, there are three different cases:

  1. Some operations can be executed more than once. For example, “Nested Loops” run their “Inner” child once for every row returned by their “Outer” child.
  2. When an operation that would normally only consist of one loop is split across multiple processes, each partial operation is counted as a Loop.
  3. The number of loops can be zero when an operation doesn’t need to be executed at all. For example if a table read is planned to provide candidates for an inner join, but there turns out to be no rows on the other side of the join, the operation can be effectively eliminated.

I/O Statistics

The SQL command EXPLAIN has a parameter BUFFERS that controls the buffer usage of the query plan node to be displayed in the output. This parameter may only be used when ANALYZE is also enabled. It defaults to FALSE.

Here is a simple example:

EXPLAIN (analyze, buffers, costs off)
  SELECT * FROM t WHERE id < 1000
    ORDER BY id LIMIT 10;

                               QUERY PLAN
------------------------------------------------------------------------
 Limit (actual time=86.844..86.845 rows=10 loops=1)
   Buffers: shared read=3473
   ->  Sort (actual time=86.842..86.843 rows=10 loops=1)
         Sort Key: id
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared read=3473
         ->  Seq Scan on t (actual time=0.344..86.717 rows=999 loops=1)
               Filter: (id < 1000)
               Rows Removed by Filter: 999001
               Buffers: shared read=3473
 Planning time: 5.658 ms
 Execution time: 86.971 ms

Format of output:

Buffers: shared hit=Shared Hit read=Shared Read dirtied=Shared Dirtied written=Shared Written, local hit=Local Hit read=Local Read dirtied=Local Dirtied written=Local Written, temp read=Temp Read written=Temp Written

I/O Timings: read=I/O Read Time write=I/O Write Time

Metrics are displayed only when the statistic counters are not zero.

Shared Hit

The number of blocks from indexes/tables that were found in the shared buffer area.

This includes values for child operations and is a total value (i.e. not per-loop).

Once you have subtracted child operation values, the proportion of shared hit to shared read blocks is an interesting measure of cache performance.

The ratio of shared blocks (hit plus read) to rows processed (including any filtered out) is also interesting: a high ratio means Postgres is reading a lot of data for what it is doing, which can be a sign of bloat.

Shared Read

The number of blocks from tables/indexes that were read from disk (or the OS cache).

This includes values for child operations and is a total value (i.e. not per-loop).

Once you have subtracted child operation values, the proportion of shared hit to shared read blocks is an interesting measure of cache performance.

The ratio of shared blocks (hit plus read) to rows processed (including any filtered out) is also interesting: a high ratio means Postgres is reading a lot of data for what it is doing, which can be a sign of bloat.

Shared Dirtied

The number of blocks from tables/indexes that were modified by the operation.

This includes values for child operations and is a total value (i.e. not per-loop).

A SELECT query can report shared dirtied blocks when setting hint bits on recently-committed rows.

Shared Written

The number of blocks from tables/indexes that were evicted from the shared buffer area.

This includes values for child operations and is a total value (i.e. not per-loop).

Local Hit

The number of blocks from temporary tables and indexes that were found in the cache.

This includes values for child operations and is a total value (i.e. not per-loop).

Local Read

The number of blocks read from temporary tables and indexes.

This includes values for child operations and is a total value (i.e. not per-loop).

Local Dirtied

The number of blocks from temporary tables and indexes that have been modified by the operation.

This includes values for child operations and is a total value (i.e. not per-loop).

Local Written

The number of blocks from temporary tables and indexes that were evicted from the local buffer area.

This includes values for child operations and is a total value (i.e. not per-loop).

Temp Read

The number of blocks read from short-term data used to calculate hashes, sorts, Materialize operations, and similar.

This includes values for child operations and is a total value (i.e. not per-loop).

Your setting of work_mem determines how much memory is available to Postgres per operation.

Temp Written

The number of blocks from short-term data (used to calculate hashes, sorts, Materialize operations, and similar) that were evicted from the work memory area.

This includes values for child operations and is a total value (i.e. not per-loop).

Your setting of work_mem determines how much memory is available to Postgres per operation.

I/O Read Time

The actual amount of time spent reading blocks, in milliseconds.

This includes values for child operations and is a total value (ie not per-loop).

Requires track_io_timing and BUFFERS to be on, but is unaffected by parameter TIMING.

I/O Write Time

The actual amount of time spent writing blocks, in milliseconds.

This includes values for child operations and is a total value (ie not per-loop).

Requires track_io_timing and BUFFERS to be on, but is unaffected by parameter TIMING.