pgCenter’s wait events profiler

As you might know, in the last pgCenter release the new tool has been added - wait events profiler. In this post, I'd like to explore this tool and propose some use-cases for it.

First of all, what are “wait events”? PostgreSQL official documentation doesn't give an explanation for wait events (it just provides a list of all wait events). In short, wait events are points in time where backends have to wait until a particular event occurs. This may be waiting for obtaining locks, IO, inter-process communication, interacting with client or something else. Stats about wait events are provided by pg_stat_activity view in wait_event_type and wait_event columns.

Using EXPLAIN we always can understand what query does. But EXPLAIN is aimed to work with query planner and doesn't show time when query got stuck in waitings. For this, you can use pgCenter's wait events profiler.

How does it work? First, you need to know PID of profiled backend. It can be found using pg_stat_activity, or, if you're connected to Postgres directly, with pg_backend_pid(). Next, in the second terminal, run 'pgcenter profile' and pass backend PID as an argument. That’s it. pgCenter connects to Postgres and using wait events stats from pg_stat_activity will start collecting data. When query finishes, pgCenter shows you distribution of wait events, like this:



$ pgcenter profile -U postgres -P 19241
LOG: Profiling process 19241 with 10ms sampling
------ ------------ -----------------------------
% time      seconds wait_event         query: update t1 set a = a + 100;
------ ------------ -----------------------------
72.15     30.205671 IO.DataFileRead
20.10      8.415921 Running
5.50       2.303926 LWLock.WALWriteLock
1.28       0.535915 IO.DataFileWrite
0.54       0.225117 IO.WALWrite
0.36       0.152407 IO.WALInitSync
0.03       0.011429 IO.WALInitWrite
0.03       0.011355 LWLock.WALBufMappingLock
------ ------------ -----------------------------
99.99     41.861741

In this example, a massive UPDATE is profiled. The query took around 40 seconds, and most of the time has been spent on reading data from data files. The 'Running' entry shows length of the time when query was active doing “real” work. The remaining time was spent doing things related to writing WAL and data files.

Let's look at the example with VACUUM FULL. Obviously, the majority of the time would be spent on IO blocking.
------ ------------ -----------------------------
% time      seconds wait_event             query: vacuum full t1;
------ ------------ -----------------------------
81.77     16.697358 IO.DataFileRead
12.24      2.500114 Running
2.36       0.480942 IO.DataFileImmediateSync
1.62       0.330561 LWLock.WALWriteLock
0.78       0.158484 IO.WALWrite
0.73       0.148711 IO.DataFileExtend
0.34       0.068863 IO.DataFileWrite
0.11       0.022844 IO.BufFileWrite
0.06       0.011417 IO.BufFileRead
------ ------------ -----------------------------
100.00    20.419294

Another example is parallel query. As you can see 14% of the time is spent on communication between master process and its children. Unfortunately, profile tool doesn't account wait events of parallel workers because there is no guaranteed way to associate background parallel workers with their master process.
------ ------------ -----------------------------
% time      seconds wait_event        query: select * from t1 order by a;
------ ------------ -----------------------------
44.94      8.153308 Running
34.79      6.310833 IO.BufFileRead
14.01      2.541087 IPC.MessageQueueReceive
3.51       0.637213 IO.BufFileWrite
2.33       0.422113 IO.DataFileRead
0.25       0.045842 IPC.ParallelFinish
0.13       0.023498 Client.ClientWrite
------ ------------ -----------------------------
99.96     18.133894

Let me summarize some key points about pgCcenter's profiler:
  • it is there for profiling long queries, completion of which takes seconds;
  • it doesn't count the waiting time of parallel workers.
  • internally, it works with pg_stat_activity;
  • the texts of long queries may be truncated in the output, because it might break wait events table;
  • profiler is configurable, see built-in for help.
This post gave you an overview of the new pgCenter's wait events profiler that can be used for observing long queries and understanding what they spent their time on. If you haven’t tried it yet, I’d suggest you download pgCenter and try it on your long queries. Let me know what you think!

Comments

You might also be interested in

Deep dive into Postgres stats: pg_stat_progress_vacuum

Deep dive into postgres stats: pg_stat_bgwriter

Deep dive into postgres stats: pg_stat_replication