Warning: pgstat wait timeout... pgstat what???

pgstat wait timeout, what does it mean?

Today I would like to tell about warning message "pgstat wait timeout". Pretty uninformative message, right?

If you see this error in the PostgreSQL logs, it means that the statistics collector does not work. Meanwhile, statistics plays a very important role in PostgreSQL. Query planner uses statistics to choose the optimal execution plan for the query. If you have broken statistics, the query planner might build the wrong plans. Wrong query plans usually cause bad performance.

Let's try to figure out why that happens.

The first thing I would like to briefly recall is about statistics collector. There is a dedicated database process - stats collector process. This process collects information on the activity within the database. In particular, collector gather statistical data about the tables, indexes, user-defined functions, the number of rows in tables, about how many blocks read from disk or from the buffers. It also collects information about operations like vacuum or analyze.
Besides the query planner, information provided by stats collector is accessible by the DBAs via pg_ctalog and no up-to-date information if stats collector does not work can be confusing.

There are several configuration parameters, responsible for the statistics collector (according to PostgreSQL 9.4 documentation):
The parameter track_activities enables monitoring of the current command being executed by any of server processes.
The parameter track_counts controls whether statistics are collected about table and index accesses.
The parameter track_functions enables tracking of usage of user-defined functions.
The parameter track_io_timing enables monitoring of block read and write times.

Finally, the statistics collector stores statistic data in temporary files. Directory where the files are located can be set using stats_temp_directory parameter in postgresql.conf. By default, this is pg_stat_tmp directory in the cluster data directory. Every time you turn the database server off or restart it, a snapshot of the current statistics will be stored in the pg_stat directory.

Now, back to our warning message "pgstat wait timeout". This error can occur in the following cases:

Huge IO. 
In the case when the storage subsystem is overloaded, autovacuum worker processes can not write statistics data into "stats_temp_directory" and we can see these WARNING messages. iostat utility from sysstat package can be used to confirm this hypothesis. This utility provides information about drives utilization. The value about 90-100% of utilization suggests you to consider this version.

Invalid stats_temp_directory. 
The second possibility, the least likely, is a wrong path to the temporary directory where statistics is stored. For example, temporary storage use RAM disk which for some reason is not available. In such situation, you should check the value of stats_temp_directory and then see how things are going with this directory at the file system level. Make sure it exists and permissions are correct. If the parameter specifies a non-existent directory, you can either create this directory manually, or specify a new one in the stats_temp_directory option and just reload (restart is not required) postgresql service.

Invalid Localhost IP.
The third case, the most common, occurs when an operations engineer makes changes to the network configuration. The most frequent case - disable ipv6 via sysctl. What happens in this situation? When you disable ipv6, all ipv6 addresses disappear (localhost for example). PostgreSQL can not handle such situation and that makes impossible to update and keep fresh statistics. As a result, we can see WARNING messages. The solution in this situation is that we should rollback the changes in the network configuration back, check the /etc/hosts configuration file that it contains the correct values, enable ipv6. Or alternatively just restart postrgesql service.

As you can see, there are not a lot of issues, which can cause this problem and it is an easy task to check and fix them.
And last, starting with PostgreSQL 9.4.1 message "pgstat wait timeout" was renamed to a more informative and understandable "using stale statistics instead of current ones because stats collector is not responding". And the severity of the message is reduced from WARNING to LOG. Thanks Tom Lane.


Popular Posts