Deep dive into postgres stats: pg_stat_user_indexes

Everything you always wanted to know about Postgres stats

One of the major strengths of PostgreSQL are indexes - they are flexible and cover almost all use cases - different access methods allow indexing wide range of data types. However, with these great benefits, there are also some disadvantages:
  • Obviously, every index consumes disk space and for large tables indexes might be large too, especially indexes based on whole columns, however even in this case PostgreSQL offers special BRIN indexes, but unfortunately they aren't as general purpose as btree indexes and don't fit for majority of use cases.
  • Another, less obvious disadvantage is that indexes existence implies additional write operations when data are updated within tables - all index entries that point to the affected table's rows must be updated too. For some indexes, GIN for example, update operations are very expensive.
Due to these reasons, data modification transactions on tables with indexes become slower, and then the more indexes table has, the slower its operations.
This creates additional work for VACUUM, of course. Every index should be maintained, old version of entries must be cleaned hence, VACUUM operations take longer.

All these reasons related to tables with high number of data modification operations, like INSERT/UPDATE/DELETE and read-intensive tables are less susceptible to those side-effects. It's important to understand that these aspects might significantly influence performance, so general recommendation is to avoid unused indexes. Don't build indexes for every columns or "popular" columns - look at real queries, explain them and if they really require an index, build it.

One of the many duties of a DBA is timely detection of unused indexes and their removal. The main tool used here is the pg_stat_user_indexes and accessory pg_statio_user_indexes. These views used for estimation of indexes usage. The pg_stat_user_indexes contains idx_scan field - information on number of times particular index has been used. pg_statio_user_indexes contains details on IO related to the indexes - number of pages read from shared buffers or from disk - this information might be useful in cases when several tablespaces are used with different performance characteristics and you should decide what indexes to place and where they should be placed.

Thus, indexes with zero idx_scan are the main candidates for removal. Also, I'd recommend before index deletion to make sure that the stats from pg_stat_user_indexes are collected over a prolonged period. Perhaps there are indexes which aren't used often, per month, for example, when doing analytics reports. Also it is possible to reset stats periodically and recheck indexes usage. In case of streaming replication, it should be checked on all hosts in the cluster. For resetting stats and checking when it was last time reset use the pg_stat_reset() function and pg_stat_database.stats_reset field.

Now let me show you some real-life queries which designed for finding unused indexes.

Here is the first one from DataEgret toolkit which shows not only the idx_scan and also information about write activity of related table and its size.

The second, isn't an only query, it's an article from Josh Berkus where he introduced his "Finding Unused Indexes Query" which searches indexes and puts them into special categories - I strongly recommend to read this article, check the query and familiarize with advanced techniques used there.

And third, is the PostgreSQL's wiki page related to finding unused indexes, it's also a good point to continue investigate index maintenance topic further.

Thank you for reading and if you have any questions let me know.

Comments

  1. For heavily-modified tables, it might be worth creating the index just before running the monthly report, then dropping it. In that case, you don't have the index maintenance hitting you all month.

    ReplyDelete

Post a Comment

Popular Posts