a9s PostgreSQL Performance Considerations
Caching
PostgreSQL uses a combination of techniques to keep data often used in memory. PostgreSQL has its "own" memory segment and it also relies on the OS's caching to avoid disk I/O.
This blog post gives a good, brief overview: Understanding caching in PostgreSQL - An in-depth guide
This blog post introduces several PostgreSQL modules useful for system and query performance analysis (with bad English and it is slightly outdated though): Caching in PostgreSQL
Monitoring
Several modules exist that make performance monitoring easier in PostgreSQL.
pg_autoexplain
The auto_explain module provides a means for logging execution plans of slow statements automatically, without having to run EXPLAIN by hand. This is especially helpful for tracking down un-optimized queries in large applications.
What is an execution plan? When computing a query PostgreSQL creates execution plans and chooses the "quickest" amongst them. See the documentation for EXPLAIN for more information. Also, this chapter might be relevant Chapter 61. How the Planner Uses Statistics.
pg_stat_statements
The pg_stat_statements module provides a means for tracking execution statistics of all SQL statements executed by a server.
The pg_stat_statements module is certainly interesting for our Logstash plugin because it will allow us to monitor PostgreSQL service instance and e.g. give a list of the "ten slowest queries" and some hints why they are so slow. See the blogpost Understanding caching in PostgreSQL - An in-depth guide for a usage example.
pg_buffercache
The pg_buffercache module provides a way to view PostgreSQL's internal cache. It could be interesting for our Logstash plugin.
pgfincore
The pgfincore is similar to pg_buffercache but instead works on OS caches and thus we can figure out which of PostgreSQL's pages are currently in memory.
pg_prewarm
The pg_prewarm module provides a convenient way to load relation data into either the operating system buffer cache or the PostgreSQL buffer cache.
pg_hibernator
This PostgreSQL extension is a set-it-and-forget-it solution to save and restore the PostgreSQL shared-buffers contents, across PostgreSQL server restarts.
The module stores "block identifiers" on disk and loads those blocks into PostgreSQL's shared buffer after a restart. This means that "not a lot" of additional permanent storage space is required.
PostgreSQL configuration
shared_buffers
In this blogpost Tuning shared_buffers and wal_buffers Robert Haas (a PostgreSQL contributor) argues why the configuration option shared_buffers should be kept between 25% and 40% of the host's total RAM. Note that the blogpost is from 2012 so when he says "not more than 8GB" that should be regarded as outdated.
He also gives an explanation on how wal_buffers should be configured.
effective_cache_size
The configuration option effective_cache_size is used by PostgreSQL for its execution plans. It uses the value of effective_cache_size to determine whether to do a table scan or an index scan. It should reflect the size of shared_buffers plus the OS's page cache. Default: 4GB.
See this page for more information.
Further reading
a9s PostgreSQL custom parameters
Documentation of which values can be changed within a9s PostgreSQL.
Inside PostgreSQL buffers
Greg Smith describes the internals of data organization and caching for PostgreSQL 8.3 which is still applicable today in Inside PostgreSQL buffers
Understand cache hit rate and index usage
Craig Kertiens writes about how to determine cache hit rates (which could be exposed to Logstash) and index usage: http://www.craigkerstiens.com/2012/10/01/understanding-postgres-performance