kopongo.com

Profiling Postgres with PQA

10 Oct 2008

If you need to understand what your database is really doing, you're going to need to profile its activity. If you are using Postgres, then PQA (Practical Query Analysis) is your best bet.

PQA takes data from your production logs and creates a report for you. First, you need to get logs from your database. Chances are, you have full statement logging disabled in your db, so you need to turn it back on.

I find it easiest to keep the PQA settings commented out in the postgresql.conf file. When you want to run profiling, you uncomment them for the duration of the run.

# Standard settings:
log_duration = false
log_min_duration_statement = -1

# PQA Settings 
#log_duration = true
#log_min_duration_statement = 0
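With those settings uncommented, every statement shows up in the log along with its duration. Assuming your log_line_prefix includes a timestamp (%t), expect lines roughly like this (the exact shape varies with your version and prefix settings):

```
2008-10-10 14:23:01 PDT LOG:  duration: 2.398 ms  statement: SELECT * FROM users WHERE id = 42
```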

Next, install PQA itself. It's a regular gem, so you can install it like so:

sudo gem install pqa

Now you need to gather the logs. First, enable statement logging on the database, as shown in the settings above.

Second, tell Postgres to re-read its config file, like this:

% psql -U postgres
postgres=# select pg_reload_conf();

Telling it to reload the config will not take down the DB, but it will re-read postgresql.conf and pick up any logging-related changes. Postgres is now logging. Check the logs to make sure you did not make a mistake. If everything is good, tell it to rotate logfiles, so you don't run your profiling with a half-empty one.

% psql -U postgres
postgres=# select pg_rotate_logfile();

You're now collecting data. How long you collect data for depends on how predictable the traffic is. One hour of logs can usually tell you quite a bit.

After enough time has passed, re-comment those settings in postgresql.conf, tell Postgres to reload the config file once again, and roll over the log files.

It's likely that your data collection run generated many log files. In that case, you need to collate them back together. For Postgres files following the standard naming convention (postgresql-YYYY-MM-DD_HHMMSS.log), I've got a hacked-up script for this:

# Collate the logs in chronological order. Filenames follow the
# standard convention: postgresql-YYYY-MM-DD_HHMMSS.log
reg = /postgresql-(\d{4})-(\d\d)-(\d\d)_(\d+)\.log/

sorted = Dir.glob("postgresql-*.log").sort_by do |file|
  match = reg.match(file)
  match.captures.map { |n| n.to_i }  # [year, month, day, hhmmss]
end

File.open("very_big_postgres_log.log", "w") do |out|
  sorted.each { |f| out.write(File.read(f)) }
end
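As a quick sanity check after collating (my own addition, nothing to do with PQA), you can confirm that the combined file is exactly as big as the pieces put together. A minimal sketch:

```ruby
# Returns true when the combined log is exactly as large as the
# sum of its parts, i.e. nothing was missed or doubled up.
def combined_matches?(parts, combined)
  parts.sum { |f| File.size(f) } == File.size(combined)
end
```

After running the script above, call combined_matches?(sorted, "very_big_postgres_log.log") and make sure it returns true before spending time on the analysis.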

Now you have a single big logfile. PQA is particular about the format of its input, and Postgres 8.2 logging seems to have incompatibilities with it, so a tiny bit of log massaging needs to be done. Basically, you need to take the timezone out of each log line. You can do this on the command line with cut, or with ruby:

% ruby -p -e '$_.gsub!(/PDT/, "")' very_big_postgres_log.log > fixed_postgres_log.log
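The one-liner above only knows about PDT. If your server logs in a different zone, the same massaging can be done with a small function that drops whatever abbreviation follows the time field. A sketch (strip_timezone is my own name, not part of PQA):

```ruby
# Drop the timezone abbreviation that follows the time in a log
# line, e.g. "2008-10-10 12:34:56 PDT LOG: ..." loses the "PDT".
# The lookahead for a trailing space keeps it from eating "LOG"
# on lines that carry no timezone at all.
def strip_timezone(line)
  line.sub(/(\d\d:\d\d:\d\d(?:\.\d+)?) [A-Z]{3,5}(?= )/, '\1')
end
```

Feed every line of the big logfile through it and write the result out to the fixed file, just as the one-liner does.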

Then you do the analysis:

% pqa -normalize -logtype pglog -file fixed_postgres_log.log