Cleaning up postgres
When you use postgres, every now and then, you need to clean out your database. Of course, as the project moves on, the tables that you need to clean change as well: some get added, some dropped. And then there's the indexes.
I usually do not get around to updating the cleaning scripts unless I notice something is wrong with the database, and then I have to deal with figuring out all the new tables and such.
This can be a lot simpler if you use the postgres metadata tables. The 'information_schema.tables' table has the list of all tables for a specific database. You can therefore get a list of tables with a simple:
select
table_name
from
INFORMATION_SCHEMA.TABLES
where
table_schema = 'public'
This gives you the list of tables, like this:
table_name
-------------------------------
first_table
second_table
(2 rows)
So how do you get the VACUUM?
Change the select to this:
select
'VACUUM ANALYZE VERBOSE ' || table_name || ';'
from
INFORMATION_SCHEMA.TABLES
where
table_schema = 'public'
The '||' does string concatenation in SQL. Not the quite the most obvious thing, but what are you gonna do?
So this that last command gives you this:
?column?
-------------------------------------------------------
VACUUM ANALYZE VERBOSE first_table;
VACUUM ANALYZE VERBOSE second_table;
So now you can try this one by hand, or put it in a cron job somewhere...
% psql -d mydb_production -f sql_script.sql -o vacuuum_tables.sql
% psql -d mydb_production -f vacuum_tables.sql
I also like to recompute my indexes after I vacuum. I add this to the first cleaning script, a simple thing like so:
select
'REINDEX INDEX ' || indexname || ';'
from
pg_indexes
where
schemaname ='public';
Now you get freshly computed indexes every time you run this. Nice!
svn tricks and rails on sundays 0
I've got a few projects that I work on when I get the time. Since I usually work on all of them at the same time, it seems none of them moves forward very fast. I got curious to see how much work I am actually doing over time, and came up with a few little SVN hacks.
First, get the svn logs, pipe into a file:
% cd <head_of_the_svn_tree>
% svn log -q | egrep '^r' > activity.csv
Right, that gives us a file with all of the project checkins. The 'egrep' part strips out all of the annoying dashes that come with the svn log. The data looks like of like this:
r2 | danielw | 2006-12-20 00:38:13 +0200 (Wed, 20 Dec 2006)
r1 | danielw | 2006-12-20 00:33:41 +0200 (Wed, 20 Dec 2006)
Now, with some command-line tricks I can break down the activity a little more:
% svn log -q | egrep '^r' | cut -d '|' -f 2 | sort | uniq -c | sort -n
This breaks down the log and counts the number of checkins per person. You can point it to a URL as well. Results on one of my SVN trees gives something like this:
6 carl
123 danielw
What I am really interested in is how this activity progresses over time. I don't know how to do this on the command line, but SQL could do this in no time. We need to create a database and a table to hold the data. In postgres, like this:
% createdb work_activity
% psql -d work_activity
work_activity => create table svn_activity (revision varchar, who varchar, date timestamp);
Now we need to populate this with data. Since the end of that SVN line has got some funny timestamps, we'll get AWK to strip that out for us. Also, since the standard postgres column delimiter is the tab (\t), we'll delimit our records like that. Also, let's use the rails project to get more interesting stats.
% svn log -q http://svn.rubyonrails.org/rails/trunk > activity_rails.txt
% cat activity_rails.txt | egrep '^r' | awk '{print $1"\t"$3"\t"$5}' > activity_rails.data
This puts all of the data into a file, which we can now load into the DB in a single easy command:
% psql -d work_activity -c 'COPY svn_activity FROM STDIN' < activity_rails.data
Now it's all in the database, and we can do loads of fancy queries on it:
% psql -d work_activity -c "select date_trunc('month', date), count(*) from svn_activity group by 1 order by 1;"
date_trunc | count
---------------------+-------
2004-11-01 00:00:00 | 30
2004-12-01 00:00:00 | 259
2005-01-01 00:00:00 | 218
2005-02-01 00:00:00 | 219
2005-03-01 00:00:00 | 227
2005-04-01 00:00:00 | 199
2005-05-01 00:00:00 | 99
2005-06-01 00:00:00 | 172
2005-07-01 00:00:00 | 304
2005-08-01 00:00:00 | 63
2005-09-01 00:00:00 | 263
2005-10-01 00:00:00 | 306
2005-11-01 00:00:00 | 265
2005-12-01 00:00:00 | 93
2006-01-01 00:00:00 | 79
2006-02-01 00:00:00 | 163
2006-03-01 00:00:00 | 347
2006-04-01 00:00:00 | 162
2006-05-01 00:00:00 | 60
2006-06-01 00:00:00 | 116
2006-07-01 00:00:00 | 96
2006-08-01 00:00:00 | 162
2006-09-01 00:00:00 | 216
2006-10-01 00:00:00 | 130
2006-11-01 00:00:00 | 139
2006-12-01 00:00:00 | 97
2007-01-01 00:00:00 | 155
2007-02-01 00:00:00 | 92
2007-03-01 00:00:00 | 101
2007-04-01 00:00:00 | 65
2007-05-01 00:00:00 | 192
2007-06-01 00:00:00 | 115
2007-07-01 00:00:00 | 39
2007-08-01 00:00:00 | 43
2007-09-01 00:00:00 | 278
2007-10-01 00:00:00 | 236
2007-11-01 00:00:00 | 105
Looks like a very healthy project. Ok, let's find out on what day of the week rails developers have been most prolific:
psql -d work_activity -c "select extract(dow from date) as day, count(*) from svn_activity group by 1 order by 1;"
day | count
-----+-------
0 | 1040
1 | 969
2 | 874
3 | 755
4 | 790
5 | 688
6 | 789
(7 rows)
Day 0 is sunday! Thanks for the hard work, guys.
Query for Postgres database sizes 0
If you need to figure out how big your postgres databases are, this query can come in pretty useful.
SELECT
pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM
pg_database
JOIN
pg_shadow ON pg_database.datdba = pg_shadow.usesysid
ORDER BY
pg_database_size(pg_database.datname) desc;
You may need to log in as postgres to get the privilege to look at the pg_shadow tables.
postgres RI_ConstraintTrigger Error 1
A problem that I recently started running into is a strange Postgres FK constraints error, which only shows up when running tests. The error looks something like this:
PGError: ERROR: permission denied: "RI_ConstraintTrigger_XXXXX" is a system trigger
This is something that came in with FoxyFixtures, and there's a discussion about this problem and a patch in the rails trac. So what's this about?
When loading fixtures, rails tries to disable all foreign keys so that there are no problems inserting the data in whatever order. The calls work like this:
ALTER TABLE table_name DISABLE TRIGGER ALL
insert a bunch of fixtures
ALTER TABLE table_name ENABLE TRIGGER ALL
However, postgres keeps some kind of metadata regarding the FKs to be enforced called
RI_Constraint_something. Disabling the triggers on a table causes these to be deleted as well.
The problems is that these other triggers belong to the superuser and not you. So, you got this
permissions problem.
You may not have superuser permissions for your postgres instance, but it's probably best that the database you work with does not have superuser permissions, since that's the way it's going to be in production.
One workaround for to disable the foreign key checking in a different way. Instead of running the DISABLE TRIGGER, you can make the constraints DEFERRED postgres docs. This way, you run like this:
SET CONSTRAINTS ALL DEFERRED
insert a bunch of fixtures
SET CONSTRAINTS ALL IMMEDIATE
This allows you to load your fixtures, but keep all contraint-checking.
The easiest way to patch this is to override the behavior of the disable_referential_integrity method.
I created a new file, active_record_fk_hack.rb, stuck this in there:
module ActiveRecord
module ConnectionAdapters
class PostgreSQLAdapter < AbstractAdapter
def disable_referential_integrity(&block)
transaction {
begin
execute "SET CONSTRAINTS ALL DEFERRED"
yield
ensure
execute "SET CONSTRAINTS ALL IMMEDIATE"
end
}
end
end
end
end
then, in your environment.rb, add this at the end:
require 'active_record_fk_hack'
