kopongo.com

svn tricks and rails on sundays

25 Nov 2007

I've got a few projects that I work on when I get the time. Since I usually work on all of them at the same time, it seems none of them moves forward very fast. I got curious to see how much work I am actually doing over time, and came up with a few little SVN hacks.

First, get the svn log and pipe it into a file:

% cd <head_of_the_svn_tree>
% svn log -q | egrep '^r' > activity.csv

Right, that gives us a file with all of the project checkins. The 'egrep' part strips out the annoying lines of dashes that svn log prints between entries. The data looks kind of like this:

r2 | danielw | 2006-12-20 00:38:13 +0200 (Wed, 20 Dec 2006)
r1 | danielw | 2006-12-20 00:33:41 +0200 (Wed, 20 Dec 2006)

Now, with some command-line tricks I can break down the activity a little more:

% svn log -q | egrep '^r' | cut -d '|' -f 2 | sort | uniq -c | sort -n  

This breaks down the log and counts the number of checkins per person. You can also pass svn log a repository URL instead of running it inside a working copy. Results on one of my SVN trees give something like this:

6  carl 
123  danielw
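
If the one-liner gets used a lot, it could be wrapped in a small shell function. This is only a sketch: the name count_authors is made up here, grep -E stands in for egrep, and it assumes author names contain no spaces. It reads the log on stdin and lists the busiest author first.

```shell
# Count checkins per author from `svn log -q` output on stdin,
# busiest author first. count_authors is a hypothetical helper name.
count_authors() {
    grep -E '^r' |          # keep only the revision lines
        cut -d '|' -f 2 |   # second '|'-separated column is the author
        tr -d ' ' |         # drop the padding spaces around the name
        sort | uniq -c |    # count identical names
        sort -rn            # largest count first
}

# Usage, from inside a working copy:
#   svn log -q | count_authors
```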

What I am really interested in is how this activity progresses over time. I don't know how to do this on the command line, but SQL could do this in no time. We need to create a database and a table to hold the data. In postgres, like this:

 % createdb work_activity
 % psql -d work_activity
 work_activity => create table svn_activity (revision varchar, who varchar, date timestamp);

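Now we need to populate this with data. Each svn log line ends with a time of day, a UTC offset and a human-readable date that we don't need, so we'll get awk to keep just the revision, the author and the date. Split on whitespace, those are fields 1, 3 and 5 (the '|' separators count as fields 2 and 4). Since the default postgres column delimiter for COPY is the tab (\t), we'll delimit our records with that. Also, let's use the rails project to get more interesting stats.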

% svn log -q http://svn.rubyonrails.org/rails/trunk > activity_rails.txt
% cat activity_rails.txt | egrep '^r' | awk '{print $1"\t"$3"\t"$5}' > activity_rails.data

This puts all of the data into a file, which we can now load into the DB in a single easy command:

% psql -d work_activity -c 'COPY svn_activity FROM STDIN' < activity_rails.data
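
One variation worth sketching: the awk step above keeps only the date, so the time of day is lost. If hour-of-day stats are wanted later, field 6 could be kept as well, since a timestamp column accepts "YYYY-MM-DD HH:MM:SS". The function name log_to_tsv_with_time is invented for this example, the UTC offset is still dropped, and author names are assumed to contain no spaces.

```shell
# Turn `svn log -q` output on stdin into tab-separated rows of
# revision, author and "date time". Hypothetical helper name.
log_to_tsv_with_time() {
    grep -E '^r' | awk '{ print $1 "\t" $3 "\t" $5 " " $6 }'
}

# Try it on one of the sample lines from earlier:
echo 'r2 | danielw | 2006-12-20 00:38:13 +0200 (Wed, 20 Dec 2006)' | log_to_tsv_with_time
```

The output can be loaded with the same COPY command as before.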

Now it's all in the database, and we can do loads of fancy queries on it:

% psql -d work_activity -c "select date_trunc('month', date), count(*) from svn_activity group by 1 order by 1;"

     date_trunc      | count 
---------------------+-------
 2004-11-01 00:00:00 |    30
 2004-12-01 00:00:00 |   259
 2005-01-01 00:00:00 |   218
 2005-02-01 00:00:00 |   219
 2005-03-01 00:00:00 |   227
 2005-04-01 00:00:00 |   199
 2005-05-01 00:00:00 |    99
 2005-06-01 00:00:00 |   172
 2005-07-01 00:00:00 |   304
 2005-08-01 00:00:00 |    63
 2005-09-01 00:00:00 |   263
 2005-10-01 00:00:00 |   306
 2005-11-01 00:00:00 |   265
 2005-12-01 00:00:00 |    93
 2006-01-01 00:00:00 |    79
 2006-02-01 00:00:00 |   163
 2006-03-01 00:00:00 |   347
 2006-04-01 00:00:00 |   162
 2006-05-01 00:00:00 |    60
 2006-06-01 00:00:00 |   116
 2006-07-01 00:00:00 |    96
 2006-08-01 00:00:00 |   162
 2006-09-01 00:00:00 |   216
 2006-10-01 00:00:00 |   130
 2006-11-01 00:00:00 |   139
 2006-12-01 00:00:00 |    97
 2007-01-01 00:00:00 |   155
 2007-02-01 00:00:00 |    92
 2007-03-01 00:00:00 |   101
 2007-04-01 00:00:00 |    65
 2007-05-01 00:00:00 |   192
 2007-06-01 00:00:00 |   115
 2007-07-01 00:00:00 |    39
 2007-08-01 00:00:00 |    43
 2007-09-01 00:00:00 |   278
 2007-10-01 00:00:00 |   236
 2007-11-01 00:00:00 |   105

Looks like a very healthy project. Ok, let's find out on what day of the week rails developers have been most prolific:

% psql -d work_activity -c "select extract(dow from date) as day, count(*) from svn_activity group by 1 order by 1;"

 day | count 
-----+-------
   0 |  1040
   1 |   969
   2 |   874
   3 |   755
   4 |   790
   5 |   688
   6 |   789
(7 rows)

Postgres numbers the days of the week from 0 (Sunday) to 6 (Saturday), so day 0 is Sunday! Thanks for the hard work, guys.
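
The weekday numbers can be sanity-checked against the log itself, since each line already carries a day name in its last bracketed part. A sketch, assuming the default layout where field 8 looks like "(Wed," and that svn printed English day names (that depends on the locale); commits_per_weekday is a hypothetical name.

```shell
# Count checkins per weekday name from `svn log -q` output on stdin,
# busiest day first.
commits_per_weekday() {
    grep -E '^r' |
        awk '{ day = $8; gsub(/[(,]/, "", day); n[day]++ }  # "(Wed," -> "Wed"
             END { for (d in n) print n[d], d }' |
        sort -rn
}

# Usage: commits_per_weekday < activity_rails.txt
```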