Tuesday, January 09, 2007

Tool to tally contributor patch counts

I wrote a little Factor program which displays the number of patches submitted by each contributor to the Factor darcs repository. You can find it in demos/contributors.

It works as follows:
  • spawns a darcs process, asking it to emit a changelog in XML
  • parses the XML and extracts author attributes of patch tags
  • computes the tally
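
For readers who don't know Factor, the three steps above can be sketched in Python. This is only a rough illustration, not the program from this post: the function names, the sample XML, and its author addresses are made up, and the sketch assumes that darcs wraps its patch elements in a single root element with the author stored as an attribute on each patch.

```python
import subprocess
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample in the assumed shape of the darcs XML changelog;
# the root tag name, authors, and patch names are made up for illustration.
SAMPLE = """<changelog>
<patch author='alice@example.org'><name>first</name></patch>
<patch author='bob@example.org'><name>second</name></patch>
<patch author='alice@example.org'><name>third</name></patch>
</changelog>"""


def changelog_xml(repo_dir):
    """Step 1: spawn darcs in repo_dir and return its XML changelog as text."""
    result = subprocess.run(
        ["darcs", "changes", "--xml-output"],
        cwd=repo_dir, capture_output=True, text=True, check=True)
    return result.stdout


def authors(xml_text):
    """Step 2: parse the XML and pull the author attribute off each patch tag."""
    root = ET.fromstring(xml_text)
    return [patch.get("author") for patch in root.findall("patch")]


def patch_counts(author_list):
    """Step 3: tally patches per author, largest count first."""
    return Counter(author_list).most_common()


# Uses the canned sample so the sketch runs without a darcs repository.
counts = patch_counts(authors(SAMPLE))
print(counts)
```

To run it against a real repository, pass the output of changelog_xml to authors in place of SAMPLE.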

This code uses the new hash-prune word, which is only found in 0.88. Like prune, it removes duplicates from a sequence, but it does not retain the order of the elements and is much faster.
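
The trade-off between the two words can be illustrated with a pair of Python analogues. These are hypothetical stand-ins, not Factor's implementations. The sketch assumes the speed-up comes from hashing elements instead of comparing them pairwise, which the name suggests but which isn't spelled out here.

```python
def prune(seq):
    """Order-preserving de-duplication by pairwise comparison (quadratic time)."""
    out = []
    for item in seq:
        if item not in out:  # linear scan of everything kept so far
            out.append(item)
    return out


def hash_prune(seq):
    """Hash-based de-duplication (roughly linear time); result order is unspecified."""
    return list(set(seq))


names = ["a", "b", "a", "c", "b"]
print(prune(names))               # first occurrences, in original order
print(sorted(hash_prune(names)))  # same elements; sorted only to make the output stable
```

For tallying patches the lost ordering doesn't matter, since the counts get sorted at the end anyway.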

Here is the output, with actual e-mail addresses censored:
{ 
{ 1535 { "slava@..." } }
{ 270 { "chris.double@..." } }
{ 226 { "erg@t..." } }
{ 180 { "wayo.cavazos@..." } }
{ 50 { "matthew.willis@..." } }
{ 33 { "microdan@..." } }
{ 11 { "Benjamin Pollack <benjamin.pollack@...>" } }
{ 7 { "chapman.alex@..." } }
{ 4 { "Kevin Reid <kpreid@...>" } }
{ 2 { "lypanov@..." } }
{ 1 { "agl@..." } }
}


I'd like to do more with XML in the future, so that I can hopefully suggest some new abstractions to Daniel and help clean up the naming scheme of the XML processing words. I think Factor has the potential to make XML processing considerably simpler than it is in many other languages.

Here is the code:
REQUIRES: libs/process libs/xml ;
USING: memory io process sequences prettyprint kernel arrays
xml xml-utils ;
IN: contributors

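! Run darcs from the directory containing the Factor image and parse its XML changelog.
: changelog ( -- xml )
    image parent-dir cd
    "darcs changes --xml-output" "r" <process-stream> read-xml ;

! Collect the author attribute of each patch tag.
: authors ( xml -- seq )
    children-tags [ "author" <name-tag> prop-name ] map ;

! Count how many times author occurs in authors.
: patch-count ( authors author -- n )
    swap [ = ] subset-with length ;

! Pair each distinct author with their patch count.
: patch-counts ( authors -- assoc )
    dup hash-prune [ [ patch-count ] keep 2array ] map-with ;

! Print the tally, highest count first.
: contributors ( -- )
    changelog authors patch-counts sort-keys reverse . ;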

PROVIDE: demos/contributors ;

MAIN: demos/contributors contributors ;

3 comments:

Anonymous said...

You might also be interested in the list_authors.hs script in the darcs repository for darcs. You can call it by itself to generate the AUTHORS file, or with a stats switch to display a patch count.

Slava Pestov said...

Ah, I wasn't aware that this functionality already exists. Oh well, I had fun implementing it anyway.

Anonymous said...

Well, it's not distributed as part of the official functionality (though it does use darcs code). Right now it's hard coded to "canonicalise" author names in the darcs repo, for example, so that the various addresses for David Roundy get merged into one entry. But it might be useful if somebody spun it out into a generic utility, say a darcs query authors command. Patches welcome!