Thursday, April 03, 2008

Converting a table to a tree

I found an interesting Haskell snippet in a reddit comment.

From the comment:
Here's a slightly longer function I created a while ago:
listToForest :: Eq a => [[a]] -> Forest a
listToForest = map toTree . groupBy ((==) `on` head) . filter (/= [])
where toTree = Node . (head . head) <*> (listToForest . map tail)

The function turns a table of results, such as that which might come back from a database query, into a tree. e.g.
[[A, B, C],
[A, B, D],
[A, E, F]]

Becomes:
    A 
/ \
B E
/ \ \
C D F

I decided to try implementing this in Factor, and I came up with three versions.

All versions represent a tree as its root node, and each node is a hashtable mapping the key of the child to the child node.

The first version does not use any utility words other than what's in the standard library; it consists of a single, rather long word:
: list>forest ( list -- forest )
[ empty? not ] subset
H{ } clone [ [ >r unclip r> [ ?push ] change-at ] curry each ] keep
[ list>forest ] assoc-map ;

The second version uses a couple of utilities:
: push-at ( value key assoc -- )
[ ?push ] change-at ;

: classify ( seq classifier -- assoc )
H{ } clone [ [ slip push-at ] 2curry each ] keep ;

: list>forest ( list -- forest )
[ empty? not ] subset [ unclip ] classify [ list>forest ] assoc-map ;

The third version has a different implementation of classify which uses extra/fry syntax sugar:
: push-at ( value key assoc -- )
[ ?push ] change-at ;

: classify ( seq classifier -- assoc )
H{ } clone [ '[ @ , push-at ] each ] keep

: list>forest ( list -- forest )
[ empty? not ] subset [ unclip ] classify [ list>forest ] assoc-map ;

If this was a one-off thing I'd use the first version. If I was writing production code, I would use one of the second two, and I might even put the utilities in a shared vocabulary since they're generally useful. In the last version, the only stack manipulation operator is keep, and everything else just happens to be on the stack in the correct order. This is how good stack code is written.

Some explanation:
  • ?push is like push except it makes a new vector if the input is f
  • slip calls a quotation underneath the top of the stack.
  • push-at adds a value to the sequence stored at a key in a hashtable.
  • classify takes a sequence and a quotation which decomposes a sequence element into a key and a value. It builds an association mapping all the unique keys to corresponding values.
  • unclip takes a sequence and outputs the rest of the sequence followed by the first element.
  • Haskell's groupBy is a generalization of my classify; it works with an equivalence relation rather than a "fibration" into a set with an implied equivalence relation (wrong term, but I'm not sure what the correct term is).
  • My classify runs in linear time and I suspect groupBy is quadratic, but I could be wrong.

I'd be interested in seeing a version of this function in idiomatic Common Lisp or Scheme. I suspect it would be somewhat longer, but not horribly so.

4 comments:

Anonymous said...

cool stuff. thnx Slava

Anonymous said...

Implementation in common lisp. Not sure how idiomatic it is. I use the iterate package. It is a fairly direct translation from the Haskell code. I don't know if there is something like group-by in common lisp, so I implement it using hash-tabels.

I don't know how to keep the layout of the code (pre and code don't work in comments).

(defun group-by (f lst)
(let ((table (make-hash-table)))
(iter (for x in lst)
(for k = (funcall f x))
(setf (gethash k table) (cons x (gethash k table '()))))
(iter (for (nil x) in-hashtable table)
(collect x))))

(defun to-tree (lst)
(cons (caar lst) (list-to-forest (mapcar #'cdr lst))))

(defun list-to-forest (lst)
(mapcar #'to-tree (group-by #'car (remove-if #'null lst))))

Anonymous said...

I'd call this "listsToTrie", rather than "listToForest". It's not a collection of trees, but a single trie. The fact that the outer structure is a list isn't interesting -- we could use a set or any other collection. The ordering doesn't matter. The inner bits being lists does matter, as that order is collected in the structure of the final trie.

Anonymous said...

Six lines of idiomatic CL (modulo damage inflicted by blogger):

(defun paths->hash-tree (paths &optional (root (make-hash-table)))
  (dolist (path paths root)
    (reduce (lambda (parent child)
              (setf (gethash child parent)
                    (gethash child parent (make-hash-table))))
            path :initial-value root)))

Factor and Haskell are (predictably) much shorter, but CL does pretty good. Of course, I can't read those versions, so perhaps I've missed the point.