From the comment:
Here's a slightly longer function I created a while ago:listToForest :: Eq a => [[a]] -> Forest a
listToForest = map toTree . groupBy ((==) `on` head) . filter (/= [])
where toTree = Node . (head . head) <*> (listToForest . map tail)
The function turns a table of results, such as that which might come back from a database query, into a tree. e.g.[[A, B, C],
[A, B, D],
[A, E, F]]
Becomes:A
/ \
B E
/ \ \
C D F
I decided to try implementing this in Factor, and I came up with three versions.
All versions represent a tree as its root node, and each node is a hashtable mapping the key of the child to the child node.
The first version does not use any utility words other than what's in the standard library; it consists of a single, rather long word:
: list>forest ( list -- forest )
[ empty? not ] subset
H{ } clone [ [ >r unclip r> [ ?push ] change-at ] curry each ] keep
[ list>forest ] assoc-map ;
The second version uses a couple of utilities:
: push-at ( value key assoc -- )
[ ?push ] change-at ;
: classify ( seq classifier -- assoc )
H{ } clone [ [ slip push-at ] 2curry each ] keep ;
: list>forest ( list -- forest )
[ empty? not ] subset [ unclip ] classify [ list>forest ] assoc-map ;
The third version has a different implementation of
classify
which uses extra/fry
syntax sugar:: push-at ( value key assoc -- )
[ ?push ] change-at ;
: classify ( seq classifier -- assoc )
H{ } clone [ '[ @ , push-at ] each ] keep
: list>forest ( list -- forest )
[ empty? not ] subset [ unclip ] classify [ list>forest ] assoc-map ;
If this was a one-off thing I'd use the first version. If I was writing production code, I would use one of the second two, and I might even put the utilities in a shared vocabulary since they're generally useful. In the last version, the only stack manipulation operator is
keep
, and everything else just happens to be on the stack in the correct order. This is how good stack code is written.Some explanation:
?push
is likepush
except it makes a new vector if the input isf
slip
calls a quotation underneath the top of the stack.push-at
adds a value to the sequence stored at a key in a hashtable.classify
takes a sequence and a quotation which decomposes a sequence element into a key and a value. It builds an association mapping all the unique keys to corresponding values.unclip
takes a sequence and outputs the rest of the sequence followed by the first element.- Haskell's
groupBy
is a generalization of myclassify
; it works with an equivalence relation rather than a "fibration" into a set with an implied equivalence relation (wrong term, but I'm not sure what the correct term is). - My
classify
runs in linear time and I suspectgroupBy
is quadratic, but I could be wrong.
I'd be interested in seeing a version of this function in idiomatic Common Lisp or Scheme. I suspect it would be somewhat longer, but not horribly so.
4 comments:
cool stuff. thnx Slava
Implementation in common lisp. Not sure how idiomatic it is. I use the iterate package. It is a fairly direct translation from the Haskell code. I don't know if there is something like group-by in common lisp, so I implement it using hash-tabels.
I don't know how to keep the layout of the code (pre and code don't work in comments).
(defun group-by (f lst)
(let ((table (make-hash-table)))
(iter (for x in lst)
(for k = (funcall f x))
(setf (gethash k table) (cons x (gethash k table '()))))
(iter (for (nil x) in-hashtable table)
(collect x))))
(defun to-tree (lst)
(cons (caar lst) (list-to-forest (mapcar #'cdr lst))))
(defun list-to-forest (lst)
(mapcar #'to-tree (group-by #'car (remove-if #'null lst))))
I'd call this "listsToTrie", rather than "listToForest". It's not a collection of trees, but a single trie. The fact that the outer structure is a list isn't interesting -- we could use a set or any other collection. The ordering doesn't matter. The inner bits being lists does matter, as that order is collected in the structure of the final trie.
Six lines of idiomatic CL (modulo damage inflicted by blogger):
(defun paths->hash-tree (paths &optional (root (make-hash-table)))
(dolist (path paths root)
(reduce (lambda (parent child)
(setf (gethash child parent)
(gethash child parent (make-hash-table))))
path :initial-value root)))
Factor and Haskell are (predictably) much shorter, but CL does pretty good. Of course, I can't read those versions, so perhaps I've missed the point.
Post a Comment