Don't fear the cat-amorphism (nor the hylomorphism)

Posted on May 7, 2014 by Florian Hofmann (fho@f12n.de)

This weekend I spend some time on recursion schemes that can be expressed by morphisms.

Knowing that this might sound daunting to many dabbling Haskellers (like I am), I decided to write a real short MergeSort hylomorphism quickstarter.

TL;DR: Morphisms provide a simple yet powerful alternative to hand-rolled high performance traversals of recursive datastructures.

For those who need a refresher: MergeSort works by creating a balanced binary tree from the input list and directly collapsing it back into itself while treating the children as sorted lists and merging these with an O(n) algorithm.

First the usual prelude:

> {-# LANGUAGE DeriveFunctor #-}
> {-# LANGUAGE TypeFamilies #-}

> import Data.Functor.Foldable 
> import Data.List (splitAt, unfoldr)

We will use a binary tree like this. Note that there is no explicit recursion used, but NodeF has two holes. These will eventually filled later.

> data TreeF c f = EmptyF | LeafF c | NodeF f f
>   deriving (Eq, Show, Functor)

Aside: We could use this as a normal binary tree by wrapping it in Fix: type Tree a = Fix (TreeF a) But this would require us to write our tree like Fix (NodeF (Fix (LeafF 'l')) (Fix (LeafF 'r'))) which would get tedious fast. Luckily Edward build a much better way to do this into recursion-schemes. I will touch on this later.

Without further ado we start to write a Coalgebra, which in my book is just a scary name for “function that is used to construct data structures”.

> unflatten :: [a] -> TreeF a [a]
> unflatten (  []) = EmptyF
> unflatten (x:[]) = LeafF x
> unflatten (  xs) = NodeF l r   
>   where (l,r) = splitAt (length xs `div` 2) xs

From the type signature it’s immediately obvious, that we take a list of ’a’s and use it to create a part of our tree.

The nice thing is that due to the fact that we haven’t committed to a type in our tree nodes we can just put lists in there.

Aside: At this point we could use this Coalgebra to construct (unsorted) binary trees from lists:

> example1 = 
>   ana unflatten [1,3] == Fix (NodeF (Fix (LeafF 1)) (Fix (LeafF 3)))

On to our sorting, tree-collapsing Algebra. Which again is just a creepy word for “function that is used to deconstruct datastructures”.

The function mergeList is defined below and just merges two sorted lists into one sorted list in O(n), I would probably take this from the ordlist package if I were to implement this for real.

Again we see that we can just construct our sorted output list from a TreeF that apparently contains just lists.

> flatten :: Ord a => TreeF a [a] -> [a]
> flatten EmptyF      = []
> flatten (LeafF c)   = [c]
> flatten (NodeF l r) = mergeLists l r

Aside: We could use a Coalgebra to deconstruct trees:

> example2 = 
>   cata flatten (Fix (NodeF (Fix (LeafF 3)) (Fix (LeafF 1)))) == [1,3]

Now we just combine the Coalgebra and the Algebra with one from the functions from Edwards recursion-schemes library:

> mergeSort :: Ord a => [a] -> [a]
> mergeSort = hylo flatten unflatten

> example3 = mergeSort [5,2,7,9,1,4] == [1,2,4,5,7,9]

What have we gained?

We have implemented a MergeSort variant in 9 lines of code, not counting the mergeLists function below. Not bad, but this implementation is not much longer.

On the other hand the morphism based implementation cleanly describes what happens during construction and deconstruction of our intermediate structure.

My guess is that, as soon as the algorithms get more complex, this will really make a difference.

At this point I wasn’t sure if this was useful or remotely applicable. Telling someone “I spend a whole weekend learning about Hylomorphism” isn’t something the cool developer kids do.

It appeared to me that maybe I should have a look at the Core to see what the compiler finally comes up with (edited for brevity):

mergeSort :: [Integer] -> [Integer]
mergeSort =
   (x :: [Integer]) ->
    case x of wild {
      [] -> [];
      : x1 ds ->
        case ds of _ {
          [] -> : x1 ([]);
          : ipv ipv1 ->
            unfoldr
              lvl9
              (let {
                 p :: ([Integer], [Integer])
                 p =
                   case $wlenAcc wild 0 of ww { __DEFAULT ->
                   case divInt# ww 2 of ww4 { __DEFAULT ->
                   case tagToEnum# (<# ww4 0) of _ {
                     False ->
                       case $wsplitAt# ww4 wild of _ { (# ww2, ww3
#) -> (ww2, ww3) };
                     True -> ([], wild)
                   }
                   }
                   } } in
               (case p of _ { (x2, ds1) -> mergeSort x2 },
                case p of _ { (ds1, y) -> mergeSort y }))
        }
    }

While I am not really competent in reading Core and this is actually the first time I bothered to try, it is immediately obvious that there is no trace of any intermediate tree structure.

This is when it struck me. I was dazzled and amazed. And am still. Although we are writing our algorithm as if we are working on a real tree structure the library and the compiler are able to just remove the whole intermediate step.

Aftermath:

In the beginning I promised a way to work on non-functor data structures. Actually that was how I began to work with the recursion-schemes library.

We are able to create a ‘normal’ version of our tree from above:

> data Tree c = Empty | Leaf c | Node (Tree c) (Tree c)
>   deriving (Eq, Show)

But we can not use this directly with our (Co-)lgebras. Luckily Edward build a little bit of type magic into the library:

> type instance Base (Tree c) = (TreeF c)

> instance Unfoldable (Tree c) where
>   embed EmptyF      = Empty
>   embed (LeafF c)   = Leaf c
>   embed (NodeF l r) = Node l r

> instance Foldable (Tree c) where
>   project Empty      = EmptyF
>   project (Leaf c)   = LeafF c
>   project (Node l r) = NodeF l r

Without going into detail by doing this we establish a relationship between Tree and TreeF and teach the compiler how to translate between these types.

Now we can use our Alebra on our non functor type:

> example4 = cata flatten (Node (Leaf 'l') (Leaf 'r')) == "lr"

The great thing about this is that, looking at the Core output again, there is no traces of the TreeF structure to be found. As far as I can tell, the algorithm is working directly on our Tree type.

Acknowledgements:

Edward Kmett: for his great library Stefan Dresselhaus; for introducing me to morphisms

Literature:

Appendix:

> mergeLists :: Ord a => [a] -> [a] -> [a]
> mergeLists = curry $ unfoldr c where
>   c ([], [])     = Nothing
>   c ([], y:ys)   = Just (y, ([], ys))
>   c (x:xs, [])   = Just (x, (xs, []))
>   c (x:xs, y:ys) | x <= y = Just (x, (xs, y:ys)) 
>                  | x > y  = Just (y, (x:xs, ys))