Friday, November 11, 2016

Recursive Data Types


I was pointed in the direction of this lecture. These are notes I made.

Imagine you want to build a hierarchy of professors and their students who too go on to be professors. You might like to model it like this:

case class ProfF[A](
  name: String,
  year: Int,
  students: List[A]
)

And have case classes to allow us to make recursive types because "type aliases can't be recursive but classes can" (using recursive type aliases result in "types [that] are infinite").

What's more, with ProfF "depending on how we wrap it we can represent just the pure data or the data annotated with database id" thus:

case class Prof(value: ProfF[Prof])

or

case class IdProf(id: Int, value: ProfF[IdProf])

But this is too tightly coupled with ProfF so let's refactor them like so:

case class Prof[F[_]](value: F[Prof])
case class IdProf[F[_]](id: Int, value: F[IdProf])

and too tightly coupled to the type of the id, so let's refactor that too:

case class Prof[F[_]](value: F[Prof[F]])
case class IdProf[F[_], A](id: A, value: F[IdProf[F, A]])

But there is nothing new under the sun as actually these useful data structures already exist in Scalaz.

case class Fix[F[_]](unfix: F[Fix[F]])
case class Cofree[F[_], A](head: A, tail: F[Cofree[F, A]])

and let's also throw in:

case class Free[F[_], A](resume: A \/ F[Free[F, A]])

Which is a type that says resume is either A or F[Free[F, A]]. It's like an Either is Scala. "\/[A, B] is isomorphic to scala.Either[A, B], but \/ is right-biased, so methods such as map and flatMap apply only in the context of the right case" (from here).

"In the same way you think of Cofree as a structure with labels at each node, Free is a structure with labels at the leaves". That is, where Cofree will always have an AFree can either have an A but no further branches (ie, it's a leaf) or it has branches and no A (ie,  it's a branch to another Free).

Terminology

If is a functor, Free is the free monad. 

Also, if is a functor, Cofree is a comonad. More on this later.

"If we have a Fix we can always get an F out by calling unfix, and we can do that with Cofree by calling tail. The general name for this operation is project and data types that do this are called recursive types. And note that we can't do it with Free because you might not have an F; you might have an A

"But you can always go the other way. If you have the F you can construct a Free. Same with Fix. But you can't do this with Cofree because you also need an A. So this operation (the dual) is called embed and data types with this behavior are corecursive types."

This is used in the wonderfully named Matryoshka library.

What is it good for?

Well, we can build a recursive data structure to be sent to a persistence layer with this:

type Prof = Fix[ProfF]

and receive a similar (technically corecursive) data structure back but that now contains DB-assigned primary keys with this:

type IdProf = Cofree[ProfF, Int]

These are the basics you need to understand Doobie, a functional library for accessing JDBC.


No comments:

Post a Comment