Sunday, May 7, 2023

The Joy of Sets?

The standard Scala Set violates basic algebra. Let's take two sets, x and y:

@ val x = Set(1, 2, 3) 
x: Set[Int] = Set(1, 2, 3)

@ val y = Set(3, 2, 1) 
y: Set[Int] = Set(3, 2, 1)

@ x == y 
res2: Boolean = true

If they're equal, we should be able to substitute one for another. Let's see how that goes:

@ def giveHead(s: Set[Int]): Int = s.head 
defined function giveHead

@ giveHead(x) 
res6: Int = 1

@ giveHead(y) 
res7: Int = 3

So, I get different results depending on which of two equal objects I call the method - crazy. 

topkek
Isn’t Map.toList impure or something like that?
tpolecat
It's pure but non-congruent with equality. m1 === m2 doesn't imply m1.toList === m2.toList because they could differ in their iteration order. That's why it's not Foldable. You can get those instances from alleycats I believe.
The unordered business is there to guarantee that Eq and Foldable are congruent (i.e., that a === b implies a.toList === b.toList) which is not necessarily true of data types like Set and Map: these are only UnorderedFoldable which requires you to fold into a commutative monoid which destroys any information you might get from observing iteration order. Which is to say "unordered" doesn't necessarily say anything about execution order, it just says any ordering that might be there won't be observable. 
Although often forgotten, sets are not functors.

You cannot parTraverse a Set (or Maps) in Cats. It just won't compile. This is a property of that particular data structure but it's not limited to Sets:
Fabio Labella @SystemFw Nov 16 16:55
You cannot traverse an [FS2] Stream, because it implies it's finite. Look at evalMap instead if the use case is "evaluate an action on each element". 
Instead, you must parUnorderedTraverse a Set.
Adam Rosien @arosien Nov 10 20:17
every Traverse is also an UnorderedTraverse, so they should act the same in that case (and since every CommutativeApplicative is also an Applicative, you can still call unorderedTraverse on something that has a Traverse)
To traverse a set (albeit unordered), the UnorderedTraverse will put the Set into a higher kinded type G[_]. If we want to take advantage of this say for List[_], we need a concrete implementation of a Parallel[List]. Among other things, this defines how to map from List[_] to Set[_] and back again. This brings us back to the problems with set's toList function - but at least now we're explicit in how we want this handled. 

1 comment:

  1. Unfortunately, this is a long-standing issue caused by the inheritance-based design of scala collections. On one hand it's nice that you have all the operations available on most of the collections. On the other hand, making Set inherit from Iterable (the Scala one that provides ++ and size, etc.) means that the semantics of methods provided by Iterable are often not useful and surprising.

    Breaking that inheritance could mean that you would have to call `.iterable` on a set to make it iterable and then be very explicit about how that works. Either, by really creating an ordered collection out of it with an explicitly arbitrary ordering, or by at least specifying that the resulting Iterable has weird properties and should probably be avoided.

    This was also discussed at length at the mailing list a long time ago: https://groups.google.com/g/scala-user/c/ImqlClXTrS4/m/OWENIdbZVOkJ

    ReplyDelete