It seems that I am not the only person to hit OutOfMemoryErrors while performing matrix multiplication (and other operations) in Spark. This is because "Spark keeps the partition that it is working on in memory (and does not spill to disk even if it is running OOM)" (from the mailing lists).
For instance, computePrincipalComponentsAndExplainedVariance can't handle more than 65535 columns and does all the work on the driver (a "local matrix" in the docs) by delegating to Breeze, which in turn appears to use a native library (LAPACK).
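A back-of-the-envelope calculation (my own, not from the Spark docs) shows why a driver-local approach can't scale much past that column limit: the covariance matrix PCA needs is dense and n x n, so its memory footprint grows quadratically with the number of columns.

```python
# Back-of-the-envelope: memory for a dense n x n covariance matrix of doubles,
# which PCA on a RowMatrix materialises on the driver before handing it to Breeze.
def covariance_bytes(n_cols: int) -> int:
    return n_cols * n_cols * 8  # 8 bytes per double

# At the 65535-column limit, the driver-local matrix alone is roughly 32 GiB --
# before counting Breeze's working copies for the eigendecomposition.
gib = covariance_bytes(65535) / 2**30
print(round(gib, 1))
```

So even where the column limit isn't hit, the driver needs tens of gigabytes of heap just to hold the intermediate matrix.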
There are various proposed solutions. One is playing with spark.sql.shuffle.partitions, which "configures the number of partitions to use when shuffling data for joins or aggregations" (docs). Tuning this appears worthwhile, not least because "Spark uses a different data structure for shuffle book-keeping when the number of partitions is greater than 2000" (from StackOverflow).
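For reference, the setting can be changed per session rather than cluster-wide. A PySpark sketch (the value 4096 is an arbitrary illustration, chosen above the 2000 threshold mentioned above; pick based on data size and executor memory):

```python
from pyspark.sql import SparkSession

# Raise the shuffle partition count above the default of 200 so each
# shuffle partition is small enough to fit in executor memory.
# 4096 is an arbitrary illustrative value, not a recommendation.
spark = (SparkSession.builder
         .appName("matrix-multiplication")
         .config("spark.sql.shuffle.partitions", "4096")
         .getOrCreate())
```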
But this seems far too fiddly. It would be better if I never had to worry about such things. To this end, I've created my own simple library to perform "truly scalable matrix multiplication on Spark".
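I won't describe the library's internals here, but the standard idea behind scalable multiplication is block decomposition: split A and B into sub-matrix blocks, compute each small partial product A[i][k] * B[k][j] independently (each one fits comfortably in memory), and reduce by summing partial products that share the same output coordinate (i, j). That map/reduce shape is exactly what Spark is good at. A minimal single-machine sketch of the idea, with plain Python lists standing in for distributed blocks (this is an illustration of the technique, not the library's actual code):

```python
from collections import defaultdict

def split(m, bs):
    """Split matrix m (list of rows) into a dict {(i, j): block} of bs x bs blocks."""
    blocks = defaultdict(lambda: [[0] * bs for _ in range(bs)])
    for r, row in enumerate(m):
        for c, v in enumerate(row):
            blocks[(r // bs, c // bs)][r % bs][c % bs] = v
    return dict(blocks)

def mat_mul(x, y):
    """Dense product of two small in-memory matrices."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def mat_add(x, y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def block_multiply(a_blocks, b_blocks):
    # "Map": one small product per matching (i, k) x (k, j) pair of blocks --
    # in Spark each of these would be an independent task, so no task ever
    # holds more than two blocks at once.
    # "Reduce": sum partial products sharing the output coordinate (i, j).
    out = {}
    for (i, k), a in a_blocks.items():
        for (k2, j), b in b_blocks.items():
            if k == k2:
                p = mat_mul(a, b)
                out[(i, j)] = mat_add(out[(i, j)], p) if (i, j) in out else p
    return out
```

In a distributed setting the peak memory per task is bounded by the block size, not the matrix size, which is the property the fiddly tuning above was trying to approximate.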
Note that this is the general case of multiplying two different matrices. In the special case that you're multiplying a matrix by its own transpose, you might prefer the approach taken in a previous post on this blog.
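Whatever the earlier post's specific approach, the structural reason A^T A is a friendlier special case is worth spelling out: A^T A equals the sum over rows of the outer product of each row with itself, so it can be accumulated one row at a time without ever shuffling two different matrices together. A plain-Python illustration of that identity (not the code from the earlier post):

```python
def gram_by_rows(rows):
    """Accumulate A^T A as a running sum of per-row outer products.

    Because each row contributes independently, this maps naturally onto a
    single aggregation over a distributed collection of rows -- no join or
    shuffle between two matrices is needed.
    """
    n = len(rows[0])
    g = [[0.0] * n for _ in range(n)]
    for a in rows:
        for i in range(n):
            for j in range(n):
                g[i][j] += a[i] * a[j]
    return g
```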