[Figure: Insanely high CPU usage]
The temptation at this point is to add more CPU resources, but this won't help much.
When Spark jobs that are not computationally intensive use large amounts of CPU, there's an obvious suspect: garbage collection. Let's check the time spent in GC:
[Figure: Insanely large GC times]
Shuffle per worker seems modest, but look at those GC times. In a five-hour job, nearly two hours are spent just garbage collecting.
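To put that number in perspective, here is a small sketch that turns the figures into a GC share. It assumes you treat the five hours and two hours as total task time and total GC time (for example, as summed from the Spark UI); the 10% threshold in `is_gc_heavy` is an assumed rule of thumb, not something the post or Spark prescribes.

```python
def gc_fraction(task_time_s: float, gc_time_s: float) -> float:
    """Return the share of task time spent in garbage collection."""
    if task_time_s <= 0:
        raise ValueError("task time must be positive")
    return gc_time_s / task_time_s


def is_gc_heavy(task_time_s: float, gc_time_s: float, threshold: float = 0.10) -> bool:
    """Flag a job whose GC share exceeds an (assumed) rule-of-thumb threshold."""
    return gc_fraction(task_time_s, gc_time_s) > threshold


# Figures from the job above: nearly 2 hours of GC in a 5-hour job.
job_s = 5 * 3600
gc_s = 2 * 3600
print(f"GC share: {gc_fraction(job_s, gc_s):.0%}")  # GC share: 40%
print(is_gc_heavy(job_s, gc_s))                      # True
```

At roughly 40%, almost half of the CPU time is going to the JVM cleaning up after itself rather than to your transformations, which is why adding cores does little.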
And this is something of a surprise to people new to Spark. Sure, it delivers on its promise to process more data than can fit in memory, but if you want it to be performant, you need to give it as much memory as possible.
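One way to act on this is to raise executor memory when submitting the job, and to turn on GC logging so you can confirm the change helped. The sketch below only assembles a `spark-submit` command line; `--executor-memory`, `--executor-cores` and the `spark.executor.extraJavaOptions` setting are standard Spark options, while the `16g`/`4` values and the `my_job.py` script name are hypothetical placeholders you would tune for your cluster.

```python
def spark_submit_args(app: str, executor_memory: str, executor_cores: int,
                      gc_logging: bool = True) -> list[str]:
    """Build a spark-submit command giving each executor more heap.

    Executor memory is shared by the tasks running concurrently on that
    executor, so fewer cores per executor means more heap per task.
    """
    args = [
        "spark-submit",
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
    ]
    if gc_logging:
        # -verbose:gc makes each executor JVM log its GC activity.
        args += ["--conf", "spark.executor.extraJavaOptions=-verbose:gc"]
    args.append(app)
    return args


# Hypothetical values: 16 GB of heap shared by 4 concurrent tasks.
print(" ".join(spark_submit_args("my_job.py", "16g", 4)))
```

Comparing GC time in the Spark UI before and after a change like this is a cheap way to check whether memory, rather than CPU, was the real bottleneck.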