This week, I went to Nick Zeeb's talk at Skills Matter where I pestered him about atomics.
He was talking about the latest technology from those clever boys and girls down at LMAX, the Coalescing Ring Buffer.
[First, an interesting aside: the mod operator (%) is expensive. So, where you can, choose a modulus that is a power of 2 (call it X). Then, instead of computing a variable mod X, AND the variable with X-1. This is much faster, and it gives the same result as % as long as the variable is non-negative.]
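A minimal sketch of the aside above, assuming a modulus X of 1024 and a non-negative, ever-increasing sequence number of the kind a ring buffer uses to pick a slot. The class name, the size and the method names are illustrative assumptions, not taken from the talk.

```java
// Hypothetical example: % versus an AND mask for a power-of-two modulus.
public class PowerOfTwoMod {
    static final int SIZE = 1024;      // X: must be a power of 2
    static final int MASK = SIZE - 1;  // X - 1 = 1023: the low 10 bits set

    // Conventional modulo, written with %.
    static int slotWithMod(long sequence) {
        return (int) (sequence % SIZE);
    }

    // AND with X - 1 keeps only the low 10 bits, which equals
    // sequence % 1024 for any non-negative sequence.
    static int slotWithMask(long sequence) {
        return (int) (sequence & MASK);
    }

    public static void main(String[] args) {
        for (long seq = 0; seq < 5_000; seq++) {
            if (slotWithMod(seq) != slotWithMask(seq)) {
                throw new AssertionError("mismatch at " + seq);
            }
        }
        System.out.println(slotWithMask(1_000_000L)); // prints 576
    }
}
```

For negative values the two differ (Java's % keeps the sign of the dividend, while the mask always yields a non-negative result), which is why the trick is normally applied to sequence counters that only ever go up.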
My interest in the system was piqued by their use of AtomicReferenceArray, a class that is not much discussed yet is pretty critical to some multithreaded apps. He confirmed my experience that this class is slow and you should avoid it unless you really have to.
How to increase the efficiency of AtomicReferenceArray
AtomicReferenceArray is slow because each call to set is a volatile write, which in practice means a full memory barrier (on x86, the CPU has to drain its store buffer before carrying on). Nick recommended using the method lazySet instead. The Javadoc merely says that this method "eventually sets the element at position i to the given value." Nick expanded on this: the write will reach memory within a reasonable time, but you wouldn't want to use it if the value were critical and had to be written immediately (the Coalescing Ring Buffer uses it for the non-critical purging of values).
What's more, lazySet still provides ordering guarantees: writes the thread made before calling lazySet cannot be reordered after it, so another thread that reads the new value (with get) also sees everything that was written before it.
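To make the set/lazySet distinction concrete, here is a minimal sketch, assuming a fixed-size slot buffer in which publishing a value is the critical write and clearing a slot afterwards is the non-critical one. SlotBuffer and its methods are hypothetical names, not LMAX's code; only AtomicReferenceArray's set, get and lazySet are the real API. (From Java 9 on, the Javadoc describes lazySet as having the memory effects of VarHandle.setRelease.)

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch: critical writes use set(), the purge uses lazySet().
public class SlotBuffer<E> {
    private final AtomicReferenceArray<E> slots;
    private final int mask;

    public SlotBuffer(int powerOfTwoSize) {
        if (Integer.bitCount(powerOfTwoSize) != 1) {
            throw new IllegalArgumentException("size must be a power of 2");
        }
        slots = new AtomicReferenceArray<>(powerOfTwoSize);
        mask = powerOfTwoSize - 1;
    }

    // Critical write: set() is a volatile write, so a later get()
    // on any thread sees the value.
    public void publish(long sequence, E value) {
        slots.set((int) (sequence & mask), value);
    }

    public E read(long sequence) {
        return slots.get((int) (sequence & mask));
    }

    // Non-critical write: lazySet() only promises the null arrives
    // "eventually", but it is not reordered ahead of this thread's
    // earlier writes. Assumed acceptable here because a stale value
    // lingering briefly does no harm.
    public void purge(long sequence) {
        slots.lazySet((int) (sequence & mask), null);
    }

    public static void main(String[] args) {
        SlotBuffer<String> buffer = new SlotBuffer<>(8);
        buffer.publish(9, "tick");
        System.out.println(buffer.read(1)); // 9 & 7 == 1, prints "tick"
        buffer.purge(9);
        System.out.println(buffer.read(9)); // same thread sees its own write: prints "null"
    }
}
```

The point of the split is the one Nick made: the purge can afford to become visible to other threads a little late, whereas the publish cannot.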
Moran Tzafrir of Tel Aviv University points out another interesting optimization that would avoid atomics altogether:
"When you do code optimization, you might remove the “volatile” keyword in very specific situation. You have global variables G1…Gn, and always you update G1…Gn together. And Gn is updated last. In this case you might consider “volatile” only to Gn. (Note this does not guaranty atomic/transactional update to G1…Gn, it just guaranty that after you write to Gn all preceding updates are visible to the other threads)... Gn should be read first. (On the read scenario)" [1]
In other words, if you always update a group of variables together, you don't need a relatively expensive AtomicReferenceArray.set(...) for each one: just make the variable that is written last and read first volatile.
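Here is a minimal sketch of the pattern in Moran Tzafrir's quote, assuming a single writer thread, with two plain fields standing in for G1…Gn-1 and a volatile version counter as Gn. The class BatchedUpdate and its fields and methods are hypothetical illustrations, not code from the talk or the forum thread.

```java
// Hypothetical sketch: plain fields G1, G2 plus one volatile field Gn.
public class BatchedUpdate {
    // G1 and G2: plain (non-volatile) fields, always written together.
    private long price;
    private long quantity;

    // Gn: the only volatile field. Written last by the writer,
    // read first by the reader.
    private volatile long version;

    // Writer side. Assumes a single writer thread, so the read-modify-write
    // of 'version' below does not need to be atomic.
    public void update(long newPrice, long newQuantity) {
        price = newPrice;          // plain write
        quantity = newQuantity;    // plain write
        version = version + 1;     // volatile write: publishes both writes above
    }

    // Reader side: spin on the volatile field first, then read the plain
    // fields. Not transactional: if the writer starts another batch while
    // we copy, we may see a mix of old and new values.
    public long[] awaitVersion(long wanted) {
        while (version < wanted) {
            // busy-wait; each test of the condition is a fresh volatile read
        }
        return new long[] { price, quantity };
    }

    public static void main(String[] args) throws InterruptedException {
        BatchedUpdate shared = new BatchedUpdate();
        long[][] result = new long[1][];
        Thread reader = new Thread(() -> result[0] = shared.awaitVersion(1));
        reader.start();
        shared.update(100L, 5L);
        reader.join();
        System.out.println(result[0][0] + " " + result[0][1]); // prints "100 5"
    }
}
```

Only one volatile write and one volatile read are paid per batch, however many plain fields the batch contains.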
Linux specific testing
Nick also emphasised the need to test your suppositions. He lists these commands and JVM flags in his slides, but I'll put them here for easy access:
Set a fixed frequency for your CPU so that frequency scaling does not skew the results (here core 0 at 1,700,000 kHz, i.e. 1.7 GHz):
sudo cpufreq-set -c 0 -f 1700000
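Then run the tests at the highest scheduling priority (nice -n -20), pinned to cores 1 and 3 (taskset -c 1,3), with JIT-compilation and GC logging switched on and a fixed 1 GB heap: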
sudo nice -n -20 taskset -c 1,3 java -XX:+PrintCompilation -XX:+PrintGC -server -XX:CompileThreshold=100000 -Xmx1g -Xms1g ...
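In the spirit of testing your suppositions, here is a crude sketch of a program you might launch with a command like the one above to compare set with lazySet on your own hardware. The class SetVsLazySet, the array size and the iteration count are arbitrary assumptions; hand-rolled timing loops like this are easily distorted by the JIT, so treat any numbers as rough and consider a dedicated harness such as JMH for anything you rely on.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical, crude timing harness: set() versus lazySet().
public class SetVsLazySet {
    static final int SIZE = 1 << 10;          // power of 2, so we can mask
    static final int MASK = SIZE - 1;
    static final int ITERATIONS = 10_000_000;

    // Times ITERATIONS volatile writes via set().
    static long timeSet(AtomicReferenceArray<Object> array, Object value) {
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            array.set(i & MASK, value);
        }
        return System.nanoTime() - start;
    }

    // Times ITERATIONS ordered ("lazy") writes via lazySet().
    static long timeLazySet(AtomicReferenceArray<Object> array, Object value) {
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            array.lazySet(i & MASK, value);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        AtomicReferenceArray<Object> array = new AtomicReferenceArray<>(SIZE);
        Object value = new Object();
        // Several rounds give the JIT a chance to compile both loops;
        // -XX:+PrintCompilation shows when that happens.
        for (int round = 0; round < 5; round++) {
            long setMs = timeSet(array, value) / 1_000_000;
            long lazyMs = timeLazySet(array, value) / 1_000_000;
            System.out.printf("round %d: set %d ms, lazySet %d ms%n", round, setMs, lazyMs);
        }
    }
}
```

Discarding the first rounds, which include interpretation and compilation, is usually necessary before comparing the two columns.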
[1] https://groups.google.com/forum/?fromgroups=#!searchin/art-of-multiprocessor-programming/moran$20tzafrir/art-of-multiprocessor-programming/3y20qh7ooK0/z_lmxMVqk48J