Friday, August 25, 2017

More HBase tuning

I've come across two HBase clusters where writes have been extremely fast and reads extremely slow. Note: these reads were not full scans (which would unsurprisingly be slow) but batched get() calls.

This post is about making HBase read things faster. "The rule of thumb is to have your hot data set in RAM. Does not fit? Increase RAM, increase # of servers." (from here).

How much data?

Check how big the HBase table is

hadoop fs -du -h  /hbase/data/default/ 


The more compressed your data, the more likely it is that it will all fit into RAM. Compressing data in the cache may increase CPU usages but reduce IO. You can check which native libraries Hadoop knows about with:

hadoop/bin/hadoop checknative -a

but note: "If the native libs are NOT present, you will see lots of Got brand-new compressor reports in your logs" (from here).

Wait for the cache to fill

It takes time for the cache to be populated. Run vmstat (on Unix-like OSs) and watch the number of blocks read (bi). It will be large to begin with (thousands or tens of thousands) then shrinks after the app has been running for a while, down to basically zero.

You can watch the progress in the region server's logs:

$ grep "BucketCacheStatsExecutor" hbase/logs/hbase-ubuntu-regionserver-ip-172-30-0-139.log
2017-08-23 09:15:28,192 INFO  [BucketCacheStatsExecutor] bucket.BucketCache: failedBlockAdditions=0, totalSize=5.86 GB, freeSize=4.28 GB, usedSize=1.58 GB, cacheSize=1.56 GB, accesses=49844, hits=2231, IOhitsPerSecond=7, IOTimePerHit=0.03, hitRatio=4.48%, cachingAccesses=49844, cachingHits=2231, cachingHitsRatio=4.48%, evictions=0, evicted=0, evictedPerRun=NaN
2017-08-23 09:20:28,192 INFO  [BucketCacheStatsExecutor] bucket.BucketCache: failedBlockAdditions=0, totalSize=5.86 GB, freeSize=837.15 MB, usedSize=5.04 GB, cacheSize=4.97 GB, accesses=150478, hits=33665, IOhitsPerSecond=104, IOTimePerHit=0.03, hitRatio=22.37%, cachingAccesses=150478, cachingHits=33665, cachingHitsRatio=22.37%, evictions=1, evicted=13644, evictedPerRun=13644.0
2017-08-23 09:25:28,192 INFO  [BucketCacheStatsExecutor] bucket.BucketCache: failedBlockAdditions=5, totalSize=5.86 GB, freeSize=552.66 MB, usedSize=5.32 GB, cacheSize=5.25 GB, accesses=299660, hits=95870, IOhitsPerSecond=207, IOTimePerHit=0.03, hitRatio=31.99%, cachingAccesses=299660, cachingHits=95870, cachingHitsRatio=31.99%, evictions=7, evicted=95977, evictedPerRun=13711.0

Types of caches

There are two types of caches: memcache and blockcache. The memcache is write-through cache. the blockcache is for read-only access.

You may want "to reduce the block size of the data stored in disk. Why? When a row is requested by client, the block corresponding to where the row is stored on disk (store file) is read into memory (cache) before sending it back the requested data to the client. So by decreasing the block size more relevant data can be stored in cache which can improve read performance." (from here).

Types of Blocks

From here: "HBase blocks and HDFS blocks are different things:

  • HBase blocks are the unit of indexing (as well as caching and compression) in HBase and allow for fast random access. 
  • HDFS blocks are the unit of the filesystem distribution and data locality"
Types of sizes

From here:

"Generally there are 2 sizes involved:

1. HBase Filesize
2. HBase Blocksize

#1 sets the maximum size of a region before it is split. Default used to be 512mb, it's now 1g (but usually it should be even larger)

#2 is the size of the blocks inside the HFiles. Smaller blocks mean better random access, but larger block indexes. I would only increase that if you have large cells."

Heap or off-Heap?

Making the cache off-heap really improved matters for me:

"When bucket cache is enabled in HBase, it will act as L2 (level 2 similar to processor cache hierarchy) cache and the default block cache will act as L1 cache. What that means is data need to be copied from L2 cache into L1 cache for HBase to service the query request. So on a read request when bucket cache is enabled, data block will be copied from disk onto bucket cache and then onto the block cache before served to the client making the request. On further reads of the same data, it will either be served directly from block cache or copied from bucket cache into block cache before serving the client request avoiding disk reads. So if there is a 10 node cluster with 128 GB of RAM each, it is possible to have almost 1 TB of data stored in HBase cache with bucket cache enabled and not get impacted by JVM GC which is not the case using just the default block cache." (from here).

So, trying assign more heap memory seems like a fool's errand. That will just cause a lot of GC. Instead, use off-heap memory.

No comments:

Post a Comment