Friday, October 7, 2016

HBase Troubleshooting


Nasty little problem. Connecting to HBase kept timing out after lots of errors that looked like:

client.RpcRetryingCaller: Call exception, tries=17, retries=35, retryTime=208879ms, msg row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=HOST,60020,1475135661112, seqNum=0

while logging in the HBase shell seemed to be OK.

HBase uses Zookeeper to keep a record of its regions. So, first I tried checking that:

> echo ruok | nc ZOOKEEPER_HOST 2181
imok

Running a repair didn't seem to help either.

Time to look at the HBase transaction logs. They kept repeating every second or so something like:

master.splitLogManager: total tasks =3 unassigned = 0 tasks={ /hbase-secure/splitWAL/WALs/HOST,600201474532814834-splitting ... status = in_progess ...

WALs are write ahead logs. "HBase data updates are stored in a place in memory called memstore for fast write. In the event of a region server failure, the contents of the memstore are lost because they have not been saved to disk yet. To prevent data loss in such a scenario, the updates are persisted in a WAL file before they are stored in the memstore".

These WALs appeared to be constantly splitting. So, a quick look at the WALs directory  was in order. It looked something like this:

drwr-x-r-x - hbase app-hdfs     0 2016-07-29 /apps/hbase/xxx/WALs/MACHINE_NAME,60020,1474532814834-splitting
drwr-x-r-x - hbase app-hdfs     0 2016-09-29 /apps/hbase/xxx/WALs/MACHINE_NAME,60020,1474532435621
drwr-x-r-x - hbase app-hdfs     0 2016-09-30 /apps/hbase/xxx/WALs/MACHINE_NAME,60020,1474532365837
drwr-x-r-x - hbase app-hdfs     0 2016-07-28 /apps/hbase/xxx/WALs/MACHINE_NAME,60020,1474532463823-splitting
.
.

Splitting is natural but shouldn't take too long (a split file from months ago is definitely a bad sign).

We were lucky that this was a test environment so we could happily delete these splitting files and restart HBase (YMMV. Think long and hard about doing that in a production environment...) But the problem went away for us.

No comments:

Post a Comment