Thursday, February 26, 2015

Cassandra for the Busy - Part 2


An interesting problem I had this week, and one that's obvious when you think about it.

The Config

I have a cluster of 5 Cassandra (version 2.0.11.83) nodes. I execute:

DROP KEYSPACE IF EXISTS my_keyspace;

CREATE KEYSPACE my_keyspace WITH REPLICATION = { 
    'class' :              'SimpleStrategy',
    'replication_factor' : 2 
};

USE my_keyspace;

CREATE TABLE my_table (
    my_int_id    varint,
    my_string_id varchar,
    my_value     decimal,
    PRIMARY KEY (my_int_id, my_string_id)
);

I then start hammering my cluster with some CQL like:

INSERT INTO my_table 
    (my_int_id, my_string_id, my_value)
    VALUES (1, '2', 42.0);

or such like. The code I use to write to the DB looks something like:

// Prepare the statement once, with the consistency level baked in:
RegularStatement  toPrepare         = (RegularStatement) (new SimpleStatement(cql).setConsistencyLevel(ConsistencyLevel.TWO));
PreparedStatement preparedStatement = cassandraSession.prepare(toPrepare);
// ... then, for each write:
BoundStatement    bounded           = preparedStatement.bind(new BigInteger(myIntId), myStringId, new BigDecimal("42.0"));
cassandraSession.execute(bounded);
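
For reference, cassandraSession above is just an ordinary driver session. A minimal sketch of the setup, assuming the DataStax Java driver (the contact point is simply one of my nodes):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Connect via any reachable node; the driver discovers the rest of the ring.
Cluster cluster = Cluster.builder()
                         .addContactPoint("192.168.1.15")
                         .build();
Session cassandraSession = cluster.connect("my_keyspace");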


The Error

I then kill a single node and my client starts barfing with:

Not enough replica available for query at consistency TWO (2 required but only 1 alive)

which baffled me initially because there were still 4 nodes in the cluster:

dse$ bin/nodetool -h 192.168.1.15 status my_keyspace
Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns   Host ID                               Rack
UN  192.168.1.15  83.98 KB   256     40.0%  86df3d65-0e19-49b2-a49e-bc89d6347fa4  rack1
UN  192.168.1.16  82.78 KB   256     42.0%  8fghe485-0e19-49b2-a49e-bc89d6347fa5  rack1
UN  192.168.1.17  84.91 KB   256     38.0%  9834edf1-0e19-49b2-a49e-bc89d6347fa6  rack1
UN  192.168.1.18  88.22 KB   256     41.0%  873fdeab-0e19-49b2-a49e-bc89d6347fa7  rack1
DN  192.168.1.19  86.36 KB   256     39.0%  2bdf9211-0e19-49b2-a49e-bc89d6347fa8  rack1

Of course, the problem is that the number of dead nodes a write can tolerate is:

    Tolerance = Replication Factor - Consistency Level

For me, that was 2 - 2 = 0. Oops.
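
Plugging a few common combinations into that formula shows the trade-off:

    RF  Consistency  Tolerance
    2   TWO          0
    2   ONE          1
    3   QUORUM       1    (a quorum of 3 replicas is 2)
    3   ONE          2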

Not every insert fails, though. Running

dse$ bin/cqlsh -uUSER -pPASSWORD
Connected to PhillsCassandraCluster at 192.168.1.15:9160.
[cqlsh 4.1.1 | Cassandra 2.0.11.83 | DSE 4.6.0 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.

and inserting worked fine. But that's only because cqlsh defaults to a different consistency level:

cqlsh> consistency
Current consistency level is ONE.

so, let's change that:

cqlsh> consistency TWO

Consistency level set to TWO.

Do the inserts fail now? Sometimes. It depends on which nodes own the row being written, and that is transparent to the user.
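
For example, an insert whose two replicas include the dead node is rejected outright. On my cluster the failure looked something like this (the exact message varies between versions):

cqlsh> INSERT INTO my_keyspace.my_table (my_int_id, my_string_id, my_value) VALUES (1, '2', 42.0);
Unable to complete request: one or more nodes were unavailable.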


The Problem

Cassandra will try to put a piece of data on REPLICATION_FACTOR nodes. Those nodes are determined by hashing the row's partition key. If one of them is down, Cassandra will not choose another in its place: that replica simply cannot be written to at the time (of course, it may receive the data if/when it rejoins the cluster).

Upon inserting, the client decides how many of those replicas must acknowledge the write before the INSERT is declared a success. This number is given by your CONSISTENCY_LEVEL. If the contract cannot be fulfilled, Cassandra will barf with the error above rather than attempt the write at all.
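
If you'd rather degrade than barf, you can catch the failure client-side and retry at a weaker level. A minimal sketch, again assuming the DataStax Java driver; falling back to ONE is my choice here for illustration, not a recommendation:

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.UnavailableException;

void writeWithFallback(Session session, BoundStatement bounded) {
    try {
        session.execute(bounded);  // asks for TWO, as prepared above
    } catch (UnavailableException e) {
        // The coordinator rejected the write up front: too few replicas alive
        // (e.getRequiredReplicas() needed, e.getAliveReplicas() available).
        bounded.setConsistencyLevel(ConsistencyLevel.ONE);
        session.execute(bounded);  // settle for a single replica
    }
}

The driver also ships a DowngradingConsistencyRetryPolicy that applies much the same logic cluster-wide, at the cost of making your effective consistency level rather fuzzy.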


Further Reading

Application Failure Scenarios with Cassandra

Lightweight Transactions in Cassandra 2.0

