Apache Cassandra MembershipTable implementation? #1218

Closed
globeport opened this issue Dec 31, 2015 · 11 comments

@globeport

Would it be possible to implement the cluster MembershipTable in Cassandra? I've just had a quick scan through the protocol and I can see there is a requirement for optimistic concurrency but not full-blown transactions. Can anyone confirm whether Cassandra's lightweight transactions can support a MembershipTable implementation? I'll take a look at developing this if someone with more knowledge can confirm it should be possible.

Thanks,
Stuart

@gabikliot
Contributor

I will take a look.
Just out of curiosity, why is Apache ZooKeeper not enough?

@gabikliot self-assigned this Dec 31, 2015

@veikkoeeva
Contributor

@gabikliot From experience I can tell that Cassandra is a lot easier to set up and operate. Maybe only me? :)

@gabikliot
Contributor

Briefly: for Orleans's membership table, the underlying store has to support:

  1. conditional writes into a single row.
  2. batched writes into multiple rows (2 rows in our case), such that the write is both atomic and isolated: a) the writes to all rows either all happen or none do, b) one cannot see a write to one of the rows without also seeing the write to the other row.
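
As a rough illustration of these two requirements, here is a minimal in-memory sketch in Python. The class `ToyStore`, its method names and the integer etags are hypothetical and are not part of Orleans or Cassandra; a single lock stands in for whatever mechanism a real store uses to provide isolation.

```python
import threading


class ToyStore:
    """Hypothetical in-memory stand-in for a membership store (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}  # key -> (etag, value)

    def read_all(self):
        # Readers take the same lock, so they never observe a half-applied batch.
        with self._lock:
            return dict(self._rows)

    def conditional_write(self, key, value, expected_etag):
        """Requirement 1: write one row only if its etag still matches."""
        with self._lock:
            return self._apply_if_all_match([(key, value, expected_etag)])

    def conditional_batch(self, writes):
        """Requirement 2: apply every (key, value, expected_etag) or none of them."""
        with self._lock:
            return self._apply_if_all_match(writes)

    def _apply_if_all_match(self, writes):
        # Check every precondition before touching any row (atomicity).
        for key, _, expected in writes:
            current = self._rows.get(key, (None, None))[0]
            if current != expected:
                return False
        # All preconditions hold: apply the writes and bump each etag.
        for key, value, expected in writes:
            new_etag = 1 if expected is None else expected + 1
            self._rows[key] = (new_etag, value)
        return True
```

In this toy model, a batch that creates a version row and a silo row with `expected_etag=None` succeeds on an empty store, while a later batch in which either etag is stale returns `False` and changes nothing.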

I have been reading a bit about Cassandra, without spending too much time on it, and it seems pretty clear that Cassandra's lightweight transactions do support 1.

As for 2 - batched atomic isolated writes - I am not sure. I wasn't able to find a definite answer.
a) That link talks about atomic but not isolated batches, but it is for Cassandra version 1.2. Maybe a later version fixes it?
b) That link has a question at the bottom about the isolation level of conditional batch updates, without an answer.
c) That link clearly says there is no isolation for batch updates.

So it actually looks like Cassandra does NOT satisfy 2.
However, something is still not 100% clear to me. Orleans only uses batched writes into the same partition. Maybe Cassandra does not support isolated batch writes into different partitions, but does support it into the same partition (that is what Azure Table provides)?

The question now is: if Cassandra does not satisfy 2 even in the same partition, can it still be used for Orleans's membership, and what would we lose in Orleans's membership if we used non-isolated batched writes?

If you are interested in pursuing this route, my suggestion would be:

  1. Become an expert in Cassandra (at least in the aspects mentioned above) and be 100% confident in what it guarantees. For example, when we looked at using Zookeeper for Orleans membership, there were a number of options on the table regarding data modelling and how ZK could potentially be used. Only after gaining a very good understanding of how ZK works was @shayhatsor able to come up with the best solution.
  2. After we have a clear understanding of the options and limitations, I can think about what would happen in Orleans MBR if we used non-isolated batch writes, or whether something can be "relatively easily" changed to work correctly with non-isolated batch writes.

@shayhatsor
Member

> However, something is still not 100% clear to me. Orleans only uses batched writes into the same partition. Maybe Cassandra does not support isolated batch writes into different partitions, but does support it into the same partition (that is what Azure Table provides)?

I know nothing about Cassandra, but I've just found this here:

> All updates in a batch operation belonging to a given partition key are performed in isolation.

and from your link (c)

> For example, there is no batch isolation. Clients are able to read the first updated rows from the batch, while other rows are still being updated on the server. However, transactional row updates within a partition key are isolated: clients cannot read a partial update.

@gabikliot
Contributor

Yes, indeed @shayhatsor, I saw all of those and that is what confused me. It looks like it "maybe" works for the same partition, but I am not 100% sure. The documentation could have been more explicit, I think (the first link you sent is about an individual row, which is pretty clearly fully isolated). Basically, this is the only place where I found a mention of isolation levels within a partition.

I would recommend that @globeport establish the batch isolation level beyond any doubt, maybe by asking in some Cassandra forums.

@globeport
Author

Thanks guys. I'm pretty sure that batch updates in Cassandra are not isolated, but I'll get some clarification. Apache Zookeeper would work just fine. However, the system I'm building is using Cassandra as the primary datastore and therefore it would be nice to proceed with one less moving part if possible. If there was an 'easy change' that could remove the requirement for isolated batches that would be ideal.

@globeport
Author

http://grokbase.com/t/cassandra/user/155jx7wdt1/batch-isolation-within-a-single-partition

Actually, it does in fact seem that Cassandra supports batch isolation for row updates with the same partition key. So I think it may indeed be possible to use Cassandra as is, provided the data is partitioned with this in mind.
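
To make "partitioned with this in mind" concrete, here is a small Python sketch that only assembles CQL text. The table `membership`, its columns, and the helper `build_batch` are hypothetical names chosen for illustration, and the statements have not been run against a real cluster. The idea is that the version row and every silo row of one cluster share the partition key `cluster_id`, so a batch touching both stays inside a single partition.

```python
# Hypothetical layout: one partition per cluster, one row per silo plus a
# 'version' row, all sharing the partition key cluster_id.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS membership (
    cluster_id text,
    row_id     text,
    etag       int,
    payload    text,
    PRIMARY KEY (cluster_id, row_id)
)
"""

# Both statements filter on the same cluster_id, so they target one partition.
UPDATE_VERSION = (
    "UPDATE membership SET etag = ?, payload = ? "
    "WHERE cluster_id = ? AND row_id = 'version' IF etag = ?"
)
UPDATE_SILO = (
    "UPDATE membership SET etag = ?, payload = ? "
    "WHERE cluster_id = ? AND row_id = ? IF etag = ?"
)


def build_batch(statements):
    """Wrap the given CQL statements in BEGIN BATCH ... APPLY BATCH."""
    body = "\n".join(f"  {s.rstrip(';')};" for s in statements)
    return f"BEGIN BATCH\n{body}\nAPPLY BATCH;"


if __name__ == "__main__":
    print(build_batch([UPDATE_VERSION, UPDATE_SILO]))
```

A real implementation would bind the parameters through a Cassandra driver and inspect the result of the conditional batch to see whether it was applied.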

@gabikliot
Contributor

Great! No problem then. I recommend reading the IMembershipTable interface, which describes the semantics of the operations, and using the Azure Table and ZK implementations as references.

@sergeybykov
Contributor

Looks like this got resolved.

@ChrisBellew

@globeport did you manage to implement a Cassandra provider?

Thanks

@Arshia001

This is a really old thread, but let me provide some insight into the problem:

Orleans needs to update both the membership version and the target membership entry at the same time, thus needing to update two tables in a transaction (when viewed from an RDBMS point of view).

With Cassandra there are two ways to address this problem:

  1. The BATCH statement: it's effectively a more limited version of SQL transactions (as with every other feature of Cassandra). The official guide describes BATCH as: "Combines multiple data modification language (DML) statements (such as INSERT, UPDATE, and DELETE) to achieve atomicity and isolation when targeting a single partition, or only atomicity when targeting multiple partitions." Getting isolation therefore requires that all modifications be made to the same partition; so, one could probably use the same partition key for both tables and that'd be it.
  2. Better yet, one could use static columns. A static column holds a single value per partition key, shared by all rows in that partition. So, one would move all important data from the membership version table (namely, the version column) to static columns in the membership table. This way, a single conditional update (using an IF clause) can do all the work, and since atomicity is guaranteed for operations on a single row, no transaction or batch statement is needed. This is the approach I ultimately chose for my implementation of a Cassandra-based IMembershipTable.

Note, both solutions require the entire membership data to exist in one partition only. However, it's hard to imagine a cluster with so many nodes that storing membership data in one partition would have a negative impact on performance (at least for me), so it shouldn't be a problem.
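
A minimal sketch of the static-column idea, in Python: the CQL strings show one possible layout, and the `Partition` class is a toy in-memory model of the compare-and-set behaviour. All table, column and class names here are hypothetical, this is not the code of the implementation mentioned above, the CQL is untested against a real cluster, and the initial creation of the version value is glossed over.

```python
from dataclasses import dataclass, field

# Hypothetical CQL for the static-column layout (names are illustrative only).
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS membership (
    cluster_id   text,
    silo_address text,
    version      int static,
    status       text,
    PRIMARY KEY (cluster_id, silo_address)
)
"""

# One conditional statement bumps the shared version and writes the silo row.
UPDATE_SILO = """
UPDATE membership SET version = ?, status = ?
WHERE cluster_id = ? AND silo_address = ?
IF version = ?
"""


@dataclass
class Partition:
    """Toy model of one partition: a static version shared by all silo rows."""

    version: int = 0
    silos: dict = field(default_factory=dict)  # silo_address -> status

    def update_silo(self, silo_address, status, expected_version):
        # Mirrors UPDATE ... IF version = ?: compare first, then change
        # the version and the silo entry together, or change nothing.
        if self.version != expected_version:
            return False
        self.version = expected_version + 1
        self.silos[silo_address] = status
        return True
```

In this model, two writers that both read version 0 cannot both succeed: the first update moves the version to 1, so the second one fails its condition and has to re-read and retry.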

@ghost locked as resolved and limited conversation to collaborators Sep 30, 2021