Partitioning

The doc is for 1.0 but I think it covers a lot of detail that is omitted by 3.x documentations

Take away

Column family data is partitioned across the nodes based on the row key
random partitioner (most cases) and byte order partitioner
Sequential writes can cause hot spots, although they mentioned it in byte order partitioner, but write to a wide row will also only hit a single node?

'Data partitioning determines how data is distributed across the nodes in the cluster. Three factors are involved with data distribution:'

The RandomPartitioner (org.apache.cassandra.dht.RandomPartitioner) is the default partitioning strategy for a Cassandra cluster, and in almost all cases is the right choice
The RandomPartitioner uses consistent hashing to determine which node stores which row. Unlike naive modulus-by-node-count, consistent hashing ensures that when nodes are added to the cluster, the minimum possible set of data is effected

It is not recommended due to

Sequential writes can cause hot spots: If your application tends to write or update a sequential block of rows at a time, then these writes are not distributed across the cluster; they all go to one node. This is frequently a problem for applications dealing with timestamped data
- TODO: this is similar to Bigtable's tall and narrow suggestion?
More administrative overhead to load balance the cluster
Uneven load balancing for multiple column families