http://docs.datastax.com/en/archived/cassandra/1.0/docs/cluster_architecture/partitioning.html
The doc is for 1.0, but I think it covers a lot of detail that the 3.x documentation omits
- Column family data is partitioned across the nodes based on the row key
- Two partitioners: random partitioner (most cases) and byte order partitioner
- Sequential writes can cause hot spots. The doc mentions this only for the byte order partitioner, but wouldn't writes to a single wide row also hit only one node?
'Data partitioning determines how data is distributed across the nodes in the cluster. Three factors are involved with data distribution:'
- A partitioner that determines which node to store the data on
- The number of copies of data,
- The RandomPartitioner (org.apache.cassandra.dht.RandomPartitioner) is the default partitioning strategy for a Cassandra cluster, and in almost all cases is the right choice
- The RandomPartitioner uses consistent hashing to determine which node stores which row. Unlike naive modulus-by-node-count, consistent hashing ensures that when nodes are added to the cluster, the minimum possible set of data is affected
- The ByteOrderPartitioner orders rows lexically by key bytes
- Using the ordered partitioner allows range scans over rows
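To make the consistent hashing vs. modulus point concrete, here is a minimal sketch, not Cassandra's actual code. The ring size, the evenly spaced tokens, and the node and key names are all assumptions made up for illustration. It counts how many row keys change owner when a fifth node joins a four-node cluster under each scheme.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 128  # MD5 gives 128-bit values; this ring size is an assumption of the sketch


def token(key: str) -> int:
    """Hash a row key to an integer token with MD5."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def modulo_owner(key: str, nodes: list) -> str:
    """Naive placement: hash modulo the node count."""
    return nodes[token(key) % len(nodes)]


def ring_owner(key: str, ring: list) -> str:
    """Consistent hashing: first node whose token is >= the key's token, wrapping around."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token(key)) % len(ring)
    return ring[i][1]


keys = [f"row-{i}" for i in range(10_000)]

# Modulo placement: four nodes, then five.
old_nodes = [f"node-{i}" for i in range(4)]
new_nodes = old_nodes + ["node-4"]
moved_modulo = sum(modulo_owner(k, old_nodes) != modulo_owner(k, new_nodes) for k in keys)

# Ring placement: four hand-assigned, evenly spaced tokens, then one new node
# inserted at a hypothetical token halfway into the second node's range.
old_ring = [(i * RING_SIZE // 4, f"node-{i}") for i in range(4)]
new_ring = sorted(old_ring + [(RING_SIZE // 8, "node-4")])
moved_keys = [k for k in keys if ring_owner(k, old_ring) != ring_owner(k, new_ring)]
moved_ring = len(moved_keys)

print("moved under modulo:", moved_modulo)
print("moved under consistent hashing:", moved_ring)
```

With the ring, the only keys that move are the ones the new node takes over; every other row stays where it was. An order-preserving partitioner such as the ByteOrderPartitioner skips the hashing step and places rows by raw key bytes instead, which is what makes range scans possible.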
It is not recommended because of:
- Sequential writes can cause hot spots: If your application tends to write or update a sequential block of rows at a time, then these writes are not distributed across the cluster; they all go to one node. This is frequently a problem for applications dealing with timestamped data
- TODO: this is similar to Bigtable's tall and narrow suggestion?
- More administrative overhead to load balance the cluster
- Uneven load balancing for multiple column families
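The hot spot problem for timestamped data can be sketched the same way. This is a toy model under stated assumptions: the four nodes, the quarter-of-a-year key bounds for the ordered layout, and the even split of the MD5 range for the hashed layout are all hypothetical. It writes one hour of sequential timestamp keys and counts writes per node.

```python
import bisect
import hashlib
from collections import Counter

NODES = ["node-a", "node-b", "node-c", "node-d"]

# Hypothetical ordered layout: node i owns every key up to and including
# ORDERED_BOUNDS[i], compared as raw bytes (quarters of the year 2012).
ORDERED_BOUNDS = [b"2012-03-31", b"2012-06-30", b"2012-09-30", b"2012-12-31"]


def ordered_owner(key: bytes) -> str:
    """Order-preserving placement: pick the node whose byte range contains the key."""
    i = bisect.bisect_left(ORDERED_BOUNDS, key)
    return NODES[i % len(NODES)]


def hashed_owner(key: bytes) -> str:
    """Hashed placement: MD5 the key and split the 128-bit range evenly across nodes."""
    t = int(hashlib.md5(key).hexdigest(), 16)
    return NODES[t * len(NODES) // 2 ** 128]


# One hour of per-second writes, keyed by timestamp.
keys = [
    f"2012-07-15T10:{m:02d}:{s:02d}".encode("ascii")
    for m in range(60)
    for s in range(60)
]

ordered_counts = Counter(ordered_owner(k) for k in keys)
hashed_counts = Counter(hashed_owner(k) for k in keys)

print("ordered:", dict(ordered_counts))
print("hashed: ", dict(hashed_counts))
```

Under the ordered layout every write in the hour falls inside one node's key range, so that node takes all of the load. Hashing the same keys scatters them across the nodes, at the cost of losing the ability to range scan over rows.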