GFS's problems
- GFS master
  - One machine is not large enough for a large FS
  - A single bottleneck for metadata operations
  - Fault tolerant, but not highly available (HA)
- Unpredictable performance
  - No guarantees of latency
Colossus's goals
- Bigger
- Faster
- More predictable tail latency
In Colossus, the GFS master is replaced by the Colossus metadata service (CFS), and the GFS chunkserver is replaced by D.
D server
: acts as a GFS chunkserver, running on Borg
- It is the lowest level of the storage layer and is designed to be the only application that reads and writes data directly on physical disks; all physical disks are attached to servers running D.
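A minimal sketch of the D server's role, assuming a hypothetical interface (the class and method names are illustrative, not Google's actual API): a thin service that is the only process allowed to touch the physical disks attached to its machine.

```python
# Sketch of a D server: the only process that touches the physical disks on
# its machine. All names (DServer, read_block, write_block) are assumptions
# for illustration, not Google's real interface.
import os

class DServer:
    def __init__(self, disk_paths):
        # Each path stands in for a physical disk attached to this machine.
        self.disks = {i: p for i, p in enumerate(disk_paths)}

    def write_block(self, disk_id, block_id, data: bytes):
        # Direct, low-level write; higher-level concerns (RAID, encoding,
        # placement) live in the Colossus client, not here.
        path = os.path.join(self.disks[disk_id], f"{block_id}.blk")
        with open(path, "wb") as f:
            f.write(data)

    def read_block(self, disk_id, block_id) -> bytes:
        path = os.path.join(self.disks[disk_id], f"{block_id}.blk")
        with open(path, "rb") as f:
            return f.read()
```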
Colossus client
: probably the most complex part of the system
- Lots of functionality goes directly into the client (see the sketch below), such as
  - software RAID
  - the encoding chosen by the application
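A rough sketch of client-side "software RAID", using simple XOR parity as a stand-in for whatever erasure code the application chooses; the function name and stripe layout are assumptions for illustration only.

```python
# The Colossus client splits a write into stripes and computes parity itself
# before sending pieces to D servers. XOR parity stands in for the real
# application-chosen encodings (e.g. replication or Reed-Solomon).
from functools import reduce

def encode_stripe(data: bytes, data_chunks: int = 4):
    """Split data into equal-size chunks and append one XOR parity chunk."""
    chunk_len = -(-len(data) // data_chunks)          # ceil division
    padded = data.ljust(chunk_len * data_chunks, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(data_chunks)]
    parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)
    return chunks + [parity]                          # each piece goes to a different D server

# Any single lost data chunk can be rebuilt from the other chunks plus parity.
pieces = encode_stripe(b"hello colossus client side encoding")
rebuilt = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), pieces[1:])
assert rebuilt == pieces[0]
```

The point is that encoding and striping happen in the client, so the D servers themselves can stay simple.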
Curators
: the foundation of Colossus, its scalable metadata service
- Can scale out horizontally (see the sketch below)
- Built on top of a NoSQL database like BigTable
- Allow Colossus to scale up by over 100x over the largest GFS clusters
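A toy sketch of horizontal metadata scaling: route each metadata request to one of many curators keyed off the file path. The hash-based routing and curator count are assumptions for illustration; the real system relies on BigTable's own tablet sharding.

```python
# Each curator owns a shard of the namespace; a client routes a metadata
# request by file path. Growing the pool is what lets metadata scale out.
import hashlib

class CuratorPool:
    def __init__(self, num_curators: int):
        self.num_curators = num_curators

    def curator_for(self, file_path: str) -> int:
        # Hash-based routing (an illustrative stand-in for tablet lookup).
        h = int(hashlib.md5(file_path.encode()).hexdigest(), 16)
        return h % self.num_curators

pool = CuratorPool(num_curators=128)
print(pool.curator_for("/cns/ex/home/user/logs/part-00001"))
```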
D servers
: simple network-attached disks
Custodians
: background storage managers that handle tasks such as disk space balancing and RAID reconstruction
- Ensure the durability and availability of data
- Ensure the system is working efficiently
Data
: there is hot data (e.g. newly written data) and cold data
- Mixing flash and spinning disks gives a really efficient storage organization (see the sketch after this list)
  - Just enough flash to push down the I/O density per gigabyte of data (so the rest can be served from disk)
  - Just enough disks to fill them all up
- Flash is used to serve the really hot data at lower latency
- Regarding disks
  - Equal amounts of hot data across disks, so each disk has roughly the same bandwidth
  - New writes are spread evenly across all the disks so the disk spindles stay busy
  - The rest of each disk is filled with cold data
  - Older cold data is moved to bigger drives so that disks stay full
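A toy placement policy capturing the hot/cold mixing above: new (hot) blocks go to flash while there is room, other writes are spread round-robin across spindles, and a background pass demotes aged blocks off flash. The thresholds and round-robin scheme are illustrative assumptions.

```python
# Hot data lands on flash if possible; everything else is spread evenly
# across spinning disks so all spindles stay busy; cold data is later
# demoted off flash by a background (custodian-like) job.
import itertools, time

class Placement:
    def __init__(self, flash_capacity_blocks, num_disks):
        self.flash_free = flash_capacity_blocks
        self.disk_rr = itertools.cycle(range(num_disks))   # spread writes evenly
        self.location = {}                                  # block_id -> "flash" | disk index
        self.birth = {}

    def write(self, block_id):
        if self.flash_free > 0:                 # just enough flash for the hottest data
            self.flash_free -= 1
            self.location[block_id] = "flash"
        else:                                   # otherwise spread across all spindles
            self.location[block_id] = next(self.disk_rr)
        self.birth[block_id] = time.time()

    def demote_cold(self, age_seconds):
        # Background pass: move aged blocks off flash so flash keeps serving
        # only the hottest data.
        now = time.time()
        for block_id, born in list(self.birth.items()):
            if self.location[block_id] == "flash" and now - born > age_seconds:
                self.location[block_id] = next(self.disk_rr)
                self.flash_free += 1
```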
Colossus chose BigTable to store its metadata because BigTable solves many of the hard problems:
- Automatically shards data across tablets
- Locates tablets via metadata lookups
- Easy-to-use semantics
- Efficient point lookups and scans
- File system metadata is kept in an in-memory locality group
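A small illustration of why this layout fits file-system metadata: rows keyed by full path make point lookups (stat a file) and sorted prefix scans (list a directory) cheap. The in-memory dict below stands in for a tablet held in an in-memory locality group; the row schema is an assumption.

```python
# Rows keyed by full path: point lookup = stat, sorted prefix scan = listdir.
from bisect import bisect_left

rows = {
    "/cns/ex/home/alice/a.txt": {"stripes": ["d17:42", "d3:9"], "len": 4096},
    "/cns/ex/home/alice/b.txt": {"stripes": ["d5:11"], "len": 128},
    "/cns/ex/home/bob/c.txt":   {"stripes": ["d9:2"],  "len": 7},
}
sorted_keys = sorted(rows)

def stat(path):                       # efficient point lookup
    return rows.get(path)

def list_dir(prefix):                 # efficient scan over sorted row keys
    i = bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        yield sorted_keys[i]
        i += 1

print(stat("/cns/ex/home/alice/a.txt"))
print(list(list_dir("/cns/ex/home/alice/")))
```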
The idea of using Colossus to store Colossus's own metadata
- Metadata is ~1/10000 the size of the data
- If we host Colossus on Colossus: 100 PB of data -> 10 TB of metadata, 10 TB of metadata -> 1 GB of meta-metadata, 1 GB -> 100 KB (see the worked numbers below)
- That remainder is small enough to put into Chubby
- An LSM tree minimizes random writes. For GFS/Colossus, communication with the Master/CFS is only triggered when a new data block is created; most of the time the client talks only to the ChunkServer/D server. Meanwhile, compaction/merging also decreases the frequency of creating new data blocks.
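The 1/10000 ratio worked out in code, to show why a couple of levels of "Colossus on Colossus" shrink 100 PB of data to something tiny enough for Chubby (the Chubby size limit used here is an assumption):

```python
# Apply the notes' ~1/10000 metadata ratio recursively until the remainder
# is small enough for Chubby. The ratio and the Chubby limit are rough
# estimates from the notes, not exact figures.
PB = 1000 ** 5
CHUBBY_LIMIT = 10 * 1000 ** 2        # assume "small enough" means ~10 MB

size = 100 * PB
level = 0
while size > CHUBBY_LIMIT:
    size = size / 10_000             # metadata is ~1/10000 the size of its data
    level += 1
    print(f"level {level}: ~{size / 1000**3:.6f} GB of metadata")
# After 3 levels only ~100 KB remains, which fits comfortably in Chubby.
```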
Here is a picture representing this:
(picture from: https://levy.at/blog/22)
Metadata in BigTable
GFS master -> CFS
Colossus cluster example
- Storage Architecture and Challenges
- Google File System and its successor Colossus
- Design experience of Google's Colossus file system
- Google and evolution of big-data
- Evolution of Google FS Talk
- A peek behind the VM at the Google Storage infrastructure ★★★
- Storage Architecture and Challenges by Andrew Fikes
- Colossus: Successor to the Google File System (GFS)
- Google File System II: Dawn of the Multiplying Master Nodes