GFS's problems
- GFS master
  - One machine is not large enough for a large FS
  - A single bottleneck for metadata operations
  - Fault tolerant, but not highly available (HA)
- Unpredictable performance
  - No guarantees of latency
Colossus's goals
- Bigger
- Faster
- More predictable tail latency
In Colossus, the GFS master is replaced by the Colossus metadata service (CFS), and the GFS chunkserver is replaced by D.
D server
: acts as a GFS chunkserver, running on Borg
- It is the lowest level of the storage layer and is designed to be the only application that reads and writes data directly on physical disks; all physical disks are attached to servers running D.
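A minimal sketch of the D server's role, assuming a hypothetical interface (the class and method names are illustrative, not Google's actual API): a thin service that is the only process allowed to touch the physical disks attached to its machine.

```python
# Sketch of a D server: the only process that touches the physical disks on
# its machine. All names (DServer, read_block, write_block) are assumptions
# for illustration, not Google's real interface.
import os

class DServer:
    def __init__(self, disk_paths):
        # Each path stands in for a physical disk attached to this machine.
        self.disks = {i: p for i, p in enumerate(disk_paths)}

    def write_block(self, disk_id, block_id, data: bytes):
        # Direct, low-level write; higher-level concerns (RAID, encoding,
        # placement) live in the Colossus client, not here.
        path = os.path.join(self.disks[disk_id], f"{block_id}.blk")
        with open(path, "wb") as f:
            f.write(data)

    def read_block(self, disk_id, block_id) -> bytes:
        path = os.path.join(self.disks[disk_id], f"{block_id}.blk")
        with open(path, "rb") as f:
            return f.read()
```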
Colossus client
: probably the most complex part of the system
- Lots of functionality goes directly into the client (see the sketch below), such as
  - software RAID
  - the encoding chosen by the application
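A rough sketch of client-side "software RAID", using simple XOR parity as a stand-in for whatever erasure code the application chooses; the function name and stripe layout are assumptions for illustration only.

```python
# The Colossus client splits a write into stripes and computes parity itself
# before sending pieces to D servers. XOR parity stands in for the real
# application-chosen encodings (e.g. replication or Reed-Solomon).
from functools import reduce

def encode_stripe(data: bytes, data_chunks: int = 4):
    """Split data into equal-size chunks and append one XOR parity chunk."""
    chunk_len = -(-len(data) // data_chunks)          # ceil division
    padded = data.ljust(chunk_len * data_chunks, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(data_chunks)]
    parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)
    return chunks + [parity]                          # each piece goes to a different D server

# Any single lost data chunk can be rebuilt from the other chunks plus parity.
pieces = encode_stripe(b"hello colossus client side encoding")
rebuilt = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), pieces[1:])
assert rebuilt == pieces[0]
```

The point is that encoding and striping happen in the client, so the D servers themselves can stay simple.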
Curators
: the foundation of Colossus, its scalable metadata service
- Can scale out horizontally (see the sketch below)
- Built on top of a NoSQL database like BigTable
- Allow Colossus to scale up by over 100x over the largest GFS clusters
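A toy sketch of horizontal metadata scaling: route each metadata request to one of many curators keyed off the file path. The hash-based routing and curator count are assumptions for illustration; the real system relies on BigTable's own tablet sharding.

```python
# Each curator owns a shard of the namespace; a client routes a metadata
# request by file path. Growing the pool is what lets metadata scale out.
import hashlib

class CuratorPool:
    def __init__(self, num_curators: int):
        self.num_curators = num_curators

    def curator_for(self, file_path: str) -> int:
        # Hash-based routing (an illustrative stand-in for tablet lookup).
        h = int(hashlib.md5(file_path.encode()).hexdigest(), 16)
        return h % self.num_curators

pool = CuratorPool(num_curators=128)
print(pool.curator_for("/cns/ex/home/user/logs/part-00001"))
```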
D servers
: simple network-attached disks
Custodians
: background storage managers that handle tasks such as disk space balancing and RAID reconstruction
- Ensure the durability and availability of data
- Ensure the system is working efficiently
Data
: there is hot data (e.g. newly written data) and cold data
- Mixing flash and spinning disks gives a really efficient storage organization (see the sketch after this list)
  - Just enough flash to push down the I/O density per gigabyte of data (so the rest can be served from disk)
  - Just enough disks to fill them all up
- Flash is used to serve the really hot data at lower latency
- Regarding disks
  - Equal amounts of hot data across disks, so each disk has roughly the same bandwidth
  - New writes are spread evenly across all the disks so the disk spindles stay busy
  - The rest of each disk is filled with cold data
  - Older cold data is moved to bigger drives so that disks stay full
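A toy placement policy capturing the hot/cold mixing above: new (hot) blocks go to flash while there is room, other writes are spread round-robin across spindles, and a background pass demotes aged blocks off flash. The thresholds and round-robin scheme are illustrative assumptions.

```python
# Hot data lands on flash if possible; everything else is spread evenly
# across spinning disks so all spindles stay busy; cold data is later
# demoted off flash by a background (custodian-like) job.
import itertools, time

class Placement:
    def __init__(self, flash_capacity_blocks, num_disks):
        self.flash_free = flash_capacity_blocks
        self.disk_rr = itertools.cycle(range(num_disks))   # spread writes evenly
        self.location = {}                                  # block_id -> "flash" | disk index
        self.birth = {}

    def write(self, block_id):
        if self.flash_free > 0:                 # just enough flash for the hottest data
            self.flash_free -= 1
            self.location[block_id] = "flash"
        else:                                   # otherwise spread across all spindles
            self.location[block_id] = next(self.disk_rr)
        self.birth[block_id] = time.time()

    def demote_cold(self, age_seconds):
        # Background pass: move aged blocks off flash so flash keeps serving
        # only the hottest data.
        now = time.time()
        for block_id, born in list(self.birth.items()):
            if self.location[block_id] == "flash" and now - born > age_seconds:
                self.location[block_id] = next(self.disk_rr)
                self.flash_free += 1
```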
Colossus chose BigTable to store its metadata because BigTable solves many of the hard problems:
- Automatically shards data across tablets
- Locates tablets via metadata lookups
- Easy-to-use semantics
- Efficient point lookups and scans
- File system metadata is kept in an in-memory locality group
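A small illustration of why this layout fits file-system metadata: rows keyed by full path make point lookups (stat a file) and sorted prefix scans (list a directory) cheap. The in-memory dict below stands in for a tablet held in an in-memory locality group; the row schema is an assumption.

```python
# Rows keyed by full path: point lookup = stat, sorted prefix scan = listdir.
from bisect import bisect_left

rows = {
    "/cns/ex/home/alice/a.txt": {"stripes": ["d17:42", "d3:9"], "len": 4096},
    "/cns/ex/home/alice/b.txt": {"stripes": ["d5:11"], "len": 128},
    "/cns/ex/home/bob/c.txt":   {"stripes": ["d9:2"],  "len": 7},
}
sorted_keys = sorted(rows)

def stat(path):                       # efficient point lookup
    return rows.get(path)

def list_dir(prefix):                 # efficient scan over sorted row keys
    i = bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        yield sorted_keys[i]
        i += 1

print(stat("/cns/ex/home/alice/a.txt"))
print(list(list_dir("/cns/ex/home/alice/")))
```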
The idea of using Colossus to store Colossus's own metadata
- Metadata is ~1/10000 the size of the data
- If we host Colossus on Colossus: 100 PB of data -> 10 TB of metadata, 10 TB of metadata -> 1 GB of meta-metadata, 1 GB -> 100 KB (see the worked numbers below)
- That remainder is small enough to put into Chubby
- An LSM tree minimizes random writes. For GFS/Colossus, communication with the Master/CFS is only triggered when a new data block is created; most of the time the client talks only to the ChunkServer/D server. Meanwhile, compaction/merging also decreases the frequency of creating new data blocks.
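The 1/10000 ratio worked out in code, to show why a couple of levels of "Colossus on Colossus" shrink 100 PB of data to something tiny enough for Chubby (the Chubby size limit used here is an assumption):

```python
# Apply the notes' ~1/10000 metadata ratio recursively until the remainder
# is small enough for Chubby. The ratio and the Chubby limit are rough
# estimates from the notes, not exact figures.
PB = 1000 ** 5
CHUBBY_LIMIT = 10 * 1000 ** 2        # assume "small enough" means ~10 MB

size = 100 * PB
level = 0
while size > CHUBBY_LIMIT:
    size = size / 10_000             # metadata is ~1/10000 the size of its data
    level += 1
    print(f"level {level}: ~{size / 1000**3:.6f} GB of metadata")
# After 3 levels only ~100 KB remains, which fits comfortably in Chubby.
```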
Here is a picture representing this:
(picture from: https://levy.at/blog/22)
Metadata in BigTable
GFS master -> CFS
Colossus cluster example
- Storage Architecture and Challenges
- Google File System and its successor Colossus
- Design experience of Google's Colossus file system
- Google and evolution of big-data
- Evolution of Google FS Talk
- A peek behind the VM at the Google Storage infrastructure ★★★
- Storage Architecture and Challenges by Andrew Fikes
- Colossus: Successor to the Google File System (GFS)
- Google File System II: Dawn of the Multiplying Master Nodes