Colossus

Why

GFS's problems

  • GFS master
    • One machine not large enough for large FS
    • Single bottleneck for metadata operations
    • Fault tolerant, not HA
  • Unpredictable performance
    • no latency guarantees

Colossus's goal

  • Bigger
  • Faster
  • More predictable tail latency

The GFS master is replaced by Colossus, and the GFS chunkserver is replaced by D. A D server acts like a GFS chunkserver running on Borg.
It is the lowest level of the storage layer and is designed as the only application that directly reads and writes data on physical disks; all physical disks are attached to servers running D.
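
As a rough illustration of that role, here is a minimal sketch of a D-server-style chunk store, assuming it exposes only simple chunk read/write operations over local storage; the `DServer` class and its methods are hypothetical, not Google's actual interface.

```python
# A minimal, illustrative sketch of a D-server-style chunk store: a thin service
# that is the only process allowed to read/write chunk data on its local disks.
# The DServer class and its methods are hypothetical, not Google's actual API.
import os


class DServer:
    def __init__(self, data_dir: str):
        # Directory standing in for the raw disks a real D server would manage.
        self.data_dir = data_dir
        os.makedirs(data_dir, exist_ok=True)

    def _path(self, chunk_id: str) -> str:
        return os.path.join(self.data_dir, chunk_id)

    def write_chunk(self, chunk_id: str, data: bytes) -> None:
        # Persist one chunk durably before acknowledging the write.
        with open(self._path(chunk_id), "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())

    def read_chunk(self, chunk_id: str, offset: int = 0, length: int = -1) -> bytes:
        # Serve a (possibly partial) read of a previously written chunk.
        with open(self._path(chunk_id), "rb") as f:
            f.seek(offset)
            return f.read(length)
```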

Arch

[figure: colossus_arch]


  • Colossus client: probably the most complex part of the system
    • much of the functionality lives directly in the client, such as
      • software RAID (a minimal striping sketch follows this list)
      • application-chosen encodings
  • Curators: the foundation of Colossus, its scalable metadata service
    • can scale out horizontally
    • built on top of a NoSQL database like Bigtable
    • allows Colossus to scale up by over 100x over the largest GFS clusters
  • D servers: simple network-attached disks
  • Custodians: background storage managers that handle tasks such as disk space balancing and RAID reconstruction
    • ensure durability and availability
    • ensure the system is working efficiently
  • Data: there is hot data (e.g. newly written data) and cold data
  • Mixing flash and spinning disks
    • really efficient storage organization
      • just enough flash to handle the required I/O density per gigabyte of data
      • just enough disks to fill them all up with bytes
    • use flash to serve really hot data and lower latency
    • as for the disks
      • place equal amounts of hot data on each disk
        • each disk sees roughly the same bandwidth
        • spread new writes evenly across all the disks so all disk spindles stay busy
    • the rest of the disks are filled with cold data
      • move older, cold data to bigger drives so the disks stay full
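
To make the client-side software RAID idea concrete, here is a minimal sketch, assuming the client stripes each write across several D servers and stores an XOR parity stripe (RAID-5 style) so any single lost stripe can be rebuilt. The servers are modeled as plain dicts, and all names are illustrative rather than Colossus's actual client API.

```python
# Client-side "software RAID" sketch: stripe each write across N data servers
# plus one XOR parity stripe, so one missing stripe can be reconstructed.
from typing import Dict, List


def xor_stripes(stripes: List[bytes]) -> bytes:
    parity = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, byte in enumerate(stripe):
            parity[i] ^= byte
    return bytes(parity)


def client_write(data: bytes, servers: List[Dict[str, bytes]], key: str) -> None:
    # Split the data into (len(servers) - 1) equal stripes; the last server
    # holds the XOR parity of the data stripes.
    n_data = len(servers) - 1
    stripe_len = -(-len(data) // n_data)  # ceiling division
    stripes = [
        data[i * stripe_len:(i + 1) * stripe_len].ljust(stripe_len, b"\0")
        for i in range(n_data)
    ]
    for server, stripe in zip(servers[:-1], stripes):
        server[key] = stripe
    servers[-1][key] = xor_stripes(stripes)


def client_read(servers: List[Dict[str, bytes]], key: str, lost: int = -1) -> bytes:
    # Read the data stripes; if one is lost, rebuild it from the rest + parity.
    stripes = [server.get(key) for server in servers[:-1]]
    if lost >= 0:
        survivors = [s for i, s in enumerate(stripes) if i != lost]
        stripes[lost] = xor_stripes(survivors + [servers[-1][key]])
    return b"".join(stripes).rstrip(b"\0")


servers = [{} for _ in range(4)]  # three data "D servers" + one parity
client_write(b"colossus client striping demo", servers, "chunk-0")
assert client_read(servers, "chunk-0", lost=1) == b"colossus client striping demo"
```

Production systems use Reed-Solomon codes rather than single-parity XOR, but the point is the same: the encoding lives in the client, not in a central server.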

How

[figure: colossus_arch]


Colossus chose Bigtable to store its metadata because Bigtable solves many of the hard problems:

  • Automatically shards data across tablets
  • Locates tablets via metadata lookups
  • Easy to use semantics
  • Efficient point lookups and scans
  • File system metadata kept in an in-memory locality group
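
Here is a minimal sketch of why a sorted, Bigtable-like table suits file system metadata, assuming rows are keyed by file path so that a point lookup resolves one file and a prefix scan lists a directory; the schema, paths, and class below are illustrative only.

```python
# Toy sorted key/value table: point lookups by full path, range scans by prefix.
from bisect import bisect_left, insort
from typing import Dict, Iterator, List, Optional, Tuple


class MetadataTable:
    def __init__(self) -> None:
        self._keys: List[str] = []        # sorted row keys (file paths)
        self._rows: Dict[str, dict] = {}  # row key -> columns (e.g. stripe locations)

    def put(self, path: str, row: dict) -> None:
        if path not in self._rows:
            insort(self._keys, path)      # keep keys sorted, like a tablet
        self._rows[path] = row

    def lookup(self, path: str) -> Optional[dict]:
        # Efficient point lookup for a single file's metadata.
        return self._rows.get(path)

    def scan(self, prefix: str) -> Iterator[Tuple[str, dict]]:
        # Efficient range scan over all keys sharing a prefix (a "directory").
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


table = MetadataTable()
table.put("/colossus/cell/logs/a.log", {"stripes": ["d1:chunk7", "d2:chunk9"]})
table.put("/colossus/cell/logs/b.log", {"stripes": ["d3:chunk2"]})
print(table.lookup("/colossus/cell/logs/a.log"))
print(list(table.scan("/colossus/cell/logs/")))
```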

The idea of using Colossus to store Colossus's metadata

  • metadata is ~ 1/10000 the size of the data
  • If we host a Colossus on Colossus: 100 PB data -> 10 TB metadata, 10 TB metadata -> 1 GB metadata, 1 GB metadata -> 100 KB (see the arithmetic sketch after this list)
  • The final level is small enough to put into Chubby
  • Bigtable's LSM tree minimizes random writes. For GFS/Colossus, a client talks to the master/CFS only when creating a new data block; most of the time it communicates only with the chunkserver/D server. Meanwhile, compaction/merging also decreases how often new data blocks are created.
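
A quick check of that arithmetic, as a sketch assuming the rough 1/10000 metadata-to-data ratio from the notes:

```python
# Each metadata level is ~1/10000 the size of the level below it.
data_bytes = 100 * 10**15   # 100 PB of user data
ratio = 1 / 10_000

size = data_bytes
for level in range(1, 4):
    size *= ratio
    print(f"metadata level {level}: {size:,.0f} bytes")

# metadata level 1: 10,000,000,000,000 bytes  (~10 TB)
# metadata level 2: 1,000,000,000 bytes       (~1 GB)
# metadata level 3: 100,000 bytes             (~100 KB)
```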

Here is a picture representing this:
[figure: colossus_arch]
(picture from: https://levy.at/blog/22)

Metadata in Bigtable

[figure: colossus_arch]


GFS master -> CFS

[figure: colossus_arch]


Colossus cluster example

[figure: colossus_arch]


More info