Thomas Leibovici edited this page Aug 2, 2016 · 4 revisions

Quick start: Configuring robinhood v3 on a Lustre filesystem

Installation

This section briefly lists the steps for installing and configuring robinhood on a Lustre filesystem. It does not cover policy configuration, which is described in other tutorials.

For more details about installation, software and hardware requirements, tunings, etc. refer to: Admin guide: Installation.

Lustre

The Lustre changelog feature makes it possible for robinhood to update its database incrementally without rescanning the filesystem. To activate this feature:

  • Enable Lustre MDT changelogs (remember the returned client id):
    lctl --device <fsname>-MDT0000 changelog_register
  • Make sure the changelog mask contains the following records: HSM CREAT UNLNK TRUNC SATTR CTIME MTIME CLOSE RENME RNMTO RMDIR HLINK LYOUT
    lctl get_param mdd.<fsname>-MDT*.changelog_mask
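
Comparing the mask against the list above by eye is error-prone, so here is a minimal sketch of a check. The helper name `missing_records` is hypothetical (it is not part of Lustre or robinhood), and it assumes the mask is passed as a whitespace-separated list of record names.

```shell
#!/bin/sh
# Record types required by robinhood, as listed in this section.
REQUIRED_RECORDS="HSM CREAT UNLNK TRUNC SATTR CTIME MTIME CLOSE RENME RNMTO RMDIR HLINK LYOUT"

# Hypothetical helper: print the required records that are absent from a mask.
# $1: the current changelog mask (whitespace-separated record names).
missing_records() {
    # Normalize all whitespace to single spaces and pad both ends,
    # so that each record can be matched as " NAME ".
    mask=" $(printf '%s' "$1" | tr -s '[:space:]' ' ') "
    missing=""
    for rec in $REQUIRED_RECORDS; do
        case "$mask" in
            *" $rec "*) ;;                    # record is present
            *) missing="$missing $rec" ;;     # record is absent
        esac
    done
    printf '%s\n' "${missing# }"
}

# Example usage ("myfs" is a placeholder filesystem name):
#   missing_records "$(lctl get_param -n mdd.myfs-MDT0000.changelog_mask)"
```

An empty result means that every required record is already in the mask. If some are missing, the mask can be changed with `lctl set_param` on the same `changelog_mask` parameter.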

Robinhood

Requirements

  • Robinhood must run on a Lustre client. For maximum compatibility, it is recommended to run the same major version of Lustre as the servers.
  • Robinhood uses a MySQL or MariaDB database as its storage backend. It is recommended to install the DB server on the same host as robinhood to minimize latency for DB operations.

Install and start MySQL on RHEL 6:

 yum install mysql-server
 service mysqld start

Install and start MariaDB on RHEL 7:

 yum install mariadb-server
 systemctl start mariadb.service

/!\ The default database configuration is not suitable for production and will result in very poor performance. See Admin guide: database tunings for the recommended database configuration.

Installation

  • Download the 'robinhood-lustre' and 'robinhood-adm' packages from http://sourceforge.net/projects/robinhood/files/robinhood and install them on the robinhood host.
    • Make sure to get the 'robinhood-lustre' package built for the version of Lustre you run, for example robinhood-lustre-3.0-1.lustre2.5.el6.x86_64.rpm for Lustre 2.5.

Configuration

  • Create the robinhood database using the rbh-config helper (provided by the 'robinhood-adm' package):
 rbh-config create_db <db_name> 'localhost' 'rbh_password'
    • A common name for the robinhood database is 'rbh_<fsname>'.
    • Write the selected password to a file readable only by 'root' (mode 600), for example /etc/robinhood.d/.dbpassword.
  • Create a robinhood configuration file, starting with a simple robinhood template:
 cp /etc/robinhood.d/templates/basic.conf /etc/robinhood.d/<fsname>.conf
  • Edit the configuration file:
    • In the 'General' block, set the Lustre filesystem root path and the 'lustre' filesystem type:
 fs_path = "/fs/root";
 fs_type = lustre;
    • In the 'ListManager' block, set the database connection parameters:
 # database name passed to 'rbh-config create_db'
 db = <db_name>;
 password_file = "/etc/robinhood.d/.dbpassword" ;
    • In the 'ChangeLog' block, check that the specified 'reader_id' matches the id returned by 'lctl changelog_register':
 reader_id = "cl1";
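
Before the first run, it can be worth checking the two settings that are easiest to get wrong: the permissions of the password file and the changelog reader id. The sketch below is one possible approach, not a robinhood tool. The helpers `conf_value` and `file_mode` are hypothetical, and the parsing assumes simple `key = value;` lines like the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first "key = value;" line of a
# configuration file, without surrounding quotes. Commented-out lines are
# ignored because the key must be the first word on the line.
# $1: parameter name, $2: configuration file
conf_value() {
    key=$1
    file=$2
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\"\{0,1\}\([^\";]*\)\"\{0,1\}[[:space:]]*;.*/\1/p" "$file" \
        | sed 's/[[:space:]]*$//' \
        | head -n 1
}

# Print the octal permission bits of a file (GNU stat, as found on RHEL).
file_mode() {
    stat -c '%a' "$1"
}

# Example usage ("myfs" is a placeholder filesystem name):
#   conf=/etc/robinhood.d/myfs.conf
#   conf_value reader_id "$conf"      # compare with the id from changelog_register
#   file_mode "$(conf_value password_file "$conf")"    # 600 is the mode recommended above
```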

It is recommended to define your fileclasses before running the initial filesystem scan:

  • This way, you will get relevant information in the 'rbh-report --class-info' report once the initial scan is completed.
  • This will make some optimizations possible for running policies (e.g. skip processing of 'ignored' classes).
Examples:
 fileclass empty_file {
    definition { type == file and size == 0 }
 }
 fileclass small_file {
    definition { type == file
             and size > 0
             and size <= 32MB }
 }

Feeding robinhood

To populate robinhood DB, follow these steps:

  • 1) Enable changelogs (this should have been done in installation steps above).
  • 2) Run the initial scan
  • 3) Run robinhood daemon to continuously read changelogs

Initial scan

  • If you want to run the initial scan in a terminal and see the log messages in this terminal, run:
 robinhood --scan --once -L stderr
  • If you prefer to run it in the background (and write messages to the robinhood log):
 robinhood --scan --once -d

Running changelog reader

  • To test the changelog reader, read the pending changelog records and then exit:
 robinhood --readlog --once -L stderr
  • To start a robinhood daemon that reads changelogs continuously:
    • Edit /etc/sysconfig/robinhood to indicate that the robinhood daemon should only read changelogs and not run policies yet:
 RBH_OPT="--readlog"
    • Start robinhood service:
 # on RHEL 6:
 service robinhood start
 # on RHEL 7:
 systemctl start robinhood.service

Managing multiple filesystems

On RHEL 7, if you want to manage several filesystems on the same robinhood host, use the 'robinhood@' service instead.

  • The per-filesystem service is managed with systemctl [start|stop|status|restart|...] robinhood@<fsname>
  • The per-filesystem service configuration is /etc/sysconfig/robinhood.<fsname>
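
With several filesystems, the unit name and its sysconfig file can be derived from the filesystem name. The following is a small sketch of that mapping. The helper `rbh_instances` and the filesystem names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for each filesystem name given as an argument, print
# the per-filesystem systemd unit and sysconfig file on one line.
rbh_instances() {
    for fs in "$@"; do
        printf '%s %s\n' "robinhood@$fs" "/etc/sysconfig/robinhood.$fs"
    done
}

# Example usage (filesystem names are placeholders):
#   rbh_instances lustre1 lustre2 | while read -r unit conf; do
#       [ -r "$conf" ] || echo "missing $conf" >&2
#       systemctl start "$unit"
#   done
```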

Monitoring scan/changelog progress

You can monitor scan progress, or changelog reader activity by looking at robinhood statistics (dumped every 15min by default):

 grep STATS /var/log/robinhood.log
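
On a long-running instance the log grows quickly, so it is usually enough to look at the most recent statistics only. A minimal sketch follows. The helper name `last_stats` is hypothetical, and it assumes nothing about the log format beyond the "STATS" tag used in the grep above.

```shell
#!/bin/sh
# Hypothetical helper: print the last N log lines that contain "STATS".
# $1: log file, $2: number of lines to keep (default: 20)
last_stats() {
    grep STATS "$1" | tail -n "${2:-20}"
}

# Example usage:
#   last_stats /var/log/robinhood.log 50
# To follow new statistics as they are written (GNU grep):
#   tail -f /var/log/robinhood.log | grep --line-buffered STATS
```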

Filesystem reports

Once you have run the initial scan and started a changelog reader, the robinhood database reflects the filesystem state and is updated in near real time. Robinhood comes with several reporting and querying commands:

  • rbh-report provides overall reports about filesystem contents (users and groups usage, file size profile, fileclasses...)
  • rbh-find implements the classic 'find' command, except that it queries the robinhood database instead of the filesystem, which makes it faster. It also provides specific options to query entries by policy status and other Lustre-specific attributes.
  • rbh-du is an enhanced version of the classic 'du' command. It queries the robinhood database instead of the filesystem, which makes it faster. It can also report details about entry types, counts, etc.
rbh-report examples:
 # filesystem entries:
 # rbh-report --fs-info
 type    ,    count,   volume, avg_size
      dir,  1780074,  8.02 GB,  4.72 KB
     file, 21366275, 91.15 TB,  4.47 MB
  symlink,   496142, 24.92 MB,       53
 # user info, split by group
 # rbh-report -u bar -S
 user , group,  type,  count,  spc_used,   avg_size
 bar  , proj1,  file,      4,  40.00 MB,   10.00 MB
 bar  , proj2,  file,   3296, 947.80 MB,  273.30 KB
 bar  , proj3,  file, 259781, 781.21 GB,    3.08 MB
 # file size profile for a given user
 # rbh-report -u foo --szprof
 user, type,  count,    volume,  avg_size,   0,  1~31,  32~1K-, 1K~31K, 32K~1M-, 1M~31M, 32M~1G-, 1G~31G, 32G~1T-, +1T
 foo ,  dir,     48,   1.48 MB,  31.67 KB,   0,     0,       0,     26,      22,      0,       0,      0,       0,   0
 foo , file,  11055, 308.16 GB,  28.54 MB,   2,     0,      14,     23,    5276,   5712,       9,     17,       2,   0
 # top disk space consumers
 # rbh-report --top-users
 rank, user    , spc_used,  count, avg_size
   1, usr0021 , 11.14 TB, 116396, 100.34 MB
   2, usr3562 ,  5.54 TB,    575,   9.86 GB
   3, usr2189 ,  5.52 TB,   9888, 585.50 MB
   4, usr2672 ,  3.21 TB, 238016,  14.49 MB
   5, usr7267 ,  2.09 TB,   8230, 266.17 MB
 ...

Other available reports include:

  • --top-size Report largest files in the filesystem.
  • --entry-info Report all information about a given entry.
  • Run rbh-report --help to get the full list of available reports.
rbh-find example:
 # rbh-find /mnt/lustre/dir -u root -size +32M -mtime +1h -ost 2 -ls

rbh-du examples:

 # rbh-du -H -u foo /mnt/lustre/dir.3
 45.0G /mnt/lustre/dir.3
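
The comma-separated rbh-report output shown earlier is easy to post-process in scripts. As a sketch, the hypothetical helper below reads output in the format of the `--fs-info` example on its standard input and prints one "type count" pair per line. It assumes the column layout of that example and skips any line whose second column is not a plain number, such as the header.

```shell
#!/bin/sh
# Hypothetical helper: read "rbh-report --fs-info" style output on stdin and
# print "<type> <count>" for each data row.
fs_info_counts() {
    awk -F',' 'NF >= 2 {
        gsub(/[ \t]+/, "", $1)                # strip padding around the type
        gsub(/[ \t]+/, "", $2)                # strip padding around the count
        if ($2 ~ /^[0-9]+$/) print $1, $2     # keep data rows, skip the header
    }'
}

# Example usage:
#   rbh-report --fs-info | fs_info_counts
```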