NAME

rh - (rawhide) find files using pretty C expressions

SYNOPSIS

usage: rh [options] [path...]
options:
  -h --help    - Show this help message, then exit
  -V --version - Show the version message, then exit
  -N           - Don't read system-wide config (/etc/rawhide.conf)
  -n           - Don't read user-specific config (~/.rhrc)
  -f-          - Read functions and/or expression from stdin
  -f fname     - Read functions and/or expression from a file
 [-e] 'expr'   - Read functions and/or expression from the cmdline

traversal options:
  -r           - Only search one level down (same as -m1 -M1)
  -m #         - Override the default minimum depth (0)
  -M #         - Override the default maximum depth (system limit)
  -D           - Depth-first searching (contents before directory)
  -1           - Single filesystem (don't cross filesystem boundaries)
  -y           - Follow symlinks on the cmdline and in reference files
  -Y           - Follow symlinks encountered while searching as well

alternative action options:
  -x 'cmd %s'  - Execute a shell command for each match (racy)
  -X 'cmd %S'  - Like -x but run from each match's directory (safer)
  -U -U -U     - Unlink matches (but tell me three times), implies -D

output action options:
  -l           - Output matching entries like ls -l (but unsorted)
  -d           - Include device column, implies -l
  -i           - Include inode column, implies -l
  -B           - Include block size column, implies -l
  -s           - Include blocks column, implies -l
  -S           - Include space column, implies -l
  -g           - Exclude user/owner column, implies -l
  -o           - Exclude group column, implies -l
  -a           - Include atime rather than mtime column, implies -l
  -u           - Same as -a (like ls(1))
  -c           - Include ctime rather than mtime column, implies -l
  -v           - Verbose: All columns, implies -ldiBsSac (unless -xXU0L)
  -0           - Output null chars instead of newlines (for xargs -0)
  -L format    - Output matching entries in a user-supplied format
  -j           - Output matching entries as JSON (same as -L "%j\n")

path format options:
  -Q           - Enclose paths in double quotes
  -E           - Output C-style escapes for control characters
  -b           - Same as -E (like ls(1))
  -q           - Output ? for control characters (default if tty)
  -p           - Append / indicator to directories
  -t           - Append most type indicators (one of / @ = | >)
  -F           - Append all type indicators (one of * / @ = | >)

                   * executable
                   / directory
                   @ symlink
                   = socket
                   | fifo
                   > door (Solaris only)

other column format options:
  -H or -HH    - Output sizes like 1.2K 34M 5.6G etc., implies -l
  -I or -II    - Like -H but with units of 1000, not 1024, implies -l
  -T           - Output mtime/atime/ctime in ISO format, implies -l
  -#           - Output numeric user/group IDs (not names), implies -l

debug option:
  -? spec      - Output debug messages: spec can include any of:
                   cmdline, parser, traversal, exec, all, extra

rh (rawhide) finds files using pretty C expressions.
See the rh(1) and rawhide.conf(5) manual entries for more information.

C operators:
  ?:  ||  &&  |  ^  &  == !=  < > <= >=  << >>  + -  * / %  - ~ !

Rawhide tokens:
  "pattern"  "pattern".modifier  "/path".field  "cmd".sh  {cmd}.sh
  123 0777 0xffff  1K 2M 3G  1k 2m 3g  $user @group  $$ @@
  [yyyy/mm/dd] [yyyy/mm/dd hh:mm:ss]

Glob pattern notation:
  ? * [abc] [!abc] [a-c] [!a-c]
  ?(a|b|c) *(a|b|c) +(a|b|c) @(a|b|c) !(a|b|c)
  Ksh extended glob patterns are available here (see fnmatch(3))

Pattern modifiers:
                .i            .re           .rei
  .path         .ipath        .repath       .reipath
  .link         .ilink        .relink       .reilink
  .body         .ibody        .rebody       .reibody
  .what         .iwhat        .rewhat       .reiwhat
  .mime         .imime        .remime       .reimime
  .acl          .iacl         .reacl        .reiacl
  .ea           .iea          .reea         .reiea
  .sh
  Case-insensitive glob matching is available here (i)
  Perl-compatible regular expressions are available here (re)
  Access control lists are available here (acl)
  Extended attributes are available here (ea)

Built-in symbols:
  dev           major         minor         ino           mode
  nlink         uid           gid           rdev          rmajor
  rminor        size          blksize       blocks        atime
  mtime         ctime         attr          proj          gen
  nouser        nogroup       readable      writable      executable
  strlen        depth         prune         trim          exit
  now           today         second        minute        hour
  day           week          month         year          IFREG
  IFDIR         IFLNK         IFCHR         IFBLK         IFSOCK
  IFIFO         IFDOOR        IFMT          ISUID         ISGID
  ISVTX         IRWXU         IRUSR         IWUSR         IXUSR
  IRWXG         IRGRP         IWGRP         IXGRP         IRWXO
  IROTH         IWOTH         IXOTH         texists       tdev
  tmajor        tminor        tino          tmode         tnlink
  tuid          tgid          trdev         trmajor       trminor
  tsize         tblksize      tblocks       tatime        tmtime
  tctime        tstrlen

Reference file fields:
  .exists       .dev          .major        .minor        .ino
  .mode         .type         .perm         .nlink        .uid
  .gid          .rdev         .rmajor       .rminor       .size
  .blksize      .blocks       .atime        .mtime        .ctime
  .attr         .proj         .gen          .strlen       .inode
  .nlinks       .user         .group        .sz           .accessed
  .modified     .changed      .attribute    .project      .generation
  .len

System-wide and user-specific functions can be defined here:
  /etc/rawhide.conf          ~/.rhrc
  /etc/rawhide.conf.d/*      ~/.rhrc.d/*

INTRODUCTION

Rawhide (rh) lets you search for files on the command line using expressions and user-defined functions in a mini-language inspired by C. It's like find(1), but more fun to use. Search criteria can be very readable and self-explanatory and/or very concise and typeable, and you can create your own lexicon of search terms. The output can include lots of detail, like ls(1).

DESCRIPTION

Rawhide (rh) searches the filesystem, starting at each given path, for files that make the given search criteria expression true. If no search paths are given, the current working directory is searched.

The search criteria expression can come from the command line (with the -e option), from a file (with the -f option), or from standard input (stdin) (with -f-). If there is no explicit -e option expression, rh looks for an implicit expression among any remaining command line arguments. If no expression is specified, the default search criteria is the expression 1, which matches all filesystem entries.

An rh expression is a C-like expression that can call user-defined functions. These expressions can contain all of C's conditional, logical, relational, equality, arithmetic, and bit operators.

Numeric constants can be decimal, octal, or hexadecimal integers. Decimal constants can have scale units (e.g., 10K).

There are built-in symbols that represent each candidate file's inode metadata. These are the fields in the corresponding stat(2) structure (e.g., st_mode, st_uid, st_size, st_mtime, ...). See stat(2) for details. For convenience, the "st_" prefix is omitted from the symbol names (e.g., st_mtime is used as mtime).

Other built-in symbols represent the constants defined by C's <sys/stat.h> header file. These are useful for interpreting the mode in order to identify file types and permissions. The "S_" prefix is omitted from the symbol names (e.g., S_IFMT is used as IFMT).

Other built-in symbols represent various useful values and constants, control flow, more file information, and candidate symlink target inode metadata.

File glob patterns and Perl-compatible regular expressions (regexes) can be used to match files by their name, path, symlink target path, body, file type description, MIME type, access control list, and extended attributes.

Search criteria can also include comparisons with the inode metadata of arbitrary reference files, and the exit success status of arbitrary shell commands.

Functions are a means of referring to an expression by name. They allow complex expressions to be composed of simpler ones. They also allow you to create your own lexicon of search terms for finding files.

There is a default standard library of functions to start with. It provides a high-level interface to the built-in symbols mentioned above, and makes rh easy to use. See rawhide.conf(5) for details.

OPTIONS

-h, --help

Display the help message, then exit. The --help option must not be used with any other command line options or arguments.

The help message summarizes the command line usage, and presents concise lists of the search criteria language operators, special tokens, glob pattern notation, pattern modifiers, built-in symbols, reference file fields, and the locations of configuration files.

Some features are not available on all systems: Ksh extended glob patterns, case-insensitive glob matching, Perl-compatible regular expressions (regexes), access control lists, and extended attributes. The help message states which optional features are available on the local system.

See the SYNOPSIS section above for details.

-V, --version

Display the version message, then exit. The --version option must not be used with any other command line options or arguments.

-N

By default, rh first reads system-wide configuration from /etc/rawhide.conf (or similar), and then (in lexicographic order) from any files in the /etc/rawhide.conf.d directory (or similar) whose names do not start with dot ("."). This option suppresses that behaviour.

-n

By default, rh then reads user-specific configuration from ~/.rhrc, and then (in lexicographic order) from any files in the ~/.rhrc.d directory whose names do not start with dot ("."). This option suppresses that behaviour.

-f-, -f fname

After reading any configuration files, this option causes rh to read code from the file specified by fname. If there is also a directory whose name is fname followed by ".d", then rh reads (in lexicographic order) any files there whose names do not start with dot ("."). If fname specifies a directory, then rh reads any files there in the same manner. If fname is "-", then code is read from standard input (stdin). The -f option can be supplied more than once, but it is an error to use "-" (for stdin) more than once.
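The reading order described above (a file, then the non-dot entries of a matching ".d" directory in lexicographic order) can be sketched in shell. This is a minimal sketch, not rh's code. The helpers list_conf_d and f_option_sources are hypothetical names, and byte-wise (C locale) ordering is assumed for "lexicographic". The same ordering is described for /etc/rawhide.conf.d and ~/.rhrc.d.

```sh
# Hypothetical helper: list a ".d" directory's entries in the order described
# (lexicographic, assumed to mean byte-wise), skipping names starting with ".".
list_conf_d() {
  for entry in "$1"/*; do          # "*" never matches names starting with "."
    [ -e "$entry" ] || continue    # the pattern stays unexpanded if nothing matches
    printf '%s\n' "$entry"
  done | LC_ALL=C sort             # byte-wise ordering (an assumption)
}

# Hypothetical sketch of what "-f fname" is described as reading, in order.
f_option_sources() {
  if [ -d "$1" ]; then
    list_conf_d "$1"               # fname is itself a directory
  else
    printf '%s\n' "$1"             # fname first...
    if [ -d "$1.d" ]; then
      list_conf_d "$1.d"           # ...then any fname.d directory
    fi
  fi
}
```

For example, f_option_sources ~/rules would list ~/rules followed by the non-dot entries of ~/rules.d, if that directory exists.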

Each file can contain zero or more function definitions, and/or a trailing file test expression. If a file does contain a trailing file test expression, it is used to match files, unless another file test expression is supplied via a subsequent -f option file, or via the -e option, or in any remaining command line arguments.

-e 'expr'

Read code from the expr argument itself. It is an error to supply the -e option more than once.

The expr argument can contain zero or more function definitions, and/or a trailing file test expression. The -e option is processed after any -f options, and so can make use of any functions defined via the -f option. If the expr argument contains a file test expression (which is expected), it overrides any default file test expression from a configuration file or -f option file.

Normally, the -e option argument supplies the file test expression that will be used for the file search. Since many of the operators are also shell meta-characters, and since rh expressions can contain spaces, it is strongly recommended that expr generally be enclosed in single quotes ("'").

If no explicit file test expression is supplied via the -e option, then any remaining command line arguments are examined to identify any implicit file test expression.

If a command line argument is a path that exists in the filesystem, it is interpreted as a filesystem entry to search. Otherwise, if it contains any characters that are likely to appear in an expression, but that are unlikely to appear in many filesystem paths (i.e., "?:|&^=!<>*%$\"\\[]{};\n"), it is interpreted as a file test expression. Otherwise, if it looks like a filesystem path (i.e., if it contains a slash character ("/"), and an apparent ancestor directory does exist in the filesystem), it is interpreted as a filesystem entry (that happens not to exist). Otherwise, it is interpreted as a file test expression. Only the first suitable command line argument will be interpreted as a file test expression. Any other command line arguments will all be interpreted as search paths.
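The decision rules above can be approximated by a small shell function. This is a hypothetical sketch (classify_arg is not part of rh). It assumes that a dangling symlink counts as an existing entry, and that "an apparent ancestor directory" means any parent obtained by stripping trailing path components. It classifies one argument at a time and ignores the "only the first suitable argument" rule.

```sh
# Hypothetical sketch of the implicit-expression heuristic (not rh's code).
# Prints "path" or "expr" for a single command line argument.
nl='
'
classify_arg() {
  arg=$1
  # 1. An existing entry is a path (counting a dangling symlink is an assumption).
  if [ -e "$arg" ] || [ -L "$arg" ]; then echo path; return; fi
  # 2. A newline or any of ?:|&^=!<>*%$"\[]{}; makes it an expression.
  case $arg in *"$nl"*) echo expr; return ;; esac
  if printf '%s' "$arg" | grep -q '[]?:|&^=!<>*%$"\[{};]'; then
    echo expr; return
  fi
  # 3. A slash plus an existing ancestor directory makes it a (missing) path.
  case $arg in
    */*)
      dir=$arg
      while :; do
        case $dir in */*) dir=${dir%/*} ;; *) break ;; esac
        [ -n "$dir" ] || dir=/
        if [ -d "$dir" ]; then echo path; return; fi
        [ "$dir" != / ] || break
      done
      ;;
  esac
  # 4. Anything else is an expression.
  echo expr
}
```

With this sketch, classify_arg 'size > 1M' prints expr, while classify_arg sub/missing prints path whenever sub is an existing directory.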

This means that the -e option itself almost never needs to be typed. It also makes it possible to supply search paths before and/or after the file test expression, e.g.:

$ rh -e 'expr' dir1 dir2
$ rh 'expr' dir1 dir2
$ rh dir1 'expr' dir2
$ rh dir1 dir2 'expr'

The -e option only really needs to be explicitly included when the file test expression might happen to be the same as an existing filesystem entry relative to the current working directory (e.g., touch file; rh -e 'file' path), or (less likely) when the expression starts with a minus sign ("-"), and would otherwise be mistaken for a command line option.

You might also need an explicit -e option if you want the file test expression to appear to the left of any command line options on (most) systems where all non-option command line arguments must appear to the right of all command line options and their arguments (e.g., rh -e 'expr' -X 'cmd %S' dir). This doesn't apply to Linux with GNU glibc, which provides more flexible command line option parsing.

If no file test expression is supplied anywhere, the default file test expression is 1, which matches all filesystem entries.

Traversal options

-r

This option causes rh to only report (or act on) the immediate contents of the starting search directories. The starting search directories themselves are excluded, and the contents of any sub-directories of the starting search directories are not searched. This is the same as -m1 -M1 (see next).

This option and the -m option are mutually exclusive.

This option and the -M option are mutually exclusive.

-m #

Override the default minimum search depth to report (or act on). By default, the minimum search depth is zero, which means that the starting search directories are reported (or acted on) if they satisfy the file test expression.

For example, setting the minimum search depth to 1 suppresses reporting (or acting on) the starting search directories if they match, and only reports (or acts on) the matching entries among those directories' entries and their descendants.

Note that this option does not prevent file test evaluation above the minimum search depth. It only prevents reporting (or acting on) matching entries. This matters when the search criteria involves the prune, trim, or exit built-ins (see rawhide.conf(5)), because they have control flow side-effects when they are evaluated. This makes it possible to skip sub-directories, or terminate a search, before anything is reported (or acted on).

This option and the -r option are mutually exclusive.

-M #

Override the default maximum search depth to examine. By default, the maximum search depth is a very large system-imposed limit (e.g., 1019).

For example, setting the maximum search depth to 1 prevents searching below the immediate children of the starting search directories. And setting the maximum search depth to 0 prevents searching below the starting search paths themselves.

This option and the -r option are mutually exclusive.
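The depth window set by -m and -M (and by -r, which is -m1 -M1) can be illustrated with a tiny recursive walker. This is a hypothetical sketch (walk is not part of rh): the starting path has depth 0, entries shallower than the minimum are still descended into but not printed, and nothing deeper than the maximum is examined. Symlinks are not followed and hidden entries are ignored for brevity.

```sh
# Hypothetical sketch: print entries whose depth lies within [min, max],
# where the starting path itself has depth 0.
walk() (
  path=$1 depth=$2 min=$3 max=$4
  if [ "$depth" -ge "$min" ]; then printf '%s\n' "$path"; fi
  # Stop descending once the maximum depth has been reached.
  [ "$depth" -lt "$max" ] || exit 0
  # Only descend into real directories (symlinks are not followed).
  { [ -d "$path" ] && [ ! -L "$path" ]; } || exit 0
  for entry in "$path"/*; do       # hidden entries ignored for brevity
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    walk "$entry" $((depth + 1)) "$min" "$max"
  done
)
```

Here, walk dir 0 1 1 prints the same window that rh -r dir is described as reporting (the immediate contents only), and walk dir 0 0 0 prints just dir itself.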

-D

Perform a depth-first search. This means that directories are examined and reported (or acted on) after their descendants, rather than before them.

This option is incompatible with the prune and trim built-ins (see rawhide.conf(5)). When this option is used, prune and trim will not work. They will not prevent searching in sub-directories.

The -U option implies this option (see below).
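The difference between the default order and -D is pre-order versus post-order traversal. A minimal post-order sketch follows (walk_post is a hypothetical name, not part of rh). It prints each directory's contents before the directory itself, which is also why -U relies on -D: a directory can only be removed once it is empty.

```sh
# Hypothetical sketch of depth-first (post-order) listing: contents first,
# then the directory itself, as described for -D.
walk_post() (
  for entry in "$1"/*; do          # hidden entries ignored for brevity
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    if [ -d "$entry" ] && [ ! -L "$entry" ]; then
      walk_post "$entry"           # recurse before reporting the directory
    else
      printf '%s\n' "$entry"
    fi
  done
  printf '%s\n' "$1"               # the directory comes after its contents
)
```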

-1

Limit the search to each starting search directory's filesystem only. This prevents descending into directories that are mountpoints for other filesystems.

-y

By default, rh does not follow symlinks. This option causes rh to follow any symlinks supplied as command line arguments or reference files. But any candidate symlinks encountered while searching are still not followed.

Note: When a followed symlink is broken/dangling, rather than reporting this as an error, the resulting stat(2) structure fields will be those of the symlink itself. This might or might not be desirable behaviour. This is done for compatibility with the familiar behaviour of find(1). If you would prefer that an attempt to follow a broken symlink be reported as an error, set the environment variable RAWHIDE_REPORT_BROKEN_SYMLINKS=1. The resulting stat(2) structure fields will still be those of the symlink itself, and searching will still continue, but there will be an error message, and the eventual exit status will be non-zero to indicate failure.

This option is compatible with the symlink target-related built-ins (see rawhide.conf(5)), and the -L %Y format conversion (see below), except for any symlinks on the command line. For those, the symlink target-related built-ins and the -L %Y format conversion will only ever get to see symlinks that are broken.

-Y

By default, rh does not follow symlinks. This option causes rh to follow any symlinks supplied as command line arguments or reference files, and any candidate symlinks encountered while searching.

Note: When a followed symlink is broken/dangling, rather than reporting this as an error, the resulting stat(2) structure fields will be those of the symlink itself. This might or might not be desirable behaviour. This is done for compatibility with the familiar behaviour of find(1). If you would prefer that an attempt to follow a broken symlink be reported as an error, set the environment variable RAWHIDE_REPORT_BROKEN_SYMLINKS=1. The resulting stat(2) structure fields will still be those of the symlink itself, and searching will still continue, but there will be an error message, and the eventual exit status will be non-zero to indicate failure.

This option is incompatible with the symlink target-related built-ins (see rawhide.conf(5)), and the -L %Y format conversion (see below). The only symlinks they will ever get to see are broken ones.

Alternative action options

By default, rh outputs each matching filesystem entry's full path starting from the search directory. These options provide alternative actions. They, and the -l, -0, -L, and -j options, are all mutually exclusive.

-x 'cmd'

Execute the shell command specified by cmd via system(3) (i.e., via /bin/sh) for each matching entry. It is an error to supply the -x option more than once.

The cmd argument can contain %s which represents the matching entry's full path starting from the search directory. It can also contain %S which represents the matching entry's base name (or "/" when the matching entry is the root directory (/) which has no base name). For example, given the matching file /etc/passwd, %s and %S would represent "/etc/passwd" and "passwd", respectively. To include a literal per cent sign ("%") in the shell command, use %%. It is an error if % is not followed by s, S, or %.

There is no reason to place any quote characters around %s or %S. To prevent shell command injection, they are replaced with "$1" or "$2", respectively, and the corresponding full path and base name are passed to the /bin/sh command as separate positional arguments. This avoids the problem that there is no single way to quote data in a shell command that works correctly in all syntactic contexts.

This will usually be fine, but if the double quotes added when %s and %S are interpolated as "$1" and "$2" are inappropriate for a particular command, you can use $1 and $2 directly instead, possibly via the shell's Parameter Expansion syntax.
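The substitution and argument passing described above can be sketched in shell. This is a minimal sketch, not rh's implementation: rh_translate and run_for_match are hypothetical helpers, and sh -c stands in for however rh actually invokes /bin/sh. The point is that the path and base name travel as positional arguments and are never spliced into the command text.

```sh
# Hypothetical helper (not part of rh): rewrite a command template as
# described: %s -> "$1", %S -> "$2", %% -> %, anything else is an error.
rh_translate() {
  src=$1 out=
  while [ -n "$src" ]; do
    rest=${src#?}; c=${src%"$rest"}; src=$rest      # take one character
    if [ "$c" = % ]; then
      rest=${src#?}; c=${src%"$rest"}; src=$rest    # the character after %
      case $c in
        s) out="$out\"\$1\"" ;;
        S) out="$out\"\$2\"" ;;
        %) out="$out%" ;;
        *) echo "rh_translate: % must be followed by s, S, or %" >&2; return 1 ;;
      esac
    else
      out=$out$c
    fi
  done
  printf '%s\n' "$out"
}

# Hypothetical runner: usage: run_for_match 'template' path basename
run_for_match() {
  cmd=$(rh_translate "$1") || return 1
  sh -c "$cmd" sh "$2" "$3"        # $0=sh, $1=path, $2=basename
}
```

For example, rh_translate 'ls -l %s' produces ls -l "$1", and a match named 'a; echo INJECTED' passed through run_for_match is treated as data, not as a second command.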

For this option, the %s interpolation is more likely to be useful than the %S interpolation.

If any command exits with a non-zero exit status, rh itself will continue, but it will eventually exit with a non-zero exit status.

This is similar to the -exec action in POSIX find(1), and it suffers from the same large number of path-based race conditions as -exec. This is insecure on hosts with malicious local actors that have write access to the directory tree being searched, and so should not generally be used. It is much safer to use the -X option instead (see next). Note that piping the default output to a program like xargs(1) is also insecure in the same way.

Note: If the user's $PATH environment variable includes the current working directory, or any other non-absolute paths, they are automatically removed first. This is done for consistency with the -X option (see next) and the "cmd".sh "pattern" modifier (see rawhide.conf(5)), where this is needed for security. This means that you can't rely on $PATH to find an executable that is in the current directory. An explicit path would be needed instead (e.g., -x './cmd %s').
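The $PATH cleanup described above amounts to keeping only absolute entries. Here is a minimal shell sketch of that filtering (sanitize_path is a hypothetical name, not rh's code). Empty entries, which also mean the current directory, are dropped along with relative ones such as "." and "bin".

```sh
# Hypothetical sketch: keep only absolute directories from a PATH-style string.
sanitize_path() (
  IFS=:
  set -f                           # no globbing while splitting on ":"
  new=
  for dir in $1; do
    case $dir in
      /*) new=${new:+$new:}$dir ;; # keep absolute entries only
    esac
  done
  printf '%s\n' "$new"
)
```

Usage: PATH=$(sanitize_path "$PATH"). With this sketch, '/usr/bin:.:bin::/bin:' becomes /usr/bin:/bin.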

This option, and the -l, -0, -L, -j, -X, and -U options, are all mutually exclusive.

-X 'cmd'

Execute the shell command specified by cmd via system(3) (i.e., via /bin/sh) for each matching entry. It is an error to supply the -X option more than once.

This is like the -x option (see above), except that the shell command is executed after safely changing the current working directory to the directory containing each matching entry. This minimizes the number of path-based race conditions.

The cmd argument can contain %s which represents the matching entry's full path starting from the search directory. It can also contain %S which represents the matching entry's base name (or "/" when the matching entry is the root directory (/) which has no base name). For example, given the matching file /etc/passwd, %s and %S would represent "/etc/passwd" and "passwd", respectively. To include a literal per cent sign ("%") in the shell command, use %%. It is an error if % is not followed by s, S, or %.

There is no reason to place any quote characters around %s or %S. To prevent shell command injection, they are replaced with "$1" or "$2", respectively, and the corresponding full path and base name are passed to the /bin/sh command as separate positional arguments. This avoids the problem that there is no single way to quote data in a shell command that works correctly in all syntactic contexts.

This will usually be fine, but if the double quotes added when %s and %S are interpolated as "$1" and "$2" are inappropriate for a particular command, you can use $1 and $2 directly instead, possibly via the shell's Parameter Expansion syntax.

For this option, the %S interpolation is more likely to be useful than the %s interpolation.

If any command exits with a non-zero exit status, rh itself will continue, but it will eventually exit with a non-zero exit status.

This is similar to the -execdir action in GNU find(1), and so does not suffer from the same large number of path-based race conditions as the -exec action in POSIX find(1). It is much safer than the -x option, and should generally be used in preference.

If the user's $PATH environment variable includes the current working directory, or any other non-absolute paths, they are automatically removed first, because they could be dangerous.

Note: Since the shell commands are executed from the directory containing each matching entry, if they do require the matching entry's full path starting from the search directory (i.e., %s), then it's best if the starting search paths are all absolute paths, so that %s is always an absolute path. Otherwise, the shell command might need to change its current working directory back to the initial working directory. Also note that %s suffers from many path-based race conditions, which is insecure on hosts with malicious local actors that have write access to the directory tree being searched, and so should not generally be used.

This option, and the -l, -0, -L, -j, -x, and -U options, are all mutually exclusive.

-U -U -U

Unlink/Remove/Delete matching filesystem entries. Due to the destructive nature of this option, and the ease with which a single letter can be mistyped, this option must be supplied three times in order for it to take effect. It is an error to supply the -U option once or twice.

This option implies the use of the -D option (see above) to ensure that each matching directory's matching entries are removed before it is. Directories can only be removed when they are empty.

If rh fails to remove any matching entry, it will continue, but it will eventually exit with a non-zero exit status.

This option is incompatible with the prune and trim built-ins (see rawhide.conf(5)). When this option is used, prune and trim will not work. They will not prevent unlinkage/removal/deletion in sub-directories.

When this option is used with the -y or -Y option (see above), and a symlink to a directory is followed, the symlink's ultimate target directory's contents are searched, and any matches found there are removed, but the target directory itself is never removed. It isn't possible to remove a filesystem entry via a symlink to it. If the target directory itself matches the search criteria, the symlink to it is removed. Similarly, when a symlink to a non-directory is followed, and the symlink's ultimate target matches the search criteria, the symlink is removed, not the ultimate target.

This option, and the -l, -0, -L, -j, -x, and -X options, are all mutually exclusive.

Output action options

-l

By default, rh outputs each matching path on a line by itself. This option includes more details in a format similar to that of ls -l (but unsorted). The details included are the file type, permissions, existence of an access control list and/or extended attributes, number of hard links, user/owner, group, size (or comma-separated rdev major and minor device numbers), modified time, and path. For symlinks, the target path is also included at the end (preceded by " -> ").

Note that, unlike ls -l, for readable directories, the reported size is the number of entries they contain (excluding . and ..). For unreadable directories, it is the usual (undocumented) st_size field of the corresponding stat(2) structure.

If a file has a non-trivial access control list (ACL), this is indicated by a plus sign ("+") at the end of the file type and permissions column (e.g., -rw-rw-r--+). If a file has any extended attributes (EA), this is indicated by an at sign ("@") (e.g., -rw-rw-r--@). Note that this doesn't include the EAs that are used on Linux for ACLs and selinux(8) contexts, because they are not interesting enough (ACLs are already indicated by "+", and selinux(8) contexts are ubiquitous, and they are indicated by "." (see below)). If a file has both a (non-trivial) ACL and any (interesting) EAs, this is indicated by an asterisk character ("*") (e.g., -rw-rw-r--*). If a file has neither, but it does have an selinux(8) context, this is indicated by a dot character (".") (e.g., -rw-rw-r--.). If a file has none of the above, there's just a space character (" ") at the end of the file type and permissions column.
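The precedence of the indicator character at the end of the file type and permissions column can be written as a small decision function. This hypothetical sketch (perm_suffix is not part of rh) takes three 0/1 flags, assumed to be already computed: a non-trivial ACL, any "interesting" EAs (excluding those used for ACLs and selinux(8) contexts), and an selinux(8) context.

```sh
# Hypothetical sketch: usage: perm_suffix ACL EA SELINUX (each 0 or 1)
perm_suffix() {
  if [ "$1" = 1 ] && [ "$2" = 1 ]; then printf '*'   # ACL and EAs
  elif [ "$1" = 1 ]; then printf '+'                 # ACL only
  elif [ "$2" = 1 ]; then printf '@'                 # EAs only
  elif [ "$3" = 1 ]; then printf '.'                 # selinux context only
  else printf ' '                                    # none of the above
  fi
}
```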

This option, and the -0, -L, -j, -x, -X, and -U options, are all mutually exclusive.

-d

Include the device column. Implies the -l option.

This is the comma-separated major and minor device numbers of the device/filesystem that the matching file resides on.

This column is first.

-i

Include the inode number column. Implies the -l option.

This column is after any device column, and before any block size column.

-B

Include the block size column. Implies the -l option.

This column is after any inode number column, and before any blocks column.

Note that this is just the preferred block size for efficient I/O on the matching file's filesystem. On some filesystems (e.g., zfs), this is specific to each file, rather than to the whole filesystem.

Note that this is unrelated to the blocks column (see next).

-s

Include the blocks column. Implies the -l option.

This column is after any block size column, and before any space column.

Note that the number of blocks always refers to standardized 512-byte blocks, even when the filesystem's real block size is something else.

-S

Include the space column. Implies the -l option.

The space occupied by a file is the number of 512-byte blocks multiplied by 512. This is usually larger than the size in bytes, but it can be smaller in the case of files with holes (and on filesystems with transparent compression).

This column is after any blocks column, and before the file type and permissions column.
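The relation between the blocks (-s) and space (-S) columns is simple arithmetic: space = blocks × 512. The hypothetical helpers below (not part of rh) compute it, and test whether a file occupies less space than its size, which, as noted above, suggests holes or transparent compression.

```sh
# Hypothetical helper: space in bytes from a count of 512-byte blocks.
space_bytes() {
  printf '%s\n' "$(( $1 * 512 ))"
}

# Hypothetical helper: usage: uses_less_space_than_size BLOCKS SIZE
# Succeeds when the occupied space is smaller than the size in bytes.
uses_less_space_than_size() {
  [ "$(( $1 * 512 ))" -lt "$2" ]
}
```

For example, a 1-byte file reported as using 8 blocks occupies 4096 bytes of space.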

-g

Exclude the user/owner column. Implies the -l option.

-o

Exclude the group column. Implies the -l option.

-a

Include the accessed time column in place of the modified time column. Implies the -l option.

If the -a/-u and -c options are both supplied, then both columns appear in place of the modified time column (with the accessed time column appearing before the inode changed time column).

-u

Same as the -a option (like ls(1)).

-c

Include the inode changed time column in place of the modified time column. Implies the -l option.

If the -a/-u and -c options are both supplied, then both columns appear in place of the modified time column (with the accessed time column appearing before the inode changed time column).

-v

Turn on verbose mode. With the -l option, this option includes all possible columns (i.e., device, inode number, block size, number of blocks, space, file type, permissions, existence of an access control list and/or extended attributes, number of hard links, user/owner, group, size (or comma-separated rdev major and minor device numbers), modified time, accessed time, inode changed time, and path).

With the -x or -X option, this option outputs each command before it is executed. The %s and %S escape sequences will appear as "$1" and "$2", respectively, and their literal values will be included as a comment. When standard output (stdout) is a terminal, "?" is output in place of any control characters to prevent terminal escape injection.

With the -U option, this option outputs each matching path before it is removed. When standard output (stdout) is a terminal, "?" is output in place of any control characters to prevent terminal escape injection.

With the -0 option, this option has no effect.

With the -L option, this option makes the %z format conversion on FreeBSD and Solaris output the non-compact form of NFSv4 access control lists (ACLs), rather than the default compact form.

Without any of the above options, this option behaves as though the -l option had been supplied, and includes all possible columns.
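The terminal-escape protection mentioned above (and for several -L conversions) replaces control characters with "?" only when stdout is a terminal. A minimal shell sketch follows. to_tty_safe and show_path are hypothetical names, and "control characters" is assumed to mean the C locale [:cntrl:] class.

```sh
# Hypothetical sketch: replace control characters with "?".
to_tty_safe() {
  printf '%s' "$1" | LC_ALL=C tr '[:cntrl:]' '[?*]'
  printf '\n'
}

# Hypothetical sketch: only sanitize when standard output is a terminal.
show_path() {
  if [ -t 1 ]; then to_tty_safe "$1"; else printf '%s\n' "$1"; fi
}
```

With this sketch, a name containing an ESC [ 3 1 m sequence is shown as ?[31m on a terminal instead of changing the text colour.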

-0

Output the null character ("\0") after each matching path, rather than the newline character ("\n"). This is useful in combination with xargs -0 to handle matching entries whose paths contain troublesome characters (like newlines).

But note that, due to a large number of path-based race conditions, piping the output to a program like xargs(1) is insecure on hosts with malicious local actors that have write access to the directory tree being searched, and so should not generally be done. It is much safer to use the -X option instead (see above).

This option, and the -l, -L, -j, -x, -X, and -U options, are all mutually exclusive.
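The following sketch shows why null-terminated output is unambiguous. It uses printf to stand in for -0 output (each path followed by "\0"), with one path containing a newline. The helpers sample_output, count_records, and count_lines are hypothetical. Counting NUL terminators finds both paths, while counting newlines does not.

```sh
# Hypothetical stand-in for -0 output: two paths, the first containing a newline.
sample_output() { printf 'dir/a\nb\0dir/c\0'; }

# Count NUL-terminated records versus newline-terminated lines.
count_records() { tr -cd '\000' | wc -c | tr -d ' '; }
count_lines()   { wc -l | tr -d ' '; }
```

Here, sample_output | count_records reports 2 paths, but sample_output | count_lines reports only 1 line.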

-L format

Output selected information about matching entries according to the user-supplied format, which is similar to C's printf(3) and strftime(3) format strings. It is an error to supply the -L option more than once.

Unlike the -l and -0 options, no newline or null character is appended automatically. An empty -L option argument (i.e., -L '') will produce no output.

This option, and the -l, -0, -j, -x, -X, and -U options, are all mutually exclusive.

The supported backslash escape sequences are:

\a Alert or Bell (BEL)
\b Backspace (BS)
\c Stop processing this format string and flush the output
\f Form feed (FF)
\n Newline or Line feed (LF)
\r Carriage return (CR)
\t Horizontal tab (HT)
\v Vertical tab (VT)
\0 Null byte (NUL)
\\ A literal backslash ("\")
\NNN The byte whose numeric value is NNN (1-3 octal digits)

A backslash character followed by any other character is treated as an ordinary character, and both characters are output.

The following % format conversion specifiers are available:

%%

A literal per cent sign ("%").

%p

The path including the starting search directory.

When standard output (stdout) is a terminal, "?" is output in place of any control characters to prevent terminal escape injection.

%P

The path excluding the starting search directory.

When standard output (stdout) is a terminal, "?" is output in place of any control characters to prevent terminal escape injection.

%f

The base name (the path excluding any leading directories and final slash character ("/")). As a special case, for the root directory (/) which has no base name, this is "/".

When standard output (stdout) is a terminal, "?" is output in place of any control characters to prevent terminal escape injection.

%h

The directory (the path excluding the last slash character ("/") and the base name). As a special case, for paths in the current working directory (with no slash), this is ".". Note that, for the root directory (/) and its immediate children, this is the empty string.

When standard output (stdout) is a terminal, "?" is output in place of any control characters to prevent terminal escape injection.

%l

The target path of a symlink. For non-symlinks, this is the empty string.

When standard output (stdout) is a terminal, "?" is output in place of any control characters to prevent terminal escape injection.

%H

The starting search directory.

When standard output (stdout) is a terminal, "?" is output in place of any control characters to prevent terminal escape injection.

%d

The depth relative to the starting search directory (depth).

%D

The device number of the device/filesystem that the file resides on (dev). See the related %V and %v format conversions (next) for the major and minor device numbers of the device/filesystem that the file resides on.

%V

The major device number of the device/filesystem that the file resides on (major, part of dev).

%v

The minor device number of the device/filesystem that the file resides on (minor, part of dev).

%i

The inode number (ino).

%M

The file type and permissions in symbolic form (like in ls -l) (mode).

%y

The file type (like in ls -l, but with "f" for regular files, rather than "-").

%Y

The file type (like %y), but show the type of symlink targets instead of the symlinks themselves. Symlink-related errors are indicated with "N" for non-existence, "L" for loops, and "?" for any other error.

Note: The -Y option is incompatible with this format conversion. Because -Y follows symlinks while searching, the only symlinks seen will be broken ones.

%m

The file permissions in octal (mode & ~IFMT).

%n

The number of hard links (nlink).

%u

The user name (based on uid), or the numeric user ID if the user has no name.

%g

The group name (based on gid), or the numeric group ID if the group has no name.

%U

The numeric user ID (uid).

%G

The numeric group ID (gid).

%E

The device number of the file (rdev). This is only meaningful for character devices and block devices. See the related %R and %r format conversions (next) for the major and minor device numbers of the file.

%R

The major device number of the file (rmajor, part of rdev). This is only meaningful for character devices and block devices.

%r

The minor device number of the file (rminor, part of rdev). This is only meaningful for character devices and block devices.

%s

For regular files, this is the size in bytes (size). For symlinks (that are not followed with the -y or -Y option), this is the length in bytes of the target path. For readable directories, this is the number of entries they contain (excluding . and ..). For unreadable directories (and everything else), this is the usual (undocumented) st_size field of the corresponding stat(2) structure.

%S

The file "sparseness". This is only meaningful for regular files. This is defined as (blocks * 512) / size when the file size is non-zero, or as 1 otherwise. Values above 1 indicate files that haven't filled up their last block. The value 1 indicates files that have filled up their last block, and empty files. Values below 1 indicate files with holes (or a filesystem with transparent compression). The value 0 indicates files that are not real files on disk (e.g., kernel parameters exposed as virtual files).

%B

The preferred block size for efficient I/O on the file's filesystem (blksize). On some filesystems (e.g., zfs), this is specific to each file, rather than to the whole filesystem.

%b

The amount of disk space occupied by the file in standardized 512-byte blocks (blocks).

%k

The amount of disk space occupied by the file in units of 1KiB "blocks". This is defined as (blocks + blocks % 2) / 2. Note that elsewhere, reported blocks are always 512 bytes, and that real blocks on modern filesystems are often larger (e.g., 4KiB).

%a

The accessed time (atime) in the format returned by the C ctime(3) function (excluding its terminating newline).

%Ak

The accessed time (atime) in the format specified by k, which is either the at sign ("@"), for the number of seconds since the UNIX epoch, or a conversion specifier character for the C strftime(3) function. See strftime(3) for details.

%t

The modified time (mtime) in the format returned by the C ctime(3) function (excluding its terminating newline).

%Tk

The modified time (mtime) in the format specified by k, which is either the at sign ("@"), for the number of seconds since the UNIX epoch, or a conversion specifier character for the C strftime(3) function. See strftime(3) for details.

%c

The inode changed time (ctime) in the format returned by the C ctime(3) function (excluding its terminating newline).

%Ck

The inode changed time (ctime) in the format specified by k, which is either the at sign ("@"), for the number of seconds since the UNIX epoch, or a conversion specifier character for the C strftime(3) function. See strftime(3) for details.

%w

The file type description (as would be output by the file(1) utility). This is available on systems with libmagic(3) installed. On other systems, this is the empty string.

When standard output (stdout) is a terminal, "?" is output in place of any control characters to prevent terminal escape injection.

%W

The MIME type (including the character set). This is available on systems with libmagic(3) installed. On other systems, this is the empty string.

When standard output (stdout) is a terminal, "?" is output in place of any control characters to prevent terminal escape injection.

%e

The Linux ext2-style file attributes, or BSD-style file flags, as a space-separated list of attribute/flag names. This is available on Linux systems with libe2p. See chattr(1) and lsattr(1) for details. This is also available on FreeBSD, OpenBSD, NetBSD, and macOS. See chflags(1) for details. On other systems, this is the empty string.

The possible Linux ext2-style file attribute names are: secrm, unrm, compr, sync, immutable, append, nodump, noatime, dirty, comprblk, nocompr, encrypt, index, imagic, journal_data, notail, dirsync, topdir, huge_file, extents, verity, ea_inode, nocow, snapfile, dax, snapfile_deleted, snapfile_shrunk, inline_data, projinherit, and casefold.

The possible FreeBSD file flag names are: nodump, uimmutable, uappend, opaque, unounlink, system, sparse, offline, reparse, archive, readonly, hidden, archived, simmutable, sappend, snounlink, and snapshot.

The possible OpenBSD file flag names are: nodump, uimmutable, uappend, opaque, archived, simmutable, and sappend.

The possible NetBSD file flag names are: nodump, uimmutable, uappend, opaque, snapshot, log, and snapinval.

The possible macOS file flag names are: nodump, uimmutable, uappend, opaque, compressed, tracked, datavault, hidden, archived, simmutable, sappend, restricted, snounlink, firmlink, and dataless.

%J

The ext2-style project number. This is available on Linux systems with libe2p. See chattr(1) and lsattr(1) for details. On other systems, this is the empty string.

%I

The ext2-style version/generation number. This is available on Linux systems with libe2p. See chattr(1) and lsattr(1) for details. On other systems, this is the empty string.

%z

The access control list (ACL) as a comma-separated list of items. This is available on Linux, FreeBSD, macOS, Solaris, and Cygwin. On systems without supported ACLs, this is the empty string.

FreeBSD and Solaris have NFSv4 ACLs with two forms of ACL text. By default, the compact form will be output. With the -v option, the non-compact form will be output. For "POSIX" ACLs (Linux and Cygwin) and macOS ACLs, the -v option has no effect.

On Solaris, ACLs are always present by default, even if they are trivially identical to the file permission bits. This can be convenient, but if it seems like noise, it can be silenced (but only on Solaris) by setting the environment variable RAWHIDE_SOLARIS_ACL_NO_TRIVIAL=1.

%x

The extended attributes (EA) as a comma-separated list. This is available on Linux, FreeBSD, macOS, Solaris, and Cygwin. On systems without supported EAs, this is the empty string.

Note that any control characters (i.e., ASCII 0-31, 127) in extended attribute names or values, and any non-ASCII bytes (i.e., 128-255) in the values, will be presented as C-like backslash escape sequences such as "\n" for the newline character, and "\0" for the null character, or as hexadecimal escape sequences such as "\x1b" for the escape character. Any backslash characters are quoted with a preceding backslash. Note that commas are not quoted.

On FreeBSD, extended attributes in the user namespace are presented with "user." as a prefix to their actual name, and those in the system namespace are presented with "system." as a name prefix. On FreeBSD, only the root user may see extended attributes in the system namespace.

On most systems with extended attributes, the values are typically only up to a few hundred bytes in size. But on Solaris, extended attributes take the form of regular files in a special extended attributes directory "hidden" inside each real file. Entire files can be copied into that special directory, where they become extended attributes. So extended attributes may tend to be larger on Solaris.

The default maximum total size for (encoded) extended attributes is 4KiB on most systems, and 64KiB on Solaris. If this is not enough, extended attributes will be silently truncated. This affects extended attribute searching (ea) (see rawhide.conf(5)), and the -L %x format conversion. To prevent truncation, set the environment variable RAWHIDE_EA_SIZE to a positive integer value that is large enough for your needs. Note that the value must be the size in bytes. Scale units are not supported.

On Solaris, every file's extended attributes directory contains the SUNWattr_ro and SUNWattr_rw extended attributes. By default, they are included for every file that has any other extended attributes. They can be excluded by setting the environment variable RAWHIDE_SOLARIS_EA_NO_SUNWATTR=1.

Also on Solaris, since extended attributes are files (of a sort), they each have their own stat(2) information. By default, this information is represented as an artificial extended attribute whose name is the name of the corresponding real extended attribute followed by "/stat". These artificial extended attributes can be suppressed by setting the environment variable RAWHIDE_SOLARIS_EA_NO_STATINFO=1.

%Z

The selinux(8) context. This is available on Linux systems with selinux(8) enabled. On other systems, this is the empty string.

%X

The access control list/extended attributes (ACL/EA) indicator (like in rh -l). When a (non-trivial) ACL is present, this is a plus sign ("+"). When any (interesting) EAs are present, this is an at sign ("@"). When both are present, this is an asterisk character ("*"). When neither is present, but there is an selinux(8) context, this is a dot character ("."). When none of the above are present, this is a space character (" ").

%j

All of the file information in JSON format, representing an object with the following possible attributes:

path (string) (same as %p)
name (string) (same as %f)
start (string) (same as %H)
depth (integer) (same as %d)
dev (integer) (same as %D)
major (integer) (same as %V)
minor (integer) (same as %v)
ino (integer) (same as %i)
mode (integer) (like %M, but in the underlying numeric form)
modestr (string) (same as %M)
type (string) (same as %y)
perm (integer) (same as %m, but in decimal)
user (string) (same as %u) (only if a name is available)
group (string) (same as %g) (only if a name is available)
uid (integer) (same as %U)
gid (integer) (same as %G)
rdev (integer) (same as %E)
rmajor (integer) (same as %R)
rminor (integer) (same as %r)
size (integer) (same as %s)
blksize (integer) (same as %B)
blocks (integer) (same as %b)
atime (string) (like %a/%A@, but in ISO format)
mtime (string) (like %t/%T@, but in ISO format)
ctime (string) (like %c/%C@, but in ISO format)
atime_unix (integer) (same as %A@)
mtime_unix (integer) (same as %T@)
ctime_unix (integer) (same as %C@)
filetype (string) (same as %w) (only if available)
mimetype (string) (same as %W) (only if available)
attributes (string) (same as %e) (only if available)
project (integer) (same as %J) (only if available)
generation (integer) (same as %I) (only if available)
access_control_list (string) (like %z without the -v option, but not reformatted as a comma-separated list) (only if present)
access_control_list_verbose (string) (like %z with the -v option, but not reformatted as a comma-separated list) (only if present)
extended_attributes (string) (like %x, but not reformatted as a comma-separated list) (only if present)
selinux_context (string) (same as %Z) (only if present)
acl_ea_indicator (string) (same as %X)

Note that any extended attributes are formatted and encoded as described for the ea pattern modifier (see rawhide.conf(5)), before being encoded again as a JSON string literal.

Also note that %j should not be combined with other format conversions, especially %x: if %x appears before %j in the -L format argument, the extended_attributes value will be reformatted as a comma-separated list. In practice, this is unlikely to be a problem, because %j generally needs to be used by itself anyway if the output is to be interpreted as valid JSON by other software.

Also note that this only works in locales that use UTF-8.

It is an error if % is followed by any other character, or if it is the last character in the format string.

All of the ISO C printf(3) conversion flags are available (i.e., "#", "0", "-", " ", and "+"), as well as field width and precision specifiers. These behave differently depending on the type of underlying printf(3) conversion that they are applied to. See printf(3) for details.

The above conversions that output text use an underlying printf(3) %s string conversion. Those that output an integer use an underlying %d or %o integer conversion. The %S (sparseness) conversion uses an underlying %g floating point number conversion.

Note that, when the %u (user) or %g (group) conversions are forced to output a numeric user or group ID because the user or group has no name, it is still output using an underlying %s string conversion, rather than changing to an underlying %d integer conversion. This is to prevent any surprises relating to the use of conversion flags or precisions. Also note that the %A@, %C@, and %T@ conversions output integers, so they use an underlying %d integer conversion, but all other %Ak, %Ck, and %Tk conversions use an underlying %s string conversion (because strftime(3) produces strings).

This option is mostly but not entirely compatible with GNU find(1)'s -printf action. rh doesn't do find(1)'s %F (filesystem type), and find(1) doesn't do rh's %z (access control list), %x (extended attributes), %X (ACL/EA indicator), %w (file type description), %W (MIME type), %e (attributes), %J (project), %I (generation), %V (dev major device number), %v (dev minor device number), %E (rdev device number), %R (rdev major device number), %r (rdev minor device number), %B (block size), or %j (JSON).

And for most of the format conversions that output an integer, rh and find(1) use different underlying printf(3) conversions (rh uses %d, and find(1) uses %s), so conversion flags or precisions would behave differently. This difference applies to all (non-strftime(3)) conversions that output integers except %m (file permissions in octal) and %d (depth), which rh and find(1) both treat as integers. But if conversion flags and precisions are not used, there is no difference.

And rh's %s (size) conversion outputs the size of readable directories as the number of entries they contain (excluding . and ..), rather than the usual (undocumented) st_size field of the corresponding stat(2) structure. This is arguably compatible, but not identical. This also makes the output of the %S (sparseness) conversion different for readable directories (but it isn't meaningful for directories, so that shouldn't matter).

-j

This option causes rh to output matching entries in JSON format. This is the same as -L "%j\n" (see above).

This option, and the -l, -0, -L, -x, -X, and -U options, are all mutually exclusive.

Path format options

-Q

Enclose matching paths in double quotes ("""). Any double quote or backslash characters ("\") in a path will be quoted with a preceding backslash.

-E

Output C-style escape sequences in place of any control characters in matching paths. Some control characters have single-letter backslashed encodings (i.e., "\a\b\t\n\v\f\r", which are ASCII 7 (BEL), 8 (BS), 9 (HT), 10 (LF), 11 (VT), 12 (FF), and 13 (CR), respectively). The remaining ones will be output as backslashed octal numbers (e.g., "\033", which is ASCII 27 (ESC)). Any backslash characters ("\") in a path will be quoted with a preceding backslash.

This option and the -q option are mutually exclusive.

-b

Same as the -E option (like ls(1)).

-q

Output "?" in place of any control characters in matching paths. This is the default if standard output (stdout) is a terminal, and the -E/-b option has not been supplied, so as to prevent terminal escape injection.

This option and the -E/-b option are mutually exclusive.

-p

Output "/" after matching directory paths so as to indicate that they are directories.

-t

Output most of the type indicators after matching paths (i.e., one of "/", "@", "=", "|", or ">").

-F

Output all of the type indicators after matching paths (i.e., one of "*", "/", "@", "=", "|", or ">").

The type indicators have the following meanings:

* executable
/ directory
@ symlink
= socket
| fifo
> door (Solaris only)

Other column format options

-H or -HH

For the block size, space, and size columns, use "human readable" traditional computer storage units, based on 1024 bytes, rather than just numbers of bytes. This is like the -h option in GNU ls(1). Implies the -l option.

If the size is below 1024, it is output in the usual way as a number of bytes. Otherwise, the appropriate scale is determined (i.e., K, M, G, T, P, E). If the scaled number is less than ten, a decimal place is included. Otherwise, an integer is output. For the block size and space columns, a decimal place is not included if it is zero. The size is rounded up. This gives the property that the actual size is never larger than the reported size.

When this option is supplied twice (i.e., -HH), then instead of always rounding up (like ls(1)), the reported size is rounded half up. This gives more accurate figures than always rounding up.

This option and the -I option are mutually exclusive.

-I or -II

For the block size, space, and size columns, use the International System of Units (SI) prefixes, based on 1000 bytes, rather than just numbers of bytes. This is like the --si option in GNU ls(1). Implies the -l option.

If the size is below 1000, it is output in the usual way as a number of bytes. Otherwise, the appropriate scale is determined (i.e., k, m, g, t, p, e). If the scaled number is less than ten, a decimal place is included. Otherwise, an integer is output. For the block size and space columns, a decimal place is not included if it is zero. The size is rounded up. This gives the property that the actual size is never larger than the reported size.

When this option is supplied twice (i.e., -II), then instead of always rounding up (like ls(1)), the reported size is rounded half up. This gives more accurate figures than always rounding up.

Like ls(1), a lower case k is used to represent KB. Unlike ls(1) (and unlike real SI prefixes), lower case letters are used to represent all of the other SI prefixes as well. This is to avoid any ambiguity.

This option and the -H option are mutually exclusive.

-T

For the modified time, accessed time, and inode changed time columns, use ISO date/time format ("YYYY-MM-DD HH:MM:SS +HHMM"), rather than the default format ("MMM DD HH:MM:SS YYYY"). Implies the -l option.

-#

For the user/owner and group columns, use numeric user and group IDs, rather than user and group names. Implies the -l option.

Debug option

-? spec

Output debug messages to standard error (stderr). The spec argument is scanned for one or more of the following labels:

cmdline, parser, traversal, exec, all, extra

The first four labels relate to different aspects of rh. all implies all four of them. extra outputs additional debug messages for parser and/or traversal when they are also included. There are no debug messages for exec.

Note that debug messages are not sanitized against terminal escape injection. So it is safest to direct debug output (i.e., stderr) to a file (e.g., rh -? all,extra 2>rh.dbg).

Note that if rh has been compiled without support for debug messages, this option will still be accepted, but there will be no debug messages.

SEARCH CRITERIA LANGUAGE

See rawhide.conf(5) for details on the search criteria language used in system-wide and user-specific configuration files, -f option files, and -e option arguments. It also includes details on the standard library that builds on the language, and makes rh easy to use. Now would be a good time to read it. The rest of this manual entry should make more sense. But here's a brief introduction.

There are expressions and functions. Expressions look like C expressions. The only data type is integer. These C operators are available (presented in groups of increasing precedence):

?:  Conditional (i.e., condition-expr ? if-expr : else-expr)

||  Logical or

&&  Logical and

|   Bit or

^   Bit exclusive or

&   Bit and

==  Equals
!=  Not equals

<   Less than
>   Greater than
<=  Less than or equal to
>=  Greater than or equal to

<<  Bit shift left
>>  Bit shift right

+   Addition
-   Subtraction

*   Multiplication
/   Division
%   Modulo (remainder)

-   Minus (unary)
~   Bit not (unary)
!   Logical not (unary)

Parentheses override operator precedence (e.g., (1 + 2) * 3).

Integer constants can be decimal, octal (starting with 0), or hexadecimal (starting with 0x). Decimal integers can have scale units (e.g., 1K, 2M, 3G, ... for traditional storage units (KiB, MiB, GiB, ...), and 1k, 2m, 3g, ... for SI-style units (KB, MB, GB, ...)).

There are special tokens to represent various things:

"pattern"             - file glob pattern matches
"pattern".modifier    - modified pattern matches
"/path".field         - reference files for comparison
"cmd".sh              - external shell commands
{cmd}.sh              - as above with alternate string literal syntax
$user @group          - user and group IDs
$$ @@                 - current user's user ID and primary group ID
[yyyy/mm/dd]          - dates
[yyyy/mm/dd hh:mm:ss] - date/times

See rawhide.conf(5) for all the details.

Functions can have parameters. Functions that don't have parameters can be defined and called with or without parentheses. Function bodies can only contain a return statement or an expression.

Every source/configuration file, -f option file, and -e option argument can contain zero or more function definitions, optionally followed by a file test expression, which is optionally terminated by a semicolon (";").

There are built-in symbols that represent the inode metadata (i.e., stat(2) structure fields) of candidate files (e.g., mode, uid, size, mtime, ...), Linux/BSD file attributes/flags (i.e., attr, proj, gen), other useful file information (e.g., nouser, nogroup, readable, writable, ...), control flow (i.e., prune, trim, exit), useful values and constants (e.g., now, today, minute, hour, day, ...), and more constants from C's <sys/stat.h> header file (e.g., IFMT, IFDIR, IRUSR, ...). There are also built-in symbols that represent the inode metadata of candidate symlink targets (e.g., tmode, tuid, tsize, tmtime, ...).

There is also a standard library of functions in /etc/rawhide.conf (or similar). It contains both readable and concise functions for various things like: file types (e.g., file, f, directory, dir, d, symlink, link, l, ...); file permissions (e.g., user_readable, ur, group_writable, gw, other_executable, ox, setuid, suid, all_readable, allr, all(), any(), none(), ...); aliases for stat(2) structure fields and other built-ins (e.g., inode, user, group, modified, accessed, imayread, ir, imaywrite, iw, ...); size units (e.g., KiB, MiB, GiB, KB, MB, GB, ...); and miscellaneous helper functions (e.g., ago(), old(), past(), empty, roots, mine, broken, gmtoday, ...).

On Linux systems, there is an additional library of functions in /etc/rawhide.conf.d/attributes. It contains constants and predicates for ext2-style file attributes (e.g., immutable, append, nodump, nocow, dax, ...).

On FreeBSD, OpenBSD, NetBSD, and macOS systems, there is an additional library of functions in /etc/rawhide.conf.d/attributes. It contains constants and predicates for BSD file flags (e.g., uimmutable, uappend, nodump, opaque, ...).

By default, string literals ("pattern") represent a file glob pattern match against the file name. Pattern modifiers ("pattern".modifier) change the interpretation of string literals to let you choose how to match text (i.e., glob pattern or Perl-compatible regular expression (regex), and case-sensitive or case-insensitive), and which text to match against (i.e., file name, path, symlink target path, access control list, or extended attributes).

There are other string literal suffixes that represent the inode metadata (i.e., stat(2) structure fields) of arbitrary reference files ("/path".field) for comparison purposes. And the sh string literal suffix lets you execute an arbitrary shell command ("cmd".sh), and use its exit status (success or failure) in the search criteria.

See rawhide.conf(5) for all the details. See below for some examples.

EXPRESSION EXAMPLES

The following are examples of rh expressions. Where multiple versions are given, the first one only uses built-in symbols, and the rest usually make use of the standard library in /etc/rawhide.conf (or similar) as well. See rawhide.conf(5) for details.

Find files that are owned by the user drew, and are writable by other people:

(uid == $drew) && (mode & 022) # uid and mode are built-in
(uid == $drew) && (gw | ow)    # gw and ow are in /etc/rawhide.conf

Find files that are owned by root, have the setuid bit set, and are world-writable:

!uid && (mode & ISUID) && (mode & 02) # uid, mode, ISUID: built-in
roots && setuid && other_writable     # the rest: /etc/rawhide.conf
roots && setuid && world_writable
roots && suid && ow
roots && suid && ww

Find executable files that are larger than 10KiB, and have not been executed in the last 24 hours:

(mode & 0111) && (size > 10 * 1024) && (atime < now - 24 * hour)
any(0111) && (size > 10 * KiB) && accessed < ago(24 * hours)
anyx && sz > 10K && atime < ago(day)

Find C source files that are smaller than 4KiB, and other files that are smaller than 32KiB:

size < ("*.c" ? 4K : 32K)     # size: built-in
size < ("*.c" ? 4 : 32) * KiB # KiB: /etc/rawhide.conf

Find files that are an exact multiple of 1KiB in size:

(size % 1024) == 0
!(sz % 1K)

Find files that were last modified during March, 1982:

mtime >= [1982/3/1] && mtime < [1982/4/1]
modified >= [1982/3/1] && modified < [1982/4/1]

Find files that have been read since they were last written:

atime > mtime
accessed > modified

Find files whose names are between 4 and 10 bytes in length:

strlen >= 4 && strlen <= 10
len >= 4 && len <= 10

Find files that are at a relative depth of 3 or more below the starting search directory:

depth >= 3

This expression finds *.c files. However, it will not search in any directories named bin or tmp. If these file names are encountered, the prune built-in is evaluated, preventing the current path from matching, and preventing further searching below the current path.

("tmp" || "bin") ? prune : "*.c"
("tmp" || "bin") && prune || "*.c"

Find files that were modified after another file was last modified:

mtime > "/otherfile".mtime
modified > "/otherfile".modified

Find files that are larger than one file and smaller than another file:

size > "/somefile".size && size < "/otherfile".size
sz > "/somefile".sz && sz < "/otherfile".sz

Find files with holes (for filesystems without transparent compression):

(mode & IFMT) == IFREG && size && blocks && (blocks * 512) < size
file && size && blocks && space < size

Find regular files with multiple hard links:

(mode & IFMT) == IFREG && nlink > 1
file && nlinks > 1
f && nlink > 1

Find all hard links to a particular file:

(dev == "/path".dev) && (ino == "/path".ino)
(dev == "/path".dev) && (ino == "".ino) # Implicit 2nd reference

Find devices with the same device driver as /dev/tty:

rmajor == "/dev/tty".rmajor

Find symlinks whose target paths are relative:

"[!/]*".link

Find symlinks whose ultimate targets are on a different filesystem:

(mode & IFMT) == IFLNK && texists && tdev != dev
symlink && target_exists && target_dev != dev
l && texists && tdev != dev
texists && tdev != dev

Find symlinks whose ultimate targets don't exist:

(mode & IFMT) == IFLNK && !texists
symlink && !target_exists
link && !texists
l && !texists
dangling
broken

Find mountpoints under the current directory:

$ rh -1 'dev != ".".dev'

Find directories with no subdirectories (fast on most filesystems, but doesn't work on btrfs):

$ rh 'd && nlink == 2'

The same, but works for btrfs (slow-ish, but demonstrates shell commands):

$ rh 'd && "[ $(rh -red -- %S | wc -l) = 0 ]".sh'
$ rh 'd && "[ -z \"$(rh -red -- %S)\" ]".sh'
$ rh 'd && { [ -z "$(rh -red -- %S)" ] }.sh'

Find empty (readable) directories (fast-ish, and works for btrfs):

$ rh 'd && empty'

Find symlinks whose immediate targets are also symlinks:

$ rh -l 'l && "[ -L \"$(rh -L%%l -- %S)\" ]".sh'
$ rh -l 'l && "[ -L \"$(readlink -- %S)\" ]".sh'
$ rh -l 'l && { [ -L "$(readlink -- %S)" ] }.sh'

Find all hard links to all regular files that have multiple hard links (very slow):

# rh -e 'f && nlink > 1' \
     -X 'rh / "(dev == \"%S\".dev) && (ino == \"\".ino)"; echo' \
     /

The same, but for a single filesystem only (shorter, less slow, but still very slow):

# rh -1 -e 'f && nlink > 1' -X 'rh -1 / "ino == \"%S\".ino"; echo' /

Find 32-bit ELF executables:

$ rh 'f && anyx && sz > 10k && "ELF 32-bit*executable*".what'

Find text files with ISO-8859 encoding:

$ rh 'f && "*ISO-8859 text".what'
$ rh 'f && "text/*; charset=iso-8859*".mime'

Find files that contain TODO:

$ rh 'f && "*TODO*".body'
$ rh 'f && "TODO".rebody'

Find files using a Perl-compatible regular expression (regex):

$ rh '"^[a-zA-Z0-9_]+[0-9][0-9][0-9]?\..*[a-cz]$".re'
$ rh '"^\w+\d{2,3}\..*[a-cz]$".re'

See perlre(1), pcre2pattern(3), and pcre2syntax(3) for details.

The same, but with documentation:

$ rh '"
  ^         # Anchor the match to the start of the base name
  \w+       # Starts with at least one word character
  \d{2,3}   # Followed by two or three digits
  \.        # Followed by a literal dot
  .*        # Followed by anything (or nothing)
  [ a-c z ] # Ends with a, b, c, or z
  $         # Anchor the match to the end of the base name
".re'

Case-insensitive search (anything with abc in the name):

$ rh '"*ABC*".i' # Case-insensitive glob of base name
$ rh '"ABC".rei' # Case-insensitive regex of base name

Find files by their full path starting from the search directory (anything under an abc directory):

$ rh '"*/abc/*".path'  # Glob of full path
$ rh '"/abc/".repath'  # Regex of full path
$ rh '"*/ABC/*".ipath' # Case-insensitive glob of full path
$ rh '"/ABC/".reipath' # Case-insensitive regex of full path

Find symlinks by their target path (symlinks to anything under an abc directory):

$ rh -l '"*/abc/*".link'  # Glob of symlink target path
$ rh -l '"/abc/".relink'  # Regex of symlink target path
$ rh -l '"*/ABC/*".ilink' # Case-insensitive glob of symlink target
$ rh -l '"/ABC/".reilink' # Case-insensitive regex of symlink target

Find files with "POSIX" ACLs (Linux and Cygwin) that grant write access to the user drew:

$ rh '(uid == $drew) ? "*user::?w?*".acl   : "*user:drew:?w?*".acl'
$ rh '(uid == $drew) ? "^user::.w.$".reacl : "^user:drew:.w.$".reacl'

Find files with NFSv4 ACLs (FreeBSD and Solaris) that grant write access to the user drew:

$ rh '(uid == $drew)
    ?    "*owner@:?w????????????:???????:allow*".acl
    : "*user:drew:?w????????????:???????:allow*".acl
'

$ rh '(uid == $drew)
    ?    "owner@:.w.{12}:.{7}:allow".reacl
    : "user:drew:.w.{12}:.{7}:allow".reacl
'

$ rh '(uid == $drew)
    ?    "owner@:[^:\n]+/write_data/[^:\n]+(:[^:\n]*)?:allow".reacl
    : "user:drew:[^:\n]+/write_data/[^:\n]+(:[^:\n]*)?:allow".reacl
'

Note that, with NFSv4 ACLs, you can search using either the compact form (as in the first two examples) or the non-compact form (as in the third). But be warned that the permission names in the non-compact form do not always appear in the same order (at least on Solaris).

Find files on macOS with ACLs that grant write access to the user drew:

$ rh '(uid == $drew) ? uw : "user:[^:\n]+:drew:\d+:allow:write".reacl'

Find files with non-trivial access control lists (ACL):

$ rh '"*mask::*".acl'        # "POSIX" ACLs (Linux, Cygwin)
$ rh '"(user|group):".reacl' # NFSv4 ACLs (FreeBSD, Solaris)
$ rh '"?*".acl'              # macOS ACLs

Find files with extended attributes (EA):

$ rh '"?*".ea'
$ rh '".".reea'

Find files on Linux by their selinux(8) context (any):

$ rh '"*security.selinux: *_u:*_r:*_t:s[0-3]*".ea'
$ rh '"^security\.selinux:\ .*_u:.*_r:.*_t:s[0-3]".reea'

Find files on Linux, FreeBSD, OpenBSD, NetBSD, or macOS that are immutable or append-only:

$ rh / 'immutable || append'

Find files on Solaris with setuid executable extended attributes (silly):

$ rh / '"*/stat: -rws*".ea'
$ rh / '"/stat:\ -rws".reea'

FUNCTION EXAMPLES

The following are examples of function definition and usage.

This defines a function that returns true if the current candidate file is a directory, and false otherwise:

dir()
{
    return (mode & IFMT) == IFDIR;
}

And this defines a function that returns whether or not the current candidate file is owned by the current user:

mine()
{
    return uid == $$;
}

Then this expression matches directories that are not owned by the user:

dir() && !mine();

Since dir and mine take no arguments, they can be called without parentheses:

dir && !mine;

Parentheses can also be omitted when defining a function that has no parameters. For example, this defines a function named drews that returns true when the current candidate file is owned by the user drew:

drews
{
    return uid == $drew;
}

Functions can also have parameters. An alternative to the functions mine and drews could be:

owner(who)
{
    return uid == who;
}

Then this expression would be true for any file owned by the users alex or drew:

owner($alex) || owner($drew);

Since functions can only ever contain a return statement, the return keyword and the trailing semicolon (";") are optional.

The above functions can be defined as:

dir        { (mode & IFMT) == IFDIR }
mine       { uid == $$ }
drews      { uid == $drew }
owner(who) { uid == who }

COMMAND LINE EXAMPLES

The -e option argument normally supplies the file test expression, but the -e option itself can usually be omitted. If no explicit file test expression is supplied via the -e option, then any remaining command line arguments are examined to identify an implicit file test expression. The file test expression and search paths can appear in any order. The following examples are equivalent:

$ rh -e 'expr' dir1 dir2
$ rh 'expr' dir1 dir2
$ rh dir1 'expr' dir2
$ rh dir1 dir2 'expr'

List the current directory in detail (like ls -lA, but unsorted):

$ rh -rl

List the current directory in greater detail (all stat(2) details, and all type indicators):

$ rh -rvF

List the current directory in detail, sorted by name (by cheating):

$ rh -lM0 .* *

Delete old backup files:

$ rh -UUU '"*.bak" && modified <= ago(month)'
$ rh -UUU '"*.bak" && old(month)'

grep(1) for something only in recent files:

$ rh -e 'f && modified >= ago(hour)' -x 'grep -H something %s'
$ rh -e 'f && past(hour)'            -x 'grep -H something %s'

The same, but just list the files where grep(1) found something:

$ rh 'f && modified >= ago(hour) && "grep -q something %S".sh'
$ rh 'f && past(hour)            && "grep -q something %S".sh'

Show all access control lists:

$ rh -L '%p\n%z\n' '"?*".acl'

Show all extended attributes:

$ rh -L '%p\n%x\n' '"?*".ea'

Find the block device that the current directory resides on:

$ rh -l /dev 'b && rdev == ".".dev'

Note: This doesn't work for filesystems like devtmpfs that have no corresponding block device in /dev.

Find regular files whose sizes are prime numbers (so silly):

$ rh -l '
prime1(n, i) { (i * i > n) ? 1 : !(n % i) ? 0 : prime1(n, i + 2) }
prime(n) { (n < 2) ? 0 : !(n % 2) ? n == 2 : prime1(n, 3) }
file && prime(size)
'

Sum the sizes of all regular files in the current directory (with jq(1)):

$ rh -r -L '%j\n' f | jq .size | jq -s add
$ rh -r -L '%s\n' f | jq -s add
$ rh -rj f | jq .size | jq -s add

Some command line shell syntactic sugar to save keystrokes:

# rq - rh with automatic "" around the first argument
# usage: rq pattern [options] [path...]
# e.g.:  rq '*.c' instead of rh '"*.c"'
rq() { rq_pat="$1"; shift && rh -e "\"$rq_pat\"" "$@"; }

# rql - rh -l with automatic "" around the first argument
# usage: rql pattern [options] [path...]
# e.g.:  rql '*.c' instead of rh -l '"*.c"'
rql() { rql_pat="$1"; shift && rh -le "\"$rql_pat\"" "$@"; }

# rqv - rh -v with automatic "" around the first argument
# usage: rqv pattern [options] [path...]
# e.g.:  rqv '*.c' instead of rh -v '"*.c"'
rqv() { rqv_pat="$1"; shift && rh -ve "\"$rqv_pat\"" "$@"; }


# ri - rh with automatic "".i around the first argument
# usage: ri pattern [options] [path...]
# e.g.:  ri '*.c' instead of rh '"*.c".i'
ri() { ri_pat="$1"; shift && rh -e "\"$ri_pat\".i" "$@"; }

# ril - rh -l with automatic "".i around the first argument
# usage: ril pattern [options] [path...]
# e.g.:  ril '*.c' instead of rh -l '"*.c".i'
ril() { ril_pat="$1"; shift && rh -le "\"$ril_pat\".i" "$@"; }

# riv - rh -v with automatic "".i around the first argument
# usage: riv pattern [options] [path...]
# e.g.:  riv '*.c' instead of rh -v '"*.c".i'
riv() { riv_pat="$1"; shift && rh -ve "\"$riv_pat\".i" "$@"; }


# re - rh with automatic "".re around the first argument
# usage: re pattern [options] [path...]
# e.g.:  re '\.c$' instead of rh '"\.c$".re'
re() { re_pat="$1"; shift && rh -e "\"$re_pat\".re" "$@"; }

# rel - rh -l with automatic "".re around the first argument
# usage: rel pattern [options] [path...]
# e.g.:  rel '\.c$' instead of rh -l '"\.c$".re'
rel() { rel_pat="$1"; shift && rh -le "\"$rel_pat\".re" "$@"; }

# rev - rh -v with automatic "".re around the first argument
# usage: rev pattern [options] [path...]
# e.g.:  rev '\.c$' instead of rh -v '"\.c$".re'
rev() { rev_pat="$1"; shift && rh -ve "\"$rev_pat\".re" "$@"; }


# rei - rh with automatic "".rei around the first argument
# usage: rei pattern [options] [path...]
# e.g.:  rei '\.c$' instead of rh '"\.c$".rei'
rei() { rei_pat="$1"; shift && rh -e "\"$rei_pat\".rei" "$@"; }

# reil - rh -l with automatic "".rei around the first argument
# usage: reil pattern [options] [path...]
# e.g.:  reil '\.c$' instead of rh -l '"\.c$".rei'
reil() { reil_pat="$1"; shift && rh -le "\"$reil_pat\".rei" "$@"; }

# reiv - rh -v with automatic "".rei around the first argument
# usage: reiv pattern [options] [path...]
# e.g.:  reiv '\.c$' instead of rh -v '"\.c$".rei'
reiv() { reiv_pat="$1"; shift && rh -ve "\"$reiv_pat\".rei" "$@"; }


alias rl='rh -rl' # rh -l version of ls -lA (unsorted)
alias rlr='rh -l' # rh -l version of ls -lAR (unsorted)

alias rv='rh -rv' # rh -v version of ls -lA (unsorted)
alias rvr='rh -v' # rh -v version of ls -lAR (unsorted)

alias rj='rh -j'

alias r0='rh -0'

alias r1='rh -1'
alias r1l='rh -1l'
alias r1v='rh -1v'

alias ry='rh -y'
alias ryl='rh -yl'
alias ryv='rh -yv'

alias rY='rh -Y'
alias rYl='rh -Yl'
alias rYv='rh -Yv'


# jqs - (helper) use jq to sort rh -j by path
# usage: jq arguments that don't conflict with -s
# e.g.: rh -j | jqs -r
jqs() { jq -s "$@" 'sort_by(.path) | .[].path'; }

# jqt - (helper) use jq to sort rh -j by mtime, most recent first
# usage: jq arguments that don't conflict with -s
# e.g.: rh -j | jqt -r
jqt() { jq -s "$@" 'sort_by(-.mtime_unix,.path) | .[].path'; }

# jqz - (helper) use jq to sort rh -j by size
# usage: jq arguments that don't conflict with -s
# e.g.: rh -j | jqz -r
jqz() { jq -s "$@" 'sort_by(.size,.path) | .[].path'; }

# rhs - plain rh sorted by path (like ls -1AR)
# usage: rh arguments that don't conflict with -j
# e.g.: rhs f
rhs() { rh -j "$@" | jqs -r; }

# rht - plain rh sorted by mtime, most recent first (like ls -1ARt)
# usage: rh arguments that don't conflict with -j
# e.g.: rht f
rht() { rh -j "$@" | jqt -r; }

# rhz - plain rh sorted by size (like ls -1AR but sorted by size)
# usage: rh arguments that don't conflict with -j
# e.g.: rhz 'size > 1M'
rhz() { rh -j "$@" | jqz -r; }

# rls - rh -rl sorted by path (like ls -lA)
# usage: rh arguments that don't conflict with -r or -j
# e.g.: rls f
rls() { eval rh -lM0 $(rh -rj "$@" | jqs); }

# rlt - rh -rl sorted by mtime, most recent first (like ls -lAt)
# usage: rh arguments that don't conflict with -r or -j
# e.g.: rlt f
rlt() { eval rh -lM0 $(rh -rj "$@" | jqt); }

# rlz - rh -rl sorted by size (like ls -lA but sorted by size)
# usage: rh arguments that don't conflict with -r or -j
# e.g.: rlz 'size > 1M'
rlz() { eval rh -lM0 $(rh -rj "$@" | jqz); }

# rlrs - rh -l sorted by path (like ls -lAR)
# usage: rh arguments that don't conflict with -j
# e.g.: rlrs f
rlrs() { eval rh -lM0 $(rh -j "$@" | jqs); }

# rlrt - rh -l sorted by mtime, most recent first (like ls -lARt)
# usage: rh arguments that don't conflict with -j
# e.g.: rlrt f
rlrt() { eval rh -lM0 $(rh -j "$@" | jqt); }

# rlrz - rh -l sorted by size (like ls -lAR but sorted by size)
# usage: rh arguments that don't conflict with -j
# e.g.: rlrz 'size > 1M'
rlrz() { eval rh -lM0 $(rh -j "$@" | jqz); }

# rvs - rh -rv sorted by path (like ls -lA)
# usage: rh arguments that don't conflict with -r or -j
# e.g.: rvs f
rvs() { eval rh -vM0 $(rh -rj "$@" | jqs); }

# rvt - rh -rv sorted by mtime, most recent first (like ls -lAt)
# usage: rh arguments that don't conflict with -r or -j
# e.g.: rvt f
rvt() { eval rh -vM0 $(rh -rj "$@" | jqt); }

# rvz - rh -rv sorted by size (like ls -lA but sorted by size)
# usage: rh arguments that don't conflict with -r or -j
# e.g.: rvz 'size > 1M'
rvz() { eval rh -vM0 $(rh -rj "$@" | jqz); }

# rvrs - rh -v sorted by path (like ls -lAR)
# usage: rh arguments that don't conflict with -j
# e.g.: rvrs f
rvrs() { eval rh -vM0 $(rh -j "$@" | jqs); }

# rvrt - rh -v sorted by mtime, most recent first (like ls -lARt)
# usage: rh arguments that don't conflict with -j
# e.g.: rvrt f
rvrt() { eval rh -vM0 $(rh -j "$@" | jqt); }

# rvrz - rh -v sorted by size (like ls -lAR but sorted by size)
# usage: rh arguments that don't conflict with -j
# e.g.: rvrz 'size > 1M'
rvrz() { eval rh -vM0 $(rh -j "$@" | jqz); }
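Each of the pattern wrappers above wraps its first argument in double quotes, plus any modifier. It then passes everything else through to rh unchanged. To check exactly what rh would receive, you can temporarily replace rh with a stand-in shell function that prints its arguments. The stand-in below is purely illustrative and is not part of rawhide:

```shell
# Illustrative stand-in for rh: print each argument on its own line,
# wrapped in <...> so the quoting added by the wrappers is visible.
rh() { printf '<%s>\n' "$@"; }

# Two of the wrappers, exactly as defined above.
rq() { rq_pat="$1"; shift && rh -e "\"$rq_pat\"" "$@"; }
re() { re_pat="$1"; shift && rh -e "\"$re_pat\".re" "$@"; }

rq '*.c' -l src
# <-e>
# <"*.c">
# <-l>
# <src>

re '\.c$'
# <-e>
# <"\.c$".re>

unset -f rh   # remove the stand-in so the real rh is used again
```

The unset -f at the end matters, because a shell function named rh takes precedence over the real rh command.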

FIND(1) COMPARISON EXAMPLES

The following subsections are the examples from the GNU find(1) manual entry.

find - search for files in a directory hierarchy
Copyright (C) 1990-2022 Free Software Foundation, Inc
License GPLv3+: GNU GPL version 3 or later
https://www.gnu.org/software/findutils
https://www.gnu.org/licenses/gpl.html

Each example is followed by one or more equivalent rh commands, for the purpose of comparison. Multiple alternative rh commands typically use different functions from /etc/rawhide.conf (or similar). See rawhide.conf(5) for details.

Simple `find | xargs` approach

Find files named core in or below the directory /tmp and delete them.

$ find /tmp -name core -type f -print | xargs /bin/rm -f

$ rh /tmp '"core" && file' | xargs /bin/rm -f
$ rh /tmp '"core" && f' | xargs /bin/rm -f

Safer `find -print0 | xargs -0` approach

Find files named core in or below the directory /tmp and delete them, processing file names in such a way that file or directory names containing single or double quotes, spaces or newlines are correctly handled.

$ find /tmp -name core -type f -print0 | xargs -0 /bin/rm -f

$ rh -0 /tmp '"core" && file' | xargs -0 /bin/rm -f
$ rh -0 /tmp '"core" && f' | xargs -0 /bin/rm -f

Executing a command for each file

Run file(1) on every file in or below the current directory.

$ find . -type f -exec file '{}' \;

$ rh -x 'file %s' file
$ rh -x 'file %s' f

Traversing the filesystem just once - for two different actions

Traverse the filesystem just once, listing set-user-ID files and directories into /root/suid.txt and large files into /root/big.txt.

$ find / \
  \( -perm -4000 -fprintf /root/suid.txt '%#m %u %p\n' \) , \
  \( -size +100M -fprintf /root/big.txt '%-10s %p\n' \)

# rh -L '' / '
  (setuid && "rh -M0 -L \"%%#m %%u %%p\n\" %s >> /root/suid.txt".sh) +
  (size > 100M && "rh -M0 -L \"%%-10s %%p\n\" %s >> /root/big.txt".sh)
'

Searching files by age

Search for files in your home directory which have been modified in the last twenty-four hours.

$ find $HOME -mtime 0

$ rh $HOME 'mtime >= now - 24 * hour'
$ rh $HOME 'mtime >= ago(24 * hours)'
$ rh $HOME 'modified >= ago(day)'
$ rh $HOME 'past(day)'

Searching files by permissions

Search for files which are executable but not readable for the current user.

$ find /sbin /usr/sbin -executable \! -readable -print

$ rh /sbin /usr/sbin 'executable && !readable'
$ rh /sbin /usr/sbin 'imayexec && !imayread'
$ rh /sbin /usr/sbin 'ix && !ir'

Search for files which have read and write permission for their owner, and group, but which other users can read but not write to. Files which meet these criteria but have other permission bits set (for example if someone can execute the file) will not be matched.

$ find . -perm 664

$ rh 'perm == 0664'

Search for files which have read and write permission for their owner and group, and which other users can read, without regard to the presence of any extra permission bits (for example the executable bit). This will match a file which has mode 0777, for example.

$ find . -perm -664

$ rh '(perm & 0664) == 0664'
$ rh 'all(0664)'

Search for files which are writable by somebody (their owner, or their group, or anybody else).

$ find . -perm /222

$ rh 'perm & 0222'
$ rh 'any(0222)'
$ rh 'user_writable || group_writable || other_writable'
$ rh 'uw || gw || ow'
$ rh 'uw | gw | ow'
$ rh uw+gw+ow
$ rh any_writable
$ rh anyw

Search for files which are writable by either their owner or their group.

$ find . -perm /220
$ find . -perm /u+w,g+w
$ find . -perm /u=w,g=w

$ rh 'perm & 0220'
$ rh 'any(0220)'
$ rh 'user_writable || group_writable'
$ rh 'uw || gw'
$ rh 'uw | gw'
$ rh uw+gw

Search for files which are writable by both their owner and their group.

$ find . -perm -220
$ find . -perm -g+w,u+w

$ rh '(perm & 0220) == 0220'
$ rh 'all(0220)'
$ rh 'uw && gw'

A more elaborate search on permissions. These two commands both search for files that are readable for everybody (-perm -444 or -perm -a+r), have at least one write bit set (-perm /222 or -perm /a+w) but are not executable for anybody (! -perm /111 or ! -perm /a+x respectively).

$ find . -perm -444 -perm /222 \! -perm /111
$ find . -perm -a+r -perm /a+w \! -perm /a+x

$ rh '(perm & 0444) == 0444 && (perm & 0222) && !(perm & 0111)'
$ rh 'all(0444) && any(0222) && none(0111)'
$ rh '(ur && gr && or) && (uw || gw || ow) && !(ux || gx || ox)'
$ rh 'all_readable && any_writable && none_executable'
$ rh 'allr && anyw && nonex'
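The three find(1) -perm forms above reduce to three bitmask tests. -perm 664 compares the permission bits exactly, -perm -664 requires all of the given bits, and -perm /222 requires at least one of them. Below is a minimal sketch of the same arithmetic in POSIX shell, where a leading 0 makes a constant octal, as in C. The helper names are hypothetical:

```shell
# Hypothetical helpers: $1 is a file's permission bits, $2 is the mask.
# Shell arithmetic reads a leading 0 as octal, as C does.
perm_exact() { [ $(( $1 == $2 )) -ne 0 ]; }         # find -perm 664
perm_all()   { [ $(( ($1 & $2) == $2 )) -ne 0 ]; }  # find -perm -664
perm_any()   { [ $(( $1 & $2 )) -ne 0 ]; }          # find -perm /222
perm_none()  { [ $(( $1 & $2 )) -eq 0 ]; }          # ! find -perm /111

# The "more elaborate" search: readable by everybody, writable by
# somebody, and executable by nobody.
elaborate() {
    perm_all "$1" 0444 && perm_any "$1" 0222 && perm_none "$1" 0111
}

elaborate 0644 && echo "0644 matches"
elaborate 0755 || echo "0755 doesn't match (executable)"
```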

Pruning - omitting files and subdirectories

Copy the contents of /source-dir to /dest-dir, but omit files and directories named .snapshot (and anything in them). It also omits files or directories whose names end in "~", but not their contents.

$ cd /source-dir
$ find . -name .snapshot -prune -o \( \! -name '*~' -print0 \) | \
    cpio -pmd0 /dest-dir

$ rh -0 '".snapshot" ? prune : !"*~"' | cpio -pmd0 /dest-dir
$ rh -0 '".snapshot" && prune || !"*~"' | cpio -pmd0 /dest-dir

Given the following directory of projects and their associated SCM administrative directories, perform an efficient search for the projects' roots:

$ find repo/ \
    \( -exec test -d '{}/.svn' \; \
    -or -exec test -d '{}/.git' \; \
    -or -exec test -d '{}/CVS' \; \
    \) -print -prune

$ rh repo 'd && "[ -d %S/.svn -o -e %S/.git -o -d %S/CVS ]".sh && trim'

Sample directories:

repo/project1/CVS
repo/gnu/project2/.svn
repo/gnu/project3/.svn
repo/gnu/project3/src/.svn
repo/project4/.git

Sample output:

repo/project1
repo/gnu/project2
repo/gnu/project3
repo/project4

Note: These examples highlight an interesting difference in pruning with rh and find(1). In the first example, the pruned paths themselves are not output. In the second example, they are. Both behaviours are useful. find(1) has a single -prune action for both, and the decision whether or not to output the pruned path itself is determined by whether and where -print (or certain other actions) appears on the command line. It's complicated. For simplicity, rh has separate prune and trim built-ins for these two behaviours. prune prevents the current candidate path from matching. trim doesn't. They both prevent searching below the current candidate path. So prune is used when the current candidate path itself needs to be excluded, and trim is used when it needs to be included. You can think of trim as a light prune.
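To make the trim behaviour concrete, the project root search above can be sketched as a recursive shell function. This is only an approximation of what rh does, and the name find_roots is hypothetical. A matching directory is output and then not searched below (trim), and every other directory is searched. With prune instead of trim, nothing would be output at all, because the only directories that satisfy the test would also be excluded. Unlike rh, the sketch doesn't test the starting directory itself, and it visits entries in sorted glob order:

```shell
# Hypothetical sketch of:
#   rh repo 'd && "[ -d %S/.svn -o -e %S/.git -o -d %S/CVS ]".sh && trim'
find_roots() {
    # Glob for ordinary and hidden entries directly below $1.
    for entry in "$1"/* "$1"/.[!.]* "$1"/..?*; do
        # Skip non-directories, symlinks, and glob patterns that matched nothing.
        [ -d "$entry" ] && [ ! -L "$entry" ] || continue
        if [ -d "$entry/.svn" ] || [ -e "$entry/.git" ] || [ -d "$entry/CVS" ]; then
            printf '%s\n' "$entry"   # trim: output it, but don't search below it
        else
            find_roots "$entry"      # no match: keep searching below it
        fi
    done
}

# Recreate the sample directories in a scratch directory and search them.
cd "$(mktemp -d)" &&
mkdir -p repo/project1/CVS repo/gnu/project2/.svn repo/gnu/project3/.svn \
         repo/gnu/project3/src/.svn repo/project4/.git &&
find_roots repo
# repo/gnu/project2
# repo/gnu/project3
# repo/project1
# repo/project4
```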

Other useful examples

Search for several file types.

$ find /tmp \( -type f -o -type d -o -type l \)
$ find /tmp -type f,d,l

$ rh /tmp 'file || dir || link'
$ rh /tmp 'f || d || l'
$ rh /tmp 'f | d | l'
$ rh /tmp f+d+l

Search for files with the particular name needle and stop immediately when we find the first one.

$ find / -name needle -print -quit

$ rh / '"needle" ? exit : 0'
$ rh / '"needle" && exit'
$ rh / '"needle" && quit'

Demonstrate the interpretation of the %f and %h format directives of the -printf action for some corner cases. Here is an example including some output.

$ find . .. / /tmp /tmp/TRACE compile compile/64/tests/find \
    -maxdepth 0 -printf '[%h][%f]\n'

$ rh -M0 -L '[%h][%f]\n' \
    . .. / /tmp /tmp/TRACE compile compile/64/tests/find

Sample output:

[.][.]
[.][..]
[][/]
[][tmp]
[/tmp][TRACE]
[.][compile]
[compile/64/tests][find]

CAVEAT

Don't expect too much from the search criteria language. It is a very little language.

A function can only be called if its definition has already been encountered by the parser. So recursive functions are possible, but mutually recursive functions are not.

Function parameters (temporarily) share the same namespace as the functions themselves. This means that function parameter names can't be the same as the names of any existing functions or built-in symbols.

Locale support is peculiar. The only supported locales are those that use UTF-8 or an ASCII-compatible single-byte character encoding like ISO-8859-*. But the -j (JSON) option and the -L %j conversion are only supported in UTF-8 locales. In locales with an ASCII-compatible single-byte character encoding, JSON output might be invalid UTF-8 (and hence invalid JSON) because it might contain strings like file names or user names or group names in their original encoding, which wouldn't be UTF-8. Such invalid JSON output can be fixed by piping it through iconv -t utf8 to convert it to UTF-8. But only if the user's locale's character set is the only character set used on the system. Otherwise, it could become a mess of miscoded characters.
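For example, in ISO-8859-1 the name café ends with the single byte 0xE9, which is not valid UTF-8 on its own. In UTF-8, é is the two-byte sequence 0xC3 0xA9. The sketch below shows that conversion for one string. It names the source encoding explicitly with -f. By contrast, iconv -t utf8 alone converts from the current locale's character set:

```shell
# "café" in ISO-8859-1: the é is the single byte \351 (0xE9).
latin1=$(printf 'caf\351')

# Convert to UTF-8, where é becomes the two bytes \303\251 (0xC3 0xA9).
utf8=$(printf '%s' "$latin1" | iconv -f ISO-8859-1 -t UTF-8)

[ "$utf8" = "$(printf 'caf\303\251')" ] && echo "converted to UTF-8"
```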

All non-ASCII characters in the search criteria are considered to be "letters" when parsing the names of functions, parameters, users, and groups. This means that all languages and scripts, and even emojis, can be used in names, but non-ASCII digits and numbers in other scripts cannot be used in numeric constants. Other multi-byte character encodings are not supported (e.g., UTF-16, GB 18030, Big5, Shift JIS, and EUC-*). These limitations let rh enjoy most of the benefits of Unicode without needing to expend any time or energy decoding and encoding characters.

If the user's $PATH environment variable includes the current working directory, or any other non-absolute paths, they are automatically removed. This matters for security when the -X option or the "cmd".sh "pattern" modifier is used, and doesn't matter otherwise, but it always happens anyway, for consistency. So it also affects the -x option, whether or not the "cmd".sh "pattern" modifier is used. This ensures that a change in the search criteria expression won't inadvertently change the behaviour of the -x option command.
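Below is a minimal sketch of that kind of filtering in POSIX shell. It illustrates the behaviour described above and is not rh's own code. The function name absolute_path is hypothetical. It keeps only the components that start with /, and it also drops empty components, since an empty $PATH component means the current directory:

```shell
# Hypothetical: print $1 (a PATH-style list) with every component that
# is not an absolute path removed, including empty ones ("" means ".").
absolute_path() {
    result=
    rest=$1:
    while [ -n "$rest" ]; do
        dir=${rest%%:*}      # first remaining component
        rest=${rest#*:}      # everything after the first ':'
        case $dir in
            /*) result=${result:+$result:}$dir ;;
        esac
    done
    printf '%s\n' "$result"
}

absolute_path '.:/usr/local/bin:bin::/usr/bin'
# /usr/local/bin:/usr/bin
```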

When following symlinks with the -Y option, it's possible to encounter filesystem cycles. When this happens, rh will output an "error" message to indicate that it is skipping an already encountered directory because of the filesystem cycle, but this won't result in a non-zero exit status, because it's not really an error. If you would prefer that filesystem cycle detection not be reported at all, set the environment variable RAWHIDE_DONT_REPORT_CYCLES=1.

When using the -l option to output multiple columns of extra information for matching entries, the ideal width of each column is not known at the start. Small default widths are used, and columns are widened as necessary. This results in less than perfectly tidy columns. This is the result of wanting to use as little memory as possible, and wanting to avoid columns that look too wide. If you prefer columns that start already wide enough, and you know how wide they need to be, you can override the default initial column widths by setting environment variables whose names start with "RAWHIDE_COLUMN_WIDTH_". See the ENVIRONMENT section below for details.

Invalid values in environment variables (see below) are silently ignored.

The spaces for the virtual code, data, and stack have fixed sizes. So they could conceivably run out. But it would take megabytes of search criteria source code, or over ten thousand patterns, or hundreds of thousands of nested function calls. These thresholds are reduced if rh is configured with a small or tiny static size at compile time, but it would still be unlikely to be a problem.

If you have a pathologically deep directory tree (i.e., thousands of directories deep), you might want to rethink that, or you might need to increase the limit on the number of open files, with something like: ulimit -n 2048. This is because an open file descriptor is required for each directory level.

Filesystems can be mounted with options such as noatime, nodiratime, and relatime, which suppress or limit the updating of accessed times (to improve read performance). The ro mount option also suppresses updates. See mount(8) for details. On Linux, the relatime mount option is the default. The altered semantics affects the atime and tatime built-in symbols, and the atime reference file field (see rawhide.conf(5)).

A file's inode changed time (or status changed time) is not updated when its accessed time is changed. It is only updated for other changes to the inode. This relates to the ctime and tctime built-in symbols, and the ctime reference file field (see rawhide.conf(5)).

EXIT STATUS

rh's exit status is zero upon success, or non-zero upon failure. Possible reasons for failure are: invalid command line options or arguments; search criteria syntax errors; permission/existence errors while searching; permission/existence errors while unlinking; -x or -X commands exiting with a non-zero exit status; failure to change the current working directory; attempt to use a stat(2) structure field of a reference file that does not exist or cannot be reached; failure to follow a symlink (but not by default); failure to obtain an access control list; failure to obtain extended attributes; traversing too deeply; a starting search path being too long for its filesystem; failure to allocate memory; -x, -X, or "cmd".sh commands being too large; attempt to divide by zero; too much code; too much data (i.e., patterns, reference file paths, and shell commands); stack overflow.

ENVIRONMENT

The location of the main system-wide configuration file (/etc/rawhide.conf, or similar) can be overridden with the environment variable RAWHIDE_CONFIG. This is only available to non-root users (as it could be dangerous for root). The directory containing any additional system-wide configuration files is derived from it by appending ".d".

The location of the main user-specific configuration file (~/.rhrc) can be overridden with the environment variable RAWHIDE_RC. This is only available to non-root users (as it could be dangerous for root). The directory containing any additional user-specific configuration files is derived from it by appending ".d".

The following environment variables can be set to override the default initial column widths for the -l option:

RAWHIDE_COLUMN_WIDTH_DEV_MAJOR   (device column (major), default 1)
RAWHIDE_COLUMN_WIDTH_DEV_MINOR   (device column (minor), default 1)
RAWHIDE_COLUMN_WIDTH_INODE       (inode number column, default 6)
RAWHIDE_COLUMN_WIDTH_BLKSIZE     (block size column, default 1)
RAWHIDE_COLUMN_WIDTH_BLOCKS      (blocks column, default 2)
RAWHIDE_COLUMN_WIDTH_SPACE       (space column, default 6)
RAWHIDE_COLUMN_WIDTH_SPACE_UNITS (space column (-H/-I), default 4)
RAWHIDE_COLUMN_WIDTH_NLINK       (nlink column, default 1)
RAWHIDE_COLUMN_WIDTH_USER        (user/owner column, default 3)
RAWHIDE_COLUMN_WIDTH_GROUP       (group column, default 3)
RAWHIDE_COLUMN_WIDTH_SIZE        (size column, default 6)
RAWHIDE_COLUMN_WIDTH_SIZE_UNITS  (size column (-H/-I), default 4)
RAWHIDE_COLUMN_WIDTH_RDEV_MAJOR  (rdev column (major), default 2)
RAWHIDE_COLUMN_WIDTH_RDEV_MINOR  (rdev column (minor), default 3)

Their values must be integers between 1 and 99, inclusive.
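Since invalid values are silently ignored, a typo just leaves the default width in place. The small, hypothetical shell check below (not part of rawhide) tests values against the rule above before they are exported, for example from a shell startup file. It also conservatively rejects values with leading zeros:

```shell
# Hypothetical check for the documented rule: an integer from 1 to 99.
valid_column_width() {
    case $1 in
        ''|0*|*[!0-9]*) return 1 ;;   # empty, leading zero, or not all digits
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 99 ]
}

# Example: widen the inode and size columns, but only with valid values.
for setting in RAWHIDE_COLUMN_WIDTH_INODE=10 RAWHIDE_COLUMN_WIDTH_SIZE=12; do
    if valid_column_width "${setting#*=}"; then
        export "$setting"
    fi
done
```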

Setting the environment variable RAWHIDE_REPORT_BROKEN_SYMLINKS=1 causes an error message (and an eventual non-zero exit status) when attempting to follow a symlink whose ultimate target does not exist or cannot be reached. By default, when following symlinks with the -y or -Y option, a broken symlink is not interpreted as an error. The broken symlink is just processed as though the -y or -Y option had not been supplied. This is done for compatibility with the familiar behaviour of find(1).

Setting the environment variable RAWHIDE_DONT_REPORT_CYCLES=1 suppresses the "error" message whenever a filesystem cycle is detected and skipped. This can happen when following symlinks with the -Y option. It's not really an error, and is just reported by default for compatibility with the familiar behaviour of find(1).

Setting the environment variable RAWHIDE_PCRE2_NOT_UTF8_DEFAULT=1 suppresses the assumption that regular expression patterns, and the file names, paths, symlink target paths, access control lists, and extended attributes that they match against, are encoded as UTF-8. When this environment variable is set, individual regular expression patterns can still enable UTF-8 interpretation with a leading (*UTF). This UTF-8 assumption is made when the locale uses UTF-8 (i.e., when the $LANG environment variable includes "UTF-8"). When the locale doesn't use UTF-8, and you want rh to assume that everything is UTF-8 anyway, set the environment variable RAWHIDE_PCRE2_UTF8_DEFAULT=1.

Setting the environment variable RAWHIDE_PCRE2_DOTALL_ALWAYS=1 causes regular expression patterns to always treat the subject text as a single line by default. This means that the . meta-character matches any character, including the newline character. This is like (?s) or Perl's /s modifier. This is enabled by default when matching file names, paths, symlink target paths, and MIME types (e.g., re, repath, relink, and remime) because it can be helpful when faced with malicious paths. These are all considered to be a single "line" (even if they actually contain newlines). It is not enabled by default when matching everything else (e.g., reacl, reea, rewhat, and rebody). These are all considered to be multiple lines. When this environment variable is set, its effect can be disabled for individual matches by starting the regular expression with (?-s).

Setting the environment variable RAWHIDE_PCRE2_MULTILINE_ALWAYS=1 causes regular expression patterns to always treat the subject text as multiple lines by default. This means that the ^ meta-character matches after every internal newline character, not just at the start. The $ meta-character matches before every internal newline character, not just at the end. This is like (?m) or Perl's /m modifier. This is not enabled by default when matching file names, paths, symlink target paths, and MIME types (e.g., re, repath, relink, and remime) because it can be unhelpful when faced with malicious paths. These are all considered to be a single "line" (even if they actually contain newlines). It is only enabled by default when matching everything else (e.g., reacl, reea, rewhat, and rebody). These are all considered to be multiple lines. When this environment variable is set, its effect can be disabled for individual matches by starting the regular expression with (?-m).

On Solaris, setting the environment variable RAWHIDE_SOLARIS_ACL_NO_TRIVIAL=1 suppresses trivial access control lists (ACLs). By default on Solaris, ACLs are always present, even if they are trivially identical to the file permission bits. This can be convenient, but if it seems like noise, it can be silenced (but only on Solaris). This affects access control list searching (acl) (see rawhide.conf(5)), and the -L %z format conversion (see above).

On Solaris, setting the environment variable RAWHIDE_SOLARIS_EA_NO_SUNWATTR=1 suppresses the inclusion of the ubiquitous SUNWattr_ro and SUNWattr_rw extended attributes. This affects extended attribute searching (ea) (see rawhide.conf(5)), and the -L %x format conversion (see above).

On Solaris, setting the environment variable RAWHIDE_SOLARIS_EA_NO_STATINFO=1 suppresses the artificial extended attributes that are included by default to represent the stat(2) information of the real extended attributes. On Solaris, real extended attributes take the form of regular files in a special extended attribute directory that is "hidden" inside each file. This affects extended attribute searching (ea) (see rawhide.conf(5)), and the -L %x format conversion (see above).

The environment variable RAWHIDE_EA_SIZE can be set to a positive integer value to override the default buffer size used for (encoded) extended attributes. Note that the value must be the size in bytes. Scale units are not supported. On most systems, the default buffer size is 4KiB. On Solaris, the default buffer size is 64KiB. This can be used to increase the buffer size if needed to prevent extended attributes from being silently truncated. This affects extended attribute searching (ea) (see rawhide.conf(5)), and the -L %x format conversion (see above).
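The rules above (a positive integer, in bytes, with no scale units) can be sketched in C as follows. This is a minimal sketch, not rh's implementation: the function ea_buffer_size and the macro DEFAULT_EA_SIZE are hypothetical, and falling back to the default for an invalid value is an assumption (rh may report an error instead).

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef __sun
#define DEFAULT_EA_SIZE ((size_t)64 * 1024) /* 64KiB on Solaris */
#else
#define DEFAULT_EA_SIZE ((size_t)4 * 1024)  /* 4KiB on most systems */
#endif

/* Hypothetical sketch: interpret value (e.g. getenv("RAWHIDE_EA_SIZE"))
   as a plain positive byte count. Anything else, including scale units
   such as "8K", falls back to the default in this sketch. */
static size_t ea_buffer_size(const char *value)
{
    char *end;
    unsigned long long n;

    /* Require a leading digit: rejects empty strings, signs, and spaces */
    if (!value || !isdigit((unsigned char)value[0]))
        return DEFAULT_EA_SIZE;

    errno = 0;
    n = strtoull(value, &end, 10);

    /* The whole string must be digits, non-zero, and fit in a size_t */
    if (errno != 0 || *end != '\0' || n == 0 || n > SIZE_MAX)
        return DEFAULT_EA_SIZE;

    return (size_t)n;
}
```

For example, RAWHIDE_EA_SIZE=65536 would request a 64KiB buffer, whereas RAWHIDE_EA_SIZE=64K would not be accepted as a size.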

FILES

The following source/configuration files are read by default:

/etc/rawhide.conf     - main system-wide configuration
/etc/rawhide.conf.d/* - additional system-wide configuration
~/.rhrc               - main user-specific configuration
~/.rhrc.d/*           - additional user-specific configuration

The location of the system-wide configuration might be somewhere else, depending on the operating system's conventions (e.g., /usr/local/etc, /opt/local/etc, /usr/pkg/etc).

The output of rh --help states where the system-wide configuration files are on the local system.

The location of the main system-wide configuration file (/etc/rawhide.conf, or similar) can be overridden with the environment variable RAWHIDE_CONFIG. This is only available to non-root users (as it could be dangerous for root). The directory containing any additional system-wide configuration files is derived from it by appending ".d".

The location of the main user-specific configuration file (~/.rhrc) can be overridden with the environment variable RAWHIDE_RC. This is only available to non-root users (as it could be dangerous for root). The directory containing any additional user-specific configuration files is derived from it by appending ".d".
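The override rules for RAWHIDE_CONFIG and RAWHIDE_RC can be summarized as a small C sketch: use the environment variable's value unless the user is root, then derive the additional configuration directory by appending ".d". The function resolve_config is hypothetical, the effective user ID is passed in as a parameter for clarity, and treating an empty value as unset is an assumption of this sketch.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical sketch: choose the main configuration file and derive its
   ".d" directory. env_value is the value of RAWHIDE_CONFIG or RAWHIDE_RC
   (NULL if unset), default_path is the built-in location (already
   expanded, e.g. "/home/user/.rhrc"), and euid is the caller's user ID.
   Returns 0 on success, or -1 if a result does not fit its buffer. */
static int resolve_config(const char *env_value, const char *default_path,
                          uid_t euid, char *file, size_t filesz,
                          char *dir, size_t dirsz)
{
    /* The override is ignored for root, as it could be dangerous */
    const char *path = (env_value && *env_value && euid != 0)
                       ? env_value : default_path;
    int n;

    n = snprintf(file, filesz, "%s", path);
    if (n < 0 || (size_t)n >= filesz)
        return -1;

    /* Additional configuration files live in "<path>.d" */
    n = snprintf(dir, dirsz, "%s.d", path);
    if (n < 0 || (size_t)n >= dirsz)
        return -1;

    return 0;
}
```

So a non-root user with RAWHIDE_CONFIG=/tmp/my.conf would get /tmp/my.conf and /tmp/my.conf.d, while root would still get /etc/rawhide.conf and /etc/rawhide.conf.d (or the local equivalents).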

HISTORY

On the 18th of February 1990, Ken Stauffer of the University of Calgary published the source code to Rawhide VERSION 2 on the Usenet comp.sources.unix newsgroup. It was posted as a multi-part shar(1) archive (as was the custom at the time). I think a previous version must date back to 1982 or earlier.

It was a lovely alternative to find(1) that let you define your own functions for file search criteria in a mini-language inspired by C ("because most Unix users already know C"). I remember liking it at the time, and I've often wished that it hadn't subsequently vanished into the ether.

One day, while looking through a dusty old archive tarball, I came across the source code in my old ~/News/Comp.sources.unix file. After 32 years, it only took an hour or so to get it compiling and working again. Yay! But of course, that wasn't enough. It had a bug or two, and so many security flaws (it actually was 1990, after all), and so many missing features that were needed to make it a viable modern alternative to GNU find(1) for all my file-finding needs. So I spent the next month or so fixing it all up, enhancing it in many fun ways, testing it ruthlessly, and documenting it thoroughly. It now has a lean flexible command line interface, many new capabilities (and even some novel ones), and a standard library of functions to make it really pretty, and easy to use, and easy to remember.

TRIVIA

In the 7th Edition UNIX programmer's manual (1979), the find(1) manual entry has a BUGS section which just says:

The syntax is painful.

SEE ALSO

rawhide.conf(5), find(1), xargs(1), ls(1), stat(1), jq(1), stat(2), fnmatch(3), glob(7), perlre(1), pcre2pattern(3), pcre2syntax(3), system(3), sh(1), printf(3), strftime(3), ctime(3), acl(2) or acl(3) or acl(5), xattr(1) or xattr(7) or extattr(2) or fsattr(7), chmod(1), setfacl(1), getfacl(1), setfattr(1), getfattr(1), selinux(8), runat(1), mount(8), chattr(1), lsattr(1), chflags(1), file(1), The C programming language.

AUTHORS

1990 Ken Stauffer (University of Calgary)

2022-2023 raf