0.2.4

@aphyr aphyr released this 03 Jun 20:57
· 436 commits to main since this release

This release is all about automation. It introduces a new SSH backend based on SSHJ which is significantly faster than the current clj-ssh. This release also shells out to scp for uploads and downloads, which is much, much faster than using clj-ssh or SSHJ. SSH errors are less frequent, and don't clog the logs with stacktraces.

For databases with expensive setup processes (especially those which need to be compiled from source), this release introduces jepsen.fs-cache: a lightweight, concurrency-controlled, filesystem-backed cache for strings, Clojure data, and entire files. This cache is persistent across Jepsen invocations, so you can build a binary or perform initial datafile allocation once, cache it, and skip that process on subsequent test runs.
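
To make the "build once, reuse on later runs" idea concrete, here is a minimal Python sketch of a filesystem-backed cache with a lock. It is not jepsen.fs-cache's API or implementation: the class `FsCache`, its methods, and the JSON encoding are all hypothetical names and choices made for illustration.

```python
import json
import threading
from pathlib import Path


class FsCache:
    """Hypothetical sketch: a cache keyed by logical paths like ["foodb", "1.2.3"]."""

    def __init__(self, root):
        self.root = Path(root)
        # One coarse lock for simplicity; a real cache might lock per key.
        self._lock = threading.Lock()

    def _file(self, key):
        # Map each element of the logical path to a directory or file name.
        return self.root.joinpath(*[str(part) for part in key])

    def cached(self, key):
        return self._file(key).is_file()

    def save(self, key, value):
        target = self._file(key)
        target.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file, then rename, so readers never see a partial file.
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_text(json.dumps(value))
        tmp.replace(target)
        return value

    def load(self, key):
        return json.loads(self._file(key).read_text())

    def get_or_build(self, key, build):
        # Hold the lock so only one thread runs the expensive build.
        with self._lock:
            if not self.cached(key):
                self.save(key, build())
            return self.load(key)
```

Because the data lives on disk, a second `FsCache` pointed at the same directory (for example, in a later process) finds the cached value and skips the build step.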

There's also a new checker which looks for patterns in downloaded log files. This is particularly helpful for catching stacktraces, panics, segfaults, etc.
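
As a rough illustration of what scanning logs for patterns involves, here is a small Python sketch. It is not the Jepsen checker: the function names `find_log_matches` and `check_logs`, and the shape of the result map, are assumptions made for this example.

```python
import re


def find_log_matches(pattern, log_paths):
    """Return one record per log line that matches the regular expression."""
    regex = re.compile(pattern)
    matches = []
    for path in log_paths:
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, errors="replace") as f:
            for line_number, line in enumerate(f, start=1):
                if regex.search(line):
                    matches.append({
                        "file": str(path),
                        "line_number": line_number,
                        "line": line.rstrip("\n"),
                    })
    return matches


def check_logs(pattern, log_paths):
    """Treat any match as a failure, the way a crash-detecting checker might."""
    matches = find_log_matches(pattern, log_paths)
    return {"valid": not matches, "count": len(matches), "matches": matches}
```

A pattern such as `r"panic|Segmentation fault"` would flag a node that crashed during the test, even if every client operation appeared to succeed.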

API Changes

  • In test SSH options, :password is no longer used for sudo by default. To set a sudo password, set :sudo-password. This fixes a (likely rare) issue where sudo would skip a password prompt, sending that password to the stdin of whatever command was being invoked instead.
  • control/upload and download no longer take rest args, which used to be passed directly to clj-ssh. These were unused in Jepsen itself, but you may have relied on this behavior. If so, you should call into clj-ssh directly.
  • control.remote has been moved to control.core, and has been restructured to take option maps instead of relying on dynamically bound variables. This should only affect you if you wrote a custom Remote implementation.

New Features

  • control.sshj: a new Remote backend for the control system. This is orders of magnitude faster than clj-ssh. Unfortunately, like clj-ssh, it also exhibits weird race conditions.
  • control.scp allows Jepsen to upload and download files by shelling out to SCP, which is dramatically faster for large files. This is the default for both sshj and clj-ssh remotes.
  • fs-cache: a lightweight, local-filesystem-backed cache for Jepsen's control node. Well-suited for DBs that require an expensive build or setup process. Can cache strings, EDN structures, and remote files alike, and includes a basic locking mechanism.
  • A new checker, log-file-pattern, scans downloaded log files for given regular expressions. Handy for finding server crashes!
  • cli/test-all-cmd now merges opt specs like test-cmd does, allowing you to override default options.
  • util/sh: a wrapper for invoking local shell commands on the control node.

Bugfixes

  • control.util/tmp-file! now creates /tmp/jepsen if it doesn't already exist
  • control.clj-ssh (and the new sshj backend) now include a concurrency-limiting semaphore, which prevents at least some (but not all) of the weird, nondeterministic bugs we've seen with session initiation.
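
The concurrency-limiting idea can be sketched in a few lines of Python. This is a generic illustration, not the code in control.clj-ssh or control.sshj: `LimitedOpener`, `open_session`, and `max_concurrent` are hypothetical names.

```python
import threading


class LimitedOpener:
    """Hypothetical sketch: cap how many sessions may be opening at once."""

    def __init__(self, open_session, max_concurrent):
        self._open_session = open_session
        # BoundedSemaphore raises an error if released more often than acquired.
        self._semaphore = threading.BoundedSemaphore(max_concurrent)

    def open(self, host):
        # Callers beyond the limit block here until a slot frees up.
        with self._semaphore:
            return self._open_session(host)
```

Limiting concurrency like this narrows the window in which racy session setup can collide, which is consistent with it fixing some but not all of the bugs.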

Minor Changes

  • checker.timeline is dramatically faster now: it uses a custom pretty-printer for events.
  • Large parts of control have been refactored into control.core, control.retry, etc. to improve readability and composability
  • Docker and AWS environments now also set up ed25519 keys by default
  • Lots of new tests for jepsen.control
  • When test-all tests crash, we now display their full paths, not just test names
  • Removed tea-time, a now-unused dependency
  • Removed :active-histories: a now-unused part of test maps
  • java.util.concurrent.TimeoutException is now considered an "uninteresting" exception when choosing which exception to throw from a concurrent failure; this should result in more helpful stacktraces.
  • Control no longer logs a full stacktrace when it encounters a recoverable exception. Users consistently complained about these kinds of errors: they happen constantly but unpredictably, I can't eliminate them, and they don't really require user action. We log a one-line message instead.
  • os/debian no longer tries to install the old libzip2 package for Debian Jessie
  • nemesis.time uses fewer samples for ntpdate and is generally faster to set up
  • control.util/await-tcp-port can now take separate intervals for retry and logging: shorter latency, less log spam!
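
To show why separate retry and logging intervals help, here is a Python sketch of a port-waiting loop. It is not the signature of control.util/await-tcp-port: the parameter names `retry_interval`, `log_interval`, `timeout`, and `log` are assumptions made for this example.

```python
import socket
import time


def await_tcp_port(host, port, retry_interval=0.1, log_interval=5.0,
                   timeout=60.0, log=print):
    """Poll a TCP port often, but log only occasionally.

    Returns the number of seconds waited once a connection succeeds.
    """
    start = time.monotonic()
    last_log = start
    while True:
        try:
            with socket.create_connection((host, port), timeout=retry_interval):
                return time.monotonic() - start
        except OSError:
            now = time.monotonic()
            if now - start >= timeout:
                raise TimeoutError(f"{host}:{port} not reachable after {timeout}s")
            # Logging runs on its own, slower clock than retries.
            if now - last_log >= log_interval:
                log(f"Waiting for {host}:{port}...")
                last_log = now
            time.sleep(retry_interval)
```

With a short retry interval the caller notices the port soon after it opens, while the longer log interval keeps a slow startup from producing a line per attempt.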