Skip to content

Latest commit

 

History

History
127 lines (105 loc) · 5.65 KB

README.md

File metadata and controls

127 lines (105 loc) · 5.65 KB

phi-accrual-detector

What Is It?

This is a port of Akka's Accrual Failure Detector to Node.js. It is an implementation of "The Phi Accrual Failure Detector" by Hayashibara et al. as defined in their paper.

Why Use It?

The phi accrual detector provides a configurable, continuous "suspicion of failure" value for remote systems whose availability is indicated by periodic sampling. The phi value can help answer questions like:

  • Is some HTTP server up?
  • Did that out-of-process job handler crash?

The standard example is an event source that suddenly stops sending events.

The suspicion level adjusts to the recorded event intervals, which makes it more resilient to event sources that sawtooth into stability.

More examples:

How to Use It

  1. Install: npm install phi-accrual-detector

  2. Determine the configuration settings. The documentation below is largely copied from the Akka source. The specific settings depend on your application.

    1. threshold : The suspicion level above which the event source is considered to have failed.
    2. max_sample_size : The maximum number of samples to store for mean and standard deviation calculations of event reports.
    3. min_std_deviation : Minimum standard deviation for the normal distribution used when calculating phi. Too low a standard deviation might result in too much sensitivity for sudden, but normal, deviations in event intervals.
    4. acceptable_heartbeat_pause : Duration (ms) corresponding to the number of potentially lost/delayed events that will be accepted before it is considered anomalous. This margin is important for surviving sudden, occasional, gaps between event reports.
    5. first_heartbeat_estimate : Duration (ms) values with which to bootstrap the event history. They are recorded with rather high standard deviation since the environment is unknown at initialization.
  3. Reference it:

    var phi_detector = require('phi-accrual-detector');
    var mock_service_detector = phi_detector.new_detector(threshold,
                                                        max_sample_size,
                                                        min_std_deviation,
                                                        acceptable_heartbeat_pause,
                                                        first_heartbeat_estimate,
                                                        optional_name);
    /**
     * The 'available' event is broadcast when the phi value
     * crosses from above to below the threshold value
     */
    mock_service_detector.on('available', function (phi) {
      console.log("Sweet - the service is available!");
    })
    /**
     * The 'unavailable' event is broadcast when the phi value
     * crosses from below to above the threshold value
     */
    mock_service_detector.on('unavailable', function (phi) {
      console.log("Rats - the service has forsaken me");
    })
    
  4. Record events:

var mock_service = setInterval(function() {
  mock_service_detector.signal();
}, 100);

See the ./test directory for more samples and associated graphs to get an idea of phi behavior.

Probes

Probes are layered on top of the phi accrual detector and provide domain-specific wrappers for service classes. There are two types available:

  • HTTP Probe:
    var probe = http_probe.new_http_service_probe("http://www.google.com" /* url string or options object */,
                                                  80 /* polling frequency_ms */,
                                                  3 /* phi threshold */,
                                                  10 /* max sample history */,
                                                  20 /* min std deviation_ms */);
    probe.on('available', function (phi) {
      console.log('Ho-hum');
    });
    probe.on('unavailable', function (phi) {
      console.log('First time for everything');
    });

  • Synchronous Probe:
    • Similar to the HTTP Probe, but accepts a function():Boolean synchronous method that is periodically called to determine if a service is responding

You can also build your own probe using the sampling_detector.new_sampling_detector