hash_router

The hash_router processor routes records by hashing either the values of the configured fields or the _key metadata value. It spreads records roughly evenly across the configured number of partitions by adding the standard:route metadata to each record, which the routed_sender processor then uses to route records dynamically. The hashing algorithm is FNV-1a. This processor requires either the fields option to be specified or the _key metadata value to be set on each record.
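To make the hash-then-partition idea concrete, here is a minimal sketch and not the processor's actual source. It assumes the 32-bit FNV-1a variant, that the values to hash are concatenated into one string, and that the route is the hash modulo the partition count. The names fnv1a32 and routeFor are hypothetical and exist only for this illustration.

```javascript
// Illustrative sketch only; fnv1a32 and routeFor are hypothetical names.
// Assumption: 32-bit FNV-1a over the UTF-8 bytes of the input string.
const FNV_OFFSET_BASIS_32 = 0x811c9dc5;
const FNV_PRIME_32 = 0x01000193;

function fnv1a32(input) {
    let hash = FNV_OFFSET_BASIS_32;
    for (const byte of new TextEncoder().encode(input)) {
        // FNV-1a: XOR the byte in first, then multiply by the prime.
        hash ^= byte;
        // Math.imul keeps the multiplication within 32 bits;
        // >>> 0 reinterprets the result as an unsigned integer.
        hash = Math.imul(hash, FNV_PRIME_32) >>> 0;
    }
    return hash >>> 0;
}

// Assumption: values are joined into a single string before hashing,
// and the route is the hash modulo the number of partitions.
function routeFor(values, partitions) {
    const hash = fnv1a32(values.join(''));
    return String(hash % partitions);
}

// The route is always a string in the range '0' .. String(partitions - 1).
const route = routeFor(['d0fd71ae-18db-41c6-b14f-e9fa40dc2566'], 15);
console.log(route);
```

Because the same input always produces the same hash, a record with a given key lands in the same partition on every run, as long as the partitions value does not change.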

Usage

Determine bucket route of hashed values by fields

Example of a job using the hash_router with fields configured and writing the data to Elasticsearch. This configuration specifies 15 partitions, so each record is routed to one of the 15 partitions depending on the resulting hash value.

{
    "name" : "testing",
    "workers" : 1,
    "slicers" : 1,
    "lifecycle" : "once",
    "assets" : [
        "standard",
        "elasticsearch"
    ],
    "apis": [
        {
            "_name": "elasticsearch_sender_api",
            "index": "other_index",
            "size": 1000
        }
    ],
    "operations" : [
        {
            "_op": "data_generator",
            "size": 10000
        },
        {
            "_op": "hash_router",
            "fields": ["created", "uuid"],
            "partitions": 15
        },
        {
            "_op": "routed_sender",
            "api_name": "elasticsearch_sender_api",
            "routing": {
                "**": "default"
            }
        }
    ]
}

Output of the example job

const data = [
    DataEntity.make(
        {
            date: '2020-01-17T19:21:52.159Z',
            field1: 'val1.1',
            field2: 'val1.2'
        },
        {
            _key: someId1
        }
    ),
    DataEntity.make(
        {
            date: '2020-01-17T19:21:52.159Z',
            field1: 'val2.1',
            field2: 'val2.2'
        },
        {
            _key: someId2
        }
    ),
];

const results = await processor.run(data);

results[0].getMetadata('standard:route') === '4';
results[1].getMetadata('standard:route') === '8';

Determine bucket route of hashed values by _key

Example of a job using the hash_router with the _key metadata value determining the route. In this example, the set_key operation sets the uuid field as the _key, which the hash_router then uses to assign each record to one of the 15 partitions.

{
    "name" : "testing",
    "workers" : 1,
    "slicers" : 1,
    "lifecycle" : "once",
    "assets" : [
        "standard",
        "elasticsearch"
    ],
    "apis": [
        {
            "_name": "elasticsearch_sender_api",
            "index": "other_index",
            "size": 1000
        }
    ],
    "operations" : [
        {
            "_op": "data_generator",
            "size": 10000
        },
        {
            "_op": "set_key",
            "field": "uuid"
        },
        {
            "_op": "hash_router",
            "partitions": 15
        },
        {
            "_op": "routed_sender",
            "api_name": "elasticsearch_sender_api",
            "routing": {
                "**": "default"
            }
        }
    ]
}

Output of the example job

const data = [
    DataEntity.make(
        {
            date: '2020-01-17T19:21:52.159Z',
            field1: 'val1.1',
            field2: 'val1.2',
            uuid: 'd0fd71ae-18db-41c6-b14f-e9fa40dc2566'
        },
        {
            _key: 'd0fd71ae-18db-41c6-b14f-e9fa40dc2566'
        }
    ),
    DataEntity.make(
        {
            date: '2020-01-17T19:21:52.159Z',
            field1: 'val2.1',
            field2: 'val2.2',
            uuid: '530ff04c-c673-4f75-b001-a341a16f64a3'
        },
        {
            _key: '530ff04c-c673-4f75-b001-a341a16f64a3'
        }
    ),
];

const results = await processor.run(data);

results[0].getMetadata('standard:route') === '1';
results[1].getMetadata('standard:route') === '14';

Parameters

| Configuration | Description | Type | Notes |
| --- | --- | --- | --- |
| _op | Name of the operation; it must reflect the exact name of the file | String | required |
| fields | Specifies fields to hash for partitioning | String[] | optional, defaults to using the _key metadata field |
| partitions | Number of partitions to use with hashing | Number | required |
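The fallback from fields to _key described in the parameters can be sketched as follows. This is a hedged illustration using plain objects rather than DataEntity, and selectHashInput is a hypothetical helper, not part of the asset's API. How the real processor handles missing field values is not covered here.

```javascript
// Illustrative sketch only; selectHashInput is a hypothetical helper.
// record:   plain object holding the record's fields
// metadata: plain object holding the record's metadata (for example _key)
// fields:   optional array of field names from the op configuration
function selectHashInput(record, metadata, fields) {
    if (Array.isArray(fields) && fields.length > 0) {
        // Assumption: configured field values are concatenated in order.
        return fields.map((field) => String(record[field])).join('');
    }
    if (metadata._key === undefined || metadata._key === null) {
        // Neither fields nor _key is available, so there is nothing to hash.
        throw new Error('hash input requires either fields or a _key metadata value');
    }
    return String(metadata._key);
}

// With fields configured, the field values are used.
console.log(selectHashInput(
    { created: '2020-01-17T19:21:52.159Z', uuid: 'd0fd71ae-18db-41c6-b14f-e9fa40dc2566' },
    {},
    ['created', 'uuid']
));

// Without fields, the _key metadata value is used.
console.log(selectHashInput(
    { uuid: '530ff04c-c673-4f75-b001-a341a16f64a3' },
    { _key: '530ff04c-c673-4f75-b001-a341a16f64a3' },
    undefined
));
```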