Fingerprint plugin shall document how fields are serialized #72

Open
ivosh opened this issue Jan 3, 2023 · 0 comments

ivosh commented Jan 3, 2023

The first sentence of the plugin's documentation says: "Create consistent hashes (fingerprints) of one or more fields and store the result in a new field."
However, the documentation does not say anywhere how the plugin serializes the fields (for example, whether it sorts keys and values) when one of the following is specified:

  • concatenate_sources => true
  • concatenate_all_fields => true

Studying the plugin's source code, I can see that property keys within an object are sorted somehow (perhaps in natural order?) and that items within an array are probably not sorted.
However, if the plugin claims to create consistent hashes, it should also document the process so that the hash can be re-created later.
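
To illustrate why the ordering matters, here is a minimal Python sketch. It does not reproduce the plugin's actual format: the `|name|value|` layout and the helper names are assumptions made up for this illustration. It only shows that the same fields yield different SHA-256 digests unless the key order is fixed.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def concat_in_given_order(fields: dict) -> str:
    # Hypothetical layout: |name|value|name|value|...|
    # Keys are emitted in whatever order the dict happens to hold them.
    return "".join(f"|{k}|{v}" for k, v in fields.items()) + "|"


def concat_sorted_by_key(fields: dict) -> str:
    # Same hypothetical layout, but keys are sorted first.
    return "".join(f"|{k}|{fields[k]}" for k in sorted(fields)) + "|"


# The same data_stream object, with its keys in two different orders.
a = {"type": "logs", "dataset": "system.syslog", "namespace": "sccoe"}
b = {"namespace": "sccoe", "type": "logs", "dataset": "system.syslog"}

# Without a fixed order the input strings differ, so the digests differ.
print(sha256_hex(concat_in_given_order(a)) == sha256_hex(concat_in_given_order(b)))

# With sorted keys both objects serialize to the same string and digest.
print(sha256_hex(concat_sorted_by_key(a)) == sha256_hex(concat_sorted_by_key(b)))
```

Anyone re-creating the hash outside Logstash has to make the same choices the plugin makes, which is why they need to be written down.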

Consider the following document, which is being ingested by a Logstash pipeline:


{
    "@timestamp": "2022-12-30T11:55:31.837Z",
    "message": "Dec 30 11:55:31 ca systemd[1]: run-docker-runtime\\x2drunc-moby-5fff27c042417aecda8491ffdd90a45530d2e2fc5e07ff0dbddbeae21d4952c2-runc.onhYLM.mount: Deactivated successfully.",
    "data_stream": {
      "namespace": "sccoe",
      "type": "logs",
      "dataset": "system.syslog"
    },
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "host": {
      "architecture": "x86_64",
      "containerized": false,
      "name": "ca",
      "os": {
        "platform": "ubuntu",
        "family": "debian",
        "name": "Ubuntu",
        "type": "linux",
        "version": "22.04 LTS (Jammy Jellyfish)",
        "kernel": "5.15.0-40-generic",
        "codename": "jammy"
      },
      "id": "79666a8bc3214311b4341e120c507a8c",
      "mac": [
        "00-50-56-BE-A4-AB",
        "02-42-96-95-16-BD",
        "02-42-F9-27-AB-AE",
        "A6-77-E0-4D-92-09",
        "AE-2C-0A-24-85-51",
        "AE-E0-58-8B-B0-97"
      ],
      "hostname": "ca",
      "ip": [
        "10.200.0.1",
        "172.16.108.167",
        "172.17.0.1",
        "fe80::250:56ff:febe:a4ab",
        "fe80::42:96ff:fe95:16bd",
        "fe80::42:f9ff:fe27:abae",
        "fe80::a477:e0ff:fe4d:9209",
        "fe80::ac2c:aff:fe24:8551",
        "fe80::ace0:58ff:fe8b:b097"
      ]
    },
    "ecs": {
      "version": "8.0.0"
    },
    "log": {
      "file": {
        "path": "/var/log/syslog"
      },
      "offset": 20519203
    },
    "elastic_agent": {
      "snapshot": false,
      "id": "3dfc9201-7d5c-4961-939a-df6b0ac8e19d",
      "version": "8.5.2"
    },
    "@version": "1",
    "event": {
      "timezone": "+00:00",
      "original": "Dec 30 11:55:31 ca systemd[1]: run-docker-runtime\\x2drunc-moby-5fff27c042417aecda8491ffdd90a45530d2e2fc5e07ff0dbddbeae21d4952c2-runc.onhYLM.mount: Deactivated successfully.",
      "dataset": "system.syslog",
      "hash": "ac4b09e75151f9a6640ad9580a08d3b0a3d78a49e8f52e8eaea4f7b74a7c5d89"
    },
    "input": {
      "type": "log"
    },
    "sccoe": {
      "component": {
        "id": "SYS",
        "name": "System Support"
      }
    },
    "agent": {
      "type": "filebeat",
      "id": "3dfc9201-7d5c-4961-939a-df6b0ac8e19d",
      "name": "ca",
      "version": "8.5.2",
      "ephemeral_id": "e20a235a-96ff-40dc-bfd5-0193d7266ba7"
    }
}

Let's also consider the following fingerprint plugin configuration:

    fingerprint {
      concatenate_all_fields => true
      ecs_compatibility => "v8"
      method => "SHA256"
    }

So the question now is: how does the fingerprint plugin build the string that is then fed into the SHA-256 digest?
The documentation says only:

When set to true and method isn’t UUID or PUNCTUATION, the plugin concatenates the names and values of all fields of the event into one string (like the old checksum filter) before doing the fingerprint computation. If false and at least one source field is given, the target field will be an array with fingerprints of the source fields given.
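
The two output shapes described in that paragraph can be modelled roughly as follows. This is a hypothetical Python model, not the plugin's code: the function name `fingerprint_sources`, the `|` delimiter, and the use of `str()` to stringify values are all assumptions, and they are exactly the part the documentation leaves open.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fingerprint_sources(event: dict, sources: list, concatenate: bool):
    """Hypothetical model of the two output shapes described in the docs."""
    if concatenate:
        # One string built from field names and values, hashed once.
        # ASSUMPTION: the "|name|value|" layout is invented for this sketch.
        joined = "".join(f"|{name}|{event[name]}" for name in sources) + "|"
        return sha256_hex(joined)
    # One fingerprint per source field, returned as an array.
    return [sha256_hex(str(event[name])) for name in sources]


event = {"@version": "1", "@timestamp": "2022-12-30T11:55:31.837Z"}
sources = ["@version", "@timestamp"]

single = fingerprint_sources(event, sources, concatenate=True)
many = fingerprint_sources(event, sources, concatenate=False)

print(type(single).__name__, len(single))  # one hex string of 64 characters
print(type(many).__name__, len(many))      # a list with one digest per source
```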

However, the documentation says nothing about the serialization details, which must be known in order to reconstruct the hash. In particular:

  • How are nested objects serialized? (Consider log or host in the example: what are the delimiters? Is a breadth-first or depth-first algorithm used?)
  • How are the properties within one object serialized? (Are the keys sorted, and if so, how? What are the delimiters?)
  • How are the items of an array serialized? (Are the items sorted, and if so, how? What are the delimiters?)
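
For comparison, this is what one fully specified scheme could look like, written as a short Python sketch. Every rule in it (key sorting by code point, depth-first traversal, the `|` and `,` delimiters, lowercase booleans) is an assumption chosen for the example and not the plugin's behaviour. The documentation would need to state the equivalent rules for the real plugin.

```python
import hashlib


def serialize(value) -> str:
    """Depth-first serialization under assumed rules (NOT the plugin's format)."""
    if isinstance(value, dict):
        # ASSUMPTION: keys are sorted by Unicode code point, and "|"
        # separates names from values. Nested objects are expanded in place.
        return "".join(f"|{key}|{serialize(value[key])}" for key in sorted(value)) + "|"
    if isinstance(value, list):
        # ASSUMPTION: array items keep their original order, separated by ",".
        return "[" + ",".join(serialize(item) for item in value) + "]"
    if isinstance(value, bool):
        # ASSUMPTION: booleans are written the JSON way, in lowercase.
        return "true" if value else "false"
    return str(value)


# A small excerpt of the event shown above.
event = {
    "log": {"offset": 20519203, "file": {"path": "/var/log/syslog"}},
    "tags": ["beats_input_codec_plain_applied"],
    "host": {"containerized": False, "name": "ca"},
}

canonical = serialize(event)
print(canonical)
print(hashlib.sha256(canonical.encode("utf-8")).hexdigest())
```

With rules like these written down, the string fed into the digest is fully determined by the event, so a third party can recompute the fingerprint.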