
[Processors] Add Binary File Parsing Processor #24195

Closed
wants to merge 30 commits into from



@andrewstucki andrewstucki commented Feb 23, 2021

What does this PR do?

This is a very large PR. It adds support for parsing data out of PE, Mach-O, ELF, and LNK files and sending it to Elasticsearch. It has undergone some light fuzzing, but it still panics on oversized memory allocations with some malformed ELF files (examples are in libbeat/formats/fixtures/elf/crashes) that break Go's ELF parsing library (debug/elf). I'll look into a workaround, or into committing an upstream patch, at some point.

Files are parsed by a new `add_file_data` processor, which parses the file at a given path. One oddity: because the supported formats are not yet finalized in ECS, any Beat or module that uses this processor will need to add the extended fields it produces to that module's fields.yml. Ideally this would eventually be replaced by either official ECS support or more modular field definitions through packages.
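One way a processor like `add_file_data` could pick a parser is to sniff the file's magic bytes. The sketch below is an assumption and not the PR's dispatch logic. `detectFormat` is a hypothetical name, and each check is deliberately shallow. The PE check stops at the `MZ` DOS stub, and the LNK check only looks at the 0x4C header size rather than the link CLSID. The Mach-O values match the `"magic" : "0xfeedfacf"` shown in the sample output further down.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// detectFormat guesses a file's binary format from its first bytes.
// Hypothetical helper: checks are intentionally shallow.
func detectFormat(header []byte) string {
	if len(header) < 4 {
		return "unknown"
	}
	if bytes.HasPrefix(header, []byte{0x7f, 'E', 'L', 'F'}) {
		return "elf"
	}
	if bytes.HasPrefix(header, []byte("MZ")) {
		// DOS stub only; a real PE check follows e_lfanew to the "PE\x00\x00" signature.
		return "pe"
	}
	switch binary.LittleEndian.Uint32(header[:4]) {
	case 0xfeedface, 0xfeedfacf, // 32/64-bit Mach-O for a little-endian target
		0xcefaedfe, 0xcffaedfe: // 32/64-bit Mach-O for a big-endian target
		return "macho"
	case 0x0000004c:
		// Shell Link (LNK) HeaderSize; a full check would also verify the LinkCLSID.
		return "lnk"
	}
	// Universal ("fat") Mach-O headers are always big-endian. Java class
	// files share this magic, so a real parser must check further.
	if binary.BigEndian.Uint32(header[:4]) == 0xcafebabe {
		return "macho"
	}
	return "unknown"
}

func main() {
	samples := [][]byte{
		{0x7f, 'E', 'L', 'F', 2, 1, 1, 0},
		{0xcf, 0xfa, 0xed, 0xfe, 7, 0, 0, 1},
		[]byte("MZ\x90\x00"),
		{0x4c, 0x00, 0x00, 0x00, 0x01, 0x14, 0x02, 0x00},
	}
	for _, s := range samples {
		fmt.Println(detectFormat(s))
	}
}
```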

It builds on the work by @peasead in elastic/ecs#1097, elastic/ecs#1071, and elastic/ecs#1077, with some minor changes and the addition of an LNK file format.

For now, it adds the corresponding field mappings and templated processor settings to the auditbeat.file_integrity module.

The last major item is that, internally, the telfhash calculation uses the Capstone disassembly framework to disassemble and hash non-exported call sites. Capstone is written in C, so if we want to keep the telfhash code around, I'll have to look into compiling it into libbeat (unless someone has other ideas).

The processor takes several configuration options so that, rather than parsing every single file on the system, it parses only files of specific interest. The options for filtering and changing failure modes are:

`exclude`:: Exclude the specified file parsers.
`only`:: Use only the specified file parsers.
`ignore_failure`:: No-op if the file could not successfully be parsed.

Additionally, as with all processors, you can filter out even more noise with processor conditions by adding a `when` block.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Use cases

File forensics.

Logs

Enabling the processor with the following in auditbeat.yml:

  - module: file_integrity
    paths:
      - /usr/local/bin
    processors:
      - add_file_data:
          ignore_failure: true

then running:

cp /bin/ls /usr/local/bin/ls-malicious

and then querying:

curl -X GET "localhost:9200/auditbeat-*/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "match" : { "file.path": { "query": "/usr/local/bin/ls-malicious" }}
    }
}'
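To script the same lookup, you could use a Go sketch that relies only on the standard library. `buildPathQuery` is a hypothetical helper. The host, index pattern, and path are copied from the curl example, and the query body is the same `match` on `file.path`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// buildPathQuery builds the same match query used in the curl example.
func buildPathQuery(path string) ([]byte, error) {
	q := map[string]interface{}{
		"query": map[string]interface{}{
			"match": map[string]interface{}{
				"file.path": map[string]interface{}{"query": path},
			},
		},
	}
	return json.Marshal(q)
}

func main() {
	body, err := buildPathQuery("/usr/local/bin/ls-malicious")
	if err != nil {
		panic(err)
	}
	req, err := http.NewRequest(http.MethodGet,
		"http://localhost:9200/auditbeat-*/_search?pretty", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		// Expected if no Elasticsearch node is listening on localhost:9200.
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```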

The curl query returns the following (abridged):

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 7.581379,
    "hits" : [
      {
        "_index" : "auditbeat-8.0.0-2021.02.23-000001",
        "_id" : "BwXr0HcBQrUaCDhxhEKo",
        "_score" : 7.581379,
        "_source" : {
          "@timestamp" : "2021-02-23T22:03:50.764Z",
          "file" : {
            "owner" : "andrew.stucki",
            "mtime" : "2021-02-23T22:03:50.757Z",
            "size" : 51888,
            "mode" : "0755",
            "path" : "/usr/local/bin/ls-malicious",
            "inode" : "89391337",
            "ctime" : "2021-02-23T22:03:50.757Z",
            "mime_type" : "application/x-mach-binary",
            "hash" : {
              "sha1" : "0f7d51f54113ba60c9b229ca174bf030abf5ce46"
            },
            "gid" : "80",
            "type" : "file",
            "uid" : "502",
            "group" : "admin"
          },
          "hash" : {
            "sha1" : "0f7d51f54113ba60c9b229ca174bf030abf5ce46"
          },
          "file.macho" : {
            "architectures" : [
              {
                "type" : "Exec",
                "header" : {
                  "commands" : [
                    {
                      "number" : 25,
                      "size" : 72,
                      "type" : "LC_SEGMENT_64"
                    },
                    {
                      "number" : 25,
                      "size" : 552,
                      "type" : "LC_SEGMENT_64"
                    },
                    {
                      "size" : 232,
                      "type" : "LC_SEGMENT_64",
                      "number" : 25
                    },
                    {
                      "type" : "LC_SEGMENT_64",
                      "number" : 25,
                      "size" : 392
                    },
                    {
                      "type" : "LC_SEGMENT_64",
                      "number" : 25,
                      "size" : 72
                    },
                    {
                      "number" : 2.147483682E9,
                      "size" : 48,
                      "type" : "LC_DYLD_INFO_ONLY"
                    },
                    {
                      "number" : 2,
                      "size" : 24,
                      "type" : "LC_SYMTAB"
                    },
                    {
                      "number" : 11,
                      "size" : 80,
                      "type" : "LC_DYSYMTAB"
                    },
                    {
                      "number" : 14,
                      "size" : 32,
                      "type" : "LC_LOAD_DYLINKER"
                    },
                    {
                      "number" : 27,
                      "size" : 24,
                      "type" : "LC_UUID"
                    },
                    {
                      "number" : 50,
                      "size" : 32,
                      "type" : "LC_UNKNOWN"
                    },
                    {
                      "number" : 42,
                      "size" : 16,
                      "type" : "LC_SOURCE_VERSION"
                    },
                    {
                      "number" : 2.147483688E9,
                      "size" : 24,
                      "type" : "LC_MAIN"
                    },
                    {
                      "number" : 12,
                      "size" : 48,
                      "type" : "LC_LOAD_DYLIB"
                    },
                    {
                      "number" : 12,
                      "size" : 56,
                      "type" : "LC_LOAD_DYLIB"
                    },
                    {
                      "type" : "LC_LOAD_DYLIB",
                      "number" : 12,
                      "size" : 56
                    },
                    {
                      "number" : 38,
                      "size" : 16,
                      "type" : "LC_FUNCTION_STARTS"
                    },
                    {
                      "number" : 41,
                      "size" : 16,
                      "type" : "LC_DATA_IN_CODE"
                    },
                    {
                      "type" : "LC_CODE_SIGNATURE",
                      "number" : 29,
                      "size" : 16
                    }
                  ],
                  "magic" : "0xfeedfacf",
                  "flags" : [
                    "MH_NOUNDEFS",
                    "MH_DYLDLINK",
                    "MH_TWOLEVEL",
                    "MH_PIE"
                  ]
                },
                "segments" : [
                  {
                    "vmaddr" : "100000000",
                    "name" : "__TEXT",
                    "vmsize" : 20480,
                    "fileoff" : 0,
                    "filesize" : 20480,
                    "sections" : [
                      {
                        "offset" : 3404,
                        "size" : 13935,
                        "entropy" : 6.15,
                        "chi2" : 149586.65,
                        "flags" : [
                          "S_ATTR_PURE_INSTRUCTIONS",
                          "S_ATTR_SOME_INSTRUCTIONS"
                        ],
                        "name" : "__text",
                        "type" : "S_REGULAR"
                      },
                      {
                        "name" : "__stubs",
                        "type" : "S_SYMBOL_STUBS",
                        "offset" : 17340,
                        "size" : 462,
                        "entropy" : 3.3,
                        "chi2" : 22578,
                        "flags" : [
                          "S_ATTR_PURE_INSTRUCTIONS",
                          "S_ATTR_SOME_INSTRUCTIONS"
                        ]
                      },
                      {
                        "name" : "__stub_helper",
                        "type" : "S_REGULAR",
                        "offset" : 17804,
                        "size" : 786,
                        "entropy" : 4.34,
                        "chi2" : 24077.92,
                        "flags" : [
                          "S_ATTR_PURE_INSTRUCTIONS",
                          "S_ATTR_SOME_INSTRUCTIONS"
                        ]
                      },
                      {
                        "name" : "__const",
                        "type" : "S_REGULAR",
                        "offset" : 18592,
                        "size" : 504,
                        "entropy" : 5.34,
                        "chi2" : 4795.81
                      },
                      {
                        "name" : "__cstring",
                        "type" : "S_CSTRING_LITERALS",
                        "offset" : 19096,
                        "size" : 1213,
                        "entropy" : 5.18,
                        "chi2" : 11201.84
                      },
                      {
                        "entropy" : 3.43,
                        "chi2" : 10889.6,
                        "name" : "__unwind_info",
                        "type" : "S_REGULAR",
                        "offset" : 20312,
                        "size" : 160
                      }
                    ]
                  },
                  {
                    "vmaddr" : "100005000",
                    "name" : "__DATA_CONST",
                    "vmsize" : 4096,
                    "fileoff" : 20480,
                    "filesize" : 4096,
                    "sections" : [
                      {
                        "name" : "__got",
                        "type" : "S_NON_LAZY_SYMBOL_POINTERS",
                        "offset" : 20480,
                        "size" : 48,
                        "entropy" : 0,
                        "chi2" : 12240
                      },
                      {
                        "type" : "S_REGULAR",
                        "offset" : 20528,
                        "size" : 552,
                        "entropy" : 1.44,
                        "chi2" : 92757.22,
                        "name" : "__const"
                      }
                    ]
                  },
                  {
                    "name" : "__DATA",
                    "vmsize" : 4096,
                    "fileoff" : 24576,
                    "filesize" : 4096,
                    "sections" : [
                      {
                        "name" : "__la_symbol_ptr",
                        "type" : "S_LAZY_SYMBOL_POINTERS",
                        "offset" : 24576,
                        "size" : 616,
                        "entropy" : 2.54,
                        "chi2" : 64518.55
                      },
                      {
                        "chi2" : 8712,
                        "name" : "__data",
                        "type" : "S_REGULAR",
                        "offset" : 25200,
                        "size" : 56,
                        "entropy" : 1.21
                      },
                      {
                        "name" : "__bss",
                        "type" : "S_ZEROFILL",
                        "offset" : 0,
                        "size" : 224,
                        "entropy" : 2.09,
                        "chi2" : 31083.43
                      },
                      {
                        "entropy" : 2.03,
                        "chi2" : 19959.11,
                        "name" : "__common",
                        "type" : "S_ZEROFILL",
                        "offset" : 0,
                        "size" : 144
                      }
                    ],
                    "vmaddr" : "100006000"
                  }
                ],
                "libraries" : [
                  "/usr/lib/libutil.dylib",
                  "/usr/lib/libncurses.5.4.dylib",
                  "/usr/lib/libSystem.B.dylib"
                ],
                "imports" : [
                  "__DefaultRuneLocale",
                  "___assert_rtn",
                  "___bzero",
                  "___error",
                  "___maskrune",
                  "___snprintf_chk",
                  "___stack_chk_fail",
                  "___stack_chk_guard",
                  "___stderrp",
                  "___stdoutp",
                  "___tolower",
                  "_acl_free",
                  "_acl_get_entry",
                  "_acl_get_flag_np",
                  "_acl_get_flagset_np",
                  "_acl_get_link_np",
                  "_acl_get_perm_np",
                  "_acl_get_permset",
                  "_acl_get_qualifier",
                  "_acl_get_tag_type",
                  "_atoi",
                  "_calloc",
                  "_compat_mode",
                  "_err",
                  "_exit",
                  "_fflagstostr",
                  "_fprintf",
                  "_fputs",
                  "_free",
                  "_fts_children$INODE64",
                  "_fts_close$INODE64",
                  "_fts_open$INODE64",
                  "_fts_read$INODE64",
                  "_fts_set$INODE64",
                  "_fwrite",
                  "_getbsize",
                  "_getenv",
                  "_getopt",
                  "_getpid",
                  "_getuid",
                  "_getxattr",
                  "_group_from_gid",
                  "_humanize_number",
                  "_ioctl",
                  "_isatty",
                  "_kill",
                  "_listxattr",
                  "_localtime",
                  "_malloc",
                  "_mbr_identifier_translate",
                  "_mbrtowc",
                  "_memchr",
                  "_nl_langinfo",
                  "_optind",
                  "_printf",
                  "_putchar",
                  "_readlink",
                  "_realloc",
                  "_reallocf",
                  "_setenv",
                  "_setlocale",
                  "_signal",
                  "_sscanf",
                  "_strcoll",
                  "_strcpy",
                  "_strdup",
                  "_strerror",
                  "_strftime",
                  "_strlen",
                  "_strmode",
                  "_sysctlbyname",
                  "_tgetent",
                  "_tgetstr",
                  "_tgoto",
                  "_time",
                  "_tputs",
                  "_user_from_uid",
                  "_uuid_unparse_upper",
                  "_warn",
                  "_warnx",
                  "_wcwidth",
                  "_write",
                  "dyld_stub_binder"
                ],
                "symhash" : "f51f8c19a535637967a1ec097761649d",
                "cpu" : "x86_64",
                "byte_order" : "little-endian"
              }
            ]
          },
          ...
          "event" : {
            "category" : [
              "file"
            ],
            "type" : [
              "change",
              "creation"
            ],
            "action" : [
              "created",
              "updated",
              "attributes_modified"
            ],
            "module" : "file_integrity",
            "dataset" : "file",
            "kind" : "event"
          },
          "service" : {
            "type" : "file_integrity"
          }
        }
      }
    ]
  }
}
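The per-section `entropy` and `chi2` fields appear to be Shannon entropy in bits per byte and Pearson's chi-square statistic against a uniform distribution over the 256 byte values. The following sketch computes both. It is a reconstruction, not the PR's code, and the output above appears to round to two decimals. As a sanity check, 48 identical bytes give an entropy of 0 and a chi-square of exactly 12240. Those values are consistent with the `__got` section above, which has size 48, entropy 0, and chi2 12240.

```go
package main

import (
	"fmt"
	"math"
)

// byteCounts tallies how often each byte value occurs in data.
func byteCounts(data []byte) [256]int {
	var counts [256]int
	for _, b := range data {
		counts[b]++
	}
	return counts
}

// shannonEntropy returns entropy in bits per byte:
// 0 for constant data, 8 for perfectly uniform data.
func shannonEntropy(data []byte) float64 {
	if len(data) == 0 {
		return 0
	}
	n := float64(len(data))
	h := 0.0
	for _, c := range byteCounts(data) {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// chiSquare returns Pearson's chi-square statistic of the observed byte
// frequencies against a uniform distribution (expected count n/256 per value).
func chiSquare(data []byte) float64 {
	if len(data) == 0 {
		return 0
	}
	expected := float64(len(data)) / 256
	chi := 0.0
	for _, c := range byteCounts(data) {
		d := float64(c) - expected
		chi += d * d / expected
	}
	return chi
}

func main() {
	zeros := make([]byte, 48) // like a 48-byte section of identical bytes
	fmt.Printf("entropy=%.2f chi2=%.2f\n", shannonEntropy(zeros), chiSquare(zeros))
	// prints "entropy=0.00 chi2=12240.00"
}
```

Low entropy paired with a large chi-square indicates highly repetitive data. High entropy paired with a small chi-square is the pattern typically associated with compressed or encrypted content.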

CC: @ebeahan, @dcode, @peasead

@botelastic botelastic bot added and removed the needs_team label Feb 23, 2021
@andrewstucki andrewstucki requested a review from a team February 23, 2021 21:11
@elasticmachine
Collaborator

elasticmachine commented Feb 23, 2021

💚 Build Succeeded


Build stats

  • Build Cause: Pull request #24195 updated

  • Start Time: 2021-03-03T21:38:25.346+0000

  • Duration: 110 min 20 sec

  • Commit: 1e82b35

Test stats 🧪

Test Results
Failed 0
Passed 46198
Skipped 4931
Total 51129

💚 Flaky test report

Tests succeeded.


@andrewstucki
Author

To avoid requiring Capstone, I removed the telfhash code; it lives at https://github.com/andrewstucki/telfhash for now.

@andrewstucki andrewstucki marked this pull request as ready for review February 24, 2021 21:46
@andrewstucki andrewstucki requested a review from a team as a code owner February 24, 2021 21:46
@elasticmachine
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic

botelastic bot commented Apr 3, 2021

Hi!
We just realized that we haven't looked into this PR in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Apr 3, 2021
@mergify
Contributor

mergify bot commented Apr 7, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b file-formats upstream/file-formats
git merge upstream/master
git push upstream file-formats

@botelastic

botelastic bot commented May 19, 2021

Hi!
We just realized that we haven't looked into this PR in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label May 19, 2021
@botelastic

botelastic bot commented Jun 18, 2021

Hi!
This PR has been stale for a while and we're going to close it as part of our cleanup procedure.
We appreciate your contribution and would like to apologize if we have not been able to review it, due to the current heavy load of the team.
Feel free to re-open this PR if you think it should stay open and is worth rebasing.
Thank you for your contribution!

@botelastic botelastic bot closed this Jun 18, 2021