[filebeat] VirusTotal Livehunt dataset - WIP #21815
Conversation
💔 Tests Failed
Test stats 🧪
Test | Results
---|---
Failed | 1
Passed | 5135
Skipped | 574
Total | 5710
Genuine test errors
💔 There are test failures but no known flaky tests; this is most likely a genuine test failure.
- Name: Build&Test / x-pack/filebeat-build / test_fileset_file_150_virustotal – x-pack.filebeat.tests.system.test_xpack_modules.XPackTest
Force-pushed from a72c033 to 5a7e08b
- Provides input directly from VT API using key or via kafka topic (see the sketch below)
- Implements filebeat transforms for many common [file object fields](https://developers.virustotal.com/v3.0/reference#files)
- Implements filebeat transforms for many common [PE fields](https://developers.virustotal.com/v3.0/reference#pe_info)
- Implements filebeat transforms for many common [ELF fields](https://developers.virustotal.com/v3.0/reference#elf_info)
- Included some notes in README that I used to help develop and test this
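For illustration only, a minimal sketch of what that dual-input configuration might look like in `modules.d`; the variable names here (`var.input`, `var.api_key`, `var.kafka_hosts`, `var.kafka_topic`) are assumptions, not the module's actual settings:

```yaml
# Hypothetical modules.d/virustotal.yml sketch; variable names are
# illustrative, not the module's actual settings.
- module: virustotal
  livehunt:
    enabled: true
    # Option 1: pull Livehunt notifications straight from the VT API
    var.input: httpjson
    var.api_key: "${VT_API_KEY}"
    # Option 2: consume notifications from a Kafka topic instead
    # var.input: kafka
    # var.kafka_hosts: ["localhost:9092"]
    # var.kafka_topic: "vt-livehunt"
```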
Force-pushed from 23961c8 to 74f346f
VirusTotal ECS RFC
So, I think we're pretty close functionally. I'm going to smooth out some documentation and need to implement some tests, but I'm not sure how that works. If anyone wants to play with this and try to get started, I can give you a hand. I think we're about ready to bounce the schema off the @elastic/ecs team and working group to negotiate extensions and renaming for fields.
Force-pushed from 60f0b3c to da2d1db
Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
This isn't perfect, but it passes the local tests now and has docs by @peasead. I would welcome feedback on structure and/or style.
Took a quick look at some of the field mappings (I haven't done a pass over everything yet), but a while ago I looked at how we'd map some more detailed binary data info (based on an experiment) into ECS-style fields, and accordingly I've highlighted some of the PE/ELF info in this PR that I had thoughts about.
Is there a plan to do any Mach-O binaries?
description: >
  Number of ELF Section Headers.
type: long
- name: sections
I'm not entirely sure, given that VirusTotal returns information about whether artifacts are malicious to begin with, but I imagine that entropy calculations and/or hashes might be useful to retain here.
Yes, the intention is to keep all the info for the time being. If users don't want a particular fieldset, it can be dropped in the filebeat config or ingest processor. The section data has chi2 calculations and entropy. VirusTotal doesn't provide an overall status of malicious or benign, but offers community votes on that, plus individual engine assessments.
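As a sketch of dropping an unwanted fieldset in the filebeat config, something like the following `drop_fields` processor would do it; the field name `virustotal.pe.sections` is illustrative only:

```yaml
# filebeat.yml sketch: drop a fieldset the user doesn't want.
# The field name below is hypothetical, not the module's actual field.
processors:
  - drop_fields:
      fields: ["virustotal.pe.sections"]
      ignore_missing: true
```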
Similar to the comment about imported symbols below, we could normalize section data with something like `file.*.sections`:
{
"virtual_address": 4096,
"size": 2353664,
"entropy": 6.37,
"name": ".text",
"flags": "rx"
}
After working through several examples and reading up on the various binary executable formats, I've come up with this. Thoughts, @andrewstucki?
// Abstract structure for all binary types, missing fields for a given data source will be excluded
{
name: "[keyword] Name of code section",
physical_offset: "[keyword] Offset of the section from the beginning of the segment, in hex",
physical_size: "[long] Size of the code section in the file in bytes",
virtual_address: "[keyword] relative virtual memory address when loaded",
virtual_size: "[long] Size of the section in bytes when loaded into memory",
flags: "[keyword] List of flag values as strings for this section",
type: "[keyword] Section type as string, if applicable",
segment_name: "[keyword] Name of segment for this section, if applicable"
}
// Mach-O example
{
file.macho.sections: [
{
name: "__nl_symbol_ptr",
flags: ["S_8BYTE_LITERALS"],
type: "S_CSTRING_LITERALS",
segment_name: "__DATA"
}, ...
]
}
// ELF example
{
file.elf.sections: [
{
name: ".data",
physical_offset: "0x3000",
physical_size: 16,
virtual_address: "0x4000",
flags: ["WA"], // This is how VT presents the data. Pretty sure this maps to ["WRITE", "ALLOC"], but I don't have an exhaustive mapping
type: "PROGBITS"
}, ...
]
}
// PE example
{
file.pe.sections: [
{
name: ".data",
physical_size: 2542592,
virtual_address: "0x2DE000",
virtual_size: 2579264,
flags: ["rw"], // Again, this is how VT presents it. Likely maps to ["MEM_READ", "MEM_WRITE"], but I don't have an exhaustive mapping
type: ".data",
entropy: 6.83,
chi2: 13360996
}, ...
]
}
I'm least pleased by my Mach-O example, but I think that's mostly limited to how VT provides the data currently. It provides offset info for each segment, and then lists the sections that exist within the segment with no info at all. That's the only reason, I think, to even mention the segment name, though it could be omitted and each segment's data could instead carry a list of its included sections.
Finally, I think this at least works for a common fieldset of section data. The flags we can improve over time since it will be a list of keywords, and for PE, I think it's hard-coded as an attribute of the section name/type.
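A minimal `fields.yml` sketch of that abstract structure, assuming a `nested` type; the field names come from the proposal above and the types are assumptions, not a final mapping:

```yaml
# Hypothetical fields.yml sketch of the abstract section structure;
# fields missing from a given data source would simply be omitted.
- name: sections
  type: nested
  description: >
    Section metadata common across PE, ELF, and Mach-O binaries.
  fields:
    - name: name
      type: keyword
    - name: physical_offset
      type: keyword
    - name: physical_size
      type: long
    - name: virtual_address
      type: keyword
    - name: virtual_size
      type: long
    - name: flags
      type: keyword
    - name: type
      type: keyword
    - name: segment_name
      type: keyword
    - name: entropy
      type: double   # as in the PE example above
    - name: chi2
      type: long     # as in the PE example above
```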
@dcode that looks pretty similar to what I was thinking. Thinking through the flags bit trips me up a little too. I'm thinking that eventually we may want some verbiage in the description that says something to the effect of "use whatever constant name is found in the spec/OS headers" >_>. If we wanted to be strict about it, a VT filebeat module could always just normalize the VT payload to whatever we wanted.

Also, for reference, sections do have offset and size info associated with them, so despite the VT API shortcomings, I'm pretty sure the same fields would still be useful. I'd be fine suggesting the entropy and chi2 calculations as fields too, at least as a first pass, in the RFC. Statistical byte calculations seem pretty common on the binary analysis side of security.
Agree on all points. On it.
Type of exported symbol
type: keyword
default_field: false
- name: imports
Wondering if it would make sense for this level to describe an actual linked-in library, and for the stuff currently nested here (i.e. name, type, etc.) to specify the symbols imported by that library. Otherwise you get symbols free of context from where they're actually being imported.
I agree. Ideally, I'd like to see a common representation across ELF, PE, and Mach-O. Unfortunately, these formats don't work the same, especially in the way they import symbols. I think making exports and imports nested rather than a group makes sense to maintain context (see the sketch below). Making these a nested dictionary with common fields for each binary type might be the right answer. Not all binary types will have all fields populated, but at least it's consistent across formats. I'll play with this.
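As a hypothetical `fields.yml` fragment illustrating the distinction (not the actual mapping): `nested` keeps each import's fields correlated as one object, whereas `group` would index them as independent flat arrays and lose that correlation.

```yaml
# Hypothetical sketch: nested preserves per-symbol context,
# group would flatten name/type/library into parallel arrays.
- name: imports
  type: nested
  fields:
    - name: name
      type: keyword
    - name: type
      type: keyword
    - name: library_name
      type: keyword
```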
So, it's totally possible to try and resolve the libraries that symbols come from in ELF format, see example. There's actually default support for resolving the libraries in Go's standard library. I couldn't tell if the ndjson example that is dropped here actually does that as part of the VirusTotal service for ELF files, but if it does, it would probably make sense to scope these.

Edit: BTW, Go supports this through the GNU symbol versioning tables introduced to support Linux dynamic symbol versioning, so if a symbol isn't versioned, you'll be hard-pressed to get this information from the binary itself.
I'd really like to have a common interface across PE, ELF, and Mach-O for this. LIEF actually does this as an analytic framework, but VT doesn't expose this data equally across all binary types. We could implement a common fieldset for imports, and applications can populate it as they are able to.

Proposal: `file.*.imported_symbols`:
{
"name": "my_symbol", "size": 0, "value": 0, "type": "function", "library_name": "my_library.dll"
}
In the case of PE, the VT data would permit populating the symbol name and library name, and we can derive a type of "function". For ELF, the data provides symbol name and type (in the samples I've seen). For Mach-O... VT doesn't give us any symbols, just a list of linked libraries, which could feasibly go somewhere else as a flat list, say file.*.linked_libraries.

Anything not provided by the source (VT in this case) would be omitted. Another application could feasibly populate this data with much greater detail. The library_name for ELF could be resolved as you say, but it's not coded in the binary specifically (I think).
type: flattened
description: >
  If the PE contains resources, some info about them
- name: resource_languages
Just wondering, does VT return language/type information tied to the specific resources it's enumerating? Because I would imagine this and the field below would show up in resource_details, albeit not aggregated.
Yes, that's correct. resource_types and resource_languages are summaries of resource_details. If I had an exhaustive list of the keys for languages and details, it'd be great not to flatten them, providing easy access to this data for aggregations and leaving resource_details as a nested type for more complex analysis and visualization.

Here's an example:
"resource_details": [
{
"chi2": 40609.63671875,
"entropy": 3.079699754714966,
"filetype": "Data",
"lang": "NEUTRAL",
"sha256": "87ab855ab53879e5b1a7e59e7958e22512440c50627115ae5758f5f5f5685e79",
"type": "RT_ICON"
},
{
"chi2": 22370.37890625,
"entropy": 2.9842348098754883,
"filetype": "Data",
"lang": "NEUTRAL",
"sha256": "60457334b5385635e2d6d5edc75619dd5dcd5b7f015d7653ab5a37520a52f5c4",
"type": "RT_ICON"
},
{
"chi2": 27408.888671875,
"entropy": 2.968428611755371,
"filetype": "ASCII text",
"lang": "NEUTRAL",
"sha256": "a67c8c551025a684511bd5932b5ad7575b352653135326587054532d5e58ab2b",
"type": "RT_STRING"
}
],
"resource_langs": {
"NEUTRAL": 14
},
"resource_types": {
"RT_GROUP_ICON": 1,
"RT_ICON": 2,
"RT_RCDATA": 3,
"RT_STRING": 7,
"RT_VERSION": 1
},
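A hypothetical `fields.yml` sketch of the approach described above (nested details, flattened summaries); the names follow this discussion, not a final schema:

```yaml
# Hypothetical sketch: detailed records stay nested for analysis,
# the per-type/per-language counts stay flattened for aggregations.
- name: resource_details
  type: nested
  description: >
    Per-resource metadata (chi2, entropy, filetype, lang, sha256, type).
- name: resource_types
  type: flattened
  description: >
    Counts of resources by type, summarizing resource_details.
- name: resource_languages
  type: flattened
  description: >
    Counts of resources by language, summarizing resource_details.
```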
Compile timestamp of the PE file.
type: date
- name: packers
Why have this and the flattened field?
This can probably actually be removed. I restructured virustotal.packers because the data returned is consistent for both ELFs and PEs, including the analysis tool name and the resulting value. This isn't what the docs said, though, so this was an attempt to provide a consistent interface with the ELF data. I'll axe it.
type: keyword
description: >
  Version of the compiler product.
- name: rich_pe_header_hash
Does it make sense to make this into rich_header.hash.*? I would imagine that some other forensics from rich headers might be useful in other PE parsing implementations.
That's a good point. Since it's PE specific, maybe we treat it like authentihash. We could put them all under file.pe.hash.* with authentihash, rich_header_hash, and imphash. Similarly, ELF would have file.elf.hash.telfhash (see the sketch below).
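A hypothetical `fields.yml` sketch of that `file.pe.hash.*` grouping; names taken from this comment, types assumed:

```yaml
# Hypothetical sketch: all PE-specific hashes under one group
- name: hash
  type: group
  fields:
    - name: authentihash
      type: keyword
    - name: rich_header_hash
      type: keyword
    - name: imphash
      type: keyword
```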
I guess I was thinking more along the lines of making it possible for someone to actually namespace whatever parsing might be done on the rich header itself. Say, if someone wanted to try and actually parse out the artifact ids/counts from the rich header itself, then by doing something like pe.rich_header.hash.* you could allow for someone else to go in and do something like pe.rich_header.entries.

Additionally, I believe that most of the time the hash for a rich header is usually just an MD5 of the bytes in the rich header, correct? In which case pe.rich_header.hash.md5 would make sense to me.
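A hypothetical sketch of that namespacing, leaving room for future rich-header parsing alongside the hash; this is an assumption about the eventual shape, not a settled mapping:

```yaml
# Hypothetical sketch: pe.rich_header.hash.md5 now, with room for
# future fields such as rich_header.entries (parsed ids/counts).
- name: rich_header
  type: group
  fields:
    - name: hash
      type: group
      fields:
        - name: md5
          type: keyword
```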
Thanks for the comments, @andrewstucki. We have opened an issue to extend the PE fieldset and create the ELF fieldset. We have the Mach-O data, but wanted to wait and see how the other two issues were handled and whether we needed to make an RFC for either of them. Once we know whether a new sub-fieldset (like ELF, and also Mach-O) needs an RFC or not, we planned on opening the Mach-O issue in the proper way. That said, if you'd prefer we open the Mach-O issue now with our dataset, we certainly can.
@peasead thanks for the heads up about the two issues. This module doesn't necessarily require the ECS extensions prior to getting merged as a module. That said, if we do decide to merge it before the field extensions firm up, then we ought to make sure we don't break ECS (if any of these fields become official in the future with different types) and potentially consider shoving these fields into a new namespace. WRT the Mach-O format, no need to necessarily figure that out first; it was more of a question about where you were going to go with this eventually.
Since elastic#23183 was merged, `fields.yml` can now properly specify types for nested object properties.
This pull request is now in conflict. Could you fix it? 🙏
This pull request does not have a backport label. Could you fix it @dcode? 🙏
@dcode - Closing this one as there was no activity for a while.
THIS IS CURRENTLY IN DRAFT

What does this PR do?

Adds initial support for streaming VirusTotal Livehunt data via the Filebeat `httpjson` input from the VT API endpoint, or via a `kafka` broker input, allowing a multi-stage pipeline (also helpful for testing).

Why is it important?

Data from VirusTotal (VT) is important for threat research. The Livehunt feature allows organizations to enable one or many YARA rules in one or many rulesets. This module uses the Livehunt Notification API to stream VT `file` objects into an ECS-compatible mapping, where possible, and an ECS-styled mapping elsewhere.

VirusTotal is just one source of `file` events, which are a bit different than other security-related logging. Making this data available and standardized in Elasticsearch will allow analysis that combines the existing security event logging from network and endpoints with the file objects that traverse those mediums.

Checklist

- CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

How to test this PR locally

My current testing procedures are documented in `x-pack/filebeat/module/virustotal/README.md`. I will attach raw ndjson logs that contain a sample of original events covering the use cases.

Related issues

Use cases

Screenshots

Logs

TODO