diff --git a/rfcs/text/0000-create-file-macho.md b/rfcs/text/0000-create-file-macho.md
new file mode 100644
index 0000000000..51a97c6bdf
--- /dev/null
+++ b/rfcs/text/0000-create-file-macho.md
@@ -0,0 +1,151 @@
+# 0000: Create the Mach-O sub-field of the File fieldset
+
+- Stage: **1 (draft)**
+- Date: **TBD**
+
+Create the Mach Object (Mach-O) sub-field of the `file` or `process` top-level fieldsets. This metadata can be used for malware research, as well as coding and other application development efforts.
+
+## Fields
+
+**Stage 0**
+
+This RFC is to create the Mach-O sub-field within the `file.` fieldset. This will include the sub-fields listed below.
+
+| Name | Type | Description |
+|--------------------------------------------|------------|-----------------------------------------------------------------------------|
+| macho.cpu | object | CPU information for the file. |
+| macho.cpu.architecture | keyword | CPU architecture target for the file. |
+| macho.cpu.byte_order | keyword | CPU byte order for the file. |
+| macho.cpu.subtype | keyword | CPU subtype for the file. |
+| macho.cpu.type | keyword | CPU type for the file. |
+| macho.headers | nested | Header information for the file. |
+| macho.headers.commands.number | long | Number of load commands for the Mach-O header. |
+| macho.headers.commands.size | long | Size of load commands of the Mach-O header. |
+| macho.headers.commands.type | keyword | Type of the load commands for the Mach-O header. |
+| macho.headers.magic | keyword | Magic field of the Mach-O header. |
+| macho.headers.flags | keyword | Flags set in the Mach-O header. |
+| macho.segments | nested | Segment information for the file. |
+| macho.segments.name | keyword | Name of this segment. |
+| macho.segments.physical_offset | long | File offset of this segment. |
+| macho.segments.physical_size | keyword | Amount of memory to map from the file. |
+| macho.segments.virtual_address | keyword | Memory address of this segment. |
+| macho.segments.virtual_size | keyword | Memory size of this segment. |
+| macho.segments.sections | keyword | Section names contained in this segment. |
+| macho.sections | nested | Section information for the segment of the file. |
+| macho.sections.name | keyword | Section name for the segment of the file. |
+| macho.sections.flags | keyword | Section flags for the segment of the file. |
+| macho.sections.type | keyword | Section type for the segment of the file. |
+| macho.sections.physical_offset | long | Section List offset. |
+| macho.sections.physical_size | long | Section List physical size. |
+| macho.sections.virtual_address | long | Section List virtual address. |
+| macho.sections.virtual_size | long | Section List virtual size. |
+| macho.sections.entropy | float | Shannon entropy calculation from the section. |
+| macho.sections.chi2 | float | Chi-square probability distribution of the section. |
+| macho.page_size | long | Page size of the file. |
+| macho.cdhash | keyword | Code Digest (CD) SHA256 hash of the first 20 bytes of the file. |
+
+
+**Stage 1**
+
+[New `macho.yml` candidate](macho/macho.yml)
+
+
+
+## Usage
+
+**Stage 1**
+
+When performing file analysis, particularly for malware research, file similarities can be used to link malware samples and families, identify campaigns, and possibly support attribution. Additionally, knowing how malware components are re-used helps interpret malware telemetry, especially when measuring the impact of newly introduced defensive countermeasures.
+
+As an example, if an XDR vendor deploys a new malware model to defeat a specific type of ransomware and we then observe changes to the headers, import tables, libraries, etc. of that malware family, we can infer that the new model is having an impact on that family.
+
+As another example, tracking file metadata for specific families is useful for predicting new campaigns when similar metadata appears in new samples. For [example](https://www.bleepingcomputer.com/news/security/maze-ransomware-is-shutting-down-its-cybercrime-operation/), the Maze ransomware family shut down and was re-purposed as Egregor (this is Windows malware, but the concept is the same).
+
+## Source data
+
+**Stage 1**
+
+This type of data can be provided by logs from VirusTotal, Reversing Labs, Lockheed Martin's LAIKABOSS, Emerson's File Scanning Framework, Target's Strelka, or other file/malware analysis platforms.
+
+* [VirusTotal API](https://developers.virustotal.com/v3.0/reference)
+* [Emerson FSF](https://github.com/EmersonElectricCo/fsf)
+* [Target Strelka](https://github.com/target/strelka)
+* [Lockheed Martin LAIKABOSS](https://github.com/lmco/laikaboss)
+* [LIEF Analysis Library](https://lief.quarkslab.com/doc/latest/api/python/macho.html)
+
+
+
+
+
+
+
+## Scope of impact
+
+**Stage 2**
+
+There should be no breaking changes, deprecation strategies, or significant refactoring, as this creates a sub-field for the existing `file.` fieldset.
+
+Although this is likely not a large-scale ECS project, documentation updates would be needed to explain the new fields.
+
+
+
+## Concerns
+
+
+
+
+
+
+
+
+
+## Real-world implementations
+
+
+
+## People
+
+The following are the people that consulted on the contents of this RFC.
+
+* @peasead | author
+* @devonakerr | sponsor
+* @dcode, @peasead | subject matter expert
+
+## References
+
+
+
+### RFC Pull Requests
+
+
+
+* Stage 1: https://github.com/elastic/ecs/pull/1097
+
+
diff --git a/rfcs/text/macho/docs/usage/macho.asciidoc b/rfcs/text/macho/docs/usage/macho.asciidoc
new file mode 100644
index 0000000000..f429a46c58
--- /dev/null
+++ b/rfcs/text/macho/docs/usage/macho.asciidoc
@@ -0,0 +1,19 @@
+[[ecs-macho-ussage]]
+=== Mach-O Usage
+
+--Description--
+
+[discrete]
+=== Mach-O Field Details
+| Field | Description | Level |
+| ---- | ---- | ----------- |
+| macho.cpu | CPU information for the file. | extended |
+| ... | ... | ... |
+| ... | ... | ... |
+| ... | ... | ... |
+
+[discrete]
+=== Field Reuse
+The `macho` fields are expected to be nested at: `dll.macho`, `file.macho`, `process.macho`.
+
+Note also that the `macho` fields are not expected to be used directly at the root of the events.
diff --git a/rfcs/text/macho/macho.yml b/rfcs/text/macho/macho.yml
new file mode 100644
index 0000000000..16b597a4b7
--- /dev/null
+++ b/rfcs/text/macho/macho.yml
@@ -0,0 +1,146 @@
+---
+- name: macho
+  title: Mach-O file information.
+  group: 2
+  description: >
+    These fields contain macOS Mach Object (Mach-O) metadata.
+  type: group
+  reusable:
+    top_level: false
+    expected:
+      - file
+      - process
+  fields:
+
+    - name: cpu.architecture
+      description: CPU architecture target for the file.
+      type: keyword
+      level: extended
+      example: 64-bit
+
+    - name: cpu.byte_order
+      description: CPU byte order for the file.
+      type: keyword
+      level: extended
+      example: Little endian
+
+    - name: cpu.subtype
+      description: CPU subtype for the file.
+      type: keyword
+      level: extended
+      example: ARM (all) 64-bit
+
+    - name: cpu.type
+      description: CPU type for the file.
+      type: keyword
+      level: extended
+      example: ARM 64-bit
+
+    - name: headers
+      level: extended
+      description: Header information for the file.
+      type: nested
+
+    - name: headers.commands.number
+      description: Number of load commands for the Mach-O header.
+      type: long
+      level: extended
+      example: 23
+
+    - name: headers.commands.size
+      description: Size of load commands of the Mach-O header.
+      type: long
+      level: extended
+      format: bytes
+      example: 3888
+
+    - name: headers.commands.type
+      description: Type of the load commands for the Mach-O header.
+      type: keyword
+      level: extended
+      example: LC_SYMTAB, 0x2c
+
+    - name: headers.magic
+      description: Magic field of the Mach-O header.
+      type: keyword
+      level: extended
+      example: 0xfeedfacf
+
+    - name: headers.flags
+      description: Flags set in the Mach-O header.
+      type: keyword
+      level: extended
+      example: TWOLEVEL, 0x4000000
+
+    - name: segments
+      level: extended
+      description: Segment information for the file.
+      type: nested
+
+    - name: segments.vmaddr
+      description: Memory address of this segment.
+      type: keyword
+      level: extended
+      example: 0x0
+
+    - name: segments.name
+      description: Name of this segment.
+      type: keyword
+      level: extended
+      example: __TEXT, __DATA, __IMPORT
+
+    - name: segments.vmsize
+      description: Memory size of this segment.
+      type: keyword
+      level: extended
+      example: 0x4c000
+
+    - name: segments.fileoff
+      description: File offset of this segment.
+      type: keyword
+      level: extended
+      example: 0x0
+
+    - name: segments.filesize
+      description: Amount of memory to map from the file.
+      type: keyword
+      level: extended
+      example: 0x4c000
+
+    - name: segments.sections
+      level: extended
+      description: Section information for the segment of the file.
+      type: flattened
+
+    - name: segments.offset
+      description: Offset of the segment.
+      type: long
+      format: bytes
+      level: extended
+      example: 0
+
+    - name: segments.size
+      description: Segment limit size.
+      type: long
+      format: bytes
+      level: extended
+      example: 123456
+
+    - name: segments.flags
+      description: Segment flags.
+      type: keyword
+      level: extended
+      example: 0x0
+
+    - name: cdhash
+      description: Code Digest (CD) SHA256 hash of the first 20 bytes of the file.
+      type: keyword
+      level: extended
+      example: 2035094a7065b29421e7a51f51db9bd61807c3628f210b1f8e667235777dc592
+
+    - name: page_size
+      description: Page size of the file.
+      type: long
+      format: bytes
+      level: extended
+      example: 4096
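
As a rough illustration of how a producer might populate a few of the header fields defined in `macho.yml`, the sketch below unpacks the fixed 32-byte header of a little-endian, 64-bit Mach-O file with Python's standard `struct` module. It is not part of the RFC: the function name `macho_header_fields` and the small `CPU_TYPES` lookup are hypothetical, the nested `headers` type is simplified to a single object, and only the `0xfeedfacf` magic is handled.

```python
import struct

# Assumed layout: struct mach_header_64 (magic, cputype, cpusubtype,
# filetype, ncmds, sizeofcmds, flags, reserved), read as little-endian.
MH_MAGIC_64 = 0xFEEDFACF
HEADER_64 = struct.Struct("<IiiIIIII")

# Hypothetical, deliberately tiny lookup; real tooling would cover more types.
CPU_TYPES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def macho_header_fields(data: bytes) -> dict:
    """Map a 64-bit Mach-O header onto a subset of the proposed macho.* fields."""
    if len(data) < HEADER_64.size:
        raise ValueError("buffer too small for a 64-bit Mach-O header")
    magic, cputype, _subtype, _filetype, ncmds, sizeofcmds, flags, _reserved = (
        HEADER_64.unpack_from(data)
    )
    if magic != MH_MAGIC_64:
        raise ValueError(f"unsupported magic: {magic:#x}")
    return {
        "macho": {
            "cpu": {
                "type": CPU_TYPES.get(cputype, hex(cputype)),
                "byte_order": "Little endian",
            },
            # The schema declares `headers` as nested; one object is used here.
            "headers": {
                "magic": f"{magic:#x}",
                "flags": f"{flags:#x}",
                "commands": {"number": ncmds, "size": sizeofcmds},
            },
        }
    }


# Synthetic header built for the example, not taken from a real binary.
sample = HEADER_64.pack(MH_MAGIC_64, 0x0100000C, 0, 2, 23, 3888, 0x200085, 0)
fields = macho_header_fields(sample)
print(fields["macho"]["headers"]["commands"])  # {'number': 23, 'size': 3888}
```

Segment, section, and `cdhash` values require walking the load commands and are left out of this sketch; a library such as LIEF, listed under Source data, would be a more practical source for those.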