File.macho create #1097

Closed
wants to merge 39 commits into from
39 commits
7080358
Merge pull request #1 from elastic/master
peasead Oct 20, 2020
314f9ab
Merge pull request #2 from elastic/master
peasead Nov 3, 2020
1448cd6
Merge pull request #3 from elastic/master
peasead Nov 4, 2020
16aae5f
Merge pull request #4 from elastic/master
peasead Nov 5, 2020
ef7bd12
initial commit
peasead Nov 5, 2020
0107542
added PR#
peasead Nov 5, 2020
de73a01
removed field present in code_signature
peasead Nov 10, 2020
714c859
removed field present in code_signature
peasead Nov 10, 2020
07c011d
updated work in signature
peasead Nov 20, 2020
aeadc6b
move executable fields to segments.
peasead Nov 20, 2020
29ecf43
removed signature fields
peasead Dec 23, 2020
6d77439
removed file. from field names
peasead Dec 23, 2020
16ad2bc
Update rfcs/text/0000-create-file-mach-o.md
peasead Dec 23, 2020
6969054
Update rfcs/text/0000-create-file-mach-o.md
peasead Dec 23, 2020
692cc5a
Update rfcs/text/0000-create-file-mach-o.md
peasead Dec 23, 2020
f64a08d
Update rfcs/text/0000-create-file-mach-o.md
peasead Dec 23, 2020
805f6c5
Update rfcs/text/0000-create-file-mach-o.md
peasead Dec 23, 2020
cd6a5e0
Update rfcs/text/0000-create-file-mach-o.md
peasead Dec 23, 2020
cdd9766
Update rfcs/text/0000-create-file-mach-o.md
peasead Dec 23, 2020
b8c02ce
renamed mach-o to macho
peasead Dec 23, 2020
6e8e729
Merge branch 'file.macho-create' of github.com:peasead/ecs into file.…
peasead Dec 23, 2020
72ee845
removed plurality from "header"
peasead Dec 23, 2020
c6f20b2
created usage doc
peasead Dec 23, 2020
bbd1afd
removed header plurality, sections to flattened
peasead Jan 13, 2021
e0e5a1a
changed macho.segments to nested
peasead Feb 1, 2021
689fa39
typo in segments.size
peasead Feb 1, 2021
ccf1b88
corrected segments.sections fieldtype
peasead Feb 1, 2021
84bdb2e
added cdhash to RFC doc.
peasead Feb 1, 2021
c049773
Fixed segments.offset fieldtype
peasead Feb 1, 2021
3fd0931
typo on rfc doc for segments.flags
peasead Feb 1, 2021
a53a52b
back to headers from header
peasead Feb 3, 2021
276acfe
Update 0000-create-file-macho.md
peasead Feb 9, 2021
d996d6f
Update macho.yml
peasead Feb 9, 2021
fc30c23
ecs housekeeping edits
ebeahan Feb 10, 2021
442d212
Update rfcs/text/0000-create-file-macho.md
peasead Feb 16, 2021
a7ff6ae
Update rfcs/text/0000-create-file-macho.md
peasead Feb 16, 2021
a96fd55
Update rfcs/text/0000-create-file-macho.md
peasead Feb 16, 2021
5177e2e
Update rfcs/text/0000-create-file-macho.md
peasead Mar 11, 2021
1347c0d
Update rfcs/text/0000-create-file-macho.md
peasead Mar 11, 2021
151 changes: 151 additions & 0 deletions rfcs/text/0000-create-file-macho.md
@@ -0,0 +1,151 @@
# 0000: Create the Mach-O sub-field of the File fieldset

- Stage: **1 (draft)**
- Date: **TBD**

Create the Mach Object (Mach-O) sub-field of the `file` or `process` top-level fieldsets. This file metadata can be used for malware research, as well as coding and other application development efforts.

## Fields

**Stage 0**

This RFC is to create the Mach-O sub-field within the `file.` fieldset. This will include 35 sub-fields.

| Name | Type | Description |
|--------------------------------------------|------------|-----------------------------------------------------------------------------|
| macho.cpu | object | CPU information for the file. |
| macho.cpu.architecture | keyword | CPU architecture target for the file. |
Member

@andrewstucki I did some cross-checking between the proposed field list here and your work in elastic/beats#24195. I pulled those mappings for macho into a gist for easier reference.

100% alignment isn't necessary at all at this point. However, there are enough differences between that approach and what's captured here that I think it's worth raising.

@peasead @dcode @devonakerr

Contributor

I have since updated this PR with a proposed layout that aligns sections and segments with ELF and PE (which doesn't have segments).

Member

Thanks, @dcode.

There are still some large differences between the implementation in elastic/beats#24195 and what's proposed here.

Again, not something we need to completely resolve now. At a minimum, let's capture the difference in approaches as a Concern to make sure we revisit the topic in the next stage's PR.

Contributor

@andrewstucki the strategy I'm suggesting here is splitting out fat binary components that can be unified by the same file metadata (namely hashes or other source-unique IDs). That will allow analyzing all binaries by their sections, segments, symbols, etc. in a seamless way, regardless of platform or whether they're multi-platform. Do you think that can work for your implementation?

Contributor Author

Just pinging to see if there was a resolution here.

Contributor

@andrewstucki Mar 25, 2021

@dcode sorry, just caught up on this. To clarify, you're talking about essentially emitting an event per cpu architecture? I'm not completely against that, but it does seem like a pretty artificial constraint to model something that is, by its nature, a single entity (i.e. a multi-arch mach-o file).

Contributor

Just as a use-case example: let's say we're triggering an alert on a malicious binary that has been compiled for both the new M1 as well as older, Intel-based chipsets and we want to add in object-level data about the file. Would we emit two separate alerts, one for each architecture? Seems kind of odd.
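
For reference, the per-architecture approach discussed in this thread could look roughly like the sketch below. It is only an illustration, not the Beats implementation: it reads the 32-bit fat header (big-endian `fat_header` followed by `fat_arch` records) and emits one event per slice, tied together by the hash of the whole file. The function name, the `CPU_TYPES` lookup, and the `_slice` key are hypothetical and are not part of the proposed field set.

```python
import hashlib
import struct

FAT_MAGIC = 0xCAFEBABE  # magic of a 32-bit fat header, stored big-endian

# Small, non-exhaustive lookup of cputype values (assumption: only these two matter here).
CPU_TYPES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def events_per_architecture(data: bytes) -> list:
    """Return one event dict per architecture slice of a fat Mach-O file."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat Mach-O file")

    # The hash of the whole file is the shared key across all emitted events.
    file_sha256 = hashlib.sha256(data).hexdigest()

    events = []
    for i in range(nfat_arch):
        # fat_arch record: cputype, cpusubtype, offset, size, align (20 bytes).
        cputype, _subtype, offset, size, _align = struct.unpack_from(
            ">IIIII", data, 8 + 20 * i
        )
        events.append(
            {
                "file": {
                    "hash": {"sha256": file_sha256},
                    "macho": {"cpu": {"type": CPU_TYPES.get(cputype, hex(cputype))}},
                },
                # Hypothetical bookkeeping, not a proposed ECS field.
                "_slice": {"offset": offset, "size": size},
            }
        )
    return events
```

Emitting a single event with an array of per-architecture objects would use the same parsing loop; only the final shape of the output differs, which is the trade-off being debated above.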

| macho.cpu.byte_order | keyword | CPU byte order for the file. |
| macho.cpu.subtype | keyword | CPU subtype for the file. |
| macho.cpu.type | keyword | CPU type for the file. |
| macho.headers | nested | Header information for the file. |
| macho.headers.commands.number | long | Number of load commands for the Mach-O header. |
| macho.headers.commands.size | long | Size of load commands of the Mach-O header. |
| macho.headers.commands.type | keyword | Type of the load commands for the Mach-O header. |
| macho.headers.magic | keyword | Magic field of the Mach-O header. |
| macho.headers.flags | keyword | Flags set in the Mach-O header. |
| macho.segments | nested | Segment information for the file. |
| macho.segments.name | keyword | Name of this segment. |
| macho.segments.physical_offset | long | File offset of this segment. |
| macho.segments.physical_size | keyword | Amount of memory to map from the file. |
| macho.segments.virtual_address | keyword | Memory address of this segment. |
| macho.segments.virtual_size | keyword | Memory size of this segment. |
| macho.segments.sections | keyword | Section names contained in this segment. |
| macho.sections | nested | Section information for the segment of the file. |
| macho.sections.name | keyword | Section name for the segment of the file. |
| macho.sections.flags | keyword | Section flags for the segment of the file. |
| macho.sections.type | keyword | Section type for the segment of the file. |
| macho.sections.physical_offset | long | Section List offset. |
| macho.sections.physical_size | long | Section List physical size. |
| macho.sections.virtual_address | long | Section List virtual address. |
| macho.sections.virtual_size | long | Section List virtual size. |
| macho.sections.entropy | float | Shannon entropy calculation from the section. |
| macho.sections.chi2 | float | Chi-square probability distribution of the section. |
| macho.page_size | long | Page size of the file. |
| macho.cdhash | keyword | Code Directory (CD) hash of the file: the SHA-256 hash of its code directory, which may be truncated to the first 20 bytes. |
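
As a rough illustration of how `macho.sections.entropy` and `macho.sections.chi2` might be derived, the sketch below computes Shannon entropy in bits per byte and a chi-square statistic over the raw bytes of a section. The RFC does not define the exact chi-square calculation, so comparing byte counts against a uniform distribution is an assumption, and both function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte values, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def chi_square_uniform(data: bytes) -> float:
    """Chi-square statistic of byte counts against a uniform distribution.

    Assumption: the RFC does not say which expected distribution is used.
    """
    if not data:
        return 0.0
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))
```

High entropy together with a low chi-square value is typical of compressed or encrypted content, which is why the two values are often reported side by side.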


**Stage 1**

[New `macho.yml` candidate](macho/macho.yml)

<!--
Stage 3: Add or update all remaining field definitions. The list should now be exhaustive. The goal here is to validate the technical details of all remaining fields and to provide a basis for releasing these field definitions as beta in the schema. Use GitHub code blocks with yml syntax formatting.
-->

## Usage

**Stage 1**

In performing file analysis, specifically for malware research, understanding file similarities can be used to chain together malware samples and families to identify campaigns and possibly attribution. Additionally, understanding how malware components are re-used is useful in understanding malware telemetry, especially in understanding the impact being made through the introduction of defensive countermeasures.

As an example, if an XDR vendor deploys a new malware model to defeat a specific type of ransomware and we start observing changes in, or relationships between, the headers, import tables, libraries, etc. of that malware family, we can infer that the new malware model is having an impact on that family.

As another example, tracking file metadata for specific families is useful in predicting new campaigns if we see similar file metadata being used for new samples. For [example](https://www.bleepingcomputer.com/news/security/maze-ransomware-is-shutting-down-its-cybercrime-operation/), the Maze ransomware family shut down and was repurposed as Egregor (this is Windows malware, but the concept is the same).

## Source data

**Stage 1**

This type of data can be provided by logs from VirusTotal, ReversingLabs, Lockheed Martin's LAIKABOSS, Emerson's File Scanning Framework, Target's Strelka, or other file/malware analysis platforms.

* [VirusTotal API](https://developers.virustotal.com/v3.0/reference)
* [Emerson FSF](https://github.com/EmersonElectricCo/fsf)
* [Target Strelka](https://github.com/target/strelka)
* [Lockheed Martin LAIKABOSS](https://github.com/lmco/laikaboss)
* [LIEF Analysis Library](https://lief.quarkslab.com/doc/latest/api/python/macho.html)
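
Where none of these platforms is available, some of the header-level fields can also be read directly from the file. The following is a minimal sketch using only the Python standard library, assuming a thin (single-architecture) Mach-O file. The function name and the string formats of the output values are assumptions modelled on the examples in `macho.yml`, not a prescribed mapping.

```python
import struct

# Magic values as seen when the first four bytes are read as little-endian.
MH_MAGIC, MH_MAGIC_64 = 0xFEEDFACE, 0xFEEDFACF  # file is little-endian
MH_CIGAM, MH_CIGAM_64 = 0xCEFAEDFE, 0xCFFAEDFE  # file is big-endian


def parse_macho_header(data: bytes) -> dict:
    """Map the leading mach_header fields onto macho.cpu.* and macho.headers.*."""
    (raw_magic,) = struct.unpack_from("<I", data, 0)
    if raw_magic in (MH_MAGIC, MH_MAGIC_64):
        endian, byte_order = "<", "Little endian"
    elif raw_magic in (MH_CIGAM, MH_CIGAM_64):
        endian, byte_order = ">", "Big endian"
    else:
        raise ValueError("not a thin Mach-O file")

    # magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags
    magic, cputype, _subtype, _filetype, ncmds, sizeofcmds, flags = struct.unpack_from(
        endian + "7I", data, 0
    )
    return {
        "cpu": {
            "architecture": "64-bit" if magic == MH_MAGIC_64 else "32-bit",
            "byte_order": byte_order,
            "type": f"{cputype:#x}",
        },
        # `headers` is a nested field, so it is emitted as a list of objects.
        "headers": [
            {
                "magic": f"{magic:#x}",
                "flags": f"{flags:#x}",
                "commands": {"number": ncmds, "size": sizeofcmds},
            }
        ],
    }
```

A production parser would go on to walk the load commands to fill in segments and sections; libraries such as LIEF already do that work.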

<!--
Stage 1: Provide a high-level description of example sources of data. This does not yet need to be a concrete example of a source document, but instead can simply describe a potential source (e.g. nginx access log). This will ultimately be fleshed out to include literal source examples in a future stage. The goal here is to identify practical sources for these fields in the real world. ~1-3 sentences or unordered list.
-->

<!--
Stage 2: Included a real world example source document. Ideally this example comes from the source(s) identified in stage 1. If not, it should replace them. The goal here is to validate the utility of these field changes in the context of a real world example. Format with the source name as a ### header and the example document in a GitHub code block with json formatting.
-->

<!--
Stage 3: Add more real world example source documents so we have at least 2 total, but ideally 3. Format as described in stage 2.
-->

## Scope of impact

**Stage 2**

There should be no breaking changes, deprecation strategies, or significant refactoring, as this is creating a sub-field for the existing `file.` fieldset.

While likely not a large-scale ECS project, there would be documentation updates needed to explain the new fields.

<!--
Stage 2: Identifies scope of impact of changes. Are breaking changes required? Should deprecation strategies be adopted? Will significant refactoring be involved? Break the impact down into:
* Ingestion mechanisms (e.g. beats/logstash)
* Usage mechanisms (e.g. Kibana applications, detections)
* ECS project (e.g. docs, tooling)
The goal here is to research and understand the impact of these changes on users in the community and development teams across Elastic. 2-5 sentences each.
-->

## Concerns

<!--
Stage 1: Identify potential concerns, implementation challenges, or complexity. Spend some time on this. Play devil's advocate. Try to identify the sort of non-obvious challenges that tend to surface later. The goal here is to surface risks early, allow everyone the time to work through them, and ultimately document resolution for posterity's sake.
-->

<!--
Stage 2: Document new concerns or resolutions to previously listed concerns. It's not critical that all concerns have resolutions at this point, but it would be helpful if resolutions were taking shape for the most significant concerns.
-->

<!--
Stage 3: Document resolutions for all existing concerns. Any new concerns should be documented along with their resolution. The goal here is to eliminate the risk of churn and instability by resolving outstanding concerns.
-->

<!--
Stage 4: Document any new concerns and their resolution. The goal here is to eliminate risk of churn and instability by ensuring all concerns have been addressed.
-->

## Real-world implementations

<!--
Stage 4: Identify at least one real-world, production-ready implementation that uses these updated field definitions. An example of this might be a GA feature in an Elastic application in Kibana.
-->

## People

The following are the people that consulted on the contents of this RFC.

* @peasead | author
* @devonakerr | sponsor
* @dcode, @peasead | subject matter expert

## References

<!-- Insert any links appropriate to this RFC in this section. -->

### RFC Pull Requests

<!-- An RFC should link to the PRs for each of its stage advancements. -->

* Stage 1: https://github.com/elastic/ecs/pull/1097

<!--
* Stage 1: https://github.com/elastic/ecs/pull/NNN
...
-->
19 changes: 19 additions & 0 deletions rfcs/text/macho/docs/usage/macho.asciidoc
@@ -0,0 +1,19 @@
[[ecs-macho-usage]]
=== Mach-O Usage

--Description--

[discrete]
=== Mach-O Field Details
| Field | Description | Level |
| ---- | ---- | ----------- |
| macho.cpu | CPU information for the file. | extended |
| ... | ... | ... |
| ... | ... | ... |
| ... | ... | ... |

[discrete]
=== Field Reuse
The `macho` fields are expected to be nested at: `dll.macho`, `file.macho`, `process.macho`.

Note also that the `macho` fields are not expected to be used directly at the root of the events.
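
To make the reuse rule concrete, here is a small sketch of how an event might be assembled so that `macho` only ever appears under one of the listed parents. The helper name and the sample values are hypothetical.

```python
import json

# Parents listed in the field reuse note above.
ALLOWED_PARENTS = ("dll", "file", "process")


def nest_macho(parent: str, macho: dict) -> dict:
    """Place a macho object under an allowed parent, never at the event root."""
    if parent not in ALLOWED_PARENTS:
        raise ValueError(f"macho fields cannot be nested under {parent!r}")
    return {parent: {"macho": macho}}


event = nest_macho("file", {"cpu": {"architecture": "64-bit"}, "page_size": 4096})
print(json.dumps(event, indent=2))
```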
146 changes: 146 additions & 0 deletions rfcs/text/macho/macho.yml
@@ -0,0 +1,146 @@
---
- name: macho
  title: Mach-O file information.
  group: 2
  description: >
    These fields contain macOS Mach Object (Mach-O) metadata.
  type: group
  reusable:
    top_level: false
    expected:
      - file
      - process
  fields:

    - name: cpu.architecture
      description: CPU architecture target for the file.
      type: keyword
      level: extended
      example: 64-bit

    - name: cpu.byte_order
      description: CPU byte order for the file.
      type: keyword
      level: extended
      example: Little endian

    - name: cpu.subtype
      description: CPU subtype for the file.
      type: keyword
      level: extended
      example: ARM (all) 64-bit

    - name: cpu.type
      description: CPU type for the file.
      type: keyword
      level: extended
      example: ARM 64-bit

    - name: headers
      level: extended
      description: Header information for the file.
      type: nested

    - name: headers.commands.number
      description: Number of load commands for the Mach-O header.
      type: long
      level: extended
      example: 23

    - name: headers.commands.size
      description: Size of load commands of the Mach-O header.
      type: long
      level: extended
      format: bytes
      example: 3888

    - name: headers.commands.type
      description: Type of the load commands for the Mach-O header.
      type: keyword
      level: extended
      example: LC_SYMTAB, 0x2c

    - name: headers.magic
      description: Magic field of the Mach-O header.
      type: keyword
      level: extended
      example: 0xfeedfacf

    - name: headers.flags
      description: Flags set in the Mach-O header.
      type: keyword
      level: extended
      example: TWOLEVEL, 0x4000000

    - name: segments
      level: extended
      description: Segment information for the file.
      type: nested

    - name: segments.vmaddr
      description: Memory address of this segment.
      type: keyword
      level: extended
      example: 0x0

    - name: segments.name
      description: Name of this segment.
      type: keyword
      level: extended
      example: __TEXT, __DATA, __IMPORT

    - name: segments.vmsize
      description: Memory size of this segment.
      type: keyword
      level: extended
      example: 0x4c000

    - name: segments.fileoff
      description: File offset of this segment.
      type: keyword
      level: extended
      example: 0x0

    - name: segments.filesize
      description: Amount of memory to map from the file.
      type: keyword
      level: extended
      example: 0x4c000

    - name: segments.sections
      level: extended
      description: Section information for the segment of the file.
      type: flattened

    - name: segments.offset
      description: Offset of the segment.
      type: long
      format: bytes
      level: extended
      example: 0

    - name: segments.size
      description: Segment limit size.
      type: long
      format: bytes
      level: extended
      example: 123456

    - name: segments.flags
      description: Segment flags.
      type: keyword
      level: extended
      example: 0x0

    - name: cdhash
      description: >
        Code Directory (CD) hash of the file: the SHA-256 hash of its code
        directory, which may be truncated to the first 20 bytes.
      type: keyword
      level: extended
      example: 2035094a7065b29421e7a51f51db9bd61807c3628f210b1f8e667235777dc592

    - name: page_size
      description: Page size of the file.
      type: long
      format: bytes
      level: extended
      example: 4096
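
Because `segments` is mapped as `nested`, conditions that must hold for the same segment need an Elasticsearch `nested` query rather than plain term filters. The sketch below builds such a query body as a Python dict. The helper name is hypothetical, and the `file` parent and the chosen sub-fields are assumptions for illustration.

```python
def segment_query(name: str, flags: str, parent: str = "file") -> dict:
    """Build a query matching one segment that has both the given name and flags."""
    path = f"{parent}.macho.segments"
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        # Both terms must match within the same nested segment object.
                        "filter": [
                            {"term": {f"{path}.name": name}},
                            {"term": {f"{path}.flags": flags}},
                        ]
                    }
                },
            }
        }
    }


body = segment_query("__TEXT", "0x0")
```

Without the `nested` wrapper, a document would match when one segment has the name and a different segment has the flags, which is usually not what an analyst intends.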