File.macho create #1097
Closed
+316
−0
39 commits
7080358
Merge pull request #1 from elastic/master
peasead 314f9ab
Merge pull request #2 from elastic/master
peasead 1448cd6
Merge pull request #3 from elastic/master
peasead 16aae5f
Merge pull request #4 from elastic/master
peasead ef7bd12
initial commit
peasead 0107542
added PR#
peasead de73a01
removed field present in code_signature
peasead 714c859
removed field present in code_signature
peasead 07c011d
updated work in signature
peasead aeadc6b
move executable fields to segments.
peasead 29ecf43
removed signature fields
peasead 6d77439
removed file. from field names
peasead 16ad2bc
Update rfcs/text/0000-create-file-mach-o.md
peasead 6969054
Update rfcs/text/0000-create-file-mach-o.md
peasead 692cc5a
Update rfcs/text/0000-create-file-mach-o.md
peasead f64a08d
Update rfcs/text/0000-create-file-mach-o.md
peasead 805f6c5
Update rfcs/text/0000-create-file-mach-o.md
peasead cd6a5e0
Update rfcs/text/0000-create-file-mach-o.md
peasead cdd9766
Update rfcs/text/0000-create-file-mach-o.md
peasead b8c02ce
renamed mach-o to macho
peasead 6e8e729
Merge branch 'file.macho-create' of github.com:peasead/ecs into file.…
peasead 72ee845
removed plurality from "header"
peasead c6f20b2
created usage doc
peasead bbd1afd
removed header plurality, sections to flattened
peasead e0e5a1a
changed macho.segments to nested
peasead 689fa39
typo in segments.size
peasead ccf1b88
corrected segments.sections fieldtype
peasead 84bdb2e
added cdhash to RFC doc.
peasead c049773
Fixed segments.offset fieldtype
peasead 3fd0931
typo on rfc doc for segments.flags
peasead a53a52b
back to headers from header
peasead 276acfe
Update 0000-create-file-macho.md
peasead d996d6f
Update macho.yml
peasead fc30c23
ecs housekeeping edits
ebeahan 442d212
Update rfcs/text/0000-create-file-macho.md
peasead a7ff6ae
Update rfcs/text/0000-create-file-macho.md
peasead a96fd55
Update rfcs/text/0000-create-file-macho.md
peasead 5177e2e
Update rfcs/text/0000-create-file-macho.md
peasead 1347c0d
Update rfcs/text/0000-create-file-macho.md
peasead
@@ -0,0 +1,151 @@
# 0000: Create the Mach-O sub-field of the File fieldset

- Stage: **1 (draft)**
- Date: **TBD**

Create the Mach Object (Mach-O) sub-field of the `file` or `process` top-level fieldsets. Mach-O metadata can be used for malware research, as well as for coding and other application development efforts.

## Fields

**Stage 0**

This RFC is to create the Mach-O sub-field within the `file.` fieldset. This will include the 30 sub-fields listed below.

| Name | Type | Description |
|------|------|-------------|
| macho.cpu | object | CPU information for the file. |
| macho.cpu.architecture | keyword | CPU architecture target for the file. |
| macho.cpu.byte_order | keyword | CPU byte order for the file. |
| macho.cpu.subtype | keyword | CPU subtype for the file. |
| macho.cpu.type | keyword | CPU type for the file. |
| macho.headers | nested | Header information for the file. |
| macho.headers.commands.number | long | Number of load commands for the Mach-O header. |
| macho.headers.commands.size | long | Size of load commands of the Mach-O header. |
| macho.headers.commands.type | keyword | Type of the load commands for the Mach-O header. |
| macho.headers.magic | keyword | Magic field of the Mach-O header. |
| macho.headers.flags | keyword | Flags set in the Mach-O header. |
| macho.segments | nested | Segment information for the file. |
| macho.segments.name | keyword | Name of this segment. |
| macho.segments.physical_offset | long | File offset of this segment. |
| macho.segments.physical_size | keyword | Amount of memory to map from the file. |
| macho.segments.virtual_address | keyword | Memory address of this segment. |
| macho.segments.virtual_size | keyword | Memory size of this segment. |
| macho.segments.sections | keyword | Section names contained in this segment. |
| macho.sections | nested | Section information for the segment of the file. |
| macho.sections.name | keyword | Section name for the segment of the file. |
| macho.sections.flags | keyword | Section flags for the segment of the file. |
| macho.sections.type | keyword | Section type for the segment of the file. |
| macho.sections.physical_offset | long | Section List offset. |
| macho.sections.physical_size | long | Section List physical size. |
| macho.sections.virtual_address | long | Section List virtual address. |
| macho.sections.virtual_size | long | Section List virtual size. |
| macho.sections.entropy | float | Shannon entropy calculation from the section. |
| macho.sections.chi2 | float | Chi-square probability distribution of the section. |
| macho.page_size | long | Page size of the file. |
| macho.cdhash | keyword | Code Digest (CD) SHA256 hash of the first 20-bytes of the file. |

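As a rough illustration of where the `macho.cpu.*` and `macho.headers.*` values could come from, the sketch below reads the fixed-size header of a thin (single-architecture) Mach-O file with Python's standard `struct` module. The function name `parse_macho_header`, the small `CPU_TYPES` lookup, and the exact string formatting of each value are assumptions made for illustration; they are not part of this RFC or of any existing ingest pipeline.

```python
import struct

# Magic values from <mach-o/loader.h>; fat (multi-architecture) files are not handled here.
MH_MAGIC = 0xFEEDFACE     # 32-bit Mach-O
MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O

# Deliberately incomplete lookup of CPU type constants (illustrative only).
CPU_TYPES = {0x00000007: "x86", 0x01000007: "x86_64",
             0x0000000C: "ARM", 0x0100000C: "ARM64"}


def parse_macho_header(data: bytes) -> dict:
    """Map the Mach-O header onto the proposed macho.cpu.* / macho.headers.* names."""
    if len(data) < 28:
        raise ValueError("too short to hold a Mach-O header")
    # Try both byte orders; the one that yields a known magic is the file's order.
    for fmt, byte_order in (("<", "Little endian"), (">", "Big endian")):
        (magic,) = struct.unpack_from(fmt + "I", data, 0)
        if magic in (MH_MAGIC, MH_MAGIC_64):
            break
    else:
        raise ValueError("not a thin Mach-O file")

    # magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags
    # (all read as unsigned 32-bit values to keep the hex output simple)
    _, cputype, cpusubtype, _filetype, ncmds, sizeofcmds, flags = struct.unpack_from(
        fmt + "7I", data, 0)

    return {
        "macho": {
            "cpu": {
                "architecture": "64-bit" if magic == MH_MAGIC_64 else "32-bit",
                "byte_order": byte_order,
                "type": CPU_TYPES.get(cputype, hex(cputype)),
                "subtype": hex(cpusubtype),
            },
            "headers": {
                "magic": hex(magic),
                "flags": hex(flags),
                "commands": {"number": ncmds, "size": sizeofcmds},
            },
        }
    }
```

In practice one of the analysis libraries listed under "Source data" would likely do this parsing; the point here is only to show which header words the proposed fields correspond to.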
**Stage 1**

[New `macho.yml` candidate](macho/macho.yml)

<!--
Stage 3: Add or update all remaining field definitions. The list should now be exhaustive. The goal here is to validate the technical details of all remaining fields and to provide a basis for releasing these field definitions as beta in the schema. Use GitHub code blocks with yml syntax formatting.
-->

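The `macho.sections.entropy` and `macho.sections.chi2` fields listed above are computed values rather than values read from the file. A minimal sketch of both calculations over a section's raw bytes follows. The function names are hypothetical, and because the RFC does not specify how `chi2` is derived, the second function only computes the Pearson chi-square statistic against a uniform byte distribution; turning that into a probability would additionally need the chi-square CDF with 255 degrees of freedom, which is not done here.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    # Sum of p * log2(1/p) over the byte values that actually occur.
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def chi_square_statistic(data: bytes) -> float:
    """Pearson chi-square statistic of the byte histogram versus a uniform distribution."""
    if not data:
        return 0.0
    expected = len(data) / 256  # expected count per byte value if bytes were uniform
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))
```

High entropy together with a low chi-square statistic is commonly read as a hint that a section is compressed or encrypted, which is one reason these values are useful for malware triage.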
## Usage

**Stage 1**

In file analysis, and malware research in particular, file similarities can be used to chain together malware samples and families to identify campaigns and possibly support attribution. Additionally, understanding how malware components are reused helps in interpreting malware telemetry, especially when measuring the impact of newly introduced defensive countermeasures.

As an example, if an XDR vendor deploys a new malware model to defeat a specific type of ransomware and we start observing changes in the headers, import tables, libraries, etc. of that malware family, we can infer that the new model is having an impact on that family.

As another example, tracking file metadata for specific families is useful in predicting new campaigns if we see similar file metadata being used for new samples. For [example](https://www.bleepingcomputer.com/news/security/maze-ransomware-is-shutting-down-its-cybercrime-operation/), the Maze ransomware family shut down and was re-purposed as Egregor (this is Windows malware, but the concept is the same).

## Source data

**Stage 1**

This type of data can be provided by logs from VirusTotal, ReversingLabs, Lockheed Martin's LAIKABOSS, Emerson's File Scanning Framework, Target's Strelka, or other file/malware analysis platforms.

* [VirusTotal API](https://developers.virustotal.com/v3.0/reference)
* [Emerson FSF](https://github.com/EmersonElectricCo/fsf)
* [Target Strelka](https://github.com/target/strelka)
* [Lockheed Martin LAIKABOSS](https://github.com/lmco/laikaboss)
* [LIEF Analysis Library](https://lief.quarkslab.com/doc/latest/api/python/macho.html)

<!--
Stage 1: Provide a high-level description of example sources of data. This does not yet need to be a concrete example of a source document, but instead can simply describe a potential source (e.g. nginx access log). This will ultimately be fleshed out to include literal source examples in a future stage. The goal here is to identify practical sources for these fields in the real world. ~1-3 sentences or unordered list.
-->

<!--
Stage 2: Include a real world example source document. Ideally this example comes from the source(s) identified in stage 1. If not, it should replace them. The goal here is to validate the utility of these field changes in the context of a real world example. Format with the source name as a ### header and the example document in a GitHub code block with json formatting.
-->

<!--
Stage 3: Add more real world example source documents so we have at least 2 total, but ideally 3. Format as described in stage 2.
-->

## Scope of impact

**Stage 2**

There should be no breaking changes, deprecation strategies, or significant refactoring, as this creates a sub-field for the existing `file.` fieldset.

While likely not a large-scale ECS project, there would be documentation updates needed to explain the new fields.

<!--
Stage 2: Identifies scope of impact of changes. Are breaking changes required? Should deprecation strategies be adopted? Will significant refactoring be involved? Break the impact down into:
* Ingestion mechanisms (e.g. beats/logstash)
* Usage mechanisms (e.g. Kibana applications, detections)
* ECS project (e.g. docs, tooling)
The goal here is to research and understand the impact of these changes on users in the community and development teams across Elastic. 2-5 sentences each.
-->

## Concerns

<!--
Stage 1: Identify potential concerns, implementation challenges, or complexity. Spend some time on this. Play devil's advocate. Try to identify the sort of non-obvious challenges that tend to surface later. The goal here is to surface risks early, allow everyone the time to work through them, and ultimately document resolution for posterity's sake.
-->

<!--
Stage 2: Document new concerns or resolutions to previously listed concerns. It's not critical that all concerns have resolutions at this point, but it would be helpful if resolutions were taking shape for the most significant concerns.
-->

<!--
Stage 3: Document resolutions for all existing concerns. Any new concerns should be documented along with their resolution. The goal here is to eliminate the risk of churn and instability by resolving outstanding concerns.
-->

<!--
Stage 4: Document any new concerns and their resolution. The goal here is to eliminate risk of churn and instability by ensuring all concerns have been addressed.
-->

## Real-world implementations

<!--
Stage 4: Identify at least one real-world, production-ready implementation that uses these updated field definitions. An example of this might be a GA feature in an Elastic application in Kibana.
-->

## People

The following are the people who consulted on the contents of this RFC.

* @peasead | author
* @devonakerr | sponsor
* @dcode, @peasead | subject matter expert

## References

<!-- Insert any links appropriate to this RFC in this section. -->

### RFC Pull Requests

<!-- An RFC should link to the PRs for each of its stage advancements. -->

* Stage 1: https://github.com/elastic/ecs/pull/1097

<!--
* Stage 1: https://github.com/elastic/ecs/pull/NNN
...
-->
@@ -0,0 +1,19 @@
[[ecs-macho-usage]]
=== Mach-O Usage

--Description--

[discrete]
=== Mach-O Field Details
| Field | Description | Level |
| ---- | ---- | ----------- |
| macho.cpu | CPU information for the file. | extended |
| ... | ... | ... |
| ... | ... | ... |
| ... | ... | ... |

[discrete]
=== Field Reuse
The `macho` fields are expected to be nested at: `dll.macho`, `file.macho`, `process.macho`.

Note also that the `macho` fields are not expected to be used directly at the root of the events.
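
To make the reuse rule concrete, here is a small sketch of how a producer might place a `macho` object under one of the expected parents and reject any other location. The helper names `nest_macho` and `flatten` are hypothetical and exist only for this illustration.

```python
def nest_macho(macho: dict, parent: str) -> dict:
    """Place a macho field set under one of the expected parent field sets."""
    allowed = {"dll", "file", "process"}  # parents named in the usage text above
    if parent not in allowed:
        raise ValueError(f"macho fields are not expected under {parent!r}")
    return {parent: {"macho": macho}}


def flatten(doc: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted field names, e.g. file.macho.page_size."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


# Example: the same macho object reused under `file`.
event = nest_macho({"page_size": 4096, "cpu": {"type": "ARM 64-bit"}}, "file")
print(flatten(event))
```

The flattened names show why `macho` never appears at the root: every field is addressed through its parent, for example `file.macho.cpu.type`.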
@@ -0,0 +1,146 @@
---
- name: macho
  title: Mach-O file information.
  group: 2
  description: >
    These fields contain macOS Mach Object (Mach-O) metadata.
  type: group
  reusable:
    top_level: false
    expected:
      - file
      - process
  fields:

    - name: cpu.architecture
      description: CPU architecture target for the file.
      type: keyword
      level: extended
      example: 64-bit

    - name: cpu.byte_order
      description: CPU byte order for the file.
      type: keyword
      level: extended
      example: Little endian

    - name: cpu.subtype
      description: CPU subtype for the file.
      type: keyword
      level: extended
      example: ARM (all) 64-bit

    - name: cpu.type
      description: CPU type for the file.
      type: keyword
      level: extended
      example: ARM 64-bit

    - name: headers
      level: extended
      description: Header information for the file.
      type: nested

    - name: headers.commands.number
      description: Number of load commands for the Mach-O header.
      type: long
      level: extended
      example: 23

    - name: headers.commands.size
      description: Size of load commands of the Mach-O header.
      type: long
      level: extended
      format: bytes
      example: 3888

    - name: headers.commands.type
      description: Type of the load commands for the Mach-O header.
      type: keyword
      level: extended
      example: LC_SYMTAB, 0x2c

    - name: headers.magic
      description: Magic field of the Mach-O header.
      type: keyword
      level: extended
      example: 0xfeedfacf

    - name: headers.flags
      description: Flags set in the Mach-O header.
      type: keyword
      level: extended
      example: TWOLEVEL, 0x4000000

    - name: segments
      level: extended
      description: Segment information for the file.
      type: nested

    - name: segments.vmaddr
      description: Memory address of this segment.
      type: keyword
      level: extended
      example: 0x0

    - name: segments.name
      description: Name of this segment.
      type: keyword
      level: extended
      example: __TEXT, __DATA, __IMPORT

    - name: segments.vmsize
      description: Memory size of this segment.
      type: keyword
      level: extended
      example: 0x4c000

    - name: segments.fileoff
      description: File offset of this segment.
      type: keyword
      level: extended
      example: 0x0

    - name: segments.filesize
      description: Amount of memory to map from the file.
      type: keyword
      level: extended
      example: 0x4c000

    - name: segments.sections
      level: extended
      description: Section information for the segment of the file.
      type: flattened

    - name: segments.offset
      description: Offset of the segment.
      type: long
      format: bytes
      level: extended
      example: 0

    - name: segments.size
      description: Segment limit size.
      type: long
      format: bytes
      level: extended
      example: 123456

    - name: segments.flags
      description: Segment flags.
      type: keyword
      level: extended
      example: 0x0

    - name: cdhash
      description: Code Digest (CD) SHA256 hash of the first 20-bytes of the file.
      type: keyword
      level: extended
      example: 2035094a7065b29421e7a51f51db9bd61807c3628f210b1f8e667235777dc592

    - name: page_size
      description: Page size of the file.
      type: long
      format: bytes
      level: extended
      example: 4096
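
The `segments.vmaddr`, `segments.vmsize`, `segments.fileoff`, and `segments.filesize` names in this candidate file match the member names of the `segment_command_64` structure in `<mach-o/loader.h>`. The sketch below walks the load commands of a little-endian 64-bit Mach-O file and emits one dictionary per `LC_SEGMENT_64` command, formatted like the hex-string examples above. The function name `iter_segments`, the default offset, and the output layout are assumptions for illustration only.

```python
import struct

LC_SEGMENT_64 = 0x19  # load command type for a 64-bit segment
# cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff, filesize,
# maxprot, initprot, nsects, flags  (72 bytes; little-endian files only)
SEGMENT_64 = struct.Struct("<II16sQQQQiiII")


def iter_segments(data: bytes, offset: int = 32, ncmds: int = 0):
    """Yield one dict per LC_SEGMENT_64 command, keyed like the segments.* fields.

    `offset` is where the load commands start (32 bytes after a 64-bit header)
    and `ncmds` is the command count taken from that header.
    """
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            raise ValueError("corrupt load command size")
        if cmd == LC_SEGMENT_64:
            (_, _, segname, vmaddr, vmsize, fileoff, filesize,
             _maxprot, _initprot, _nsects, flags) = SEGMENT_64.unpack_from(data, offset)
            yield {
                "name": segname.rstrip(b"\x00").decode("ascii", "replace"),
                "vmaddr": hex(vmaddr),
                "vmsize": hex(vmsize),
                "fileoff": hex(fileoff),
                "filesize": hex(filesize),
                "flags": hex(flags),
            }
        offset += cmdsize  # every load command states its own total size
```

Each yielded dictionary would become one element of the `nested` `segments` array; load commands of other types are skipped by advancing `cmdsize` bytes.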
@andrewstucki I did some cross-checking between the proposed field list here and your work in elastic/beats#24195. I pulled those mappings for `macho` into a gist for easier reference.

100% alignment isn't necessary at all at this point. However, there are enough differences between that approach and what's captured here that I think it's worth raising.

@peasead @dcode @devonakerr

I have since updated this PR with a proposed layout that aligns sections and segments with ELF and PE (which doesn't have segments).

Thanks, @dcode.

There are still some large differences between the implementation in elastic/beats#24195 and what's proposed here.

Again, not something we need to completely resolve now. At a minimum, let's capture the difference in approaches as a `Concern` to make sure we revisit the topic in the next stage's PR.

@andrewstucki the strategy I'm suggesting here is splitting out fat binary components that can be unified by the same file metadata (namely hashes or other source unique IDs). That will allow analyzing all binaries by their sections, segments, symbols, etc. in a seamless way, regardless of platform or whether they're multi-platform. Do you think that can work for your implementation?

Just pinging to see if there was a resolution here.

@dcode sorry, just caught up on this. To clarify, you're talking about essentially emitting an event per CPU architecture? I'm not completely against that, but it does seem like a pretty artificial constraint to model something that is, by its nature, a single entity (i.e. a multi-arch Mach-O file).

Just as a use-case example: let's say we're triggering an alert on a malicious binary that has been compiled for both the new M1 as well as older, Intel-based chipsets and we want to add in object-level data about the file. Would we emit two separate alerts, one for each architecture? Seems kind of odd.
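
As a rough illustration of the two approaches discussed in this thread, the sketch below reads the architecture table of a fat (multi-architecture) Mach-O file and then builds either one event per architecture or a single event carrying all architectures. The `fat_header` and `fat_arch` layouts come from `<mach-o/fat.h>`; the function names and both event layouts are hypothetical and are not taken from this PR or from elastic/beats#24195.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # fat_header.magic; fat headers are stored big-endian


def fat_slices(data: bytes) -> list:
    """Return one entry per architecture slice listed in a 32-bit fat header."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat Mach-O file")
    slices = []
    for i in range(nfat_arch):
        # fat_arch: cputype, cpusubtype, offset, size, align (20 bytes each)
        cputype, cpusubtype, offset, size, _align = struct.unpack_from(
            ">5I", data, 8 + 20 * i)
        slices.append({"cpu": {"type": hex(cputype), "subtype": hex(cpusubtype)},
                       "offset": offset, "size": size})
    return slices


def events_per_architecture(file_fields: dict, slices: list) -> list:
    """Approach 1: one event per slice, tied together by shared file metadata."""
    return [{"file": {**file_fields, "macho": {"cpu": s["cpu"]}}} for s in slices]


def single_event(file_fields: dict, slices: list) -> dict:
    """Approach 2: one event for the whole file, with the slices kept as a list."""
    return {"file": {**file_fields, "macho": [{"cpu": s["cpu"]} for s in slices]}}
```

With the first layout, a universal binary built for two architectures yields two documents sharing the same `file.hash.*` values; with the second, it yields one document whose `macho` value is an array.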