Double reuse of a schema doesn't succeed #959

Closed
jonathan-buttner opened this issue Aug 27, 2020 · 5 comments · Fixed by #960
Labels
1.6.0 bug Something isn't working

Comments

@jonathan-buttner
Contributor

jonathan-buttner commented Aug 27, 2020

Description of the problem including expected versus actual behavior:

In ECS version 1.6, the process schema is reused on itself to create the process.parent section. When process is additionally reused onto a custom schema, the parent fields are not included under that custom schema.

Steps to reproduce:

Create these files in a directory called test_schema_reuse:

custom_process.yml

---
- name: process
  title: Process
  group: 2
  short: These fields contain information about a process.
  description: >
    These fields contain information about a process.

    These fields can help you correlate metrics information with a process id/name
    from a log message.  The `process.pid` often stays in the metric itself and is
    copied to the global field for correlation.
  reusable:
    top_level: true
    expected:
      - DoubleReuse
  type: group
  fields:
    - name: test_base
      level: custom
      type: keyword
      description: Object for all custom defined fields to live in.

custom_double_reuse.yml

---
- name: DoubleReuse
  title: DoubleReuse
  group: 2
  short: double reuse example.
  description: double reuse example
    
  type: group
  fields:
    - name: process
      level: custom
      type: object
      description: >
        Process.

To make things a little easier, you can short-circuit the generator right after the finalizer runs, like so:

diff --git a/scripts/generator.py b/scripts/generator.py
index b7ae2a4..3b5140b 100644
--- a/scripts/generator.py
+++ b/scripts/generator.py
@@ -43,6 +43,9 @@ def main():
     fields = loader.load_schemas(ref=args.ref, included_files=args.include)
     cleaner.clean(fields)
     finalizer.finalize(fields)
+    ecs_helpers.yaml_dump('ecs.yml', fields)
+    import sys
+    sys.exit()
     fields = subset_filter.filter(fields, args.subset, out_dir)
     nested, flat = intermediate_files.generate(fields, os.path.join(out_dir, 'ecs'), default_dirs)

Run python scripts/generator.py --include <path to test_schema_reuse> --ref v1.6.0

Examine the output of ecs.yml:

DoubleReuse section of ecs.yml
DoubleReuse:
  field_details:
    dashed_name: DoubleReuse
    description: double reuse example
    flat_name: DoubleReuse
    name: DoubleReuse
    node_name: DoubleReuse
    short: double reuse example.
    type: group
  fields:
    process:
      field_details:
        dashed_name: DoubleReuse-process
        description: 'These fields contain information about a process.

          These fields can help you correlate metrics information with a process id/name
          from a log message.  The `process.pid` often stays in the metric itself
          and is copied to the global field for correlation.'
        flat_name: DoubleReuse.process
        intermediate: true
        name: process
        node_name: process
        original_fieldset: process
        short: These fields contain information about a process.
        type: group
      fields:
        args:
          field_details:
            dashed_name: DoubleReuse-process-args
            description: 'Array of process arguments, starting with the absolute path
              to the executable.

              May be filtered to protect sensitive information.'
            example:
            - /usr/bin/ssh
            - -l
            - user
            - 10.0.0.16
            flat_name: DoubleReuse.process.args
            ignore_above: 1024
            level: extended
            name: args
            node_name: args
            normalize:
            - array
            original_fieldset: process
            short: Array of process arguments.
            type: keyword
        args_count:
          field_details:
            dashed_name: DoubleReuse-process-args-count
            description: 'Length of the process.args array.

              This field can be useful for querying or performing bucket analysis
              on how many arguments were provided to start a process. More arguments
              may be an indication of suspicious activity.'
            example: 4
            flat_name: DoubleReuse.process.args_count
            level: extended
            name: args_count
            node_name: args_count
            normalize: []
            original_fieldset: process
            short: Length of the process.args array.
            type: long
        code_signature:
          field_details:
            dashed_name: DoubleReuse-process-code-signature
            description: These fields contain information about binary code signatures.
            flat_name: DoubleReuse.process.code_signature
            intermediate: true
            name: code_signature
            node_name: code_signature
            original_fieldset: code_signature
            short: These fields contain information about binary code signatures.
            type: group
          fields:
            exists:
              field_details:
                dashed_name: DoubleReuse-process-code-signature-exists
                description: Boolean to capture if a signature is present.
                example: 'true'
                flat_name: DoubleReuse.process.code_signature.exists
                level: core
                name: exists
                node_name: exists
                normalize: []
                original_fieldset: code_signature
                short: Boolean to capture if a signature is present.
                type: boolean
            status:
              field_details:
                dashed_name: DoubleReuse-process-code-signature-status
                description: 'Additional information about the certificate status.

                  This is useful for logging cryptographic errors with the certificate
                  validity or trust status. Leave unpopulated if the validity or trust
                  of the certificate was unchecked.'
                example: ERROR_UNTRUSTED_ROOT
                flat_name: DoubleReuse.process.code_signature.status
                ignore_above: 1024
                level: extended
                name: status
                node_name: status
                normalize: []
                original_fieldset: code_signature
                short: Additional information about the certificate status.
                type: keyword
            subject_name:
              field_details:
                dashed_name: DoubleReuse-process-code-signature-subject-name
                description: Subject name of the code signer
                example: Microsoft Corporation
                flat_name: DoubleReuse.process.code_signature.subject_name
                ignore_above: 1024
                level: core
                name: subject_name
                node_name: subject_name
                normalize: []
                original_fieldset: code_signature
                short: Subject name of the code signer
                type: keyword
            trusted:
              field_details:
                dashed_name: DoubleReuse-process-code-signature-trusted
                description: 'Stores the trust status of the certificate chain.

                  Validating the trust of the certificate chain may be complicated,
                  and this field should only be populated by tools that actively check
                  the status.'
                example: 'true'
                flat_name: DoubleReuse.process.code_signature.trusted
                level: extended
                name: trusted
                node_name: trusted
                normalize: []
                original_fieldset: code_signature
                short: Stores the trust status of the certificate chain.
                type: boolean
            valid:
              field_details:
                dashed_name: DoubleReuse-process-code-signature-valid
                description: 'Boolean to capture if the digital signature is verified
                  against the binary content.

                  Leave unpopulated if a certificate was unchecked.'
                example: 'true'
                flat_name: DoubleReuse.process.code_signature.valid
                level: extended
                name: valid
                node_name: valid
                normalize: []
                original_fieldset: code_signature
                short: Boolean to capture if the digital signature is verified against
                  the binary content.
                type: boolean
        command_line:
          field_details:
            dashed_name: DoubleReuse-process-command-line
            description: 'Full command line that started the process, including the
              absolute path to the executable, and all arguments.

              Some arguments may be filtered to protect sensitive information.'
            example: /usr/bin/ssh -l user 10.0.0.16
            flat_name: DoubleReuse.process.command_line
            ignore_above: 1024
            level: extended
            multi_fields:
            - flat_name: DoubleReuse.process.command_line.text
              name: text
              norms: false
              type: text
            name: command_line
            node_name: command_line
            normalize: []
            original_fieldset: process
            short: Full command line that started the process.
            type: keyword
        entity_id:
          field_details:
            dashed_name: DoubleReuse-process-entity-id
            description: 'Unique identifier for the process.

              The implementation of this is specified by the data source, but some
              examples of what could be used here are a process-generated UUID, Sysmon
              Process GUIDs, or a hash of some uniquely identifying components of
              a process.

              Constructing a globally unique identifier is a common practice to mitigate
              PID reuse as well as to identify a specific process over time, across
              multiple monitored hosts.'
            example: c2c455d9f99375d
            flat_name: DoubleReuse.process.entity_id
            ignore_above: 1024
            level: extended
            name: entity_id
            node_name: entity_id
            normalize: []
            original_fieldset: process
            short: Unique identifier for the process.
            type: keyword
        executable:
          field_details:
            dashed_name: DoubleReuse-process-executable
            description: Absolute path to the process executable.
            example: /usr/bin/ssh
            flat_name: DoubleReuse.process.executable
            ignore_above: 1024
            level: extended
            multi_fields:
            - flat_name: DoubleReuse.process.executable.text
              name: text
              norms: false
              type: text
            name: executable
            node_name: executable
            normalize: []
            original_fieldset: process
            short: Absolute path to the process executable.
            type: keyword
        exit_code:
          field_details:
            dashed_name: DoubleReuse-process-exit-code
            description: 'The exit code of the process, if this is a termination event.

              The field should be absent if there is no exit code for the event (e.g.
              process start).'
            example: 137
            flat_name: DoubleReuse.process.exit_code
            level: extended
            name: exit_code
            node_name: exit_code
            normalize: []
            original_fieldset: process
            short: The exit code of the process.
            type: long
        hash:
          field_details:
            dashed_name: DoubleReuse-process-hash
            description: 'The hash fields represent different hash algorithms and
              their values.

              Field names for common hashes (e.g. MD5, SHA1) are predefined. Add fields
              for other hashes by lowercasing the hash algorithm name and using underscore
              separators as appropriate (snake case, e.g. sha3_512).'
            flat_name: DoubleReuse.process.hash
            intermediate: true
            name: hash
            node_name: hash
            original_fieldset: hash
            short: Hashes, usually file hashes.
            type: group
          fields:
            md5:
              field_details:
                dashed_name: DoubleReuse-process-hash-md5
                description: MD5 hash.
                flat_name: DoubleReuse.process.hash.md5
                ignore_above: 1024
                level: extended
                name: md5
                node_name: md5
                normalize: []
                original_fieldset: hash
                short: MD5 hash.
                type: keyword
            sha1:
              field_details:
                dashed_name: DoubleReuse-process-hash-sha1
                description: SHA1 hash.
                flat_name: DoubleReuse.process.hash.sha1
                ignore_above: 1024
                level: extended
                name: sha1
                node_name: sha1
                normalize: []
                original_fieldset: hash
                short: SHA1 hash.
                type: keyword
            sha256:
              field_details:
                dashed_name: DoubleReuse-process-hash-sha256
                description: SHA256 hash.
                flat_name: DoubleReuse.process.hash.sha256
                ignore_above: 1024
                level: extended
                name: sha256
                node_name: sha256
                normalize: []
                original_fieldset: hash
                short: SHA256 hash.
                type: keyword
            sha512:
              field_details:
                dashed_name: DoubleReuse-process-hash-sha512
                description: SHA512 hash.
                flat_name: DoubleReuse.process.hash.sha512
                ignore_above: 1024
                level: extended
                name: sha512
                node_name: sha512
                normalize: []
                original_fieldset: hash
                short: SHA512 hash.
                type: keyword
        name:
          field_details:
            dashed_name: DoubleReuse-process-name
            description: 'Process name.

              Sometimes called program name or similar.'
            example: ssh
            flat_name: DoubleReuse.process.name
            ignore_above: 1024
            level: extended
            multi_fields:
            - flat_name: DoubleReuse.process.name.text
              name: text
              norms: false
              type: text
            name: name
            node_name: name
            normalize: []
            original_fieldset: process
            short: Process name.
            type: keyword
        pe:
          field_details:
            dashed_name: DoubleReuse-process-pe
            description: These fields contain Windows Portable Executable (PE) metadata.
            flat_name: DoubleReuse.process.pe
            intermediate: true
            name: pe
            node_name: pe
            original_fieldset: pe
            short: These fields contain Windows Portable Executable (PE) metadata.
            type: group
          fields:
            architecture:
              field_details:
                dashed_name: DoubleReuse-process-pe-architecture
                description: CPU architecture target for the file.
                example: x64
                flat_name: DoubleReuse.process.pe.architecture
                ignore_above: 1024
                level: extended
                name: architecture
                node_name: architecture
                normalize: []
                original_fieldset: pe
                short: CPU architecture target for the file.
                type: keyword
            company:
              field_details:
                dashed_name: DoubleReuse-process-pe-company
                description: Internal company name of the file, provided at compile-time.
                example: Microsoft Corporation
                flat_name: DoubleReuse.process.pe.company
                ignore_above: 1024
                level: extended
                name: company
                node_name: company
                normalize: []
                original_fieldset: pe
                short: Internal company name of the file, provided at compile-time.
                type: keyword
            description:
              field_details:
                dashed_name: DoubleReuse-process-pe-description
                description: Internal description of the file, provided at compile-time.
                example: Paint
                flat_name: DoubleReuse.process.pe.description
                ignore_above: 1024
                level: extended
                name: description
                node_name: description
                normalize: []
                original_fieldset: pe
                short: Internal description of the file, provided at compile-time.
                type: keyword
            file_version:
              field_details:
                dashed_name: DoubleReuse-process-pe-file-version
                description: Internal version of the file, provided at compile-time.
                example: 6.3.9600.17415
                flat_name: DoubleReuse.process.pe.file_version
                ignore_above: 1024
                level: extended
                name: file_version
                node_name: file_version
                normalize: []
                original_fieldset: pe
                short: Process name.
                type: keyword
            imphash:
              field_details:
                dashed_name: DoubleReuse-process-pe-imphash
                description: 'A hash of the imports in a PE file. An imphash -- or
                  import hash -- can be used to fingerprint binaries even after recompilation
                  or other code-level transformations have occurred, which would change
                  more traditional hash values.

                  Learn more at https://www.fireeye.com/blog/threat-research/2014/01/tracking-malware-import-hashing.html.'
                example: 0c6803c4e922103c4dca5963aad36ddf
                flat_name: DoubleReuse.process.pe.imphash
                ignore_above: 1024
                level: extended
                name: imphash
                node_name: imphash
                normalize: []
                original_fieldset: pe
                short: A hash of the imports in a PE file.
                type: keyword
            original_file_name:
              field_details:
                dashed_name: DoubleReuse-process-pe-original-file-name
                description: Internal name of the file, provided at compile-time.
                example: MSPAINT.EXE
                flat_name: DoubleReuse.process.pe.original_file_name
                ignore_above: 1024
                level: extended
                name: original_file_name
                node_name: original_file_name
                normalize: []
                original_fieldset: pe
                short: Internal name of the file, provided at compile-time.
                type: keyword
            product:
              field_details:
                dashed_name: DoubleReuse-process-pe-product
                description: Internal product name of the file, provided at compile-time.
                example: "Microsoft\xAE Windows\xAE Operating System"
                flat_name: DoubleReuse.process.pe.product
                ignore_above: 1024
                level: extended
                name: product
                node_name: product
                normalize: []
                original_fieldset: pe
                short: Internal product name of the file, provided at compile-time.
                type: keyword
        pgid:
          field_details:
            dashed_name: DoubleReuse-process-pgid
            description: Identifier of the group of processes the process belongs
              to.
            flat_name: DoubleReuse.process.pgid
            format: string
            level: extended
            name: pgid
            node_name: pgid
            normalize: []
            original_fieldset: process
            short: Identifier of the group of processes the process belongs to.
            type: long
        pid:
          field_details:
            dashed_name: DoubleReuse-process-pid
            description: Process id.
            example: 4242
            flat_name: DoubleReuse.process.pid
            format: string
            level: core
            name: pid
            node_name: pid
            normalize: []
            original_fieldset: process
            short: Process id.
            type: long
        ppid:
          field_details:
            dashed_name: DoubleReuse-process-ppid
            description: Parent process' pid.
            example: 4241
            flat_name: DoubleReuse.process.ppid
            format: string
            level: extended
            name: ppid
            node_name: ppid
            normalize: []
            original_fieldset: process
            short: Parent process' pid.
            type: long
        start:
          field_details:
            dashed_name: DoubleReuse-process-start
            description: The time the process started.
            example: '2016-05-23T08:05:34.853Z'
            flat_name: DoubleReuse.process.start
            level: extended
            name: start
            node_name: start
            normalize: []
            original_fieldset: process
            short: The time the process started.
            type: date
        test_base:
          field_details:
            dashed_name: DoubleReuse-process-test-base
            description: Object for all custom defined fields to live in.
            flat_name: DoubleReuse.process.test_base
            ignore_above: 1024
            level: custom
            name: test_base
            node_name: test_base
            normalize: []
            original_fieldset: process
            short: Object for all custom defined fields to live in.
            type: keyword
        thread:
          field_details:
            dashed_name: DoubleReuse-process-thread
            flat_name: DoubleReuse.process.thread
            intermediate: true
            name: thread
            node_name: thread
            original_fieldset: process
            type: object
          fields:
            id:
              field_details:
                dashed_name: DoubleReuse-process-thread-id
                description: Thread ID.
                example: 4242
                flat_name: DoubleReuse.process.thread.id
                format: string
                level: extended
                name: thread.id
                node_name: id
                normalize: []
                original_fieldset: process
                short: Thread ID.
                type: long
            name:
              field_details:
                dashed_name: DoubleReuse-process-thread-name
                description: Thread name.
                example: thread-0
                flat_name: DoubleReuse.process.thread.name
                ignore_above: 1024
                level: extended
                name: thread.name
                node_name: name
                normalize: []
                original_fieldset: process
                short: Thread name.
                type: keyword
        title:
          field_details:
            dashed_name: DoubleReuse-process-title
            description: 'Process title.

              The proctitle, some times the same as process name. Can also be different:
              for example a browser setting its title to the web page currently opened.'
            flat_name: DoubleReuse.process.title
            ignore_above: 1024
            level: extended
            multi_fields:
            - flat_name: DoubleReuse.process.title.text
              name: text
              norms: false
              type: text
            name: title
            node_name: title
            normalize: []
            original_fieldset: process
            short: Process title.
            type: keyword
        uptime:
          field_details:
            dashed_name: DoubleReuse-process-uptime
            description: Seconds the process has been up.
            example: 1325
            flat_name: DoubleReuse.process.uptime
            level: extended
            name: uptime
            node_name: uptime
            normalize: []
            original_fieldset: process
            short: Seconds the process has been up.
            type: long
        working_directory:
          field_details:
            dashed_name: DoubleReuse-process-working-directory
            description: The working directory of the process.
            example: /home/alice
            flat_name: DoubleReuse.process.working_directory
            ignore_above: 1024
            level: extended
            multi_fields:
            - flat_name: DoubleReuse.process.working_directory.text
              name: text
              norms: false
              type: text
            name: working_directory
            node_name: working_directory
            normalize: []
            original_fieldset: process
            short: The working directory of the process.
            type: keyword
  schema_details:
    group: 2
    nestings:
    - DoubleReuse.process
    prefix: DoubleReuse.
    reused_here:
    - full: DoubleReuse.process
      schema_name: process
      short: These fields contain information about a process.
    root: false
    title: DoubleReuse

Notice that the name DoubleReuse-process-parent does not exist anywhere in the ecs.yml file, whereas the custom field DoubleReuse-process-test-base does.
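Rather than scanning the dump by eye, the check can be scripted. The following is a minimal sketch, assuming the dumped structure has the shape shown above (each node carries `field_details` and an optional `fields` mapping). The helper `collect_flat_names` and the trimmed `sample` mapping are hypothetical and only stand in for the real contents of ecs.yml, which would first need to be loaded with a YAML parser.

```python
def collect_flat_names(fields):
    """Recursively gather every flat_name from a nested field mapping."""
    names = []
    for node in fields.values():
        details = node.get('field_details', {})
        if 'flat_name' in details:
            names.append(details['flat_name'])
        # Descend into nested fields, if any.
        names.extend(collect_flat_names(node.get('fields', {})))
    return names


# Hypothetical, heavily trimmed stand-in for the DoubleReuse entry of ecs.yml.
sample = {
    'DoubleReuse': {
        'field_details': {'flat_name': 'DoubleReuse'},
        'fields': {
            'process': {
                'field_details': {'flat_name': 'DoubleReuse.process'},
                'fields': {
                    'pid': {
                        'field_details': {'flat_name': 'DoubleReuse.process.pid'},
                    },
                    'test_base': {
                        'field_details': {'flat_name': 'DoubleReuse.process.test_base'},
                    },
                },
            },
        },
    },
}

names = collect_flat_names(sample)
has_parent = any(n.startswith('DoubleReuse.process.parent') for n in names)
print(has_parent)  # False for this sample: no parent nesting is present
```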

The endpoint team relies on being able to reuse process, including the parent fields, in a custom schema for malware: https://github.com/elastic/endpoint-package/blob/master/custom_schemas/custom_process.yml#L15

This works with ECS version 1.5 because the parent fields were defined manually there.

jonathan-buttner added the bug and 1.6.0 labels on Aug 27, 2020
@ebeahan
Member

ebeahan commented Aug 31, 2020

Thanks for opening with the detailed bug report and notes @jonathan-buttner!

After a brief review, it looks like the process schema is copied under DoubleReuse before the self-nesting is done during the finalizer phase. That explains why the reuse under DoubleReuse doesn't contain the process.parent nestings under DoubleReuse.process.
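The effect of that ordering can be illustrated with a toy model. This is only a sketch under simplifying assumptions and not the actual finalizer code: field sets are plain dicts, and `reuse_into` and `build` are hypothetical helpers that copy one field set under another.

```python
import copy


def reuse_into(fields, source, destination, as_name):
    """Copy the source field set's fields under destination, keyed by as_name."""
    # The copy is taken before the assignment, so self-nesting adds one level only.
    snapshot = copy.deepcopy(fields[source]['fields'])
    fields[destination]['fields'][as_name] = {'fields': snapshot}


def build(order):
    """Run the foreign reuse and the self-nesting in the given order."""
    fields = {
        'process': {'fields': {'pid': {}}},
        'DoubleReuse': {'fields': {}},
    }
    steps = {
        'foreign': lambda: reuse_into(fields, 'process', 'DoubleReuse', 'process'),
        'self': lambda: reuse_into(fields, 'process', 'process', 'parent'),
    }
    for step in order:
        steps[step]()
    return fields


foreign_first = build(['foreign', 'self'])
self_first = build(['self', 'foreign'])

# Foreign reuse first: the copy under DoubleReuse predates process.parent.
print('parent' in foreign_first['DoubleReuse']['fields']['process']['fields'])  # False
# Self-nesting first: the copy under DoubleReuse would carry parent along.
print('parent' in self_first['DoubleReuse']['fields']['process']['fields'])  # True
```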

@jonathan-buttner
Contributor Author

Yeah no worries, thanks for taking a look @ebeahan!

@webmat
Contributor

webmat commented Aug 31, 2020

The fact that self-reuse doesn't get included in reuses to other field sets is intentional. It's been built like this in order to avoid having a self-referential loop: process.parent.parent.parent... and also to avoid having the upcoming new user fields (user.effective.*, user.target.*) appear in all places where the user fields are currently reused (there will be no source.user.effective.*).

If you need to reuse process and you also want *.process.parent to be present at the destination, I would recommend performing the reuse twice, as a workaround. You'll have to use the new explicit notation to get the .parent name:

expected:
  - DoubleReuse
  - { at: DoubleReuse, as: parent }

Note that this new notation has only been used for self-nesting so far. It should also work when reusing to another field set. If not, please let us know.

@jonathan-buttner
Contributor Author

Ah thanks @webmat. I'll try the explicit notation and then close this ticket.

@ebeahan
Member

ebeahan commented Aug 31, 2020

Thanks @webmat for the detail. Makes sense why that approach was taken.

I tinkered a bit and found that the following YAML does generate the additional nesting of process underneath DoubleReuse.process, but the nested entry is named process rather than parent (DoubleReuse-process-process):

expected:
  - DoubleReuse
  - { at: DoubleReuse.process, as: parent }

With a couple of adjustments to the foreign reuse phase in schema.finalizer.perform_reuse, the nested entry will be named after the as value instead of the reused schema's name (this is what's done today for the self-nesting phase).

I'll open a PR to address.
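To make the naming difference concrete, here is a small sketch of how a reuse entry could be resolved to its destination path. It is an illustration only, not the code in schema.finalizer.perform_reuse; `normalize_entry`, `nested_path`, and the `use_as` flag are hypothetical names introduced for this example.

```python
def normalize_entry(schema_name, entry):
    """Expand a reuse entry into explicit {'at': ..., 'as': ...} form."""
    if isinstance(entry, dict):
        return {'at': entry['at'], 'as': entry['as']}
    # Shorthand notation: the nested name defaults to the schema's own name.
    return {'at': entry, 'as': schema_name}


def nested_path(schema_name, entry, use_as=True):
    """Return the dotted path where the reused schema would be nested."""
    explicit = normalize_entry(schema_name, entry)
    leaf = explicit['as'] if use_as else schema_name
    return explicit['at'] + '.' + leaf


entry = {'at': 'DoubleReuse.process', 'as': 'parent'}

# Naming by schema name reproduces the behavior observed above.
observed = nested_path('process', entry, use_as=False)
# Naming by the 'as' value gives the intended destination.
intended = nested_path('process', entry, use_as=True)

print(observed)  # DoubleReuse.process.process
print(intended)  # DoubleReuse.process.parent
```

With the shorthand entry `'DoubleReuse'`, both variants resolve to `DoubleReuse.process`, so in this sketch the change only affects entries written in the explicit notation.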
