
Create Synchronous EQL querying REST API #49634

Closed
colings86 opened this issue Nov 27, 2019 · 12 comments
Labels
:Analytics/EQL EQL querying

Comments

@colings86
Contributor

colings86 commented Nov 27, 2019

The first mode of execution for EQL queries will be running ad hoc EQL queries against historical data (i.e. running the query over large amounts of data already stored in an index in a single run). For this issue we will make the API a synchronous request/response where the execution of the query will complete before returning the response. In a later issue we will address long running EQL queries and explore converting this to an asynchronous API.

Request

Parameters on the request should be (note that we can probably define sensible defaults for everything except the index and the rule):

  • Index/index pattern/alias (normal index definition and expansion, including wildcard expansion options and ignore_throttled)
  • Narrowing query (Using ES Query DSL) - allowing the user to select a subset of the index on which to run the rule. Defaults to null (no query)
  • EQL rule to run
  • Size of response? - To limit the number of results to hold in memory and return to the user and to allow pagination. Defaults to 50?
  • Search_after for join key? - to enable paging through results when there is more than one page. The value for this search_after would be a join value, and results would resume after it (that join value and all previous join values are excluded). Defaults to null.
  • Field to use as timestamp. Defaults to @timestamp
  • Field to use as event type (process, file, network etc.). Defaults to event.type
  • Field to use as an implicit join key - This defines a field that will be implicitly added as the first join key. This can be used to prevent sequences and joins from matching across, e.g., edge nodes if the rule should treat data from each edge node as completely separate. The default for this field would be null, so by default we would only use the join keys specified in the EQL rule. This option will be useful for the Endpoint use case: we need to be able to run the same rules on Elasticsearch as on the Endpoints, but when querying the Endpoints each endpoint is considered individually, so we will need some control outside of the rule to get the same behaviour in Elasticsearch.

Note: the parameter names are not intended to be final.
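To make the proposed size/search_after semantics concrete, here is a minimal in-memory sketch. It assumes each result carries its join values as a list under a hypothetical `join_values` key, that pages are ordered by those values ascending, and that a page starts strictly after the `search_after` value. `next_page` is an illustrative name, not part of any Elasticsearch API.

```python
from typing import Any, Optional, Sequence

Result = dict[str, Any]  # hypothetical shape: {"join_values": [...], "events": [...]}


def next_page(results: list[Result], size: int = 50,
              search_after: Optional[Sequence[Any]] = None) -> list[Result]:
    """Return up to `size` results whose join values sort strictly after `search_after`."""
    # Results are paged in ascending join-value order.
    ordered = sorted(results, key=lambda r: tuple(r["join_values"]))
    if search_after is not None:
        cursor = tuple(search_after)
        # Exclude the cursor's join value and everything before it.
        ordered = [r for r in ordered if tuple(r["join_values"]) > cursor]
    return ordered[:size]
```

Passing the join values of the last result on one page as `search_after` yields the next page, which is the scrolling behaviour the bullet describes.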

Example minimal request:

GET index-pattern-*/_eql/search?sync_search_threshold=5s
{
  "rule": """
    sequence with maxspan=5h
      [file where user != 'SYSTEM'] by file_path
      [process where user = 'SYSTEM'] by process_path
  """
}

Example request with all options:

GET index-pattern-*/_eql/search?sync_search_threshold=5s
{
  "query": {
    "match": {
      "foo": "bar"
    }
  },
  "timestamp_field": "@timestamp",
  "event_type_field": "event.type",
  "implicit_join_key_field": "device.id",
  "size": 100,
  "search_after": [ "device-20184", "/user/local/foo.exe", "2019-11-26T00:45:43.542" ],
  "rule": """
    sequence with maxspan=5h
      [file where user != 'SYSTEM'] by file_path
      [process where user = 'SYSTEM'] by process_path
  """
}
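The defaults proposed in the parameter list could be applied on the server roughly as follows. This is a hypothetical sketch: `DEFAULTS` and `resolve_request` are illustrative names, the index is assumed to come from the URL path rather than the body, and `size=50` reflects the still-open question above.

```python
from typing import Any

# Hypothetical server-side defaults mirroring the parameter list above.
# The index is not here: it comes from the URL path (index-pattern-*/_eql/search).
DEFAULTS: dict[str, Any] = {
    "query": None,                    # no narrowing query
    "timestamp_field": "@timestamp",
    "event_type_field": "event.type",
    "implicit_join_key_field": None,  # only the rule's own join keys
    "size": 50,                       # still an open question in the issue
    "search_after": None,
}


def resolve_request(body: dict[str, Any]) -> dict[str, Any]:
    """Validate an EQL search body and fill in defaults for omitted parameters."""
    if "rule" not in body:
        raise ValueError("the EQL rule is required")
    unknown = set(body) - set(DEFAULTS) - {"rule"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {**DEFAULTS, **body}
```

With this approach the minimal request (only `rule`) and the full request resolve to the same shape, so the rest of the execution path never has to special-case missing parameters.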

Response

Although the response does not need to be tabular, it is much easier for UIs and users to consume the results if the response is easily converted to a table.

Information required in response:

  • Number of results returned
  • Indication whether there are more results?
  • Indication if these are partial results (we will need to decide if we want to support interim results and/or if we want to return results if a shard fails on one of the searches)
  • Rule results

Information required for each rule result:

  • Join values (if applicable)
  • Events that make up the result
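One way to model the required response information as types, purely as an illustration; the class and field names below are hypothetical and are not meant as the final JSON names.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RuleResult:
    """One rule match: the events that make it up, plus join values if any."""
    events: list[dict[str, Any]]
    join_values: Optional[list[Any]] = None  # None for rules without join keys


@dataclass
class EqlSearchResponse:
    """Hypothetical container for the response information listed above."""
    results: list[RuleResult] = field(default_factory=list)
    has_more: bool = False    # more results available via search_after
    is_partial: bool = False  # e.g. a shard failed on one of the searches

    @property
    def count(self) -> int:
        """Number of results returned in this response."""
        return len(self.results)
```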

Current format of results

For EQL queries without pipes, the results of an EQL query are always a flat list of events. This means that if the query is a sequence, the ordering of the events in the results defines the sequence, rather than the sequence being defined by structure. For example, if the query was looking for file events followed by a process event, the results would look like the following:

  1. File event
  2. Process event
  3. File event
  4. Process event
  5. File event
  6. Process event

From the list above you can see that every 2 events make up an instance of the sequence we are looking for. The downside here is that the client needs to understand the query being run to be able to understand the results. We will probably need to support this style of results output in order to fit in with the way that the endpoint SMP Server currently uses EQL. Note that the SMP Server currently pushes the understanding of the sequences to the user (i.e. it shows the flat events output as returned) for cases where the query is defined by the user. For cases where the server itself defines the EQL (such as in the resolver view) the server has implicit knowledge of what it's asking for so knows how to interpret the results.
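The following sketch shows why the flat format pushes query knowledge onto the client: regrouping only works if the caller already knows how many events each sequence contains. `regroup_flat` is a hypothetical helper, and it assumes results arrive in order and every sequence has the same length.

```python
from typing import Any

Event = dict[str, Any]


def regroup_flat(events: list[Event], sequence_length: int) -> list[list[Event]]:
    """Split a flat event list back into sequences of `sequence_length` events.

    The flat list itself carries no marker of where one sequence ends, so the
    caller must supply the length it learned from reading the query.
    """
    if sequence_length <= 0 or len(events) % sequence_length:
        raise ValueError("event count is not a multiple of the sequence length")
    return [events[i:i + sequence_length] for i in range(0, len(events), sequence_length)]
```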

Alternatives to current result output

For clients like Kibana (and probably SIEM) it would be better if the client does not need to understand the query in order to interpret the results. The difference here compared with the SMP server is that in Kibana the user will define an arbitrary EQL query and expect Kibana to know how to render it in a way that makes sense. This means that Kibana should not have to understand the query (since we don’t want to have to add a query parser in Kibana as well as ES) but does need the results in a generic understandable form. If sequences are defined as structure Kibana can identify sequences without understanding the query itself (it just needs to understand it might get sequences back containing 1 or more events each). Another option would be to have a “sequence group id” field in the response for each event so events in the same sequence can be matched without having to have explicit response structure.
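With a "sequence group id" on each event, a client can rebuild sequences without parsing the query. A minimal sketch, assuming the id is stored under a hypothetical `_sequence_id` key on each event and that the events of a sequence appear in order:

```python
from collections import defaultdict
from typing import Any

Event = dict[str, Any]


def group_by_sequence_id(events: list[Event],
                         key: str = "_sequence_id") -> dict[Any, list[Event]]:
    """Group a flat event list by a (hypothetical) per-event sequence id."""
    groups: defaultdict[Any, list[Event]] = defaultdict(list)
    for event in events:
        # Dicts preserve insertion order, so groups appear in first-seen order
        # and events keep their original order within each group.
        groups[event[key]].append(event)
    return dict(groups)
```

Unlike the fixed-length regrouping of the current format, this works for any query, because the grouping information travels with the data.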

The endgame CLI client also has the option to define --flat --columns which pivot the result data into a table form with the specified columns. This may also be something we would want to support since it will put the results into a much more consumable form for clients like Kibana and is the kind of operation analysts will naturally reach for following the search anyway.
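A rough approximation of a `--flat --columns` style pivot, written as a hypothetical helper rather than the Endgame CLI's actual implementation. It assumes each result is a list of `_source`-like documents and that columns are dotted field paths; fields an event lacks become `None`, so columns that span event types produce a sparse table.

```python
from typing import Any, Optional

Doc = dict[str, Any]


def get_path(doc: Doc, dotted: str) -> Optional[Any]:
    """Look up a dotted field such as 'process.name'; None if any part is missing."""
    current: Any = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def to_table(results: list[list[Doc]], columns: list[str]) -> list[list[Optional[Any]]]:
    """Flatten results (each a list of events) into one row per event."""
    return [[get_path(event, column) for column in columns]
            for result in results for event in result]
```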

@colings86 colings86 added the :Analytics/EQL EQL querying label Nov 27, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/EQL)

@stacey-gammon
Contributor

@wylieconlon @chrisdavies -- would be interested to hear your thoughts on the last section Alternatives to current result output in relation to Lens and suggested visualization types. Is there a visualization type we could suggest if the return structure was formatted a certain way? Maybe something like nested tables? A cell in a table actually contains a whole table itself? Like the result of a sequence query could be something like this:

[Screenshot: Screen Shot 2019-12-02 at 2:26:00 PM]

or something like this:

[Screenshot: Screen Shot 2019-12-02 at 2:26:07 PM]

Note that a row can belong in multiple sequences, hence the sequence_id column is an array type, not a string type. I feel like the first structure might be easier for suggesting visualizations? Especially if we want a generic data table layer like we were talking about, I think we probably don't want to do something like, if the table has a column named "sequenceIds", suggest this visualization.

This would probably be a new visualization type as well... maybe something like a table where each row is a sequence but it has collapsible sub rows (kind of like that EAH demo from RBC)?

@colings86
Contributor Author

colings86 commented Dec 4, 2019

For the JSON response of the API, I am leaning towards the response providing a flat list of events with a field for sequence_id. The reason for this is that it will be easily compatible with the current output of results from Endpoint, it fits into the way that SIEM is likely to want to display the results and the data can be used to create the kinds of nested tables that @stacey-gammon mentions above since all the information will be present.

As for the fact that an event can belong in multiple sequences, one solution to this would be to duplicate the event rather than make the sequence_id field multi-valued.

@rw-access How does the current EQL implementation output results where the sequences overlap? Does it duplicate the events that are shared by multiple sequences in the results?

@rw-access
Contributor

rw-access commented Dec 4, 2019

Sorry for jamming several replies in one comment. Buckle in.

Field to use as an implicit join key - This defines a field that will be implicitly added as the first join key.

What's the rationale for making this implicit vs explicit? One thing I observed in the POC is that the order of the join keys can impact total search time significantly, because of the search_after skipping. For instance, using endpoint_id as the last key took one query 40 minutes, but when I put it as the first key, it was ~15-20% complete after four hours. I eventually just canceled it. If we guess this ordering wrong for implicit join keys, we could degrade performance.

Field to use as event type (process, file, network etc.). Defaults to event.type

I still don't think we should only allow this to be a specific field, because it might not be flexible enough. If we do, I think event.category is a better default with how we use it in ECS now. This is another approach that is flexible and reusable:

{
  "event_mapping": {
    "file": {
      "index": "endgame-file-*"
    },
    "process": {
      "index": ["endgame-*"],
      "filter": "event.category == 'process'"
    }
  }
}
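To illustrate how such an `event_mapping` might be resolved for a single document, here is a sketch. It assumes index patterns use shell-style wildcards (matched with `fnmatch`) and replaces the EQL `filter` string with a Python predicate, because evaluating EQL is out of scope here; `event_types_for` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Any, Callable, Optional

Source = dict[str, Any]

# Mirrors the proposed "event_mapping" JSON, except the EQL filter string is
# replaced by a Python predicate (evaluating EQL is out of scope for this sketch).
EVENT_MAPPING: dict[str, dict[str, Any]] = {
    "file": {"index": "endgame-file-*"},
    "process": {
        "index": ["endgame-*"],
        "filter": lambda src: src.get("event", {}).get("category") == "process",
    },
}


def event_types_for(index: str, source: Source) -> list[str]:
    """Return every event type whose index pattern and filter match this document."""
    matched: list[str] = []
    for event_type, spec in EVENT_MAPPING.items():
        patterns = spec["index"]
        if isinstance(patterns, str):  # "index" may be a single pattern or a list
            patterns = [patterns]
        if not any(fnmatchcase(index, p) for p in patterns):
            continue
        predicate: Optional[Callable[[Source], bool]] = spec.get("filter")
        if predicate is None or predicate(source):
            matched.append(event_type)
    return matched
```

One thing the sketch surfaces: a document in `endgame-file-*` whose category is `process` matches both entries, so a real design would need a rule for overlapping mappings.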

For EQL queries without pipes, the results of an EQL query are always a flat list of events

This experience is different depending on where you view it. Within all of the EQL engines, each result is always an array of events. For single-event queries that array always has length 1, so it's easy to flatten. For sequences or joins, the length of the array mirrors that of the sequence. It's not pipes that turn the results into arrays; it's sequences and joins.

Within the Endgame platform, EQL was a bit of an afterthought, so I had to flatten every array regardless, and I added some fields so that the arrays could be reassembled, with each event as a separate "card", similar to how Kibana presents data. I think we'll have to brainstorm more about what the best representation is. This means that if an event is in two sequences (more below), then it is outputted twice.

The endgame CLI client also has the option to define --flat --columns which pivot the result data into a table form with the specified columns. This may also be something we would want to support since it will put the results into a much more consumable form for clients like Kibana and is the kind of operation analysts will naturally reach for following the search anyway.

This client still operates on the flattened results, with each event as a separate row. If you display columns across different data types, you end up with a sparse matrix.

@rw-access How does the current EQL implementation output results where the sequences overlap? does it duplicate the events that are shared by multiple sequences in the results?

Good question. Overlaps are technically possible but only for events that could satisfy multiple positions in a sequence. But you won't find one event in the same position for two separate sequences (with one undocumented exception). For reference, https://eql.readthedocs.io/en/latest/query-guide/implementation.html#sequences

For instance, this sequence will have no overlaps because no event satisfies both file where true and process where true:

sequence
 [file where true]
 [process where true]

But this sequence would link each process to its first child. Since every process (except the initial one at the start of the chain, or ones without children) is both a parent and a child, it'll be in two sequences:

sequence
  [process where true] by pid
  [process where true] by ppid

If you have a lineage of A -> B -> C -> D, then you'll see sequences for (A, B), (B, C), (C, D).

In my opinion, the most clear representation of sequences was the first picture that @stacey-gammon showed:
[Screenshot: Screen Shot 2019-12-02 at 2:26:00 PM]

If you went with the second view, sequence_ids needs to be updated to reflect {"sequence_id": 1, "position": 0}. This doesn't help when events are intermingled. If you had events in this order:

_id  ppid  pid
a    0     4
b    4     8
c    4     12
d    8     16
e    12    20
f    16    24

Then you end up with results (a, b), (b, d), (c, e), (d, f). That's tricky if you require each event to only be shown once. Also, note that the join key is different for each pair.
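The pairing above can be reproduced with a small simulation of the two-stage parent/child sequence (first stage joined on `pid`, second on `ppid`). This is a simplified model, not the EQL engine's implementation: events are processed in order, a completed partial sequence is consumed, and at most one pending partial sequence is kept per join key.

```python
from typing import Any

Event = dict[str, Any]

# The example events from the table above, in timestamp order.
EVENTS: list[Event] = [
    {"_id": "a", "ppid": 0, "pid": 4},
    {"_id": "b", "ppid": 4, "pid": 8},
    {"_id": "c", "ppid": 4, "pid": 12},
    {"_id": "d", "ppid": 8, "pid": 16},
    {"_id": "e", "ppid": 12, "pid": 20},
    {"_id": "f", "ppid": 16, "pid": 24},
]


def match_parent_child(events: list[Event]) -> list[tuple[str, str]]:
    """Simulate: sequence [process where true] by pid [process where true] by ppid."""
    pending: dict[int, Event] = {}  # first-stage matches, keyed by join value (pid)
    matches: list[tuple[str, str]] = []
    for event in events:
        # Second stage: this event completes a pending sequence if its ppid
        # equals a waiting parent's pid; that partial sequence is consumed.
        parent = pending.pop(event["ppid"], None)
        if parent is not None:
            matches.append((parent["_id"], event["_id"]))
        # First stage: every process can also start a new sequence as a parent.
        pending[event["pid"]] = event
    return matches
```

`c` never pairs with `a` because `a`'s partial sequence was already consumed by `b`, and `b` and `d` each appear in two results, which is the overlap discussed above.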

As for the fact that an event can belong in multiple sequences, one solution to this would be to duplicate the event rather than make the sequence_id field multi-valued.

Agreed; otherwise you end up with this undecipherable table:

sequence (id, pos, join key)   _id  ppid  pid
(0, 0, 4)                      a    0     4
(0, 1, 4) (1, 0, 8)            b    4     8
(2, 0, 12)                     c    4     12
(1, 1, 8) (3, 0, 16)           d    8     16
(2, 1, 12)                     e    12    20
(3, 1, 16)                     f    16    24
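Duplicating events instead keeps every row attached to exactly one (sequence, position) pair. A minimal sketch over the four sequences (a, b), (b, d), (c, e), (d, f); `to_rows` and the `sequence_id`/`position`/`join_key` row keys are illustrative names only.

```python
from typing import Any

Event = dict[str, Any]


def to_rows(sequences: list[tuple[Any, list[Event]]]) -> list[Event]:
    """Flatten (join_key, events) sequences into rows, duplicating shared events."""
    rows: list[Event] = []
    for sequence_id, (join_key, events) in enumerate(sequences):
        for position, event in enumerate(events):
            # Each copy of an event records exactly one (sequence, position) pair.
            rows.append({"sequence_id": sequence_id, "position": position,
                         "join_key": join_key, **event})
    return rows


E = {e["_id"]: e for e in [
    {"_id": "a", "ppid": 0, "pid": 4}, {"_id": "b", "ppid": 4, "pid": 8},
    {"_id": "c", "ppid": 4, "pid": 12}, {"_id": "d", "ppid": 8, "pid": 16},
    {"_id": "e", "ppid": 12, "pid": 20}, {"_id": "f", "ppid": 16, "pid": 24},
]}
SEQUENCES = [(4, [E["a"], E["b"]]), (8, [E["b"], E["d"]]),
             (12, [E["c"], E["e"]]), (16, [E["d"], E["f"]])]
```

Event `b` becomes two rows, (0, 1, 4) and (1, 0, 8), instead of one row with a multi-valued sequence cell.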

@colings86
Contributor Author

What's the rationale for making this implicit vs explicit?

The rationale is that we want the same rule as written to run against both Elasticsearch and Endpoint so we need some way to replicate the per endpoint querying in Elasticsearch without needing to change the rule that's run between the endpoint and Elasticsearch.
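A sketch of what "implicitly added as the first join key" could mean in practice: the implicit field is prepended to the rule's own join keys, so events from different devices can never share a key. `effective_join_key` is hypothetical, and events are assumed to be flat dicts with dotted keys. Because the implicit field comes first, it also becomes the leading sort key for `search_after` paging, which is the ordering concern raised above.

```python
from typing import Any, Optional

Event = dict[str, Any]  # assumed flat, e.g. {"device.id": ..., "process_path": ...}


def effective_join_key(event: Event, rule_keys: list[str],
                       implicit_field: Optional[str] = None) -> tuple[Any, ...]:
    """Join key for an event: the implicit field (if configured) goes first."""
    fields = ([implicit_field] if implicit_field else []) + rule_keys
    # Missing fields join as None rather than raising.
    return tuple(event.get(f) for f in fields)
```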

I still don't think we should only allow this to be a specific field, because it might not be flexible enough. If we do, I think event.category is a better default with how we use it in ECS now.

I think it will be flexible enough, especially when combined with #49713. The ability to create a constant field in the index will allow users to effectively do the reverse of your suggestion, and will let them query over many indices while still filtering efficiently on the event type. The problem with your suggestion is that it's another kind of mapping which needs to be stored and maintained, which will make the user experience (especially while users are learning the feature) more complicated.

This means that if an event is in two sequences (more below), then it is outputted twice.

This will actually make the response much easier to consume, since each "row" will only belong to a single sequence.

@colings86
Contributor Author

colings86 commented Dec 10, 2019

There seems to be a preference for defining sequences in structure rather than a flat list of events with a sequence id. This was also shared by @tsg when I spoke to him the other day about how he would use EQL.

@scunningham would Endpoint be ok with getting back structured sequences in the response from EQL in Elasticsearch (essentially option 1 or 2 below) or would it need a flat event structure like it currently has for the endpoints (essentially option 3 below)?

One thing to note is that EQL results are not always sequences; a result can be made up of one event or multiple events. To cope with this we can either define the response format as if every result has multiple events (so a result contains an array of events, and the array will only contain a single event if the query does not use sequence or join), or we can define the response so that a result can contain different types of payload: event or sequence. The former has the advantage that clients have one kind of response to process and can process it in the same way every time.
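The advantage of the uniform format is easiest to see from the client side. This hypothetical helper accepts a hit in either of the structured shapes in the options that follow (an `events` array in Option 1, or an `event`/`sequence` payload in Option 2); with Option 1 only the first branch would be needed.

```python
from typing import Any

Hit = dict[str, Any]


def events_of(hit: Hit) -> list[dict[str, Any]]:
    """Return the events of one hit, whichever structured shape it uses."""
    if "events" in hit:       # Option 1: every hit is an array of events
        return hit["events"]
    if "sequence" in hit:     # Option 2: sequence payload with its own events array
        return hit["sequence"]["events"]
    if "event" in hit:        # Option 2: single-event payload
        return [hit["event"]]
    raise ValueError("unrecognised hit shape")
```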

To help us make progress below are some examples of responses in the different forms.

Sequences as structure in the response

Option 1 - Same format for sequence and non-sequence results

Example 1 - non-sequence query

Query:

process where process.name = "interesting.exe"

Response:

{
    "took" : 5,
    "timed_out" : false,
    "hits" : {
        "total" : {
            "value" : 1000,
            "relation" : "eq"
        },
        "hits" : [
            {
                "events":[
                    {
                        "_index" : "my_index",
                        "_id" : "0",
                        "_source" : {
                            "date" : "2009-11-15T14:12:12",
                            "event": {
                                "type": "process",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                ...
                            },
                            "user": {
                                "name": "eric",
                                ...
                            },
                            ...
                        }
                    }
                ]
            },
            {
                "events":[
                    {
                        "_index" : "my_index",
                        "_id" : "6",
                        "_source" : {
                            "date" : "2009-11-15T14:12:56",
                            "event": {
                                "type": "process",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                ...
                            },
                            "user": {
                                "name": "bob",
                                ...
                            },
                            ...
                        }
                    }
                ]
            },
            {
                "events":[
                    {
                        "_index" : "my_index",
                        "_id" : "135",
                        "_source" : {
                            "date" : "2009-11-15T17:24:45",
                            "event": {
                                "type": "process",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                ...
                            },
                            "user": {
                                "name": "alice",
                                ...
                            },
                            ...
                        }
                    }
                ]
            },
            ...
        ]
    }
}

Example 2 - sequence query

Query:

sequence by pid
[process where process.name = "interesting.exe"]
[network where true]

Response:

{
    "took" : 5,
    "timed_out" : false,
    "hits" : {
        "total" : {
            "value" : 100,
            "relation" : "eq"
        },
        "hits" : [
            {
                "join_keys": [ "4021" ],
                "events":[
                    {
                        "_index" : "my_index",
                        "_id" : "0",
                        "_source" : {
                            "date" : "2009-11-15T14:12:12",
                            "event": {
                                "type": "process",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                "pid": 4021,
                                ...
                            },
                            "user": {
                                "name": "eric",
                                ...
                            },
                            ...
                        }
                    },
                    {
                        "_index" : "my_index",
                        "_id" : "2",
                        "_source" : {
                            "date" : "2009-11-15T14:12:13",
                            "event": {
                                "type": "network",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                "pid": 4021,
                                ...
                            },
                            "network":{
                                "port": 50392
                            },
                            ...
                        }
                    }
                ]
            },
            {
                "join_keys": [ "325" ],
                "events":[
                    {
                        "_index" : "my_index",
                        "_id" : "6",
                        "_source" : {
                            "date" : "2009-11-15T14:12:56",
                            "event": {
                                "type": "process",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                "pid": 325,
                                ...
                            },
                            "user": {
                                "name": "bob",
                                ...
                            },
                            ...
                        }
                    },
                    {
                        "_index" : "my_index",
                        "_id" : "20",
                        "_source" : {
                            "date" : "2009-11-15T14:13:08",
                            "event": {
                                "type": "network",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                "pid": 325,
                                ...
                            },
                            "network":{
                                "port": 50392
                            },
                            ...
                        }
                    }
                ]
            },
            {
                "join_keys": [ "200" ],
                "events":[
                    {
                        "_index" : "my_index",
                        "_id" : "135",
                        "_source" : {
                            "date" : "2009-11-15T17:24:45",
                            "event": {
                                "type": "process",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                "pid": 200,
                                ...
                            },
                            "user": {
                                "name": "alice",
                                ...
                            },
                            ...
                        }
                    },
                    {
                        "_index" : "my_index",
                        "_id" : "136",
                        "_source" : {
                            "date" : "2009-11-15T17:24:45",
                            "event": {
                                "type": "network",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                "pid": 200,
                                ...
                            },
                            "network":{
                                "port": 50392
                            },
                            ...
                        }
                    }
                ]
            },
            ...
        ]
    }
}

Option 2 - Different format for sequence and non-sequence results

Example 1 - non-sequence query

Query:

process where process.name = "interesting.exe"

Response:

{
    "took" : 5,
    "timed_out" : false,
    "hits" : {
        "total" : {
            "value" : 1000,
            "relation" : "eq"
        },
        "hits" : [
            {
                "event": {
                    "_index" : "my_index",
                    "_id" : "0",
                    "_source" : {
                        "date" : "2009-11-15T14:12:12",
                        "event": {
                            "type": "process",
                            ...
                        },
                        "process": {
                            "name": "interesting.exe",
                            ...
                        },
                        "user": {
                            "name": "eric",
                            ...
                        },
                        ...
                    }
                }
            },
            {
                "event": {
                    "_index" : "my_index",
                    "_id" : "6",
                    "_source" : {
                        "date" : "2009-11-15T14:12:56",
                        "event": {
                            "type": "process",
                            ...
                        },
                        "process": {
                            "name": "interesting.exe",
                            ...
                        },
                        "user": {
                            "name": "bob",
                            ...
                        },
                        ...
                    }
                }
            },
            {
                "event": {
                    "_index" : "my_index",
                    "_id" : "135",
                    "_source" : {
                        "date" : "2009-11-15T17:24:45",
                        "event": {
                            "type": "process",
                            ...
                        },
                        "process": {
                            "name": "interesting.exe",
                            ...
                        },
                        "user": {
                            "name": "alice",
                            ...
                        },
                        ...
                    }
                }
            },
            ...
        ]
    }
}

Example 2 - sequence query

Query:

sequence by pid
[process where process.name = "interesting.exe"]
[network where true]

Response:

{
    "took" : 5,
    "timed_out" : false,
    "hits" : {
        "total" : {
            "value" : 100,
            "relation" : "eq"
        },
        "hits" : [
            {
                "sequence": {
                    "join_keys": [ "4021" ],
                    "events":[
                        {
                            "_index" : "my_index",
                            "_id" : "0",
                            "_source" : {
                                "date" : "2009-11-15T14:12:12",
                                "event": {
                                    "type": "process",
                                    ...
                                },
                                "process": {
                                    "name": "interesting.exe",
                                    "pid": 4021,
                                    ...
                                },
                                "user": {
                                    "name": "eric",
                                    ...
                                },
                                ...
                            }
                        },
                        {
                            "_index" : "my_index",
                            "_id" : "2",
                            "_source" : {
                                "date" : "2009-11-15T14:12:13",
                                "event": {
                                    "type": "network",
                                    ...
                                },
                                "process": {
                                    "name": "interesting.exe",
                                    "pid": 4021,
                                    ...
                                },
                                "network":{
                                    "port": 50392
                                },
                                ...
                            }
                        }
                    ]
                }
            },
            {
                "sequence": {
                    "join_keys": [ "325" ],
                    "events":[
                        {
                            "_index" : "my_index",
                            "_id" : "6",
                            "_source" : {
                                "date" : "2009-11-15T14:12:56",
                                "event": {
                                    "type": "process",
                                    ...
                                },
                                "process": {
                                    "name": "interesting.exe",
                                    "pid": 325,
                                    ...
                                },
                                "user": {
                                    "name": "bob",
                                    ...
                                },
                                ...
                            }
                        },
                        {
                            "_index" : "my_index",
                            "_id" : "20",
                            "_source" : {
                                "date" : "2009-11-15T14:13:08",
                                "event": {
                                    "type": "network",
                                    ...
                                },
                                "process": {
                                    "name": "interesting.exe",
                                    "pid": 325,
                                    ...
                                },
                                "network":{
                                    "port": 50392
                                },
                                ...
                            }
                        }
                    ]
                }
            },
            {
                "sequence": {
                    "join_keys": [ "200" ],
                    "events":[
                        {
                            "_index" : "my_index",
                            "_id" : "135",
                            "_source" : {
                                "date" : "2009-11-15T17:24:45",
                                "event": {
                                    "type": "process",
                                    ...
                                },
                                "process": {
                                    "name": "interesting.exe",
                                    "pid": 200,
                                    ...
                                },
                                "user": {
                                    "name": "alice",
                                    ...
                                },
                                ...
                            }
                        },
                        {
                            "_index" : "my_index",
                            "_id" : "136",
                            "_source" : {
                                "date" : "2009-11-15T17:24:45",
                                "event": {
                                    "type": "network",
                                    ...
                                },
                                "process": {
                                    "name": "interesting.exe",
                                    "pid": 200,
                                    ...
                                },
                                "network":{
                                    "port": 50392
                                },
                                ...
                            }
                        }
                    ]
                }
            },
            ...
        ]
    }
}
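In Option 2 each entry of `hits.hits` carries its payload under a type key; the sequence results above use `sequence`. The following minimal Python sketch shows how a client might pull the join keys and event IDs out of such entries. It assumes that wrapping. The helper name `events_by_join_key` is hypothetical, and entries without a `sequence` key are skipped rather than interpreted.

```python
from typing import Any, Dict, List, Tuple


def events_by_join_key(hits: List[Dict[str, Any]]) -> Dict[Tuple[str, ...], List[str]]:
    """Map each sequence's join keys to the _ids of its events, in order.

    Hypothetical helper. Assumes Option 2 style entries shaped like
    {"sequence": {"join_keys": [...], "events": [...]}}.
    """
    result: Dict[Tuple[str, ...], List[str]] = {}
    for hit in hits:
        sequence = hit.get("sequence")
        if sequence is None:
            continue  # not a sequence payload; skipped in this sketch
        # Lists are unhashable, so the join keys become a tuple.
        # If two sequences share join keys, the later one wins here.
        result[tuple(sequence["join_keys"])] = [event["_id"] for event in sequence["events"]]
    return result
```

For the last two sequences shown above, this would yield `{("325",): ["6", "20"], ("200",): ["135", "136"]}`.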

Option 3 - Results as flat list of events with sequence id

Example 1 - non-sequence query

Query:

process where process.name = "interesting.exe"

Response:

{
    "took" : 5,
    "timed_out" : false,
    "hits" : {
        "total" : {
            "value" : 1000,
            "relation" : "eq"
        },
        "hits" : [
            {
                "_index" : "my_index",
                "_id" : "0",
                "_sequence_id": 0,
                "_source" : {
                    "date" : "2009-11-15T14:12:12",
                    "event": {
                        "type": "process",
                        ...
                    },
                    "process": {
                        "name": "interesting.exe",
                        ...
                    },
                    "user": {
                        "name": "eric",
                        ...
                    },
                    ...
                }
            },
            {
                "_index" : "my_index",
                "_id" : "6",
                "_sequence_id": 1,
                "_source" : {
                    "date" : "2009-11-15T14:12:56",
                    "event": {
                        "type": "process",
                        ...
                    },
                    "process": {
                        "name": "interesting.exe",
                        ...
                    },
                    "user": {
                        "name": "bob",
                        ...
                    },
                    ...
                }
            },
            {
                "_index" : "my_index",
                "_id" : "135",
                "_sequence_id": 2,
                "_source" : {
                    "date" : "2009-11-15T17:24:45",
                    "event": {
                        "type": "process",
                        ...
                    },
                    "process": {
                        "name": "interesting.exe",
                        ...
                    },
                    "user": {
                        "name": "alice",
                        ...
                    },
                    ...
                }
            },
            ...
        ]
    }
}

Example 2 - sequence query

Query:

sequence by pid
[process where process.name = "interesting.exe"]
[network where true]

Response:

{
    "took" : 5,
    "timed_out" : false,
    "hits" : {
        "total" : {
            "value" : 100,
            "relation" : "eq"
        },
        "hits" : [
            {
                "_index" : "my_index",
                "_id" : "0",
                "_sequence_id": 0,
                "_source" : {
                    "date" : "2009-11-15T14:12:12",
                    "event": {
                        "type": "process",
                        ...
                    },
                    "process": {
                        "name": "interesting.exe",
                        "pid": 4021,
                        ...
                    },
                    "user": {
                        "name": "eric",
                        ...
                    },
                    ...
                }
            },
            {
                "_index" : "my_index",
                "_id" : "2",
                "_sequence_id": 0,
                "_source" : {
                    "date" : "2009-11-15T14:12:13",
                    "event": {
                        "type": "network",
                        ...
                    },
                    "process": {
                        "name": "interesting.exe",
                        "pid": 4021,
                        ...
                    },
                    "network":{
                        "port": 50392
                    },
                    ...
                }
            },
            {
                "_index" : "my_index",
                "_id" : "6",
                "_sequence_id": 1,
                "_source" : {
                    "date" : "2009-11-15T14:12:56",
                    "event": {
                        "type": "process",
                        ...
                    },
                    "process": {
                        "name": "interesting.exe",
                        "pid": 325,
                        ...
                    },
                    "user": {
                        "name": "bob",
                        ...
                    },
                    ...
                }
            },
            {
                "_index" : "my_index",
                "_id" : "20",
                "_sequence_id": 1,
                "_source" : {
                    "date" : "2009-11-15T14:13:08",
                    "event": {
                        "type": "network",
                        ...
                    },
                    "process": {
                        "name": "interesting.exe",
                        "pid": 325,
                        ...
                    },
                    "network":{
                        "port": 50392
                    },
                    ...
                }
            },
            {
                "_index" : "my_index",
                "_id" : "135",
                "_sequence_id": 2,
                "_source" : {
                    "date" : "2009-11-15T17:24:45",
                    "event": {
                        "type": "process",
                        ...
                    },
                    "process": {
                        "name": "interesting.exe",
                        "pid": 200,
                        ...
                    },
                    "user": {
                        "name": "alice",
                        ...
                    },
                    ...
                }
            },
            {
                "_index" : "my_index",
                "_id" : "136",
                "_sequence_id": 2,
                "_source" : {
                    "date" : "2009-11-15T17:24:45",
                    "event": {
                        "type": "network",
                        ...
                    },
                    "process": {
                        "name": "interesting.exe",
                        "pid": 200,
                        ...
                    },
                    "network":{
                        "port": 50392
                    },
                    ...
                }
            },
            ...
        ]
    }
}
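With Option 3 the client has to reassemble sequences itself from `_sequence_id`. A minimal Python sketch, assuming hits of the same sequence are adjacent in the list, as they are in the example above (the helper name `group_by_sequence_id` is hypothetical):

```python
from itertools import groupby
from typing import Any, Dict, List


def group_by_sequence_id(hits: List[Dict[str, Any]]) -> List[List[Dict[str, Any]]]:
    """Rebuild sequences from an Option 3 flat hit list.

    itertools.groupby only merges *consecutive* hits with equal keys, so this
    assumes the hits of one sequence are adjacent in the response.
    """
    return [list(group) for _, group in groupby(hits, key=lambda hit: hit["_sequence_id"])]
```

Applied to the sequence example, this gives three groups: `_id`s `0, 2`, then `6, 20`, then `135, 136`. For the non-sequence example every hit has its own `_sequence_id`, so each group holds one event. If adjacency were not guaranteed, a dict keyed by `_sequence_id` would be needed instead.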

@colings86
Contributor Author

Another option that we discussed today:

Option 4 - Define results type at top level

Example 1 - non-sequence query

Query:

process where process.name = "interesting.exe"

Response:

{
    "took" : 5,
    "timed_out" : false,
    "hits" : {
        "total" : {
            "value" : 1000,
            "relation" : "eq"
        },
        "events" : [
            {
                "_index" : "my_index",
                "_id" : "0",
                "_source" : {
                    "date" : "2009-11-15T14:12:12",
                    "event": {
                        "type": "process",
                        ...
                    },
                    "process": {
                        "name": "interesting.exe",
                        ...
                    },
                    "user": {
                        "name": "eric",
                        ...
                    },
                    ...
                }
            },
            {
                "_index" : "my_index",
                "_id" : "6",
                "_source" : {
                    "date" : "2009-11-15T14:12:56",
                    "event": {
                        "type": "process",
                        ...
                    },
                    "process": {
                        "name": "interesting.exe",
                        ...
                    },
                    "user": {
                        "name": "bob",
                        ...
                    },
                    ...
                }
            },
            {
                "_index" : "my_index",
                "_id" : "135",
                "_source" : {
                    "date" : "2009-11-15T17:24:45",
                    "event": {
                        "type": "process",
                        ...
                    },
                    "process": {
                        "name": "interesting.exe",
                        ...
                    },
                    "user": {
                        "name": "alice",
                        ...
                    },
                    ...
                }
            },
            ...
        ]
    }
}

Example 2 - sequence query

Query:

sequence by pid
[process where process.name = "interesting.exe"]
[network where true]

Response:

{
    "took" : 5,
    "timed_out" : false,
    "hits" : {
        "total" : {
            "value" : 100,
            "relation" : "eq"
        },
        "sequences" : [
            {
                "join_keys": [ "4021" ],
                "events":[
                    {
                        "_index" : "my_index",
                        "_id" : "0",
                        "_source" : {
                            "date" : "2009-11-15T14:12:12",
                            "event": {
                                "type": "process",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                "pid": 4021,
                                ...
                            },
                            "user": {
                                "name": "eric",
                                ...
                            },
                            ...
                        }
                    },
                    {
                        "_index" : "my_index",
                        "_id" : "2",
                        "_source" : {
                            "date" : "2009-11-15T14:12:13",
                            "event": {
                                "type": "network",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                "pid": 4021,
                                ...
                            },
                            "network":{
                                "port": 50392
                            },
                            ...
                        }
                    }
                ]
            },
            {
                "join_keys": [ "325" ],
                "events":[
                    {
                        "_index" : "my_index",
                        "_id" : "6",
                        "_source" : {
                            "date" : "2009-11-15T14:12:56",
                            "event": {
                                "type": "process",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                "pid": 325,
                                ...
                            },
                            "user": {
                                "name": "bob",
                                ...
                            },
                            ...
                        }
                    },
                    {
                        "_index" : "my_index",
                        "_id" : "20",
                        "_source" : {
                            "date" : "2009-11-15T14:13:08",
                            "event": {
                                "type": "network",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                "pid": 325,
                                ...
                            },
                            "network":{
                                "port": 50392
                            },
                            ...
                        }
                    }
                ]
            },
            {
                "join_keys": [ "200" ],
                "events":[
                    {
                        "_index" : "my_index",
                        "_id" : "135",
                        "_source" : {
                            "date" : "2009-11-15T17:24:45",
                            "event": {
                                "type": "process",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                "pid": 200,
                                ...
                            },
                            "user": {
                                "name": "alice",
                                ...
                            },
                            ...
                        }
                    },
                    {
                        "_index" : "my_index",
                        "_id" : "136",
                        "_source" : {
                            "date" : "2009-11-15T17:24:45",
                            "event": {
                                "type": "network",
                                ...
                            },
                            "process": {
                                "name": "interesting.exe",
                                "pid": 200,
                                ...
                            },
                            "network":{
                                "port": 50392
                            },
                            ...
                        }
                    }
                ]
            },
            ...
        ]
    }
}
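With Option 4 the client can tell the result type from the key under `hits` alone, without parsing the query. A minimal Python sketch, assuming exactly one of `events` or `sequences` is present (the function name `read_eql_hits` is hypothetical):

```python
from typing import Any, Dict, List, Tuple


def read_eql_hits(response: Dict[str, Any]) -> Tuple[str, List[Dict[str, Any]]]:
    """Return ("events", [...]) or ("sequences", [...]) from an Option 4 response."""
    hits = response["hits"]
    # The key under "hits" tells the client which result type it received.
    for kind in ("events", "sequences"):
        if kind in hits:
            return kind, hits[kind]
    raise ValueError("response has neither hits.events nor hits.sequences")
```

A caller can then branch once on `kind` and hand the list to a flat or nested view.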

@aleksmaus
Member

aleksmaus commented Dec 16, 2019

Adding an example of a counts query response here, based on a conversation with @rw-access:

{
    "timed_out": false,
    "took": 5,
    "hits": {
        "total" : {
            "value" : 100,
            "relation" : "eq"
        },
        "counts": [
            {
                "_count": 40,
                "_keys": [...],
                "_percent": 0.4223148165093,
                "_values": [...]
            },
            {
                "_count": 15,
                "_keys": [...],
                "_percent": 0.170275,
                "_values": [...]
            }, ...
        ]
    }
} 
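A client would typically turn the `counts` entries into typed buckets. The following is a minimal Python sketch. The `CountBucket` class and `parse_counts` function are hypothetical. The example above elides `_keys` and `_values`, and the exact definition of `_percent` is not spelled out, so the sketch passes those fields through unchanged.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class CountBucket:
    count: int
    keys: List[Any]
    values: List[Any]
    percent: float  # passed through as-is; its definition is not specified here


def parse_counts(response: Dict[str, Any]) -> List[CountBucket]:
    """Read hits.counts into buckets, largest _count first."""
    buckets = [
        CountBucket(
            count=entry["_count"],
            keys=entry["_keys"],
            values=entry["_values"],
            percent=entry["_percent"],
        )
        for entry in response["hits"]["counts"]
    ]
    # The example is already ordered by count, but the sketch does not rely on it.
    return sorted(buckets, key=lambda bucket: bucket.count, reverse=True)
```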

@pcsanwald
Contributor

To cope with this we can either define the response format as if every result has multiple events (so a result contains an array of events) and the array will only contain a single event if the query does not use sequence or join, or we can define the response so the result can contain different types of payload; event or sequence. The former has the advantage that clients have one kind of response to process and can process it in the same way every time.

To add a drive-by comment: I'm in favor of the former approach, unless there is a compelling reason why a single returned event would be a truly different type of thing here. The main reason is that the latter adds complexity to both server and client in constructing and parsing the response, usually for limited benefit.
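To illustrate the single-code-path argument, a client can recover the uniform "every result is a list of events" shape from the typed Option 4 layout. A minimal Python sketch, assuming the `events`/`sequences` keys from Option 4 (the helper name `as_event_lists` is hypothetical):

```python
from typing import Any, Dict, List


def as_event_lists(response: Dict[str, Any]) -> List[List[Dict[str, Any]]]:
    """Normalize an Option 4 response so every result is a list of events."""
    hits = response["hits"]
    if "sequences" in hits:
        # Sequence results already carry a list of events each.
        return [sequence["events"] for sequence in hits["sequences"]]
    # Non-sequence results become one-element lists.
    return [[event] for event in hits.get("events", [])]
```

Downstream code then only ever iterates lists of events, at the cost of dropping `join_keys` and the explicit result type.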

@stacey-gammon
Contributor

The reason, imo, to allow the client to differentiate between the two different types of responses (without having to parse the request) is that the user will likely want to view them differently. How will the user want to view a sequence query result vs the result of a "process where true" query? The same or different? I think the answer is differently. If the result of all non-sequence queries is a table, these queries can be used as a data source in Lens and they can view the results with all of the usual visualization types. Sequence queries are special in that you probably don't want to view the results in something like a bar chart, but more like a specialized nested table structure.
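To show a non-sequence result as a table, for example as a Lens data source, a client needs one flat row per event. A minimal Python sketch, assuming rows are built by flattening each hit's `_source` into dotted column names. Arrays are kept as single cell values, and the helper name `flatten_source` is hypothetical.

```python
from typing import Any, Dict


def flatten_source(source: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested _source objects into dotted column names, e.g. process.name."""
    row: Dict[str, Any] = {}
    for key, value in source.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects such as "event" or "process".
            row.update(flatten_source(value, prefix=column + "."))
        else:
            row[column] = value
    return row
```

For a sequence result, the same function can be applied to each event inside a sequence to produce the rows of one nested table per join key.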

@colings86
Contributor Author

We met to talk about this, particularly in the context of SIEM, and agreed that we will go with Option 4 above for the response format.

@colings86
Contributor Author

There are aspects of this that we still need to work out, like pagination and responses for pipes, but they are tracked in separate issues, so I'm closing this one.

Labels
:Analytics/EQL EQL querying

6 participants