Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic SQL join results of multiple queries - discussion draft #31806

Closed
lalit-satapathy opened this issue Jun 2, 2022 · 8 comments · Fixed by #32394
Closed

Generic SQL join results of multiple queries - discussion draft #31806

lalit-satapathy opened this issue Jun 2, 2022 · 8 comments · Fixed by #32394
Assignees
Labels
discuss Issue needs further discussion. Draft enhancement Team:Service-Integrations Label for the Service Integrations team

Comments

@lalit-satapathy
Copy link
Contributor

Summary

In generic SQL, there is a need to join results generated by multiple queries in sql_queries to create a single event. This will enable creating a single event for cases, where multiple individual queries are executed to extract the specific metrics.

For example:

sql_queries:
  - query: "SELECT cntr_value As 'user_connections' FROM sys.dm_os_performance_counters WHERE counter_name= 'User Connections'"
    response_format: table

  - query: "SELECT cntr_value As 'buffer_cache_hit_ratio' FROM sys.dm_os_performance_counters WHERE counter_name = 'Buffer cache hit ratio' AND object_name like '%Buffer Manager%'"
    response_format: table

The above two queries will create two separate events one for each query, currently each with a single metric. It is desired to create single event with 2 of the metrics together.

Before diving into the solution space, let’s first summarise the current behaviour of SQL module.

Current behaviour

sql_queries have two response formats table and variables.

  • In table response format, for results with X rows and Y columns as below, where Cy is column name and Vxy is the value:
C1, C2, …..  CY
V11, V12,….. V1Y
V21, V22,….. V2Y
.....
.....
VX1,VX21….. VXY

Here, X events are generated one for each row. The event is of the format: ( “C1: V11”, “C2:V12”, …. “CY:V1Y”)

  • In the variable response format, results with 2 columns and X rows:
N1,V1
N2,V2
…
...
NX,VX

Here single event is generated of the format: (“N1:V1”, “N2:V2”…..”NX:VX”)

Solution feasibility

  1. It may be possible to merge multiple table queries together if each query produce X rows. In such a case X events can be created by merging each column of the corresponding table together.
  2. It may NOT be possible to merge multiple table queries together if each query produce different number of rows.
  3. It may be tricky to merge table and variable results together, but need exploration.
  4. Merging poses a potential challenge if two results sets have overlapping column name.
  5. Merging can be seamless if two results sets have divergent set of column names.

Solution details

TBD

@ruflin
Copy link
Collaborator

ruflin commented Jun 2, 2022

If we have the variable format, my understanding is we have always key / value pairs. In this scenario, is merging still an issue?

In which scenarios do we require table? The queries you showed above seem to always return just key / value pair?

@lalit-satapathy
Copy link
Contributor Author

If we have the variable format, my understanding is we have always key / value pairs. In this scenario, is merging still an issue?

In which scenarios do we require table? The queries you showed above seem to always return just key / value pair?

Merging only variable formats in a sql_queries is fine (only need to consider non-overlapping of names)
But variable formats queries are restrictive, since the onus is on the user to write the query in two-column table format. Variable format rejects any other multi-column queries. These types of queries are rather non-straightforward to write.

@lalit-satapathy
Copy link
Contributor Author

Solution details:

Merging metrics from multiple sql_queries, will be enabled only if specific criteria below is met.

Additional config will be added (request_merge default: false) will be added, using which users can request for a merge of generated metrics.

However, this is only a request to merge and will be done, if the table queries produce single row. Multiple row merge is currently not supported. This criteria however, is sufficient for our usage currently.

If the condition is not meet, metrics will be generated as-is and merge request will be ignored.

The merge process combines all the metrics generated into a single event.

Example Merging of 3 queries, "table1", "table2" & "variable1"
as below:

table1
C1, C2
V1, V2

table2
C3, C4
V3, V4

and

variable1
C5, V5
C6, V6

will produce the event:

(C1:V1, C2:V2, C3:V3, C4:V4, C5:V5, C6:V6)

Conditions of merge: All of the table queries produce only a single row.

Related metrics change:

  • Generated sql.query will have either have a fixed string as "MULTIPLE" or concatenation of the individual queries. (to be finalised)
  • overlapping key names will be ignored, ONLY the first overlapping key-name will be used. Example merging of k1:v1, k1:v2 and k1:v3 will produce k1:v1
  • merging table and variable response format is currently being considered, unless some complexities arise in the same.

@ruflin @jsoriano @sayden

@ruflin
Copy link
Collaborator

ruflin commented Jun 7, 2022

If the condition is not meet, metrics will be generated as-is and merge request will be ignored.

Could we instead throw / log an error instead? We should let any dev now as soon as possible that something is not as expected?

@lalit-satapathy
Copy link
Contributor Author

If the condition is not meet, metrics will be generated as-is and merge request will be ignored.

Could we instead throw / log an error instead? We should let any dev now as soon as possible that something is not as expected?

Yes, we can flag an error/warning if merging was skipped and any other conditions like overlapping key names.

@jsoriano
Copy link
Member

jsoriano commented Jun 7, 2022

+1 to throwing an error if something configured cannot work.

The request_merge approach looks good, but I would change the name of the setting (naming is complicated 🙂).

I would also like to see an example with this setting, I wonder if you are planning to add this setting at the module level or at the query level. Settings at the query-level would give more control, but it is probably better to have this at the module level, so all the events generated for a data stream have similar sets of fields. Different data streams/modules can be used if different sets of events are wanted.
And thinking about this, I wonder if request_merge: true should be the default when multiple queries are defined.

Having the setting at the module level it would be like this.

request_merge: true
sql_queries:
  - query: ...
    response_format: table
  - query: ...
    response_format: table

At the query level:

sql_queries:
  - query: ...
    response_format: table
    request_merge: true
  - query: ...
    response_format: table
    request_merge: true

Another option I see would be to add a new response format for tables with one row, and when everything in sql_queries are variables or one-rows, they are merged together. If any time the response format is not the expected one, it fails. But this is somehow implicit, and "Explicit is better than implicit".

sql_queries:
  - query: ...
    response_format: onerow
  - query: ...
    response_format: onerow

For context, the JMX module also has some automatic logic to merge fields or not. This behaviour can be controlled by adding an event setting with a name, and all the fields with the same event are merged together. Applying this strategy here it'd be something like this:

sql_queries:
  - query: ...
    response_format: table
    event: performance
  - query: ...
    response_format: table
    event: performance

@lalit-satapathy
Copy link
Contributor Author

Hi @jsoriano,

request_merge is at module level. Now the PR is out with an example usage, which gives the exact details.

I would also like to see an example with this setting, I wonder if you are planning to add this setting at the module level or at the query level. Settings at the query-level would give more control, but it is probably better to have this at the module level, so all the events generated for a data stream have similar sets of fields. Different data streams/modules can be used if different sets of events are wanted.

@lalit-satapathy
Copy link
Contributor Author

Two changes being done as per the PR reviews:

  • request_merge is now called as merge_results
  • sql.query not populated for merged_results case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issue needs further discussion. Draft enhancement Team:Service-Integrations Label for the Service Integrations team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants