Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor alert_processor proc to be based on the ECS correlation logic #63

Merged
merged 25 commits into from
Apr 11, 2023
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
be8dbf9
Refactor JS procedure
sfc-gh-nlele Mar 14, 2023
dcfa8c2
Update alert_processor.js
sfc-gh-nlele Mar 16, 2023
3994fe8
Update alert_processor.js
sfc-gh-nlele Mar 16, 2023
9825453
Update alert_processor.js
sfc-gh-nlele Mar 17, 2023
6011087
Remove data.alerts template param from processor
sfc-gh-nlele Mar 17, 2023
3854f45
Formatting
sfc-gh-nlele Mar 20, 2023
11179a0
Add CORRELATION_PERIOD_MINUTES arg
sfc-gh-nlele Mar 24, 2023
1dce935
Update alert_processor.js
sfc-gh-nlele Mar 24, 2023
334e729
Update alert_processor.js
sfc-gh-nlele Mar 24, 2023
6dc2de3
Update alert_processor.js
sfc-gh-nlele Mar 24, 2023
4e7afd8
Update procedures.tf
sfc-gh-nlele Mar 24, 2023
9b6b4fd
Add task and schedule for the alert processor
sfc-gh-nlele Mar 24, 2023
9d570f7
Misc refactor
sfc-gh-nlele Mar 28, 2023
ce43a2e
Update alert processor default schedule
sfc-gh-nlele Mar 28, 2023
29e1f9d
Update alert_processor.js
sfc-gh-nlele Mar 28, 2023
0c84ad5
Update alert_processor.js
sfc-gh-nlele Mar 29, 2023
180d128
Update alert_processor.js
sfc-gh-nlele Mar 29, 2023
1134e4c
Update alert_processor.js
sfc-gh-nlele Mar 29, 2023
2970cbf
Update tfvars sample
sfc-gh-nlele Mar 29, 2023
9f23813
Update alert_processor.js
sfc-gh-nlele Mar 29, 2023
19e8abe
Update alert_processor.js
sfc-gh-nlele Apr 3, 2023
2b19dd0
Rename variable
sfc-gh-nlele Apr 3, 2023
209cd20
TF changes to run the correlation task after the suppressor
sfc-gh-nlele Apr 3, 2023
3c742dc
Return null instead of generate_uuid when finding correlation id
sfc-gh-nlele Apr 11, 2023
c1d02b2
Remove unnneded generate_uuid function
sfc-gh-nlele Apr 11, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 0 additions & 5 deletions procedures.tf
Original file line number Diff line number Diff line change
Expand Up @@ -83,11 +83,6 @@ resource "snowflake_procedure" "alert_processor" {
local.results_schema,
local.alerts_table,
])
data_alerts_view = join(".", [
local.snowalert_database_name,
local.data_schema,
snowflake_view.alerts.name,
])
})

depends_on = [
Expand Down
140 changes: 83 additions & 57 deletions procedures_js/alert_processor.js
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
// args
sfc-gh-afedorov marked this conversation as resolved.
Show resolved Hide resolved
var CORRELATION_PERIOD_MINUTES
const CORRELATION_PERIOD_MINUTES = -60

CORRELATION_PERIOD_MINUTES = CORRELATION_PERIOD_MINUTES || 60
sfc-gh-afedorov marked this conversation as resolved.
Show resolved Hide resolved
var alert_correlation_result = []

// library
function exec(sqlText, binds = []) {
Expand All @@ -24,60 +23,87 @@ function exec(sqlText, binds = []) {
return retval
}

CORRELATE = `
MERGE INTO ${results_alerts_table} dst
USING (
SELECT
d.id alert_id_to_update,
COALESCE(
p.correlation_id,
d.correlation_id,
UUID_STRING()
) correlation_id
FROM ${data_alerts_view} d -- destination
LEFT OUTER JOIN ${data_alerts_view} p -- potential chain
ON (
d.id != p.id
AND (
d.alert_time > p.alert_time
OR (
d.alert_time = p.alert_time
AND d.id > p.id
)
)
AND p.alert_time > d.alert_time - INTERVAL '1 hour'
AND p.correlation_id IS NOT NULL
AND p.actor = d.actor
AND (
p.object = d.object
OR p.action = d.action
)
)
WHERE d.suppressed = FALSE
AND d.alert_time > CURRENT_TIMESTAMP - INTERVAL '$${CORRELATION_PERIOD_MINUTES} minutes'
QUALIFY 1=ROW_NUMBER() OVER (
PARTITION BY d.id -- one update per destination id
ORDER BY -- most recent wins
p.alert_time DESC, p.id DESC
)
) src
ON (
dst.alert:ALERT_ID = src.alert_id_to_update
AND (
dst.correlation_id IS NULL
OR dst.correlation_id != src.correlation_id
)
)
WHEN MATCHED THEN UPDATE SET
correlation_id = src.correlation_id
GET_CORRELATED_ALERT = `
sfc-gh-afedorov marked this conversation as resolved.
Show resolved Hide resolved
SELECT *
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why select * if all you use is the CORRELATED_ID?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had kept the SQL to be same as the original, but makes sense, will just select the correlation_id.

FROM ${results_alerts_table}
WHERE alert:ACTOR = ?
AND (alert:OBJECT::STRING = ? OR alert:ACTION::STRING = ?)
AND correlation_id IS NOT NULL
AND NOT IS_NULL_VALUE(alert:ACTOR)
AND suppressed = FALSE
AND event_time > DATEADD(minutes, $${CORRELATION_PERIOD_MINUTES}, ?)
ORDER BY event_time DESC
LIMIT 1
`
var n,
results = []
do {
n = exec(CORRELATE)[0]['number of rows updated']
results.push(n)
} while (n != 0)

return {
ROWS_UPDATED: results,
function generate_uuid() {
GENERATE_UUID = `SELECT UUID_STRING()`
return exec(GENERATE_UUID)[0][['UUID_STRING()']]
}

function get_correlation_id(alert) {
if (
'ACTOR' in alert == false ||
'OBJECT' in alert == false ||
'ACTION' in alert == false ||
'EVENT_TIME' in alert == false
) {
return generate_uuid()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can return null now that you have the piece with COALESCE below?

} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove else if the first if always returns so that it's more clearly an early bail condition on the rest of the function

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right makes sense 👍

actor = alert['ACTOR']
object = alert['OBJECT']
action = alert['ACTION']
time = alert['EVENT_TIME']

if (object instanceof Array) {
o = object.join('","')
object = `'["$${o}"]'`
}
if (action instanceof Array) {
o = object.join('","')
object = `'["$${o}"]'`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean action?

(caught by GPT-4!)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry yes.

(caught by GPT-4!)

Wow 😮

}
}

try {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I follow when this would throw an exception -- could you make it more clear, or remove the exception and just do match = exec(...)[0] which would be either the first row object or undefined?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was to try and catch any unexpected errors here, in order to match the ECS version in terms of exception handling. Will just get the first row 👍

match = exec(GET_CORRELATED_ALERT, [actor, object, action, time])
} catch (e) {
match = []
}
correlation_id =
match.length > 0 && 'CORRELATION_ID' in match[0]
? match[0]['CORRELATION_ID']
: generate_uuid()

return correlation_id
}

GET_ALERTS_WITHOUT_CORREALTION_ID = `
SELECT *
FROM ${results_alerts_table}
WHERE correlation_id IS NULL
AND suppressed = FALSE
AND alert_time > DATEADD(hour, -2, CURRENT_TIMESTAMP())
`

UPDATE_ALERT_CORRELATION_ID = `
UPDATE ${results_alerts_table}
SET correlation_id = ?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to call UUID_STRING() here, instead, in something like COLLATE(?, UUID_STRING())

Copy link
Collaborator Author

@sfc-gh-nlele sfc-gh-nlele Mar 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean COALESCE instead? Would we want it to serve as a fail-safe, and not remove generate_uuid() from find_related_correlation_id?

WHERE alert:EVENT_TIME > DATEADD(minutes, $${CORRELATION_PERIOD_MINUTES}, ?)
AND alert:ALERT_ID = ?
`

UNCORRELATED_ALERTS = exec(GET_ALERTS_WITHOUT_CORREALTION_ID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this naming is a little confusing since in other places they are constant strings. since it's only used once, could you move this into the for loop directly instead of naming it?


for (const x of UNCORRELATED_ALERTS) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use row instead of x

alert_body = x['ALERT']
correlation_id = get_correlation_id(alert_body)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this queries the db, let's maybe call it find_related_correlation_id or something else that hints the database is going to be selected?

event_time = String(alert_body['EVENT_TIME'])
alert_id = alert_body['ALERT_ID']

alert_correlation_result.push(
exec(UPDATE_ALERT_CORRELATION_ID, [correlation_id, event_time, alert_id])
)
}

return { alert_correlation_result: alert_correlation_result }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to repeat the key in JS instead of {x: x} you can just do {x}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you determine whether this return value is something that can be recorded? if not, let's remove this as it's misleading. if so, what value do you expect to be returned here?

Copy link
Collaborator Author

@sfc-gh-nlele sfc-gh-nlele Mar 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like we can store the results, without an insert statement running in the proc, to record these values.

However, the return value here is something like below right now, we could maybe pair these with alert_id for more context.

But I think since the individual call statement results will still be available for 24 hours in the query history, it might not hurt to return these?

{
  "alert_correlation_result": [
     {
        "number of multi-joined rows updated": 0,
        "number of rows updated": 1
     },
     {
        "number of multi-joined rows updated": 0,
        "number of rows updated": 1
    },
    {
        "number of multi-joined rows updated": 0,
        "number of rows updated": 1
   }
  ]
}