Refactor distributed query #558

zhuoyuan-liu · 2024-11-05T12:56:01Z

@javuto, this PR is not complete yet, but it shows the idea: move all the calculations out of each node's distributed query request and into distributed query creation.

The idea is to create a new table that records which node should execute which query. That way, we only iterate over all nodes when we create the distributed query. When osquery sends a distributed query result, we only check this table and don't need to go through the whole list of distributed queries.
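A minimal in-memory sketch of this idea, not the code in this PR: `Store`, `NodeQuery`, and the status strings are hypothetical names, and a map stands in for the new table. The node list is walked once at creation time, and each node request only touches that node's own rows.

```go
package main

import "fmt"

// Hypothetical status values for a node/query row.
const (
	StatusPending   = "pending"
	StatusCompleted = "completed"
)

// NodeQuery is a hypothetical row: one node that has to run one query.
type NodeQuery struct {
	NodeID  uint
	QueryID uint
	Status  string
}

// Store stands in for the new table, indexed by node ID.
type Store struct {
	byNode map[uint][]*NodeQuery
}

func NewStore() *Store {
	return &Store{byNode: make(map[uint][]*NodeQuery)}
}

// CreateQuery does the expensive part once: it walks the target nodes
// when the query is created and records one row per node.
func (s *Store) CreateQuery(queryID uint, targetNodes []uint) {
	for _, n := range targetNodes {
		row := &NodeQuery{NodeID: n, QueryID: queryID, Status: StatusPending}
		s.byNode[n] = append(s.byNode[n], row)
	}
}

// Pending returns the query IDs a node still has to run.
// It only reads that node's rows, not the full list of queries.
func (s *Store) Pending(nodeID uint) []uint {
	var ids []uint
	for _, nq := range s.byNode[nodeID] {
		if nq.Status == StatusPending {
			ids = append(ids, nq.QueryID)
		}
	}
	return ids
}

// Complete marks one node/query row as completed when a result arrives.
// It reports whether a pending row was found.
func (s *Store) Complete(nodeID, queryID uint) bool {
	for _, nq := range s.byNode[nodeID] {
		if nq.QueryID == queryID && nq.Status == StatusPending {
			nq.Status = StatusCompleted
			return true
		}
	}
	return false
}

func main() {
	s := NewStore()
	s.CreateQuery(1, []uint{10, 11})
	s.CreateQuery(2, []uint{11})
	fmt.Println(s.Pending(11))
	s.Complete(11, 1)
	fmt.Println(s.Pending(11))
}
```

In a real database the map lookup would be an indexed query on the node column, which is what keeps the per-request cost independent of the total number of distributed queries.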

I would also prefer a small change:

Use a single status column instead of several. It would be simpler and more readable. It would also make status lookups more efficient, since we don't have to write a condition for each column. A classic example is "active = ? AND completed = ? AND deleted = ? AND expired = ? AND type = ? AND environment_id = ?".
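To illustrate the difference, here is a rough sketch that builds both filters side by side. The `QueryStatus` type, its values, and the helper names are assumptions for illustration only; the actual constants in the codebase may differ.

```go
package main

import "fmt"

// QueryStatus is a hypothetical single-column replacement for the
// active/completed/deleted/expired flags.
type QueryStatus string

// Hypothetical status values; the real constant names may differ.
const (
	StatusActive    QueryStatus = "active"
	StatusCompleted QueryStatus = "completed"
	StatusExpired   QueryStatus = "expired"
	StatusDeleted   QueryStatus = "deleted"
)

// whereByFlags builds the multi-column filter quoted above:
// one placeholder per boolean column.
func whereByFlags(active, completed, deleted, expired bool, queryType string, envID uint) (string, []interface{}) {
	clause := "active = ? AND completed = ? AND deleted = ? AND expired = ? AND type = ? AND environment_id = ?"
	return clause, []interface{}{active, completed, deleted, expired, queryType, envID}
}

// whereByStatus builds the equivalent filter with a single status column.
func whereByStatus(status QueryStatus, queryType string, envID uint) (string, []interface{}) {
	clause := "status = ? AND type = ? AND environment_id = ?"
	return clause, []interface{}{string(status), queryType, envID}
}

func main() {
	oldClause, oldArgs := whereByFlags(true, false, false, false, "query", 1)
	newClause, newArgs := whereByStatus(StatusActive, "query", 1)
	fmt.Println(oldClause, len(oldArgs))
	fmt.Println(newClause, len(newArgs))
}
```

A single column also rules out contradictory combinations such as a query that is both active and deleted, which the separate flags cannot prevent on their own.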

With this change, we can get rid of several tables; we only need one table to track the status of all distributed queries.

TODO after this is merged:

Update the logic of completing a distributed query
Update the logic showing expected node and completed node
Remove several tables that are no longer needed

queries/queries.go

zhuoyuan-liu · 2024-11-14T12:48:07Z

@javuto Could you please take a look and give some feedback? This was tested in our dev cluster, and all the core functions work well. However, there is a bug when using multiple tags: it currently selects the union of the tags instead of the intersection. For example, with two tags {env:dev, platform:ubuntu}, it used to select only nodes that have both tags, but now it selects all nodes that have either one. This will be fixed when we refactor the old logic, and in the future it will support more queries based on different tags, e.g. all nodes in the dev environment that are not running on the Windows platform.
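The union versus intersection difference can be sketched like this. The `Node` type and the matcher functions are hypothetical and only illustrate the two behaviours, not the selection code in this PR.

```go
package main

import "fmt"

// Node is a hypothetical node with a set of tags.
type Node struct {
	ID   uint
	Tags map[string]bool
}

// hasAny reports whether the node carries at least one of the tags (union).
func hasAny(n Node, tags []string) bool {
	for _, t := range tags {
		if n.Tags[t] {
			return true
		}
	}
	return false
}

// hasAll reports whether the node carries every tag (intersection).
func hasAll(n Node, tags []string) bool {
	for _, t := range tags {
		if !n.Tags[t] {
			return false
		}
	}
	return true
}

// selectNodes returns the IDs of the nodes accepted by the given matcher.
func selectNodes(nodes []Node, tags []string, match func(Node, []string) bool) []uint {
	var ids []uint
	for _, n := range nodes {
		if match(n, tags) {
			ids = append(ids, n.ID)
		}
	}
	return ids
}

func main() {
	nodes := []Node{
		{ID: 1, Tags: map[string]bool{"env:dev": true, "platform:ubuntu": true}},
		{ID: 2, Tags: map[string]bool{"env:dev": true, "platform:windows": true}},
		{ID: 3, Tags: map[string]bool{"env:prod": true, "platform:ubuntu": true}},
	}
	tags := []string{"env:dev", "platform:ubuntu"}
	fmt.Println("union:", selectNodes(nodes, tags, hasAny))
	fmt.Println("intersection:", selectNodes(nodes, tags, hasAll))
}
```

With these three sample nodes, the union matcher accepts all of them, while the intersection matcher accepts only the node that is both in dev and on Ubuntu.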

I tried to make this PR as small as possible and only replaced the way of creating new queries.

Here are the follow-up tasks after this PR:
Update the logic of completing a distributed query
Update the logic showing expected node and completed node
Remove several tables that are no longer needed

admin/handlers/post.go

queries/queries.go

queries/queries_test.go

javuto · 2024-12-09T15:06:08Z

Let's merge this and continue the refactor, nice job! 👏 🫡

zhuoyuan-liu added 2 commits November 4, 2024 15:09

add a new table for node query

8cc7af0

update the behaviour when creating query

ad49573

zhuoyuan-liu mentioned this pull request Nov 5, 2024

Run distributed query/carve based on custom tags #529

Open

zhuoyuan-liu commented Nov 5, 2024

queries/queries.go (outdated)

javuto added labels refactor (Refactorization of code) and queries (On-demand queries related issues) Nov 5, 2024

javuto changed the title ~~Refactor distributed query~~ WIP: Refactor distributed query Nov 7, 2024

zhuoyuan-liu mentioned this pull request Nov 11, 2024

Reduce the frequency for database updating #562

Open

zhuoyuan-liu added 3 commits November 13, 2024 14:46

Merge remote-tracking branch 'upstream/main' into distributed-query

be56032

add query

e5bddf3

bug fix

5a759a4

zhuoyuan-liu marked this pull request as draft November 13, 2024 15:50

zhuoyuan-liu added 6 commits November 14, 2024 11:07

fix

2fd500c

update for osctrl-admin

e69b42f

Add test for node query

5b7e559

Get the query id for osctrl-admin

5e36655

Add test for CreateNodeQuery

92f6f81

add real-time process stats back

81909d9

zhuoyuan-liu marked this pull request as ready for review November 14, 2024 12:48

zhuoyuan-liu commented Nov 14, 2024

admin/handlers/post.go

zhuoyuan-liu changed the title ~~WIP: Refactor distributed query~~ Refactor distributed query Nov 14, 2024

javuto reviewed Nov 20, 2024

queries/queries.go

javuto reviewed Nov 20, 2024

queries/queries.go (outdated)

javuto reviewed Nov 20, 2024

queries/queries_test.go

Use const for status

fb2ec8c

javuto approved these changes Dec 9, 2024

javuto merged commit e8b9832 into jmpsec:main Dec 12, 2024
24 checks passed

javuto mentioned this pull request Dec 27, 2024

On-demand queries never get completed #577

Closed

BrewTestBot mentioned this pull request Jan 10, 2025

osctrl-cli 0.4.2 Homebrew/homebrew-core#203834

Merged
