-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix wait_for for delete search #1399
Conversation
@@ -416,7 +416,7 @@ | |||
(store/delete-search | |||
query | |||
identity-map | |||
(wait_for->refresh (:wait_for params)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you still want to also set :refresh true
in case :wait_for
is true. Otherwise, search requests still might get deleted data during some time after delete task is finished. I stumbled upon this behaviour during testing cleanup logic for iroh-events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good suggestion, but it will force a full refresh of the index, and I am worried that it might be an attack vector if we enable that.
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, it is possible to degrade performance of a cluster by overusing refresh=true
. At the same time, I feel safe if we know for sure when this method is called and don't expose it for direct http call. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the same time, I feel safe if we know for sure when this method is called and don't expose it for direct http call. :)
Here that's the parameter exposed in all CRUD route that is transformed in "wait_for", wait_for->refresh
never sets the parameter to "true". We could introduce the use of "true" later if we find a safe scenario of the warning from ES doc:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html
"This should ONLY be done after careful thought and verification that it does not lead to poor performance, both from an indexing and a search standpoint."
"true creates less efficient indexes constructs (tiny segments) that must later be merged into more efficient index constructs (larger segments). Meaning that the cost of true is paid at index time to create the tiny segment, at search time to search the tiny segment, and at merge time to make the larger segments. "
Could you fix the hack I put in |
@@ -93,7 +93,7 @@ | |||
:REALLY_DELETE_ALL_THESE_ENTITIES true | |||
:wait_for true} | |||
:headers {"Authorization" "player-2-token"}) | |||
|
|||
_ (Thread/sleep 1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DeLaGuardo I am puzzled :-)
wait_for is supposed to be ignored with a warning and when I removed it, it broke this test.
Looking for the reason (like wait_for would be changed to true or bad doc I do not know), I found some more explanation of why wait_for
is not supposed to be supported:
https://discuss.elastic.co/t/delete-by-query-refresh-wait-for-support/154118/5
Not supporting wait_for is my fault on both counts. I made wait_for and I made the reindex infrastructure that powers delete-by-query and I didn't link them because it is hard. wait_for hooks pretty deep into the shard to know when a refresh occurs. But delete-by-query functions at a much higher level. It could stick use refresh=wait_for on every bulk request that it sends but that'd slow down every bulk request while it waits for a refresh for each one. Delete-by-query can't hook into the wait_for infrastructure in any other way. So it doesn't. But the docs don't reflect that. I'll fix that.
Sadly I cannot, unless I wait in the route itself (see discussions above). |
This PR modifies current behaviour to use `wait_for_completion` instead of refresh to wait for the completion of the task. If it's set to `true` it returns the count of documents that match the request.
I've done this a number of times using different orgs because I ran into some request quotas. =) The documents I created looked like this: {
"description": "w1XNUT2UpehQrSAc 9236b3e7-71eb-41ad-88b6-b63dda3b755d",
"assignees": [],
"schema_version": "1.3.13",
"type": "incident",
"title": "verify-1399-1",
"incident_time": {
"opened": "2024-01-12T22:24:44.245Z"
},
"status": "New",
"id": "https://private.intel.test.iroh.site:443/ctia/incident/incident-88ac613b-7187-4a0a-bb58-df5f446b59ee",
"tlp": "amber",
"scores": {},
"client_id": "qa-test",
"groups": [
"c6586cc9-ad56-5c32-8a0e-2b39c9057659"
],
"timestamp": "2024-01-12T22:24:44.746Z",
"confidence": "High",
"owner": "2b06426f-9b03-4429-acbc-04de756ae00f"
} I've then deleted them with and with out the wait_for=true parameter. There is only about a half second difference on average. I.e.:
➜ ~/foghorn curl -w "@curl-format.txt" \
-X DELETE "https://private.intel.test.iroh.site/ctia/incident/search?REALLY_DELETE_ALL_THESE_ENTITIES=true&wait_for=true&query=18ff1df5-27c6-4105-aecf-e22dbdd69369" \
-H "accept: application/json" \
-H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6InBXMWY5TDdFdHdQQ0U1ajRrR1VvMUgyWXI1Q2hjTnFBLWZPWG9pUEg3QkkifQ.eyJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL3VzZXIvZW1haWwiOiJqYW1vc2VyK2psbSthZG1pbkBjaXNjby5jb20iLCJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL3VzZXIvaWRwL2lkIjoic3hzbyIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvdXNlci9uaWNrIjoiSmFtZXMgTW9zZXIiLCJlbWFpbCI6ImphbW9zZXIramxtK2FkbWluQGNpc2NvLmNvbSIsImF1ZCI6WyJpcm9oLXVpIiwibG9naW4iXSwiaHR0cHM6Ly9zY2hlbWFzLmNpc2NvLmNvbS9pcm9oL2lkZW50aXR5L2NsYWltcy91c2VyL3JvbGUiOiJhZG1pbiIsInN1YiI6IjJiMDY0MjZmLTliMDMtNDQyOS1hY2JjLTA0ZGU3NTZhZTAwZiIsImlzcyI6IklST0ggQXV0aCIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvb2F1dGgvcmVmcmVzaC10b2tlbi1qdGkiOiJyZWZyZXNoLTQ4MzU0NTljLWFkZDAtNGY3Yi1hYzA3LTk3NTRmYWZiMzA3MiIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvc2NvcGVzIjpbImV2ZW50OnJlYWQiLCJpbnNpZ2h0cyIsInZhdWx0L2NvbmZpZ3M6cmVhZCIsImludGVncmF0aW9uIiwicHJpdmF0ZS1pbnRlbCIsImFkbWluIiwicHJvZmlsZSIsImluc3BlY3QiLCJhc3NldCIsImZlZWRiYWNrIiwic3NlIiwicmVnaXN0cnkiLCJ1c2VycyIsImludmVzdGlnYXRpb24iLCJpbnZpdGUiLCJwbGF5Ym9vayIsImNhc2Vib29rIiwidmF1bHQvY29uZmlnL21ldGFkYXRhOnJlYWQiLCJvcmJpdGFsIiwiZW5yaWNoIiwib2F1dGgiLCJjb2xsZWN0IiwicmVzcG9uc2UiLCJ1aS1zZXR0aW5ncyIsInRlbGVtZXRyeTp3cml0ZSIsIm9wZW5pZCIsIm5vdGlmaWNhdGlvbiIsImdsb2JhbC1pbnRlbDpyZWFkIiwid2ViaG9vayIsInZhdWx0L2NvbmZpZy9wb3N0dXJlOnJlYWQiLCJhbyJdLCJleHAiOjE3MDUxODI4MTQsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvb2F1dGgvY2xpZW50L25hbWUiOiJUaHJlYXQgUmVzcG9uc2UiLCJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL29hdXRoL3VzZXIvaWQiOiIyYjA2NDI2Zi05YjAzLTQ0MjktYWNiYy0wNGRlNzU2YWUwMGYiLCJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL29yZy9pZCI6ImM2NTg2Y2M5LWFkNTYtNWMzMi04YTBlLTJiMzljOTA1NzY1OSIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvb2F1dGgvZ3JhbnQiOiJsb2dpbiIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvb3JnL25hbWUiOiJKTE0gU2VjdXJlWCIsImp0aSI6InRva2VuLWZmYjllYjI1LTNlMjQtNGU0Yy04NjYzLThlZWU1Y2Q3MjZhZiIsIm5iZiI6MTcwNTA5NjM1NCwiaHR0cHM6Ly9zY2hlbWFzLmNpc2NvLmNvbS9pcm9oL2lkZW50aXR5L2NsYWltcy91c2VyL25hbWUiOiJKYW1lcyBNb3NlciIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvdXNlci9pZCI6IjJiMDY0MjZmLTliMDMtNDQyOS1hY2JjLTA0ZGU3NTZhZTAwZiIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvb2F1dGgvY2xpZW50L2lkIjoiaXJvaC11aSIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvdmVyc2lvbiI6InYyLjEwLWM2ZDljZmM4NTU3OTg2NTA1YjkxIiwiaWF0IjoxNzA1MDk2NDE0LCJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL29hdXRoL2tpbmQiOiJzZXNzaW9uLXRva2VuIn0.Vep-wdt4Xs7dWfeDRFkg5TCh0GWBWGY0k0M9O-7sQlCumz7InlB5l0s6vJ2Aq_mBWyuZ3xPdXJhaaflyjv4gezOKHRdJvl6QB3LrZ50spmI9uPxoCPPocipFM0lffJ1fTBaXQRpLShqXaBvW0Y_G81eYdQId3mSZPvk7ErSQRBRkxfxoPoqlbDHEoAW8lz56GU1kv1DtHr6trV03JCFzPUfsrm0uHItGwuhzQ3nzQoR5ShCOLq4be6Jr3Hm8YxXiNZ4XeiVpJsiyQibTCzCzwnqfstQMEPwu2h2zmL3nzWJrW55kDBjf4qEzFCrctsKXhQ0hCo56mG_1ezzTHSlu-U1jbjW1XAt0cjLibA3mgCZjIzNjflr3Kji1dYQ_bqrRfu62Sj0N6CDxspznldcYpAz3aFKNzwdAce3mPorK30R2Da3fgLCCM_ntK2bSWAJKaMB2uQ1my2z5YsU5BDVr2dB3Sam3SjcLcPsASOAojt8KDHUnwuTfTbELHbzW9XxK6d1sZNv2M3XKsEUU1Da-u7dbQK9qBsIq70leYJkPPs-3Q0hBKvUnNbvVBOJ4eNmwARuU8QPGp-fX2TLJG_J1bVPZPcc-g0zRWUxqo-lC1QYD7m1r4xMz_EWI3wrM5p5JvMvrq4tV2OXjtDAGCPoTAaS335NxzGAip94XcD6zc_k"
10000
time_namelookup: 0.092481s
time_connect: 0.189726s
time_appconnect: 0.301800s
time_pretransfer: 0.302027s
time_redirect: 0.000000s
time_starttransfer: 0.951120s
----------
time_total: 0.951243s Though I did attempt to load one of the documents immediately after the request completed and they always returned a 404, with or without the paramter though. Unless someone has a more conclusive way for me to test this, I'm going to assume it works since there is consistently that half second longer for the request to finish. |
The only way would be to delete a bigger number of documents, you can try 100000 instead of 10000, you may see a bigger difference in the response time to finish with wait_for set to true (though the average half second may be related to the refresh interval of 1s of ES) |
I found in ES doc that the refresh parameters behaves otherwise with
_delete_by_query
and that "Unlike the delete API, it does not support wait_for".This PR modifies current behaviour to use
wait_for_completion
instead of refresh to wait for the completion of the task. If it's set totrue
it returns the count of documents that match the request.§ QA
Check that when
wait_for
is set to true forDELETE /ctia/incident/search
the http response wait for the completion of the task.This can be more easily verified on queries that match a large number of documents in which case it should take more time to complete.
One option is to insert 10000 incidents with a unique word in the description (like a uuid) and check that with
wait_for
set to true the query takes few seconds to complete, but once finished all the documents are actually deleted while withwait_for
set to false, the response will be faster but the documents will be deleted in the background.In both cases the number of deleted documents is returned.