Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix wait_for for delete search #1399

Merged
merged 8 commits into from
Dec 21, 2023
Merged

fix wait_for for delete search #1399

merged 8 commits into from
Dec 21, 2023

Conversation

ereteog
Copy link
Contributor

@ereteog ereteog commented Dec 12, 2023

I found in ES doc that the refresh parameters behaves otherwise with _delete_by_query and that "Unlike the delete API, it does not support wait_for".
This PR modifies current behaviour to use wait_for_completion instead of refresh to wait for the completion of the task. If it's set to true it returns the count of documents that match the request.

§ QA

Check that when wait_for is set to true for DELETE /ctia/incident/search the http response wait for the completion of the task.
This can be more easily verified on queries that match a large number of documents in which case it should take more time to complete.
One option is to insert 10000 incidents with a unique word in the description (like a uuid) and check that with wait_for set to true the query takes few seconds to complete, but once finished all the documents are actually deleted while with wait_for set to false, the response will be faster but the documents will be deleted in the background.
In both cases the number of deleted documents is returned.

@@ -416,7 +416,7 @@
(store/delete-search
query
identity-map
(wait_for->refresh (:wait_for params))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you still want to also set :refresh true in case :wait_for is true. Otherwise, search requests still might get deleted data during some time after delete task is finished. I stumbled upon this behaviour during testing cleanup logic for iroh-events.

Copy link
Contributor Author

@ereteog ereteog Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good suggestion, but it will force a full refresh of the index, and I am worried that it might be an attack vector if we enable that.
wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, it is possible to degrade performance of a cluster by overusing refresh=true. At the same time, I feel safe if we know for sure when this method is called and don't expose it for direct http call. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the same time, I feel safe if we know for sure when this method is called and don't expose it for direct http call. :)

Here that's the parameter exposed in all CRUD route that is transformed in "wait_for", wait_for->refresh never sets the parameter to "true". We could introduce the use of "true" later if we find a safe scenario of the warning from ES doc:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html
"This should ONLY be done after careful thought and verification that it does not lead to poor performance, both from an indexing and a search standpoint."
"true creates less efficient indexes constructs (tiny segments) that must later be merged into more efficient index constructs (larger segments). Meaning that the cost of true is paid at index time to create the tiny segment, at search time to search the tiny segment, and at merge time to make the larger segments. "

@frenchy64
Copy link
Contributor

frenchy64 commented Dec 13, 2023

Could you fix the hack I put in purge-incidents! in the incidents tests to work around this? https://github.com/threatgrid/ctia/pull/1397/files

@@ -93,7 +93,7 @@
:REALLY_DELETE_ALL_THESE_ENTITIES true
:wait_for true}
:headers {"Authorization" "player-2-token"})

_ (Thread/sleep 1000)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DeLaGuardo I am puzzled :-)
wait_for is supposed to be ignored with a warning and when I removed it, it broke this test.
Looking for the reason (like wait_for would be changed to true or bad doc I do not know), I found some more explanation of why wait_for is not supposed to be supported:
https://discuss.elastic.co/t/delete-by-query-refresh-wait-for-support/154118/5

Not supporting wait_for is my fault on both counts. I made wait_for and I made the reindex infrastructure that powers delete-by-query and I didn't link them because it is hard. wait_for hooks pretty deep into the shard to know when a refresh occurs. But delete-by-query functions at a much higher level. It could stick use refresh=wait_for on every bulk request that it sends but that'd slow down every bulk request while it waits for a refresh for each one. Delete-by-query can't hook into the wait_for infrastructure in any other way. So it doesn't. But the docs don't reflect that. I'll fix that.

@ereteog
Copy link
Contributor Author

ereteog commented Dec 13, 2023

Could you fix the hack I put in purge-incidents! in the incidents tests to work around this? https://github.com/threatgrid/ctia/pull/1397/files

Sadly I cannot, unless I wait in the route itself (see discussions above).

@ereteog ereteog merged commit f0b02e6 into master Dec 21, 2023
14 checks passed
frenchy64 pushed a commit that referenced this pull request Dec 21, 2023
This PR modifies current behaviour to use `wait_for_completion` instead of refresh to wait for the completion of the task. If it's set to `true` it returns the count of documents that match the request.
@jamoser-cisco
Copy link

I've done this a number of times using different orgs because I ran into some request quotas. =)

The documents I created looked like this:

{
  "description": "w1XNUT2UpehQrSAc 9236b3e7-71eb-41ad-88b6-b63dda3b755d",
  "assignees": [],
  "schema_version": "1.3.13",
  "type": "incident",
  "title": "verify-1399-1",
  "incident_time": {
    "opened": "2024-01-12T22:24:44.245Z"
  },
  "status": "New",
  "id": "https://private.intel.test.iroh.site:443/ctia/incident/incident-88ac613b-7187-4a0a-bb58-df5f446b59ee",
  "tlp": "amber",
  "scores": {},
  "client_id": "qa-test",
  "groups": [
    "c6586cc9-ad56-5c32-8a0e-2b39c9057659"
  ],
  "timestamp": "2024-01-12T22:24:44.746Z",
  "confidence": "High",
  "owner": "2b06426f-9b03-4429-acbc-04de756ae00f"
}

I've then deleted them with and with out the wait_for=true parameter. There is only about a half second difference on average. I.e.:

➜  ~/foghorn curl -w "@curl-format.txt" \
  -X DELETE "https://private.intel.test.iroh.site/ctia/incident/search?REALLY_DELETE_ALL_THESE_ENTITIES=true&query=a3f9aae3-6c85-40dd-8fef-56fe8f90c03c" \
  -H "accept: application/json" \
  -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6InBXMWY5TDdFdHdQQ0U1ajRrR1VvMUgyWXI1Q2hjTnFBLWZPWG9pUEg3QkkiLCJ0eXAiOiJKV1QifQ.eyJhdWQiOiJqWWpXb3lmUHE1dGg1VFdVOUNIckJVNDVQTVN2bXVyaiIsImVtYWlsIjoiZ2VuZXJhdGVJbmNpZGVudHMiLCJleHAiOjE3MDUxMDQwNzYsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvb2F1dGgvY2xpZW50L2lkIjoicWEtdGVzdCIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvb2F1dGgvY2xpZW50L25hbWUiOiJ2bWMxNm9lMFUxN05DSkdWeWdKQVM1Y2pVMzE5c3paZCIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvb2F1dGgvZ3JhbnQiOiJsb2dpbiIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvb2F1dGgva2luZCI6InNlc3Npb24tdG9rZW4iLCJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL29hdXRoL3JlZnJlc2gtdG9rZW4tanRpIjoiZDdiMTFiMDQtNzI4My00NTMyLTg1MGYtODMxNjJhYzlkYmVmIiwiaHR0cHM6Ly9zY2hlbWFzLmNpc2NvLmNvbS9pcm9oL2lkZW50aXR5L2NsYWltcy9vYXV0aC91c2VyL2lkIjoiZ2VuZXJhdGVJbmNpZGVudHMiLCJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL29yZy9pZCI6ImdlbmVyYXRlSW5jaWRlbnRzIiwiaHR0cHM6Ly9zY2hlbWFzLmNpc2NvLmNvbS9pcm9oL2lkZW50aXR5L2NsYWltcy9vcmcvbmFtZSI6ImdlbmVyYXRlSW5jaWRlbnRzIiwiaHR0cHM6Ly9zY2hlbWFzLmNpc2NvLmNvbS9pcm9oL2lkZW50aXR5L2NsYWltcy9zY29wZXMiOlsiYWRtaW4iLCJhbyIsImFzc2V0IiwiY2FzZWJvb2siLCJjaXNjby9mZWF0dXJlLWZsYWcveGRyIiwiY29sbGVjdCIsImVucmljaCIsImV2ZW50OnJlYWQiLCJmZWVkYmFjayIsImdsb2JhbC1pbnRlbDpyZWFkIiwiaW5zaWdodHMiLCJpbnNwZWN0IiwiaW50ZWdyYXRpb24iLCJpbnZpdGUiLCJpbnZlc3RpZ2F0aW9uIiwibm90aWZpY2F0aW9uIiwib2F1dGgiLCJvcGVuaWQiLCJvcmJpdGFsIiwicHJpdmF0ZS1pbnRlbCIsInByb2ZpbGUiLCJyZWdpc3RyeSIsInJlc3BvbnNlIiwic3NlIiwidGVsZW1ldHJ5OndyaXRlIiwidWktc2V0dGluZ3MiLCJ1c2VycyIsInZhdWx0L2NvbmZpZy9tZXRhZGF0YTpyZWFkIiwidmF1bHQvY29uZmlnL3Bvc3R1cmU6cmVhZCIsInZhdWx0L2NvbmZpZ3M6cmVhZCIsIndlYmhvb2siXSwiaHR0cHM6Ly9zY2hlbWFzLmNpc2NvLmNvbS9pcm9oL2lkZW50aXR5L2NsYWltcy91c2VyL2VtYWlsIjoiZ2VuZXJhdGVJbmNpZGVudHMiLCJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL3VzZXIvaWQiOiJnZW5lcmF0ZUluY2lkZW50cyIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvdXNlci9pZHAvaWQiOiJpZGItYW1wLXN0YWdpbmciLCJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL3VzZXIvbmFtZSI6ImdlbmVyYXRlSW5jaWRlbnRzIiwiaHR0cHM6Ly9zY2hlbWFzLmNpc2NvLmNvbS9pcm9oL2lkZW50aXR5L2NsYWltcy91c2VyL25pY2siOiJnZW5lcmF0ZUluY2lkZW50cyIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvdXNlci9yb2xlIjoiYWRtaW4iLCJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL3ZlcnNpb24iOiJ2Mi4xMC1jNmQ5Y2ZjODU1Nzk4NjUwNWI5MSIsImlhdCI6MTcwNTEwMDQ1NiwiaXNzIjoiSVJPSCBBdXRoIiwianRpIjoiZmI2NGY2ZmUtMzZjMS00Mzk3LTk3NWItYmQ0NGQzZGY1YzhhIiwibmJmIjoxNzA1MTAwNDU2LCJzdWIiOiJnZW5lcmF0ZUluY2lkZW50cyJ9.o7412m9ZUoiHNqG496Hs21e9yzfYoCRntY1GZEYj87-1DbJ85XspOqBj38HwEhDQOz26WmLjgRdZhytKCKtcmt9l0rkUi7xMkLj_o3GlX0_ndT-o9dTEWRyKfv4VqH5241raexvj6OOedS7otcZmqMSupeHqwXavmCG6tRkpxZ2m7P_109cx0UJPEcd3P3fVuwLP7V5VOzOYd36cBoDy3USyeU2x-4DT8xUbHBMXlAJ8h9GRUDfVfX8fD54YrazC4NUnBgoDcWHNHodiSzDK4WnHUFoDPF1tOjt-G3uNWGeXgqaFeGjeZ0oGFHgrel-oAMMA8IImB4Tz9gdu-RMlTvG4RIM9Fk74SLLQiuW6ys4UfggaD9-LoUAYUAkVhVlC-HPzM8ysXxsLEcAsl7g0T4b9IRHkMcgNK_NFfJXoqovEB3k4uJKySB2sDUshHNlItfynyEdPsNstFNwFZoWxnnNyJpdg3wWzVbPsAPZPNsoeqLKMVQLumgQAf2fvCx-P1GQQPT70J7hN6aWFLCyv0QP9zioGv6197l5n_Hg_Y3XCAU6gijjab86lXS8EQaJn89QEfX3tgrKg4eK4vgqYceW7xm279rOsg0iWPHQ56F-flVfz-t28BR-7Lvr2UPYQBI3AmTazReUuDwpBZSGUwduVBqxZFUy70rPFrvRADn4"
10000
     time_namelookup:  0.080011s
        time_connect:  0.181044s
     time_appconnect:  0.285594s
    time_pretransfer:  0.285914s
       time_redirect:  0.000000s
  time_starttransfer:  0.400236s
                     ----------
          time_total:  0.400480s
~/foghorn curl -w "@curl-format.txt" \
  -X DELETE "https://private.intel.test.iroh.site/ctia/incident/search?REALLY_DELETE_ALL_THESE_ENTITIES=true&wait_for=true&query=18ff1df5-27c6-4105-aecf-e22dbdd69369" \
  -H "accept: application/json" \
  -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6InBXMWY5TDdFdHdQQ0U1ajRrR1VvMUgyWXI1Q2hjTnFBLWZPWG9pUEg3QkkifQ.eyJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL3VzZXIvZW1haWwiOiJqYW1vc2VyK2psbSthZG1pbkBjaXNjby5jb20iLCJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL3VzZXIvaWRwL2lkIjoic3hzbyIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvdXNlci9uaWNrIjoiSmFtZXMgTW9zZXIiLCJlbWFpbCI6ImphbW9zZXIramxtK2FkbWluQGNpc2NvLmNvbSIsImF1ZCI6WyJpcm9oLXVpIiwibG9naW4iXSwiaHR0cHM6Ly9zY2hlbWFzLmNpc2NvLmNvbS9pcm9oL2lkZW50aXR5L2NsYWltcy91c2VyL3JvbGUiOiJhZG1pbiIsInN1YiI6IjJiMDY0MjZmLTliMDMtNDQyOS1hY2JjLTA0ZGU3NTZhZTAwZiIsImlzcyI6IklST0ggQXV0aCIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvb2F1dGgvcmVmcmVzaC10b2tlbi1qdGkiOiJyZWZyZXNoLTQ4MzU0NTljLWFkZDAtNGY3Yi1hYzA3LTk3NTRmYWZiMzA3MiIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvc2NvcGVzIjpbImV2ZW50OnJlYWQiLCJpbnNpZ2h0cyIsInZhdWx0L2NvbmZpZ3M6cmVhZCIsImludGVncmF0aW9uIiwicHJpdmF0ZS1pbnRlbCIsImFkbWluIiwicHJvZmlsZSIsImluc3BlY3QiLCJhc3NldCIsImZlZWRiYWNrIiwic3NlIiwicmVnaXN0cnkiLCJ1c2VycyIsImludmVzdGlnYXRpb24iLCJpbnZpdGUiLCJwbGF5Ym9vayIsImNhc2Vib29rIiwidmF1bHQvY29uZmlnL21ldGFkYXRhOnJlYWQiLCJvcmJpdGFsIiwiZW5yaWNoIiwib2F1dGgiLCJjb2xsZWN0IiwicmVzcG9uc2UiLCJ1aS1zZXR0aW5ncyIsInRlbGVtZXRyeTp3cml0ZSIsIm9wZW5pZCIsIm5vdGlmaWNhdGlvbiIsImdsb2JhbC1pbnRlbDpyZWFkIiwid2ViaG9vayIsInZhdWx0L2NvbmZpZy9wb3N0dXJlOnJlYWQiLCJhbyJdLCJleHAiOjE3MDUxODI4MTQsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvb2F1dGgvY2xpZW50L25hbWUiOiJUaHJlYXQgUmVzcG9uc2UiLCJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL29hdXRoL3VzZXIvaWQiOiIyYjA2NDI2Zi05YjAzLTQ0MjktYWNiYy0wNGRlNzU2YWUwMGYiLCJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL29yZy9pZCI6ImM2NTg2Y2M5LWFkNTYtNWMzMi04YTBlLTJiMzljOTA1NzY1OSIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvb2F1dGgvZ3JhbnQiOiJsb2dpbiIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvb3JnL25hbWUiOiJKTE0gU2VjdXJlWCIsImp0aSI6InRva2VuLWZmYjllYjI1LTNlMjQtNGU0Yy04NjYzLThlZWU1Y2Q3MjZhZiIsIm5iZiI6MTcwNTA5NjM1NCwiaHR0cHM6Ly9zY2hlbWFzLmNpc2NvLmNvbS9pcm9oL2lkZW50aXR5L2NsYWltcy91c2VyL25hbWUiOiJKYW1lcyBNb3NlciIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvdXNlci9pZCI6IjJiMDY0MjZmLTliMDMtNDQyOS1hY2JjLTA0ZGU3NTZhZTAwZiIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvb2F1dGgvY2xpZW50L2lkIjoiaXJvaC11aSIsImh0dHBzOi8vc2NoZW1hcy5jaXNjby5jb20vaXJvaC9pZGVudGl0eS9jbGFpbXMvdmVyc2lvbiI6InYyLjEwLWM2ZDljZmM4NTU3OTg2NTA1YjkxIiwiaWF0IjoxNzA1MDk2NDE0LCJodHRwczovL3NjaGVtYXMuY2lzY28uY29tL2lyb2gvaWRlbnRpdHkvY2xhaW1zL29hdXRoL2tpbmQiOiJzZXNzaW9uLXRva2VuIn0.Vep-wdt4Xs7dWfeDRFkg5TCh0GWBWGY0k0M9O-7sQlCumz7InlB5l0s6vJ2Aq_mBWyuZ3xPdXJhaaflyjv4gezOKHRdJvl6QB3LrZ50spmI9uPxoCPPocipFM0lffJ1fTBaXQRpLShqXaBvW0Y_G81eYdQId3mSZPvk7ErSQRBRkxfxoPoqlbDHEoAW8lz56GU1kv1DtHr6trV03JCFzPUfsrm0uHItGwuhzQ3nzQoR5ShCOLq4be6Jr3Hm8YxXiNZ4XeiVpJsiyQibTCzCzwnqfstQMEPwu2h2zmL3nzWJrW55kDBjf4qEzFCrctsKXhQ0hCo56mG_1ezzTHSlu-U1jbjW1XAt0cjLibA3mgCZjIzNjflr3Kji1dYQ_bqrRfu62Sj0N6CDxspznldcYpAz3aFKNzwdAce3mPorK30R2Da3fgLCCM_ntK2bSWAJKaMB2uQ1my2z5YsU5BDVr2dB3Sam3SjcLcPsASOAojt8KDHUnwuTfTbELHbzW9XxK6d1sZNv2M3XKsEUU1Da-u7dbQK9qBsIq70leYJkPPs-3Q0hBKvUnNbvVBOJ4eNmwARuU8QPGp-fX2TLJG_J1bVPZPcc-g0zRWUxqo-lC1QYD7m1r4xMz_EWI3wrM5p5JvMvrq4tV2OXjtDAGCPoTAaS335NxzGAip94XcD6zc_k"

10000
     time_namelookup:  0.092481s
        time_connect:  0.189726s
     time_appconnect:  0.301800s
    time_pretransfer:  0.302027s
       time_redirect:  0.000000s
  time_starttransfer:  0.951120s
                     ----------
          time_total:  0.951243s

Though I did attempt to load one of the documents immediately after the request completed and they always returned a 404, with or without the paramter though.

Unless someone has a more conclusive way for me to test this, I'm going to assume it works since there is consistently that half second longer for the request to finish.

@ereteog
Copy link
Contributor Author

ereteog commented Jan 15, 2024

Unless someone has a more conclusive way for me to test this, I'm going to assume it works since there is consistently that half second longer for the request to finish.

The only way would be to delete a bigger number of documents, you can try 100000 instead of 10000, you may see a bigger difference in the response time to finish with wait_for set to true (though the average half second may be related to the refresh interval of 1s of ES)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants