Adds a controllably randomized version of nyc_taxis, for testing caching behavior #154

peteralfonsi · 2024-01-16T19:44:25Z

Description

Adds a version of nyc_taxis, called "cacheable_nyc_taxis", which uses random values in the gte/lte fields of the queries. This makes the degree of query repetition controllable, which is useful for testing caching behavior, among other things.

The randomization is controlled by a workload parameter, "repeat_freq", which takes a value between 0 and 1. It is the fraction of values that will be repeated (mimicking a cache hit, for example). If repeat_freq is 0.3, then 30% of the time we use a repeatable "standard value" that was precomputed before the workload runs. Which standard value we use is determined by an index drawn from a Zipf distribution, which mirrors the actual distribution of web cache requests. The other 70% of the time, we generate a completely new pair of values, which most likely won't have been seen before.
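A minimal Python sketch of the sampling scheme described above follows. It is not the code from this PR: the class name `RangeSampler`, the uniform generation of fresh pairs, the number of standard values, and the Zipf exponent `zipf_s` are all illustrative assumptions. Python's standard library has no bounded Zipf sampler, so the sketch weights rank k by 1/k^s and samples with `random.choices`.

```python
import itertools
import random


class RangeSampler:
    """Hypothetical sampler for (gte, lte) pairs with a controllable repeat rate."""

    def __init__(self, repeat_freq, low, high, num_standard=1000, zipf_s=1.0, seed=None):
        if not 0.0 <= repeat_freq <= 1.0:
            raise ValueError("repeat_freq must be between 0 and 1")
        self.repeat_freq = repeat_freq
        self.low, self.high = low, high
        self.rng = random.Random(seed)
        # "Standard values": fixed pairs precomputed before the workload runs.
        self.standard_values = [self._fresh_pair() for _ in range(num_standard)]
        # Bounded Zipf weights: the value at rank k is chosen with probability
        # proportional to 1 / k**zipf_s, so low ranks repeat most often.
        weights = [1.0 / rank ** zipf_s for rank in range(1, num_standard + 1)]
        self.cum_weights = list(itertools.accumulate(weights))

    def _fresh_pair(self):
        # A brand-new random range that is unlikely to have been seen before.
        a = self.rng.uniform(self.low, self.high)
        b = self.rng.uniform(self.low, self.high)
        return (min(a, b), max(a, b))

    def next_pair(self):
        if self.rng.random() < self.repeat_freq:
            # Repeat path: reuse a precomputed pair picked by Zipf-distributed rank.
            return self.rng.choices(
                self.standard_values, cum_weights=self.cum_weights, k=1
            )[0]
        # Miss path: generate a new pair.
        return self._fresh_pair()
```

With `repeat_freq=0.3`, roughly 30% of calls return one of the precomputed pairs, and the lowest-ranked pairs dominate those repeats; that skew is what produces repeated, cacheable requests.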

Example usage:
opensearch-benchmark execute-test --pipeline=benchmark-only --workload-path=/home/ec2-user/osb/opensearch-benchmark-workloads/modified_nyc_taxis --workload-params='{"requests_cache_enabled":"true", "repeat_freq":"0.3"}' --target-host=http://localhost:9200
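The example passes `repeat_freq` as the JSON string "0.3", so whatever code consumes it has to convert it to a number. The sketch below assumes the workload parameters arrive as a plain dict of the values shown; `parse_repeat_freq`, `range_query_body`, and the `total_amount` field are hypothetical names chosen for illustration, not taken from the PR.

```python
import json


def parse_repeat_freq(workload_params, default=0.0):
    """Convert the repeat_freq workload parameter (possibly a string) to a float in [0, 1]."""
    raw = workload_params.get("repeat_freq", default)
    value = float(raw)  # "0.3" -> 0.3
    if not 0.0 <= value <= 1.0:  # also rejects NaN
        raise ValueError(f"repeat_freq must be in [0, 1], got {raw!r}")
    return value


def range_query_body(field, gte, lte):
    """Build a range query whose bounds come from the sampled (gte, lte) pair."""
    return {"query": {"range": {field: {"gte": gte, "lte": lte}}}}


# The same JSON string passed to --workload-params in the example above.
params = json.loads('{"requests_cache_enabled":"true", "repeat_freq":"0.3"}')
print(parse_repeat_freq(params))  # 0.3
print(range_query_body("total_amount", 5.0, 15.0))
```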

Issues Resolved

Resolves #152.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


IanHoang · 2024-05-15T21:28:30Z

Hi @peteralfonsi, sorry for the very late response. Thanks for contributing this. Is it possible to include this in the existing NYC Taxis workload, perhaps as a test procedure? I see your intentions and understand that making it a separate workload might be cleaner, but I wonder if there's a way we can merge them to avoid duplicating some elements. If that's not possible, I'm wondering if we can create a separate search corpus for this so that it stands alone as a workload.

IanHoang

Left a comment above. Let's work together on this to see if we can merge this into the existing NYC Taxis workload or add a new corpus to make this a stand-alone workload.

IanHoang · 2024-05-20T19:51:30Z

Closing for now. Feel free to reopen.

peteralfonsi · 2024-06-21T18:16:43Z

@IanHoang Sorry for the late response - I didn't see this until now. I can look into doing this.

peteralfonsi requested review from IanHoang and gkamat as code owners January 16, 2024 19:44

Peter Alfonsi added 4 commits January 16, 2024 11:44

Adds a cacheable version of nyc_taxis which uses controllably randomized parameters.

3ea51e0

Removed misc files that had been used for our own benchmarking

12aec50

Signed-off-by: Peter Alfonsi <[email protected]>

Restored download.sh

52b11ca

Signed-off-by: Peter Alfonsi <[email protected]>

Updated gitignore

02e7a48

Signed-off-by: Peter Alfonsi <[email protected]>

peteralfonsi force-pushed the zipf-final branch from bdf3b9e to 02e7a48 January 16, 2024 19:44

Peter Alfonsi added 2 commits January 16, 2024 13:02

Corrected iteration numbers

1eeba69

Signed-off-by: Peter Alfonsi <[email protected]>

increased iters to 10k/40k

10ea45d

Signed-off-by: Peter Alfonsi <[email protected]>

peteralfonsi force-pushed the zipf-final branch from b1e1ff5 to 10ea45d January 17, 2024 20:27

removed number of passenger query

d5da673

peteralfonsi mentioned this pull request Jan 22, 2024

[RFC] Adding capabilities to introduce randomness in workload queries opensearch-project/opensearch-benchmark#443

Closed

IanHoang requested changes May 15, 2024


IanHoang closed this May 20, 2024
