Adds a controllably randomized version of nyc_taxis, for testing caching behavior #154
…zed parameters. Signed-off-by: Peter Alfonsi <[email protected]>
bdf3b9e to 02e7a48
b1e1ff5 to 10ea45d
Hi @peteralfonsi, sorry for the very late response. Thanks for contributing this. Is there a possibility to include this in the already existing version of NYC Taxis, perhaps as a test procedure? I see your intentions and understand that splitting it into a separate workload might be cleaner, but I wonder if there's a way we can merge them together to avoid duplicating some elements. If that's not possible, I'm wondering if we can create a separate search corpus for this so that it stands alone as a workload.
Left a comment above. Let's work together on this to see if we can merge this into the existing NYC Taxis or add a new corpus to make this a stand-alone workload.
Closing for now. Feel free to reopen.
@IanHoang Sorry for the late response - I didn't see this until now. I can look into doing this.
Description
Adds a version of nyc_taxis, called "cacheable_nyc_taxis", which uses random values in the gte/lte fields of the queries. Randomizing the range bounds lets us control how often a query repeats, which is useful for testing caching behavior.
The randomization is controlled by a workload parameter, "repeat_freq", between 0 and 1. This is the fraction of values that will be repeated (mimicking a cache hit, for example). If repeat_freq is 0.3, then 30% of the time we use a repeatable "standard value" that has been precomputed before the workload runs. The index of the standard value we use is drawn from a Zipf distribution, which mirrors the actual distribution of web cache requests. The other 70% of the time, we generate a totally new pair of values, which likely won't have been seen before.
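To make the selection logic above concrete, here is a minimal sketch of how a gte/lte pair could be chosen for each query. It is not the code in this PR: the function names, the numeric bounds, the number of standard values, and the Zipf exponent of 1.0 are all assumptions made for illustration, and it uses only the Python standard library.

```python
import random
from itertools import accumulate


def make_zipf_cum_weights(n, exponent=1.0):
    """Cumulative weights proportional to 1 / rank**exponent for ranks 1..n.

    The exponent is an assumed value; the PR description does not state one.
    """
    return list(accumulate(1.0 / (rank ** exponent) for rank in range(1, n + 1)))


def random_pair(low, high, rng):
    """Draw two values in [low, high] and order them as (gte, lte)."""
    a, b = rng.uniform(low, high), rng.uniform(low, high)
    return min(a, b), max(a, b)


def make_standard_values(n, low, high, rng):
    """Precompute n repeatable (gte, lte) pairs before the workload starts."""
    return [random_pair(low, high, rng) for _ in range(n)]


def next_range(repeat_freq, standard_values, cum_weights, low, high, rng):
    """Return (gte, lte, repeated) for one query.

    With probability repeat_freq, reuse a precomputed standard pair whose
    index follows the Zipf-like weights; otherwise generate a fresh pair.
    """
    if rng.random() < repeat_freq:
        gte, lte = rng.choices(standard_values, cum_weights=cum_weights, k=1)[0]
        return gte, lte, True
    gte, lte = random_pair(low, high, rng)
    return gte, lte, False


# Hypothetical usage: 100 standard pairs over an assumed 0..50 value range.
rng = random.Random(42)
standard = make_standard_values(100, 0.0, 50.0, rng)
cum = make_zipf_cum_weights(len(standard))
draws = [next_range(0.3, standard, cum, 0.0, 50.0, rng) for _ in range(10_000)]
repeated_fraction = sum(1 for d in draws if d[2]) / len(draws)
```

With repeat_freq set to 0.3, repeated_fraction should land close to 0.3, and the low-ranked standard pairs are reused far more often than the high-ranked ones, which is what gives a cache something to hit.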
Example usage:
opensearch-benchmark execute-test --pipeline=benchmark-only --workload-path=/home/ec2-user/osb/opensearch-benchmark-workloads/modified_nyc_taxis --workload-params='{"requests_cache_enabled":"true", "repeat_freq":"0.3"}' --target-host=http://localhost:9200
Issues Resolved
Resolves #152.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.