Adds a controllably randomized version of nyc_taxis, for testing caching behavior #154

Closed
peteralfonsi wants to merge 7 commits

peteralfonsi
Contributor

Description

Adds a version of nyc_taxis, called "cacheable_nyc_taxis", which uses random values in the gte/lte fields of the queries. Because the amount of repetition between queries can be controlled, this is useful for testing caching behavior, among other things.

The randomization is controlled by a workload parameter, "repeat_freq", between 0 and 1. This is the fraction of values that will be repeated (mimicking a cache hit, for example). If repeat_freq is 0.3, then 30% of the time we use a repeatable "standard value" that was precomputed before the workload runs. The index of the standard value to use is drawn from a Zipf distribution, which mirrors the actual distribution of web cache requests. The other 70% of the time, we generate a totally new pair of values, which likely won't have been seen before.
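The selection logic described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class and function names, the number of standard values, the value bounds, the Zipf exponent, and the "total_amount" field are all assumptions made for the example.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def make_zipf_cdf(n, alpha=1.0):
    """Cumulative probabilities for ranks 1..n with weight 1 / rank**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n + 1)]
    total = sum(weights)
    return list(accumulate(w / total for w in weights))


def random_pair(rng, low, high):
    """Draw two values in [low, high] and return them ordered as (gte, lte)."""
    a = rng.uniform(low, high)
    b = rng.uniform(low, high)
    return (min(a, b), max(a, b))


class RangeValueSource:
    """Hypothetical helper: yields (gte, lte) pairs, repeated with probability repeat_freq."""

    def __init__(self, repeat_freq, n_standard=100, low=0.0, high=100.0,
                 alpha=1.0, seed=None):
        if not 0.0 <= repeat_freq <= 1.0:
            raise ValueError("repeat_freq must be between 0 and 1")
        self.repeat_freq = repeat_freq
        self.low = low
        self.high = high
        self.rng = random.Random(seed)
        # "Standard values" are precomputed once, before any query is issued.
        self.standard = [random_pair(self.rng, low, high) for _ in range(n_standard)]
        self.cdf = make_zipf_cdf(n_standard, alpha)

    def next_pair(self):
        """Return ((gte, lte), repeated), where repeated marks a standard value."""
        if self.rng.random() < self.repeat_freq:
            # Inverse-CDF sampling: low indices are chosen far more often.
            idx = bisect_left(self.cdf, self.rng.random())
            idx = min(idx, len(self.standard) - 1)  # guard against float rounding
            return self.standard[idx], True
        # Otherwise generate a fresh pair that has most likely not been seen.
        return random_pair(self.rng, self.low, self.high), False


def range_query(field, pair):
    """Build a range query body with the pair in its gte/lte fields."""
    gte, lte = pair
    return {"query": {"range": {field: {"gte": gte, "lte": lte}}}}


if __name__ == "__main__":
    source = RangeValueSource(repeat_freq=0.3, seed=42)
    pair, repeated = source.next_pair()
    print(repeated, range_query("total_amount", pair))
```

With repeat_freq set to 0, every query gets a fresh pair, so a request cache should see almost no hits; with repeat_freq set to 1, every query draws from the precomputed standard values, and the Zipf weighting concentrates most requests on a few of them.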

Example usage:
opensearch-benchmark execute-test --pipeline=benchmark-only --workload-path=/home/ec2-user/osb/opensearch-benchmark-workloads/modified_nyc_taxis --workload-params='{"requests_cache_enabled":"true", "repeat_freq":"0.3"}' --target-host=http://localhost:9200

Issues Resolved

Resolves #152.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Peter Alfonsi added 4 commits January 16, 2024 11:44
Peter Alfonsi added 2 commits January 16, 2024 13:02
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
@IanHoang
Collaborator

Hi @peteralfonsi, sorry for the very late response. Thanks for contributing this. Is there a possibility to include this in the already existing version of NYC Taxis, perhaps as a test procedure? I see your intentions and understand that separating it into its own workload might be cleaner, but I wonder if there's a way we can merge them together to avoid duplicating some elements. If that's not possible, I'm wondering if we can create a separate search corpus for this so that this stands alone as a workload.

Collaborator

@IanHoang IanHoang left a comment

Left a comment above. Let's work together on this to see if we can merge this into the existing NYC Taxis or add a new corpus to make this a stand-alone workload.

@IanHoang
Collaborator

Closing for now. Feel free to reopen.

@IanHoang IanHoang closed this May 20, 2024
@peteralfonsi
Contributor Author

@IanHoang Sorry for the late response - I didn't see this until now. I can look into doing this.

Development

Successfully merging this pull request may close these issues.

[PROPOSAL] Adding capabilities to introduce randomness in workload queries
2 participants