Auto-generated / custom datasets for workloads #99
Comments
Hopefully those datasets aren't "random", but maybe "configurable or auto-generated data sets"?
@dblock: Agreed, updated the issue.
@dblock @achitojha Is the plan here to create automated workloads (Ex: using …)?
We don't have a finalized approach for this. The above suggestion could be a possible step forward. Another option could be to specify certain attributes and have OpenSearch Benchmark generate a dataset based on those attributes. There may still be other alternative approaches.
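A minimal sketch of what the attribute-driven option could look like, in Python. The schema format, field names, and generator types below are illustrative assumptions, not an existing OpenSearch Benchmark feature:

```python
import json
import random
import string

# Hypothetical attribute spec: field name -> generator description.
# This only illustrates the idea of deriving documents from declared attributes.
SCHEMA = {
    "status": {"type": "choice", "values": ["200", "404", "500"]},
    "bytes": {"type": "int", "min": 100, "max": 10_000},
    "message": {"type": "text", "length": 32},
}

def generate_value(spec):
    """Produce one field value from its attribute description."""
    if spec["type"] == "choice":
        return random.choice(spec["values"])
    if spec["type"] == "int":
        return random.randint(spec["min"], spec["max"])
    if spec["type"] == "text":
        return "".join(random.choices(string.ascii_lowercase + " ", k=spec["length"]))
    raise ValueError(f"unknown type: {spec['type']}")

def generate_document(schema):
    return {field: generate_value(spec) for field, spec in schema.items()}

if __name__ == "__main__":
    # Emit a handful of documents as JSON lines, the shape bulk ingestion expects.
    for _ in range(5):
        print(json.dumps(generate_document(SCHEMA)))
```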
The initial focus is on providing the capability to increase the data corpus size for a workload.
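For the corpus-size angle, one naive approach is to replay an existing JSON-lines corpus multiple times with lightly perturbed copies; a rough sketch, where the file names and the jittered field are assumptions about the corpus shape:

```python
import json
import random

def expand_corpus(src_path, dst_path, factor):
    """Write each source document `factor` times, jittering a numeric field
    so the copies are not byte-identical. Purely illustrative."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            doc = json.loads(line)
            for _ in range(factor):
                copy = dict(doc)
                # Assumes the corpus has a numeric "bytes" field to jitter.
                if isinstance(copy.get("bytes"), int):
                    copy["bytes"] += random.randint(-10, 10)
                dst.write(json.dumps(copy) + "\n")

if __name__ == "__main__":
    expand_corpus("documents.json", "documents-10x.json", factor=10)
```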
One thing I wanted to add: it would be handy if, in this or a future version, we could add some randomness that could be used to demonstrate anomaly detection.
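If randomness for anomaly-detection demos were added, it might amount to injecting occasional outliers into an otherwise stable metric. A toy sketch; the baseline, anomaly rate, and spike magnitude are arbitrary choices:

```python
import random

def metric_stream(n, anomaly_rate=0.01):
    """Yield n values around a stable baseline, with rare spikes that an
    anomaly detector should flag. Values are illustrative only."""
    for _ in range(n):
        value = random.gauss(100.0, 5.0)           # normal behaviour
        if random.random() < anomaly_rate:
            value *= random.uniform(5.0, 10.0)     # injected anomaly
        yield value

if __name__ == "__main__":
    for v in metric_stream(20):
        print(round(v, 2))
```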
I'd like to be able to generate data such as "1B IP addresses skewed towards US-based IPs" (or other data that follows some statistical distribution). Maybe there are existing tools that can do that well?
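A sketch of the skewed-distribution idea for IP addresses. The CIDR blocks and weights below are invented for illustration and do not reflect real geographic allocations; generating a billion addresses would also want a vectorized or streaming variant of this:

```python
import ipaddress
import random

# Weighted pool of address blocks; weights and blocks are made up.
BLOCKS = [
    (ipaddress.ip_network("3.0.0.0/8"), 0.7),     # heavily weighted "US-like" block
    (ipaddress.ip_network("51.0.0.0/8"), 0.2),
    (ipaddress.ip_network("103.0.0.0/8"), 0.1),
]

def random_ip():
    """Pick a block according to its weight, then a random host inside it."""
    networks, weights = zip(*BLOCKS)
    net = random.choices(networks, weights=weights, k=1)[0]
    offset = random.randrange(net.num_addresses)
    return str(net.network_address + offset)

if __name__ == "__main__":
    for _ in range(10):
        print(random_ip())
```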
Is this a dup/subset of #253?
@dblock Yes, this is technically a duplicate/subset of the RFC, as this issue was created before it. The RFC dives deeper.
Let's close!
OpenSearch Benchmark workloads currently ingest records into OpenSearch/Elasticsearch from an existing dataset. The goal of this task is to build support for automatically generated datasets for a workload. This would enable workloads to use large auto-generated datasets without requiring a specific source dataset.
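As a rough end-to-end sketch of producing a corpus artifact a workload could reference: stream generated documents to a gzipped JSON-lines file so large corpora never have to fit in memory. The file name, document shape, and count are placeholders, and how the workload would reference the file is left open:

```python
import gzip
import json
import random

def generate_doc(i):
    # Minimal synthetic document; real attribute-driven generation
    # (see the schema sketch above) would plug in here.
    return {"id": i, "value": random.random()}

def write_corpus(path, doc_count):
    """Stream generated documents to a gzipped JSON-lines corpus file."""
    with gzip.open(path, "wt") as out:
        for i in range(doc_count):
            out.write(json.dumps(generate_doc(i)) + "\n")
    return doc_count

if __name__ == "__main__":
    n = write_corpus("generated-documents.json.gz", doc_count=100_000)
    print(f"wrote {n} documents")
```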
Acceptance Criteria