Provide a capability to increase the data corpus size for a workload #254

gkamat · 2023-04-04T16:54:06Z

The data corpora supplied with the included workloads are generally small, under ~75 GB. They do not suffice for performance testing of larger clusters, scale testing, or longevity testing.

Since acquiring larger data sets is not straightforward, it would be helpful to be able to provide some mechanism to increase the corpus size for a workload. This could be done through duplicating (and appropriately modifying) the existing documents in the corpus or by synthesizing documents.
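As a rough illustration of the duplicate-and-modify approach, here is a minimal Python sketch. It is not the implementation that was later added for http_logs. It assumes the corpus is a newline-delimited JSON file and that each document has a numeric epoch timestamp. The function name `expand_corpus`, the default field name `@timestamp`, and the `shift_seconds` parameter are all hypothetical and chosen only for this example.

```python
import json
from typing import Iterable, Iterator


def expand_corpus(
    lines: Iterable[str],
    copies: int,
    timestamp_field: str = "@timestamp",  # assumed field name, adjust per workload
    shift_seconds: int = 86_400,          # assumed offset applied to each extra copy
) -> Iterator[str]:
    """Emit every source document `copies` times as JSON lines.

    The first copy is unchanged; copy number i has its timestamp moved
    forward by i * shift_seconds so the duplicates are not identical.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the corpus file
        doc = json.loads(line)
        for copy_index in range(copies):
            clone = dict(doc)  # shallow copy is enough, only a top-level field changes
            value = clone.get(timestamp_field)
            if isinstance(value, (int, float)):
                clone[timestamp_field] = value + copy_index * shift_seconds
            yield json.dumps(clone)


if __name__ == "__main__":
    sample = [
        '{"@timestamp": 100, "status": 200}',
        '{"@timestamp": 200, "status": 404}',
    ]
    for out_line in expand_corpus(sample, copies=3, shift_seconds=10):
        print(out_line)
```

The generator streams one source document at a time, so it does not need to hold the whole corpus in memory. Shifting only the timestamp is the simplest possible modification; queries that depend on time ranges would also need to be adjusted to cover the expanded range.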

dblock · 2023-04-25T16:39:28Z

Is this a dup/subset of #253?

gkamat · 2023-05-25T16:39:38Z

> Is this a dup/subset of #253?

It is a child issue, referenced in that one. Additional issues will be added to that parent issue as work progresses on these items.

gkamat · 2023-05-25T16:44:14Z

This capability is now available for the http_logs workload. There are several enhancements possible, including modifying the documents and regenerating queries, adding a similar capability to the other workloads, etc., but those will be addressed in dedicated issues that will be opened separately.

Closing this one.

gkamat added the enhancement New feature or request label Apr 4, 2023

gkamat self-assigned this Apr 4, 2023

github-actions bot added the untriaged label Apr 4, 2023

gkamat added this to OpenSearch Engineering Effectiveness Apr 4, 2023

github-project-automation bot moved this to Backlog in OpenSearch Engineering Effectiveness Apr 4, 2023

gkamat moved this from Backlog to In Progress in OpenSearch Engineering Effectiveness Apr 4, 2023

gkamat removed the untriaged label Apr 4, 2023

gkamat mentioned this issue Apr 4, 2023

Auto-generated / custom datasets for workloads #99

Closed

3 tasks

IanHoang added the High Priority label May 9, 2023

gkamat moved this from In Progress to Done in OpenSearch Engineering Effectiveness May 25, 2023

gkamat mentioned this issue May 25, 2023

[RFC] Enhancements for OSB Workloads #253

Open

gkamat closed this as completed May 25, 2023

github-project-automation bot added this to OpenSearch Benchmark Roadmap Aug 30, 2024

github-project-automation bot moved this to Completed in OpenSearch Benchmark Roadmap Aug 30, 2024
