[Child Issue] Exploring Distributed Workload Generation #506

Open
3 tasks
IanHoang opened this issue Apr 4, 2024 · 2 comments
Assignees
Labels
Child Issue enhancement New feature or request

Comments

@IanHoang
Collaborator

IanHoang commented Apr 4, 2024

This is a subtask for Distributed Workload Generation Analysis and Scale Testing. For more details on this subtask, see this meta issue.

We know that OpenSearch Benchmark (OSB) has limitations, but we don't know exactly under what circumstances they occur. For example, OSB is known to work against single-node and small multi-node (3, 4, 5, etc.) OpenSearch clusters. However, when OSB tests a much larger cluster (the exact threshold is unclear, but say 20 nodes), it tends to fail. To address this, OSB has a feature called Distributed Workload Generation (DWG), which is intended to let users test such large clusters.

Questions

  • When using DWG against time-series workloads, does OSB split the corpora into large chunks and assign each client a chunk to ingest, or does it use many clients that ingest sequentially (i.e., a client that finishes first picks up the next line of the corpus)?
  • How many nodes can we use for DWG? Is the maximum only 3 nodes?
  • Which parameters must users include when using DWG?
  • Does running DWG against a single-node cluster or a small multi-node cluster perform better than standard OSB (1 node)?
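
To make the first question concrete, here is a minimal Python sketch of the two distribution strategies it contrasts. It does not reflect how OSB actually partitions corpora, which is what this issue sets out to determine. The function names, the bulk size, and the per-client speeds are all hypothetical and exist only for illustration.

```python
import heapq


def static_chunks(num_docs, num_clients):
    """Strategy A (hypothetical): one contiguous chunk of the corpus per client.

    Returns a list of (start, end) document offsets, end exclusive.
    """
    base, extra = divmod(num_docs, num_clients)
    ranges = []
    start = 0
    for client in range(num_clients):
        # The first `extra` clients take one additional document each.
        size = base + (1 if client < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def pull_from_shared_queue(num_docs, bulk_size, client_speeds):
    """Strategy B (hypothetical): an idle client pulls the next bulk.

    `client_speeds` is an assumed docs-per-second rate for each client.
    Returns the number of documents each client ends up ingesting.
    """
    next_offset = 0
    ingested = [0] * len(client_speeds)
    # Min-heap of (time at which the client becomes idle, client id).
    idle = [(0.0, client) for client in range(len(client_speeds))]
    heapq.heapify(idle)
    while next_offset < num_docs:
        now, client = heapq.heappop(idle)
        batch = min(bulk_size, num_docs - next_offset)
        next_offset += batch
        ingested[client] += batch
        heapq.heappush(idle, (now + batch / client_speeds[client], client))
    return ingested


if __name__ == "__main__":
    print(static_chunks(10, 3))
    print(pull_from_shared_queue(100, 10, [2.0, 1.0]))
```

Under strategy A every client gets a fixed share regardless of how fast it is, while under strategy B a faster client ends up ingesting more of the corpus. Comparing per-client document counts in the test runs may help show which behavior OSB is closer to.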

Test Plan

Each test setup should be run 5 times to get consistent data.

  • 1. Run OSB without DWG: Run a single load driver with the big5 workload against a large cluster (15+ nodes). This will be our baseline for understanding how OSB distributes the chunks of corpora with a single load driver.
  • 2. Run OSB with DWG: Run two load drivers and one coordinator with the big5 workload against a large cluster (15+ nodes).
  • 3. Find how many DWG nodes OSB can run with: Run with 3 load drivers, then 5, and then 10 against a large cluster (15+ nodes).
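
The plan above can be written out as an explicit run schedule, which makes the total number of runs easy to see. This is only a sketch: the `Setup` class and the setup names are hypothetical, and the single coordinator for the 3-, 5-, and 10-driver setups is an assumption carried over from step 2, since the plan does not state it for step 3.

```python
from dataclasses import dataclass

REPETITIONS = 5  # "Each test setup should be run 5 times"


@dataclass(frozen=True)
class Setup:
    name: str          # hypothetical label, used only for reporting
    load_drivers: int
    coordinators: int  # 0 means OSB is run without DWG


SETUPS = [
    Setup("baseline-no-dwg", load_drivers=1, coordinators=0),  # step 1
    Setup("dwg-2-drivers", load_drivers=2, coordinators=1),    # step 2
    # Step 3: one coordinator is assumed here, as in step 2.
    Setup("dwg-3-drivers", load_drivers=3, coordinators=1),
    Setup("dwg-5-drivers", load_drivers=5, coordinators=1),
    Setup("dwg-10-drivers", load_drivers=10, coordinators=1),
]


def run_schedule(setups=SETUPS, repetitions=REPETITIONS):
    """Return every (setup, repetition number) pair in execution order."""
    return [(setup, rep) for setup in setups for rep in range(1, repetitions + 1)]


if __name__ == "__main__":
    for setup, rep in run_schedule():
        print(f"{setup.name}: run {rep} of {REPETITIONS}")
```

With five setups and five repetitions each, the schedule contains 25 runs, all against the same large (15+ node) cluster with the big5 workload.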
@IanHoang IanHoang added the enhancement New feature or request label Apr 4, 2024
@IanHoang IanHoang self-assigned this Apr 4, 2024
@IanHoang IanHoang changed the title Exploring Distributed Workload Generation [Child Issue] Exploring Distributed Workload Generation Apr 4, 2024
@gkamat gkamat moved this from Todo to In Progress in Performance Roadmap May 15, 2024
@gkamat gkamat moved this from Backlog to In Progress in OpenSearch Engineering Effectiveness Jun 4, 2024
@IanHoang
Collaborator Author

Closing this for now. This is something we might look at in the future once we have a better understanding of scale testing. #555

@github-project-automation github-project-automation bot moved this from In Progress to Done in Performance Roadmap Jun 20, 2024
@gkamat gkamat reopened this Aug 14, 2024
@github-project-automation github-project-automation bot moved this from Done to In Progress in Performance Roadmap Aug 14, 2024
@gkamat
Collaborator

gkamat commented Aug 14, 2024

Reopening issue, since there is an intention to look into this in more detail after the scale-up investigation has concluded.

@gkamat gkamat removed the untriaged label Aug 14, 2024
@gkamat gkamat moved this from In Progress to Now (This Quarter) in Performance Roadmap Aug 20, 2024
Projects
Status: Next Quarter
Status: Not started
Status: Now (This Quarter)
Development

No branches or pull requests

2 participants