
[Scaling Investigation] Stress Load Generation Host & Determine Max Clients Per Worker Actor #558

Open
IanHoang opened this issue Jun 20, 2024 · 0 comments
Labels: Child Issue, enhancement (New feature or request)



IanHoang commented Jun 20, 2024

Experiment 2:

This experiment is part of the scale testing RFC; see the RFC for more details.

For the other experiments in this analysis, see the META issue.

  • How many clients can OSB simulate on a single load generation host?
  • What is the max number of clients that a worker actor can have?

To answer these questions, we will run rounds on load generation hosts with varying numbers of physical CPU cores and amounts of RAM to determine how many search clients a single load generation host can successfully simulate. Contrary to the usual advice of running OSB on an over-provisioned load generation host to avoid bottlenecks, we will focus on discovering the point at which the number of clients causes a bottleneck on the load generation host. It’s believed that a single physical CPU core can handle up to 2500 simulated clients. Because the number of worker actors OSB provisions is constrained by the number of vCPUs, it is also worth finding out at what point there are too many clients per worker actor.
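One way to locate that point is to compare rounds run on the same host at increasing client counts. If throughput stops growing roughly in proportion to the client count, or the host's CPU is pegged, the load generation host is a likely bottleneck. The Python sketch below assumes per-round throughput and peak load-generator CPU utilization have already been collected. The `Round` type, the `first_bottleneck` function, and both thresholds are hypothetical illustrations, not OSB APIs. On its own, the sketch cannot tell a load-generator bottleneck apart from a saturated SUT, so the SUT's resource metrics still need checking.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Round:
    """Hypothetical record of one round on a single load generation host."""
    clients: int            # search_clients:N used for the round (must be > 0)
    throughput: float       # mean throughput reported for the round, in ops/s
    lg_cpu_percent: float   # peak CPU utilization observed on the load generation host


def first_bottleneck(rounds: Sequence[Round],
                     min_efficiency: float = 0.8,
                     max_cpu_percent: float = 90.0) -> Optional[int]:
    """Return the client count of the first round that looks saturated, or None.

    A round is flagged when the load generation host's CPU reaches
    max_cpu_percent, when it produced no throughput, or when throughput grew
    much more slowly than the client count compared with the previous round.
    Both default thresholds are illustrative assumptions, not OSB guidance.
    """
    ordered = sorted(rounds, key=lambda r: r.clients)
    for i, cur in enumerate(ordered):
        if cur.throughput <= 0 or cur.lg_cpu_percent >= max_cpu_percent:
            return cur.clients
        if i > 0:
            prev = ordered[i - 1]  # prev.throughput > 0, otherwise we returned earlier
            # Scaling efficiency: 1.0 means throughput grew exactly as fast as clients.
            efficiency = (cur.throughput / prev.throughput) / (cur.clients / prev.clients)
            if efficiency < min_efficiency:
                return cur.clients
    return None
```

Scaling efficiency is the throughput ratio divided by the client ratio between consecutive rounds. For example, doubling clients while throughput grows only 1.5× gives an efficiency of 0.75, which the default threshold would flag.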

This experiment has limited scope: it covers only a few instance types from a single instance family (c5) on a single cloud provider (AWS). We also expect a definitive answer to be difficult to reach, because the number of clients a load generation host can support depends on other factors, such as workload type, SUT characteristics, network, and disk storage. Still, the investigation should add clarity. Users can rely on these results to make informed decisions about how many physical CPU cores and how much RAM their load generation hosts need, and avoid paying for over-provisioned hosts.

We will run the following rounds. Any bottlenecks encountered will give us a better idea of how many physical CPU cores and how much RAM are needed to simulate N clients.

| LG Hosts with OpenSearch Benchmark | Simulated Clients (`search_clients:N`) | Instance Type | Instance Count | vCPUs | Memory (GB) | Clients Per Worker Actor |
|---|---|---|---|---|---|---|
| Round 1 | 2500 | c5.large | 1 | 2 | 4 | 1250 |
| Round 2 | 5000 | c5.xlarge | 1 | 4 | 8 | 2500 |
| Round 3 | 10000 | c5.2xlarge | 1 | 8 | 16 | 5000 |
| Round 4 | 20000 | c5.4xlarge | 1 | 16 | 32 | 10000 |

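
The simulated client counts in the table follow from the heuristic of 2500 clients per physical CPU core. The Python sketch below reproduces that arithmetic under two stated assumptions: c5 instances expose two vCPUs (hyperthreads) per physical core, and OSB starts one worker actor per vCPU unless told otherwise. `plan_round` and `RoundPlan` are hypothetical helpers for illustration, not part of OSB.

```python
from dataclasses import dataclass
from typing import Optional

CLIENTS_PER_PHYSICAL_CORE = 2500  # heuristic quoted in this issue, not a measured limit
VCPUS_PER_PHYSICAL_CORE = 2       # assumption: c5 runs 2 hyperthreads per physical core


@dataclass(frozen=True)
class RoundPlan:
    """Hypothetical summary of one round's sizing."""
    instance_type: str
    vcpus: int
    physical_cores: int
    simulated_clients: int         # value to use for search_clients:N
    clients_per_worker_actor: int


def plan_round(instance_type: str, vcpus: int,
               worker_actors: Optional[int] = None) -> RoundPlan:
    """Size a round from the per-core heuristic.

    worker_actors defaults to one per vCPU (an assumption about OSB's
    behavior); pass a different count to model fewer worker actors.
    """
    physical_cores = vcpus // VCPUS_PER_PHYSICAL_CORE
    clients = CLIENTS_PER_PHYSICAL_CORE * physical_cores
    workers = worker_actors if worker_actors is not None else vcpus
    return RoundPlan(instance_type, vcpus, physical_cores, clients, clients // workers)


if __name__ == "__main__":
    for name, vcpus in [("c5.large", 2), ("c5.xlarge", 4),
                        ("c5.2xlarge", 8), ("c5.4xlarge", 16)]:
        plan = plan_round(name, vcpus)
        print(f"{plan.instance_type}: search_clients:{plan.simulated_clients}, "
              f"{plan.clients_per_worker_actor} clients per worker actor")
```

With one worker actor per vCPU, every round works out to 1250 clients per worker actor. The table's Clients Per Worker Actor column instead equals the total client count divided by two, which corresponds to two worker actors per host; for example, `plan_round("c5.xlarge", 4, worker_actors=2)` yields 2500.
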
IanHoang added the enhancement (New feature or request) and untriaged labels Jun 20, 2024
IanHoang changed the title from "[Scale Testing] Experiment 2" to "[Scale Testing] Experiment 2: Max Clients Per Worker" Jun 20, 2024
IanHoang changed the title from "[Scale Testing] Experiment 2: Max Clients Per Worker" to "[Scale Testing] Experiment 2: Max Clients Per Worker Actor" Jun 20, 2024
IanHoang self-assigned this Jun 20, 2024
IanHoang changed the title from "[Scale Testing] Experiment 2: Max Clients Per Worker Actor" to "[Scaling Investigation] Experiment 2: Determine Max Clients Per Worker Actor" Jul 24, 2024
IanHoang changed the title from "[Scaling Investigation] Experiment 2: Determine Max Clients Per Worker Actor" to "[Scaling Investigation] Determine Max Clients Per Worker Actor" Jul 24, 2024
IanHoang changed the title from "[Scaling Investigation] Determine Max Clients Per Worker Actor" to "[Scaling Investigation] Stress Load Generation Host & Determine Max Clients Per Worker Actor" Jul 24, 2024
getsaurabh02 moved this from 🆕 New to Later (6 months plus) in Search Project Board Aug 15, 2024
Projects
Status: Later (6 months plus)
Development

No branches or pull requests

1 participant