Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] OSB doesn't show benchmark progress on stdout #444

Closed
CaptainDredge opened this issue Jan 23, 2024 · 5 comments
Closed

[BUG] OSB doesn't show benchmark progress on stdout #444

CaptainDredge opened this issue Jan 23, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@CaptainDredge
Copy link

CaptainDredge commented Jan 23, 2024

Describe the bug
Opensearch benchmark doesn't show which test-procedure its executing and what's the progress on it on stdout. Initially, it prints the usual design of Opensearch benchmark on the terminal but then doesn't print anything afterwards. Although, by monitoring logs we can see that the benchmark is running.

Just this gets printed
Screenshot 2024-01-23 at 3 06 15 PM

To Reproduce

  1. Install osb 1.1.0 version
  2. Set benchmark.ini file as
config.version = 17

[system]
env.name = local

[node]
root.dir = /home/ec2-user/.benchmark/benchmarks
src.root.dir = /home/ec2-user/.benchmark/benchmarks/src

[source]
remote.repo.url = https://github.com/opensearch-project/OpenSearch.git
opensearch.src.subdir = opensearch

[benchmarks]
local.dataset.cache = /home/ec2-user/.benchmark/benchmarks/data

[results_publishing]
datastore.type = opensearch
datastore.host = <IP>
datastore.port = 443
datastore.secure = True
datastore.user = <user>
datastore.password = <password>
datastore.ssl.verification_mode = none

[workloads]
default.url = https://github.com/opensearch-project/opensearch-benchmark-workloads

[provision_configs]
default.dir = default-provision-config

[defaults]
preserve_benchmark_candidate = false

[distributions]
release.cache = true
  1. Run any benchmark. eg. command
opensearch-benchmark execute-test --pipeline="benchmark-only" --workload-path="/home/ec2-user/.benchmark/benchmarks/data/so" --test-execution-id="so_flush_false_ingest_2_parallel_5per_1" --target-hosts="<DNS>" --client-options="basic_auth_user:'<user>',basic_auth_password:'<passwd>',timeout:60" --telemetry-params='telemetry-params.json' --telemetry="node-stats,segment-stats"  --workload-params="number_of_shards:5,number_of_replicas:0,bulk_size:4000,bulk_indexing_clients:2,ingest_percentage:5" --distribution-version="2.3.0" --results-format csv --results-file ~/so_flush_update_false_ingest_2_parallel_5per --user-tag "intention:so-flush-false" --kill-running-processes"

Expected behavior
Expected that osb shows benchmark progress on stdout

More Context (please complete the following information):

  • Workload: nyc_taxis
  • Service: Opensearch
  • Version: 1.1.0
@CaptainDredge CaptainDredge added bug Something isn't working untriaged labels Jan 23, 2024
@IanHoang
Copy link
Collaborator

Hi @CaptainDredge, thanks for bringing this to our attention. To help us understand the situation better, could you address these follow-up questions:

  • To see if this is an issue with --kill-running-processes, could you run ps aux | grep opensearch-benchmark and ensure there are no lingering PIDs before running a test? In the past, when users encountered "OpenSearch Benchmark" logo without the progress in the stdout, it was caused be lingering processes for a previous run. Many users saw this despite appending --kill-running-processes in their commands. However, this shouldn't happen anymore since we introduced a fix for this a while back.
  • Have you tried this on OSB 1.2.0 or 1.0.0? Curious to see if it's isolated to the version you're running on.

@CaptainDredge
Copy link
Author

To see if this is an issue with --kill-running-processes, could you run ps aux | grep opensearch-benchmark and ensure there are no lingering PIDs before running a test? In the past, when users encountered "OpenSearch Benchmark" logo without the progress in the stdout, it was caused be lingering processes for a previous run. Many users saw this despite appending --kill-running-processes in their commands. However, this shouldn't happen anymore since we introduced a fix for this a while back.

Yes @IanHoang there were lingering osb process. Once I killed them manually progress started coming on stdout. So essentially --kill-running-processes isn't working as expected. Can you point me to the fix for this?

Have you tried this on OSB 1.2.0 or 1.0.0? Curious to see if it's isolated to the version you're running on.

I tested it on 0.1 and 1.1 and it happens on both. I don't have the setup to test for different versions

@IanHoang
Copy link
Collaborator

Could you provide the exact command you’re running ? When you curl the cluster, are you seeing docs actually being ingested? When you check ~/.benchmark/logs/benchmark.log are you seeing the operations being run?

Ran a few tests recently on OSB 1.2.0 and on OSB 1.1.0 but am seeing progress and output:

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds.
[INFO] Executing test with workload [movies-new], test_procedure [default-test-procedure] and provision_config_instance ['external'] with version [2.5.0].

Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running cluster-health                                                         [100% done]
Running index-append                                                           [100% done]
Running refresh-after-index                                                    [100% done]
Running default                                                                [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                                                         Metric |         Task |       Value |   Unit |
|---------------------------------------------------------------:|-------------:|------------:|-------:|
|                     Cumulative indexing time of primary shards |              |     2.62908 |    min |
|             Min cumulative indexing time across primary shards |              |           0 |    min |
|          Median cumulative indexing time across primary shards |              | 0.000891667 |    min |
|             Max cumulative indexing time across primary shards |              |      0.5247 |    min |
|            Cumulative indexing throttle time of primary shards |              |           0 |    min |
|    Min cumulative indexing throttle time across primary shards |              |           0 |    min |
| Median cumulative indexing throttle time across primary shards |              |           0 |    min |
|    Max cumulative indexing throttle time across primary shards |              |           0 |    min |
|                        Cumulative merge time of primary shards |              |   0.0424667 |    min |
|                       Cumulative merge count of primary shards |              |          12 |        |
|                Min cumulative merge time across primary shards |              |           0 |    min |
|             Median cumulative merge time across primary shards |              |           0 |    min |
|                Max cumulative merge time across primary shards |              |   0.0108167 |    min |
|               Cumulative merge throttle time of primary shards |              |           0 |    min |
|       Min cumulative merge throttle time across primary shards |              |           0 |    min |
|    Median cumulative merge throttle time across primary shards |              |           0 |    min |
|       Max cumulative merge throttle time across primary shards |              |           0 |    min |
|                      Cumulative refresh time of primary shards |              |    0.259283 |    min |
|                     Cumulative refresh count of primary shards |              |         927 |        |
|              Min cumulative refresh time across primary shards |              |           0 |    min |
|           Median cumulative refresh time across primary shards |              | 0.000266667 |    min |
|              Max cumulative refresh time across primary shards |              |   0.0418833 |    min |
|                        Cumulative flush time of primary shards |              |     0.00195 |    min |
|                       Cumulative flush count of primary shards |              |          28 |        |
|                Min cumulative flush time across primary shards |              |           0 |    min |
|             Median cumulative flush time across primary shards |              |           0 |    min |
|                Max cumulative flush time across primary shards |              |     0.00195 |    min |
|                                        Total Young Gen GC time |              |           0 |      s |
|                                       Total Young Gen GC count |              |           0 |        |
|                                          Total Old Gen GC time |              |           0 |      s |
|                                         Total Old Gen GC count |              |           0 |        |
|                                                     Store size |              |    0.149794 |     GB |
|                                                  Translog size |              | 0.000391557 |     GB |
|                                         Heap used for segments |              |           0 |     MB |
|                                       Heap used for doc values |              |           0 |     MB |
|                                            Heap used for terms |              |           0 |     MB |
|                                            Heap used for norms |              |           0 |     MB |
|                                           Heap used for points |              |           0 |     MB |
|                                    Heap used for stored fields |              |           0 |     MB |
|                                                  Segment count |              |          75 |        |
|                                                 Min Throughput | index-append |     2645.87 | docs/s |
|                                                Mean Throughput | index-append |     2645.87 | docs/s |
|                                              Median Throughput | index-append |     2645.87 | docs/s |
|                                                 Max Throughput | index-append |     2645.87 | docs/s |
|                                        50th percentile latency | index-append |     334.671 |     ms |
|                                       100th percentile latency | index-append |     347.148 |     ms |
|                                   50th percentile service time | index-append |     334.671 |     ms |
|                                  100th percentile service time | index-append |     347.148 |     ms |
|                                                     error rate | index-append |           0 |      % |
|                                                 Min Throughput |      default |        4.09 |  ops/s |
|                                                Mean Throughput |      default |        4.09 |  ops/s |
|                                              Median Throughput |      default |        4.09 |  ops/s |
|                                                 Max Throughput |      default |        4.09 |  ops/s |
|                                       100th percentile latency |      default |     449.604 |     ms |
|                                  100th percentile service time |      default |     204.521 |     ms |
|                                                     error rate |      default |           0 |      % |


--------------------------------
[INFO] SUCCESS (took 21 seconds)

@gkamat
Copy link
Collaborator

gkamat commented Jan 31, 2024

This is likely fixed in OSB 1.2 with the following commit. The setup for 1.2 should be no different than that for earlier versions.

2023-11-03 00:04:20 -0700  a0adc906  Govind Kamat
Fix the predicate used for the --kill-running-processes flag.

@IanHoang
Copy link
Collaborator

Going to close this since no activity since January

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants