Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce common operations directory for OSB workloads #514

Merged
merged 5 commits into from
Dec 12, 2024

Conversation

OVI3D0
Copy link
Member

@OVI3D0 OVI3D0 commented Nov 20, 2024

Description

This change consolidates all of the common operations in OSB workloads into one shared directory. Now, rather than repeating these operations multiple times in each workload, they can simply all be pulled in with one line:
{{ benchmark.collect(parts="../../shared_operations/common_workload_setup.json") }},

Issues Resolved

#489

Testing

  • New functionality includes testing

Tested by importing the common ops file into other workloads and executing tests

Backport to Branches:

  • 6
  • 7
  • 1
  • 2
  • 3

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@IanHoang
Copy link
Collaborator

Had some discussions offline with @OVI3D0. Michael proposed that instead of creating a single shared_operations file that contains the common setup steps in each workload, we should go forward with a modular approach. However, although most workloads have the same chronological steps, some workloads are missing some steps.

Adding comments here for visibility:

do you think it’s maybe worth making these common operations at the beginning of each workload all exactly the same?
I went through all the original workloads (aside from semantic search workload and vectorsearch workload) and most have the same structure:

  1. delete-index
  2. create-index
  3. check-cluster-health
  4. index-append
  5. refresh-after-index
  6. force-merge
  7. refresh-after-force-merge
  8. wait-until-merge-finishes

Some workloads like eventdata or so are missing one of these steps (like a refresh after force merge or refresh after indexing) but let's go with your idea of streamlining them to be the same. Refreshes should not impact or change the behavior of the workload. I think one of the workloads has like a put-settings operation before it deletes the index. You're probably aware but just be sure to keep that operation.

To add to your idea, what do you think of doing this?

  • Have individual files for each shared operation. One for delete_index.json, one for create_index.json, etc.
  • Have one file that combines all of these with the benchmark.collect. This file will exist in the shared_operations folder and can be called common_workload_setup.json or something else that you think is appropriate. This file will be inserted into the test_procedures of the official workloads. That way, we don't need to write the following in each workload:
{{ benchmark.collect(parts="../../shared_operations/delete_index.json")}}
{{ benchmark.collect(parts="../../shared_operations/create_index.json")}}
{{ benchmark.collect(parts="../../shared_operations/check_cluster_health.json")}}

# Instead each workload will have this in test procedures at the beginning
{{ benchmark.collect(parts="../../shared_operations/common_workload_setup.json")}}

@OVI3D0 OVI3D0 force-pushed the main branch 2 times, most recently from 80c8492 to f7d5cfa Compare November 22, 2024 21:53
{
"operation": {
"operation-type": "force-merge",
"request-timeout": 7200
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OVI3D0 Could you add these params to allow users to manage segments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, didn't see this. Added now👍

@OVI3D0
Copy link
Member Author

OVI3D0 commented Dec 4, 2024

After some more discussions offline with @IanHoang, we came up with a new configuration for the common operations.

create-index typically takes an index_settings param, with different default settings depending on the test procedure.

with this latest revision, the default index_settings value has been renamed to a new default_index_settings value; and after a change to the create_index operation, if no index_settings value is passed, OSB should fall back to the default_index_settings value. After testing this and running a curl for my cluster settings, I can see the default settings are still being passed through.

This will allow us to reduce these common operations down to something like this:

        {% with default_index_settings={
          "index.sort.field": ["country_code.raw", "admin1_code.raw"],
          "index.sort.order": ["asc", "asc"]
        } %}
        {{ benchmark.collect(parts="../../common_operations/common_workload_setup.json") }}
        {% endwith %}

In the case of the geonames workload, this will reduce the file size by half!

@OVI3D0 OVI3D0 requested a review from IanHoang December 4, 2024 21:45
{
"operation": {
"operation-type": "create-index",
"settings": {{index_settings | default(default_index_settings | default({})) | tojson}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, is default({})) necessary if the workload has nothing specified?

{% with default_index_settings={} %}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that when I removed that final default({}) I was getting jinja2 templating errors, even if I specified default_settings={} in the geonames workload

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's necessary because when default_index_settings is not specified, the {} is needed for the JSON to be valid.

Copy link
Collaborator

@IanHoang IanHoang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two comments but the rest looks good to me

@OVI3D0 OVI3D0 requested a review from IanHoang December 9, 2024 23:00
Copy link
Collaborator

@IanHoang IanHoang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@gkamat gkamat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will really improve the workloads, thanks for working on it @OVI3D0. Will this model be applied to all the workloads in a separate check-in? I notice that only geonames is enhanced here.

{
"operation": {
"operation-type": "create-index",
"settings": {{index_settings | default(default_index_settings | default({})) | tojson}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's necessary because when default_index_settings is not specified, the {} is needed for the JSON to be valid.

common_operations/common_workload_setup.json Outdated Show resolved Hide resolved
{
"operation": {
"operation-type": "force-merge",
"request-timeout": 7200{%- if max_num_segments is defined %},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you feel request_timeout should be parameterized here, so someone who wants to override it can do it? It's unlikely, but you never know.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm of the opinion that there's no harm in parameterizing most of the workload values. I added a param for request-timeout in the latest revision. Let me know what you think!

@@ -0,0 +1,10 @@
{
"operation": "index-append",
"warmup-time-period": 120,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar suggestion as the force merge request timeout.. If you do add parameters, the README's will need to be updated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the request-timeout comment, I added a param for this as well.

@OVI3D0
Copy link
Member Author

OVI3D0 commented Dec 11, 2024

This change will really improve the workloads, thanks for working on it @OVI3D0. Will this model be applied to all the workloads in a separate check-in? I notice that only geonames is enhanced here.

Yeah I will change the other workloads in separate PR's, I wanted to include geonames here as an example of the changes I plan to apply.

@OVI3D0 OVI3D0 requested a review from gkamat December 11, 2024 22:30
{
"operation": {
"operation-type": "force-merge",
"request-timeout": {{ request_timeout | default(60) | tojson }}{%- if max_num_segments is defined %},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to make this explicit, since it applies only to the force-merge operation. So calling it something like force_merge_request_timeout will make it clear it does not apply to all operations in general.

I'd do this in the next set of changes, just so this PR can go in as is.

@OVI3D0 OVI3D0 added backport 2 Backport to the "2" branch backport 1 backport 3 Backport to the "3" branch backport 7 Backport to the "7" branch labels Dec 12, 2024
@OVI3D0 OVI3D0 merged commit 5119393 into opensearch-project:main Dec 12, 2024
6 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 12, 2024
* restructure common operations folder based on discussion with ian

Signed-off-by: Michael Oviedo <[email protected]>

* refactor common ops changes based on discussion with ian

Signed-off-by: Michael Oviedo <[email protected]>

* rework common ops configuration

Signed-off-by: Michael Oviedo <[email protected]>

* add missing params and update geonames workload

Signed-off-by: Michael Oviedo <[email protected]>

* parameterize request-timeout and warmup-time-period

Signed-off-by: Michael Oviedo <[email protected]>

---------

Signed-off-by: Michael Oviedo <[email protected]>
(cherry picked from commit 5119393)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 12, 2024
* restructure common operations folder based on discussion with ian

Signed-off-by: Michael Oviedo <[email protected]>

* refactor common ops changes based on discussion with ian

Signed-off-by: Michael Oviedo <[email protected]>

* rework common ops configuration

Signed-off-by: Michael Oviedo <[email protected]>

* add missing params and update geonames workload

Signed-off-by: Michael Oviedo <[email protected]>

* parameterize request-timeout and warmup-time-period

Signed-off-by: Michael Oviedo <[email protected]>

---------

Signed-off-by: Michael Oviedo <[email protected]>
(cherry picked from commit 5119393)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 12, 2024
* restructure common operations folder based on discussion with ian

Signed-off-by: Michael Oviedo <[email protected]>

* refactor common ops changes based on discussion with ian

Signed-off-by: Michael Oviedo <[email protected]>

* rework common ops configuration

Signed-off-by: Michael Oviedo <[email protected]>

* add missing params and update geonames workload

Signed-off-by: Michael Oviedo <[email protected]>

* parameterize request-timeout and warmup-time-period

Signed-off-by: Michael Oviedo <[email protected]>

---------

Signed-off-by: Michael Oviedo <[email protected]>
(cherry picked from commit 5119393)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 12, 2024
* restructure common operations folder based on discussion with ian

Signed-off-by: Michael Oviedo <[email protected]>

* refactor common ops changes based on discussion with ian

Signed-off-by: Michael Oviedo <[email protected]>

* rework common ops configuration

Signed-off-by: Michael Oviedo <[email protected]>

* add missing params and update geonames workload

Signed-off-by: Michael Oviedo <[email protected]>

* parameterize request-timeout and warmup-time-period

Signed-off-by: Michael Oviedo <[email protected]>

---------

Signed-off-by: Michael Oviedo <[email protected]>
(cherry picked from commit 5119393)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
OVI3D0 pushed a commit that referenced this pull request Dec 12, 2024
* restructure common operations folder based on discussion with ian



* refactor common ops changes based on discussion with ian



* rework common ops configuration



* add missing params and update geonames workload



* parameterize request-timeout and warmup-time-period



---------


(cherry picked from commit 5119393)

Signed-off-by: Michael Oviedo <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
OVI3D0 pushed a commit that referenced this pull request Dec 12, 2024
* restructure common operations folder based on discussion with ian



* refactor common ops changes based on discussion with ian



* rework common ops configuration



* add missing params and update geonames workload



* parameterize request-timeout and warmup-time-period



---------


(cherry picked from commit 5119393)

Signed-off-by: Michael Oviedo <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
OVI3D0 pushed a commit that referenced this pull request Dec 12, 2024
* restructure common operations folder based on discussion with ian



* refactor common ops changes based on discussion with ian



* rework common ops configuration



* add missing params and update geonames workload



* parameterize request-timeout and warmup-time-period



---------


(cherry picked from commit 5119393)

Signed-off-by: Michael Oviedo <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
OVI3D0 pushed a commit that referenced this pull request Dec 12, 2024
* restructure common operations folder based on discussion with ian



* refactor common ops changes based on discussion with ian



* rework common ops configuration



* add missing params and update geonames workload



* parameterize request-timeout and warmup-time-period



---------


(cherry picked from commit 5119393)

Signed-off-by: Michael Oviedo <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
joshpalis pushed a commit to joshpalis/opensearch-benchmark-workloads that referenced this pull request Dec 18, 2024
…roject#514)

* restructure common operations folder based on discussion with ian

Signed-off-by: Michael Oviedo <[email protected]>

* refactor common ops changes based on discussion with ian

Signed-off-by: Michael Oviedo <[email protected]>

* rework common ops configuration

Signed-off-by: Michael Oviedo <[email protected]>

* add missing params and update geonames workload

Signed-off-by: Michael Oviedo <[email protected]>

* parameterize request-timeout and warmup-time-period

Signed-off-by: Michael Oviedo <[email protected]>

---------

Signed-off-by: Michael Oviedo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1 backport 2 Backport to the "2" branch backport 3 Backport to the "3" branch backport 7 Backport to the "7" branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants