
Add understanding workloads section #6164

Merged
merged 21 commits into from
Jan 30, 2024

Naarcha-AWS
Collaborator

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Naarcha-AWS self-assigned this Jan 18, 2024
Naarcha-AWS added the `3 - Tech review` label (PR: Tech review in progress) Jan 18, 2024
Naarcha-AWS marked this pull request as ready for review January 18, 2024 17:55
Naarcha-AWS added the `backport 2.11` label (PR: Backport label for 2.11) Jan 18, 2024

## General search clusters

For benchmarking clusters built for general search use cases, start with [nyc_taxis](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/nyc_taxis). The `nyc_taxis` workload data about the rides performed by yellow taxis in New York in 2015.
Contributor

"workload data about" -> "workload contains data about"

Consider the following when deciding which workload would work best for benchmarking your cluster:

- Consider the use case of your cluster.
- Consider what data types your cluster uses by comparing them to the data structure of the documents contained in the workload. Each workload contains an example document so you can compare data types. Also, you can open the `index.json` file in the workload to see the data type.
Contributor

"to see the data type" --> "to see the index mappings and data types"
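One way to make that comparison concrete is to check a sample document's fields against the `mappings` section of `index.json`. The following Python sketch assumes a simplified, hypothetical `index.json` excerpt (field names loosely modeled on taxi-ride data, not copied from a real workload) and hypothetical helper names (`mapped_types`, `unmapped_fields`); it only inspects top-level fields and is not part of OpenSearch Benchmark.

```python
import json

# Hypothetical index.json excerpt; NOT copied from a real workload.
INDEX_JSON = """
{
  "mappings": {
    "properties": {
      "fare_amount": {"type": "scaled_float", "scaling_factor": 100},
      "pickup_datetime": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
      "vendor_id": {"type": "keyword"}
    }
  }
}
"""


def mapped_types(index_json_text):
    """Return {field_name: mapping_type} for the top-level mapped fields."""
    properties = json.loads(index_json_text)["mappings"]["properties"]
    # A field declared without "type" (only nested "properties") is an object field.
    return {name: spec.get("type", "object") for name, spec in properties.items()}


def unmapped_fields(sample_doc, index_json_text):
    """Return the sorted fields in sample_doc that the mappings do not declare."""
    types = mapped_types(index_json_text)
    return sorted(field for field in sample_doc if field not in types)


if __name__ == "__main__":
    # Hypothetical document from "your" cluster.
    my_doc = {"vendor_id": "2", "fare_amount": 12.5, "tip_percent": 0.18}
    print(unmapped_fields(my_doc, INDEX_JSON))  # ['tip_percent']
```

Fields reported by `unmapped_fields` are ones the workload would not exercise with an explicit mapping, which is a hint that a different workload may match your data more closely.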


## _operations and _test-procedures

To make the workload more human-readable, operations and test procedures are separated into two different directors.
Contributor

"different directors" -> "different directories"


The `index.json` file contains all of the data mappings and parameters used to index any documents contained in the workload, as well as the index settings needed when `create-index` operations are run during the workload.

For example, in the `nyc_taxis` workload, the `settings` object lets you customize the number of shards and replicas and whether the index caches queries and requests. All mappings are based on a single document, usually referenced in the `files.txt` file, and include each mapping parameter and its format, as shown in the following example:
Contributor

Could we rephrase these two paragraphs to something such as:

When OSB creates an index for the workload, it uses the index settings and mappings template found in index.json. Mappings in index.json are based on the mappings of a single document from the workload's corpus, which can be found in files.txt. For example, the following is the index.json for the nyc_taxis workload. Users can customize fields such as number_of_shards, number_of_replicas, query_cache_enabled, and requests_cache_enabled.
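The customization described in this suggestion can be pictured as default parameters plus user overrides being resolved into index settings. The following Python sketch does not use OpenSearch Benchmark's own templating of `index.json`; it simply merges a hypothetical set of defaults with overrides. The default values and the `render_settings` helper are assumptions for illustration, while the parameter names come from the suggestion and the keys on the right are OpenSearch index setting names.

```python
# Hypothetical defaults; a real workload defines its own defaults in index.json.
DEFAULT_PARAMS = {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "query_cache_enabled": False,
    "requests_cache_enabled": False,
}


def render_settings(overrides=None):
    """Resolve index settings from DEFAULT_PARAMS plus user overrides (hypothetical helper)."""
    params = dict(DEFAULT_PARAMS)
    for key, value in (overrides or {}).items():
        # Reject unknown names so that a typo does not silently fall back to a default.
        if key not in params:
            raise KeyError(f"unknown parameter: {key}")
        params[key] = value
    # Map the user-facing parameter names to OpenSearch index setting keys.
    return {
        "index.number_of_shards": params["number_of_shards"],
        "index.number_of_replicas": params["number_of_replicas"],
        "index.queries.cache.enabled": params["query_cache_enabled"],
        "index.requests.cache.enable": params["requests_cache_enabled"],
    }


if __name__ == "__main__":
    print(render_settings({"number_of_shards": 3, "number_of_replicas": 1}))
```

Keeping defaults in one place means an unmodified run stays reproducible, while each override is explicit and easy to record alongside benchmark results.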


### Schedule

The `schedule` element contains a list of actions and operations that are run by the workload. Operations run according to the order in which they appear in the `schedule`. The following example illustrates a `schedule` with multiple operations, each defined by its `operation-type`:
Contributor

Maybe we can rephrase this to something like:

The schedule element contains a list of operations that are run according to the order they appear in. 
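The ordering rule can be illustrated with a small Python sketch that walks a hypothetical `schedule` list and dispatches each entry on its `operation-type`. The schedule contents, the handler functions, and the `run_schedule` helper are assumptions; the sketch models only sequential execution and does not reproduce how OpenSearch Benchmark actually runs operations.

```python
# Hypothetical schedule; operation names and options are illustrative only.
SCHEDULE = [
    {"operation": {"name": "delete-index", "operation-type": "delete-index"}},
    {"operation": {"name": "create-index", "operation-type": "create-index"}},
    {"operation": {"name": "index", "operation-type": "bulk"}, "clients": 8},
    {"operation": {"name": "default", "operation-type": "search"}, "iterations": 3},
]


def make_handler(op_type):
    """Build a stand-in handler that records which operation ran (hypothetical)."""
    def handler(name, entry):
        # `entry` carries per-task options such as "clients"; unused in this sketch.
        return f"{op_type}:{name}"
    return handler


HANDLERS = {t: make_handler(t) for t in ("delete-index", "create-index", "bulk", "search")}


def run_schedule(schedule, handlers):
    """Run entries strictly in list order, dispatching on operation-type."""
    log = []
    for entry in schedule:  # list order == execution order
        op = entry["operation"]
        handler = handlers[op["operation-type"]]
        log.append(handler(op["name"], entry))
    return log


if __name__ == "__main__":
    for step in run_schedule(SCHEDULE, HANDLERS):
        print(step)
```

Because the loop follows list order, reordering the list changes the execution order, which is the property the suggested wording describes.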

Contributor

IanHoang left a comment

Left a few comments

Naarcha-AWS added the `4 - Doc review` label (PR: Doc review in progress) and removed the `3 - Tech review` label (PR: Tech review in progress) Jan 25, 2024
Naarcha-AWS mentioned this pull request Jan 29, 2024
Collaborator

vagimeli left a comment

@Naarcha-AWS @natebower Please see my review comments. Thank you, Melissa

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Naarcha-AWS added the `6 - Done but waiting to merge` label (PR: The work is done and ready to merge) and removed the `4 - Doc review` label (PR: Doc review in progress) Jan 30, 2024
Collaborator

natebower left a comment

@Naarcha-AWS Nice job on this 😄. Please tag me on the rewrites of lines 107 and 161 in the second file so that I can verify them. Thanks!

Naarcha-AWS and others added 9 commits January 30, 2024 10:42
Co-authored-by: Nathan Bower <[email protected]>
Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Naarcha-AWS merged commit bf4ae72 into main Jan 30, 2024
4 checks passed
Naarcha-AWS deleted the understanding-workloads branch January 30, 2024 17:50
opensearch-trigger-bot pushed a commit that referenced this pull request Jan 30, 2024
* Add understanding workloads section.

Signed-off-by: Naarcha-AWS <[email protected]>

* Add additional anatomy sections

Signed-off-by: Naarcha-AWS <[email protected]>

* Add section headers

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix link

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix typos

Signed-off-by: Naarcha-AWS <[email protected]>

* Change example to fix build error.

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Update anatomy-of-a-workload.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix build errors

Signed-off-by: Naarcha-AWS <[email protected]>

* Update anatomy-of-a-workload.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Update anatomy-of-a-workload.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Update concepts.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Update index.md

Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Melissa Vagi <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
(cherry picked from commit bf4ae72)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Naarcha-AWS pushed a commit that referenced this pull request Jan 30, 2024
(cherry picked from commit bf4ae72)

Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Melissa Vagi <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
hdhalter added the `3 - Done` label (Issue is done/complete) and removed the `6 - Done but waiting to merge` label (PR: The work is done and ready to merge) Feb 8, 2024
Labels
`3 - Done` (Issue is done/complete), `backport 2.11` (PR: Backport label for 2.11)
5 participants