Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor aggregate #708

Merged
merged 4 commits into from
Dec 18, 2024
Merged

Conversation

OVI3D0
Copy link
Member

@OVI3D0 OVI3D0 commented Dec 10, 2024

Description
Refactor the aggregate feature to make individual functions more single-purpose, as well as renaming variables and adding more explanatory comments.

Testing
New functionality includes testing
make test

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@OVI3D0 OVI3D0 marked this pull request as ready for review December 10, 2024 22:37
# extract the necessary data from the first test execution, since the configurations should be identical for all test executions
return aggregated_results

def update_config_object(self, test_execution):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If using type hints for methods in this class, we should ensure all methods have type hints:

def update_config_object(self, test_execution: TestExecution) -> None: 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, added type hints to all the functions now!

self.config.add(config.Scope.applicationOverride, "builder", "plugin.params", test_exe.plugin_params)
self.config.add(config.Scope.applicationOverride, "workload", "latency.percentiles", test_exe.latency_percentiles)
self.config.add(config.Scope.applicationOverride, "workload", "throughput.percentiles", test_exe.throughput_percentiles)
self.update_config_object(test_exe)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice, makes build_aggregated_results() easier to read

all_jsons = [self.test_store.find_by_test_execution_id(id).results for id in self.test_executions.keys()]

# retrieve nested value from a dictionary given a key path
def get_nested_value(obj: Dict[str, Any], path: List[str]) -> Any:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We should avoid vague terms like obj or path if possible as it doesn't inform the reader what these are and might make them insert debugging statements to find out what they are. Possible replacements for obj could be json_obj, json_data, json_content, metric_data, nested_json, or others you might have. path works but might be better to make it plural so that users know it's a group of keys.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, renamed a bunch of variables in the aggregate_json_by_key function. Hopefully it's more clear now

@@ -66,8 +66,7 @@ def aggregate_helper(objects: List[Any]) -> Any:
if not objects:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Two comments here:

  • While naming it a helper is fine since it is restricted to the scope of aggregate_by_json_key, it would be better to name it something that more closely describes what it does.
  • Also, it would help to add a quick comment to each if statement describing what it's doing. Would improve the readability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed + added comments now! lmk what you think

self.accumulated_iterations[test_execution_id][task_name] = iterations
else:
raise ValueError(f"Test procedure '{self.test_procedure_name}' not found in the loaded workload.")
for task in self.loaded_workload.find_test_procedure_or_default(self.test_procedure_name).schedule:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much more clean!

self.config.add(config.Scope.applicationOverride, "workload", "latency.percentiles", test_execution.latency_percentiles)
self.config.add(config.Scope.applicationOverride, "workload", "throughput.percentiles", test_execution.throughput_percentiles)

def build_aggregated_results(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a return type hint

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ✅

@OVI3D0 OVI3D0 requested a review from IanHoang December 17, 2024 21:14
Copy link
Collaborator

@IanHoang IanHoang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for making these changes

@IanHoang IanHoang merged commit 477aad9 into opensearch-project:main Dec 18, 2024
10 checks passed
@OVI3D0 OVI3D0 deleted the refactor-aggregate branch December 18, 2024 18:22
OVI3D0 added a commit to OVI3D0/opensearch-benchmark that referenced this pull request Dec 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants