Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-15960 tests: Improvements for io_sys_admin test #14503

Merged
merged 9 commits into from
Aug 8, 2024

Conversation

saurabhtandan
Copy link
Contributor

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follows the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

Use different set of oclasses for ior and mdtest.
Add intercept option in mdtest_test_base.
Use interception for POSIX runs in the test.

Test-tag: pr test_io_sys_admin

Signed-off-by: Saurabh Tandan <[email protected]>
    Use different set of oclasses for ior and mdtest.
    Add intercept option in mdtest_test_base.
    Use interception for POSIX runs in the test.

    Test-tag: pr test_io_sys_admin

Signed-off-by: Saurabh Tandan <[email protected]>
Copy link

github-actions bot commented Jun 4, 2024

Ticket title is 'io_sys_admin improvements'
Status is 'In Progress'
Labels: 'scrubbed_2.8,tds'
https://daosio.atlassian.net/browse/DAOS-15960

@saurabhtandan saurabhtandan changed the title Standan/daos 15960 DAOS-15960 tests: Improvements for io_sys_admin test Jun 4, 2024
@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14503/1/testReport/

Copy link
Contributor

@daltonbohning daltonbohning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are flake and pylint errors

Comment on lines 140 to 141
def run_mdtest(self, manager, processes, display_space=True, pool=None, out_queue=None,
intercept=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't add another option for this. The test itself could just do

mdtest_cmd.env.update(LD_PRELOAD=intercept, D_IL_REPORT='1')

Comment on lines 70 to 71
- [SX, EC_2P1GX]
- [S1, EC_2P1G1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please split this into ior_object_class and mdtest_object_class so the config and logic is more clear?
And could we leave a comment saying something like file oclass, dir oclass?
Also, for mdtest we can use *X for the directory

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we really intend to use SX/S1 for files and EC for directories?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is this not file, dir? I think splitting into ior_object_class and mdtest_object_class will make this easier to understand

Comment on lines 64 to 67
for idx, _ in enumerate(object_class):
self.ior_cmd.dfs_oclass.update(ior_oclass[idx])
self.mdtest_cmd.dfs_oclass.update(mdtest_oclass[idx])
self.ior_cmd.dfs_dir_oclass.update(ior_oclass[idx])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is enumerating object_class but then using the index like ior_oclass[idx] which I think is confusing. What if object_class is longer than ior_oclass?

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14503/1/execution/node/1421/log

Comment on lines 37 to 38
if dir_oclass:
container.dir_oclass.update(dir_oclass)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be bad if we do, e.g., container create --oclass S1 because then the DFS root object will be S1. Which means ALL client ranks will read root from a single server target.

We should instead do

container create --file-oclass S1

If you just want to change the file oclass

Test-tag: pr test_io_sys_admin test_largefilecount test_smallfilecount

Signed-off-by: Saurabh Tandan <[email protected]>
Test-tag: pr test_io_sys_admin test_largefilecount test_smallfilecount
Test-tag: pr test_io_sys_admin test_largefilecount test_smallfilecount

Signed-off-by: Saurabh Tandan <[email protected]>
Copy link
Contributor

@daltonbohning daltonbohning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some lint errors that need to be resolved before merging

Comment on lines 23 to 24
file_oclass: file object class of container
dir_oclass: dir object class of container
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
file_oclass: file object class of container
dir_oclass: dir object class of container
file_oclass (str, optional): file object class of container. Defaults to None
dir_oclass (str, optional): dir object class of container. Defaults to None

Comment on lines 45 to 49
"""
rd_fac: redundancy factor

Returns: value for dir_oclass
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
rd_fac: redundancy factor
Returns: value for dir_oclass
"""
"""
Args
rd_fac (int): redundancy factor
Returns:
str: value for dir_oclass
"""

try:
self.processes = mdtest_np
self.ppn = mdtest_ppn
self.execute_mdtest()
if self.mdtest_cmd.api.value == 'POSIX':
self.mdtest_cmd.env.update(LD_PRELOAD=intercept, D_IL_REPORT='1')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to force all tests using this test base to use interception. Is that really what we want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's what we want as of now.

@@ -95,6 +117,8 @@ def run_file_count(self):
self.ior_cmd.api.update('HDF5')
self.run_ior_with_pool(
create_pool=False, plugin_path=hdf5_plugin_path, mount_dir=mount_dir)
elif self.ior_cmd.api.value == 'POSIX':
self.run_ior_with_pool(create_pool=False, intercept=intercept)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar - this is going to force all tests using this test base to use interception

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's what we want as of now.

Test-tag: pr test_io_sys_admin test_largefilecount test_smallfilecount

Signed-off-by: Saurabh Tandan <[email protected]>
@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14503/4/testReport/

if api == "DFS":
self.mdtest_cmd.test_dir.update("/")
if self.mdtest_cmd.api.value in ['DFS', 'POSIX']:
for oclass in mdtest_oclass:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be better if the config did this?

    mdtest_oclass:
      # file, dir
      - [S1, SX]
      - [EC_2P1G1, RP_2GX]

And then the code could do

for file_oclass, dir_oclass in mdtest_oclass:

Which would

  1. Allow the dir oclass to be configurable without modifying the code
  2. Make it easier to understand what is being ran when looking at the config. Right now you have to dig into the code because it's hardcoded

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed, let's ignore this for now since you will be OOO soon

Copy link
Contributor

@daltonbohning daltonbohning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces a test failure
https://build.hpdd.intel.com/blue/organizations/jenkins/daos-stack%2Fdaos/detail/PR-14503/4/tests

Because this function is hardcoded to use -F

self.run_ior_with_params(
"DAOS", daos_path, read_back_pool, read_back_cont,
flags="-r -R -F -k")

But this PR removed -F

flags: "-v -D 300 -W -w -r -R"

One solution is to update that function to be dynamic instead of hardocded

# Original flags used for write
flags = self.ior_cmd.flags.value

# Remove read and write from flags if present
flags = re.sub(" *-r", "", flags)
flags = re.sub(" *-R", "", flags)
flags = re.sub(" *-w", "", flags)
flags = re.sub(" *-W", "", flags)

# Remove stonewall
flags = re.sub(" *-D [0-9]+", "", flags)

# Add read flags
flags += " -r -R"


self.run_ior_with_params(
    "DAOS", daos_path, read_back_pool, read_back_cont,
    flags=flags)

Test-tag: pr test_io_sys_admin test_largefilecount test_smallfilecount test_basic_checkout_dm

Signed-off-by: Saurabh Tandan <[email protected]>
Copy link
Contributor

@daltonbohning daltonbohning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming tests pass

@daltonbohning daltonbohning marked this pull request as ready for review August 8, 2024 18:53
@daltonbohning daltonbohning requested review from a team as code owners August 8, 2024 18:53
@daltonbohning daltonbohning added the forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. label Aug 8, 2024
@daltonbohning daltonbohning requested a review from a team August 8, 2024 18:58
@daltonbohning daltonbohning merged commit a755552 into master Aug 8, 2024
55 checks passed
@daltonbohning daltonbohning deleted the standan/DAOS-15960 branch August 8, 2024 18:58
saurabhtandan added a commit that referenced this pull request Aug 8, 2024
Use different set of oclasses for ior and mdtest.
Add intercept option in mdtest_test_base.
Use interception for POSIX runs in the test.

Test-tag: pr test_io_sys_admin test_largefilecount test_smallfilecount test_basic_checkout_dm

Signed-off-by: Saurabh Tandan <[email protected]>
daltonbohning pushed a commit that referenced this pull request Aug 8, 2024
Use different set of oclasses for ior and mdtest.
Add intercept option in mdtest_test_base.
Use interception for POSIX runs in the test.

Test-tag: pr test_io_sys_admin test_largefilecount test_smallfilecount test_basic_checkout_dm

Signed-off-by: Saurabh Tandan <[email protected]>
daltonbohning pushed a commit that referenced this pull request Aug 12, 2024
Use different set of oclasses for ior and mdtest.
Add intercept option in mdtest_test_base.
Use interception for POSIX runs in the test.

Signed-off-by: Saurabh Tandan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed.
Development

Successfully merging this pull request may close these issues.

4 participants