Adapt slurm checks to alps #253

Open
wants to merge 21 commits into
base: alps

Conversation

ekouts
Collaborator

@ekouts ekouts commented Dec 19, 2024

No description provided.

@ekouts ekouts self-assigned this Dec 19, 2024
@ekouts ekouts requested review from teojgo and jgphpc January 7, 2025 08:24
Collaborator

@teojgo teojgo left a comment

The only change I would add is updating the copyright year to include 2025.

@@ -11,7 +11,7 @@
 class HaswellFmaCheck(rfm.CompileOnlyRegressionTest):
     def __init__(self):
         self.descr = 'check for avx2 instructions'
-        self.valid_systems = ['dom:login', 'daint:login']
+        self.valid_systems = []
Collaborator

@jgphpc jgphpc Jan 8, 2025

1/ We no longer have Haswell systems.
2/ This test will fail on alps:

Failed to find the following module(s): "craype-haswell"

Do you want to update it (using procinfo) or delete the test? Also, it seems that the roadmap for CPE on alps is in CE.

     ])

     def __init__(self):
         self.descr = f'LibSciAcc symlink check of {self.lib_name}'
         self.valid_systems = [
-            'daint:login', 'daint:gpu',
-            'dom:login', 'dom:gpu',
+            'daint:login', 'daint:normal'
Collaborator

not portable

@@ -11,8 +11,7 @@ class LibSciResolveBaseTest(rfm.CompileOnlyRegressionTest):
     sourcesdir = 'src/libsci_resolve'
     sourcepath = 'libsci_resolve.f90'
     executable = 'libsciresolve.x'
-    valid_systems = ['daint:login', 'daint:gpu', 'dom:login', 'dom:gpu']
-    modules = ['craype-haswell']
+    valid_systems = ['daint:login', 'daint:normal']
     maintainers = ['AJ', 'LM']
Collaborator

update maintainers please

@@ -23,7 +22,7 @@ def set_postbuild_cmds(self):

 @rfm.simple_test
 class NvidiaResolveTest(LibSciResolveBaseTest):
-    accel_nvidia_version = parameter(['60'])
+    accel_nvidia_version = parameter(['90'])
Collaborator

maybe an opportunity to deactivate the mkl test (below) ?

Collaborator

@jgphpc jgphpc Jan 8, 2025

  • line 41

looks like the issue is that you must load cray-libsci_acc too.
On alps, it fails with:

At least one of these module(s) must be loaded: nvhpc PrgEnv-nvhpc nvhpc-mixed atleast("cudatoolkit","12.0")

Collaborator

regex needs an update:

0x0000000000000001 (NEEDED)             Shared library: [libsci_cray.so.6]
0x0000000000000001 (NEEDED)             Shared library: [libsci_gnu_123.so.6]
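
For illustration, a minimal sketch of a pattern that would accept both lines above, assuming the check scans the `NEEDED` entries of `readelf -d` output. The pattern and the names `LIBSCI_NEEDED` and `find_libsci` are hypothetical and are not taken from the test in this PR.

```python
import re

# Hypothetical pattern: accepts libsci_<prgenv>[_<digits>].so.<major>,
# e.g. libsci_cray.so.6 and libsci_gnu_123.so.6.
LIBSCI_NEEDED = re.compile(
    r'\(NEEDED\)\s+Shared library: '
    r'\[(libsci_(?P<prgenv>[A-Za-z]+)(?:_\d+)?\.so\.\d+)\]'
)


def find_libsci(readelf_output):
    """Return (library, prgenv) pairs for every libsci NEEDED entry."""
    return [(m.group(1), m.group('prgenv'))
            for m in LIBSCI_NEEDED.finditer(readelf_output)]


sample = (
    '0x0000000000000001 (NEEDED)             Shared library: [libsci_cray.so.6]\n'
    '0x0000000000000001 (NEEDED)             Shared library: [libsci_gnu_123.so.6]\n'
)
print(find_libsci(sample))
```

Making the numeric suffix optional keeps the pattern valid for library names with and without a compiler version tag; whether the real check should also capture that tag is left open here.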

'CRAY_XPMEM:CRAY_DMAPP:CRAY_PMI:CRAY_UGNI:'
'CRAY_UDREG:CRAY_LIBSCI:CRAYPE:CRAY:'
'PERFTOOLS:CRAYPAT'),
'PE_PRODUCT_LIST': ('CRAYPE_ARM_GRACE:CRAYPE:PERFTOOLS:CRAYPAT'),
'SCRATCH': r'/scratch/[\S+]',
'XDG_RUNTIME_DIR': r'/run/user/[\d+]'
}
Collaborator

ssh will fail now on alps because of MFA

3 participants