[INSTALL] SCOTCH v7.0.4 on RDHPCS machines #748

Closed
MatthewMasarik-NOAA opened this issue Aug 28, 2023 · 25 comments
Labels
INFRA JEDI Infrastructure

Comments

@MatthewMasarik-NOAA

Which software in the stack would you like installed?
SCOTCH v7.0.4.

What is the version/tag of the software?
v7.0.4: https://gitlab.inria.fr/scotch/scotch/-/tags/v7.0.4

What compilation options would you like set?
This may be a work in progress.

Intel builds of SCOTCH have also relied on loading a gcc module to reference newer headers. On Hera, using hpc-stack, I used the following modules:

module load cmake/3.20.1
module load intel/2022.1.2
module load impi/2022.1.2
module use  /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/intel-2022.1.2/modulefiles/stack
module load hpc/1.2.0
module load hpc-intel/2022.1.2
module load hpc-impi/2022.1.2
module load gnu/9.2.0

which builds scotch/7.0.4 successfully. To set up the analogous spack-stack environment, I tried this:

module use /scratch1/NCEPDEV/jcsda/jedipara/spack-stack/modulefiles
module load miniconda/3.9.12
module load ecflow/5.5.3
module load mysql/8.0.31

module use /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.4.1/envs/unified-env/install/modulefiles/Core
module load stack-intel/2021.5.0
module load stack-intel-oneapi-mpi/2021.5.1
module load stack-python/3.9.12

module load cmake/3.23.1
module use /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.4.0/envs/unified-env/install/modulefiles/Core
module load stack-gcc/9.2.0

which gives an Lmod error when trying to load stack-gcc while stack-intel is already loaded.

I'm not sure how to get around this, so I'm curious whether there are any suggestions for handling it. I also wanted to ask: since scotch already has a recipe for 7.0.3, and this new release is just a bugfix for 7.0.3, could you advise what I should fill in for instructions at this point? I'm just thinking of avoiding redundant information, if that would be the case.

Installation timeframe: Would you like this package to be installed in an upcoming quarterly spack-stack release, or sooner?
August quarterly release.

Any other relevant information that we should know to correctly install the software?
There is currently a recipe for scotch/7.0.3.

Additional context
NA

@climbfuji
Collaborator

The Intel-GNU issues described don't apply to spack-stack because of the way we set up the compilers.

@MatthewMasarik-NOAA
Author

> The Intel-GNU issues described don't apply to spack-stack because of the way we set up the compilers.

Okay. Just to be sure I'm clear, for the Intel build I would not load stack-gcc/9.2.0 in this case?

@climbfuji
Collaborator

Correct. For the example of Hera, this is essentially because of lines 15-17 in https://github.com/JCSDA/spack-stack/blob/develop/configs/sites/hera/compilers.yaml, which then get baked into the Intel compiler meta module automatically.
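
Putting the two replies together, the Intel environment on Hera would presumably be the sequence from the original post with the stack-gcc lines dropped. The following is only a sketch assembled from the commands quoted in this thread, not a verified recipe; the function name `load_hera_intel_stack` is made up for illustration, and the module versions are the ones quoted above for spack-stack-1.4.1.

```shell
# Sketch only: the Hera commands from the original post, minus the stack-gcc
# lines. Wrapped in a function (hypothetical name) so nothing runs until called.
load_hera_intel_stack() {
  module use /scratch1/NCEPDEV/jcsda/jedipara/spack-stack/modulefiles
  module load miniconda/3.9.12
  module load ecflow/5.5.3
  module load mysql/8.0.31

  module use /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.4.1/envs/unified-env/install/modulefiles/Core
  module load stack-intel/2021.5.0
  module load stack-intel-oneapi-mpi/2021.5.1
  module load stack-python/3.9.12

  module load cmake/3.23.1
  # No "module load stack-gcc/9.2.0" here: per the reply above, the GNU
  # settings come in through the Intel compiler meta module.
}
```

On a Hera login shell with Lmod available, one would call `load_hera_intel_stack` and then check the result with `module list`.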

@MatthewMasarik-NOAA
Author

Okay, I see. That's very helpful, thanks @climbfuji.

Regarding the build instructions, will the 7.0.3 instructions suffice, or should more be added?

@climbfuji
Collaborator

According to @AlexanderRichert-NOAA the 7.0.3 instructions are sufficient.

@MatthewMasarik-NOAA
Author

Great, thank you.

@AlexanderRichert-NOAA
Collaborator

Indeed. To the best of my understanding, 7.0.4 doesn't affect build options since it's just various bug fixes within the compiled code.

@MatthewMasarik-NOAA
Author

You're correct. It is just the bug fixes for the scaling-related issue, as well as the OpenMPI-related issue you found.

@climbfuji
Collaborator

scotch@7.0.4 has been installed on all RDHPCS systems as part of spack-stack-1.5.0.

Ok to close this or do you want to wait until the last UFS application has migrated to spack-stack (hopefully in this decade)?

@MatthewMasarik-NOAA
Author

Hi @climbfuji, that's great news! I could test WW3 on each platform to confirm from my end, and then close the issue. Does that sound good?

@climbfuji
Collaborator

Yes, thanks. Note that there is a PR for the ufs-weather-model to update to spack-stack-1.5.0, which includes the scotch update: ufs-community/ufs-weather-model#1920

@JessicaMeixner-NOAA

I'm having trouble loading modules on orion with the spack-stack-1.5.0 that is used in ufs-community/ufs-weather-model#1920. Is that expected?

@climbfuji
Collaborator

Can you be a little more specific please?

@JessicaMeixner-NOAA

Here's a quick way to reproduce what I'm seeing:

git clone https://github.com/climbfuji/ufs-weather-model test
cd test 
git checkout feature/spack_stack_150
cd modulefiles/
module use `pwd`
module load ufs_orion.intel 

And then things just hang... I'm trying to update the ufs-weather-model version in global-workflow so that I can use scotch 7.0.4 there, as well as for other reasons.

@JessicaMeixner-NOAA

I have tried multiple login nodes. I have not tried /work instead of /work2; I'll try that now.

@climbfuji
Collaborator

I wonder ... on gaea c5 I think we used the -t option because lmod would just hang: module load -t .... But can you check whether you have some other modulepaths added to the default login environment, or some modules loaded automatically? We recommend a clean .bashrc / .profile, with no user mods.
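
One rough way to do the check suggested above is to print the module search path one entry per line, so that anything added by a login script stands out. This is a sketch under assumptions: `show_modulepath` is a hypothetical helper name, not an Lmod command, and it only reads the `MODULEPATH` environment variable that Lmod maintains.

```shell
# Hypothetical helper: print each MODULEPATH entry on its own line.
show_modulepath() {
  printf '%s\n' "${MODULEPATH:-}" | tr ':' '\n'
}

show_modulepath
# With Lmod available, "module list" shows which modules are already loaded.
```

Entries that point at a user directory or at an older stack installation are the first candidates to remove from .bashrc / .profile before retrying.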

@JessicaMeixner-NOAA

I only load: module load contrib noaatools
I usually try to keep a clean bashrc, but I do have these on orion. I'll see if it helps to clean that out. I have tried module purges, which don't seem to help, and /work did not help either. Thank you for your quick response and help, @climbfuji!

@JessicaMeixner-NOAA

A clean environment did not help. module load -t ufs_orion.intel also did not help.

Not sure whether this is a "me" issue or whether others are having the same problem.

@JessicaMeixner-NOAA

I did double-check, and I have no issues with the versions in the develop branch of the ufs-weather-model.

@AlexanderRichert-NOAA
Collaborator

Looks like the ufs_orion.intel.lua has the wrong MODULEPATH? It's using /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.0/envs/unified-env so I think it's hanging because it's searching the whole installation directory structure for module files. When I add /install/modulefiles/Core and change the stack-python version to 3.10.8, I can get it to load for me.
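
The path fix described above can be sketched in shell as follows. This is an illustration only: `stack_core_modulepath` is a hypothetical helper, not part of spack-stack or Lmod, and the actual change belongs in the ufs_orion.intel.lua modulefile. The sketch just shows the difference between the environment directory and the Core modulefiles directory beneath it.

```shell
# Hypothetical helper: given a spack-stack environment directory, print the
# Core modulefiles directory beneath it (unchanged if already complete).
stack_core_modulepath() {
  env_root="${1%/}"   # drop one trailing slash, if any
  case "$env_root" in
    */install/modulefiles/Core) printf '%s\n' "$env_root" ;;
    *) printf '%s\n' "$env_root/install/modulefiles/Core" ;;
  esac
}

# Environment directory quoted in the comment above
ORION_ENV=/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.0/envs/unified-env
stack_core_modulepath "$ORION_ENV"
# On Orion one would then run: module use "$(stack_core_modulepath "$ORION_ENV")"
```

Pointing Lmod at the Core directory rather than the environment root keeps it from walking the whole installation tree looking for modulefiles, which matches the hang described earlier in the thread.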

@JessicaMeixner-NOAA

@AlexanderRichert-NOAA thank you!!!! I made the changes you described and can now load the modules.

@climbfuji
Collaborator

climbfuji commented Oct 3, 2023 via email

@climbfuji
Collaborator

Closing this as completed. scotch@7.0.4 is installed everywhere as part of 1.5.0. Please report issues with spack-stack-1.5.0 separately here and/or in the ufs-weather-model PR that updates it to 1.5.0. Thanks!

@MatthewMasarik-NOAA
Author

MatthewMasarik-NOAA commented Oct 5, 2023

Hi @climbfuji, sounds good, thank you. P.S. I'm virtually attending a wave workshop all this week, so my testing has been delayed. I'll pick it up again next week and report any issues if they arise. Thanks for your work completing this!
