mksurfdat toolchain: portability across platforms #645

Closed
slevis-lmwg opened this issue Feb 26, 2019 · 31 comments
Labels
enhancement new capability or improved behavior of existing capability


@slevis-lmwg
Contributor

Make sure the tool build can work on a wide range of systems.

@billsacks billsacks added the enhancement new capability or improved behavior of existing capability label Feb 26, 2019
@ekluzek
Collaborator

ekluzek commented Sep 25, 2019

Part of this can be accomplished by using the package manager for OCGIS, as the requirements for the mapping step are the most difficult part of the toolchain. I was going to make a separate issue just for mkmapdata, but I think it's fine to use this one for that purpose.

@slevisconsulting is going to make sure the OCGIS package manager allows him to easily work on different platforms.

@bekozi

@slevis-lmwg
Contributor Author

@bekozi, this may be a better place than email to send me the initial instructions that we discussed.

@bekozi

bekozi commented Sep 27, 2019

Installation instructions for ocgis from conda-forge:

  1. Install Anaconda Python
  2. Install ocgis with: conda install -c conda-forge -n ocgis ocgis esmpy mpi4py cf_units rtree nose mock
  3. Activate the conda environment: conda activate ocgis
  4. Remove ocgis since we'll want to work off the master initially: conda remove ocgis
  5. Set/append to PYTHONPATH the path <git ocgis directory>/src.
  6. Unless ESMF 8.0 is out, we'll also need a development build of ESMPy. Once ocgis is importable on the target platform(s) we can address this.

Ideally, this should work unless we hit platform-specific install issues, which are not uncommon.

We should note that this installs a serial netcdf4-python build that will eventually need to be parallel. There may be a solution in conda-forge for this already.

FYI, the documentation for an ocgis installation is here: https://ocgis.readthedocs.io/en/latest/install.html#using-the-package-manager
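A quick way to confirm that step 2 produced a usable environment is to ask Python which of the expected modules it can find. This is an illustrative sketch, not part of ocgis; the import names are assumptions (in particular, esmpy from the ESMF 8.0 era is imported as `ESMF`, while later releases use `esmpy`), and `missing_modules` is a hypothetical helper.

```python
"""Preflight check: report which packages of the ocgis environment are missing."""
import importlib.util

# Assumed import names for the packages listed in step 2.
REQUIRED = ["ocgis", "ESMF", "mpi4py", "cf_units", "rtree", "nose", "mock"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing from this environment:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Running it after `conda activate ocgis` (and after setting PYTHONPATH for the ocgis checkout) should print the all-clear; any name it lists points at the package to install.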

@ekluzek ekluzek added the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Sep 30, 2019
@billsacks billsacks removed the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Sep 30, 2019
@slevis-lmwg
Contributor Author

First followed advice from here:

On izumi:
cd ~
wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
bash Anaconda3-2019.10-Linux-x86_64.sh -b -p ~/anaconda
rm Anaconda3-2019.10-Linux-x86_64.sh
Added setenv PATH ~/anaconda/bin:$PATH to my .cshrc
source .cshrc
conda update conda

Then followed advice from here:
conda config --add channels conda-forge
conda config --set channel_priority strict

Next I tried the command that you gave in step (2) @bekozi:
conda install -c conda-forge -n ocgis ocgis esmpy mpi4py cf_units rtree nose mock
but got this error:
EnvironmentLocationNotFound: Not a conda environment: /home/slevis/anaconda/envs/ocgis

Tried this next
conda init tcsh
and repeated the conda install but got the same error.

@bekozi, could you give me advice at this point? I don't want to mess anything up...

@slevis-lmwg
Contributor Author

@bekozi, please ignore the request for help in my previous post. I made progress as follows:
cd anaconda/envs
mkdir ocgis
conda install -c conda-forge -n ocgis ocgis esmpy mpi4py cf_units rtree nose mock
conda activate ocgis
conda remove ocgis

@slevis-lmwg
Contributor Author

Next at github.com/NCPP/ocgis:
git clone -b master https://github.com/NCPP/ocgis.git ocgis

I also tried the same at github.com/NESII/esmpy-feedstock:
git clone -b master https://github.com/NESII/esmpy-feedstock.git esmf
but decided that I didn't need this. Not sure, yet...

@slevis-lmwg
Contributor Author

Modified PYTHONPATH in script.
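Step 5 (putting `<git ocgis directory>/src` on PYTHONPATH) can also be handled from a Python driver instead of a shell script. This is a minimal sketch; `OCGIS_SRC` and `with_pythonpath` are hypothetical names, and `~/ocgis/src` only stands in for wherever the checkout actually lives.

```python
import os

# Hypothetical location of the ocgis git checkout's src directory (step 5).
OCGIS_SRC = os.path.expanduser("~/ocgis/src")


def with_pythonpath(env, path):
    """Return a copy of env with path placed first on PYTHONPATH."""
    new_env = dict(env)
    current = new_env.get("PYTHONPATH", "")
    new_env["PYTHONPATH"] = path + os.pathsep + current if current else path
    return new_env


if __name__ == "__main__":
    env = with_pythonpath(os.environ, OCGIS_SRC)
    # Pass env=env to subprocess.run(...) so child Python processes see the checkout.
    print("PYTHONPATH =", env["PYTHONPATH"])
```

Prepending rather than appending means the git checkout wins over any copy of ocgis that is still installed in the conda environment.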

@slevis-lmwg
Contributor Author

conda search esmpy --channel conda-forge
showed me that ESMF 8 is available.

@slevis-lmwg
Contributor Author

Submitted my script ./subset_20191214.sh to run interactively on izumi and got this error...
ImportError: No module named click

@slevis-lmwg
Contributor Author

@bekozi let me know if you have any ideas or if we should set up a time to debug together...

@slevis-lmwg
Contributor Author

I tried various tests found here and all gave errors. Can't say if I'm using the tests correctly though...

@bekozi

bekozi commented Dec 16, 2019

click is the optional package used to create the CLI. You should be able to install it via conda-forge or pip (it's widely used). I'll make sure this is added to the docs.

@bekozi

bekozi commented Dec 16, 2019

I tried various tests found here and all gave errors.

Forgot to ask - could you please attach the error output from the testing you tried? There should not be extensive test failures.

@slevis-lmwg
Contributor Author

click is the optional package used to create the CLI. You should be able to install it via conda-forge or pip (it's widely used). I'll make sure this is added to the docs.

I repeated this with click added at the end:
conda install -c conda-forge -n ocgis ocgis esmpy mpi4py cf_units rtree nose mock click
Submitted my ocgis script: ./no_subset_20191214.sh
Got same error at first: ImportError: No module named click
Typed: module load lang/python/3.7.0
...and I got a new error:
No module named 'shapely'

I tried repeating the above with shapely added at the end:
conda install -c conda-forge -n ocgis ocgis esmpy mpi4py cf_units rtree nose mock click shapely
but failed with this error:
CustomValidationError: Parameter channel_priority = 'strict' declared in <<merged>> is invalid. The value 'strict' cannot be boolified.
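One plausible reading of this error is that `module load lang/python/3.7.0` put a different, older conda ahead of the Anaconda install on PATH. The sketch below assumes the `strict` value for `channel_priority` first appeared in conda 4.6 (older releases accept only true/false, hence "cannot be boolified"); it reports which `conda` is found first and whether its version is new enough. The helper names are hypothetical.

```python
import shutil
import subprocess


def parse_version(text):
    """Turn '4.7.12' into (4, 7, 12), stopping at the first non-numeric piece."""
    parts = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def supports_strict_priority(conda_version):
    """Assumption: channel_priority 'strict' requires conda >= 4.6."""
    return parse_version(conda_version) >= (4, 6)


def first_conda_on_path():
    """Return (path, version) for the conda found first on PATH, or (None, None)."""
    exe = shutil.which("conda")
    if exe is None:
        return None, None
    proc = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Combine both streams in case a given conda writes its version to stderr.
    words = (proc.stdout + proc.stderr).split()
    return exe, words[-1] if words else None


if __name__ == "__main__":
    exe, version = first_conda_on_path()
    if exe is None:
        print("no conda executable on PATH")
    else:
        ok = version is not None and supports_strict_priority(version)
        print(f"{exe} reports conda {version}; strict priority supported: {ok}")
```

If the reported path is not under `~/anaconda`, the module load has shadowed the Anaconda install.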

@slevis-lmwg
Contributor Author

Also here are the test failures:
python -c "from ocgis.test import run_simple; run_simple(verbose=False)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'ocgis'

python setup.py test
extensions/gdal_wrap.cpp:3177:27: fatal error: cpl_vsi_error.h: No such file or directory
 #include "cpl_vsi_error.h"
                           ^
compilation terminated.
error: Setup script exited with error: command 'gcc' failed with exit status 1

@bekozi

bekozi commented Dec 31, 2019

@slevisconsulting Thanks for the additional info. I think it's best we have a debug meeting to discuss. It looks like the environment paths are off and python is having difficulty finding basic packages.

@slevis-lmwg
Contributor Author

slevis-lmwg commented Jan 2, 2020

Started over, this time using conda create (see below):

First followed advice from here:

On izumi:
cd ~
wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
bash Anaconda3-2019.10-Linux-x86_64.sh -b -p ~/anaconda
rm Anaconda3-2019.10-Linux-x86_64.sh
Added setenv PATH ~/anaconda/bin:$PATH to my .cshrc
source .cshrc
conda update conda

Then followed advice from here:
conda config --add channels conda-forge
conda config --set channel_priority strict

Next I used the command you gave in step (2), @bekozi, BUT changed install to create, after first removing the old environment directory:
rm -rf anaconda/envs/ocgis/
conda create -c conda-forge -n ocgis ocgis esmpy mpi4py cf_units rtree nose mock
conda activate ocgis
conda remove ocgis --force
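The switch from `conda install` to `conda create` is the crux here: `install -n ocgis` only works once the environment exists. A small helper can pick the verb automatically. This sketch assumes the JSON printed by `conda env list --json` carries an `"envs"` list of environment prefixes; `env_exists` and `conda_command` are hypothetical names.

```python
import json
import os

ENV_NAME = "ocgis"
PACKAGES = ["ocgis", "esmpy", "mpi4py", "cf_units", "rtree", "nose", "mock"]


def env_exists(env_list_json, name):
    """True if any environment prefix in the conda JSON listing ends in `name`."""
    prefixes = json.loads(env_list_json).get("envs", [])
    return any(os.path.basename(os.path.normpath(p)) == name for p in prefixes)


def conda_command(env_list_json, name, packages):
    """Build a conda command: create the env if missing, else install into it."""
    verb = "install" if env_exists(env_list_json, name) else "create"
    return ["conda", verb, "-c", "conda-forge", "-n", name, *packages]


if __name__ == "__main__":
    sample = json.dumps({"envs": ["/home/user/anaconda"]})  # illustrative listing only
    print(" ".join(conda_command(sample, ENV_NAME, PACKAGES)))
```

With the sample listing (no `ocgis` env yet) this prints the same `conda create` command used above; once the environment exists, the same call yields `conda install`, which is how click or shapely could be added later.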

@slevis-lmwg
Contributor Author

Next, I tried running my ocgis script interactively on izumi:
Adding conda activate ocgis inside the script did NOT work.
The script ran after I typed `conda activate ocgis` at the prompt.
@ekluzek do you know what we need to do to get this to work inside the script instead? Ben didn't know. If you also don't know, I can ask Mark Moore.

@ekluzek
Collaborator

ekluzek commented Jan 2, 2020

Hmmm. I'd have to play around with it to see if I could get it to work. I can't see a reason why it would work interactively but not inside the script. My only guess is that the call to ocgis is starting up a new shell and isn't passing the environment down.

@slevis-lmwg
Contributor Author

Aha! I added the next two lines:
conda init bash
source ~/.bashrc
before conda activate ocgis and it seems to have worked.
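`conda init bash` followed by `source ~/.bashrc` works because a non-interactive script does not read the startup file that defines the `conda` shell function; `eval "$(conda shell.bash hook)"` is another commonly used way to get the same effect. Whichever route is used, a Python driver can fail fast if activation did not take. This sketch assumes `conda activate` exports `CONDA_DEFAULT_ENV` with the environment name; the function names are hypothetical.

```python
import os


def active_conda_env(environ=None):
    """Name of the active conda environment, or None if none is active."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def require_conda_env(name, environ=None):
    """Raise RuntimeError unless the conda environment `name` is active."""
    active = active_conda_env(environ)
    if active != name:
        raise RuntimeError(
            f"expected conda env {name!r} but found {active!r}; "
            f"initialize conda in this shell and run 'conda activate {name}' first"
        )


if __name__ == "__main__":
    print("active conda env:", active_conda_env())
```

Calling `require_conda_env("ocgis")` at the top of a driver turns a confusing downstream `ImportError` into a clear message about activation.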

@slevis-lmwg
Contributor Author

I pulled @bekozi's latest updates, and now my test scripts generate weight files successfully on izumi. These are my test scripts:
/fs/cgd/data0/slevis/ocgis_work/no_subset_20191214.sh
and
/fs/cgd/data0/slevis/ocgis_work/subset_1x1_20191213.sh

@slevis-lmwg
Contributor Author

@ekluzek now that we have cheyenne and izumi working, do you have other machines in mind that I should try?

@ekluzek
Collaborator

ekluzek commented Jan 10, 2020

@slevisconsulting cheyenne and izumi are our main test machines. But I'd like to hear that things work on the type of machines that @barlage thinks WRF users might use. So it would be good to hear that it's easy to run on a desktop and/or laptop (at least for, say, a single point).

I'm thinking you could try hobart and it should probably just run out of the box. But, I'd be interested in hearing if it runs on the CGD machines such as thorodin. That might take a little more work. And then if it's possible to run on my mac laptop that would be interesting as well.

But if porting to any of these takes a ton of time, it's not worth it. It would just be good to know.

@slevis-lmwg
Contributor Author

I'm thinking you could try hobart and it should probably just run out of the box. But, I'd be interested in hearing if it runs on the CGD machines such as thorodin. That might take a little more work. And then if it's possible to run on my mac laptop that would be interesting as well.

You are correct: I ran the test script on hobart and got identical results as from izumi.

I will look into thorodin next.

We can talk about trying on your mac laptop and see if we need help from Ben. Let me know.

@slevis-lmwg
Contributor Author

The same sequence was successful on thorodin, too.
cd
wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
bash Anaconda3-2019.10-Linux-x86_64.sh -b -p ~/anaconda
rm Anaconda3-2019.10-Linux-x86_64.sh
Added setenv PATH ~/anaconda/bin:$PATH to my .cshrc
source .cshrc
conda init tcsh
exit ! ...then log back on
conda update conda
conda config --add channels conda-forge
conda config --set channel_priority strict
conda create -c conda-forge -n ocgis ocgis esmpy mpi4py cf_units rtree nose mock
conda activate ocgis
conda remove ocgis --force

@slevis-lmwg
Contributor Author

slevis-lmwg commented Jan 12, 2020

@billsacks suggested that I get ocgis working in mkmapdata.sh on izumi (as I have done on cheyenne) if I first found that mkmapdata.sh worked on izumi. I have found that mkmapdata.sh gives this error on izumi:
Input SCRIP grid file does NOT exist: ERROR: unrecognized arguments: scripgriddata\n
Make sure CSMDATA environment variable is set correctly

...although I made this change:
-if [ -z "$CSMDATA" ]; then
- CSMDATA=/glade/p/cesm/cseg/inputdata
-fi
+case $hostname in
+
+ ##cheyenne
+ cheyenne* | r* )
+ if [ -z "$CSMDATA" ]; then
+ CSMDATA=/glade/p/cesm/cseg/inputdata
+ fi
+ ;;
+
+ ##hobart/izumi/thorodin
+ hobart* | izumi* | thorodin* )
+ if [ -z "$CSMDATA" ]; then
+ CSMDATA=/fs/cgd/csm/inputdata
+ fi
+ ;;
+
+esac
...and although I copied a whole bunch of SCRIPgrid files over to /fs/cgd/csm/inputdata/lnd/clm2/mappingdata/grids
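The hostname-based default in that diff can be expressed as a small lookup, which also makes it easy to check against each machine name. This is a sketch that mirrors the case statement; `default_csmdata` is a hypothetical helper, the host patterns and paths are copied from the diff, and, as in the shell version, a non-empty CSMDATA already in the environment takes precedence.

```python
import fnmatch
import os
import socket

# (host patterns, default inputdata path), copied from the case statement.
DEFAULTS = [
    (("cheyenne*", "r*"), "/glade/p/cesm/cseg/inputdata"),
    (("hobart*", "izumi*", "thorodin*"), "/fs/cgd/csm/inputdata"),
]


def default_csmdata(hostname, environ):
    """Return CSMDATA from environ if set, else the host default, else None."""
    if environ.get("CSMDATA"):
        return environ["CSMDATA"]
    for patterns, path in DEFAULTS:
        if any(fnmatch.fnmatchcase(hostname, pat) for pat in patterns):
            return path
    return None  # unknown host: the caller should report that CSMDATA must be set


if __name__ == "__main__":
    print(default_csmdata(socket.gethostname(), os.environ))
```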

@ekluzek
Collaborator

ekluzek commented Apr 14, 2022

This is being accomplished in a different manner. We won't be using OCGIS. PR #1663 provides for this by making mksurfdata an MPI program that reads the mesh files for the different grids directly, so it can be run on multiple processors. It does, however, require MPI and a parallel cluster to run on.

Currently, our build and run are only set up for cheyenne, but we can extend them to other machines. One way to do that might be to use the cime configure capability so that any cime machine could be used for the build. The batch capability in cime could possibly be used as well, but this would need to be looked into.

@ekluzek
Collaborator

ekluzek commented Apr 24, 2022

In #1663 there are now two issues: the build is hardcoded to work only on cheyenne, and the batch submission is set up only for cheyenne. For generality, both need to be made to work on more systems.

The build uses CMake, but has hardcoded paths for the libraries: MPI, ESMF, NetCDF, and PIO. These are mostly supplied through the ESMF makefile fragment esmf.mk. If that is pointed to correctly, the build will work anywhere. See this note from ESMF:

## This Makefile must be able to find the "esmf.mk" Makefile fragment in the  ##
## 'include' line below. Following the ESMF User's Guide, a complete ESMF     ##
## installation should ensure that a single environment variable "ESMFMKFILE" ##
## is made available on the system. This variable should point to the         ##
## "esmf.mk" file.                                                            ##
##                                                                            ##
## This example Makefile uses the "ESMFMKFILE" environment variable.          ##
##                                                                            ##
## If you notice that this Makefile cannot find variable ESMFMKFILE then      ##
## please contact the person responsible for the ESMF installation on your    ##
## system.                                                                    ##
## As a work-around you can simply hardcode the path to "esmf.mk" in the      ##
## include line below. However, doing so will render this Makefile a lot less ##
## flexible and non-portable.                                                 ##

Currently we do have a hardcoded path for ESMFMKFILE, but a more general mechanism using an env variable for ESMFMKFILE could easily be added.
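The environment-variable mechanism could look like the sketch below: use `ESMFMKFILE` when it is set and fall back to the current hardcoded location otherwise. `FALLBACK_ESMFMKFILE` is a placeholder, not the real path, and the parser assumes esmf.mk consists of comments and simple `KEY=value` lines such as `ESMF_F90COMPILER=...`; the function names are hypothetical.

```python
import os

FALLBACK_ESMFMKFILE = "/path/to/esmf/lib/esmf.mk"  # hypothetical hardcoded path


def find_esmfmkfile(environ):
    """Prefer $ESMFMKFILE; otherwise use the hardcoded fallback."""
    return environ.get("ESMFMKFILE") or FALLBACK_ESMFMKFILE


def parse_esmf_mk(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    path = find_esmfmkfile(os.environ)
    if os.path.isfile(path):
        with open(path) as fh:
            mk = parse_esmf_mk(fh.read())
        print("ESMF_F90COMPILER =", mk.get("ESMF_F90COMPILER"))
    else:
        print("esmf.mk not found; set ESMFMKFILE to point at it")
```

Keeping the fallback means cheyenne keeps working unchanged, while any other system only needs ESMFMKFILE exported, as the ESMF note recommends.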

Since batch systems vary widely, it might be OK to expect users to customize the batch scripts that gen_mksurfdata_jobscript_single.py and gen_mksurfdata_jobscript_multi.py create.
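If users are expected to customize the generated batch scripts, the main per-site difference is usually the scheduler directive block. The sketch below is hypothetical and is not how gen_mksurfdata_jobscript_single.py or gen_mksurfdata_jobscript_multi.py are written; it only shows standard PBS (`#PBS`) and Slurm (`#SBATCH`) directives side by side, with account, queue, and sizes as placeholders.

```python
def batch_header(scheduler, account, nodes, tasks_per_node, walltime, queue):
    """Return a batch-script header for 'pbs' or 'slurm' (hypothetical helper)."""
    if scheduler == "pbs":
        lines = [
            "#!/bin/bash",
            f"#PBS -A {account}",
            f"#PBS -q {queue}",
            f"#PBS -l walltime={walltime}",
            f"#PBS -l select={nodes}:ncpus={tasks_per_node}:mpiprocs={tasks_per_node}",
        ]
    elif scheduler == "slurm":
        lines = [
            "#!/bin/bash",
            f"#SBATCH --account={account}",
            f"#SBATCH --partition={queue}",
            f"#SBATCH --time={walltime}",
            f"#SBATCH --nodes={nodes}",
            f"#SBATCH --ntasks-per-node={tasks_per_node}",
        ]
    else:
        raise ValueError(f"unsupported scheduler: {scheduler!r}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(batch_header("slurm", "PROJECT123", 2, 36, "01:00:00", "regular"))
```

Everything below the header (module loads, the `mpiexec`/`srun` launch line) would still need site-specific edits, which is why leaving that step to the user seems reasonable.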

@mvertens

@ekluzek, I am aware of the hardwiring that is in place. This Makefile was only a temporary placeholder until we had a new ESMF library in place (which is now the case, given the latest beta snapshot that ESMF has created). This library was needed to read very high resolution mesh files without hitting a memory bottleneck. I am planning to work with @jedwards4b to move the entire build to CMake, but have not had a chance to do this until now.

@ekluzek
Collaborator

ekluzek commented Apr 25, 2022

@mvertens yes, sounds good. I'm adding to this issue so we document the path forward.

I actually don't think the build is that bad. It's using cime configure, and it relies on the ESMF build, which is required. I think it's good to rely on cime configure, since to run the model you have to port to cime anyway. And because the ESMF build already brings MPI, NetCDF, and PIO together in one build, depending on it is really not bad. The ESMF team has also invested a lot of effort in making ESMF reasonable to build on HPC systems, and many HPC systems already build a version for their users. The main improvement I see is to document for users how to build or find ESMF on their system and include it in the build of mksurfdata_esmf.

Of course, if you have plans to make this even more automated, that sounds great. But putting in just a little effort to make the current system more general would go a long way. For example, I suspect it wouldn't be hard to get the current build to work on izumi, since we have ESMF built there and it's ported to cime. Showing that we could do that would give us a lot of confidence that the new mksurfdata_esmf could easily be ported to any HPC system.

@ekluzek
Collaborator

ekluzek commented Sep 29, 2023

I think this is all handled in the new mksurfdata_esmf at this point, so I'm closing.

@ekluzek ekluzek closed this as completed Sep 29, 2023
5 participants