Support ILAMB on other machines #229

Closed
forsyth2 opened this issue Mar 25, 2022 · 14 comments

@forsyth2
Collaborator

forsyth2 commented Mar 25, 2022

Support ILAMB on other machines. Follow-up from #197, #230.

@forsyth2 forsyth2 added the semver: new feature New feature (will increment minor version) label Mar 25, 2022
@forsyth2 forsyth2 self-assigned this Mar 25, 2022
This was referenced Mar 25, 2022
@forsyth2
Collaborator Author

forsyth2 commented Apr 7, 2022

Will be useful to merge #233 first.

@forsyth2
Collaborator Author

@xylar @chengzhuzhang This is the issue for supporting ILAMB on other machines, where Mache could be useful. As discussed in today's meeting, there are a couple of things Mache could help with: 1) syncing files and 2) handling the different MPI calls on different machines.

@xylar
Contributor

xylar commented Apr 22, 2022

Regarding syncing the files, there is a PR to introduce the syncing capability here:
E3SM-Project/mache#26
I'm just waiting to see if @milenaveneziani wants to give any feedback.

The question here is whether ILAMB data can be added under one of the existing diagnostics directories:
https://github.com/xylar/mache/blob/add_sync_script/mache/machines/anvil.cfg#L80-L84

# public diagnostics directory
public_diags = /lcrc/group/e3sm/public_html/diagnostics/

# private diagnostics directory
private_diags = /lcrc/group/e3sm/diagnostics_private/

The public directory is for diagnostics that we can put on the web server and share with people outside of the project without licensing problems. The private directory is for data that we can only share in the project (e.g. because it is not public or we are not allowed to host it publicly ourselves).

It would be easiest if you could include the data there. But if not, we could make another directory and a corresponding sync command. For example, we could add:

ilamb = /lcrc/group/e3sm/ilamb/

(probably a bad choice) and then make a new command:

mache sync ilamb

(also probably a confusing name). My preference is still to put the ILAMB data somewhere in the existing public_diags or private_diags directories.

@xylar
Contributor

xylar commented Apr 22, 2022

Regarding getting MPI information, you can already do this with the released mache:

from mache import MachineInfo

...
# determine the machine somehow, see below
machine_info = MachineInfo(machine=machine)
parallel_executable = machine_info.config.get('parallel', 'parallel_executable')
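
As a rough sketch of how that value might then be used, the launcher can be prepended to the command being run. This is only an illustration: build_parallel_command is a hypothetical helper, the "-n" task-count flag is an assumption that would need checking for each machine's launcher, and a plain ConfigParser stands in for machine_info.config so the example runs without mache installed.

```python
import configparser
import shlex


def build_parallel_command(config, ntasks, program_args):
    """Prefix a program invocation with the machine's parallel launcher.

    Hypothetical helper, not part of mache or zppy.
    """
    # Same lookup as in the snippet above.
    launcher = config.get('parallel', 'parallel_executable')
    # The configured launcher may carry its own options, so split it the
    # way a shell would.
    # ASSUMPTION: the launcher accepts "-n <ntasks>"; verify per machine.
    return shlex.split(launcher) + ['-n', str(ntasks)] + list(program_args)


# Stand-in for machine_info.config (values here are illustrative only).
example_config = configparser.ConfigParser()
example_config.read_dict({'parallel': {'parallel_executable': 'srun'}})

command = build_parallel_command(
    example_config, 4, ['ilamb-run', '--config', 'ilamb.cfg'])
print(command)
```

Returning a list rather than a single string means the result can be passed straight to subprocess.run without shell quoting concerns.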

You could let the user specify the machine, let zppy use its existing methods for determining the machine, or use mache to automatically detect the machine (using the environment variable from E3SM-Unified if set):

import os

from mache import discover_machine

machine = None

if 'E3SMU_MACHINE' in os.environ:
    machine = os.environ['E3SMU_MACHINE']

if machine is None:
    machine = discover_machine()
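
The three options above could be folded into one helper. The following is a minimal sketch, assuming a priority order of explicit user choice, then E3SMU_MACHINE, then auto-detection; that order is an assumption, not something zppy or mache prescribes. resolve_machine is a hypothetical name, and the detection function is passed in so the sketch does not need mache to run.

```python
import os


def resolve_machine(user_machine=None, environ=None, discover=None):
    """Pick a machine name from the first source that provides one.

    Hypothetical helper. ASSUMED order: user choice, then the
    E3SMU_MACHINE environment variable, then the discover callable.
    """
    if environ is None:
        environ = os.environ
    if user_machine:
        return user_machine
    machine = environ.get('E3SMU_MACHINE')
    if machine:
        return machine
    if discover is not None:
        # With mache installed: resolve_machine(discover=discover_machine)
        return discover()
    return None


print(resolve_machine(environ={'E3SMU_MACHINE': 'chrysalis'}))
```

A return value of None signals that nothing matched, so the caller can decide whether to raise an error or fall back to some default.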

Let me know if there are questions. Obviously, I will review your implementation.

@minxu74

minxu74 commented Apr 22, 2022

To sync ILAMB data files, ILAMB provides a tool named ilamb-fetch that downloads data from the publicly accessible web server https://www.ilamb.org/ILAMB-Data/DATA/. However, the initial download may take several hours.

@xylar
Contributor

xylar commented Apr 22, 2022

@minxu74, that's not a big problem as long as there's a quicker way to update (when the data changes). It looks like that's the default behavior for ilamb-fetch.

We could add an ilamb_data config option for each machine in mache. A tool in either mache or zppy could be a simple wrapper around ilamb-fetch that knows which directory to store the data in on each E3SM-supported machine. We would call this tool on each machine before each zppy release, and more often if needed.

@xylar
Contributor

xylar commented Apr 22, 2022

@forsyth2 and @chengzhuzhang, if you give the go-ahead, I can make a PR to mache with a sync tool and a suggested location for ilamb_data on each machine for you to review. But I don't want to jump the gun if another solution sounds better.

@minxu74

minxu74 commented Apr 22, 2022

@xylar Yes, ilamb-fetch will compare the md5sum of files in local and server directories and only download the updated ones.
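
For readers unfamiliar with this kind of incremental update, the idea can be sketched as follows. This is not ilamb-fetch's actual code: the function names, the manifest layout (relative path mapped to an md5 hex digest) and the file names are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def files_needing_download(local_root, remote_checksums):
    """List files that are missing locally or differ from the server copy.

    remote_checksums is an ASSUMED manifest: relative path -> md5 hex digest.
    """
    stale = []
    for relative_path, remote_md5 in sorted(remote_checksums.items()):
        local_path = os.path.join(local_root, relative_path)
        if (not os.path.isfile(local_path)
                or md5_of_file(local_path) != remote_md5):
            stale.append(relative_path)
    return stale


# Small demonstration with made-up file names.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, 'a.nc'), 'wb') as handle:
        handle.write(b'unchanged')
    manifest = {
        'a.nc': hashlib.md5(b'unchanged').hexdigest(),
        'b.nc': hashlib.md5(b'new on server').hexdigest(),
    }
    stale = files_needing_download(root, manifest)
print(stale)
```

Only the files in the returned list would need to be transferred, which is why a repeat run is much quicker than the initial download.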

@xylar
Contributor

xylar commented Apr 22, 2022

@forsyth2 and @chengzhuzhang, on second thought, I think having someone call ilamb-fetch "manually" on LCRC (either Anvil or Chrysalis) and then syncing with a variant of mache sync to other machines might work better:

  1. ilamb-fetch requires the ilamb conda package, which I do not want to add to mache.
  2. ilamb-fetch will not work on machines with a firewall, whereas I have ways of using rsync via ssh tunnels that do work.
  3. The mache sync tool knows how to change permissions on each machine so everyone can read (and write if need be) to the directories
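
To illustrate the third point, a permissions pass over a synced directory might look roughly like the sketch below. It is not the code mache sync uses: make_world_readable is a hypothetical helper, it assumes POSIX permission bits, and it only adds read access (plus search access on directories) rather than write access.

```python
import os
import stat
import tempfile


def make_world_readable(root):
    """Add group/other read access to everything below root.

    Hypothetical helper; directories also get search (execute) access so
    they can be traversed. Symbolic links are skipped.
    """
    read_bits = stat.S_IRGRP | stat.S_IROTH
    search_bits = stat.S_IXGRP | stat.S_IXOTH
    for dirpath, _dirnames, filenames in os.walk(root):
        mode = stat.S_IMODE(os.stat(dirpath).st_mode)
        os.chmod(dirpath, mode | read_bits | search_bits)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            mode = stat.S_IMODE(os.stat(path).st_mode)
            os.chmod(path, mode | read_bits)


# Small demonstration on a temporary directory with a made-up file name.
with tempfile.TemporaryDirectory() as root:
    example = os.path.join(root, 'example.nc')
    with open(example, 'wb') as handle:
        handle.write(b'data')
    os.chmod(example, 0o600)
    make_world_readable(root)
    file_mode = stat.S_IMODE(os.stat(example).st_mode)
    dir_mode = stat.S_IMODE(os.stat(root).st_mode)
print(oct(file_mode))
```

Existing permission bits are preserved and only widened, so files that are already group-writable stay that way.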

@chengzhuzhang
Collaborator

chengzhuzhang commented Apr 22, 2022

Hey @xylar, thank you for working on this.
Both ilamb-data and the cmor tables were added in the diagnostics folder, which should allow working with mache sync.
I think mache sync will be a more straightforward mechanism for now to manage these analysis-related folders.

Also, having a unified analysis data structure on each machine (the same as the lcrc data repo) will be really handy.

@xylar
Contributor

xylar commented Apr 23, 2022

@chengzhuzhang, great, thanks!

I have been testing the sync branch again. I have synced (or am in the process of syncing) ILAMB data to most machines:

  • acme1
  • andes
  • anvil/chrysalis
  • badger/grizzly
  • compy
  • cooley
  • cori-haswell/cori-knl

As usual, I will need help syncing to acme1. My suggestion would be to hold off there until E3SM-Project/mache#26 gets merged unless you want to test there very soon.

@xylar
Contributor

xylar commented Apr 23, 2022

On Anvil and Chrysalis, please note that you should be reading the diagnostics data from:

/lcrc/group/e3sm/diagnostics

but providing it in one of:

public_diags = /lcrc/group/e3sm/public_html/diagnostics/
private_diags = /lcrc/group/e3sm/diagnostics_private/

Hopefully, on all machines, you will just use:

diags_base = machine_info.config.get('diagnostics', 'base_path')

where machine_info is from mache as in #229 (comment)
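
A minimal sketch of building the ILAMB data location from that base path follows. The subdirectory name 'ilamb_data' is a guess for illustration only, ilamb_data_dir is a hypothetical helper, and a plain ConfigParser stands in for machine_info.config so the example runs without mache.

```python
import configparser
import os


def ilamb_data_dir(config, subdir='ilamb_data'):
    """Join the machine's diagnostics base path with an ILAMB subdirectory.

    Hypothetical helper; the default subdir name is an ASSUMPTION.
    """
    base = config.get('diagnostics', 'base_path')
    return os.path.join(base, subdir)


# Stand-in for machine_info.config, using the LCRC path quoted above.
example_config = configparser.ConfigParser()
example_config.read_dict(
    {'diagnostics': {'base_path': '/lcrc/group/e3sm/diagnostics'}})

path = ilamb_data_dir(example_config)
print(path)
```

Because only the base path comes from the machine config, the same relative layout can be reused on every machine the data has been synced to.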

@chengzhuzhang
Collaborator

Thank you @xylar. I will take care of acme1 at a later time.
@forsyth2, using mache to identify machine-specific MPI info and data paths should simplify some zppy configurations. I think it would be good to test it with this ILAMB integration and, if desired, use it to update the templates for all machines.

@forsyth2
Collaborator Author

Resolved by #247, #253.
