Put experiment configs in separate submodules #42

Closed
nichannah opened this issue Sep 20, 2017 · 24 comments


@nichannah
Contributor

@aidanheerdegen has written some documentation about this here: https://github.com/OceansAus/access-om2/wiki/Contributing-to-model-configurations

However it has not been implemented yet.

@aidanheerdegen I have a question. Where do you think we should put these experiment repositories? I imagine there may eventually be many and I'm reluctant to clutter up OceansAus. I seem to remember you started a new organisation for this kind of thing?

Thanks.

@nichannah
Contributor Author

nichannah commented Sep 29, 2017

One problem with splitting the configs out into separate repos and using them independently of the top-level access-om2 repo is that a config hash is no longer tied to a source code hash. When the configs are kept in the top level, with the source code as submodules, the two hashes are tied together.

I wonder whether we should go to a 3-level (!) structure:

  1. access-om2 containing README.md and tests.
    ... 2. experiment submodules.
    ...... 3. source code submodules.

In this way people will be able to do a shallow (non-recursive) clone of an experiment repo to get just the payu experiment, while a recursive clone will also give the source code for that particular config.

In practice the top level access-om2 repo probably wouldn't be used so frequently. Mainly it would be just used to define releases and their documentation.

One good thing about this setup is that it supports a better model of forking / collaboration on the granularity of individual experiments. Users can fork just the exp config that they are interested in, change source code etc to easily create a brand new experiment.
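A minimal sketch of how the experiment-level part of this layout would behave, using throwaway local repositories rather than the real OceansAus ones. The repository names (`source`, `experiment`), the file contents and the `protocol.file.allow` override (needed only because the demo clones from local paths) are assumptions for illustration.

```shell
#!/bin/sh
set -e
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for a model source repository.
git init -q source
echo "program main" > source/model.f90
git -C source add model.f90
git -C source commit -q -m "model source"

# Hypothetical stand-in for an experiment repository that pins the source.
git init -q experiment
git -C experiment -c protocol.file.allow=always submodule --quiet add "$work/source" src
echo "# illustrative experiment config" > experiment/config.yaml
git -C experiment add config.yaml
git -C experiment commit -q -m "experiment config pinned to source"

# Plain clone: the experiment config only; src/ is left as an empty directory.
git clone -q "$work/experiment" config_only

# Recursive clone: the config plus the source at the pinned commit.
git -c protocol.file.allow=always clone -q --recursive "$work/experiment" with_source

ls config_only/src with_source/src
```

With real remote URLs the `protocol.file.allow` override should not be needed.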

@aidanheerdegen
Contributor

Seems like a good solution but I don't have my work head on. Perhaps worth getting @marshallward to comment.

@marshallward
Contributor

Re: hosting the numerous OceansAus experiments-

I think it would be good to find a solution which allows for independent (payu) experiment configuration repos. It's the direction we've been going for the last few years, and it seems to be working well for us. I also think in general that it's a better fit for academic development.

Also, the new payu github support should make it (potentially) easy to share and sync these without much additional effort.

(Tucking everything in a top-level repo feels very "subversion"-y and, consequently, might be a good fit for a Rose suite, BTW).

Re: other issues

I like the idea of adding source code as a submodule to an experiment. But doesn't this tie a commit to a specific source code hash?

Submodules are set by hash, right? And they are added on a commit (with its own hash), right? So doesn't this tie the submodule hash to the config hash? Or am I wrong about this?

Maybe this would be better?

  • config repo (e.g. payu or otherwise)
    • which contains a submodule to its codebase (e.g. access-om2) with a specific hash
      • which may (or may not) contain its own submodules, but that is not the config's problem

Would that work? Or am I missing something?
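The pinning asked about above can be checked directly. The sketch below uses throwaway local repositories (`code` and `config` are hypothetical names) and shows that a commit in the config repo records the exact commit of its submodule as a "gitlink" tree entry. The `protocol.file.allow` override is only there because the demo adds a submodule from a local path.

```shell
#!/bin/sh
set -e
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the codebase.
git init -q code
git -C code commit -q --allow-empty -m "source v1"
code_hash=$(git -C code rev-parse HEAD)

# Hypothetical stand-in for the config repo, with the codebase as a submodule.
git init -q config
cd config
git -c protocol.file.allow=always submodule --quiet add "$work/code" src
git commit -q -m "pin source"

# The submodule is stored in the config commit's tree as mode 160000
# with the pinned commit hash in the third column.
git ls-tree HEAD src
pinned_hash=$(git ls-tree HEAD src | awk '{print $3}')

echo "source commit:    $code_hash"
echo "pinned in config: $pinned_hash"
```

Because the pinned hash is part of the config commit's tree, any config commit hash implies one specific source commit.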

@AndyHoggANU
Contributor

Hi All,
I'm not sure I have the answers here, but I worry about the complexity of a 2-3 level hierarchy.
Is it possible to completely separate the experimental configs into stand-alone repositories??

@marshallward
Contributor

I see the codebase submodules as optional, as they are in our current payu-managed runs. If one wants to keep everything simple, then they can create binaries and tuck them into the lab-wide bin directory.

If a user wants to work in a more reproducible environment, then they can anchor their experiment to a git-managed submodule.

I don't generally like working with submodules, but I think anchoring the experiment to a hashed snapshot of the code is a very good approach. But I also don't see a problem with making this optional.

This is how I see it, at least. But I think this probably needs more discussion.

@AndyHoggANU
Contributor

Fair enough. Let's discuss in an interactive forum like next week's MOM meeting ...

@nichannah
Contributor Author

I've moved the experiment directories into their own repositories within the OceansAus organisation. If this gets too cluttered then we can move them.

They are also now submodules within the top-level access-om2 repo.

I think this is an improvement over what we had. We probably should still think more about how we make sure that we know which code/executable is attached to each experiment.

@aidanheerdegen
Contributor

Great! Thanks @nicjhan.

@StephenGriffies

Is this a move of use for the mom-ocean repo?

@aidanheerdegen
Contributor

I'm not quite sure I understand your question @StephenGriffies.

If you mean, is OceansAus taking over from the mom-ocean repo as the location for MOM5 model configurations, then I would say no. These are ACCESS-OM2 (MOM5+CICE+MATM+OASIS) configs intended to be the COSIMA standard runs that others can use. In particular we will be able to share spinup runs and runs forked from the spinups.

@StephenGriffies

My question refers to the mom5 repo.

A concern I have is that there is no Git version control for the experiment field tables, diag tables, input nml, data table. Instead, these files are housed elsewhere outside of Github. It would be nice to see these files on Github.

My presumption is that ACCESS has these files versioned. Is that true?

@aidanheerdegen
Contributor

Yes @StephenGriffies you are correct, the ACCESS model configs are stored on github. This is not the case for the mom-ocean example configurations.

I see you have made an issue for this on the mom5 repo

mom-ocean/MOM5#197

We should discuss how it will be implemented over there. But I agree, it is a desirable goal.

@StephenGriffies

Thanks @aidanheerdegen

@nichannah
Contributor Author

Hi @StephenGriffies my thoughts were that we will do something similar to MOM6-examples for MOM5. i.e. put all the model configs into a single 'examples' repository.

@StephenGriffies

Good option. Thanks

@aekiss
Contributor

aekiss commented Nov 16, 2017

It looks like permissions need fixing.

Updating an existing clone via
git submodule update --remote
or
git pull --recurse-submodules
or starting fresh with a new clone via
git clone --recursive https://github.com/OceansAus/access-om2.git
now fails with errors like

esdhcp-190:access-om2 andy$ git submodule update --remote
Cloning into '/Users/andy/Documents/COSIMA/github/OceansAus/access-om2/control/01deg_jra55_ryf'...
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:OceansAus/01deg_jra55_ryf.git' into submodule path '/Users/andy/Documents/COSIMA/github/OceansAus/access-om2/control/01deg_jra55_ryf' failed
Failed to clone 'control/01deg_jra55_ryf'. Retry scheduled
Cloning into '/Users/andy/Documents/COSIMA/github/OceansAus/access-om2/control/025deg_jra55_ryf'...
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:OceansAus/025deg_jra55_ryf.git' into submodule path '/Users/andy/Documents/COSIMA/github/OceansAus/access-om2/control/025deg_jra55_ryf' failed
Failed to clone 'control/025deg_jra55_ryf'. Retry scheduled
Cloning into '/Users/andy/Documents/COSIMA/github/OceansAus/access-om2/control/1deg_core_nyf'...
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:OceansAus/1deg_core_nyf.git' into submodule path '/Users/andy/Documents/COSIMA/github/OceansAus/access-om2/control/1deg_core_nyf' failed
Failed to clone 'control/1deg_core_nyf'. Retry scheduled
Cloning into '/Users/andy/Documents/COSIMA/github/OceansAus/access-om2/control/1deg_jra55_ryf'...
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:OceansAus/1deg_jra55_ryf.git' into submodule path '/Users/andy/Documents/COSIMA/github/OceansAus/access-om2/control/1deg_jra55_ryf' failed
Failed to clone 'control/1deg_jra55_ryf'. Retry scheduled
Cloning into '/Users/andy/Documents/COSIMA/github/OceansAus/access-om2/control/01deg_jra55_ryf'...
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:OceansAus/01deg_jra55_ryf.git' into submodule path '/Users/andy/Documents/COSIMA/github/OceansAus/access-om2/control/01deg_jra55_ryf' failed
Failed to clone 'control/01deg_jra55_ryf' a second time, aborting

@aekiss aekiss reopened this Nov 16, 2017
@aidanheerdegen
Contributor

aidanheerdegen commented Nov 16, 2017

Works for me with fresh clone. I wonder why?

$ module load git
$ git clone --recursive https://github.com/OceansAus/access-om2.git
Cloning into 'access-om2'...
remote: Counting objects: 1353, done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 1353 (delta 17), reused 32 (delta 12), pack-reused 1307
Receiving objects: 100% (1353/1353), 531.11 KiB | 687.00 KiB/s, done.
Resolving deltas: 100% (853/853), done.
Checking connectivity... done.
Submodule 'control/01deg_jra55_ryf' (git@github.com:OceansAus/01deg_jra55_ryf.git) registered for path 'control/01deg_jra55_ryf'
Submodule 'control/025deg_jra55_ryf' (git@github.com:OceansAus/025deg_jra55_ryf.git) registered for path 'control/025deg_jra55_ryf'
Submodule 'control/1deg_core_nyf' (git@github.com:OceansAus/1deg_core_nyf.git) registered for path 'control/1deg_core_nyf'
Submodule 'control/1deg_jra55_ryf' (git@github.com:OceansAus/1deg_jra55_ryf.git) registered for path 'control/1deg_jra55_ryf'
Submodule 'src/cice5' (https://github.com/OceansAus/cice5.git) registered for path 'src/cice5'
Submodule 'src/matm' (https://github.com/OceansAus/matm.git) registered for path 'src/matm'
Submodule 'src/mom' (https://github.com/mom-ocean/MOM5.git) registered for path 'src/mom'
Submodule 'src/oasis3-mct' (https://github.com/OceansAus/oasis3-mct.git) registered for path 'src/oasis3-mct'
Cloning into 'control/01deg_jra55_ryf'...
remote: Counting objects: 76, done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 76 (delta 32), reused 76 (delta 32), pack-reused 0
Receiving objects: 100% (76/76), 19.61 KiB | 0 bytes/s, done.
Resolving deltas: 100% (32/32), done.
Checking connectivity... done.
Submodule path 'control/01deg_jra55_ryf': checked out '0436a288886d61f8805e1a11fa572eb7b294a758'
Cloning into 'control/025deg_jra55_ryf'...
remote: Counting objects: 103, done.
remote: Compressing objects: 100% (50/50), done.
remote: Total 103 (delta 51), reused 103 (delta 51), pack-reused 0
Receiving objects: 100% (103/103), 24.71 KiB | 0 bytes/s, done.
Resolving deltas: 100% (51/51), done.
Checking connectivity... done.
Submodule path 'control/025deg_jra55_ryf': checked out '0eebd2784a1c0210ac87c57961438fbc7ff3bcea'
Cloning into 'control/1deg_core_nyf'...
remote: Counting objects: 40, done.
remote: Compressing objects: 100% (24/24), done.
remote: Total 40 (delta 13), reused 40 (delta 13), pack-reused 0
Receiving objects: 100% (40/40), 16.75 KiB | 0 bytes/s, done.
Resolving deltas: 100% (13/13), done.
Checking connectivity... done.
Submodule path 'control/1deg_core_nyf': checked out 'c31e118be4f6df8db7ee091decbb7c90e0a8fdf5'
Cloning into 'control/1deg_jra55_ryf'...
remote: Counting objects: 131, done.
remote: Compressing objects: 100% (62/62), done.
remote: Total 131 (delta 66), reused 131 (delta 66), pack-reused 0
Receiving objects: 100% (131/131), 30.22 KiB | 0 bytes/s, done.
Resolving deltas: 100% (66/66), done.
Checking connectivity... done.
Submodule path 'control/1deg_jra55_ryf': checked out 'fddbf21b275917d38286ff1e34ebca790393e67e'
Cloning into 'src/cice5'...
remote: Counting objects: 717, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 717 (delta 0), reused 1 (delta 0), pack-reused 712
Receiving objects: 100% (717/717), 60.85 MiB | 21.84 MiB/s, done.
Resolving deltas: 100% (462/462), done.
Checking connectivity... done.
Submodule path 'src/cice5': checked out 'fe7300227107bde802a217ff0d6ef7f92a6eb6c2'
Cloning into 'src/matm'...
remote: Counting objects: 305, done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 305 (delta 8), reused 14 (delta 6), pack-reused 284
Receiving objects: 100% (305/305), 93.97 KiB | 0 bytes/s, done.
Resolving deltas: 100% (197/197), done.
Checking connectivity... done.
Submodule path 'src/matm': checked out 'b1f482c37eb951750be386f938d90e287aa577a3'
Cloning into 'src/mom'...
remote: Counting objects: 42168, done.
remote: Total 42168 (delta 0), reused 0 (delta 0), pack-reused 42168
Receiving objects: 100% (42168/42168), 30.92 MiB | 468.00 KiB/s, done.
Resolving deltas: 100% (15224/15224), done.
Checking connectivity... done.
Submodule path 'src/mom': checked out '030fb1f22af7a9f9a3d4a7dc197d6ace0684ae6b'
Cloning into 'src/oasis3-mct'...
remote: Counting objects: 472, done.
remote: Total 472 (delta 0), reused 0 (delta 0), pack-reused 472
Receiving objects: 100% (472/472), 5.27 MiB | 4.46 MiB/s, done.
Resolving deltas: 100% (124/124), done.
Checking connectivity... done.
Submodule path 'src/oasis3-mct': checked out '0d0f2ff4ee71c0fb9c1346ff4e60d62c56a15bf9'

@aidanheerdegen
Contributor

This worked too:

git clone --recursive git@github.com:OceansAus/access-om2.git

@aidanheerdegen
Contributor

I asked someone else to test and it was ok for them too. Have you changed your GitHub ssh keys? This looks suspicious:

Permission denied (publickey).
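For anyone who hits the same error because they have no SSH key registered with GitHub, one possible workaround (not something proposed in this thread) is git's `url.<base>.insteadOf` setting, which rewrites SSH-style GitHub URLs to HTTPS. The sketch below applies it to a throwaway repository only and makes no network connection. To affect submodule clones it would most likely need to be set with `--global`, which is an assumption worth checking before relying on it.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"

# Rewrite SSH-style GitHub URLs to HTTPS (repository-local for this demo).
git config url."https://github.com/".insteadOf "git@github.com:"

# Register the remote with its SSH-style URL...
git remote add origin git@github.com:OceansAus/access-om2.git

# ...and ask git which URL it would actually use after rewriting.
rewritten=$(git remote get-url origin)
echo "$rewritten"
```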

@aekiss
Contributor

aekiss commented Nov 16, 2017

Ah oops sorry, the problem was at my end. Thanks Aidan for fixing it.

@aekiss aekiss closed this as completed Nov 16, 2017
@aekiss
Contributor

aekiss commented Nov 23, 2017

Submodules create some serious traps for the unwary (i.e. me and @AndyHoggANU, earlier today).

I hadn't realised that git submodule update is not like git pull, but more like git clone in that it will silently overwrite any local changes (though committed changes remain in the history).

This means that git submodule update will overwrite a user's carefully customised configs with the standard versions from github. So this section from https://github.com/OceansAus/access-om2/blob/master/README.md is a bad approach and needs to be re-thought:
"
if you have an existing download and would like to update to the latest version:

cd /short/${PROJECT}/${USER}
cd access-om2
git pull
git submodule update

"
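A sketch of the behaviour described here, and of one possible alternative, using throwaway local repositories (the names `code`, `super` and `control/exp` are hypothetical). It commits a customisation on a local `run` branch in a submodule, shows what a plain `git submodule update` does to it, and then tries `git submodule update --merge`, which merges the recorded commit into the current submodule branch instead of checking it out. Whether `--merge` is the right recommendation for the README is not settled in this thread.

```shell
#!/bin/sh
set -e
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical "standard" experiment config repository.
git init -q code
git -C code commit -q --allow-empty -m "standard config"

# Hypothetical top-level repo that includes it as a submodule.
git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add "$work/code" control/exp
git commit -q -m "add experiment"

# Simulate a user customising the experiment on a local 'run' branch.
git -C control/exp checkout -q -b run
echo "custom setting" > control/exp/config.yaml
git -C control/exp add config.yaml
git -C control/exp commit -q -m "my customisation"

# A leading '+' means the checked-out commit differs from the recorded one.
git submodule status control/exp

# Plain update: checks out the recorded commit on a detached HEAD, so the
# customised file leaves the working tree (it is still on branch 'run').
git submodule --quiet update control/exp
after_plain=$(git -C control/exp rev-parse --abbrev-ref HEAD)
echo "after plain update: $after_plain"

# Return to the customised branch, then update with --merge instead.
git -C control/exp checkout -q run
git submodule --quiet update --merge control/exp
after_merge=$(git -C control/exp rev-parse --abbrev-ref HEAD)
echo "after --merge update: $after_merge"
```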

Another source of confusion (for me, anyway) is that branches are specific to the submodule repo I'm in. So https://github.com/OceansAus/access-om2/wiki/Contributing-to-model-configurations needs to be updated - eg at step 4 the 'run' branch needs to be set up in the config dir, not the access-om2 dir. If the user switches back to access-om2 they will be back on the master branch for that overarching repo even though the config remains as the 'run' branch as it is in a separate repo. That makes sense now that I understand this better but is initially unintuitive and should be explained to users.
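The per-repository nature of branches can be seen in a small sketch with throwaway local repositories (`config_src`, `top` and `control/exp` are hypothetical names). Creating a `run` branch inside the config directory leaves the top-level repository on whatever branch it was on before.

```shell
#!/bin/sh
set -e
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical experiment config repository.
git init -q config_src
git -C config_src commit -q --allow-empty -m "standard config"

# Hypothetical top-level repository with the config as a submodule.
git init -q top
cd top
git -c protocol.file.allow=always submodule --quiet add "$work/config_src" control/exp
git commit -q -m "add experiment"

top_before=$(git symbolic-ref --short HEAD)

# Create the 'run' branch in the config directory, not in the top level.
git -C control/exp checkout -q -b run

top_after=$(git symbolic-ref --short HEAD)
sub_branch=$(git -C control/exp symbolic-ref --short HEAD)

echo "top-level branch: $top_after"
echo "config branch:    $sub_branch"
```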

More traps are in the "gotchas" section of https://git.wiki.kernel.org/index.php/GitSubmoduleTutorial

The configs-as-submodules approach has a lot of merit, but the traps need to be made clear and the user documentation updated.

@aekiss
Contributor

aekiss commented Nov 23, 2017

re. connecting configs with src versions (#42 (comment)): the output of each run currently includes config.yaml, which specifies binaries with git hashes attached to their names (if the config was set up with hashexe.sh or equivalent). So that provides some measure of control over reproducibility (though it isn't enforced, eg a user could sidestep using hashexe.sh and manually change binary names and config.yaml to have incorrect/no hashes).
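To illustrate the idea of attaching a git hash to a binary name, here is a minimal sketch. It is not the actual hashexe.sh. The file names (`model.exe`, `config_fragment.yaml`) and the `exe:` key are assumptions made for the example.

```shell
#!/bin/sh
set -e
# Throwaway identity so the demo commit works on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
git commit -q --allow-empty -m "model source"

# Stand-in for a compiled executable (hypothetical name).
printf '#!/bin/sh\necho model\n' > model.exe

# Tag a copy of the executable with the short hash of the source commit.
hash=$(git rev-parse --short=8 HEAD)
cp model.exe "model_${hash}.exe"

# Record the hashed name in an illustrative config fragment.
printf 'exe: model_%s.exe\n' "$hash" > config_fragment.yaml
cat config_fragment.yaml
```

As noted above, nothing enforces this link: a renamed binary or a hand-edited config would break it silently.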

@aidanheerdegen
Copy link
Contributor

I think there is a tension between the best approach for users and the best approach for maintainers. A better separation between code and model configs makes life easier for users. They need a new version, they just blow away their old source directory and pull in a fresh one, compile and they’re sweet.

This is pretty much what @nicjhan was suggesting up above

#42 (comment)

and I think I agree. I'm not sure about putting the source code inside the experimental config, as the source is common to all experiments. Associating a config with the code to run it is better addressed in other ways IMO.

@AndyHoggANU wasn't sure because it sounded more complex, but I think it is actually less complex for most users and only a bit more complex for the maintainer. But he is a total guru and, in theory, it changes less frequently.

@nichannah
Contributor Author

We seem to have converged on a usable set-up. This conversation is being referenced here https://github.com/OceansAus/access-om2/wiki/Tutorials#Understanding-the-ACCESS-OM2-repository-layout

Further work needs to be done to fix up this part of the wiki.
