Put experiment configs in separate submodules #42
Comments
One problem with splitting the configs out into separate repos and using them independently of the top level access-om2 repo is that a config hash is no longer tied to a source code hash. When the configs are kept in the top level with source code as submodules this is the case. I wonder whether we should go to a 3-level (!) structure:
In this way people will be able to do a shallow clone of an experiment repo to get the payu experiment. Also a recursive clone will give the source code for that particular config. In practice the top level access-om2 repo probably wouldn't be used so frequently. Mainly it would be just used to define releases and their documentation. One good thing about this setup is that it supports a better model of forking / collaboration on the granularity of individual experiments. Users can fork just the exp config that they are interested in, change source code etc to easily create a brand new experiment. |
Seems like a good solution but I don't have my work head on. Perhaps worth getting @marshallward to comment. |
Re: hosting the numerous OceansAus experiments: I think it would be good to find a solution which allows for independent (payu) experiment configuration repos. It's the direction we've been going for the last few years, and it seems to be working well for us. I also think in general that it's a better fit for academic development. Also, the new payu GitHub support should make it (potentially) easy to share and sync these without much additional effort. (Tucking everything in a top-level repo feels very "subversion"-y and, consequently, might be a good fit for a Rose suite, BTW.) Re: other issues: I like the idea of adding source code as a submodule to an experiment. But doesn't this tie a commit to a specific source code hash? Submodules are set by hash, right? And they are added on a commit (with its own hash), right? So doesn't this tie the submodule hash to the config hash? Or am I wrong about this? Maybe this would be better?
Would that work? Or am I missing something? |
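On the question above of whether a submodule ties a config commit to a source hash, here is a minimal sketch that can be run offline. It assumes throwaway local repositories named `src` and `exp` (both hypothetical stand-ins, not the real model or experiment repos) and a reasonably recent git. The `protocol.file.allow=always` setting is only there because newer git versions refuse local-path submodules by default.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory; all repository names here are hypothetical.
work=$(mktemp -d)
cd "$work"

# Wrapper supplying an identity and allowing local-path submodules.
git_c() {
    git -c user.name="Example" -c user.email="example@example.invalid" \
        -c protocol.file.allow=always "$@"
}

# Stand-in for a model source repository.
git init -q src
echo "program model" > src/model.f90
git_c -C src add model.f90
git_c -C src commit -q -m "source v1"
src_v1=$(git -C src rev-parse HEAD)

# Stand-in for an experiment (config) repository.
git init -q exp
echo "walltime: 1:00:00" > exp/config.yaml
git_c -C exp add config.yaml
git_c -C exp commit -q -m "initial config"

# Add the source as a submodule and commit that choice.
git_c -C exp submodule --quiet add "$work/src" src
git_c -C exp commit -q -m "pin source code"

# The config commit stores the source commit hash as a gitlink entry.
recorded=$(git -C exp ls-tree HEAD src | awk '{print $3}')
echo "source v1 hash:        $src_v1"
echo "recorded in config:    $recorded"

# New source commits do not change what the existing config commit records.
echo "! v2" >> src/model.f90
git_c -C src commit -q -am "source v2"
still=$(git -C exp ls-tree HEAD src | awk '{print $3}')
echo "after source v2:       $still"
```

In this sketch the config commit keeps pointing at the first source hash until someone updates the submodule inside `exp` and commits again, which is the sense in which a config hash is tied to a source hash.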
Hi All, |
I see the codebase submodules as optional, as they are in our current payu-managed runs. If one wants to keep everything simple, then they can create binaries and tuck them into the lab-wide If a user wants to work in a more reproducible environment, then they can anchor their experiment to a git-managed submodule. I don't generally like working with submodules, but I think anchoring the experiment to a hashed snapshot of the code is a very good approach. But I also don't see a problem with making this optional. This is how I see it, at least. But I think this probably needs more discussion. |
Fair enough. Let's discuss in an interactive forum like next week's MOM meeting ... |
I've moved the experiment directories into their own repositories within the OceansAus organisation. If this gets too cluttered then we can move them. They are also now submodules within the top-level access-om2 repo. I think this is an improvement over what we had. We probably should still think more about how we make sure that we know which code/executable is attached to each experiment. |
Great! Thanks @nicjhan. |
Is this a move of use for the mom-ocean repo? |
I'm not quite sure I understand your question @StephenGriffies. If you mean, is OceansAus taking over from the mom-ocean repo as the location for MOM5 model configurations, then I would say no. These are ACCESS-OM2 (MOM5+CICE+MATM+OASIS) configs intended to be the COSIMA standard runs that others can use. In particular we will be able to share spinup runs and runs forked from the spinups. |
My question refers to the mom5 repo. A concern I have is that there is no Git version control for the experiment field tables, diag tables, input nml, data table. Instead, these files are housed elsewhere outside of Github. It would be nice to see these files on Github. My presumption is that ACCESS has these files versioned. Is that true? |
Yes @StephenGriffies you are correct, the ACCESS model configs are stored on GitHub. This is not the case for the mom-ocean example configurations. I see you have made an issue for this on the mom5 repo. We should discuss how it will be implemented over there. But I agree, it is a desirable goal. |
Thanks @aidanheerdegen |
Hi @StephenGriffies my thoughts were that we will do something similar to MOM6-examples for MOM5. i.e. put all the model configs into a single 'examples' repository. |
Good option. Thanks |
It looks like permissions need fixing. Updating an existing clone via
|
Works for me with fresh clone. I wonder why?
$ module load git
$ git clone --recursive https://github.com/OceansAus/access-om2.git
Cloning into 'access-om2'...
remote: Counting objects: 1353, done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 1353 (delta 17), reused 32 (delta 12), pack-reused 1307
Receiving objects: 100% (1353/1353), 531.11 KiB | 687.00 KiB/s, done.
Resolving deltas: 100% (853/853), done.
Checking connectivity... done.
Submodule 'control/01deg_jra55_ryf' (git@github.com:OceansAus/01deg_jra55_ryf.git) registered for path 'control/01deg_jra55_ryf'
Submodule 'control/025deg_jra55_ryf' (git@github.com:OceansAus/025deg_jra55_ryf.git) registered for path 'control/025deg_jra55_ryf'
Submodule 'control/1deg_core_nyf' (git@github.com:OceansAus/1deg_core_nyf.git) registered for path 'control/1deg_core_nyf'
Submodule 'control/1deg_jra55_ryf' (git@github.com:OceansAus/1deg_jra55_ryf.git) registered for path 'control/1deg_jra55_ryf'
Submodule 'src/cice5' (https://github.com/OceansAus/cice5.git) registered for path 'src/cice5'
Submodule 'src/matm' (https://github.com/OceansAus/matm.git) registered for path 'src/matm'
Submodule 'src/mom' (https://github.com/mom-ocean/MOM5.git) registered for path 'src/mom'
Submodule 'src/oasis3-mct' (https://github.com/OceansAus/oasis3-mct.git) registered for path 'src/oasis3-mct'
Cloning into 'control/01deg_jra55_ryf'...
remote: Counting objects: 76, done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 76 (delta 32), reused 76 (delta 32), pack-reused 0
Receiving objects: 100% (76/76), 19.61 KiB | 0 bytes/s, done.
Resolving deltas: 100% (32/32), done.
Checking connectivity... done.
Submodule path 'control/01deg_jra55_ryf': checked out '0436a288886d61f8805e1a11fa572eb7b294a758'
Cloning into 'control/025deg_jra55_ryf'...
remote: Counting objects: 103, done.
remote: Compressing objects: 100% (50/50), done.
remote: Total 103 (delta 51), reused 103 (delta 51), pack-reused 0
Receiving objects: 100% (103/103), 24.71 KiB | 0 bytes/s, done.
Resolving deltas: 100% (51/51), done.
Checking connectivity... done.
Submodule path 'control/025deg_jra55_ryf': checked out '0eebd2784a1c0210ac87c57961438fbc7ff3bcea'
Cloning into 'control/1deg_core_nyf'...
remote: Counting objects: 40, done.
remote: Compressing objects: 100% (24/24), done.
remote: Total 40 (delta 13), reused 40 (delta 13), pack-reused 0
Receiving objects: 100% (40/40), 16.75 KiB | 0 bytes/s, done.
Resolving deltas: 100% (13/13), done.
Checking connectivity... done.
Submodule path 'control/1deg_core_nyf': checked out 'c31e118be4f6df8db7ee091decbb7c90e0a8fdf5'
Cloning into 'control/1deg_jra55_ryf'...
remote: Counting objects: 131, done.
remote: Compressing objects: 100% (62/62), done.
remote: Total 131 (delta 66), reused 131 (delta 66), pack-reused 0
Receiving objects: 100% (131/131), 30.22 KiB | 0 bytes/s, done.
Resolving deltas: 100% (66/66), done.
Checking connectivity... done.
Submodule path 'control/1deg_jra55_ryf': checked out 'fddbf21b275917d38286ff1e34ebca790393e67e'
Cloning into 'src/cice5'...
remote: Counting objects: 717, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 717 (delta 0), reused 1 (delta 0), pack-reused 712
Receiving objects: 100% (717/717), 60.85 MiB | 21.84 MiB/s, done.
Resolving deltas: 100% (462/462), done.
Checking connectivity... done.
Submodule path 'src/cice5': checked out 'fe7300227107bde802a217ff0d6ef7f92a6eb6c2'
Cloning into 'src/matm'...
remote: Counting objects: 305, done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 305 (delta 8), reused 14 (delta 6), pack-reused 284
Receiving objects: 100% (305/305), 93.97 KiB | 0 bytes/s, done.
Resolving deltas: 100% (197/197), done.
Checking connectivity... done.
Submodule path 'src/matm': checked out 'b1f482c37eb951750be386f938d90e287aa577a3'
Cloning into 'src/mom'...
remote: Counting objects: 42168, done.
remote: Total 42168 (delta 0), reused 0 (delta 0), pack-reused 42168
Receiving objects: 100% (42168/42168), 30.92 MiB | 468.00 KiB/s, done.
Resolving deltas: 100% (15224/15224), done.
Checking connectivity... done.
Submodule path 'src/mom': checked out '030fb1f22af7a9f9a3d4a7dc197d6ace0684ae6b'
Cloning into 'src/oasis3-mct'...
remote: Counting objects: 472, done.
remote: Total 472 (delta 0), reused 0 (delta 0), pack-reused 472
Receiving objects: 100% (472/472), 5.27 MiB | 4.46 MiB/s, done.
Resolving deltas: 100% (124/124), done.
Checking connectivity... done.
Submodule path 'src/oasis3-mct': checked out '0d0f2ff4ee71c0fb9c1346ff4e60d62c56a15bf9' |
This worked too:
|
I asked someone else to test and it was ok for them too. Have you changed your GitHub ssh keys? This looks suspicious:
|
Ah oops sorry, the problem was at my end. Thanks Aidan for fixing it. |
Submodules create some serious traps for the unwary (i.e. me and @AndyHoggANU, earlier today). I hadn't realised that This means that
Another source of confusion (for me, anyway) is that branches are specific to the submodule repo I'm in. So https://github.com/OceansAus/access-om2/wiki/Contributing-to-model-configurations needs to be updated, e.g. at step 4 the 'run' branch needs to be set up in the config dir, not the access-om2 dir. If the user switches back to access-om2 they will be back on the master branch for that overarching repo, even though the config remains on the 'run' branch because it is a separate repo. That makes sense now that I understand this better, but it is initially unintuitive and should be explained to users. More traps are listed in the "gotchas" section of https://git.wiki.kernel.org/index.php/GitSubmoduleTutorial The configs-as-submodules approach has a lot of merit, but the traps need to be made clear and the user documentation updated. |
re. connecting configs with src versions (#42 (comment)): the output of each run currently includes config.yaml, which specifies binaries with git hashes attached to their names (if the config was set up with hashexe.sh or equivalent). So that provides some measure of control over reproducibility (though it isn't enforced, eg a user could sidestep using hashexe.sh and manually change binary names and config.yaml to have incorrect/no hashes). |
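The earlier point that branches are specific to each submodule repository can be reproduced with a small offline sketch. The repositories `cfg`, `top` and `user` below are hypothetical local stand-ins for a config repo, the top-level repo and a user's recursive clone; the branch name `run` follows the thread, and `master` is forced explicitly so the result does not depend on the local default branch setting.

```shell
#!/bin/sh
set -eu

# Throwaway working area; every repository name here is hypothetical.
work=$(mktemp -d)
cd "$work"

# Wrapper supplying an identity and allowing local-path submodules.
git_c() {
    git -c user.name="Example" -c user.email="example@example.invalid" \
        -c protocol.file.allow=always "$@"
}

# Stand-in for a config repository.
git init -q cfg
git -C cfg symbolic-ref HEAD refs/heads/master
echo "walltime: 1:00:00" > cfg/config.yaml
git_c -C cfg add config.yaml
git_c -C cfg commit -q -m "initial config"

# Stand-in for the top-level repository, with the config as a submodule.
git init -q top
git -C top symbolic-ref HEAD refs/heads/master
echo "top-level repo" > top/README
git_c -C top add README
git_c -C top commit -q -m "initial commit"
git_c -C top submodule --quiet add "$work/cfg" control/exp
git_c -C top commit -q -m "add config as submodule"

# A user's fresh recursive clone of the top-level repository.
git_c clone -q --recursive "$work/top" user

# Straight after a recursive clone the submodule is on a detached HEAD.
before=$(git -C user/control/exp rev-parse --abbrev-ref HEAD)

# Creating 'run' inside the config directory affects only that repository.
git -C user/control/exp checkout -q -b run
top_branch=$(git -C user rev-parse --abbrev-ref HEAD)
cfg_branch=$(git -C user/control/exp rev-parse --abbrev-ref HEAD)

echo "submodule after clone: $before"
echo "top-level branch:      $top_branch"
echo "config branch:         $cfg_branch"
```

The sketch illustrates two of the traps at once: the submodule starts out detached rather than on a named branch, and a branch created in the config directory leaves the top-level repo on its own branch.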
I think there is a tension between the best approach for users and the best approach for maintainers. A better separation between code and model configs makes life easier for users. They need a new version, they just blow away their old source directory and pull in a fresh one, compile and they’re sweet. This is pretty much what @nicjhan was suggesting up above and I think I agree. I'm not sure about putting the source code inside the experimental config, as the source is common to all experiments. Associating a config with the code to run it is better addressed in other ways IMO. @AndyHoggANU wasn't sure because it sounded more complex, but in fact I think it is less complex for most users, just a bit more complex for the maintainer, but he is a total guru, and in theory it changes less frequently. |
We seem to have converged on a usable set-up. This conversation is referenced at https://github.com/OceansAus/access-om2/wiki/Tutorials#Understanding-the-ACCESS-OM2-repository-layout and further work needs to be done to fix up that part of the wiki. |
@aidanheerdegen has written some documentation about this here: https://github.com/OceansAus/access-om2/wiki/Contributing-to-model-configurations
However it has not been implemented yet.
@aidanheerdegen I have a question. Where do you think we should put these experiment repositories? I imagine there may eventually be many and I'm reluctant to clutter up OceansAus. I seem to remember you started a new organisation for this kind of thing?
Thanks.