-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Moving to a single registry and importing on request not automatically #323
Moving to a single registry and importing on request not automatically #323
Conversation
…nts_loader to preserve core.constants for constants classes
…ing unused calls to old config.register_schema
Just to demonstrate the registry structure and mechanism, which we'll need to work into the tests: In [1]: from virtual_rainforest.core.registry import register_module, MODULE_REGISTRY
In [2]: register_module('virtual_rainforest.models.plants')
[INFO] - registry - register_module(72) - Registering virtual_rainforest.models.plants module components
[INFO] - registry - register_model(145) - Registered class for plants model: PlantsModel
[INFO] - registry - register_schema(179) - Schema registered for module plants: /Users/dorme/Research/Virtual_Rainforest/virtual_rainforest/virtual_rainforest/models/plants/model_schema.json
[INFO] - registry - register_constants_classes(222) - Constants class plants.PlantsConsts registered
In [3]: register_module('virtual_rainforest.core')
[INFO] - registry - register_module(72) - Registering virtual_rainforest.core module components
[INFO] - registry - register_schema(179) - Schema registered for module core: /Users/dorme/Research/Virtual_Rainforest/virtual_rainforest/virtual_rainforest/core/core_schema.json
In [4]: MODULE_REGISTRY
{'core': {
'constants': {}, # None defined yet.
'schema': {'properties': {'core': ...}},
},
'plants': {
'constants': {'PlantsConsts': <class 'virtual_rainforest.models.plants.constants.PlantsConsts'>},
'model': <class 'virtual_rainforest.models.plants.plants_model.PlantsModel'>,
'schema': {'properties': {'plants': ... }}
} |
Point of concernThis revision does not populate the registry for a module until that I think that would also likely reduce the amount of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change seems sensible to me!
I do wonder whether core
should really be contained in the registry though as it kind of creates the impression that virtual_rainforest.core
is a module that could be switched out for an alternative "core", which I don't think is something we would ever want to support? I guess the core constants existing outside the registry would be a bit weird though?
I did think about having a separate CORE_REGISTRY, but really having a single place to look for module components seemed like less code to maintain. I have just been wondering if we want a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely an improvement. Good work!
I do have some suggestions and queries though.
virtual_rainforest/core/registry.py
Outdated
MODULE_REGISTRY: dict[str, Any] = {} | ||
"""Information about module schemas, constants and model, keyed by module name. | ||
|
||
The registry is a dictionary keyed by the final part of the module name. The entry for | ||
each module is itself a dictionary with 'model', 'constants' and 'schema' entries. For | ||
example: | ||
|
||
MODULE_REGISTRY['plants']['model'] = <virtual_rainforest.models.plants.PlantsModel> | ||
MODULE_REGISTRY['plants']['constants'] = [ | ||
'PlantsConsts': <virtual_rainforest.models.plants.constants.PlantsConsts> | ||
] | ||
MODULE_REGISTRY['plants']['schema'] = {schema_dict} | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If each element of MODEL_REGISTRY
is a dict with the same members, couldn't you use a dataclass for this instead? That way you get better typing etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meant to raise this point - you had this structure in an earlier post. The problem I ran into here is that importing those types to construct the dataclass causes some horrific circularity issues. Part of the reason I have separated schema
and registry
here is that they are then very low level and don't require much else from VR. Once you import BaseModel
or things like it then I was getting circular imports all over the place.
It would certainly be much neater to type this more explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't the relevant types be put into their own .py
files then? I can't imagine the MODEL_REGISTRY
would be needed in those files, so there shouldn't be any circular import problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that BaseModel
has a whole bunch of higher level imports:
from virtual_rainforest.core.axes import AXIS_VALIDATORS
from virtual_rainforest.core.config import Config
from virtual_rainforest.core.data import Data
All of those structures come after configuration, so the circular imports from importing BaseModel
seem really hard to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've given a first pass and mentioned a couple of things than look a bit odd to me. Maybe I've missunderstood somehinght.
Other than that, I think loading models on-demand as needed by the simulation to be run is a good move and much cleaner than loading everything. So, while there might be things to polish on the implementation side, my 👍 on purpose of the PR.
Codecov Report
@@ Coverage Diff @@
## develop #323 +/- ##
===========================================
- Coverage 96.43% 95.73% -0.71%
===========================================
Files 48 51 +3
Lines 2440 2484 +44
===========================================
+ Hits 2353 2378 +25
- Misses 87 106 +19
... and 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
…always-causes-models-to-be-loaded
Round 4OK - so this now implements a revised mode of operation, following discussion, that explicitly separates the registration process from module import. The The test suite is now updated to work with this new approach and all tests pass. That has included the addition of a global The docs are update too now. I've turned this from WIP into a full PR. I know it's a lot of changes but the key things that need attention are really just the latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -31,8 +32,8 @@ def test_vr_run(capsys): | |||
str(example_dir), | |||
"--outpath", | |||
str(example_dir), | |||
"--logfile", | |||
str(example_dir / "vr_example.log"), | |||
# "--logfile", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these supposed to be commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No - but there's a whole bunch of fixing over in #331 which will replace much of this.
), | ||
) | ||
from virtual_rainforest.models.litter.constants import LitterConsts | ||
from virtual_rainforest.models.litter.litter_model import LitterModel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular value in importing these within (almost) every test rather than just at the top of the script. I can definitely see the value for things only used in specific tests, but if LitterModel
can't be imported then basically every test will break anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood @dalonsoa correctly, I think the idea is that as much as possible of the environment used by each test should be isolated within it? With some vr
things - notably custom Exceptions
- we have to import them at the module level so they can be used in parameterisation, but otherwise tests should be self contained? Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. As much as possible, setting up the tests (as oppose to running them) should not depend on the package under test itself, and that means having all the imports - even if repeated - within the tests. This avoids side effects (keeping appropriate isolation) as well as preventing the whole test suit to fail at setup time just because a failure of the package under test.
Having said that, as David points out, there are specific use cases where that is not possible, but it should be the general approach.
Description
Issues #278 and #302 request that modules are loaded on demand and not automatically and that the existing registries for models, constants and schema are integrated. Those are difficult to solve in isolation, so this does both.
This PR is currently a work in progress. I think it fully implements the requested functionality but it does nothing to update the tests or docs to match. If the changes makes sense then we can then roll out those additional commits.
This also fixes #124 - bad schemas will now fail at
register_module
rather than cause the package import to fail.Overview
Two new
core
submodules are added and one is renamed:schema.py
simply moves the JSONSchema loading and merging code out ofconfig
to simplify the file structure.registry.py
collects and updates the existing registration functions frombase_model
(register_model
),constants
(register_constant_class
) andconfig
(register_schema
). It then adds an umbrella functionregister_module
that wraps those and theMODULE_REGISTRY
itself.constants.py
submodule is used as the home for constants dataclasses, and is used in registering those classes. We expect to want to do that incore
too, so the existing functionality incore.constants
has moved tocore.constants_loader
.The
MODULE_REGISTRY
includescore
as well as models. The registry is structured as a dict keyed by module short name, each of which contains a dict of with keys['model', 'constants', 'schema']
. The entries for each of those elements is identical to the previous..._REGISTRY
object. So, for example, the plants model is atMODULE_REGISTRY['plants']['model']
. Thecore
module does not have an entry formodel
.The
register_module
function automatically imports the requested module and then searches for a single BaseModel subclass (except incore
), any data classes inxyz.constants
and then the appropriate JSONSchema. Because it searches for the BaseModel subclass usinginspect
, individual models now only have to import that object in__init__
, and all the registration code in__init__
files is removed.Within the
vr_run
flow, models cannot be registered until the model config is built and merged. At that point, we should have the model list, so modules need to be registered then before we can validate the config, which is insideConfig.build_schema
. So that method now callsregister_module('virtual_rainforest.core')
and thenregister_module('virtual_rainforest.model.xyz')
for the requested models right before building the schema.I've then stripped out all of the previous autodiscovery code and all of the apparatus around the previous
SCHEMA_REGISTRY
,MODEL_REGISTRY
andCONSTANTS_REGISTRY
.So - this does run using
vr_run
- the mechanisms seem to work as expected, but please give this WIP a solid kicking before we do anything rash like testing or documenting it. There are a lot of changes but a lot of it is just moving code locations.Fixes #278 and #302
Type of change
Key checklist
pre-commit
checks:$ pre-commit run -a
$ poetry run pytest
Further checks