Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolving static lifetime problems #674

Merged
merged 105 commits into from
Oct 21, 2024
Merged

Conversation

timspainNERSC
Copy link
Collaborator

@timspainNERSC timspainNERSC commented Sep 3, 2024

Resolving static lifetime problems

Fixes #614


Change Description

Deleted classes

  • ScopedTimer
  • Timer
    • Removed these classes as they are currently unused and need to be heavily refactored to be thread- and possibly MPI-aware.

Edited classes

ConfigMap

Changed the static const type constants to an enum.

Configurator

Made the static data into singletonoid functions. The data in question are:

  • The list of configuration sources
  • The true command line
  • The pointer to the AdditionalConfiguration object

The command line values were combined into a struct to avoid passing more than one return value.
The pointer to the additional configuration was set to a nullptr initially, with null checks used when the pointer value is used.

Configured class template

Nothing depends on the keyMap template member variable being a template member variable. Redefine each key map as static (compilation unit localized) const std::map. Since many classes are configured, this affects many files. Only the source .cpp files are affected.

ConfiguredModule & Module

Changes to Module<>

  • Make Module independent of Nextsim. It is now a separate Module base class that can be used for any C++ project.
  • Introduce a NextsimModule class that specializes the newly generic Module for nextSIM-DG.
  • Replace #include "Module.hpp" with #include "NextsimModule.hpp".
  • Add a map from interface name to implementation name in the Module namespace alongside the Module template class. This string-to-string map is not available elsewhere. The data is stored as a function-local static variable.
  • Remove the class-static instance of the reference implementation from the Module<> template. This static instance is now a function-local static variable, so at least the construction order is well defined. The lifetime is still until the executable epilogue.
  • Delete the module classes. Where there was originally (for example) a DiagnosticOutputModule class that inherited from Module<IDiagnosticModule>, everything now refers to the template instantiation. The …Module class was necessary to run the register the module at static start-up. With this no longer necessary,
    • the classes are no longer needed
    • the class headers (…Module.hpp) have been removed
    • the class header creation script module_header.py has been removed
    • the command in CMakeLists.txt to run the header script has been removed.
    • #includes of the …Module.hpp file have been replaced with an #include of NextsimModule.hpp and an #include of the interface header file.
  • Change the API of Module<>. It now depends on the definition of two classes before it is #included:
  • HelpMap
  • Config which defines a std::string Config::getImpl(const std::string&) function.
  • There may be a better way to handle this, but more template arguments was much too complex a solution.
  • Add internal options to set the implementation without setting the unique instance to prevent recursive initialization. The external API remains the same.
  • Move the class-static data in Module<> to be function-local statics. This requires a slight change to the API and a different format in the generated module.cpp files.
  • The fn type is now a plain function pointer, rather than a std::function object. This required replacing a lambda with an ad-hoc function in one test.
  • The variables holding the implementation names are now namespaced into namespace Module::InterfaceClass.
  • The necessary changes to module_builder.py to support the above changes.
  • Adds a finalizer method to the Module<> template. This will call the destructor on the static instance, allowing all the data arrays the static instance holds to be deleted.

Changes to ConfiguredModule

  • Remove static data from ConfiguredModule. Module configurations are now parsed on demand.
  • Remove the call from main to parse the configuration for the modules.
  • With the change to lazy configuration, it is impossible to reconfigure modules automatically from the config text. Change to explicit calls to setImplementation().
  • The ConfiguredModule_test now only checks that the module configuration options are parsed correctly. Everything else is handeled by the Module<> template class.

Miscellaneous other changes

  • Removed the directory ModuleLoaderTestModules. These are very defunct.
  • A fix to IceGrowth_test. This is so far from the Module changes, I don't see how it could have been working before.

FileCallbackCloser

Replaced the static std::list with a singleton function providing a std::list.

Iterator

Removed the static implementation of NullIterator.

MissingData

Replaced the static double missing data value with a singleton, inlined access function. The missing data value is created on first access, but still destroyed at the end of execution. This is fine, since it is a scalar double.

Removed the MissingData.cpp source file as it is no longer needed.

ModelComponent

  • Removed all references to the registerModule and the data members it uses. Also remove hField() and similar functions, which don't now do anything, and were originally planned to be used alongside the module register.
  • Moved the ModelArrayRefStore to be lazily allocated.
  • Moved the ocean mask and ocean index look-up vector to be lazily allocated.

ModelConfig

Moved the key specification to an .ipp file to avoid repeating the text in Model.cpp.

ParaGridIO

  • Moved the maps between netCDF concepts and ModelArray equivalents to const instance variables that are initialized as part of the ctor.
  • Moved the static maps from file names to netCDF files and current time indices to a singleton-like function. Also combined the maps, to ensure they remain consistent.
  • Added the Finalizer class to to what std::atexit does, except at our convenience, rather than executable clean-up. This includes a new test for the new class. This change is the same as that contained in PR Finalize classes and objects without using std::atexit #683

New classes

Finalizer

  • Added the Finalize class, which mimics std::atexit(), but can be called before the end of main().
  • Added the Finalizer.cpp source file to the CMake system.
  • Added a test of the Finalizer class.
  • Added a Module::finalize<> function template.
  • Added the instantiation of the above function template to the module generation file.
  • Added registration of finalization for a module wherever the module implementation if configured.
    • By calling registerUnique() each module instantiation is only called once when the finalization occurs.

Test Description

Apart from deleted tests for deleted classes, all tests should still pass.

ModelComponent

Adds some test to ModelArray_test to check sizes around assignment operations.

Finalizer

There is a test for the new Finalizer class.


Documentation Impact

No changes to existing documentation. All classes should provided the same interfaces as previously.

timspainNERSC and others added 2 commits September 3, 2024 10:00
# Remove timer classes.

Fixes #614 (partially)

The classes `Timer`and `ScopedTimer` both use static data, and so need
to be refactored as part of issue #614. However, they are currently
unused in the model and additionally need to be made thread-safe and
possibly MPI aware, depending on how the timings should be reported.

This PR entirely removes the classes from the code base.
@timspainNERSC timspainNERSC added bug Something isn't working enhancement New feature or request labels Sep 3, 2024
@timspainNERSC timspainNERSC added this to the 3 Stand-alone model milestone Sep 3, 2024
@timspainNERSC timspainNERSC self-assigned this Sep 3, 2024
timspainNERSC and others added 24 commits September 3, 2024 10:42
# Remove the static implementaion of NullIterator.

Fixes #614 (partially)
# Pull Request Title

Fixes #(your issue number)

### Task List
- [ ] Defined the tests that specify a complete and functioning change
(*It may help to create a [design specification & test
specification](../../../wiki/Specification-Template)*)
- [ ] Implemented the source code change that satisfies the tests
- [ ] Documented the feature by providing worked example
- [ ] Updated the README or other documentation
- [ ] Completed the pre-Request checklist below

---
# Change Description

*Please write a summary of the change and specify which Issue this
fixes. This should cover the motivation for making the change and
provide almost enough context to review the change in isolation. While
the issue description is a good place to discuss the
failure/requirements, this is the ideal space for explaining why this
design fixes the issue and why you designed it the way you did.*

- *If there are important design decisions, particularly those that
leverage functionality outside this change or were caused by existing
limitations, they should be noted here.*
- *If there is a design specification which relates to this PR, please
provide a link to it here.*
- *If the PR relates to an open issue, please link the issue to the PR
by using the 'fixes #issue_number' syntax.*
- *If the PR relates to several open issue, make sure to use 'fixes
#issue_1, fixes #issue_2, etc.' syntax.*
- *If there are any dependencies (third-party or internal downstream)
which are added or impacted, please note them here. And if their
limitations informed your design, please note how they did so here
(particularly if this information was not captured in a design
specification).*

Text goes here

---
# Test Description

*Please describe the tests you have written that verify the enhancement
works or that the bug fix has been remedied. For a bug fix, please
ensure a complete description of the failure is available here or in the
issue such that a reviewer can verify that the testing is suitable. This
is a good place to highlight any manual testing you may have had to
perform.*

*Please describe any operating system limitations placed on the testing
(if appropriate).*

*If your tests fixes any issue, please explain that in the comment you
provided in the test file by referencing the issue within the comment
and please record the mapping here e.g.*
- *testA fixes issue 404*
- *testB fixes issue 999*

Text goes here

---
# Documentation Impact

*Please describe any changes to the documentation not captured above or,
if made in a seperate pull request, please link to the corresponding PR
or issue.*

Text goes here

---
# Other Details

*If you have run any static analysis, including complexity analysis, or
run coverage testing which has not been captured by an automated tool,
please link or copy it here.*

Text goes here

---
### Pre-Request Checklist

- [ ] The requirements of this pull request are fully captured in an
issue or design specification and are linked and summarised in the
description of this PR
- [ ] No new warnings are generated
- [ ] The documentation has been updated (or an issue has been created
to track the corresponding change)
- [ ] Methods and Tests are commented such that they can be understood
without having to obtain additional context
- [ ] This PR/Issue is labelled as a bug/feature/enhancement/breaking
change
- [ ] File dates have been updated to reflect modification date
- [ ] This change conforms to the conventions described in the README

*Please complete your pull request description and delete the
instructional text before submitting.*

*Congratulations and thank you for making your contribution to
neXtSIM_DG!*
# Change ConfigMap static constants to an enum.

Fixes #614 (partially)

---
# Change Description

Change static const ints to an enum.
# Remove the `FileCallbackCloser` static list.

Fixes #614 (partially)

---
# Change Description

Replace with a singleton function providing a list.

---
# Test Description

The class unit test still passes.
timspainNERSC and others added 7 commits October 1, 2024 16:54
# ParaGridO static data

Fixes #614 (partially)

---
# Change Description

Moved the maps between netCDF concepts and `ModelArray` equivalents to
`const` instance variables that are initialized as part of the ctor.

Moved the static maps from file names to netCDF files and current time
indices to a singleton-like function. Also combined the maps, to ensure
they remain consistent.

Added the `Finalizer` class to to what `std::atexit` does, except at our
convenience, rather than executable clean-up. This includes a new test
for the new class. This change is the same as that contained in PR #683

---
# Test Description

Added a test for the new `Finalizer` class.

The `ParaGrid_test` still passes, this mainly tests the parts that pass
through to `ParaGridIO`.
@timspainNERSC timspainNERSC changed the base branch from develop to main October 15, 2024 08:49
@timspainNERSC timspainNERSC changed the base branch from main to develop October 15, 2024 08:49
Copy link
Member

@Thanduriel Thanduriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes greatly improve resource lifetime management. In fact, it is now possible to get useful information out of valgrind:

==41968== LEAK SUMMARY:
==41968==    definitely lost: 512 bytes in 3 blocks
==41968==    indirectly lost: 3,569,016 bytes in 24 blocks
==41968==      possibly lost: 1,186,944 bytes in 21 blocks
==41968==    still reachable: 110,717 bytes in 719 blocks
==41968==         suppressed: 0 bytes in 0 blocks

Not bad :D. Two of the leaks are from raw pointers holding heap data in the dynamics and they can be addressed in a different issue. The third one is in Iterator. I think this leak should be addressed before merging the changes.

My other comments are just for cosmetic code changes, so they are not essential.

For the big picture of resource management, a possible pitfall which remains is that modules have to be manually registered for destruction via Module::finalize. That seems manageable but is there a reason why this should be done manually instead of automatically, e.g. on construction?

core/src/include/Module.hpp Outdated Show resolved Hide resolved
core/src/include/Module.hpp Outdated Show resolved Hide resolved
core/src/include/Module.hpp Outdated Show resolved Hide resolved
core/src/include/Module.hpp Outdated Show resolved Hide resolved
core/src/Iterator.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@TomMelt TomMelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @timspainNERSC. Just a few comment/questions from my side.

core/src/ConfiguredModule.cpp Outdated Show resolved Hide resolved
core/src/ConfiguredModule.cpp Outdated Show resolved Hide resolved
Comment on lines +18 to +24
bool Finalizer::registerUnique(const FinalFn& fn)
{
if (contains(fn))
return false;
registerFunc(fn);
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason this function returns a bool. Looks like it is mostly used like a void function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value indicates whether the function was successfully registered (true) or not (false). I this is useful enough functionality, given the minimal cost. Even if it is not currently used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my concern was, if you thought you were using it but you were not. Or if it was left in for some use case that we no longer need.

Do you think it would be more useful to raise an exception if the user tries to register an already registered function? Given the use in the rest of the code, my concern is that new developers are not likely to test the boolean return value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's there as a service if the developer wants to know if the function was already registered. How it is used currently, I don't care, so I don't check. If a future developer does care, then they can check.

core/src/ParaGridIO.cpp Outdated Show resolved Hide resolved
@@ -46,6 +46,8 @@ ConfigurationHelp::HelpMap& ERA5Atmosphere::getHelpRecursive(HelpMap& map, bool

void ERA5Atmosphere::configure()
{
Finalizer::registerUnique(Module::finalize<IFluxCalculation>);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a thought... is there a way to force finalization without relying on the user to remember they have to do it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost certainly! But what does this use case look like? Which classes would be automatically finalized? I think this is a big enough piece of work on its own that is should be a separate issue where we could discuss the scope and implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My use case was more around the fact the someone may forget to register the module finalization. By default we could finalize everything at the end if it hasn't been manually triggered beforehand. Maybe I am missing something though. Happy to discuss in the next meeting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So let's say I forget to do this step. How bad is it?

Copy link
Collaborator Author

@timspainNERSC timspainNERSC Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the CPU version? It's what develop is doing right now without this code, so no problem at all.
In the GPU version, my understanding is that not cleaning up data buffers before the GPU process finishes is a big headache. That is what a lot of the changes in this PR are aiming to address.

core/test/Finalizer_test.cpp Outdated Show resolved Hide resolved
core/test/Finalizer_test.cpp Outdated Show resolved Hide resolved
@TomMelt
Copy link
Contributor

TomMelt commented Oct 18, 2024

Thanks @timspainNERSC , I have approved from my side. I left two conversations "unresolved" just so we can find them easily in the next team meeting and discuss. No outstanding issues. Thanks for addressing my comments

@timspainNERSC
Copy link
Collaborator Author

For the big picture of resource management, a possible pitfall which remains is that modules have to be manually registered for destruction via Module::finalize. That seems manageable but is there a reason why this should be done manually instead of automatically, e.g. on construction?

I thought that there were cases where registering the final methods at construction would not work. But I cannot identify any right now. This might be worth looking at in the (not to distant) future, but can be a separate issue.

@timspainNERSC timspainNERSC merged commit cb19d6a into develop Oct 21, 2024
5 checks passed
@timspainNERSC timspainNERSC mentioned this pull request Oct 22, 2024
Thanduriel added a commit that referenced this pull request Nov 12, 2024
# memory leaks in dynamics

Fixes #716 

# Change Description

With #674 integrated, all dynamic buffers held by modules should get
freed before the program terminates. This update replaces raw pointers
with `std::unique_ptr` for heap buffers that are used internally by the
dynamics.

# Test Description
A manual test was done by running the dynamics benchmark for 1 time step
with valgrind:
```
==74249== LEAK SUMMARY:
==74249==    definitely lost: 0 bytes in 0 blocks
==74249==    indirectly lost: 0 bytes in 0 blocks
==74249==      possibly lost: 7,600 bytes in 19 blocks
==74249==    still reachable: 110,406 bytes in 717 blocks
==74249==         suppressed: 0 bytes in 0 blocks

```

The setup that was used is not suited for an automated test as it is
takes too long (init + 1 step on 32x32 grid takes upwards of 30mins).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request ICCS Tasks or reviews for the ICCS team
Projects
Development

Successfully merging this pull request may close these issues.

incorrect resource lifetimes due to static data
3 participants