Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leaks and other issues identified by the code sanitizer #1353

Merged
merged 7 commits into from
Nov 10, 2022

Conversation

yantosca
Copy link
Contributor

@yantosca yantosca commented Aug 16, 2022

Using the -DSANITIZE=y CMake option (which invokes code sanitization) has identified several memory leaks and similar issues in GEOS-Chem code. These issues can be caused by:

  • Arrays (or array fields of objects) that are allocated but not deallocated
  • Attempting to store a number in a variable that cannot accommodate it (i.e. trying to store 1e150 in a REAL*4 variable)
  • Attempting to store a value of the wrong type into a variable
  • etc.

This PR should be merged along with geoschem/GCClassic#24 (which adds the proper compiler flags for the -DSANITIZE=y option) and geoschem/HEMCO#159 (which removes memory leaks etc. in HEMCO code).

For now, tagging @lizziel @msulprizio @Jourdan-He

Running GEOS-Chem with -DSANITIZE=y has found several memory leaks
in Headers/input_opt_mod.F90.  These were resolved by adding
deallocate statements for array fields of Input_Opt that had been
allocated but not deallocated.

Signed-off-by: Bob Yantosca <[email protected]>
@yantosca yantosca added category: Feature Request New feature or request topic: Performance Related to GEOS-Chem speed, memory, or parallelization labels Aug 16, 2022
@yantosca yantosca added this to the 14.1.0 milestone Aug 16, 2022
@yantosca yantosca self-assigned this Aug 16, 2022
The variable TimeForEmis (which denotes if it is time for emissions)
was being passed to the Phase 0 call of Emissions_Run but was never
defined.  It should be set to .FALSE. here because in Phase 0 we
are only reading restart file and met field input but not emissions.

This update removes the chance that TimeForEmis might evaluate to .TRUE.
at random because of a junk initial value.

Signed-off-by: Bob Yantosca <[email protected]>
@yantosca yantosca added memory leak or error topic: Structural Modifications Related to GEOS-Chem structural modifications (as opposed to scientific updates) and removed topic: Performance Related to GEOS-Chem speed, memory, or parallelization labels Aug 17, 2022
Headers/registry_mod.F90
- In routine Registry_AddField, we must free the memory allocated to
  the Item object before leaving the subroutine.

History/history_mod.F90
- In routine History_AddItemToCollection, we must free the memory
  allocated to the Item object before leaving the subroutine.

History/history_netcdf_mod.F90
- In routine IndexVarList_Create, we must free the memory allocated
  to the Item object at the end of the DO loop.

Signed-off-by: Bob Yantosca <[email protected]>
History/history_netcdf_mod.F90
- Removed a leftover USE statement.  We no longer need to reference
  HistItem_Destroy within History_Netcdf_Define (this was added for
  testing/debugging).

Signed-off-by: Bob Yantosca <[email protected]>
@msulprizio msulprizio changed the base branch from dev to main October 18, 2022 16:51
@msulprizio msulprizio changed the base branch from main to dev/14.1.0 November 2, 2022 16:36
@yantosca yantosca marked this pull request as ready for review November 4, 2022 20:15
@yantosca yantosca changed the title [WIP] Fix memory leaks and other issues identified by the code sanitizer Fix memory leaks and other issues identified by the code sanitizer Nov 4, 2022
@lizziel
Copy link
Contributor

lizziel commented Nov 4, 2022

I pulled the updates into a test branch and ran GCHP with and without the fixes. Here are a sampling of memory usage across just an hour run:
Before fixes:
GCHP_without_GC_memory_fixes

After fixes:
GCHP_with_GC_memory_fixes

I am not sure why the GEOS-Chem fixes seem to correct ALL of the memory leaks (at least as detectable this precision), including a small one in advection, but I'll take it.

I recommend putting this update, and the related PRs, into 14.0.2 if they are zero diff. We can then bring it into CESM and GEOS in that version.

@yantosca
Copy link
Contributor Author

yantosca commented Nov 4, 2022

I will run some integration tests with the sanitize updates merged up to the 14.0.1 and HEMCO 3.5.1 versions.

@yantosca
Copy link
Contributor Author

yantosca commented Nov 4, 2022

I can close this PR and open a new PR to dev/14.0.2.

@msulprizio
Copy link
Contributor

@yantosca There is no need to close this PR. You can simply modify the base branch to dev/14.0.2 by clicking 'Edit' above next to the title. Right below the title you should see a drop down menu that currently lists base:dev/14.1.0. Click that and select dev/14.0.2 instead.

@yantosca yantosca modified the milestones: 14.1.0, 14.0.2 Nov 7, 2022
@yantosca
Copy link
Contributor Author

yantosca commented Nov 7, 2022

I think what we can do is to move the -g etc. options in the standard compilation into a separate PR and keep the compiling options the same. That way it wont' delay the 14.0.2 release until we have time to properly check the impact of these flags. Thanks for the explanation@kilicomu!

@yantosca yantosca changed the base branch from dev/14.1.0 to dev/14.0.2 November 7, 2022 14:36
@yantosca
Copy link
Contributor Author

yantosca commented Nov 7, 2022

Some GEOS-Chem Classic issues

==============================================================================
GEOS-Chem Classic: Execution Test Results

Using 24 OpenMP threads
Number of execution tests: 48
==============================================================================
 
Execution tests:
------------------------------------------------------------------------------
gc_025x03125_CH4_geosfp_47L_na...................Execute Simulation.....PASS
gc_025x03125_fullchem_geosfp_47L_na..............Execute Simulation.....FAIL
gc_05x0625_CH4_merra2_47L_na.....................Execute Simulation.....PASS
gc_05x0625_fullchem_merra2_47L_na................Execute Simulation.....FAIL
gc_2x25_aerosol_merra2...........................Execute Simulation.....FAIL
gc_2x25_CH4_merra2...............................Execute Simulation.....PASS
gc_2x25_CO2_merra2...............................Execute Simulation.....FAIL
gc_2x25_fullchem_merra2..........................Execute Simulation.....FAIL
gc_2x25_Hg_merra2................................Execute Simulation.....FAIL
gc_2x25_metals_merra2............................Execute Simulation.....PASS
gc_2x25_POPs_BaP_merra2..........................Execute Simulation.....PASS
gc_2x25_tagCH4_merra2............................Execute Simulation.....PASS
gc_2x25_tagCO_merra2.............................Execute Simulation.....PASS
gc_2x25_tagO3_merra2.............................Execute Simulation.....PASS
gc_2x25_TransportTracers_merra2..................Execute Simulation.....PASS
gc_2x25_TransportTracers_merra2_LuoWd............Execute Simulation.....PASS
gc_4x5_aerosol_geosfp............................Execute Simulation.....FAIL
gc_4x5_aerosol_merra2............................Execute Simulation.....FAIL
gc_4x5_CH4_geosfp................................Execute Simulation.....PASS
gc_4x5_CH4_merra2................................Execute Simulation.....PASS
gc_4x5_fullchem_aciduptake_merra2................Execute Simulation.....FAIL
gc_4x5_fullchem_APM_merra2.......................Execute Simulation.....FAIL
gc_4x5_fullchem_benchmark_merra2.................Execute Simulation.....FAIL
gc_4x5_fullchem_complexSOA_merra2................Execute Simulation.....FAIL
gc_4x5_fullchem_complexSOA_SVPOA_merra2..........Execute Simulation.....FAIL
gc_4x5_fullchem_geosfp...........................Execute Simulation.....FAIL
gc_4x5_fullchem_marinePOA_merra2.................Execute Simulation.....FAIL
gc_4x5_fullchem_merra2...........................Execute Simulation.....FAIL
gc_4x5_fullchem_merra2_47L.......................Execute Simulation.....FAIL
gc_4x5_fullchem_merra2_LuoWd.....................Execute Simulation.....FAIL
gc_4x5_fullchem_RRTMG_merra2.....................Execute Simulation.....FAIL
gc_4x5_fullchem_TOMAS15_merra2_47L...............Execute Simulation.....FAIL

Now investigating

@yantosca
Copy link
Contributor Author

yantosca commented Nov 7, 2022

The error seems to be:

*** Error in `./gcclassic': double free or corruption (!prev): 0x00000000c45b85c0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81329)[0x2b2f4fa8e329]
./gcclassic[0xfce3e6]
./gcclassic[0xfce925]
./gcclassic[0xf495da]
./gcclassic[0x5d3896]
./gcclassic[0x4dea20]
./gcclassic[0x486772]
./gcclassic[0x40aadc]
./gcclassic[0x40bdb2]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b2f4fa2f555]
./gcclassic[0x4037b9]

The "double free or corruption" error means that we are trying to deallocate an array or pointer that has already been allocated.

@yantosca yantosca added the no-diff-to-benchmark This update will not change the results of fullchem benchmark simulations label Nov 7, 2022
@yantosca
Copy link
Contributor Author

yantosca commented Nov 7, 2022

I have identified the cause of the error: In HEMCO/src/Extensions/hcox_gfed_mod.F90, we were attempting to deallocate the field Inst%GFED_EMFAC. However, this field simply points to Inst%GFED4_EMFAC and is not allocated with any memory. This was causing the "double-free or corruption" error. The solution to this error is to just nullify the pointer (i.e. Inst%GFED_EMFAC => NULL() instead of trying to deallocate it. Doing this allows GEOS-Chem to proceed to termination gracefully.

Integration tests have been resubmitted.

Also, I have done a parallelization test with 8 cores vs. 5 cores. The results show that the sanitize updates are zero-diff:

Using configuration file rst.yml
... Printing totals and differences
... Only showing variables with |absolute difference| > 0.0
Variable                 Ref=5cores             Dev=8cores                Dev - Ref
```

@yantosca
Copy link
Contributor Author

yantosca commented Nov 7, 2022

The resubmitted GEOS-Chem Classic integration tests all passed:

==============================================================================
GEOS-Chem Classic: Execution Test Results

Using 24 OpenMP threads
Number of execution tests: 48
==============================================================================
 
Execution tests:
------------------------------------------------------------------------------
gc_025x03125_CH4_geosfp_47L_na...................Execute Simulation.....PASS
gc_025x03125_fullchem_geosfp_47L_na..............Execute Simulation.....PASS
gc_05x0625_CH4_merra2_47L_na.....................Execute Simulation.....PASS
gc_05x0625_fullchem_merra2_47L_na................Execute Simulation.....PASS
gc_2x25_aerosol_merra2...........................Execute Simulation.....PASS
gc_2x25_CH4_merra2...............................Execute Simulation.....PASS
gc_2x25_CO2_merra2...............................Execute Simulation.....PASS
gc_2x25_fullchem_merra2..........................Execute Simulation.....PASS
gc_2x25_Hg_merra2................................Execute Simulation.....PASS
gc_2x25_metals_merra2............................Execute Simulation.....PASS
gc_2x25_POPs_BaP_merra2..........................Execute Simulation.....PASS
gc_2x25_tagCH4_merra2............................Execute Simulation.....PASS
gc_2x25_tagCO_merra2.............................Execute Simulation.....PASS
gc_2x25_tagO3_merra2.............................Execute Simulation.....PASS
gc_2x25_TransportTracers_merra2..................Execute Simulation.....PASS
gc_2x25_TransportTracers_merra2_LuoWd............Execute Simulation.....PASS
gc_4x5_aerosol_geosfp............................Execute Simulation.....PASS
gc_4x5_aerosol_merra2............................Execute Simulation.....PASS
gc_4x5_CH4_geosfp................................Execute Simulation.....PASS
gc_4x5_CH4_merra2................................Execute Simulation.....PASS
gc_4x5_fullchem_aciduptake_merra2................Execute Simulation.....PASS
gc_4x5_fullchem_APM_merra2.......................Execute Simulation.....PASS
gc_4x5_fullchem_benchmark_merra2.................Execute Simulation.....PASS
gc_4x5_fullchem_complexSOA_merra2................Execute Simulation.....PASS
gc_4x5_fullchem_complexSOA_SVPOA_merra2..........Execute Simulation.....PASS
gc_4x5_fullchem_geosfp...........................Execute Simulation.....PASS
gc_4x5_fullchem_marinePOA_merra2.................Execute Simulation.....PASS
gc_4x5_fullchem_merra2...........................Execute Simulation.....PASS
gc_4x5_fullchem_merra2_47L.......................Execute Simulation.....PASS
gc_4x5_fullchem_merra2_LuoWd.....................Execute Simulation.....PASS
gc_4x5_fullchem_RRTMG_merra2.....................Execute Simulation.....PASS
gc_4x5_fullchem_TOMAS15_merra2_47L...............Execute Simulation.....PASS
gc_4x5_fullchem_TOMAS40_merra2_47L...............Execute Simulation.....PASS
gc_4x5_Hg_geosfp.................................Execute Simulation.....PASS
gc_4x5_Hg_merra2.................................Execute Simulation.....PASS
gc_4x5_metals_merra2.............................Execute Simulation.....PASS
gc_4x5_POPs_BaP_geosfp...........................Execute Simulation.....PASS
gc_4x5_POPs_BaP_merra2...........................Execute Simulation.....PASS
gc_4x5_tagCH4_geosfp.............................Execute Simulation.....PASS
gc_4x5_tagCH4_merra2.............................Execute Simulation.....PASS
gc_4x5_tagCO_geosfp..............................Execute Simulation.....PASS
gc_4x5_tagCO_merra2..............................Execute Simulation.....PASS
gc_4x5_tagO3_geosfp..............................Execute Simulation.....PASS
gc_4x5_tagO3_merra2..............................Execute Simulation.....PASS
gc_4x5_TransportTracers_geosfp...................Execute Simulation.....PASS
gc_4x5_TransportTracers_geosfp_LuoWd.............Execute Simulation.....PASS
gc_4x5_TransportTracers_merra2...................Execute Simulation.....PASS
gc_4x5_TransportTracers_merra2_LuoWd.............Execute Simulation.....PASS
 
Summary of test results:
------------------------------------------------------------------------------
Execution tests passed: 48
Execution tests failed: 0
Execution tests not yet completed: 0

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%  All execution tests passed!  %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

@yantosca
Copy link
Contributor Author

yantosca commented Nov 7, 2022

The resubmitted GCHP integration tests all passed:

==============================================================================
GCHP: Execution Test Results

Number of execution tests: 3
==============================================================================
 
Execution tests:
------------------------------------------------------------------------------
gchp_fullchem_benchmark_merra2_c48...............Execute Simulation.....PASS
gchp_fullchem_standard_merra2_c24................Execute Simulation.....PASS
gchp_TransportTracers_geosfp_c24.................Execute Simulation.....PASS
 
Summary of execution test results:
------------------------------------------------------------------------------
Execution tests passed:        3
Execution tests failed:        0
Execution tests not completed: 0

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%  All execution tests passed!  %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Copy link
Contributor

@msulprizio msulprizio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to merge into 14.0.2.

@msulprizio
Copy link
Contributor

I just noticed the CHANGELOG.md has not been updated for these fixes. I recommend updating that before merging.

Copy link
Contributor Author

OK, will do. thanks for the reminder!

Added to unreleased 14.0.2 section

Signed-off-by: Bob Yantosca <[email protected]>
CHANGELOG.md Show resolved Hide resolved
Following the suggestion @lizziel, that this is more of a fix than
a change.

Signed-off-by: Bob Yantosca <[email protected]>
@yantosca yantosca merged commit 114f604 into dev/14.0.2 Nov 10, 2022
@yantosca yantosca deleted the feature/sanitize branch November 10, 2022 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: Feature Request New feature or request no-diff-to-benchmark This update will not change the results of fullchem benchmark simulations topic: Structural Modifications Related to GEOS-Chem structural modifications (as opposed to scientific updates)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants