"MEMLEAK" indications in several acme_developer tests on edison #1636

Closed
ndkeen opened this issue Jul 13, 2017 · 5 comments
@ndkeen
Contributor

ndkeen commented Jul 13, 2017

For a while now, the "smallville" test has been issuing a MEMLEAK warning on several platforms, and it was suggested that we ignore it. But in the last month or so, I see 5 new MEMLEAKs in acme_developer tests -- but only on edison. I don't see these warnings on cori-haswell or cori-knl. I also tried the intel v17 compiler on edison and see the same thing (edison currently uses intel v15).

It seems unlikely that there really is a memory leak, so it could be an issue with how the memory is reported or with how the decision is made to flag a test as MEMLEAK. I'm just documenting it here.

ERS.f19_g16_rx1.A.edison_intel
ERS_IOP.f19_g16_rx1.A.edison_intel
ERS_IOP4c.f19_g16_rx1.A.edison_intel
ERS_IOP4p.f19_g16_rx1.A.edison_intel
NCK.f19_g16_rx1.A.edison_intel
SMS_Ly2_P1x1.1x1_smallvilleIA.ICLM45CNCROP.edison_intel.force_netcdf_pio

Rob suggested that the following PR may have changed some things with respect to this. I am not sure yet, but the timing of it matches.

#1532

@ndkeen ndkeen added the Edison label Jul 13, 2017
@rljacob
Member

rljacob commented Jul 13, 2017

@ndkeen
Contributor Author

ndkeen commented Jul 14, 2017

Last night I looked at the memory values for one of the tests. On edison, the first day is lower than the rest -- and then they stabilize. On Cori, they are all fairly even. I was going to make some plots, but haven't had time yet. I also haven't had time to look closely at the formula to see how it decides, but it's possible that the first day is what's causing it to think there is a memleak -- would an acceptable solution be to ignore the first day's memory measurement?

@rljacob
Member

rljacob commented Jul 14, 2017

Yeah, if it's comparing the first day with later days, that needs to change. It should not even try to diagnose unless there are at least 2 days of integration.
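
The idea in the two comments above can be sketched as follows. This is only an illustration, not the actual test code: the function name `looks_like_memleak`, the 10% tolerance, and the sample readings are all assumptions made up for the example. It drops the first day's reading and declines to diagnose unless at least two readings remain.

```python
def looks_like_memleak(daily_mem_mb, tolerance=0.1, skip_first_day=True):
    """Return True if memory grew by more than `tolerance` (a fraction).

    daily_mem_mb: per-day memory readings in MB, oldest first.
    The name, the tolerance and the policy are hypothetical.
    """
    # Optionally ignore the first day, which may be lower than steady state.
    samples = daily_mem_mb[1:] if skip_first_day else list(daily_mem_mb)
    # Do not try to diagnose without at least two readings to compare.
    if len(samples) < 2:
        return False
    first, last = samples[0], samples[-1]
    if first <= 0:
        return False
    # Relative growth between the first and last retained readings.
    return (last - first) / first > tolerance


# Made-up readings shaped like the edison behavior described above:
# a low first day, then stable values.
readings = [800.0, 1000.0, 1002.0, 1001.0]
flag_with_first_day = looks_like_memleak(readings, skip_first_day=False)
flag_without_first_day = looks_like_memleak(readings)
```

With these made-up numbers, including the first day trips the check, while skipping it does not.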

jgfouca pushed a commit that referenced this issue Jul 14, 2017
New match attribute that can be 'last' or 'first' for values match in component.py

Currently there is confusion as to how matches are found for multiple
<value> elements in a <values> node.

    component.py is currently using a matching algorithm that picks the
    last match in case of multiple matches that are found. This
    matching algorithm is used anytime a Component object is
    instantiated (currently occurs in config_component.xml). By default
    if the match attribute DOES NOT appear, then the last match will
    be used, to make things backwards compatible.

    namelist_definition_<component>.xml uses the entry_id.py
    matching algorithm which picks the first match in case of
    multiple matches being found. So for setting namelists the first
    match is picked.

This PR adds a new, optional, attribute to the <entry> element in
EITHER a config_component.xml, config_compset.xml or namelist_definition_<component>.xml file.

<entry id="<name>">
   <values match="last"> will pick the last best match
   <values match="first"> will pick the first best match
      <value>...</value>
      <value>...</value>
   </values>
</entry>

As a result, there is new flexibility and transparency in how matches
are determined in component.py by adding a match attribute that can
be 'first' or 'last'. Having this be explicit will enable developers
to not trip up on assuming 'first' or 'last' match and be wrong.
This capability has been added to the _get_value_match routine in BOTH
entry_id.py AND component.py. However, the default values differ:

    the default "match" value in entry_id.py is "first"
    the default "match" value in component.py is "last"
    Having these default values differ preserves backwards compatibility when the
    "match" attribute is not there. Moving forwards, it would be good to always
    have a "match" attribute.
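
A minimal sketch of the first-versus-last behavior described above, under stated assumptions: `pick_value` is a hypothetical helper, not the `_get_value_match` routine itself; attributes are compared by plain equality; and "best" is taken to mean the candidate with the most matching attributes. The `match` argument only decides which candidate wins a tie.

```python
def pick_value(values, case_attrs, match="last"):
    """Pick the text of the best-matching candidate.

    values: list of (attributes_dict, text) pairs in file order.
    case_attrs: attribute values describing the current case.
    match: "first" or "last", used only to break ties.
    """
    if match not in ("first", "last"):
        raise ValueError("match must be 'first' or 'last'")
    best_score = -1
    best_text = None
    for attrs, text in values:
        # A candidate qualifies only if every attribute it names agrees.
        if any(case_attrs.get(key) != val for key, val in attrs.items()):
            continue
        score = len(attrs)
        # "last" lets a later candidate replace an equally good earlier one.
        if score > best_score or (score == best_score and match == "last"):
            best_score, best_text = score, text
    return best_text


candidates = [
    ({}, "default"),
    ({"grid": "f19"}, "A"),
    ({"grid": "f19"}, "B"),
    ({"grid": "ne30"}, "C"),
]
case = {"grid": "f19"}
first_pick = pick_value(candidates, case, match="first")  # "A"
last_pick = pick_value(candidates, case, match="last")    # "B"
```

Making `match` explicit in the call mirrors the point of the new attribute: the tie-break no longer depends on which file is being read.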

The new match="last" attribute has been added to all of the data
components component_component.xml and the config_component_cesm.xml
and config_component_acme.xml.

Test suite: scripts_regressions_tests and
also verified that running the prealpha and prebeta tests on
cheyenne, with just namelist comparisons, resulted in identical
namelists when compared to cesm2_0_alpha06m
Test baseline: cesm2_0_alpha06m for cesm
Test namelist changes: None
Test status: bit for bit

Fixes ESMCI/CIME issue 1617

User interface changes?: New match attribute elements that are children of <entry> nodes.

Code review: gold2718
@rljacob
Member

rljacob commented Jul 18, 2017

This is a bug and will be fixed by #1639

jgfouca added a commit that referenced this issue Jul 27, 2017
Swap mem highwater and usage logs for memleak tests

This is needed for correct parsing of mem highwater values from
cpl.log. Previously, memory resident set size was getting parsed while
checking for memleaks.

Fixes #1636

[BFB]

* origin/azamat/mem-usage/swap-highwater:
  Swap mem highwater and usage logs for memleak tests
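
To illustrate the distinction this commit message draws, here is a sketch that reads the highwater value rather than the resident set size from a memory line. The line layout in `LINE_RE` and the sample text are assumptions invented for the example, since the thread does not show the real cpl.log format, and `parse_mem` is a hypothetical helper.

```python
import re

# Hypothetical line layout; the real cpl.log format is not shown in this thread.
LINE_RE = re.compile(
    r"model date =\s*(?P<date>\d+).*?"
    r"memory =\s*(?P<highwater>[\d.]+) MB \(highwater\)\s*"
    r"(?P<usage>[\d.]+) MB \(usage\)"
)


def parse_mem(log_text, field="highwater"):
    """Return (model_date, value_in_MB) pairs for the requested field."""
    if field not in ("highwater", "usage"):
        raise ValueError("field must be 'highwater' or 'usage'")
    results = []
    for line in log_text.splitlines():
        found = LINE_RE.search(line)
        if found:
            results.append((int(found.group("date")), float(found.group(field))))
    return results


# Made-up sample lines in the assumed layout.
sample = (
    "memory_write: model date = 00010102 memory = 1200.50 MB (highwater) 900.25 MB (usage)\n"
    "unrelated line\n"
    "memory_write: model date = 00010103 memory = 1200.50 MB (highwater) 950.00 MB (usage)\n"
)
highwater = parse_mem(sample)
usage = parse_mem(sample, field="usage")
```

Selecting the field by name rather than by position means a reordering of the two numbers in the log cannot silently change which one the memleak check sees.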
@ndkeen
Contributor Author

ndkeen commented Jul 28, 2017

I ran acme_dev on edison with next, and all of the MEMLEAKs are now PASSes.

rljacob added a commit that referenced this issue Aug 3, 2017
Correct parsing of mem highwater values from cpl.log. Previously, memory resident set size was getting parsed while checking for memleaks.

[BFB]
Fixes #1636
jgfouca pushed a commit to ESMCI/cime that referenced this issue Oct 17, 2017
Correct parsing of mem highwater values from cpl.log. Previously, memory resident set size was getting parsed while checking for memleaks.

[BFB]
Fixes E3SM-Project/E3SM#1636
jgfouca pushed a commit to ESMCI/cime that referenced this issue Feb 23, 2018
Correct parsing of mem highwater values from cpl.log. Previously, memory resident set size was getting parsed while checking for memleaks.

[BFB]
Fixes E3SM-Project/E3SM#1636
jgfouca pushed a commit to ESMCI/cime that referenced this issue Mar 13, 2018
Correct parsing of mem highwater values from cpl.log. Previously, memory resident set size was getting parsed while checking for memleaks.

[BFB]
Fixes E3SM-Project/E3SM#1636
rljacob added a commit that referenced this issue Apr 12, 2021
Correct parsing of mem highwater values from cpl.log. Previously, memory resident set size was getting parsed while checking for memleaks.

[BFB]
Fixes #1636

3 participants