Reducing permissiveness of parser #160

aidanheerdegen · 2024-01-29T03:08:34Z

I am using f90nml to pull namelists out of a model log file.

A description and context are available here:

TL;DR the (MOM5) stdout contains the following text:

 &OCEAN_SHORTWAVE_GFDL_NML
 USE_THIS_MODULE = T,
 READ_CHL        = T,
 CHL_DEFAULT     =  8.000000000000000E-002,
 ZMAX_PEN        =   1000000.00000000     ,
 SW_FRAC_TOP     =  0.000000000000000E+000,
 DEBUG_THIS_MODULE       = F,
 ENFORCE_SW_FRAC = T,
 OVERRIDE_F_VIS  = T,
 SW_MOREL_FIXED_DEPTHS   = F,
 OPTICS_FOR_UNIFORM_CHL  = F,
 OPTICS_MOREL_ANTOINE    = F,
 OPTICS_MANIZZA  = T
 /
NOTE from PE     0: ==>Note: USING shortwave_gfdl_mod.
=>Note: Using shortwave penetration with GFDL formulaton & Manizza etal optics.
NOTE from PE     0: ==>Note: Reading in chlorophyll-a from data file for shortwave penetration.
=>Note: computing solar shortwave penetration. Assume stf has sw-radiation field
  included.  Hence, solar shortwave penetration effects placed in sw_source will
  subtract out the effects of shortwave at k=1 to avoid double-counting.
 ==>Note: Setting optical model coefficients assuming nonuniform chl distribution.



 &OCEAN_SPONGES_TRACER_NML
 USE_THIS_MODULE = F,
 DAMP_COEFF_3D   = F
 /

The parser interprets the & in the purely descriptive middle paragraph as the start of a namelist and parses the rest of the text like so:

manizza:
  ':': null
  '=':
  - '>'
  - Note
  - ':'
  - Setting
  - optical
  - model
  - coefficients
  - assuming
  - nonuniform
  - chl
  - distribution.
  penetration.:
  - '>'
  - Note
  - ':'
  - computing
  - solar
  - shortwave
  - penetration.
  - Assume
  - stf
  - has
  - sw-radiation
  - field
  - included.
  - Hence
  - solar
  - shortwave
  - penetration
  - effects
  - placed
  - in
  - sw_source
  - will
  - subtract
  - out
  - the
  - effects
  - of
  - shortwave
  - at
  k:
  - 1
  - to
  - avoid
  double-counting.: null

I know this is a tough ask, and not exactly a well supported use-case, but I'd really like it if I could ask the parser to be more strict. The paragraph it interpreted as a namelist group doesn't have a closing / for example.

Would it be possible, or desirable, to have a --strict mode, or similar, that required what I would describe as "well-formed" namelists, e.g. start of a group is an & which is the first character on a line, preceded only by whitespace? Similar for the end of a namelist group and /.
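
Until something like that exists, the "well-formed" rule described above can be approximated as a pre-filter over the log text. The following is only a sketch under stated assumptions: the function name `extract_namelist_groups` is hypothetical, it is not part of f90nml, and it assumes the group name sits alone on its opening line and the terminating slash sits alone on its closing line, as in the MOM5 output shown here.

```python
import re

# Assumed "strict" rules (hypothetical, not an f90nml feature):
#   start: first non-blank character is '&', followed immediately by a name,
#          with nothing else on the line
#   end:   a line containing only '/'
GROUP_START = re.compile(r"^\s*&[A-Za-z][A-Za-z0-9_]*\s*$")
GROUP_END = re.compile(r"^\s*/\s*$")


def extract_namelist_groups(text):
    """Return the text of each well-formed namelist group found in text."""
    groups = []
    current = None  # lines of the group being collected, or None
    for line in text.splitlines():
        if current is None:
            # Outside a group: ignore everything except a strict group start.
            if GROUP_START.match(line):
                current = [line]
        else:
            current.append(line)
            if GROUP_END.match(line):
                groups.append("\n".join(current))
                current = None
    # A group that never reaches its closing '/' is silently dropped.
    return groups
```

Each returned string holds exactly one group, so it could then be handed to `f90nml.reads` one at a time. The NOTE lines between groups never reach the parser.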

marshallward · 2024-01-29T03:39:16Z

I think you might be in luck. The standard might actually agree with your interpretation to some extent:

Input for a namelist input statement consists of
(1) optional blanks and namelist comments,
(2) the character & followed immediately by the namelist-group-name as specified in the NAMELIST statement,
(3) one or more blanks,
(4) a sequence of zero or more name-value subsequences separated by value separators, and
(5) a slash to terminate the namelist input.

In other words, & followed by a blank does not start a namelist group, and f90nml is in error here.
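
Rule (2) can be written as a small check. This is a sketch for illustration only: `group_name_at` is a hypothetical helper rather than anything in f90nml, and it assumes a name is a letter followed by letters, digits, or underscores.

```python
import re

# '&' must be followed immediately by the group name: no blank in between.
GROUP_NAME = re.compile(r"&([A-Za-z][A-Za-z0-9_]*)")


def group_name_at(text, pos):
    """Return the group name if text[pos] is '&' followed immediately by a name.

    Returns None when the '&' is followed by a blank or any other character,
    which is the case for the descriptive log line in this issue.
    """
    match = GROUP_NAME.match(text, pos)
    return match.group(1) if match else None
```

With the log line from the issue, the `&` in `formulaton & Manizza` gives `None`, while `&OCEAN_SPONGES_TRACER_NML` gives the group name.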

I even tested this out in a test GFortran program and it had no problem skipping over the & Manizza etal content and reading &ocean_sponges_tracer_nml ... /. So again, f90nml looks like the one in error.

I have to say... this really looks like a preprocessing job on your end :P but an error's an error. I will have a crack at it.

marshallward · 2024-01-29T03:42:53Z

... and we'll just say "No comment" regarding the first requirement. In my experience, compilers have always seemed to be very generous about their handling of the space between namelist groups.

aidanheerdegen · 2024-01-29T06:07:54Z

> this really looks like a preprocessing job on your end

TBH I am using f90nml as the pre-processor, for which it does an admirable job. Thanks!

It would be a pain to effectively re-invent what f90nml is doing to figure out what isn't a legit namelist. Scrabbling around in the entrails of STDOUT trying to recreate semantic structure feels very 1980ish, and yet here we (I) still are (am).

marshallward · 2024-01-29T14:45:48Z

> It would be a pain to effectively re-invent what f90nml is doing to figure out what isn't a legit namelist.

I think this might be the problem: F90nml does a very poor job of detecting what is and isn't a valid namelist. There are a lot of hidden assumptions that the input is a namelist. So it may not actually be well suited to handle namelist groups embedded in other text. (There are similar open issues where people have tried to use F90nml to parse namelist-like files, and it rarely works correctly.)

On top of that, &Manizza without the blank would be a valid namelist group, and your typical Fortran program would crash if it were to encounter &Manizza. In this case, the correct response would be to raise an error. Or (in our case, unfortunately) feed back a bunch of garbage.

But... you have fallen into an interesting corner case with that extra whitespace, so a solution is likely.

As for that solution, I have taken a look. There is a small issue because the token iterator automatically skips over whitespace, which is why the Manizza group is being created. If I can make this optional, then I believe this can be fixed. & may be the only token in the entire namelist grammar which forbids whitespace as the next token.
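
To illustrate the idea, here is a toy tokenizer, which is not f90nml's actual implementation. The names `tokens`, `find_groups`, and the `skip_whitespace` flag are all hypothetical. It shows how silently dropping whitespace tokens makes `& Manizza` indistinguishable from `&Manizza`, and how keeping them lets the group detection reject it.

```python
import re

# Toy token pattern: a whitespace run, a name, or any single other character.
TOKEN = re.compile(r"\s+|[A-Za-z_]\w*|\S")


def tokens(text, skip_whitespace=True):
    """Yield tokens, optionally dropping whitespace runs."""
    for match in TOKEN.finditer(text):
        tok = match.group()
        if skip_whitespace and tok.isspace():
            continue
        yield tok


def find_groups(text, skip_whitespace=True):
    """Return every name that appears as the token right after an '&'."""
    names = []
    toks = tokens(text, skip_whitespace)
    for tok in toks:
        if tok == "&":
            nxt = next(toks, None)
            # A group starts only if the very next token is a name.
            if nxt is not None and nxt[0].isalpha():
                names.append(nxt)
    return names
```

On text containing both `formulaton & Manizza etal` and `&ocean_sponges_tracer_nml`, calling `find_groups(text)` with the default flag reports a spurious `Manizza` group, while `find_groups(text, skip_whitespace=False)` reports only `ocean_sponges_tracer_nml`.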

aidanheerdegen · 2024-01-30T02:55:38Z

As long as I'm special that's all I care about.
