Reducing permissiveness of parser #160

Open
aidanheerdegen opened this issue Jan 29, 2024 · 5 comments

Comments

@aidanheerdegen
Contributor

I am using f90nml to pull namelists out of a model log file.

A description and context are available here:

aekiss/run_summary#32

TL;DR: the MOM5 stdout contains the following text:

 &OCEAN_SHORTWAVE_GFDL_NML
 USE_THIS_MODULE = T,
 READ_CHL        = T,
 CHL_DEFAULT     =  8.000000000000000E-002,
 ZMAX_PEN        =   1000000.00000000     ,
 SW_FRAC_TOP     =  0.000000000000000E+000,
 DEBUG_THIS_MODULE       = F,
 ENFORCE_SW_FRAC = T,
 OVERRIDE_F_VIS  = T,
 SW_MOREL_FIXED_DEPTHS   = F,
 OPTICS_FOR_UNIFORM_CHL  = F,
 OPTICS_MOREL_ANTOINE    = F,
 OPTICS_MANIZZA  = T
 /
NOTE from PE     0: ==>Note: USING shortwave_gfdl_mod.
=>Note: Using shortwave penetration with GFDL formulaton & Manizza etal optics.
NOTE from PE     0: ==>Note: Reading in chlorophyll-a from data file for shortwave penetration.
=>Note: computing solar shortwave penetration. Assume stf has sw-radiation field
  included.  Hence, solar shortwave penetration effects placed in sw_source will
  subtract out the effects of shortwave at k=1 to avoid double-counting.
 ==>Note: Setting optical model coefficients assuming nonuniform chl distribution.



 &OCEAN_SPONGES_TRACER_NML
 USE_THIS_MODULE = F,
 DAMP_COEFF_3D   = F
 /

The parser interprets the & in the purely descriptive middle paragraph as the start of a namelist and parses the rest of the text like so:


manizza:
  ':': null
  '=':
  - '>'
  - Note
  - ':'
  - Setting
  - optical
  - model
  - coefficients
  - assuming
  - nonuniform
  - chl
  - distribution.
  penetration.:
  - '>'
  - Note
  - ':'
  - computing
  - solar
  - shortwave
  - penetration.
  - Assume
  - stf
  - has
  - sw-radiation
  - field
  - included.
  - Hence
  - solar
  - shortwave
  - penetration
  - effects
  - placed
  - in
  - sw_source
  - will
  - subtract
  - out
  - the
  - effects
  - of
  - shortwave
  - at
  k:
  - 1
  - to
  - avoid
  double-counting.: null

I know this is a tough ask, and not exactly a well-supported use case, but I'd really like it if I could ask the parser to be more strict. The paragraph it interpreted as a namelist group doesn't have a closing / for example.

Would it be possible, or desirable, to have a --strict mode, or similar, that required what I would describe as "well-formed" namelists, e.g. the start of a group is an & which is the first character on a line, preceded only by whitespace? Similarly for the end of a namelist group and /.
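
As a rough illustration of what such a strict mode could mean, here is a minimal pre-filter sketch in plain Python. It is not part of f90nml: the function name `extract_strict_groups` and both patterns are hypothetical, and it assumes the group name sits alone on its line and the closing `/` sits alone on its line, as in the MOM5 output above. That is stricter than what Fortran itself requires.

```python
import re

# Hypothetical "strict" pre-filter; not part of f90nml.
# Assumption: a group starts on a line holding only '&' + name (leading
# blanks allowed) and ends on a line holding only '/'.
GROUP_START = re.compile(r"^\s*&[A-Za-z][A-Za-z0-9_]*\s*$")
GROUP_END = re.compile(r"^\s*/\s*$")


def extract_strict_groups(text):
    """Return the well-formed namelist groups found in text, as strings."""
    groups = []
    current = None  # lines of the group being collected, or None
    for line in text.splitlines():
        if current is None:
            # Outside a group: ignore everything except a strict group start.
            if GROUP_START.match(line):
                current = [line]
        else:
            current.append(line)
            if GROUP_END.match(line):
                groups.append("\n".join(current))
                current = None
    # A group that never reaches its closing '/' is silently dropped.
    return groups


log = """\
 &OCEAN_SHORTWAVE_GFDL_NML
 USE_THIS_MODULE = T,
 OPTICS_MANIZZA  = T
 /
=>Note: Using shortwave penetration with GFDL formulaton & Manizza etal optics.
 &OCEAN_SPONGES_TRACER_NML
 USE_THIS_MODULE = F,
 DAMP_COEFF_3D   = F
 /
"""

groups = extract_strict_groups(log)
print(len(groups))  # 2
```

Each extracted string could then be handed to f90nml (for example via `f90nml.reads`, assuming the installed version provides it), so the parser never sees the descriptive text between groups.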

@marshallward
Owner

I think you might be in luck. The standard might actually agree with your interpretation to some extent:

Input for a namelist input statement consists of
(1) optional blanks and namelist comments,
(2) the character & followed immediately by the namelist-group-name as specified in the NAMELIST statement,
(3) one or more blanks,
(4) a sequence of zero or more name-value subsequences separated by value separators, and
(5) a slash to terminate the namelist input.

In other words, an & followed by a blank does not start a namelist group, and f90nml is in error here.
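
Rule (2) above can be sketched as a single pattern. This is only an illustration, not f90nml's code: the name `group_names` is hypothetical, and it assumes ASCII Fortran names (a letter followed by letters, digits, or underscores).

```python
import re

# '&' followed immediately by a name: a letter, then letters, digits, or '_'.
# Assumption: ASCII names only; illustration, not f90nml's implementation.
GROUP_NAME = re.compile(r"&([A-Za-z][A-Za-z0-9_]*)")


def group_names(line):
    """Return the group names that an '&' in this line would introduce."""
    return GROUP_NAME.findall(line)


print(group_names(" &OCEAN_SPONGES_TRACER_NML"))              # ['OCEAN_SPONGES_TRACER_NML']
print(group_names("GFDL formulaton & Manizza etal optics."))  # []
print(group_names("optics &Manizza etal"))                    # ['Manizza']
```

The second call returns nothing because a blank separates the & from the word after it; the third shows that the same word with no blank would count as a group start.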

I even tested this out in a test GFortran program and it had no problem skipping over the & Manizza etal content and reading &ocean_sponges_tracer_nml ... /. So again, f90nml looks like the one in error.

I have to say... this really looks like a preprocessing job on your end :P but an error's an error. I will have a crack at it.

@marshallward
Owner

... and we'll just say "No comment" regarding the first requirement. In my experience, compilers have always seemed to be very generous about their handling of the space between namelist groups.

@aidanheerdegen
Contributor Author

this really looks like a preprocessing job on your end

TBH I am using f90nml as the pre-processor, for which it does an admirable job. Thanks!

It would be a pain to effectively re-invent what f90nml is doing to figure out what isn't a legit namelist. Scrabbling around in the entrails of STDOUT trying to recreate semantic structure feels very 1980ish, and yet here we (I) still are (am).

@marshallward
Owner

It would be a pain to effectively re-invent what f90nml is doing to figure out what isn't a legit namelist.

I think this might be the problem: F90nml does a very poor job of detecting what is and isn't a valid namelist. There are a lot of hidden assumptions that the input is a namelist. So it may not actually be well suited to handling namelist groups embedded in other text. (There are similar open issues where people have tried to use F90nml to parse namelist-like files, and it rarely works correctly.)

On top of that, &Manizza without the blank would be a valid namelist group, and your typical Fortran program would crash if it were to encounter &Manizza. In this case, the correct response would be to raise an error. Or (in our case, unfortunately) feed back a bunch of garbage.

But... you have fallen into an interesting corner case with that extra whitespace, so a solution is likely.


As for that solution, I have taken a look. There is a small issue because the token iterator automatically skips over whitespace, which is why the Manizza group is being created. If I can make this optional, then I believe this can be fixed. & may be the only token in the entire namelist grammar which forbids whitespace as the next token.
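
To make the whitespace point concrete, here is a toy tokenizer, far simpler than f90nml's real one. Every name in it (`TOKEN`, `tokens`, `find_groups`, the `skip_whitespace` flag) is hypothetical. It only shows how dropping whitespace tokens hides the blank after the &, and how keeping them exposes it.

```python
import re

# Toy tokenizer; an assumption-laden sketch, not f90nml's implementation.
# Tokens are whitespace runs, names, or any other single character.
TOKEN = re.compile(r"\s+|[A-Za-z][A-Za-z0-9_]*|.")


def tokens(text, skip_whitespace=True):
    """Yield tokens; whitespace runs are dropped when skip_whitespace is True."""
    for match in TOKEN.finditer(text):
        tok = match.group()
        if skip_whitespace and tok.isspace():
            continue
        yield tok


def find_groups(text, skip_whitespace):
    """Return the names whose preceding token is '&'."""
    toks = list(tokens(text, skip_whitespace))
    return [nxt for tok, nxt in zip(toks, toks[1:])
            if tok == "&" and nxt[0].isalpha()]


line = "GFDL formulaton & Manizza etal optics."
print(find_groups(line, skip_whitespace=True))   # ['Manizza']
print(find_groups(line, skip_whitespace=False))  # []
```

With whitespace skipped, the token after & is the name, so a group is created; with whitespace kept, the token after & is a blank and no group starts.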

@aidanheerdegen
Contributor Author

As long as I'm special that's all I care about.
