From adccde1066104f78a68d46ca988fdf264b3ed306 Mon Sep 17 00:00:00 2001
From: "Kelly (KT) Thompson"
Date: Tue, 29 Jun 2021 18:20:16 -0600
Subject: [PATCH] Fix formatting.

---
 src/parser/README.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/parser/README.md b/src/parser/README.md
index c491fd30e..6b386f747 100644
--- a/src/parser/README.md
+++ b/src/parser/README.md

@@ -1,10 +1,13 @@

# "Why we did it that way": Parsers

_Kent G. Budge_

## Introduction

In this, the first installment of "Why we did it that way", I will discuss why the parser framework for our experimental drivers was written the way it was, what changes were made based on experience, and how we might change it in the future.

## Motivation

I wanted to allow our test drivers to read input decks that were as human-friendly as possible. To me, this meant that they should be insensitive to white space, should use a reasonably intuitive grammar, and should not spring surprises on the user. I also wanted the parser system to be reasonably easy to maintain and to map naturally onto the data structures used to contain a problem specification.

I was influenced in this by my experience with various programming languages. I did not want the kind of sensitivity to white space evident in older versions of Fortran or in input files read by older programs. I wanted something more C-like in its treatment of white space. I was also influenced by my readings on compiler theory, which is well developed for scanners and parsers.

@@ -14,6 +17,7 @@ I also wished to avoid dependency on outside tools or libraries. This led me to

I drew on my previous experience with developing parsers for input files for the Alegra code at Sandia. As a result, in its original set of features, the Capsaicin parser library was quite similar to the Alegra parser library.

## The original design

Compiler developers make a distinction between a scanner, which reads the raw characters of a file and breaks them into tokens, and a parser, which takes a stream of tokens and parses it into grammatical structures. I adopted this distinction, with `Token_Stream` and its children representing different flavors of scanners, and `Parse_Table` representing a particular kind of parser.

### The scanner

@@ -87,6 +91,7 @@ The client can also report an error to the stream. It may seem odd for the sour

I later added the capability to push a token back onto the stream, though I don't think it has ever been used in our drivers. Likewise, there is the capability to look more than one token ahead, using (for example) `lookahead(3)` to see the fourth token presently in the stream. (`lookahead(0)` is identical to `lookahead()`, and we're counting like mathematicians: 0, 1, 2, 3 ...) So far as I know, lookahead of more than one token has never been used in our drivers. (Lookahead at the next token gets used all the time.)
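To make this concrete, here is a minimal sketch of a scanner interface with the operations described above: consuming a token, peeking ahead with `lookahead(pos)`, pushing a token back, and reporting an error to the stream. Only `lookahead` is named in the text; `Token`, `shift`, `pushback`, `report_syntax_error`, and `fill_` are assumed names for illustration, not the library's actual declarations.

    #include <deque>
    #include <stdexcept>
    #include <string>

    // Assumed minimal token type; a real token would also carry a type
    // tag and a source location.
    struct Token {
      std::string text;
    };

    // Illustrative sketch of a scanner ("token stream") interface.
    class Token_Stream {
    public:
      virtual ~Token_Stream() = default;

      // Remove and return the next token from the stream.
      Token shift() {
        Token result = lookahead();
        buffer_.pop_front();
        return result;
      }

      // Peek pos tokens ahead without consuming anything;
      // lookahead(0) is identical to lookahead().
      Token lookahead(unsigned pos = 0) {
        while (buffer_.size() <= pos)
          buffer_.push_back(fill_());  // pull raw tokens as needed
        return buffer_[pos];
      }

      // Push a token back onto the front of the stream.
      void pushback(Token const &token) { buffer_.push_front(token); }

      // Clients report errors to the stream; a real stream would add
      // the current input location to the message.
      virtual void report_syntax_error(std::string const &message) {
        throw std::runtime_error(message);
      }

    private:
      // Derived classes (the "flavors" of scanner: file, string,
      // console) supply the raw tokens.
      virtual Token fill_() = 0;

      std::deque<Token> buffer_;
    };

Buffering tokens in a deque is what makes arbitrary-depth lookahead and pushback cheap to support, which is presumably why they were easy features to add even though they have seen little use.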
### Parsing

Programming language grammar is a weird and wonderful field in which I've occasionally dabbled. C has a slightly weird version of what is called an LALR(1) grammar; LALR(1) is a large family of grammars for which one can conveniently generate a parser using the original `yacc`. It's slightly weird because of the so-called "scanner trick", which turns a grammar that is not strictly LALR(1) into something that `yacc` can parse as if it _were_ LALR(1). C++ infamously departed from LALR(1) with its very large number of ambiguous grammatical constructs, and so has what is called a GLR grammar; GLR (generalized LR) covers essentially any grammar that can be parsed from left to right. It turns out `bison` has an option permitting it to generate a parser for any GLR language, but you have to add all kinds of stuff to decide which of several possible ways of parsing a construct is the right one.

#### Parse_Table

@@ -97,12 +102,16 @@ I decided early on that we would organize at least the top level of our parsers

Encountering a keyword such as `sn`, for example, would immediately call `parse_sn`, which would parse the rest of the Sn specification. A parser of this kind would simply do a lookup on the keyword to find the right parse function to call.

I left unspecified, and it remains unspecified, just what each parse function called by a keyword would do. The parse function is passed the token stream, and it is allowed to pull off as many additional tokens as it wishes and to do whatever it wants with them. Thus a programmer can insert a very general parser underneath the opening keyword, hand-crafted however he likes -- though this is not usual practice and is not really recommended.
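The lookup itself is simple enough to show in a short sketch. This is not the library's actual `Parse_Table` declaration; it is an assumed minimal version of the keyword-to-parse-function dispatch just described. `parse_sn` and the `sn` keyword come from the text above; `parse_source`, `parse_top_level`, and the map layout are illustrative.

    #include <functional>
    #include <map>
    #include <string>

    class Token_Stream;  // scanner interface, as sketched earlier

    // Hypothetical keyword handlers: each takes over the token stream
    // and parses the remainder of its own specification.
    void parse_sn(Token_Stream &) { /* parse the rest of the Sn spec */ }
    void parse_source(Token_Stream &) { /* parse a source spec */ }

    // A parse table is, at heart, a lookup from opening keyword to
    // parse function.
    using Parse_Function = std::function<void(Token_Stream &)>;

    std::map<std::string, Parse_Function> const parse_table = {
        {"sn", parse_sn}, {"source", parse_source}};

    void parse_top_level(Token_Stream &tokens, std::string const &keyword) {
      auto const it = parse_table.find(keyword);
      if (it != parse_table.end())
        it->second(tokens);  // hand the stream to the keyword's parser
      // A real table would report a syntax error for an unknown keyword.
    }

Whatever the real bookkeeping looks like, the design point stands: the table knows only the opening keyword, and the parse function it dispatches to owns everything that follows.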
#### Expressions

At some point, I decided it would be useful in some contexts to specify something as an expression, including expressions in space or time (for space- or time-dependent sources, for example). I chose a C/C++ expression grammar, for which techniques for efficient parsing are very well established. Thus

    source boundary, boundary flag = 1, rate = 1.0 + 3.0*t

would specify a time-dependent source boundary for a test problem. The `t` in this context now has builtin semantics ("time"), but it is not reserved; you can still have `t` as a keyword in a parse table. The context determines how it is interpreted.

#### Utilities

I found it useful to define a number of functions like

    unsigned parse_unsigned_integer(Token_Stream &);

@@ -142,6 +151,7 @@ and are both supported in an input file and mean roughly the same thing.

The second is preferred if you really mean 1 eV, since the value will be taken from the Draco table of physical constants.

#### Class_Parse_Table

I came to Los Alamos in 2002 with a strong prejudice against templates. They were still relatively new and quite buggy in C++ at that time. Different compilers instantiated templates in different ways, and keeping it all straight on multiple platforms was just a pain in the posterior.

Within a few years, templates became much more reliable and much better standardized. The standard template library, which was still very new in 2002, did a lot to help drive this. We began looking for opportunities to improve the code base by using templates intelligently.

@@ -153,6 +163,7 @@ The `parse_class` template has a default implementation that makes us

Much of the complexity of parsing at present comes from the fact that the more recent refinements to the parse library have never been retrofitted to older classes. This means that "best practice" may not be clear to new developers. However, it's also true that I've not been able to make the `Class_Parser` approach as lightweight as I would like.

#### Abstract_Class_Parse_Table

This is probably the most difficult part of the library. It was written to address the question of how to parse variants of an abstract class. For example,

    transport

@@ -163,16 +174,20 @@ alerts the top level parser in our drivers that we are about to specify a transp

We did not want to name each concrete model at the top level, because this meant that, if we added a new transport model, we'd have to go in and modify the top-level parse table for every driver that might use that class. By nesting the concrete model within an abstract wrapper, we have a single function for parsing a transport model, and it is the only thing that has to change when a new model is added. All drivers will automatically get the new model.

I may have outsmarted myself on this one. I wanted to make it possible for a developer to add a local experimental model without having to change the shared code base to include his new model keyword. So I added a registration function to allow a client to add a new model. All the messiness of `Abstract_Class_Parse_Table` comes from this cool, but ultimately not really needed, requirement. (Yes, I recognize that "not really needed requirement" is an oxymoron.)

#### Unit systems

Capsaicin originally converted all input to SI units and used these consistently in its calculations. This proved difficult when debugging problems that were more naturally expressed in other unit systems (such as cgs). We added the ability for Capsaicin to pick an arbitrary unit system to use consistently, with SI as the default. This was specified in input files with a leading

    unit system, cgs-K

This had to appear before any other input. Otherwise we'd have to retroactively convert input already parsed, which was a horrible thought.

We also added the option, strictly for machine-generated decks, to allow quantities normally requiring units to use implied units as specified by the leading `unit system` keyword. Users really should not do this!

## Move to Draco

The scanner and parser classes we originally wrote for Capsaicin are now in Draco. This is because we wrote other classes, such as quadrature classes and material models, that were also excellent candidates for sharing between projects but which came with their own parsers based on the Capsaicin parser library.

## Our parsers and host codes

Our parser library was written for use within our own project, in our test drivers. The test drivers are essential for performing the large number of regression tests we run nightly, weekly, and before merging pull requests, and they are also essential for our ongoing research work in transport methods. This makes the parser library an essential part of our code base.

However, we use the parsers even in our flat interfaces, which may not seem like an obvious thing to do. Rather than simply construct the objects we need directly with a constructor, we tend to use the parameters passed in by the host code to build an input deck internally as a `String_Token_Stream`, then parse this to actually create the objects (see the sketch at the end of this note).

@@ -182,6 +197,7 @@ There are two reasons for this. The first is that, if there are semantic errors

Is this really a good idea? Randy Baker thought so, and I think the case can certainly be made. But it adds complexity, and it means we have to pull more object files into a linked executable.

## The future

> _Greetings, my friend. We are all interested in the future, for that is where you and I are going to spend the rest of our lives. And remember my friend, future events such as these will affect you in the future._
>
> -- Plan 9 from Outer Space
>
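As a closing illustration of the flat-interface path described under "Our parsers and host codes", here is a hedged sketch of building a deck internally from host-code parameters. Only `String_Token_Stream` is named in the text; the stand-in class body, `make_internal_deck`, and the keyword names are assumptions for illustration, not Capsaicin's actual keywords.

    #include <sstream>
    #include <string>

    // Stand-in for the real String_Token_Stream, which scans tokens
    // out of an in-memory string instead of a file.
    class String_Token_Stream {
    public:
      explicit String_Token_Stream(std::string text)
          : text_(std::move(text)) {}
      std::string const &text() const { return text_; }

    private:
      std::string text_;
    };

    // Build an input deck internally from parameters passed in by a
    // host code, then hand it to the same parsing machinery used for
    // ordinary input files, so the parser's own semantic checks apply.
    String_Token_Stream make_internal_deck(double opacity,
                                           double specific_heat) {
      std::ostringstream deck;
      deck << "opacity = " << opacity << "\n"
           << "specific heat = " << specific_heat << "\n"
           << "end\n";
      return String_Token_Stream(deck.str());
    }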