Include as an MLang expression #703

johnwikman · 2023-03-13T14:31:29Z

This is the bit of code that was lifted out from PR #694 (MLang AST in MCore). The idea of having an include statement ties in with how include handling will be done in the bootstrapping stage.

Instead of include being a copy-paste of code (à la C), the included file would be parsed and symbolized separately, and its symbolized identifiers would be added to the scope where the include occurs. This slightly changes the include semantics: the following program with two includes is currently valid but would become invalid:

-- testA.mc
let strA = "I am string A..."
mexpr ()

-- testB.mc
let strB = concat strA " and I am string B"
mexpr ()

-- test.mc
include "testA.mc"
include "testB.mc"
mexpr
print strB; print "\n"

This currently works with the boot parser since it simply concatenates the includes. In the bootstrapped parser the includes would be parsed independently, so testB.mc would fail with an error that strA is an unknown variable.
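To make the difference concrete, here is a rough Python model of the two strategies. It is only a sketch: the FILES table, the BUILTINS set, and both function names are made up for illustration and are not the boot parser or the Miking AST. It also assumes that an included file re-exports whatever it included itself, which the proposal above does not settle.

```python
from typing import Dict, List, Set

# Toy model, not the Miking AST: each file lists its includes and its
# top-level lets as (defined_name, referenced_names).
FILES: Dict[str, dict] = {
    "testA.mc": {"includes": [], "lets": [("strA", [])]},
    "testB.mc": {"includes": [], "lets": [("strB", ["concat", "strA"])]},
    "test.mc": {
        "includes": ["testA.mc", "testB.mc"],
        "lets": [("main", ["print", "strB"])],
    },
}
BUILTINS: Set[str] = {"concat", "print"}


def check_concatenated(path: str) -> List[str]:
    """Splice included lets in place, then resolve names in one pass."""

    def flatten(p: str) -> list:
        lets = []
        for inc in FILES[p]["includes"]:
            lets.extend(flatten(inc))
        lets.extend((p, name, refs) for name, refs in FILES[p]["lets"])
        return lets

    scope = set(BUILTINS)
    errors: List[str] = []
    for origin, name, refs in flatten(path):
        errors.extend(
            f"{origin}: unknown variable {r}" for r in refs if r not in scope
        )
        scope.add(name)
    return errors


def check_separately(path: str) -> List[str]:
    """Symbolize each file on its own; a file only sees its own includes."""
    errors: List[str] = []
    exported: Dict[str, Set[str]] = {}

    def symbolize(p: str) -> Set[str]:
        if p in exported:  # each file is processed at most once
            return exported[p]
        scope = set(BUILTINS)
        for inc in FILES[p]["includes"]:
            scope |= symbolize(inc)
        for name, refs in FILES[p]["lets"]:
            for r in refs:
                if r not in scope:
                    errors.append(f"{p}: unknown variable {r}")
            scope.add(name)
        # Assumption: a file re-exports what it included.
        exported[p] = scope - BUILTINS
        return exported[p]

    symbolize(path)
    return errors
```

In this model check_concatenated("test.mc") reports nothing, while check_separately("test.mc") reports that strA is unknown in testB.mc. Adding include "testA.mc" to testB.mc makes both checks pass.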

The reason for having include as an expression is to ensure, in generated code, that unsymbolized identifiers refer to the intended functions/types/constructors in some library. E.g. if I want to access the result identifier from result.mc, I could simply put include "result.mc" at the start of the generated code and not have to worry about where my generated expression is being placed.

The current use case for this kind of feature is the code generated for the LR(k) parser (see lrGenerateParser in parser/lrk.mc), where there are unsymbolized references to ResultErr, ResultOk, int2string, mergeInfo, join, etc. all over the place. There is currently nothing guaranteeing that these will be symbolized to the intended definitions; instead, the generated expressions just assume that nothing else will bind to these identifiers.
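The capture problem can be shown with a small Python model of name resolution. This is a sketch under assumptions: the LIBRARY table, the "origin:name" strings standing in for symbols, and the function names are all hypothetical, and the contents listed for result.mc are only for illustration.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical library contents, for illustration only.
LIBRARY: Dict[str, List[str]] = {"result.mc": ["ResultOk", "ResultErr"]}


def include_frame(path: str) -> Dict[str, str]:
    """Scope frame binding each name of `path` to a symbol tagged with its origin."""
    return {name: f"{path}:{name}" for name in LIBRARY[path]}


def resolve(name: str, scopes: List[Dict[str, str]]) -> Optional[str]:
    """The innermost (last) frame that binds `name` wins."""
    for frame in reversed(scopes):
        if name in frame:
            return frame[name]
    return None


def symbolize_generated(
    names: Iterable[str],
    surrounding: List[Dict[str, str]],
    includes: Iterable[str] = (),
) -> Dict[str, Optional[str]]:
    """Resolve the unsymbolized names of a generated expression.

    Frames for `includes` are pushed innermost, modelling an include
    expression placed at the start of the generated code.
    """
    scopes = list(surrounding) + [include_frame(p) for p in includes]
    return {n: resolve(n, scopes) for n in names}


# The user's program includes result.mc but later defines its own ResultOk.
user_scope = [include_frame("result.mc"), {"ResultOk": "user.mc:ResultOk"}]

captured = symbolize_generated(["ResultOk"], user_scope)
pinned = symbolize_generated(["ResultOk"], user_scope, includes=["result.mc"])
```

Here captured["ResultOk"] is "user.mc:ResultOk", i.e. the generated code silently binds to the user's constructor, whereas pinned["ResultOk"] is "result.mc:ResultOk" regardless of where the expression is placed.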

elegios · 2023-03-13T15:29:57Z

I think there are a couple of requirements and capabilities we should probably separate here:

Namespace handling: what unsymbolized Names should resolve to "by default". This would probably be pretty similar to OCaml's local opens, given the design of use right now.
Ensuring that the definitions from another file are loaded/parsed and available.

Both parsed code and generated code (things like parser generators or the utest mechanism, as opposed to code generated in other languages for our backends, e.g., OCaml) need both of these, but it's not certain that they should work by the same mechanism.

For example, should we try to generate code that is already symbolized and/or typechecked? This is potentially error-prone if done manually, but on the other hand we often have a significant amount of information when we write the generating code, so it seems like a waste performance-wise to have to run those passes after we generate such code. We could also mitigate the risk by having optional passes that test correctness of already symbolized/typechecked code (that things are properly in scope and/or types agree).
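As a rough idea of what such an optional scope check could look like, here is a Python sketch over a tiny lambda-calculus AST. The Sym, Var, Lam, App, and Let classes are invented for this example and are much smaller than the real MExpr AST; the sketch only checks that every symbolized variable is bound, not that types agree.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union


@dataclass(frozen=True)
class Sym:
    """A symbolized name: same string, different id means different binding."""
    name: str
    id: int


@dataclass
class Var:
    sym: Sym


@dataclass
class Lam:
    param: Sym
    body: "Expr"


@dataclass
class App:
    fn: "Expr"
    arg: "Expr"


@dataclass
class Let:
    sym: Sym
    value: "Expr"
    body: "Expr"


Expr = Union[Var, Lam, App, Let]


def out_of_scope(expr: Expr, env: FrozenSet[Sym] = frozenset()) -> List[Sym]:
    """Return every variable occurrence whose symbol has no enclosing binder."""
    if isinstance(expr, Var):
        return [] if expr.sym in env else [expr.sym]
    if isinstance(expr, Lam):
        return out_of_scope(expr.body, env | {expr.param})
    if isinstance(expr, App):
        return out_of_scope(expr.fn, env) + out_of_scope(expr.arg, env)
    if isinstance(expr, Let):
        # Non-recursive let: the bound symbol is visible only in the body.
        return out_of_scope(expr.value, env) + out_of_scope(
            expr.body, env | {expr.sym}
        )
    raise TypeError(f"unexpected node: {expr!r}")


x1, x2 = Sym("x", 1), Sym("x", 2)
well_scoped = Lam(x1, Var(x1))
# Same printed name, but the body refers to a symbol nobody binds.
mis_symbolized = Lam(x1, Var(x2))
```

Calling out_of_scope(well_scoped) returns an empty list, while out_of_scope(mis_symbolized) returns [x2], which is the kind of mistake that is easy to make when constructing symbolized code by hand.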

To make the implementation of such things easier we can probably use some things we'll have to make anyway as a part of bootstrapping. I can imagine that we need to pass around some data structure that keeps track of files parsed and their definitions. This could be used when we generate code to look up the symbols directly, i.e., generated code needs neither include nor use.
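A minimal Python sketch of such a structure might look as follows. Everything here is hypothetical (the FileRegistry class, its methods, and the tuple used for the generated node); the actual bootstrapping design may differ substantially.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Sym:
    name: str
    id: int


class FileRegistry:
    """Tracks which files have been loaded and the symbols they define."""

    def __init__(self) -> None:
        self._defs: Dict[str, Dict[str, Sym]] = {}
        self._fresh = itertools.count()

    def register(self, path: str, names: Iterable[str]) -> Dict[str, Sym]:
        """Record the definitions of `path`; loading a file twice is a no-op."""
        if path not in self._defs:
            self._defs[path] = {n: Sym(n, next(self._fresh)) for n in names}
        return self._defs[path]

    def lookup(self, path: str, name: str) -> Sym:
        """Fetch the symbol for `name` as defined in `path` (KeyError if absent)."""
        return self._defs[path][name]


registry = FileRegistry()
registry.register("result.mc", ["ResultOk", "ResultErr"])

# A code generator asks for the symbol directly instead of emitting an
# unsymbolized name and hoping it resolves correctly later.
ok_sym = registry.lookup("result.mc", "ResultOk")
generated_node = ("con_app", ok_sym, "payload")
```

With this approach the generated node already carries the intended symbol, so its meaning does not depend on what is in scope at the point where it is spliced in.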

Reintroduce the include Expr

730a33e
