From 308e780954a20dff322804ac2aab2933c447bdd1 Mon Sep 17 00:00:00 2001 From: "Todd V. Jonker" Date: Tue, 9 Apr 2024 14:52:41 -0700 Subject: [PATCH 1/4] Specify syntax of Text E-expressions, and align with template invocations. Resolves #303 --- build-docker.sh | 5 +- src/IonSpec.adoc | 4 +- src/binary-encoding.adoc | 13 +++++ src/eexprs.adoc | 100 +++++++++++++++++++++++++++++++++++++ src/template-expr.adoc | 103 ++++++++++++++++++++++++--------------- 5 files changed, 183 insertions(+), 42 deletions(-) create mode 100644 src/eexprs.adoc diff --git a/build-docker.sh b/build-docker.sh index ac595cb8..6e99157a 100755 --- a/build-docker.sh +++ b/build-docker.sh @@ -51,10 +51,13 @@ function usage() echo echo " -h Show this help." echo " -u Update the Docker image." - echo " -b Execute the build logic inside a new container." + echo " -b Execute the build logic (docker-run.sh) inside a new container." echo " -s Start a shell inside a new container." echo echo "By default, when none of -ubs are given then -ub is assumed." + echo + echo "From a shell inside a container, you can run the build logic directly" + echo "via 'rake'. Run 'rake -T' to see the tasks our Rakefile defines." } while getopts ":ubsh" o; do diff --git a/src/IonSpec.adoc b/src/IonSpec.adoc index 54723099..31dda3f8 100644 --- a/src/IonSpec.adoc +++ b/src/IonSpec.adoc @@ -2,7 +2,7 @@ Ion Team :doctype: book :creator: {author} -:copyright: Copyright ©2023 Amazon.com Inc. or Affiliates (“Amazon”) +:copyright: Copyright ©2023-2024 Amazon.com Inc. 
or Affiliates (“Amazon”) :docinfo: :sectanchors: :sectnums: @@ -41,6 +41,8 @@ include::modules.adoc[] include::signatures.adoc[] +include::eexprs.adoc[] + include::system-module.adoc[] include::template-expr.adoc[] diff --git a/src/binary-encoding.adoc b/src/binary-encoding.adoc index 38be5214..041cbcff 100644 --- a/src/binary-encoding.adoc +++ b/src/binary-encoding.adoc @@ -359,8 +359,21 @@ The meanings of each opcode are organized loosely by their high and low nibbles. |=== +[[bin:eexp]] === Encoding Expressions +The encoding of E-expressions is designed to balance density and generality. +For example, they enable encodings with minimal tag bits, even none at all given +a thoughtful signature. This increases density, but limits generality at the point +of macro invocation. + +The <> and binary forms of E-expressions enforce the same +syntactic constraints on the type and range of data allowed as arguments. +Any syntactically well-formed E-expression can be transcoded between text and binary, +without expansion and without changing semantics, and independent of whether it can +be expanded successfully. + + [[e_expression_with_the_address_in_the_opcode]] ==== E-expression With the Address in the Opcode diff --git a/src/eexprs.adoc b/src/eexprs.adoc new file mode 100644 index 00000000..cd67c709 --- /dev/null +++ b/src/eexprs.adoc @@ -0,0 +1,100 @@ +[[sec:eexprs]] +== Encoding Expressions + +Understanding macro signatures, we can now discuss how macros are leveraged to +encode data. The syntax for macro invocation is called an _encoding expression_ +or E-expression. When the Ion parser encounters an E-expression, it automatically +replaces it with the values produced by the corresponding macro transformation +function. The inputs to that transformation are determined by the arguments +within the E-expression. + +IMPORTANT: This chapter details the syntax of E-expressions in Ion text format; +the corresponding <> enforces the +equivalent constraints. 
+ +In Ion Text, E-expressions look similar to S-expressions but are opened by `(:` +and a reference to the macro that expands the expression. +Text-encoded E-expressions have one of several forms that differ in how they +reference the macro to be invoked: + +* `(:__macro-name__ …)` lookup by unambiguous name in system or local macro tables. +* `(:__address__ …)` lookup by address in current macro table (not the system table). +* `(:__module-name__:__macro-name__ …)` lookup by name in an installed module. +* `(:__module-name__:__address__ …)` lookup by address in an installed module. + +// TODO link or write more precise resolution rules. + +NOTE: The parenthesis, colon, and macro reference are a single syntactic token, +allowing no white space. +The names are neither string nor symbol tokens, and thus may not use quotes or +escapes or `$_uint_` symbol-table addresses. +This reflects the idea that macros behave like new syntactic forms, with this +entire character sequence determining the syntax that follows inside the +expression. + +Following the opening macro reference and whitespace, E-expressions follow +S-expression tokenization and whitespace rules. +The remaining elements are arguments supplying inputs to the macro, +each one either _individual_ or _grouped_. +The syntax that can appear in each argument position is constrained by the +macro’s signature. + +The number of argument elements (that is, the invocation’s actual arity) +must be equal to or greater than the macro’s minimum arity, and at most its +maximum arity, when one exists. +In other words, an E-expression must contain one element for each required +parameter, followed by optional elements for the remaining optional parameters. + +// TODO links for arity, optional parameters +// TODO base type? base shape? base form? encoding? + +The syntax of each argument is defined by the associated parameter’s cardinality +and type. 
+The cardinality determines whether an argument group can be used to collect a +series of individual arguments, and the type determines the syntax of an +individual argument, grouped or not. + +The parameter type constrains the syntax of an individual argument as follows: + +* For tagged types, an individual argument must be a single E-expression or any + datum (which may contain nested E-expressions). +* For primitive types, an individual argument must be a non-null, non-annotated + datum of the corresponding concrete type, within the range accepted by the + primitive type. +* For macro types, an individual argument must be an unannotated S-expression + containing arguments acceptable to that macro’s signature. + Here, the S-expression is implicitly converted to the equivalent E-expression. + +The parameter cardinality constrains the syntax of the overall argument +element(s), particularly whether a group is allowed. +In text E-expressions, argument groups are delimited using the special syntax +`(: …)` where the colon is followed by whitespace or `)` instead of a macro +reference. +Each element of an argument group must fit the same syntax rules as for an +individual argument. + +* A `!` parameter accepts only a single individual argument. +* A `?` parameter accepts either a single individual argument + or an empty argument group `(:)`. +* A `*` or `+` parameter accepts either a single individual argument, + or an argument group `(: …)`. +* A rest parameter captures all remaining arguments of the E-expression, + each of which must match the individual argument syntax. + For a `...+` parameter there must be at least one such argument. + +// TODO clarify whether a `+` group must contain at least one element. + +The rules above determine whether an E-expression is _well-formed_. +Any violation of the above constraints must signal a syntax error when the +E-expression is parsed. + + +// TODO #307 clarify how type and cardinality is enforced during expansion. 
+ +// TODO #307 Clarify whether and when range checks are applied for fixed-width types. +// I believe we decided that they are not verified by template invocations, since they +// are intended to constrain the _encoding_, not the resulting argument values. +// The corresponding concrete type _is_ verified, however, per the above. + + +// TODO expansion process diff --git a/src/template-expr.adoc b/src/template-expr.adoc index ae2af57d..451f6c50 100644 --- a/src/template-expr.adoc +++ b/src/template-expr.adoc @@ -85,6 +85,7 @@ meaning of subsequent elements depends on the operator. Operators come in two varieties: special forms and macro invocations. +[[spec:tl_special]] === Special Forms Special forms are operators that cannot be expressed as macros, because some parts of their @@ -199,52 +200,61 @@ https://github.com/amazon-ion/ion-docs/issues/201 === Macro Invocation A macro definition can express its output in terms of other macros. Quite often, these will be -macros provided by the Ion implementation, -// —everything in System Macros 2023-05 is available by default— -but they can also be acquired from other modules. - -// TODO link to system-macro chapter - -The S-expression syntax for macro invocation is similar to that of E-expressions. -When a template is an S-expression and the first element is not the name of a special form, that -element must instead be a _macro-ref_ and the template denotes a macro invocation. -There are multiple sources of macros: the defining module’s internal environment (which is being -incrementally extended with each definition), and the exported macros of modules loaded by the -enclosing module or `$ion_encoding` directive. - -// TODO See Resolving Macro References: Encoding Modules 2023-05 for the relevant algorithm. - -The remaining elements of the S-expression are argument subforms that denote the inputs to the macro. -These use normal Ion notation, but what’s syntactically acceptable is defined by the macro’s -signature. 
- -The number of such subforms (that is, the invocation’s actual arity) must be equal to or greater -than the macro’s minimum arity, and at most its maximum arity, when one exists. -In other words, an invocation must contain one subform for each required parameter, followed by -optional subforms for the remaining optional parameters. +<>, but there are multiple sources of macros: + +* the defining module’s internal environment (which is being incrementally extended with each definition) +* the macros exported from modules ``load``ed by the enclosing module +* the macros exported from modules ``load``ed by the enclosing `$ion_encoding` directive + +The syntax for macro invocation in a template is similar to that of <>. +When a template is an S-expression and the first element is not the name of a +<>, that element must instead be a <> +and the template denotes a macro invocation. +See <> for the syntax of macro references and the resolution algorithm. + +The remaining elements of the S-expression are arguments supplying inputs to the macro, +each one either _individual_ or _grouped_. +What is syntactically well-formed is defined by the macro’s signature. + +The number of argument elements (that is, the invocation’s actual arity) +must be equal to or greater than the macro’s minimum arity, and at most its +maximum arity, when one exists. +In other words, an invocation must contain one element for each required +parameter, followed by optional elements for the remaining optional parameters. + +The syntax of each argument is defined by the associated parameter’s cardinality +and type. +The cardinality determines whether an argument group can be used to collect a +series of individual arguments, and the type determines the syntax of an +individual argument, grouped or not. + +The parameter type constrains the syntax of an individual argument as follows: + +* For tagged types, an individual argument may be any template. 
+* For primitive types, an individual argument may be any template. +* For macro types, an individual argument must be an unannotated S-expression +containing arguments acceptable to that macro’s signature. +These are implicit invocations of the parameter's declared macro, and the macro name cannot be +provided explicitly. -In the template language, argument groups are written by unannoted S-expressions starting with the +The parameter cardinality constrains the syntax of the overall argument +element(s). +In the template language, argument groups are written by unannotated S-expressions starting with the symbol `;`. +Each element of an argument group must fit the same syntax rules as for an +individual argument. The resulting notation `(; ...)` mirrors the syntax of groups in E-expressions, `(: ...)`. -Within an invocation expression, the syntax of each argument is defined by its parameter’s -declared shape. +* A non-rest parameter of any cardinality accepts either a single individual argument + or an argument group. +* A rest parameter captures all remaining arguments of the invocation, + each of which must match the individual argument syntax. + For a `...+` parameter there must be at least one such argument. -* The subform for a non-rest parameter must match the base type below, or an argument group -containing elements matching the base type. -* A rest parameter captures all remaining subforms of the invocation, each of which must match -the base type. +These rules determine whether a template macro invocation is _well-formed_. +Any violation of the above constraints must signal a syntax error when the +macro definition is compiled. -The base types match as follows: - -* For tagged types, the subform may be any template that produces acceptable values. -* For primitive types, the subform may be any template that produces values accepted by the -corresponding concrete type. 
-* For macro types, the subform must be an unannotated S-expression containing subforms acceptable to -that macro’s signature. These are implicit invocations of the macro, and the macro name cannot be -provided explicitly. - -TODO Clarify when/where range checks are applied for fixed-width types. TODO Examples @@ -252,6 +262,19 @@ TODO Examples TODO +// TODO #307 specify when the above constraints are enforced; particularly whether +// an unused but not well-formed macro signals an error. + +// TODO #307 clarify when type and cardinality constraints are enforced. +// Type and cardinality constraints are _also_ applied each time the macro is invoked, +// ensuring that the type and number of values provided to a parameter _after arguments are expansnded_ +// The base types match as follows: + +// TODO #307 Clarify whether and when range checks are applied for primitive types. +// I believe we decided that they are not verified by template invocations, since they +// are intended to constrain the _encoding_, not the resulting argument values. +// The corresponding concrete type _is_ verified, however, per the above. + === Error Handling TODO From 2ad7461f43b0c6a683ccb916dffe3b8d77b1b0ed Mon Sep 17 00:00:00 2001 From: "Todd V. Jonker" Date: Tue, 16 Apr 2024 12:25:14 -0700 Subject: [PATCH 2/4] Update src/eexprs.adoc Co-authored-by: Matthew Pope <81593196+popematt@users.noreply.github.com> --- src/eexprs.adoc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/eexprs.adoc b/src/eexprs.adoc index cd67c709..2504f982 100644 --- a/src/eexprs.adoc +++ b/src/eexprs.adoc @@ -63,7 +63,8 @@ The parameter type constrains the syntax of an individual argument as follows: primitive type. * For macro types, an individual argument must be an unannotated S-expression containing arguments acceptable to that macro’s signature. - Here, the S-expression is implicitly converted to the equivalent E-expression. + This is called a _macro-shaped argument_. 
+ A macro-shaped argument is implicitly converted to the equivalent E-expression. The parameter cardinality constrains the syntax of the overall argument element(s), particularly whether a group is allowed. From c0790d59e2e20023c0386266d75f3cd099631155 Mon Sep 17 00:00:00 2001 From: "Todd V. Jonker" Date: Tue, 16 Apr 2024 13:07:32 -0700 Subject: [PATCH 3/4] Updates based on PR feedback. --- src/eexprs.adoc | 10 +++++----- src/glossary.adoc | 3 +++ src/signatures.adoc | 12 ++++++++---- 3 files changed, 16 insertions(+), 9 deletions(-) diff --git a/src/eexprs.adoc b/src/eexprs.adoc index 2504f982..5a1d39bd 100644 --- a/src/eexprs.adoc +++ b/src/eexprs.adoc @@ -40,12 +40,12 @@ The syntax that can appear in each argument position is constrained by the macro’s signature. The number of argument elements (that is, the invocation’s actual arity) -must be equal to or greater than the macro’s minimum arity, and at most its -maximum arity, when one exists. -In other words, an E-expression must contain one element for each required -parameter, followed by optional elements for the remaining optional parameters. +must be equal to or greater than the macro’s minimum <>, +and at most its maximum arity, when one exists. +In other words, an E-expression must contain one element for each +<>, followed by optional elements for the +remaining <>. -// TODO links for arity, optional parameters // TODO base type? base shape? base form? encoding? The syntax of each argument is defined by the associated parameter’s cardinality diff --git a/src/glossary.adoc b/src/glossary.adoc index 32e7cf0d..dcf679dc 100644 --- a/src/glossary.adoc +++ b/src/glossary.adoc @@ -111,6 +111,9 @@ Struct-shaped templates treat the field names as literal, but the corresponding templates. S-expressions denote operator invocations and are not treated quasi-literally. +required parameter:: +A macro parameter that is not _optional_ and therefore requires an argument at each invocation. 
+ rest parameter:: A macro parameter—always the final parameter—declared with the `*\...*` or `*\...+*` modifier, that accepts all remaining arguments to the macro as if they were in an implicit _argument group_. diff --git a/src/signatures.adoc b/src/signatures.adoc index e5240560..553fbbae 100644 --- a/src/signatures.adoc +++ b/src/signatures.adoc @@ -134,18 +134,22 @@ Examples: ---- -=== Voidable and Optional Parameters +=== Arity: Required and Optional Parameters +[[def:optional-param]] Parameters with cardinality accepting zero values (declared with modifiers `?`, `*`, or `\...`) are called _voidable_ because their resulting value streams can be void. A parameter is _optional_ when it is voidable and all following parameters are voidable. +[[def:required-param]] +A parameter is _required_ when it is not optional. +Specifically, a parameter is required when it is declared with modifiers +`!`, `\+`, or `...+`, _or_ when any following parameter is required. + Optional parameters are given special treatment in text invocations: their arguments can be omitted entirely, as long as all following arguments are also omitted. - -=== Arity - +[[def:arity]] The _minimum arity_ of a macro is equal to the number of leading non-optional parameters. Assuming no rest-parameter, the _maximum arity_ of the macro is the total number of declared parameters. From 8a48eb25cce91f3d9b9aa3f70f8356b237696405 Mon Sep 17 00:00:00 2001 From: "Todd V. Jonker" Date: Wed, 17 Apr 2024 11:40:35 -0700 Subject: [PATCH 4/4] Update src/template-expr.adoc Co-authored-by: Zack Slayton --- src/template-expr.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/template-expr.adoc b/src/template-expr.adoc index 451f6c50..e753d9bd 100644 --- a/src/template-expr.adoc +++ b/src/template-expr.adoc @@ -267,7 +267,7 @@ TODO // TODO #307 clarify when type and cardinality constraints are enforced. 
// Type and cardinality constraints are _also_ applied each time the macro is invoked, -// ensuring that the type and number of values provided to a parameter _after arguments are expansnded_ +// ensuring that the type and number of values provided to a parameter _after arguments are expanded_ // The base types match as follows: // TODO #307 Clarify whether and when range checks are applied for primitive types.
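
Reviewer note on the arity rules this series adds to src/signatures.adoc and references from src/eexprs.adoc: the definitions of voidable, optional, and required parameters, plus minimum and maximum arity, can be checked with a small sketch. This is only an illustration under stated assumptions, not part of the patch and not an Ion implementation. Representing a signature as a list of cardinality modifier strings, and the function names `arity` and `arity_ok`, are hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple

# Cardinality modifiers as named in the patched signatures.adoc text.
# Representing a signature as a list of these strings is an assumption
# made for illustration only.
VOIDABLE = {"?", "*", "..."}   # cardinalities that accept zero values
REST = {"...", "...+"}         # rest parameters; always the final parameter


def arity(cardinalities: List[str]) -> Tuple[int, Optional[int]]:
    """Return (minimum arity, maximum arity), with None meaning unbounded."""
    # A parameter is optional when it is voidable and every following
    # parameter is voidable, so optional parameters form a trailing run.
    trailing_optional = 0
    for card in reversed(cardinalities):
        if card not in VOIDABLE:
            break
        trailing_optional += 1

    # Minimum arity: the number of leading non-optional parameters.
    minimum = len(cardinalities) - trailing_optional

    # Maximum arity: the parameter count, unless a rest parameter exists.
    has_rest = bool(cardinalities) and cardinalities[-1] in REST
    maximum = None if has_rest else len(cardinalities)
    return minimum, maximum


def arity_ok(cardinalities: List[str], actual: int) -> bool:
    """Check an invocation's actual arity against the signature's bounds."""
    minimum, maximum = arity(cardinalities)
    return actual >= minimum and (maximum is None or actual <= maximum)


if __name__ == "__main__":
    print(arity(["!", "?", "*"]))   # two trailing optional parameters
    print(arity(["?", "!"]))        # a voidable parameter before a required one
    print(arity(["!", "..."]))      # rest parameter removes the upper bound
```

Note that a `?` parameter followed by a `!` parameter still counts toward the minimum arity, which matches the patch's statement that a parameter is required "when any following parameter is required".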