Specify syntax of Text E-expressions, and align with template invocations. #308

Merged
merged 4 commits into from
Apr 17, 2024
5 changes: 4 additions & 1 deletion build-docker.sh
@@ -51,10 +51,13 @@ function usage()
echo
echo " -h Show this help."
echo " -u Update the Docker image."
echo " -b Execute the build logic inside a new container."
echo " -b Execute the build logic (docker-run.sh) inside a new container."
echo " -s Start a shell inside a new container."
echo
echo "By default, when none of -ubs are given then -ub is assumed."
echo
echo "From a shell inside a container, you can run the build logic directly"
echo "via 'rake'. Run 'rake -T' to see the tasks our Rakefile defines."
}

while getopts ":ubsh" o; do
4 changes: 3 additions & 1 deletion src/IonSpec.adoc
@@ -2,7 +2,7 @@
Ion Team
:doctype: book
:creator: {author}
:copyright: Copyright ©2023 Amazon.com Inc. or Affiliates (“Amazon”)
:copyright: Copyright ©2023-2024 Amazon.com Inc. or Affiliates (“Amazon”)
:docinfo:
:sectanchors:
:sectnums:
@@ -41,6 +41,8 @@ include::modules.adoc[]

include::signatures.adoc[]

include::eexprs.adoc[]

include::system-module.adoc[]

include::template-expr.adoc[]
13 changes: 13 additions & 0 deletions src/binary-encoding.adoc
@@ -359,8 +359,21 @@ The meanings of each opcode are organized loosely by their high and low nibbles.
|===


[[bin:eexp]]
=== Encoding Expressions

The encoding of E-expressions is designed to balance density and generality.
For example, they enable encodings with minimal tag bits, even none at all given
a thoughtful signature. This increases density, but limits generality at the point
of macro invocation.

The <<sec:eexprs,text>> and binary forms of E-expressions enforce the same
syntactic constraints on the type and range of data allowed as arguments.
Any syntactically well-formed E-expression can be transcoded between text and binary,
without expansion and without changing semantics, and independent of whether it can
be expanded successfully.


[[e_expression_with_the_address_in_the_opcode]]
==== E-expression With the Address in the Opcode

101 changes: 101 additions & 0 deletions src/eexprs.adoc
@@ -0,0 +1,101 @@
[[sec:eexprs]]
== Encoding Expressions

Understanding macro signatures, we can now discuss how macros are leveraged to
encode data. The syntax for macro invocation is called an _encoding expression_
or E-expression. When the Ion parser encounters an E-expression, it automatically
replaces it with the values produced by the corresponding macro transformation
function. The inputs to that transformation are determined by the arguments
within the E-expression.

IMPORTANT: This chapter details the syntax of E-expressions in Ion text format;
the corresponding <<bin:eexp,Ion binary encoding>> enforces the
equivalent constraints.

In Ion Text, E-expressions look similar to S-expressions but are opened by `(:`
and a reference to the macro that expands the expression.
Text-encoded E-expressions have one of several forms that differ in how they
reference the macro to be invoked:

* `(:__macro-name__ …)` lookup by unambiguous name in system or local macro tables.
* `(:__address__ …)` lookup by address in current macro table (not the system table).
* `(:__module-name__:__macro-name__ …)` lookup by name in an installed module.
* `(:__module-name__:__address__ …)` lookup by address in an installed module.
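
The four reference forms above can be told apart from the opening token alone. The following Python sketch is illustrative only: the function `classify_macro_ref` is hypothetical, and it assumes that names are plain identifiers and addresses are unsigned decimal integers, since this chapter does not give the exact token grammar.

```python
import re

# Assumption: names are plain identifiers and addresses are unsigned decimal
# integers. The chapter text does not spell out the exact token grammar.
_NAME = r"[A-Za-z_][A-Za-z0-9_]*"
_ADDR = r"[0-9]+"
_REF = re.compile(
    rf"\(:(?:(?P<module>{_NAME}):)?(?:(?P<name>{_NAME})|(?P<address>{_ADDR}))(?=[\s)])"
)


def classify_macro_ref(text):
    """Return (module, name, address) for an E-expression opener, or None.

    None is returned when the text does not start with a macro reference,
    for example when `(:` is followed by whitespace or a quoted name.
    """
    match = _REF.match(text)
    if match is None:
        return None
    address = match.group("address")
    return (
        match.group("module"),
        match.group("name"),
        int(address) if address is not None else None,
    )
```

Because the pattern allows no whitespace or quotes between `(:` and the reference, inputs such as `(: 1 2)` or `(:'foo' 1)` are not recognized as macro invocations in this sketch.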

// TODO link or write more precise resolution rules.

NOTE: The parenthesis, colon, and macro reference are a single syntactic token,
allowing no white space.
The names are neither string nor symbol tokens, and thus may not use quotes or
escapes or `$_uint_` symbol-table addresses.
This reflects the idea that macros behave like new syntactic forms, with this
entire character sequence determining the syntax that follows inside the
expression.

Following the opening macro reference and whitespace, E-expressions follow
S-expression tokenization and whitespace rules.
The remaining elements are arguments supplying inputs to the macro,
each one either _individual_ or _grouped_.
The syntax that can appear in each argument position is constrained by the
macro’s signature.

The number of argument elements (that is, the invocation’s actual arity)
must be equal to or greater than the macro’s minimum <<def:arity,arity>>,
and at most its maximum arity, when one exists.
In other words, an E-expression must contain one element for each
<<def:required-param,required parameter>>, followed by optional elements for the
remaining <<def:optional-param,optional parameters>>.

// TODO base type? base shape? base form? encoding?

The syntax of each argument is defined by the associated parameter’s cardinality
and type.
The cardinality determines whether an argument group can be used to collect a
series of individual arguments, and the type determines the syntax of an
individual argument, grouped or not.

The parameter type constrains the syntax of an individual argument as follows:

* For tagged types, an individual argument must be a single E-expression or any
datum (which may contain nested E-expressions).
Contributor

Are we defining "datum" somewhere? Is it distinct from "value" in some way?

Contributor Author

We've not defined 'datum', and it might not be the best term. (Other than this line it's only used in the grammar: (**literal** _datum_).)

I'm using it here to mean "syntax tree" as the semantic model defines it, but that term hasn't been incorporated into this document.

"Value" is not correct, since these are syntax trees and we use "value" to denote the results of expanding syntax trees.

Another way to write this might be:

For tagged types, an individual argument must be a single syntax tree, which may be an E-expression and/or contain nested E-expressions.

But ATM that doesn't seem better since we don't currently talk in detail about the expansion process. (Another gap in the document!)

* For primitive types, an individual argument must be a non-null, non-annotated
datum of the corresponding concrete type, within the range accepted by the
primitive type.
* For macro types, an individual argument must be an unannotated S-expression
containing arguments acceptable to that macro’s signature.
This is called a _macro-shaped argument_.
A macro-shaped argument is implicitly converted to the equivalent E-expression.

The parameter cardinality constrains the syntax of the overall argument
element(s), particularly whether a group is allowed.
In text E-expressions, argument groups are delimited using the special syntax
`(: …)` where the colon is followed by whitespace or `)` instead of a macro
reference.
Each element of an argument group must fit the same syntax rules as for an
individual argument.

* A `!` parameter accepts only a single individual argument.
* A `?` parameter accepts either a single individual argument
or an empty argument group `(:)`.
* A `*` or `+` parameter accepts either a single individual argument,
or an argument group `(: …)`.
* A rest parameter captures all remaining arguments of the E-expression,
each of which must match the individual argument syntax.
For a `...+` parameter there must be at least one such argument.
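
As a rough illustration of the non-rest cardinality rules above, the Python sketch below checks whether an argument's shape is acceptable in a text E-expression. The function `argument_shape_ok` and its parameters are hypothetical. Rest parameters are not modeled, and the sketch assumes an empty group is acceptable for `+`, a point the text leaves open.

```python
def argument_shape_ok(cardinality, is_group, group_size=0):
    """Check the shape of one argument against a non-rest cardinality.

    cardinality: one of "!", "?", "*", "+".
    is_group:    True when the argument is written as `(: ...)`.
    group_size:  number of elements inside the group (ignored otherwise).
    """
    if not is_group:
        # Every non-rest cardinality accepts a single individual argument.
        return cardinality in {"!", "?", "*", "+"}
    if cardinality == "?":
        # A `?` parameter only accepts the empty group `(:)`.
        return group_size == 0
    # Assumption: `+` accepts any group here, including an empty one;
    # whether a `+` group needs at least one element is not settled above.
    return cardinality in {"*", "+"}
```

A caller would still need to apply the individual-argument syntax rules to each element of an accepted group.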

// TODO clarify whether a `+` group must contain at least one element.

The rules above determine whether an E-expression is _well-formed_.
Any violation of the above constraints must signal a syntax error when the
E-expression is parsed.


// TODO #307 clarify how type and cardinality are enforced during expansion.

// TODO #307 Clarify whether and when range checks are applied for fixed-width types.
// I believe we decided that they are not verified by template invocations, since they
// are intended to constrain the _encoding_, not the resulting argument values.
// The corresponding concrete type _is_ verified, however, per the above.


// TODO expansion process
3 changes: 3 additions & 0 deletions src/glossary.adoc
@@ -111,6 +111,9 @@ Struct-shaped templates treat the field names as literal, but the corresponding
templates.
S-expressions denote operator invocations and are not treated quasi-literally.

required parameter::
A macro parameter that is not _optional_ and therefore requires an argument at each invocation.

rest parameter::
A macro parameter—always the final parameter—declared with the `*\...*` or `*\...+*` modifier,
that accepts all remaining arguments to the macro as if they were in an implicit _argument group_.
12 changes: 8 additions & 4 deletions src/signatures.adoc
@@ -134,18 +134,22 @@ Examples:
----


=== Voidable and Optional Parameters
=== Arity: Required and Optional Parameters

[[def:optional-param]]
Parameters with cardinality accepting zero values (declared with modifiers `?`, `*`, or `\...`)
are called _voidable_ because their resulting value streams can be void.
A parameter is _optional_ when it is voidable and all following parameters are voidable.

[[def:required-param]]
A parameter is _required_ when it is not optional.
Specifically, a parameter is required when it is declared with modifiers
`!`, `\+`, or `...+`, _or_ when any following parameter is required.

Optional parameters are given special treatment in text invocations: their arguments can be
omitted entirely, as long as all following arguments are also omitted.
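
The definitions of _optional_ and _required_ can be sketched in a few lines of Python. This is an illustration, not part of the specification: the function `optional_flags` is hypothetical, and a signature is modeled simply as a list of cardinality modifier strings.

```python
# Modifiers whose cardinality accepts zero values, per the definition above.
VOIDABLE = {"?", "*", "..."}


def optional_flags(cardinalities):
    """Return one bool per parameter: True if optional, False if required.

    A parameter is optional when it is voidable and every parameter after
    it is voidable too, so the scan runs from the last parameter backwards.
    """
    flags = []
    all_voidable_so_far = True
    for modifier in reversed(cardinalities):
        all_voidable_so_far = all_voidable_so_far and modifier in VOIDABLE
        flags.append(all_voidable_so_far)
    flags.reverse()
    return flags
```

For a signature modeled as `["?", "!", "?"]`, only the last parameter is optional: the first is voidable but is followed by a required parameter, so its argument cannot be omitted.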


=== Arity

[[def:arity]]
The _minimum arity_ of a macro is equal to the number of leading non-optional parameters.
Assuming no rest-parameter, the _maximum arity_ of the macro is the total number of declared
parameters.
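
A minimal Python sketch of these arity definitions follows, assuming a signature is modeled as a list of cardinality modifier strings. The helper names `arity_bounds` and `arity_ok` are hypothetical.

```python
VOIDABLE = {"?", "*", "..."}   # cardinalities that accept zero values
REST = {"...", "...+"}         # rest-parameter modifiers


def arity_bounds(cardinalities):
    """Return (minimum, maximum) arity; maximum is None with a rest parameter."""
    minimum = len(cardinalities)
    # Optional parameters are exactly the trailing run of voidable ones,
    # so strip that run to count the leading non-optional parameters.
    while minimum > 0 and cardinalities[minimum - 1] in VOIDABLE:
        minimum -= 1
    has_rest = bool(cardinalities) and cardinalities[-1] in REST
    maximum = None if has_rest else len(cardinalities)
    return minimum, maximum


def arity_ok(actual, cardinalities):
    """Check an invocation's actual arity against the signature's bounds."""
    minimum, maximum = arity_bounds(cardinalities)
    return actual >= minimum and (maximum is None or actual <= maximum)
```

Note that a trailing `...+` parameter counts toward the minimum arity in this sketch, since it is required and needs at least one argument.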
103 changes: 63 additions & 40 deletions src/template-expr.adoc
@@ -85,6 +85,7 @@ meaning of subsequent elements depends on the operator.
Operators come in two varieties: special forms and macro invocations.


[[spec:tl_special]]
=== Special Forms

Special forms are operators that cannot be expressed as macros, because some parts of their
@@ -199,59 +200,81 @@ https://github.com/amazon-ion/ion-docs/issues/201
=== Macro Invocation

A macro definition can express its output in terms of other macros. Quite often, these will be
macros provided by the Ion implementation,
// —everything in System Macros 2023-05 is available by default—
but they can also be acquired from other modules.

// TODO link to system-macro chapter

The S-expression syntax for macro invocation is similar to that of E-expressions.
When a template is an S-expression and the first element is not the name of a special form, that
element must instead be a _macro-ref_ and the template denotes a macro invocation.
There are multiple sources of macros: the defining module’s internal environment (which is being
incrementally extended with each definition), and the exported macros of modules loaded by the
enclosing module or `$ion_encoding` directive.

// TODO See Resolving Macro References: Encoding Modules 2023-05 for the relevant algorithm.

The remaining elements of the S-expression are argument subforms that denote the inputs to the macro.
These use normal Ion notation, but what’s syntactically acceptable is defined by the macro’s
signature.

The number of such subforms (that is, the invocation’s actual arity) must be equal to or greater
than the macro’s minimum arity, and at most its maximum arity, when one exists.
In other words, an invocation must contain one subform for each required parameter, followed by
optional subforms for the remaining optional parameters.
<<sec:sysmod,macros provided by the Ion implementation>>, but there are multiple sources of macros:

* the defining module’s internal environment (which is being incrementally extended with each definition)
* the macros exported from modules ``load``ed by the enclosing module
* the macros exported from modules ``load``ed by the enclosing `$ion_encoding` directive

The syntax for macro invocation in a template is similar to that of <<sec:eexprs,E-expressions>>.
Contributor

There is a lot in this section that is repeated. Could we explain macro invocations in a way that DRYs this?

Contributor Author

Perhaps, but there are also differences that make it hard to slot a unified description into either chapter.
I'm inclined to leave that for later cleanup.

When a template is an S-expression and the first element is not the name of a
<<spec:tl_special,special form>>, that element must instead be a <<spec:resolve-macro,_macro-ref_>>
and the template denotes a macro invocation.
See <<spec:resolve-macro>> for the syntax of macro references and the resolution algorithm.

The remaining elements of the S-expression are arguments supplying inputs to the macro,
each one either _individual_ or _grouped_.
What is syntactically well-formed is defined by the macro’s signature.

The number of argument elements (that is, the invocation’s actual arity)
must be equal to or greater than the macro’s minimum arity, and at most its
maximum arity, when one exists.
In other words, an invocation must contain one element for each required
parameter, followed by optional elements for the remaining optional parameters.

The syntax of each argument is defined by the associated parameter’s cardinality
and type.
The cardinality determines whether an argument group can be used to collect a
series of individual arguments, and the type determines the syntax of an
individual argument, grouped or not.

The parameter type constrains the syntax of an individual argument as follows:

* For tagged types, an individual argument may be any template.
* For primitive types, an individual argument may be any template.
* For macro types, an individual argument must be an unannotated S-expression
containing arguments acceptable to that macro’s signature.
These are implicit invocations of the parameter's declared macro, and the macro name cannot be
provided explicitly.

In the template language, argument groups are written by unannoted S-expressions starting with the
The parameter cardinality constrains the syntax of the overall argument
element(s).
In the template language, argument groups are written by unannotated S-expressions starting with the
symbol `;`.
Each element of an argument group must fit the same syntax rules as for an
individual argument.
The resulting notation `(; ...)` mirrors the syntax of groups in E-expressions, `(: ...)`.
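
To make the group notation concrete, the Python sketch below recognizes a template-language argument group in a toy syntax-tree model. The `Symbol` and `SExp` classes and the function `is_template_group` are hypothetical stand-ins for illustration, not types from any Ion library.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    text: str


@dataclass
class SExp:
    elements: list
    annotations: list = field(default_factory=list)


def is_template_group(form):
    """True when `form` is an unannotated S-expression starting with `;`."""
    return (
        isinstance(form, SExp)
        and not form.annotations
        and len(form.elements) > 0
        and form.elements[0] == Symbol(";")
    )
```

In this model, `SExp([Symbol(";"), 1, 2])` stands for the template form `(; 1 2)`, while an annotated S-expression or one starting with any other symbol is treated as an ordinary template.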

Within an invocation expression, the syntax of each argument is defined by its parameter’s
declared shape.
* A non-rest parameter of any cardinality accepts either a single individual argument
or an argument group.
* A rest parameter captures all remaining arguments of the invocation,
each of which must match the individual argument syntax.
For a `...+` parameter there must be at least one such argument.

* The subform for a non-rest parameter must match the base type below, or an argument group
containing elements matching the base type.
* A rest parameter captures all remaining subforms of the invocation, each of which must match
the base type.
These rules determine whether a template macro invocation is _well-formed_.
Any violation of the above constraints must signal a syntax error when the
macro definition is compiled.

The base types match as follows:

* For tagged types, the subform may be any template that produces acceptable values.
* For primitive types, the subform may be any template that produces values accepted by the
corresponding concrete type.
* For macro types, the subform must be an unannotated S-expression containing subforms acceptable to
that macro’s signature. These are implicit invocations of the macro, and the macro name cannot be
provided explicitly.

TODO Clarify when/where range checks are applied for fixed-width types.

TODO Examples

=== Type Checking

TODO

// TODO #307 specify when the above constraints are enforced; particularly whether
// an unused but not well-formed macro signals an error.

// TODO #307 clarify when type and cardinality constraints are enforced.
// Type and cardinality constraints are _also_ applied each time the macro is invoked,
// ensuring that the type and number of values provided to a parameter _after arguments are expanded_
// The base types match as follows:

// TODO #307 Clarify whether and when range checks are applied for primitive types.
// I believe we decided that they are not verified by template invocations, since they
// are intended to constrain the _encoding_, not the resulting argument values.
// The corresponding concrete type _is_ verified, however, per the above.

=== Error Handling

TODO