Feature request: Select payload subset without JMESPath #1644
Comments
Thanks for opening your first issue here! We'll come back to you as soon as we can.
Hi @rjmackay, thank you for taking the time to open this issue.

We have opted for JMESPath expressions because JMESPath is one of the standard query languages for JSON structures, and we believe it can cover both simple and more complex use cases effectively and efficiently.

Most JMESPath implementations, including the one we currently use (jmespath), the one you suggested in the linked issue (#1645), and those for other languages such as Python, follow a similar call order that allows them to cache operations and speed up repeated expressions.

For example, here’s a diagram of how JMESPath implementations evaluate a request:

flowchart LR
A[Take expression] --> B{Has cached expression}
B -->|Yes| C[Apply expression]
B -->|No| D[Create AST for expression]
D --> C
C --> E[Return result]
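
The flow in the diagram can be sketched in a few lines of TypeScript. This is a simplified illustration, not the code of any actual JMESPath library: `compile`, `search`, `cache`, and `compileCount` are hypothetical names, and the toy `compile` only understands dot-separated field names.

```typescript
// Hypothetical stand-in for a real JMESPath parser: it only understands
// dot-separated field names (e.g. "foo.bar"), which is enough to show the flow.
type CompiledExpression = (payload: unknown) => unknown;

// Counts how often the (comparatively expensive) parse step runs.
let compileCount = 0;

function compile(expression: string): CompiledExpression {
  compileCount += 1;
  const fields = expression.split(".");
  return (payload: unknown): unknown => {
    let current: unknown = payload;
    for (const field of fields) {
      if (typeof current !== "object" || current === null) {
        return null;
      }
      current = (current as Record<string, unknown>)[field];
    }
    return current === undefined ? null : current;
  };
}

// In-memory cache keyed by the expression string.
const cache = new Map<string, CompiledExpression>();

function search(payload: unknown, expression: string): unknown {
  // "Has cached expression?"
  let compiled = cache.get(expression);
  if (compiled === undefined) {
    // "No": create the parsed form once and remember it.
    compiled = compile(expression);
    cache.set(expression, compiled);
  }
  // "Apply expression" and "Return result".
  return compiled(payload);
}
```

With this shape, calling `search` twice with the same expression runs `compile` only once; every later call goes straight to applying the cached form to the new payload.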
Whenever you provide a JMESPath expression (e.g. foo.bar) and a payload (e.g. the request), the module parses the expression into an AST (Abstract Syntax Tree) using an implementation of a Pratt parser (also known as top-down operator precedence parsing), and then uses that AST to visit the payload. The AST is cached in memory, so all subsequent evaluations of the same expression reuse it and only need to extract the data.

On top of that, many implementations also let you parse an expression beforehand, so that the corresponding AST is already in the cache when the first request is processed. Doing this outside of the Lambda handler ensures that it happens during the function's initialization.

With these considerations in mind, the main difference in terms of operations between using a JMESPath library and bringing your own function comes down purely to the respective implementations. As mentioned earlier, JMESPath uses the abstract syntax tree created from the expression to visit the payload and extract the corresponding data.

Let’s take a high-level look at how the parser works and how the object is visited, using your example and assuming a payload that looks like this:

{
  "headers": {
    "X-Idempotency-Key": "foo"
  }
}

With this payload, the JMESPath expression that extracts the header you want is parsed into the following AST:

{
  "type": "subexpression",
  "children": [
    {
      "type": "field",
      "children": [],
      "value": "headers"
    },
    {
      "type": "field",
      "children": [],
      "value": "X-Idempotency-Key"
    }
  ]
}

Using this abstract syntax tree, the module visits the payload and extracts the data in almost the same way that you described, which corresponds to the following (simplified) pseudo execution stack:
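
As a rough sketch of what such a traversal can look like, assuming only the two node types that appear in the AST above: `AstNode` and `visit` are hypothetical names used for illustration, and real implementations handle many more node types.

```typescript
// Node shape assumed from the AST shown above; not the type of any real library.
interface AstNode {
  type: "field" | "subexpression";
  children: AstNode[];
  value?: string;
}

function visit(node: AstNode, value: unknown): unknown {
  if (node.type === "field") {
    // A field node looks up one key on the current object.
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      return null;
    }
    const key = node.value;
    if (key === undefined) {
      return null;
    }
    const found = (value as Record<string, unknown>)[key];
    return found === undefined ? null : found;
  }

  // A subexpression evaluates its children left to right, feeding the
  // result of each child into the next one.
  let current: unknown = value;
  for (const child of node.children) {
    current = visit(child, current);
    if (current === null) {
      return null;
    }
  }
  return current;
}

const ast: AstNode = {
  type: "subexpression",
  children: [
    { type: "field", children: [], value: "headers" },
    { type: "field", children: [], value: "X-Idempotency-Key" },
  ],
};

const payload = { headers: { "X-Idempotency-Key": "foo" } };

const result = visit(ast, payload); // "foo"
```

For this AST the work boils down to two property lookups, which is why the cost is close to that of a hand-written accessor once the AST is available.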
You can see that this is the case in all the implementations I mentioned:

So to sum up: even for simple cases like the one you describe, the two implementations are equivalent, and if performance on the first request is a concern, you can pre-compile the expression so that the AST is generated during the initialization phase.

Having the support of an expressive query language like JMESPath, however, also allows you to adapt easily to different use cases as your workload evolves. While it’s true that an arrow function might be tempting for a simple field extraction, if your payload becomes more complex, or you want to query data in a more involved way, such as extracting multiple fields or applying logical operators, you don’t have to reimplement your own parsing function: you can just write a JMESPath expression that does all of that for you.

I hope this clarifies why we stand behind the choice of using JMESPath.
This issue is now closed. Please be mindful that future comments are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue, feel free to do so.
Use case
Executing JMESPath on every request just to select part of the payload seems like overkill. It would be easier to express this selector with a simple function instead, which would also avoid the overhead of JMESPath.
Solution/User Experience
Allow passing a function to select the idempotency key instead of using JMESPath.
The example from the docs could be easily expressed as
Ideally make jmespath an optional dependency too.
Alternative solutions
No response
Acknowledgment
Future readers
Please react with 👍 and your use case to help us understand customer demand.