general `variables` function #1

adamjanicki2 · 2023-10-31T23:19:20Z

No description provided.

adamjanicki2 · 2023-11-01T15:30:55Z

src/extract.js

+            if (node.operator === ":") {
+                return right;
+            }


This special case is needed to handle Mavo's group(key1: value1, key2: value2, ...) operator. I'm thinking about the best way to be able to get rid of this special case by adding the second options parameter to the function. My first thought is to include the field modifier inside of options, i.e.

options: { modifier: (node) => void; // applies some sort of modification to the node // other options fields }

This would then enable us (within Mavo) to call extract like this:

const expression = "group(key1: 'foo', key2: 'bar')"; const ast = Mavo.Script.parse(expression); const modifier = (node) => { if (node.type === "BinaryExpression" && node.operator === ":") { node = node.right; // ignore the left side of `key : value` BinaryExpressions } }; const extractResult = Vastly.extract(ast, {modifier})

Wanted to get feedback to see if anyone has any better thoughts on this, or the options parameter in general

Indeed, Mavo specific stuff has no place here, but it's totally fine for Mavo to be passing specific options to these functions, so the general idea is on the right track: we design vastly to be flexible enough to accommodate Mavo, then Mavo can just pass certain options when using vastly methods.

Some issues with this proposed API:

Only allowing a single function forces calling code to couple all their modifications in the same place

The modifier being an arbitrary function will make it harder for you to adapt extract() to use this parameter. You can't add handling code to individual cases, so where does it go? Before or after the switch?

Speaking of ordering, it is also not clear to users when this gets executed: before, after, or instead of our code?

I imagine a lot of use cases will want to specify these overrides once, rather than having to pass them in every single function call, so it would be nice if that was supported. As a rule of thumb, a good design principle for all UI design (including API design) is to try and make common things easy and complex things possible.

I have some thoughts on what an alternative API could look like, but I think there's value in you iterating on it before I share my thoughts. 😊

LeaVerou · 2023-11-01T16:03:37Z

Love the tests! How did it feel to specify them using this syntax? More or less tedious than other testing frameworks you have used?

Thinking some more and looking at the complications the recent change introduced, I wonder if the distinction between functions and variables is actually arbitrary, and we are guilty of overfitting to our primary use case. What if other code wanted to draw a different distinction, or none at all?

Thought experiment: If this function just extracted top level identifiers, what would the code look like to tell them apart after (remember that each node has a pointer to its parent, and we also have closest())?

Yes, that would be slower (without thinking about it much, I think the current solution is O(N) on the number of AST nodes, this one would be O(Nk), where k is depth), but usually these kinds of things are not hugely performance-critical (e.g. rewriting expressions happens once per expression for the lifetime of the app) so trading off some performance to get more flexibility could be a worthy tradeoff.

adamjanicki2 · 2023-11-01T16:22:48Z

Love the tests! How did it feel to specify them using this syntax? More or less tedious than other testing frameworks you have used?

It was definitely much easier! I might be biased because I typically write tests with a react app, and testing components is annoying, but still, it was great

Thought experiment: If this function just extracted top level identifiers, what would the code look like to tell them apart after (remember that each node has a pointer to its parent, and we also have closest())?

Hmm this is a good point! We could provide this general function, which returns identifiers (we would have to rename this to something like extractIdentifiers), and then a second function would be able to distinguish the type of identifier. Would providing that second function (the one that can check what type of identifier it is) be too specific to include in vastly, or would it be useful/generalizable enough for developers to include it?

Yes, that would be slower (without thinking about it much, I think the current solution is O(N) on the number of AST nodes, this one would be O(Nk), where k is depth), but usually these kinds of things are not hugely performance-critical (e.g. rewriting expressions happens once per expression for the lifetime of the app) so trading off some performance to get more flexibility could be a worthy tradeoff.

I agree, especially since the typical expression is mostly likely pretty simple, and wouldn't have many nodes anyways

LeaVerou · 2023-11-01T16:34:00Z

Hmm this is a good point! We could provide this general function, which returns identifiers (we would have to rename this to something like extractIdentifiers),

Important distinction: top-level identifiers, i.e. basically what you're extracting now, but in one array. Just extracting identifiers wouldn't be that useful (you can do it already with a simple walker)

and then a second function would be able to distinguish the type of identifier. Would providing that second function (the one that can check what type of identifier it is) be too specific to include in vastly, or would it be useful/generalizable enough for developers to include it?

We won't know until we see what the code for it would look like! (if it's up to the vastly user)

let variables = extractVariables(expression);
let functions = [], data = [];

for (let variable of variables) {
	if (/* ??? */) {
		functions.push(variable);
	}
	else {
		data.push(variable);
	}
}

Can you fill in the /* ??? */ ?

adamjanicki2 · 2023-11-01T17:13:22Z

Important distinction: top-level identifiers, i.e. basically what you're extracting now, but in one array. Just extracting identifiers wouldn't be that useful (you can do it already with a simple walker)

Yes

We won't know until we see what the code for it would look like! (if it's up to the vastly user)
let variables = extractVariables(expression);
let functions = [], data = [];

for (let variable of variables) {
	if (/* ??? */) {
		functions.push(variable);
	}
	else {
		data.push(variable);
	}
}
Can you fill in the /* ??? */ ?

Where does parent come from? I figured it would be something like

const variables = extractVariables(expr);
const functions = [], data = [];

for (const variable of variables) {
    if (variable.parent && variable.parent.type === "CallExpression") {
        functions.push(variable);
    } else {
        data.push(variable);
    }
}

but when I ran a simple test, the parent field was not populated.

LeaVerou · 2023-11-01T18:04:25Z

Where does parent come from?

Huh. Right now from serialize(), which is totally not the right place for it!

Just thinking out loud here:

Rather than having this be a side effect of one of the other functions, it should probably be its own function.
Challenge is how to prevent it from adding too much overhead? It’s super common to need parent references, but we only need to set them once (rewriting functions will need to adjust them of course). Perhaps the function can assume that if parent is set on a node, it will also be set on its children, but not sure if that's a good heuristic.

Note that functions cannot always do this on an as-needed basis: if e.g. closest() receives an AST node without a parent, it's stuck.

I figured it would be something like

const variables = extractVariables(expr);
const functions = [], data = [];

for (const variable of variables) {
    if (variable.parent && variable.parent.type === "CallExpression") {
        functions.push(variable);
    } else {
        data.push(variable);
    }
}

but when I ran a simple test, the parent field was not populated.

That’s not the main issue with this:

this will also fail for things like the foo in Array.isArray(foo), since that also has a CallExpression parent.
Would it always be the direct parent?
Worse yet, since we don't store the exact way a child is related to its parent, we'd need to fetch all children and check which one it is manually (it can only be one). We probably don't want to mess with the nodes too much, so this may be acceptable compared to the alternate solution of adding a parentProperty property or something.
Nit, but variable.parent?.type === "CallExpression" would be a little cleaner

Another way would be to treat the nodes as entirely untouchable, and have a WeakMap to map them to arbitrary metadata. This is suboptimal though:

Not a huge fan of the memory complexity, since the nodes may not be garbage collected for a while after vastly is done with them.
Every module would need to import this data structure and depend on it

adamjanicki2 · 2023-11-01T18:22:27Z

Also thinking out loud: what if instead of returning a list of nodes, this function returns a list of objects with this form?

{
    node: NodeObj;
    parentProperty: "callee" | "left" | "right" | ... ;
}

That way the node wouldn't be mutated, and someone who uses this function would be able to tell a variable's relationship to its parent. The downside is that this adds memory per each node in the output, but what do you think?

adamjanicki2 · 2023-11-04T18:31:19Z

Coming back to this after a few days, my above comment doesn't make sense given that we'd potentially need to walk up the tree multiple levels to determine if an identifier is a function or not. Perhaps there's some sort of way we might consider making a new function called expandAST which takes an AST and adds parent pointers to it, and then keep this function as is, extractVariables which takes an AST (expanded or not) and returns all top level identifiers.

Then we could call extractVariables(expandAST(ast)) to get a list of variables with parent pointers. From this list, we could use the parent pointers to filter out a list of functions from the list of variables, using closest

I'm struggling to think of a great way to solve this, but does this seem like a solution that could be reasonable? The memory complexity is again not good, but I can't think of a better solution that's not overfitting to Mavo's use case...

LeaVerou · 2023-11-04T19:12:07Z

I started a new issue for the broader architectural discussion on parent references: #3

You don't need to block on it: since your function accepts whole ASTs, you can just call parents.setAll() as the first step for now.

This is somewhat orthogonal to being able to tell which top-level identifiers are what, because even with being able to get parents it's still nontrivial.

LeaVerou

Hi there, see comments

src/parents.js

src/extract.js

LeaVerou · 2023-11-07T17:21:44Z

src/extract.js

+        }
+    }
+
+    return _extractIdentifiers(node);


Why do we need a separate function here? It seems to take the same arguments?
Other functions use this pattern when they need to pass more arguments to the recursive function.

I did this because it was easier to only have to pass in 1 argument in recursive subcalls rather than all of them. Plus, then parents.setAll only needs to be called once at the start of the function (I know that even if it gets called multiple times it will only walk the tree once because it uses the heuristic of if a node has a parent pointer then it doesn't walk its subtree, but still)

If you don't like this and would prefer it be simpler, I can unnest it

src/extract.js

LeaVerou · 2023-11-14T21:45:31Z

src/variables.js

+ * Recursively traverse the AST and return all top-level identifiers
+ * @param {object} node  
+ * @param {object} [options]
+ * @param {function(object): boolean} [options.filter] A function that returns true if the node should be included


I'd leave options out until we have use cases that need them.

LeaVerou · 2023-11-14T21:45:59Z

src/variables.js

+ * @param {object} node  
+ * @param {object} [options]
+ * @param {function(object): boolean} [options.filter] A function that returns true if the node should be included
+ * @param {boolean} [options.addParents] If true, add a parent property to each node


Let's leave that out until we know what we're doing in #3

LeaVerou · 2023-11-15T15:36:04Z

test/variables.js

+	name: "variables()",
+	run (expression) {
+		const ast = jsep(expression);
+        let identifiers = variables(ast);


Nit: indentation

LeaVerou · 2023-11-15T15:38:26Z

src/variables.js

+			const propertyChildren = property.type === "Identifier" ? [] : variables(property);
+			return variables(object).concat(propertyChildren);
+		// Rest of the cases contain a single variable
+		// Also check for filter condition


Suggested change

// Also check for filter condition

LeaVerou

See minor comments, but overall looks good!
I don't think it should hold the PR up, but the tests are very Mavo-specific, and it's unclear all are testing sufficiently different things.

adamjanicki2 · 2023-11-15T17:05:16Z

Got it, I will write some more robust tests

LeaVerou · 2023-11-15T17:07:56Z

Got it, I will write some more robust tests

Keep in mind that's pretty low priority — everything else we discussed is far more important!

adamjanicki2 · 2023-11-15T17:10:12Z

Yes, I'm just going to write a handful more then merge in ~1 hour

LeaVerou · 2023-11-15T18:14:49Z

Re: your most recent commit, that's a good first step, but I think we also need to remove some. It's not clear what each test is actually testing, and I suspect there is a lot of duplication.

adamjanicki2 · 2023-11-15T18:23:29Z

Ok, I can make a follow-up PR to deduplicate and get better coverage

adamjanicki2 added 3 commits October 29, 2023 09:58

attempt 1

2b74699

Merge remote-tracking branch 'origin/main' into get-vars

5aa1577

rename to extract

57ce0a0

adamjanicki2 requested a review from LeaVerou November 1, 2023 02:02

adamjanicki2 commented Nov 1, 2023

View reviewed changes

adamjanicki2 added 2 commits November 5, 2023 09:45

Merge remote-tracking branch 'origin/main' into extract

dad1d25

generalized extract

b6957c7

adamjanicki2 changed the title ~~[wip] extract function~~ extractIdentifiers function Nov 5, 2023

adamjanicki2 added 2 commits November 5, 2023 16:40

uncomment tests

a1e7f55

finishes tests pending parents bug fix

deff7cd

LeaVerou requested changes Nov 7, 2023

View reviewed changes

adamjanicki2 added 2 commits November 8, 2023 09:06

address code review feedback

08761de

fix import

cfee5c5

LeaVerou requested changes Nov 15, 2023

View reviewed changes

Merge remote-tracking branch 'origin/main' into extract

dad607d

adamjanicki2 changed the title ~~extractIdentifiers function~~ general variables function Nov 15, 2023

delete options for now

c7c5f35

adamjanicki2 requested a review from LeaVerou November 15, 2023 14:11

LeaVerou reviewed Nov 15, 2023

View reviewed changes

LeaVerou approved these changes Nov 15, 2023

View reviewed changes

fix nits

06f51fa

add more general test cases

3d08c0e

adamjanicki2 merged commit 79d6555 into main Nov 15, 2023

adamjanicki2 deleted the extract branch November 15, 2023 17:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

general `variables` function #1

general `variables` function #1

adamjanicki2 commented Oct 31, 2023 •

edited

Loading

adamjanicki2 Nov 1, 2023 •

edited

Loading

LeaVerou Nov 1, 2023

LeaVerou commented Nov 1, 2023 •

edited

Loading

adamjanicki2 commented Nov 1, 2023

LeaVerou commented Nov 1, 2023 •

edited

Loading

adamjanicki2 commented Nov 1, 2023

LeaVerou commented Nov 1, 2023

adamjanicki2 commented Nov 1, 2023

adamjanicki2 commented Nov 4, 2023

LeaVerou commented Nov 4, 2023

LeaVerou left a comment

LeaVerou Nov 7, 2023

adamjanicki2 Nov 8, 2023

LeaVerou Nov 14, 2023

LeaVerou Nov 14, 2023

LeaVerou Nov 15, 2023

LeaVerou Nov 15, 2023

LeaVerou left a comment

adamjanicki2 commented Nov 15, 2023

LeaVerou commented Nov 15, 2023

adamjanicki2 commented Nov 15, 2023

LeaVerou commented Nov 15, 2023

adamjanicki2 commented Nov 15, 2023

general variables function #1

general variables function #1

Conversation

adamjanicki2 commented Oct 31, 2023 • edited Loading

adamjanicki2 Nov 1, 2023 • edited Loading

Choose a reason for hiding this comment

LeaVerou Nov 1, 2023

Choose a reason for hiding this comment

LeaVerou commented Nov 1, 2023 • edited Loading

adamjanicki2 commented Nov 1, 2023

LeaVerou commented Nov 1, 2023 • edited Loading

adamjanicki2 commented Nov 1, 2023

LeaVerou commented Nov 1, 2023

adamjanicki2 commented Nov 1, 2023

adamjanicki2 commented Nov 4, 2023

LeaVerou commented Nov 4, 2023

LeaVerou left a comment

Choose a reason for hiding this comment

LeaVerou Nov 7, 2023

Choose a reason for hiding this comment

adamjanicki2 Nov 8, 2023

Choose a reason for hiding this comment

LeaVerou Nov 14, 2023

Choose a reason for hiding this comment

LeaVerou Nov 14, 2023

Choose a reason for hiding this comment

LeaVerou Nov 15, 2023

Choose a reason for hiding this comment

LeaVerou Nov 15, 2023

Choose a reason for hiding this comment

LeaVerou left a comment

Choose a reason for hiding this comment

adamjanicki2 commented Nov 15, 2023

LeaVerou commented Nov 15, 2023

adamjanicki2 commented Nov 15, 2023

LeaVerou commented Nov 15, 2023

adamjanicki2 commented Nov 15, 2023

general `variables` function #1

general `variables` function #1

adamjanicki2 commented Oct 31, 2023 •

edited

Loading

adamjanicki2 Nov 1, 2023 •

edited

Loading

LeaVerou commented Nov 1, 2023 •

edited

Loading

LeaVerou commented Nov 1, 2023 •

edited

Loading