Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

general variables function #1

Merged
merged 13 commits into from
Nov 15, 2023
Merged

general variables function #1

merged 13 commits into from
Nov 15, 2023

Conversation

adamjanicki2
Copy link
Contributor

@adamjanicki2 adamjanicki2 commented Oct 31, 2023

No description provided.

@adamjanicki2 adamjanicki2 requested a review from LeaVerou November 1, 2023 02:02
src/extract.js Outdated
Comment on lines 16 to 18
if (node.operator === ":") {
return right;
}
Copy link
Contributor Author

@adamjanicki2 adamjanicki2 Nov 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This special case is needed to handle Mavo's group(key1: value1, key2: value2, ...) operator. I'm thinking about the best way to be able to get rid of this special case by adding the second options parameter to the function. My first thought is to include the field modifier inside of options, i.e.

options: {
    modifier: (node) => void; // applies some sort of modification to the node
    // other options fields
}

This would then enable us (within Mavo) to call extract like this:

const expression = "group(key1: 'foo', key2: 'bar')";
const ast = Mavo.Script.parse(expression);
const modifier = (node) => {
    if (node.type === "BinaryExpression" && node.operator === ":") {
        node = node.right; // ignore the left side of `key : value` BinaryExpressions
    }
};
const extractResult = Vastly.extract(ast, {modifier})

Wanted to get feedback to see if anyone has any better thoughts on this, or the options parameter in general

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, Mavo specific stuff has no place here, but it's totally fine for Mavo to be passing specific options to these functions, so the general idea is on the right track: we design vastly to be flexible enough to accommodate Mavo, then Mavo can just pass certain options when using vastly methods.

Some issues with this proposed API:

  • Only allowing a single function forces calling code to couple all their modifications in the same place
  • The modifier being an arbitrary function will make it harder for you to adapt extract() to use this parameter. You can't add handling code to individual cases, so where does it go? Before or after the switch?
  • Speaking of ordering, it is also not clear to users when this gets executed: before, after, or instead of our code?
  • I imagine a lot of use cases will want to specify these overrides once, rather than having to pass them in every single function call, so it would be nice if that was supported. As a rule of thumb, a good design principle for all UI design (including API design) is to try and make common things easy and complex things possible.

I have some thoughts on what an alternative API could look like, but I think there's value in you iterating on it before I share my thoughts. 😊

@LeaVerou
Copy link
Member

LeaVerou commented Nov 1, 2023

Love the tests! How did it feel to specify them using this syntax? More or less tedious than other testing frameworks you have used?

Thinking some more and looking at the complications the recent change introduced, I wonder if the distinction between functions and variables is actually arbitrary, and we are guilty of overfitting to our primary use case. What if other code wanted to draw a different distinction, or none at all?

Thought experiment: If this function just extracted top level identifiers, what would the code look like to tell them apart after (remember that each node has a pointer to its parent, and we also have closest())?

Yes, that would be slower (without thinking about it much, I think the current solution is O(N) on the number of AST nodes, this one would be O(Nk), where k is depth), but usually these kinds of things are not hugely performance-critical (e.g. rewriting expressions happens once per expression for the lifetime of the app) so trading off some performance to get more flexibility could be a worthy tradeoff.

@adamjanicki2
Copy link
Contributor Author

Love the tests! How did it feel to specify them using this syntax? More or less tedious than other testing frameworks you have used?

It was definitely much easier! I might be biased because I typically write tests with a react app, and testing components is annoying, but still, it was great

Thought experiment: If this function just extracted top level identifiers, what would the code look like to tell them apart after (remember that each node has a pointer to its parent, and we also have closest())?

Hmm this is a good point! We could provide this general function, which returns identifiers (we would have to rename this to something like extractIdentifiers), and then a second function would be able to distinguish the type of identifier. Would providing that second function (the one that can check what type of identifier it is) be too specific to include in vastly, or would it be useful/generalizable enough for developers to include it?

Yes, that would be slower (without thinking about it much, I think the current solution is O(N) on the number of AST nodes, this one would be O(Nk), where k is depth), but usually these kinds of things are not hugely performance-critical (e.g. rewriting expressions happens once per expression for the lifetime of the app) so trading off some performance to get more flexibility could be a worthy tradeoff.

I agree, especially since the typical expression is mostly likely pretty simple, and wouldn't have many nodes anyways

@LeaVerou
Copy link
Member

LeaVerou commented Nov 1, 2023

Hmm this is a good point! We could provide this general function, which returns identifiers (we would have to rename this to something like extractIdentifiers),

Important distinction: top-level identifiers, i.e. basically what you're extracting now, but in one array. Just extracting identifiers wouldn't be that useful (you can do it already with a simple walker)

and then a second function would be able to distinguish the type of identifier. Would providing that second function (the one that can check what type of identifier it is) be too specific to include in vastly, or would it be useful/generalizable enough for developers to include it?

We won't know until we see what the code for it would look like! (if it's up to the vastly user)

let variables = extractVariables(expression);
let functions = [], data = [];

for (let variable of variables) {
	if (/* ??? */) {
		functions.push(variable);
	}
	else {
		data.push(variable);
	}
}

Can you fill in the /* ??? */ ?

@adamjanicki2
Copy link
Contributor Author

Important distinction: top-level identifiers, i.e. basically what you're extracting now, but in one array. Just extracting identifiers wouldn't be that useful (you can do it already with a simple walker)

Yes

We won't know until we see what the code for it would look like! (if it's up to the vastly user)

let variables = extractVariables(expression);
let functions = [], data = [];

for (let variable of variables) {
	if (/* ??? */) {
		functions.push(variable);
	}
	else {
		data.push(variable);
	}
}

Can you fill in the /* ??? */ ?

Where does parent come from? I figured it would be something like

const variables = extractVariables(expr);
const functions = [], data = [];

for (const variable of variables) {
    if (variable.parent && variable.parent.type === "CallExpression") {
        functions.push(variable);
    } else {
        data.push(variable);
    }
}

but when I ran a simple test, the parent field was not populated.

@LeaVerou
Copy link
Member

LeaVerou commented Nov 1, 2023

Where does parent come from?

Huh. Right now from serialize(), which is totally not the right place for it!

Just thinking out loud here:

Rather than having this be a side effect of one of the other functions, it should probably be its own function.
Challenge is how to prevent it from adding too much overhead? It’s super common to need parent references, but we only need to set them once (rewriting functions will need to adjust them of course). Perhaps the function can assume that if parent is set on a node, it will also be set on its children, but not sure if that's a good heuristic.

Note that functions cannot always do this on an as-needed basis: if e.g. closest() receives an AST node without a parent, it's stuck.

I figured it would be something like

const variables = extractVariables(expr);
const functions = [], data = [];

for (const variable of variables) {
    if (variable.parent && variable.parent.type === "CallExpression") {
        functions.push(variable);
    } else {
        data.push(variable);
    }
}

but when I ran a simple test, the parent field was not populated.

That’s not the main issue with this:

  • this will also fail for things like the foo in Array.isArray(foo), since that also has a CallExpression parent.
  • Would it always be the direct parent?
  • Worse yet, since we don't store the exact way a child is related to its parent, we'd need to fetch all children and check which one it is manually (it can only be one). We probably don't want to mess with the nodes too much, so this may be acceptable compared to the alternate solution of adding a parentProperty property or something.
  • Nit, but variable.parent?.type === "CallExpression" would be a little cleaner

Another way would be to treat the nodes as entirely untouchable, and have a WeakMap to map them to arbitrary metadata. This is suboptimal though:

  • Not a huge fan of the memory complexity, since the nodes may not be garbage collected for a while after vastly is done with them.
  • Every module would need to import this data structure and depend on it

@adamjanicki2
Copy link
Contributor Author

Also thinking out loud: what if instead of returning a list of nodes, this function returns a list of objects with this form?

{
    node: NodeObj;
    parentProperty: "callee" | "left" | "right" | ... ;
}

That way the node wouldn't be mutated, and someone who uses this function would be able to tell a variable's relationship to its parent. The downside is that this adds memory per each node in the output, but what do you think?

@adamjanicki2
Copy link
Contributor Author

Coming back to this after a few days, my above comment doesn't make sense given that we'd potentially need to walk up the tree multiple levels to determine if an identifier is a function or not. Perhaps there's some sort of way we might consider making a new function called expandAST which takes an AST and adds parent pointers to it, and then keep this function as is, extractVariables which takes an AST (expanded or not) and returns all top level identifiers.

Then we could call extractVariables(expandAST(ast)) to get a list of variables with parent pointers. From this list, we could use the parent pointers to filter out a list of functions from the list of variables, using closest

I'm struggling to think of a great way to solve this, but does this seem like a solution that could be reasonable? The memory complexity is again not good, but I can't think of a better solution that's not overfitting to Mavo's use case...

@LeaVerou
Copy link
Member

LeaVerou commented Nov 4, 2023

I started a new issue for the broader architectural discussion on parent references: #3

You don't need to block on it: since your function accepts whole ASTs, you can just call parents.setAll() as the first step for now.

This is somewhat orthogonal to being able to tell which top-level identifiers are what, because even with being able to get parents it's still nontrivial.

@adamjanicki2 adamjanicki2 changed the title [wip] extract function extractIdentifiers function Nov 5, 2023
Copy link
Member

@LeaVerou LeaVerou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there, see comments

src/parents.js Outdated Show resolved Hide resolved
src/extract.js Outdated Show resolved Hide resolved
src/extract.js Outdated
}
}

return _extractIdentifiers(node);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a separate function here? It seems to take the same arguments?
Other functions use this pattern when they need to pass more arguments to the recursive function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because it was easier to only have to pass in 1 argument in recursive subcalls rather than all of them. Plus, then parents.setAll only needs to be called once at the start of the function (I know that even if it gets called multiple times it will only walk the tree once because it uses the heuristic of if a node has a parent pointer then it doesn't walk its subtree, but still)

If you don't like this and would prefer it be simpler, I can unnest it

src/extract.js Outdated Show resolved Hide resolved
src/variables.js Outdated
* Recursively traverse the AST and return all top-level identifiers
* @param {object} node
* @param {object} [options]
* @param {function(object): boolean} [options.filter] A function that returns true if the node should be included
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave options out until we have use cases that need them.

src/variables.js Outdated
* @param {object} node
* @param {object} [options]
* @param {function(object): boolean} [options.filter] A function that returns true if the node should be included
* @param {boolean} [options.addParents] If true, add a parent property to each node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave that out until we know what we're doing in #3

@adamjanicki2 adamjanicki2 changed the title extractIdentifiers function general variables function Nov 15, 2023
name: "variables()",
run (expression) {
const ast = jsep(expression);
let identifiers = variables(ast);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: indentation

src/variables.js Outdated
const propertyChildren = property.type === "Identifier" ? [] : variables(property);
return variables(object).concat(propertyChildren);
// Rest of the cases contain a single variable
// Also check for filter condition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Also check for filter condition

Copy link
Member

@LeaVerou LeaVerou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See minor comments, but overall looks good!
I don't think it should hold the PR up, but the tests are very Mavo-specific, and it's unclear all are testing sufficiently different things.

@adamjanicki2
Copy link
Contributor Author

Got it, I will write some more robust tests

@LeaVerou
Copy link
Member

Got it, I will write some more robust tests

Keep in mind that's pretty low priority — everything else we discussed is far more important!

@adamjanicki2
Copy link
Contributor Author

Yes, I'm just going to write a handful more then merge in ~1 hour

@adamjanicki2 adamjanicki2 merged commit 79d6555 into main Nov 15, 2023
@adamjanicki2 adamjanicki2 deleted the extract branch November 15, 2023 17:53
@LeaVerou
Copy link
Member

Re: your most recent commit, that's a good first step, but I think we also need to remove some. It's not clear what each test is actually testing, and I suspect there is a lot of duplication.

@adamjanicki2
Copy link
Contributor Author

Ok, I can make a follow-up PR to deduplicate and get better coverage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants