Allow implicit multiplication of multiple single-character variables #2236

Closed
joshhansen opened this issue Jun 3, 2021 · 6 comments
Labels
category:expressions Issues about the expression parser, variable scoping etc.

Comments

@joshhansen
Contributor

The quadratic formula's 4ac currently parses as a constant 4 times a single symbol ac. But it is obvious to anyone familiar with the formula that it represents a constant times symbol a times symbol c.

Enhancing the MathJS expression syntax to allow multiple single-character variables to be implicitly multiplied would make it more closely resemble mathematics as typically used. (Implicit multiplication of multi-character variables doesn't seem to be "allowed": revenue*units but not revenueunits.)

To do this would require the parser to know which variable and function names might be referenced, information presently available only to compile and evaluate.

The main issue would be the introduction of ambiguity relative to variable and function names.

For example, with variables s, q, r, and t declared, the expression sqrt(4) becomes ambiguous as it could either be a series of implicit multiplications or an invocation of the existing function sqrt.

A simple rule would eliminate the ambiguity: any token that can be understood as a function or variable name in its entirety should be interpreted as such; otherwise implicit multiplication of single-character variables should be considered.

So sqrt(4) would be interpreted as square root of four because a function sqrt is known to exist, even if variables s, q, r, and t are also defined and could possibly be interpreted as implicitly multiplied. But qrt would be interpreted as q * r * t because no function or variable qrt is already known to exist.

The restriction to single-character variables is key, as it eliminates one form of ambiguity that arises with multi-character variables. For instance, allowing implicit multiplication of multi-character variables with variables a, ab, bc, and c defined leads to two potential parses of abc: ab * c and a * bc. But with single-character variables only allowed in multiple implicit multiplication, the only possible parse would be a * b * c, if variables a, b, and c all exist.
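The difference can be checked mechanically. What follows is only a minimal sketch, not anything that exists in MathJS: `segmentations` is a hypothetical helper that lists every way to split a token into names taken from a given set.

```javascript
// Hypothetical helper (not a math.js API): enumerate every way to split
// `token` into a sequence of names drawn from the set `names`.
function segmentations(token, names) {
  if (token === '') return [[]] // one way to split nothing: the empty split
  const results = []
  for (let end = 1; end <= token.length; end++) {
    const head = token.slice(0, end)
    if (!names.has(head)) continue
    // Combine this prefix with every valid split of the remainder.
    for (const rest of segmentations(token.slice(end), names)) {
      results.push([head, ...rest])
    }
  }
  return results
}

// Multi-character names allowed: two competing parses of "abc".
const multiChar = segmentations('abc', new Set(['a', 'ab', 'bc', 'c']))
console.log(multiChar)  // [['a', 'bc'], ['ab', 'c']]

// Single-character names only: exactly one parse.
const singleChar = segmentations('abc', new Set(['a', 'b', 'c']))
console.log(singleChar) // [['a', 'b', 'c']]
```

With single-character names every split point is forced, so the recursion never branches.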

It also turns an exhaustive search into a series of table lookups. The implicit multiplication interpretation is possible only if each character in the token is itself defined as a variable.
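As a rough illustration of the proposed rule, here is a sketch under the assumption that the set of known names is available when the token is resolved. `resolveToken` and `known` are hypothetical names, not part of MathJS.

```javascript
// Hypothetical helper (not a math.js API): apply the proposed rule to one token.
// `known` holds every variable and function name currently in scope.
function resolveToken(token, known) {
  // Rule 1: a token that is a known name in its entirety is used as-is.
  if (known.has(token)) return [token]
  // Rule 2: otherwise split into single characters, but only if every
  // character is itself a known name. This is one set lookup per character.
  const chars = [...token]
  if (chars.length > 1 && chars.every(ch => known.has(ch))) return chars
  // Neither rule applies: leave the token untouched (an undefined symbol).
  return [token]
}

const known = new Set(['sqrt', 's', 'q', 'r', 't', 'a', 'c'])
console.log(resolveToken('sqrt', known)) // ['sqrt']
console.log(resolveToken('qrt', known))  // ['q', 'r', 't']
console.log(resolveToken('ac', known))   // ['a', 'c']
console.log(resolveToken('ab', known))   // ['ab'], because b is not defined
```

A result with more than one entry would be read as an implicit product of those symbols.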

@cshaa
Collaborator

cshaa commented Jun 3, 2021

But it is obvious to anyone familiar with the formula that it represents a constant times symbol a times symbol c.

I don't think this is obvious at all in a “programming setting”. For example, Mathematica would treat it as one symbol called ac. As I described in #2230 (comment), I'm not a fan of syntax that parses differently in different lexical scopes. What's wrong with the unambiguous 4 a c? What is the use case for your proposed syntax?

@joshhansen
Contributor Author

I'm grading answers to math problems. I'd like users to be able to input the math using as natural an expression language as possible. From a user experience perspective, I need things like 4ac to parse as implicit multiplication, just like it would be understood when writing the answer on a page.

@cshaa
Collaborator

cshaa commented Jun 3, 2021

That sounds like a very interesting use case, thanks for the explanation!

I think that whether this feature should be added to mathjs boils down to whether we want the parser to be more like Mathematica (an actual programming language) or like Wolfram Alpha (a heuristic-based expression parsing engine). @josdejong What is your vision regarding the parser?

However, even if we decided to support it, there's still a problem: the scope is unknown at parse time. It is, by design, only known once you either execute or compile the expression. Therefore, with the current design, we'd have to implement the heuristic “at runtime” (i.e. when fetching parameters from the scope); we couldn't do it “at compile time” (i.e. when parsing). One of the side effects of this would be that parse('xyz') would still result in SymbolNode(xyz) and not OperatorNode(*, x, y, z). That means you still couldn't compare your students' answers.

I think that a better solution (both for your use case and for the math.js library) would be to add support for visitors. Visitors are a common feature in syntax tree systems, but they're either missing from math.js, or they are undocumented. The code could look something like this:

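// Assumes isSymbolNode and parse are imported from mathjs.
// .visit() is the proposed API and is hypothetical.
function visitor(node) {
    // Split a multi-character symbol such as xyz into the implicit product x y z
    if (isSymbolNode(node) && node.name.length > 1) {
        const multiplicands = [...node.name].map( x => new math.SymbolNode(x) )
        return new math.OperatorNode('*', 'multiply', multiplicands, true)
    }
    return node
}
const standardizedTree = parse(expr).visit(visitor)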

@josdejong Can something like this already be achieved in the current math.js?

@joshhansen
Contributor Author

@m93a I think you sum up the situation well.

Your visit function already exists: it's called transform, available on nodes in the expression tree. Your instinct was a good one.

@josdejong
Owner

Interesting discussion, thanks @joshhansen.

Using single-character variables makes sense in a mathematical context where you have only a handful of variables, each with a sort of "fixed" meaning, so everyone already knows what you mean. There it makes sense to interpret 4ac as 4*a*c.

In general, however, I think using single-character variables is an anti-pattern, and it is better to use meaningful, self-describing variable names like width and gravity instead of w and g.

The expression parser does indeed interpret an expression without looking at the actual values of variables. That makes its behavior reliable and predictable. It may be interesting to look into taking the current scope into account to try to understand the user's intention, and to add heuristics. This can become very complex very soon though (you can look up some of the discussions we had about coming up with a good solution to interpret implicit multiplication in combination with division and units, see #792). In that case I suggest we open a separate topic to think this through, and implement it as a separate version of evaluate so we have a deterministic and a heuristic one.

If you do want to interpret single-character variables, it is quite easy to write either a regular expression replacing 4ac with 4*a*c, or write a transform to replace a SymbolNode holding a multi-character variable with multiple ones, something like:

function splitSingleVariableNames(node) {
  return (node.isSymbolNode && node.name.length > 1)
    ? new math.FunctionNode('prod', node.name.split('').map(name => new math.SymbolNode(name)))
    : node
}

const original = math.parse('4ac')
const transformed = original.transform(splitSingleVariableNames)

console.log(original.toString())    // '4 ac'
console.log(transformed.toString()) // '4 prod(a, c)'
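
The regular-expression route could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested recipe: `splitImplicitProducts` and `reserved` are hypothetical names, and the output still relies on the target parser treating a number followed by a parenthesised group as multiplication, which is assumed here rather than verified.

```javascript
// Hypothetical preprocessing step (not a math.js API): rewrite runs of
// letters such as "ac" into "(a*c)" before handing the string to a parser.
// `reserved` lists names that must stay intact, e.g. function names.
function splitImplicitProducts(expr, reserved = new Set()) {
  return expr.replace(/[A-Za-z]+/g, name =>
    name.length === 1 || reserved.has(name)
      ? name
      // Parentheses keep the product together, e.g. in "1/ac".
      : '(' + name.split('').join('*') + ')'
  )
}

const reserved = new Set(['sqrt'])
console.log(splitImplicitProducts('4ac', reserved))
// '4(a*c)'
console.log(splitImplicitProducts('sqrt(bb - 4ac)', reserved))
// 'sqrt((b*b) - 4(a*c))'
```

Any multi-letter name missing from `reserved` gets split, so this only suits inputs where all variables are single letters.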

@cshaa cshaa added the category:expressions Issues about the expression parser, variable scoping etc. label Jun 19, 2021
@cshaa
Collaborator

cshaa commented Jun 19, 2021

Closing this for now, can reopen if needed
