Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implicit multiplication and unit division #792

Closed
valoricDe opened this issue Feb 4, 2017 · 38 comments
Closed

Implicit multiplication and unit division #792

valoricDe opened this issue Feb 4, 2017 · 38 comments

Comments

@valoricDe
Copy link

Hi, I tried following in the live parser on http://mathjs.org/#demo
17 h / 1h and the result was 17 h^2 which was kind of unexpected ^^,
Is it possible to have precedence between 1 and h (or units in general) before the implizit multiplication?

@josdejong
Copy link
Owner

Implicit multiplication can be tricky indeed, and I agree that it doesn't always work as expected. What should 1/2 h return? 0.5 h or 0.5 / h?

The current behavior is that implicit multiplication has the same precedence as explicit multiplication and division:

17 hour / 1 hour  =  17 * hour / 1 * hour  = 17 hour^2
17 / 1 hour       =  17 / 1 * hour         = 17 hour

I just tried this in google, which has a bit different behavior, more in line with what we typically would expect:

17 hour / 1 hour = (17 * hour) / (1 * hour) = 17      
    => implicit multiplication has highest precedence

17 / 1 hour      =  (17 / 1) * hour         = 17 hour   
    => implicit multiplication has lowest precedence

I'm not sure about the exact rules in which cases googles calculator gives implicit multiplication either higher or lower precedence. Maybe it's when the implicit multiplication is preceded by a division which has either a number or symbol as left hand side. We could give this some more thought and see whether this would be a good idea for math.js too.

@valoricDe
Copy link
Author

Maybe the google calculator tries to match units greedy to left hand nodes or numbers?

@ericman314
Copy link
Collaborator

ericman314 commented Oct 13, 2017

I would love to see a good resolution to this issue. Doing some testing of my own on Google, this is what I found:

Works as expected:
1 hour / 2 hour = (1 hour) / (2 hours) = 0.5
1 hour / (1+1) hour = (1 hour) / ((1 + 1) * hour) = 0.5

Addition definitely does not have higher precedence than multiplication:
1 hour / (1+1 hour) = Does not parse

Here's an interesting one--implicit addition:
4 lb + 2 1/2 lb = (4 pounds) + ((2 1/2) pounds) = 6.5 pounds

These two give the same result due to different precedence of implicit multiplication in each instance:
4 lb + 1/2 lb = (4 pounds) + ((1/2) pound) = 4.5 pounds
4 lb + 1 lb^2 / 2 lb = (4 pound) + ((1 (pound^2)) / (2 pound)) = 4.5 pounds

Trying to force a different behavior doesn't work:
1 lb^-1 + 1 / (2 lb) = (1 (pound^(-1))) + (1 / (2 pound)) = 3.30693393 kg-1
1 lb^-1 + 1 / 2 lb = Does not parse
1 lb^-1 + 1 lb / 2 lb^2 = (1 (pound^(-1))) + ((1 pound) / (2 (pound^2))) = 3.30693393 kg-1

Only some cases require a value in front of the unit:
1 m/s^2 + 1 m / 2 s^2 = (1 (m / (s^2))) + ((1 m) / (2 (s^2))) = 1.5 m / s2
1 m/s^2 + m / 2 s^2 = Does not parse
hour + 2 hour = hour + (2 hours) = 3 hours
2 hour + hour = Does not parse

This also gives the "wrong" result:
8.314 J / mol K + 8.314 J / mol K = (8.31400 ((J / mol) K)) + (8.31400 ((J / mol) K)) = 16 628 m3 kg s-2 mol-1

Definitely some interesting things going on there. It seems like operators follow this precedence, from highest to lowest:

  1. Implicit addition and multiplication within a compound or simple fraction (a b/c or b/c)
  2. Implicit multiplication
  3. Division

Rule 2 would make 1 hour / 2 hour return the expected 0.5, while Rule 1 would cause 1 / 2 hour to return 0.5 hour, also expected. I am guessing that changing the precedence of implicit multiplication would not be too difficult. But how hard would it be to recognize compound and simple fractions?

Edit: If implemented, this would obviously be a breaking change for v4 (or just configurable by a global setting).

@josdejong josdejong changed the title Unit Division Implicit multiplication and unit division Oct 16, 2017
@robinrodricks
Copy link

Would be excellent if implemented and made configurable, so does not break the existing apps that rely on the older behaviour.

Or perhaps we can just merge the tokens of the following: "50 kg" (as in, number followed by a unit), into a single token which is then parsed as usual by the expression engine.

@josdejong
Copy link
Owner

Yes I also would like to make some changes here. It will be breaking changes indeed, I've added this topic to the list in #682.

Whether it's much work or not depends on what we will change exactly, if there are some complicated look aheads needed it may be more work. @ericman314 do you think the rules you found are consistent, or are you still unsure? Can you expand the list of rules you found with the places of explicit addition/subtraction and explicit multiplication/division just to get things clear?

@robinrodricks I think we should make a hard choice here and not make this behavior configurable, I think that would result in a confusing mess.

@ericman314
Copy link
Collaborator

After thinking about this more, I would only propose changing one thing. Implicit multiplication between numbers and symbols, or symbols and symbols, would take higher precedence than regular multiplication and division. Currently it has equal precedence.

Current operator precedence:

Operators Description
... ...
*/.*./%mod Multiply, divide, modulus, implicit multiply
... ...

Proposed operator precedence:

Operators Description
... ...
symbol symbol, number symbol Implicit multiplication
*/.*./%mod Multiply, divide, modulus
... ...

Examples:

Expression Parsed as
1 / 2 * x (1 / 2) * x
1 / 2 x 1 / (2 * x)
1 / 2 x y 1 / ((2 * x) * y)
1 / 2 x * y (1 / (2 * x)) * y
2 1 / 2 Syntax error: number number is not supported

If any users are bothered by the higher precedence of implicit multiplication, they would always be able to add back in the *'s to restore the original behavior. This actually gives users more control over how their expression is parsed.

I've also done a little more investigating into Google's behavior for fractions. As impressive as it is, we would not reproduce that behavior right now. We could add this later, though I doubt there would be much interest for it.

For reference, here is Google's operator precedence from what I've observed:

Operators Description
number number/number Compound fraction
... ...
number / number Simple fraction
symbol symbol, number symbol Implicit multiplication
*/.*./%mod Multiply, divide, modulus
... ...

Google's different precedence for simple and compound fractions leads to some pretty intelligent behavior. 1 / 2 ^ 2 evaluates to 1 / (2 ^ 2) = 0.25, while 1 1 / 2 ^ 2 evaluates to (1 1 / 2) ^ 2 = 2.25. Another example: 1 1/2 ^ 4/5 parses as ((1 1/2)^4) / 5, while 1 1/2 ^ 3 4/5 parses as (1 1/2) ^ (3 4/5).

@josdejong
Copy link
Owner

Thanks for thinking this through @ericman314 ! I think you're right, giving implicit multiplication higher precedence than explicit multiplication/division will solve most of the issue. More advanced rules can be nice, but can also make the behavior less predictable for users, I'm not sure whether that's needed.

You specifically mention implicit multiplication of numbers and symbols, or symbols and symbols. Does that mean that you would like to keep the original behavior for other forms of implicit multiplication? How about implicit multiplication between numbers and parenthesis ( 2/3(4+5) and 2/(3+4)5 ) and parenthesis with parenthesis (2/(3+4)(5+6))?

@ericman314
Copy link
Collaborator

ericman314 commented Oct 18, 2017

You're right. To be consistent we should make all forms of implicit multiplication higher precedence, so there's no question how an expression will be parsed. That way the precedence of the implicit multiplication "operator" doesn't depend on the types of the operands.

@josdejong
Copy link
Owner

Yes that makes sense, just the simple rule implicit multiplication has higher precedence than explicit multiplication and division. That may be best.

I'm still a bit in doublt about 1/2 hour being evaluated as 1 / (2 * hour), don't you think that will become a new pitfall?

@nowherenearithaca
Copy link

nowherenearithaca commented Oct 18, 2017 via email

@FSMaxB
Copy link
Collaborator

FSMaxB commented Oct 18, 2017

@nowherenearithaca If you use parentheses, it is not ambiguous and you are safe.

Also at the moment, all kinds of multiplication have the same precedence (from left to right), but that would change if ericman314's suggestion would be implemented. But if you explicitly specify the multiplication via '*' it will still behave the same way it does currently.

@ericman314
Copy link
Collaborator

@josdejong that is indeed a valid concern. With my engineering background, I would type 0.5 hour rather than 1/2 hour, but others may prefer fractions. Google's solution to treat fractions differently is certainly clever, and most of the time it "just works". What if we try Google's way but just for simple fractions, not compound? I think that most of the time it will be invisible to the user: hopefully, it also "just works".

@ericman314
Copy link
Collaborator

@josdejong, I've implemented the new parsing rules for implicit multiplication and for fractions in my own branch here: https://github.com/ericman314/mathjs/blob/implicit-multiplication/lib/expression/parse.js

It's not ready for a PR yet but here is the comparison between the two files: develop...ericman314:implicit-multiplication

Try installing that file and running the unit tests to see how the new parsing rules change things.

@josdejong
Copy link
Owner

That's great @ericman314 ! I will play around with it this week to see how it looks/feels.

@robinrodricks
Copy link

Thanks @ericman314 for the fantastic solution to this problem! It would be great to move towards Google in terms of parsing behavior. Can't wait for the PR to play with the final version :)

@josdejong
Copy link
Owner

@ericman314 I'm working out all breaking changes for v4 now, and changing implicit multiplication is an important one in this regard. Shall I take over the start you already made? Or would you like to finish it?

@ericman314
Copy link
Collaborator

I'm happy to finish it. Should I submit the PR to the v4 branch?

@josdejong
Copy link
Owner

Thanks! Yes we will need to merge this in v4 since it will be a breaking change.

@ericman314
Copy link
Collaborator

So to be clear, there are two proposed changes to the parsing rules here:

  1. implicit multiplication now has higher precedence over multiplication and division
  2. division between two integers (fractions) has higher precedence over implicit multiplication

At present, 1/2x is parsed as (1/2) * x. Rule 1 causes it to be parsed as 1 / (2*x). Rule 2 causes it to again be parsed as (1/2) * x.

Since this is going to be a breaking change there are a few tests that are failing. But I did not expect this test to fail:

8 ./ 2 / 2  === 2

The expression now returns 8 instead of 2 because 2 / 2 is parsed as a fraction.

Here's a more illustrative example:

8.4 / 2 / 2 = 8.4
8.2 / 2 / 2 = 8.2
8.0 / 2 / 2 = 2

In the first two lines, 2 / 2 is interpreted as a fraction. But in the last line, 8.0 / 2 is first parsed as a fraction. I tried these expressions in Google and each evaluates left-to-right, as expected.

The reality is I can't make heads or tails of how Google is choosing to parse fractions. Sometimes they have higher precedence and sometimes not. For example, 2/3 kg is evaluated while 2/5 kg is not. There's some machine learning algorithm going on behind the scenes that tries to guess what the user wants to see.

Unless we can make rule 2 context-aware, I don't see how we can avoid the unexpected behavior. I see our options as first, we could remove rule 2, or second, try to apply rule 2 only where it makes sense.

@josdejong
Copy link
Owner

josdejong commented Jan 28, 2018

Thanks for your clear summary. I think we all agree on rule 1. We have to give rule 2. more thought I think.

I don't fully understand why two explicit divisions like 8.4 / 2 / 2 would have changed behavior, I think we can keep that the same?

How about the following rule 2:

  1. Explicit division gets higher precedence than implicit multiplication when:
    • lhs of the explicit division is a number
    • rhs of the explicit division is an implicit multiplication, where:
      • lhs is a number
      • rhs is a symbol

So some examples would give:

expression evaluated as
2 / 3 x (2 / 3) * x
2.5 / 5 kg (2.5 / 5) * kg
2.5 / 5 x y ((2.5 / 5) * x) * y
2 x / 5 y (2 * x) / (5 * y)
x 2.5 / 5 y (x * 2.5) / (5 * y)

@josdejong
Copy link
Owner

@ericman314 do you have an idea when you will have time to work on the implicit multiplication?

@ericman314
Copy link
Collaborator

ericman314 commented Feb 2, 2018 via email

@ericman314
Copy link
Collaborator

ericman314 commented Feb 3, 2018

After seeing the strange results of rule 2 in repeated divisions, I'm having doubts. In my opinion, when a user writes an expression, it should be absolutely clear to him/her how it will be parsed. I think if we try to implement rule 2, no matter how clever we are (and I thought my solution above was pretty clever), we will eventually find an example that results in unexpected behavior.

@josdejong, I know you might disagree with me here, but my vote is to not implement rule 2 at all and to make rule 1 optional:

math.config({ promoteImplicit: true })

Setting that option would promote the precedence of implicit multiplications above that of explicit multiplication and division. Because the default setting is false, this is not a breaking change, and totally harmless to users who don't know or care about that option. But it's there for whoever wants it.

@ericman314
Copy link
Collaborator

As a start, I submitted PR #1034 which adds a config option for implicit multiplication, but I'm happy to modify it as necessary according to whatever our final decision is here.

@josdejong
Copy link
Owner

Thanks for exploring Solution 2 @ericman314 . I totally believe you that this gives weird, hard to understand cases. That's really unfortunate, I had hoped we could find a nice solution which feels natural and consistent. Just for my understanding, can you post one or two of these weird cases?

Just to be sure we're on the same page: did you implement Solution 2 as described in your comment #792 (comment), or as described in my proposal here #792 (comment)?

Let me try to summarize the current state of the discussion:

  1. Normally, it's highly unfavorable to make breaking changes over non-breaking changes, but in this case I think we have the opportunity to do a breaking change for once. I think we should focus on what we think is the best solution for the long term, and not count this as an important factor for our decisions.
  2. I find it very important that the rules of the expression parser are easy to understand and feel consistent.
  3. I have strong doubts about the desirability of introducing an option promoteImplicit here and basically let the user decide. I expect that will become a major source of confusion (probably dethroning the current most asked question "0.1+0.2 gives wrong result"). I rather like to make one choice and live with it.
  4. With just Solution 1, the choice boils down to implicit multiplication having higher precedence than explicit multiplication/division or not.
  5. Both of these options from (4.) have unexpected behavior in certain cases. The meaning of 2x / 3y looks most logically (2*x) / (3*x), whilst 1/2 x more looks like it was meant as (1/2) * x. So: both solution will surprise unaware users in certain cases.

What may help is to try to get a clear picture of the audience and their expectations.

  • A) Mathematiciance are used to not relying on implicit multiplication because of the ambiguity. Applications like Matlab don't support implicit multiplication. Mathematica (Wolfram) treats it with the same precedence as explicit multipliations.
  • B) Programmers mostly don't have support for implicit multiplication in their programming languages, and are also trained not to rely on ambiguous operators and use parenthesis where needed. I don't think they will care much about it that much.
  • C) Regular, non mathematical users. Like users using google for calculations, pupils. A more "logic" way of evaluation of implicit multiplication (similar to google) would really help them.

The user group that has most benefit for a change in implicit multiplication is group C "regular users". I'm afraid that we will not really help them since both solutions that we can offer them will stil give unexpected results in certain cases as explained in (5.)

@ericman314
Copy link
Collaborator

Jos, thanks for clarifying your position.

I jumped to conclusions without trying your suggestion. To be honest I wasn't exactly sure how to do it. I suppose the parser would simply look ahead a few tokens to find the pattern [number] / [number] [symbol]? If that's the case, it's almost just as easy as my first implementation of rule 2 which matches [integer] / [integer].

The reason my first implementation of rule 2 failed was that it was being applied in instances where implicit multiplication wasn't even happening, such as 8.4 / 2 / 2. The parser would match 2 / 2 first since it looks like [integer] / [integer].

I will try your suggestion. Hopefully I can get it to work. I've never seen a parser written like in parse.js. Functions for consuming tokens are called recursively from highest precedence to lowest. Quite elegant. As I understand it, here is the new operator precedence from highest to lowest:

  1. Explicit division when the division matches this pattern: [number] / [number] [symbol]
  2. Implicit multiplication
  3. All other division and multiplication

@josdejong
Copy link
Owner

josdejong commented Feb 3, 2018

I think you're right with your last three rules. We may have to also catch symbol / symbol symbol, to handle a case like 8.314 J / mol K , what do you think?

The reason my first implementation of rule 2 failed was that it was being applied in instances where implicit multiplication wasn't even happening, such as 8.4 / 2 / 2. The parser would match 2 / 2 first since it looks like [integer] / [integer].

In that Solution 2 proposal my idea was that implicit multiplication would never have higher precedence except in the special case of number / number symbol. I haven't deeply thought through how this would work out but it may be an interesting experiment.

I haven't yet looked into the code on how to implement such a "lookahead", but it may be as easy as adding a check for our pattern number / number symbol at the point where we create the OperatorNode with implicit multiplication (here in the code), and if so, replace the already created left hand side node (which is an operator node with a division) with a changed one.

@ericman314
Copy link
Collaborator

ericman314 commented Feb 3, 2018

I've actually just finished implementing it, with implicit multiplication always having higher precedence except for the pattern number / number symbol, in which the / is raised above implicit multiplication. I was testing out all the examples we've seen so far in this discussion. I am extremely pleased with the results:

2 / 3 x = (2 / 3) x
2.5 / 5 kg = (2.5 / 5) kg
2.5 / 5 x y = ((2.5 / 5) x) y
2 x / 5 y = (2 x) / (5 y)
x 2.5 / 5 y = (x (2.5 / 5)) y
17 h / 1 h = (17 h) / (1 h)
1 / 2 x = (1 / 2) x
1 / 2 * x = (1 / 2) * x
1 / 2 x y = ((1 / 2) x) y
1 / 2 (x y) = (1 / 2) (x y)
1 / 2x * y = ((1 / 2) x) * y
y / 2 x = y / (2 x)
y / 2 * x = (y / 2) * x
y / 2 x w = y / ((2 x) w)
y / v x w = y / ((v x) w)
1 h / (1+1) h = (1 h) / ((1 + 1) h)
4 lb + 1/2 lb = (4 lb) + ((1 / 2) lb)
4 lb + 1 lb^2 / 2 lb = (4 lb) + ((1 (lb ^ 2)) / (2 lb))
1 lb^-1 + 1 / (2 lb) = (1 (lb ^ (-1))) + (1 / (2 lb))
1 lb^-1 + 1 / 2 lb = (1 (lb ^ (-1))) + ((1 / 2) lb)
1 lb^-1 + 1 lb / 2 lb^2 = (1 (lb ^ (-1))) + ((1 lb) / (2 (lb ^ 2)))
1 m/s^2 + 1 m / 2 s^2 = ((1 m) / (s ^ 2)) + ((1 m) / (2 (s ^ 2)))
1 m/s^2 + m / 2 s^2 = ((1 m) / (s ^ 2)) + (m / (2 (s ^ 2)))
8.4 / 2 / 2 = (8.4 / 2) / 2

And my personal favorite also works:

8.314 J/mol K = (8.314 J) / (mol K)

The only one that doesn't match your suggestion is x 2.5 / 5 y, but that's such an odd way to write it anyway; usually constants are written before symbols. I also added number / number ( as a second pattern on which to apply rule 2, so that 1 / 2 (x y) and 1 / 2 x y would be equivalent.

If you're OK with this behavior I'll go ahead and write the tests and edit the docs.

Edit: Here is the diff if you're interested in seeing how I did the look ahead.

@josdejong
Copy link
Owner

josdejong commented Feb 3, 2018

that. is. awesome!

The example x 2.5 / 5 y is indeed an odd one, we shouldn't worry about that.

How does 2 1/2 x evaluate? I think that will be (2 * (1/2)) * x right?

@josdejong
Copy link
Owner

About unit testing, I was just thinking, we should rewrite tests like these:

assert.equal(parseAndEval('1/2a', {a:2}), 1);

with something like:

assert.equal(parseAndStringifyWithParens('1/2a'), '(1 / 2) a');

that's much easier to understand instead of having to calculate and compare the result.

@ericman314
Copy link
Collaborator

number number isn't supported at all currently, so 2 1/2 x gives `SyntaxError: Unexpected part "1" (char 3)

Agreed. I'll get started on the tests. Also, when stringifying a parsed tree, do we need to alter when parentheses are displayed to account for the new precedence?

@josdejong
Copy link
Owner

number number isn't supported at all currently, so 2 1/2 x gives `SyntaxError: Unexpected part "1" (char 3)

ah, of course! That's good. (We could later on implement support for implicit addition in a case like 2 1/2 but no need now.

when stringifying a parsed tree, do we need to alter when parentheses are displayed to account for the new precedence?

I think we can keep it as it is, we have the option toString({parenthesis: 'all'}) to get the unambiguous output.

I've gone through your list with examples again, I'm really happy with this behavior, it all looks logical to me! I'm very curious to your implementation.

There will always some weird edge cases, but for all "typical" inputs it looks great. As for the edge cases: that's the risk of working with implicit multiplication. These edge cases are not a "new" problem, we already had this problem, I just think we made the problem smaller. The predictability is lower because we have more complicated rules now, but I think that weights off against better handling most typical use cases.

@ericman314
Copy link
Collaborator

Ouch, there's 590 instances of parseAndEval in parse.test.js. I'll just start with the new tests and go from there.

@josdejong
Copy link
Owner

josdejong commented Feb 3, 2018

No need to replace all, in many cases parseAndEval is the right one, I think we should only replace the tests in it('should parse implicit multiplication', ...), and the new tests you're about to write

@ericman314
Copy link
Collaborator

If you're curious, here's my version of parse.js that contains the implementation.

@ericman314
Copy link
Collaborator

I've submitted the PR but couldn't figure out how to leave package-lock.json alone.

@josdejong
Copy link
Owner

Great! going to review your work immediately.

yeah, package.lock can easily give huge changes or merge conflicts though it looks like it improved with the latest versions of npm. No problem!

@josdejong
Copy link
Owner

Implemented via #1036

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants