Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added a context-based matching algorithm #593

Merged
merged 13 commits into from
Jun 30, 2020

Conversation

connectdotz
Copy link
Collaborator

@connectdotz connectdotz commented Jun 1, 2020

This is a pretty significant change, I want to put it out there as early as possible so we can start the discussion and testing:

This PR replaced the "name" based test/assertion matching algorithm with the "context-based" algorithm.

why

We are all aware of many test/assertion matching issues due to static analysis (parsing) vs. runtime result, such as template-literal, jest.each etc. To resolve these types of issues are not trivial, and we have our share of trying...

This PR is to take a different approach with this challenge:

If we assume the source code parser and jest output are "correct", the problem is just to stitch them together, then we can simply match by "sequence"(self locations) plus "hierarchy" (describe/test relationship) to be safe. Lack of a better term, I called this "context". The simplest way to describe it is "assertion 1 should always match test block 1, assertion 2 matches test block 2..."

details

  • The sequence is determined by test or assertion file's own location info, so even if the source map didn't align, it will not matter. (though right now I did add an extra check to warn us if the name nor line do not match)
  • each file can be considered a tree of nodes, the container node is the describe block, tests are the leaf nodes. The context matching is isolated in each container level, so you might have one describe block mismatched while everything else can still be matched correctly.
  • when context mismatched we did fallback to simple plain name matching as a best-effort for most likely transient/rare situations, such as developers are developing the tests, test without expects, etc. (We can learn more from the real-world testing, so far my local test seems to confirm that hypothesis...)
  • as far as I know, this should close all of our open matching issues, such as template-literal, or even jest.each.
    • the way to resolve jest.each is to group assertions with the same line into a single "node", so they can be matched to the source test block like anything else.

testing

I have run with this plugin for a couple of days and are reasonably happy about it so far.

But it is new, so I did put in a lot more "checks" and "warnings" to let us known when things are not expected, we can fine-tune those later...

remaining tasks:

review

I know this is a pretty heavy-duty one, but the scope is quite concentrated, pretty much all in the match-by-context, (the rest are mainly to address eslint and new attributes) so it might not be as daunting...

Appreciate your time and I think the community will be greatly benefitted if this PR does work as well as I hoped... 🤞

btw, I fixed a coverage "cheat" that used to only include the files with tests in coverage calculation, now I added the whole src as the base, so the coverage % should be lower than before...


fix #570
fix #491
fix #427
fix #478
fix #405
fix #294
fix #281

@connectdotz connectdotz added this to the v4.0 milestone Jun 1, 2020
@connectdotz connectdotz mentioned this pull request Jun 1, 2020
15 tasks
@coveralls
Copy link

coveralls commented Jun 1, 2020

Pull Request Test Coverage Report for Build 797

  • 159 of 165 (96.36%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-2.5%) to 88.15%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/TestResults/match-node.ts 47 48 97.92%
src/TestResults/TestResultProvider.ts 14 16 87.5%
src/TestResults/match-by-context.ts 95 98 96.94%
Totals Coverage Status
Change from base Build 784: -2.5%
Covered Lines: 1033
Relevant Lines: 1150

💛 - Coveralls

@connectdotz
Copy link
Collaborator Author

@seanpoulter @stephtr jest-editor-support release is cut, the PR is buildable, you should be able to run and test it now...

(The test coverage is down, as I mentioned earlier, is expected, so don't worry about it.)

Not sure if you have started to look at this PR yet, most of the change is in match-by-context. Feel free to start even just a partial review, so we can move forward... I am planning to cut a v4 beta after this PR, since it is a significant change, the earlier we start live testing the better...

@connectdotz
Copy link
Collaborator Author

ping @seanpoulter @stephtr

Copy link
Member

@stephtr stephtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thank you for this large work!
I haven't tested it yet in practice nor did I yet have a look at the tests.

Most of my in-place comments are about style, which you don't have to consider.
However, I also noticed a few other things:

  • I'm not completely happy about the ContainerNode having a DataNodeType for the childData, if we use DataGroupNode only for the TestAssertionStatus but actively filter it for ItBlock.
  • One thing I was wondering about is whether it happens for both, TestAssertionStatus and ItBlock that there are ContainerNodes with both, childContainers and childData being populated?
  • Would it make sense to split up the match-by-context file into multiple files? For example one could outsource the ContainerTree. One could also separate the ContainerType into a tree container, which just only cares about the childContainers and then extend it in a class, which adds the childData.
  • For releasing a beta it would be fine as is. For the long term I would welcome more documentation in the form of comments, for example what's the functions' purpose.

src/TestResults/TestResultProvider.ts Show resolved Hide resolved
src/TestResults/TestResultProvider.ts Show resolved Hide resolved
src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
src/TestResults/TestResultProvider.ts Show resolved Hide resolved
src/TestResults/match-by-context.ts Show resolved Hide resolved
src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
@connectdotz
Copy link
Collaborator Author

@stephtr thanks for reviewing this! Very helpful. I will address individual comments separately, here are some of your high-level comments:

One thing I was wondering about is whether it happens for both, TestAssertionStatus and ItBlock that there are ContainerNodes with both, childContainers and childData being populated?

The short answer is yes. Think about the container is the describe block, a describe block can contain both test block and other describe blocks.

Would it make sense to split up the match-by-context file into multiple files? For example one could outsource the ContainerTree. One could also separate the ContainerType into a tree container, which just only cares about the childContainers and then extend it in a class, which adds the childData.

I think this is related to the question above, if every container would contain both child container and data, then it might not make sense to separate them into different classes/files?

If we just want to reduce the module "size", we could split the class definition from the functions that use it? Although ContainerNode is really an internal class (I was going to make it private), i.e. used only by the match functions, not sure if we should split it just to reduce the size?

I'm not completely happy about the ContainerNode having a DataNodeType for the childData, if we use DataGroupNode only for the TestAssertionStatus but actively filter it for ItBlock.

The whole idea about using a generic container is that it does not care what the underlying data type is, only the "shape" of them... but I agree to always check if it is single vs group data node is tedious, let me see if I can encapsulate that better...

@connectdotz
Copy link
Collaborator Author

@stephtr made a lot of good suggestions, I went back to the code and did some major clean up, updated some of the old copy-paste code, and refactor the rest quite a bit. It was pretty embarrassing how bad the code was, hopefully, it is better now 🤞 .

Sorry took a while to get back to you, this is a pretty busy week, well, let's see if we can wrap this up over the weekend before we turn into pumpkins again...

@connectdotz connectdotz requested a review from stephtr June 20, 2020 02:24
@connectdotz
Copy link
Collaborator Author

hey, @stephtr @seanpoulter looks like you guys are also busy... how do you suggest we move forward with the beta?

@stephtr
Copy link
Member

stephtr commented Jun 25, 2020

Sorry for the delay. I'll either do it tomorrow evening or on Saturday.
Since the change is quite large, I would have liked seanpoulter to also take a look at it. However, since it is going to be a substantial improvement, I guess we can also wait for that after a beta has been released.

Copy link
Member

@stephtr stephtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many parts (for example the sort function) are now way easier to understand!

I think this is related to the question above, if every container would contain both child container and data, then it might not make sense to separate them into different classes/files?
If we just want to reduce the module "size", we could split the class definition from the functions that use it? Although ContainerNode is really an internal class (I was going to make it private), i.e. used only by the match functions, not sure if we should split it just to reduce the size?

I was thinking about splitting up the tree classes since the initial code felt quite complex to me. However with the refactoring I'm perfectly happy with them as they are.
Concerning separating the tree classes from the matching algorithm: Since the tree doesn't need to know anything about TestAssertionStatus and ItBlock and the file is already 362 lines long, I would prefer it. In my opinion that would also support other people in getting a better overview over the code.

By the way, since we are going to release a beta, should we make a separate branch, for either the v4 beta or the old, publicly available version? Just in case one wants to release a small bugfix to the old code.

src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
src/TestResults/match-by-context.ts Show resolved Hide resolved
src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
// the match algorithm: first match by sequence if their have the same structure;
// then fallback to simple name-based matching. Upon each test block, it invokes the
// callback to process the matched results.
const matchList = <N1 extends NodeType<ItBlock>, N2 extends NodeType<TestAssertionStatus>>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's a one-to-many relationship, I would still label the types and variables in a way that one can distinguish between code and assertions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what you mean? you mean just the generic label?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was only talking about names like N1/2 and n. Since the function obviously separates between ItBlock and TestAssertionStatus containers, I would also adapt the variable names to tell those two more easily apart.

src/TestResults/match-by-context.ts Outdated Show resolved Hide resolved
Comment on lines 58 to 59
const line = a?.line ?? (allowLocation && a.location?.line);
return line >= t.start.line && line <= t.end.line;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If allowLocation is false, line can evaluate to false, which in the process of comparison is converted to 0.
By the way, could you help me why we can't use a.location in some cases?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch.

could you help me why we can't use a.location in some cases?

when we only want to show the error line. the a.location is the location of the test block, not the "expect" that caused the error.

@connectdotz
Copy link
Collaborator Author

Concerning separating the tree classes from the matching algorithm: Since the tree doesn't need to know anything about TestAssertionStatus and ItBlock and the file is already 362 lines long, I would prefer it. In my opinion that would also support other people in getting a better overview over the code.

ok, I give in, after refactoring the matching functions, they do look substantial, fine, split them to 2 files now.

By the way, since we are going to release a beta, should we make a separate branch, for either the v4 beta or the old, publicly available version? Just in case one wants to release a small bugfix to the old code.

I thought about that, but consider it is easy enough for master patch off the tag, which we always do now for every release, we should be ok to release beta off the master head...


this is a good collaboration, @stephtr did a wonderful job in reviewing the code! (hey, we should have you do more code review!) I think the code is a lot better now than when I started. But I also realized that we can do refactoring and discuss styles forever... Sometimes style/readability is rather subjective and there are many ways to do the same thing, we should probably draw a line somewhere so we can move forward... we could definitely use the energy in other v4 tasks and beta testing/bug-fixes...

@connectdotz
Copy link
Collaborator Author

looks like @seanpoulter is busy. @stephtr you have spent quite a bit of time on this PR, judging you have not approved it, I take it there is some concrete concern that has not been addressed yet? If it's more along the line like "the change is big so something might not be quite right" kind of worry, I think the best way to tackle that is to put it to the real-world testing...

I am thinking to cut a beta either tonight or tomorrow night, do you object?

Copy link
Member

@stephtr stephtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, today either GitHub was down or I didn't have time to write an answer.
I wanted to write the response approving the PR in the evening, but the short answer is I'm fine with it as is.
Nice work 👍

@connectdotz connectdotz merged commit a33b727 into jest-community:master Jun 30, 2020
@connectdotz connectdotz deleted the context-based-matching branch June 30, 2020 02:22
legend1202 pushed a commit to legend1202/vscode-jest that referenced this pull request Jun 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants