Fix for #2068: better QuickOpen heuristics #2462

dangoor · 2013-01-02T17:10:04Z

The new heuristics don't relate so much to longest substring as they do to trying to find contiguous matches around characters in "special" positions in the string.

this is just a checkpoint. my current plan is to remove that function because it slows things down too much

Created a new QuickOpen matching algorithm that searches for matches among "special characters" (path markers, camelCase changes) and does so left-to-right rather than right-to-left as in the old algorithm. The new algorithm still gives an added bonus to matches that occur within the filename, which is checked first. There is also now a collection of tests to exercise the QuickOpen logic and ensure that it is working sanely.

Added some code to help debug scores; with that, the scores now look pretty sane and the results look quite nice. Also removed the dead code.

Fixed a display bug when there is no query.

Added comments for the new QuickOpen algorithm. Note that in the process I spotted a bug and added a failing test, which I have not had a chance to fix yet.

Partial fix for a bug in the new QuickOpen logic.

Fixed the bug with strings that are longer than the final segment.

Fixed an off-by-one problem that left an initial character match out of the last segment.

dangoor · 2013-01-02T17:11:23Z

Adding a link for convenience: Fix for #2068

peterflynn · 2013-01-05T02:01:07Z

src/search/QuickOpen.js

@@ -22,7 +22,8 @@
 */

 /*jslint vars: true, plusplus: true, devel: true, nomen: true, indent: 4, maxerr: 50 */
-/*global define, $, window, setTimeout */
+/*global define, $, window, setTimeout, ArrayBuffer, Int8Array */
+/*unittests: QuickOpen */


Out of curiosity -- is "unittests" a machine-read annotation of some sort, or just a shorthand you're using?

It is actually machine read... http://www.blueskyonmars.com/2013/01/02/test-driving-brackets/

Oh, nice! So when's the follow-up pull request to add these notations to all our other JS files? :-)

heh. We just need to separate out the slow running tests from the fast running ones, and then it's a piece of cake.

I'll note that I just fired up Brackets with QuickOpen.js open, hit command-P and it didn't actually run the tests... so it looks like my extension is not working as it should. It does rerun the last run tests, which is handy but not the same as actually reading which tests the file says should be run.
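For anyone wondering what "machine read" could look like in practice, here is a minimal sketch of pulling suite names out of a `/*unittests: ... */` comment. This is not the extension's actual code: the function name `readUnitTestAnnotation` and the comma-separated list format are assumptions for illustration (the diff above only shows a single suite name).

```javascript
// Hypothetical sketch: extract suite names from a "/*unittests: ... */" comment.
// Assumption: if several suites are allowed at all, they are comma-separated.
function readUnitTestAnnotation(source) {
    var match = /\/\*\s*unittests:\s*([^*]+?)\s*\*\//.exec(source);
    if (!match) {
        return []; // no annotation in this file
    }
    return match[1].split(",").map(function (name) {
        return name.trim();
    }).filter(function (name) {
        return name.length > 0;
    });
}

var header = "/*global define, $, window */\n/*unittests: QuickOpen */\n";
console.log(readUnitTestAnnotation(header)); // logs an array containing "QuickOpen"
```

A file without the annotation yields an empty array, so a runner could fall back to "rerun the last tests" in that case.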

peterflynn · 2013-01-08T01:55:46Z

Almost done reviewing -- will wrap up later tonight.

peterflynn · 2013-01-08T05:56:49Z

src/search/QuickOpen.js

+
+            // a bonus is given for characters that match at the beginning
+            // of the filename
+            if (c === 0 && (strCounter > lastSegmentStart)) {


Since c is always strCounter - 1 given the current pair of calls, could this be simplified to just if (c === lastSegmentStart)? I'm a little thrown by the === 0 though -- is the intent for this bonus to apply only during the lastSegmentSearch() phase, and not in other cases where a char matches the filename start?

This should just be c === lastSegmentStart. Not quite sure how I ended up with something so convoluted.

peterflynn · 2013-01-16T00:26:14Z

src/utils/StringMatch.js

+            }
+
+            // if we've finished the query, or we haven't finished the query but we have no
+            // more backtracking we can do, then we're all done searching.


I found this comment a little confusing since it's in the opposite order from the two clauses in the if. Worth rewording (or, conversely, reordering the code)?

good point. I flipped the code.

peterflynn · 2013-01-16T01:45:49Z

Still trying to wrap my head around deadBranches and backtrackTo a little better. Will be back online later tonight to continue reviewing...

dangoor · 2013-01-16T02:23:09Z

I wonder if I can come up with some ASCII art to make them clearer. The key is that the normal pattern prefers the special characters, but in so doing it skips over possible matches. So, I used backtrackTo to keep taking us back one special character at a time (while pulling off every other character that may have matched in between), and then we take it forward consecutively from there. We didn't get a match in the specials, but maybe there's one lurking between those last two specials we checked.

But, a problem remains: what happens if we match a consecutive character, then switch back to hitting specials and then hit the end of the string again without a complete match? That's where deadBranches comes in. At the time we set backtrackTo, we also have positively identified that the part of the query after queryCounter does not appear after that special character. If we find ourselves heading down that path again, we need to stop looking for specials because we're not going to find what we're looking for. Without deadBranches, it would keep trying to match up the specials. I did try a simpler approach where it just keeps track of the highest special it should scan to, but it turned out that that approach would actually not do specials scanning when it should.

This is akin to dynamic programming, if not exactly that. stringMatch is not exactly a generalized routine... we have some idea how it's used, and I tried to match the algorithm to that.

There were also a couple of minor code changes (variable renames and such) but no algorithmic changes.

dangoor · 2013-01-16T20:31:14Z

@peterflynn and I worked through an example on IRC of how the backtracking works and I added that example in the doc for _generateMatchList (in commit 112b206) in hopes that how the matching works will be a bit clearer.

peterflynn · 2013-01-16T23:14:56Z

Ok, I think it's all starting to make sense to me :-)

Here are a few things I think we might want to capture in the docs in some way:
(note: I haven't yet read through the big block comment added in 112b206, so apologies if some of this is already in there)

- We only backtrack() when we've exhausted both special AND normal forward searches past that point, for the query remainder we currently have. For a different query remainder, we may well get further along -- hence deadBranches[] being dependent on queryCounter; but in order to get a different query remainder, we must give up one or more current matches by backtracking.
- Normal "any char" forward search is a superset of special matching mode -- anything that would have been matched in special mode could also be matched by normal mode
- backtrack() always goes at least as far back as str[backtrackTo-1] before allowing forward searching to resume
- When deadBranches[qi] = si it means if we're still trying to match queryStr[qi] and we get to str[si], there's no way we can match the remainder of queryStr with the remainder of str -- either using specials-only or full any-char matching.

We know this because deadBranches[] is set in backtrack(), and we don't get to backtrack() unless either:
1. We've already exhausted both special AND normal forward searches past that point
  (i.e. backtrack() due to strCounter >= str.length, yet queryCounter < query.length)
2. We stopped searching further forward due to a previously set deadBranches[] value
  (i.e. backtrack() due to strCounter > deadBranches[queryCounter], yet queryCounter < query.length)

Hopefully that is all correct! If not lmk...

peterflynn · 2013-01-16T23:15:15Z

Also needs a (hopefully trivial) merge with master before this is mergeable

dangoor · 2013-01-17T02:15:52Z

Yes, that is how it all works. It's not very important, but there is one minor adjustment. You say:

> Normal "any char" forward search is a superset of special matching mode -- anything that would have been matched in special mode could also be matched by normal mode

It could be, because those special characters would be traversed, but in practice it does not happen: the specials are traversed first, and only after that fails does it resort to matching the normal characters. So, as it progresses forward character-by-character it will compare a special if it hits one, but it won't be a match because that special had already been compared.

Otherwise, everything you said is spot on. Maybe I'll just add your points in directly, because I think that someone else's interpretation of that code (plus the comments that I already have there) can only help someone new who's approaching it for the first time.

And, yes, the merge is trivial. I did that merge when I created the branch for the performance improvement yesterday.


peterflynn · 2013-01-17T04:09:55Z

Hmm, ok I just ran across another case where it seems to fail to find a match: open StringMatch.js and search for "_computerangesa" (or any longer substring of _computeRangesAndScore) -- it won't show any results. If I take out the "s" ("_computeRangeAndScore") then it matches.

peterflynn · 2013-01-17T04:12:08Z

Also one case of funny scoring: if I search for "jsutil", it ranks JSLintUtils.js above JSUtils.js. It seems like both the longer contiguous match and the preference for shorter strings should work in JSUtils' favor, so I'm not sure what's happening...

I tried turning on DEBUG_SCORES, but I'm only seeing the total number in the Quick Open results list -- not the breakdown of individual scoring attributes. Is there more to it than just changing the initializer from false to true?

dangoor · 2013-01-17T13:02:43Z

I'll look into the "_computerangesa" case and the scoring. Scoring is easy to tweak and will never be 100% perfect all the time. The current algorithm counts uppercase letters after lowercase ones as special (the camelCase pattern), so JSLintUtils.js has one more special than JSUtils.js. I'll see if there's something that makes sense to tweak.
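To make the "extra special" concrete, here is a rough sketch of a specials finder. The rule set (start of string, the separators `/ . _ -`, the character right after a separator, and an uppercase letter that follows a lowercase one) and the name `findSpecials` are assumptions for illustration, not the code in this branch.

```javascript
// Hypothetical specials rule, for illustration only.
function findSpecials(str) {
    var specials = [];
    var prevWasLower = false;
    var prevWasSeparator = true; // treat the start of the string like "after a separator"
    var i, ch, isSeparator, isUpper;
    for (i = 0; i < str.length; i++) {
        ch = str.charAt(i);
        isSeparator = "/._-".indexOf(ch) !== -1;
        isUpper = ch !== ch.toLowerCase();
        if (isSeparator || prevWasSeparator || (isUpper && prevWasLower)) {
            specials.push(i);
        }
        prevWasLower = ch !== ch.toUpperCase();
        prevWasSeparator = isSeparator;
    }
    return specials;
}

console.log(findSpecials("JSLintUtils.js")); // [0, 6, 11, 12]  (J, U, ".", j)
console.log(findSpecials("JSUtils.js"));     // [0, 7, 8]       (J, ".", j)
```

Under this rule the "U" in JSUtils.js follows an uppercase "S" and so is not special, while the "U" in JSLintUtils.js follows a lowercase "t" and is. A query like "jsutil" can therefore land on two specials in the longer name but only one in the shorter one.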

When DEBUG_SCORES is on, hover over the score to see the individual parts (it's a tooltip).

The problem was that backtrackTo was causing backtracking to go too far back for the "s", because it had already backtracked to the "r" previously (when it hit the "g" in the query). backtrackTo was the original mechanism I used in backtracking before adding deadBranches. It turns out that backtracking really needs to go back before deadBranches[queryCounter], because where we need to backtrack to depends on where we are in the query.

@peterflynn

@peterflynn noted that "jsutil" matched "JSLintUtils.js" over "JSUtils.js". This change gives a significant boost to consecutive matches that started on a special character. I also boosted specials a little more to balance out specials vs. consecutive matches.

njx · 2013-01-17T15:55:40Z

I haven't been following the algorithm in detail, but I wonder if we should treat all uppercase letters as "special", not just ones after lowercase letters, precisely because of cases like "JSUtils", where conceptually the "U" really is the start of a "word", and arguably the "S" is too (because it's part of an acronymish abbreviation standing in for a word). I think most contiguous strings of uppercase letters in identifier names tend to be abbreviations like this.

dangoor · 2013-01-17T16:11:14Z

@njx that's certainly an option (which I considered just now, but found another way to accomplish the same thing). I'm guessing that we'll turn the knobs a bit on the scoring parameters over time to improve how the matches feel (because it's pretty subjective). I found a different tweak this morning that I think will work out nicely (see commit f325030 if interested).

@peterflynn good catch on the _computerangesa search. You had asked at one point if there was a need for backtrackTo to be able to move forward. I couldn't think of a case, but this problem was actually an instance of that very problem. As I said in the commit message, backtrackTo was my starting point for the backtracking, but it turned out that it wasn't necessary and was even harmful.

So, the fix here actually made things a bit simpler. When we determine that past point X in the string, the last Y parts of the query can't find a match, we just keep track of that. We won't go hunting beyond that point for a match for that part of the query, and if we need to backtrack we'll make sure that we rewind to the previous special before point X. deadBranches is all of the bookkeeping we need.
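One way to see why deadBranches alone can be enough bookkeeping is the sketch below. It is a recursive stand-in, not the iterative loop in StringMatch.js: it prefers specials, falls back to any character, and records in `deadBranches[qi]` the earliest string position from which the query remainder starting at `qi` is known not to match. The names `findSpecials` and `matchIndices`, and the specials rule itself, are assumptions made for the example.

```javascript
// Hypothetical specials rule: start of string, separators, the character
// after a separator, and an uppercase letter that follows a lowercase one.
function findSpecials(str) {
    var specials = [], prevWasLower = false, prevWasSeparator = true;
    var i, ch, isSeparator;
    for (i = 0; i < str.length; i++) {
        ch = str.charAt(i);
        isSeparator = "/._-".indexOf(ch) !== -1;
        if (isSeparator || prevWasSeparator || (ch !== ch.toLowerCase() && prevWasLower)) {
            specials.push(i);
        }
        prevWasLower = ch !== ch.toUpperCase();
        prevWasSeparator = isSeparator;
    }
    return specials;
}

// Returns the matched string indices, or null if there is no match.
function matchIndices(query, str) {
    var q = query.toLowerCase();
    var s = str.toLowerCase();
    var specials = findSpecials(str);
    var deadBranches = [];
    var i;
    for (i = 0; i < q.length; i++) {
        deadBranches[i] = Infinity; // nothing proven hopeless yet
    }

    function search(qi, si) {
        var k, rest;
        if (qi === q.length) {
            return []; // whole query matched
        }
        if (si >= deadBranches[qi]) {
            return null; // q.slice(qi) is already known not to match from here on
        }
        // 1. Prefer special characters at or after si.
        for (k = 0; k < specials.length; k++) {
            if (specials[k] >= si && s.charAt(specials[k]) === q.charAt(qi)) {
                rest = search(qi + 1, specials[k] + 1);
                if (rest) {
                    return [specials[k]].concat(rest);
                }
            }
        }
        // 2. Fall back to any character. Specials are revisited here, but the
        //    dead-branch check above makes those retries return immediately.
        for (k = si; k < s.length; k++) {
            if (s.charAt(k) === q.charAt(qi)) {
                rest = search(qi + 1, k + 1);
                if (rest) {
                    return [k].concat(rest);
                }
            }
        }
        deadBranches[qi] = si; // si is below the old value, or we would have bailed out above
        return null;
    }

    return search(0, 0);
}

console.log(matchIndices("_computerangesa", "_computeRangesAndScore") !== null); // true
console.log(matchIndices("xyz", "_computeRangesAndScore")); // null
```

The `_computerangesa` query is the interesting case: the greedy special match on "A" for the first "a" strands the rest of the query, the failure gets recorded per query position, and the search rewinds to the plain "a" after "R" without needing a separate backtrackTo marker.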

peterflynn · 2013-01-18T00:05:30Z

src/utils/StringMatch.js

+        closeRangeGap(str.length);
+
+        // shorter strings that match are often better than longer ones
+        var lengthPenalty = -1 * Math.round(str.length * DEDUCTION_FOR_LENGTH);


I don't think we should bother fixing this right now, but I just noticed that rounding combined with DEDUCTION_FOR_LENGTH < 1 means we don't discriminate between strings that are only 1-2 chars different in length. I wonder if it'd be safe to just not bother rounding?

Yes, that's true, but I don't think 1-2 characters is a big deal (except maybe as a simple tiebreaker). I believe I added the rounding to make the scores more pleasant to look at when DEBUG_SCORES is on (not a valid reason, admittedly).

I actually just confirmed that I can eliminate the DEDUCTION_FOR_LENGTH entirely right now and the tests still pass. I think this parameter has become less important after the other scoring tweaks.
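A quick illustration of the rounding point. The value 0.2 for DEDUCTION_FOR_LENGTH is an assumption picked for the example (the thread only says it is below 1), and the two helper names are hypothetical.

```javascript
// Assumed value for illustration; the real constant may differ.
var DEDUCTION_FOR_LENGTH = 0.2;

// Penalty as computed in the diff above (rounded).
function roundedLengthPenalty(length) {
    return -1 * Math.round(length * DEDUCTION_FOR_LENGTH);
}

// Same penalty without rounding, as suggested in the review.
function unroundedLengthPenalty(length) {
    return -1 * length * DEDUCTION_FOR_LENGTH;
}

// Lengths 10, 11 and 12 all collapse to the same rounded penalty of -2 ...
console.log(roundedLengthPenalty(10), roundedLengthPenalty(11), roundedLengthPenalty(12));

// ... while the unrounded version still ranks the shorter string higher.
console.log(unroundedLengthPenalty(10) > unroundedLengthPenalty(12)); // true
```

With the rounded form, string length only acts as a tiebreaker once two candidates differ by several characters; dropping the rounding makes it a tiebreaker for every extra character.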

peterflynn · 2013-01-18T00:37:36Z

The code changes all look good. I'm just going to run on this branch for a while longer to make sure I don't hit any other cases that seem off -- and if not, I think we're good to merge!

dangoor · 2013-01-18T05:16:16Z

Awesome. Thanks again for digging into this one and sticking with it.

When this merges, I think I'll redo my pull request for the performance improvement. That code is really quite straightforward and I think we should get it in soon, even if not sprint 19.

peterflynn · 2013-01-18T21:47:59Z

Alright, played with it a bunch more and all still seems well -- time to merge!

Fix for #2068: better QuickOpen heuristics

This change broke a few extensions. Looking at the fix, it seems that all QuickOpen plugins will likely need these functions, so we may as well re-export them as we have been doing rather than requiring the extensions to all add another import. If we do decide to deprecate these later, we should do so with deprecation warnings (something we weren't doing when these were moved to the StringMatch module).

Revert "Marked as deprecated by #2462 in Sprint 19"

This reverts commit 49e0827.

dangoor added 3 commits December 31, 2012 22:23

add an initial QuickOpen test.

ba87bcc

added longestCommonSubstring to quickopen file.

4f66e1b

this is just a checkpoint. my current plan is to remove that function because it slows things down too much

ghost assigned peterflynn Jan 2, 2013

peterflynn reviewed Jan 5, 2013

peterflynn reviewed Jan 8, 2013

peterflynn reviewed Jan 16, 2013

mostly doc changes per review feedback.

112b206

There were also a couple of minor code changes (variable renames and such) but no algorithmic changes.

dangoor added 2 commits January 16, 2013 22:14

add notes on how the matching algorithm works, from @peterflynn's

bfb4fdf

comment.

Merge branch 'master' into dangoor/fix-2068

1bd19a5

Conflicts: src/search/QuickOpen.js

dangoor added 2 commits January 17, 2013 10:22

peterflynn reviewed Jan 18, 2013

peterflynn added a commit that referenced this pull request Jan 18, 2013

Merge pull request #2462 from adobe/dangoor/fix-2068

289489d

Fix for #2068: better QuickOpen heuristics

peterflynn merged commit 289489d into master Jan 18, 2013

peterflynn deleted the dangoor/fix-2068 branch January 18, 2013 21:48

core-ai-bot mentioned this pull request Aug 29, 2021

[CLOSED] Fix for #2068: better QuickOpen heuristics brackets-archive/bracketsIssues#2346
