Added multiple term search functionality (with default phrase search) #5579

Merged
merged 1 commit into from
May 27, 2016

Conversation

jazzy-em
Contributor

@jazzy-em jazzy-em commented Dec 23, 2014

This pull request is very similar to #5496.
The change allows you to:

  • search for multiple words separated by spaces (multiple term search), in addition to the standard phrase search
  • define the word (or word list) to search for in the URL hash parameter #search. Once the page has loaded, the document scrolls to the search results.
  • specify the type of search in the URL hash parameter #phrase (true or absent: phrase search, the default; false: multiple term search)
  • specify the type of search in the UI (the 'Phrase' checkbox, checked by default).

[Image: pdfjspull]

Examples:
Phrase search for 'Locking tames tamed tame':

search=Locking%20tames%20tamed%20tame&phrase=true

or

search=Locking%20tames%20tamed%20tame

Multiple term search for 'Locking tames tamed tame':

search=Locking%20tames%20tamed%20tame&phrase=false
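
As a rough illustration of how the two hash parameters in the examples above combine, here is a minimal sketch. `parseSearchHash` is a hypothetical helper and not code from this PR. It assumes the hash is a plain `key=value&key=value` string and that any `phrase` value other than `true` selects multiple term search.

```javascript
// Hypothetical helper (not part of PDF.js): turn a URL hash such as
// "#search=Locking%20tames&phrase=false" into a small options object.
function parseSearchHash(hash) {
  // URLSearchParams percent-decodes values, so %20 becomes a space.
  const params = new URLSearchParams(hash.replace(/^#/, ''));
  if (!params.has('search')) {
    return null; // no search requested
  }
  return {
    query: params.get('search'),
    // Assumption: an absent "phrase" parameter means phrase search (the
    // default); when present, only the literal "true" keeps phrase search on.
    phraseSearch: params.has('phrase') ? params.get('phrase') === 'true' : true,
  };
}

console.log(parseSearchHash('#search=Locking%20tames%20tamed%20tame'));
console.log(parseSearchHash('#search=Locking%20tames%20tamed%20tame&phrase=false'));
```

The first call yields a phrase search and the second a multiple term search for the same words.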

Screenshots:
[Screenshot: pdfjspull-phrase]

[Screenshot: pdfjspull-multiple]



@timvandermeij
Contributor

What exactly are the differences with #5496? Good to know so we can compare them.

@jazzy-em
Contributor Author

Disadvantages of #5496:

  • It requires '&' in the URL parameters. Multiple term search does not work with 'viewer.html?search=up%20locking'; it requires 'viewer.html?&search=up%20locking'.
  • Multiple term search in #5220 (Ability to Highlight Multiple Search Terms) and #5496 does not work correctly. Try searching a test document with 'search=up%20locking' and then with 'viewer.html?&search=locking%20up'.
    [Images: m-up-locking, m-locking-up]
    The results differ depending on the order of the terms, which is wrong.
  • Resizing the page breaks the highlighting of the search results.
  • Integration with the existing search engine is incomplete:
    • Multi-term search only runs from the URL; there is no way to run it through the UI.
    • There is no navigation through multi-term search results (the back/forward buttons don't work).
    • There is no case-sensitive multi-term search.
    • There is no way to turn 'highlight all' on or off for multi-term search results.
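
To make the ordering complaint above concrete, here is a minimal sketch of a multi-term matcher whose result does not depend on the order of the terms. `findTermMatches` is a hypothetical helper written for illustration; it is not the implementation in this PR or in #5496, and it works on a plain string rather than on PDF.js page content.

```javascript
// Hypothetical helper: find every occurrence of every term in the query.
function findTermMatches(text, query, caseSensitive = false) {
  const haystack = caseSensitive ? text : text.toLowerCase();
  const terms = (caseSensitive ? query : query.toLowerCase())
    .split(/\s+/)
    .filter(Boolean);
  const matches = [];
  // A Set drops repeated terms so they are not reported twice.
  for (const term of new Set(terms)) {
    let index = haystack.indexOf(term);
    while (index !== -1) {
      matches.push({ index, length: term.length });
      index = haystack.indexOf(term, index + term.length);
    }
  }
  // Sort by position (then length) so that "up locking" and "locking up"
  // produce exactly the same list of matches.
  matches.sort((x, y) => x.index - y.index || x.length - y.length);
  return matches;
}

const sampleText = 'Locking up the lock keeps things locked up.';
const upLocking = findTermMatches(sampleText, 'up locking');
const lockingUp = findTermMatches(sampleText, 'locking up');
console.log(JSON.stringify(upLocking) === JSON.stringify(lockingUp)); // true
```

Sorting the combined matches by position is one simple way to get order independence; a real implementation would also need to decide how overlapping matches are highlighted.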

@jazzy-em
Contributor Author

Advantages of my solution:

  • It fixes the disadvantages of #5220 (Ability to Highlight Multiple Search Terms) and #5496 listed above
  • It does not break the default behavior of PDF.js search
  • IMHO, my solution fits the existing code better
  • There are no new entities or classes
  • There are no setTimeout calls :)
  • There are no new global variables

@@ -6,3 +6,4 @@ tags
Makefile
node_modules/
examples/node/svgdump/
.idea
Contributor

Please remove this change.

@timvandermeij
Contributor

That is really good to know. Your solution looks pretty clean. I have added some comments to be addressed for this PR. Thanks!

@jazzy-em
Contributor Author

All defects fixed.

@timvandermeij
Contributor

Really nice. Could you also squash the commits into one commit so it can be reviewed more easily? See https://github.com/mozilla/pdf.js/wiki/Squashing-Commits on how to do this.

@jazzy-em
Contributor Author

Done

PDFViewerApplication.findBar.findField.value = hashParams['search'];
PDFViewerApplication.findBar.dispatchEvent('again', false);
}

Contributor

Can this code be moved to the PDFViewerApplication.setHash?

Collaborator

Besides moving this into setHash, I also think that the ifs need to be enclosed in a !PDFViewerApplication.supportsIntegratedFind check. Otherwise this will result in strange behaviour in the Firefox versions of PDF.js.

Contributor Author

@yurydelendik, @Snuffleupagus, yes, you are right. But in some cases we need to use the PDF.js search toolbar even though the built-in search is also available.
For example, PDF.js may be used in an iframe as part of a third-party application. It is very important for our application.
What is the best way to do it? Maybe we could add a new hash parameter #disableIntegratedFind and set supportsIntegratedFind to false if #disableIntegratedFind = true?

Collaborator

I'm not sure if I completely understand your questions/concerns here!?

Regarding moving the code into setHash: that shouldn't be an issue as far as I can tell, so please try moving this code there instead.

But in some cases we need to use the PDF.js search toolbar even though the built-in search is also available.

Please note that there are some search related features that, for security reasons, are currently disabled in the Firefox versions of PDF.js. So for now, I personally don't see why the feature in this PR would be any different. See e.g. viewer.js#L711-L713.

For example, PDF.js may be used in an iframe as part of a third-party application.

Please note that supportsIntegratedFind is only ever true in the Firefox versions of PDF.js, and only when the viewer is used standalone.
In all other versions of the viewer, and certainly when it's in an iframe, supportsIntegratedFind === false, so that shouldn't be an issue. See viewer.js#L326-L337.

It is very important for our application.

I understand that, but showing the PDFFindBar in the Firefox versions of PDF.js could be confusing for users given that it's normally not used there.
(The main use case for the viewer is, after all, to be the user interface of PDF.js for Firefox.)

What is the best way to do it? Maybe we could add a new hash parameter #disableIntegratedFind and set supportsIntegratedFind to false if #disableIntegratedFind = true?

Letting hash parameters enable/disable functions in PDF.js is a bad practice, which we are actively moving away from.
Given my answers above, I don't think that you actually need a way to toggle this feature, so I (currently) see no reason to add a parameter to the API either.

Contributor Author

Yes, you are right. This feature is not needed.

@jazzy-em
Contributor Author

All suggestions have been implemented

if (!this.supportsIntegratedFind) {
  if ('phrase' in params) {
    this.findBar.phraseSearch.checked =
      (params['phrase'] === 'true');
Collaborator

Nit: this should fit on the previous line, please move it there instead.

@Snuffleupagus
Collaborator

I've added a couple of small comments about the coding style. Once you've addressed them, please squash the commits.

@jazzy-em
Contributor Author

@D4FR3NCH We reduced the scope of this PR by removing the UI changes, to speed up landing it. Multi-term search works, but it's hard to see it :) The UI changes will be implemented in the next PR.

@Snuffleupagus
Collaborator

Snuffleupagus commented May 26, 2016

Based on e.g. the #5579 (comment) by @yurydelendik, I'd expect that this patch does not change the default find behaviour in the PDF.js find bar/Firefox integrated find, but rather only adds support for the #search hash parameter.

However, as is, this patch is changing the default find behaviour in PDF.js:
STR

  1. Open the preview, http://107.21.233.14:8877/5b9fd62b585f13a/web/viewer.html.
  2. Open the find bar, Ctrl+F.
  3. Enter dynamic languages as the search term.

AR:
Either dynamic or languages are found and highlighted.

ER:
Only dynamic languages is found, since this is the current behaviour in PDF.js. Compare this patch with the current master: http://mozilla.github.io/pdf.js/web/viewer.html.


Even if the behaviour of the #search hash parameter is to treat the input as a word list, according to http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_open_parameters.pdf#page=7, I'm not sure why that should also affect the default find functionality in PDF.js!?

Wouldn't it make more sense to keep the current behaviour for the PDF.js find bar/Firefox find integration? E.g. by instead setting phraseSearch: true in web/pdf_find_bar.js and web/firefoxcom.js?

@jazzy-em
Contributor Author

@Snuffleupagus @yurydelendik What should be the behavior in this case (described by @Snuffleupagus)?
If there is no #search parameter, use the default phrase search. If there is #search in url, use the multi term search?
Or always use phrase search by default?

@yurydelendik
Contributor

What should be the behavior in this case (described by @Snuffleupagus)?

The current find behavior shall not change (with or without the #search= parameter), so it shall use phraseSearch=true. For #search=, we shall "invoke" a highlight-all, ignore-case, non-phrase search on start, and reset the query and params on the first find operation.

I tested at http://107.21.233.14:8877/5b9fd62b585f13a/web/viewer.html#search=just%20dynamic

The #search= has weird syntax #search="word1 word2" -- notice quotes and space. We can automatically trim start/end quotes if present to make it compatible with the spec. Handling invalid queries such as #search=just%20dynamic or #search=just+dynamic is out of scope of this PR (it's fine if they work).
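
A minimal sketch of the behaviour requested in this comment, assuming the raw value of #search= has already been extracted and percent-decoded. `buildFindCommandFromHash` is a hypothetical name; the option keys (`query`, `phraseSearch`, `caseSensitive`, `highlightAll`, `findPrevious`) are the ones that appear in this review, but the function itself is not taken from the patch.

```javascript
// Hypothetical helper: build the options for a find started from #search=.
function buildFindCommandFromHash(rawSearch) {
  // Trim one leading and one trailing double quote, if present, so that
  // #search="word1 word2" and #search=word1 word2 give the same query.
  const query = rawSearch.replace(/^"/, '').replace(/"$/, '');
  return {
    query,
    phraseSearch: false,  // treat the query as a word list
    caseSensitive: false, // ignore case
    highlightAll: true,   // highlight every match
    findPrevious: false,  // search forward from the start
  };
}

console.log(buildFindCommandFromHash('"word1 word2"'));
```

A find started from the find bar would keep `phraseSearch: true`, so the default behaviour stays unchanged.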

@yurydelendik
Contributor

Few more changes, I think these will be the last. Thanks for addressing the comments fast!


Reviewed 2 of 4 files at r2, 1 of 1 files at r3.
Review status: all files reviewed at latest revision, 4 unresolved discussions.


web/app.js, line 1920 [r3] (raw file):

  PDFViewerApplication.findController.executeCommand('find', {
    query: e.query,
    phraseSearch: e.phraseSearch,

caseSensitive: false, highlightAll: true, findPrevious: false for webViewerFindFromUrlHash case


web/firefoxcom.js, line 166 [r3] (raw file):

      type: evt.type.substring('find'.length),
      query: evt.detail.query,
      phraseSearch: !!evt.detail.phraseSearch,

phraseSearch: true,


web/pdf_find_bar.js, line 112 [r3] (raw file):

        query: this.findField.value,
        caseSensitive: this.caseSensitive.checked,
        phraseSearch: false,

phraseSearch: true,


web/pdf_link_service.js, line 200 [r3] (raw file):

          this.eventBus.dispatch('findFromUrlHash', {
            source: this,
            query: params['search'],

Preprocess to remove start and end quotes. Will just .replace(/"/g, '') be enough?


Comments from Reviewable

@jazzy-em
Contributor Author

Review status: all files reviewed at latest revision, 4 unresolved discussions.


web/app.js, line 1920 [r3] (raw file):

Previously, yurydelendik (Yury Delendik) wrote…

caseSensitive: false, highlightAll: true, findPrevious: false for webViewerFindFromUrlHash case

Done.

web/firefoxcom.js, line 166 [r3] (raw file):

Previously, yurydelendik (Yury Delendik) wrote…

phraseSearch: true,

Done.

web/pdf_find_bar.js, line 112 [r3] (raw file):

Previously, yurydelendik (Yury Delendik) wrote…

phraseSearch: true,

Done.

web/pdf_link_service.js, line 200 [r3] (raw file):

Previously, yurydelendik (Yury Delendik) wrote…

Preprocess to remove start and end quotes. Will just .replace(/"/g, '') be enough?

It removes all quotes, but I think it's ok. Done.
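
The difference between the two approaches discussed here can be seen in a small sketch. Both function names are hypothetical and exist only for this comparison: the first uses the `.replace(/"/g, '')` call from the review, the second trims only a leading and a trailing quote.

```javascript
// Removes every double quote, as suggested in the review.
const stripAllQuotes = (s) => s.replace(/"/g, '');

// Alternative (an assumption, not in the patch): trim only the outer quotes.
const trimOuterQuotes = (s) => s.replace(/^"|"$/g, '');

const sampleQuery = '"say "hi" now"';
console.log(stripAllQuotes(sampleQuery));  // say hi now
console.log(trimOuterQuotes(sampleQuery)); // say "hi" now
```

For a query such as "word1 word2" both give the same result; they differ only when the query itself contains quotes.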


@yurydelendik
Contributor

/botio-windows preview

@pdfjsbot

From: Bot.io (Windows)


Received

Command cmd_preview from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/d18363a9b2ee6f8/output.txt

@@ -1251,6 +1251,7 @@ var PDFViewerApplication = {
eventBus.on('rotateccw', webViewerRotateCcw);
eventBus.on('documentproperties', webViewerDocumentProperties);
eventBus.on('find', webViewerFind);
eventBus.on('findFromUrlHash', webViewerFindFromUrlHash);
Collaborator

Nit: Please change the event name to lower-case, so that it's consistent with all the existing ones.

Contributor Author

Done

@yurydelendik
Contributor

Reviewed 4 of 4 files at r4, 2 of 2 files at r5.
Review status: all files reviewed at latest revision, 1 unresolved discussion.



@yurydelendik yurydelendik merged commit 5aefce6 into mozilla:master May 27, 2016
@yurydelendik
Contributor

Thank you for the patch.

@timvandermeij
Contributor

Nice work!

@jazzy-em
Contributor Author

Cool, thanks!

@summerisgone

Wow, I thought that would never happen 👍

@dasilvacontin

dasilvacontin commented Jun 3, 2016

Weird, I'm trying http://mozilla.github.io/pdf.js/web/viewer.html#search=Java and it rarely loads the query and scrolls to the first match.

It's like it only works the first time, and then if I reload it doesn't work anymore. Maybe these changes are not yet available at that URL?

@jazzy-em
Contributor Author

jazzy-em commented Jun 3, 2016

then if I reload

If the URL hash doesn't change (it doesn't when you press F5 or reload the page), the PDFLinkService_setHash method doesn't run in IE and Chrome. In Firefox it works fine.
This looks like a bug in the base code (if it is a bug at all), because the same happens with other URL hash parameters. Try opening http://mozilla.github.io/pdf.js/web/viewer.html#page=3, scrolling to another page, and then reloading the page.

@dasilvacontin

This looks like a bug in the base code (if it is a bug at all), because the same happens with other URL hash parameters. Try opening http://mozilla.github.io/pdf.js/web/viewer.html#page=3, scrolling to another page, and then reloading the page.

Yep, works only first time, doesn't on reload. It scrolls back to where I was.

@mainseq

mainseq commented Jun 3, 2016

I believe you may have to fire the first event with inline script or a setTimeout on refresh. We run into this problem with other code and apps quite frequently. The next pages will highlight fine.

@yurydelendik
Contributor

It's like it only works the first time, and then if I reload it doesn't work anymore

Looks like this problem is not related to this PR, so discussion is somewhat off-topic. @dasilvacontin, could you create a separate issue, and provide all details such as your configuration and steps to reproduce? (Personally I'm confused about "first time" and "if I reload it" terms)

I believe you may have to fire the first event with inline script or a setTimeout on refresh.

The same applies here: I'm not sure what "inline script" or "setTimeout on refresh" means in the PDF.js context, but it sounds like "magic". BTW, inline scripts are not used for security reasons, and setTimeout is bad practice in async apps.

@mainseq

mainseq commented Jun 24, 2016

I am trying to search tracemonkey for "Dynamic languages", "Google Docs", and JavaScript at the same time. How would I construct my URL? This used to work in #5496. Could someone post an example? Thanks.

@timvandermeij
Contributor

This PR was merged a month ago and is quite big now in terms of comments, so please use IRC/the mailing list for questions and the issue tracker for actual problems regarding this PR.

@mozilla mozilla locked and limited conversation to collaborators Jun 25, 2016