
Fix not escaping special characters in search pattern #5938

Merged: 5 commits, merged Feb 14, 2020

@dawidm (Contributor) commented Feb 11, 2020

fixes #5892

The error when searching for "DOI 10.1210/endrev/bnz006" (or any phrase containing JavaScript regular-expression special characters) is caused by those characters not being escaped before the search pattern is passed to the JavaScript code that highlights matching words in the preview view.
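To see why the highlighting breaks, compare the two escaping styles in plain Java. `Pattern.quote` wraps the text in `\Q...\E`, a quoting construct that `java.util.regex` supports but JavaScript's `RegExp` does not, so the quoted form no longer acts as a literal match once it is handed to the preview's script. The sketch below uses a hypothetical `QuoteVsEscapeDemo` class and the escape character class that appears later in this PR; it is an illustration, not JabRef's code.

```java
import java.util.regex.Pattern;

// Hypothetical demo class (not part of JabRef): compares Java-style quoting
// with per-character escaping, which JavaScript's RegExp also understands.
final class QuoteVsEscapeDemo {

    // Same character class as JAVASCRIPT_ESCAPED_CHARS_PATTERN in this PR.
    static final Pattern JS_SPECIAL_CHARS =
            Pattern.compile("[\\.\\*\\+\\?\\^\\$\\{\\}\\(\\)\\|\\[\\]\\\\/]");

    // Java only: wraps the whole text in \Q...\E.
    static String quoteForJava(String text) {
        return Pattern.quote(text);
    }

    // Portable: prefixes every special character with a single backslash.
    static String escapeForJavaScript(String text) {
        return JS_SPECIAL_CHARS.matcher(text).replaceAll("\\\\$0");
    }

    public static void main(String[] args) {
        String doi = "10.1210/endrev/bnz006";
        System.out.println(quoteForJava(doi));        // \Q10.1210/endrev/bnz006\E
        System.out.println(escapeForJavaScript(doi)); // 10\.1210\/endrev\/bnz006
    }
}
```

The escaped form is also valid in Java, because `java.util.regex` accepts a backslash before any non-alphabetic character, so one string can serve both engines.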

fixes JabRef#5892

* Add a method that returns the search pattern for the searched words with JavaScript regexp special characters escaped (for searches without regular expressions)

* In the preview viewer, use the search pattern with escaped JavaScript regexp special characters
@dawidm marked this pull request as ready for review February 11, 2020 23:13
```java
/**
 * Returns a regular expression pattern in the form (w1)|(w2)|..., where the wi are escaped if regular expression search is not enabled.
 * @param escapeSpecialCharsForJS whether to escape the characters in wi for a JavaScript regexp (escaping each special character) or for Java (using \Q and \E)
 */
private Optional<Pattern> joinWordsToPattern(boolean escapeSpecialCharsForJS) {
```
Review comment (Member):

It would be better to use an enum here, e.g. EscapeMode, with Java and JavaScript as values:
https://www.teamten.com/lawrence/programming/prefer-enums-over-booleans.html
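A minimal sketch of the enum-over-boolean suggestion. `EscapeModeDemo`, `escape`, and the constants `JAVA` and `JAVASCRIPT` are illustrative names assumed here, not necessarily what the PR ended up with.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the suggested enum; all names are illustrative only.
final class EscapeModeDemo {

    enum EscapeMode {
        JAVA,       // quote the whole word with \Q...\E (Pattern.quote)
        JAVASCRIPT  // backslash-escape each special character individually
    }

    private static final Pattern JS_SPECIAL_CHARS =
            Pattern.compile("[\\.\\*\\+\\?\\^\\$\\{\\}\\(\\)\\|\\[\\]\\\\/]");

    // Escapes a single search word according to the requested mode.
    static String escape(String word, EscapeMode mode) {
        switch (mode) {
            case JAVASCRIPT:
                return JS_SPECIAL_CHARS.matcher(word).replaceAll("\\\\$0");
            case JAVA:
            default:
                return Pattern.quote(word);
        }
    }

    public static void main(String[] args) {
        System.out.println(escape("a+b", EscapeMode.JAVA));       // \Qa+b\E
        System.out.println(escape("a+b", EscapeMode.JAVASCRIPT)); // a\+b
    }
}
```

With the enum, a call such as `escape(word, EscapeMode.JAVASCRIPT)` documents itself at the call site, whereas `escape(word, true)` sends the reader to the method signature to find out what `true` means.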

@Siedlerchr (Member) left a comment:

Thanks for your contribution! In general the fix looks good to me, just some minor code improvements!

```diff
@@ -18,6 +18,9 @@
 public class SearchQuery implements SearchMatcher {
 
+    // regexp pattern for escaping special characters in javascript regex
+    public static final String JAVASCRIPT_ESCAPED_CHARS_PATTERN = "[\\.\\*\\+\\?\\^\\$\\{\\}\\(\\)\\|\\[\\]\\\\/]";
```
Review comment (Member):

You can use Pattern.compile here to gain some performance.
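The `String.replaceAll(regex, replacement)` Javadoc documents it as equivalent to `Pattern.compile(regex).matcher(str).replaceAll(replacement)`, so the character class is recompiled for every searched word. Keeping it in a `static final Pattern` compiles it once and gives the same result. A minimal sketch with hypothetical class and method names:

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the two variants discussed in this review.
final class PrecompiledPatternDemo {

    // Regex source, as in the first revision of the PR (a String constant).
    static final String JS_SPECIAL_CHARS_REGEX = "[\\.\\*\\+\\?\\^\\$\\{\\}\\(\\)\\|\\[\\]\\\\/]";

    // Compiled once, when the class is initialized.
    static final Pattern JS_SPECIAL_CHARS = Pattern.compile(JS_SPECIAL_CHARS_REGEX);

    // Compiles the regex again on every call.
    static String escapeWithString(String word) {
        return word.replaceAll(JS_SPECIAL_CHARS_REGEX, "\\\\$0");
    }

    // Reuses the precompiled Pattern; only a new Matcher is created per call.
    static String escapeWithPattern(String word) {
        return JS_SPECIAL_CHARS.matcher(word).replaceAll("\\\\$0");
    }

    public static void main(String[] args) {
        String word = "word2.";
        System.out.println(escapeWithString(word));  // word2\.
        System.out.println(escapeWithPattern(word)); // word2\.
    }
}
```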

```diff
@@ -133,7 +148,7 @@ public boolean isRegularExpression() {
         // compile the words to a regular expression in the form (w1)|(w2)|(w3)
         StringJoiner joiner = new StringJoiner(")|(", "(", ")");
         for (String word : words) {
-            joiner.add(regularExpression ? word : Pattern.quote(word));
+            joiner.add(regularExpression ? word : (escapeSpecialCharsForJS ? word.replaceAll(JAVASCRIPT_ESCAPED_CHARS_PATTERN, "\\\\$0") : Pattern.quote(word)));
```
Review comment (Member):

Please extract this into a regular if-else; it's easier to understand at first glance than these chained conditions.
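The nested conditional under review can be unrolled into an if/else chain. The sketch below is a hypothetical standalone version: the `words` list and `regularExpression` flag are passed as parameters rather than read from `SearchQuery` fields, it uses an `EscapeMode` enum as suggested earlier in the review, and returning `Optional.empty()` for an empty word list is an assumption.

```java
import java.util.List;
import java.util.Optional;
import java.util.StringJoiner;
import java.util.regex.Pattern;

// Hypothetical standalone version of the loop under review; not JabRef's code.
final class WordPatternDemo {

    enum EscapeMode { JAVA, JAVASCRIPT }

    static final Pattern JS_SPECIAL_CHARS =
            Pattern.compile("[\\.\\*\\+\\?\\^\\$\\{\\}\\(\\)\\|\\[\\]\\\\/]");

    // Builds (w1)|(w2)|... ; each word is escaped unless regularExpression is true.
    static Optional<Pattern> joinWordsToPattern(List<String> words, boolean regularExpression,
                                                EscapeMode mode) {
        if (words.isEmpty()) {
            return Optional.empty(); // assumption: no words means no pattern
        }
        StringJoiner joiner = new StringJoiner(")|(", "(", ")");
        for (String word : words) {
            if (regularExpression) {
                joiner.add(word);
            } else if (mode == EscapeMode.JAVASCRIPT) {
                joiner.add(JS_SPECIAL_CHARS.matcher(word).replaceAll("\\\\$0"));
            } else {
                joiner.add(Pattern.quote(word));
            }
        }
        return Optional.of(Pattern.compile(joiner.toString()));
    }

    public static void main(String[] args) {
        List<String> words = List.of("word1", "word2.");
        joinWordsToPattern(words, false, EscapeMode.JAVASCRIPT)
                .ifPresent(p -> System.out.println(p.pattern())); // (word1)|(word2\.)
        joinWordsToPattern(words, false, EscapeMode.JAVA)
                .ifPresent(p -> System.out.println(p.pattern())); // (\Qword1\E)|(\Qword2.\E)
    }
}
```

Each branch now reads as one decision, which is the point of the review comment; the behavior matches the one-line ternary version.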

* use enum to specify special characters escape mode

* use compiled regex pattern instead of string
@dawidm requested a review from Siedlerchr February 12, 2020 14:11
@Siedlerchr (Member) left a comment:

Thanks! Looks good to me!
For external contributions, our policy is that a second developer reviews the code before it is merged.

@Siedlerchr added the "status: ready-for-review" label (Pull Requests that are ready to be reviewed by the maintainers) Feb 12, 2020
@calixtus (Member) left a comment:

I have two comments; everything else looks good to me.
A test would be great...

```java
if (regularExpression) {
    joiner.add(word);
}
else {
```
Review comment (Member):

Isn't Checkstyle complaining about putting this on two lines? It has no effect, but it looks somewhat odd...

Reply (Contributor, Author):

Yes, it's a mistake; I'll change it. It looks odd to me too. Checkstyle wasn't complaining, though.

```java
}

// Returns a regular expression pattern in the form (w1)|(w2)| ... wi are escaped for javascript if no regular expression search is enabled
public Optional<Pattern> getJsPatternForWords() {
```
Review comment (Member):

IMHO, although "Js" is fairly obvious, I would rename it to getJavaScriptPatternForWords: you are creating a new method solely to make its purpose obvious from its name, so why not go all the way?

```java
public static final Pattern JAVASCRIPT_ESCAPED_CHARS_PATTERN = Pattern.compile("[\\.\\*\\+\\?\\^\\$\\{\\}\\(\\)\\|\\[\\]\\\\/]");

/**
 * Metod for escaping special characters in regular expressions
```
Review comment (Member):

Typo: "Metod" should be "Method".

@dawidm (Contributor, Author) commented Feb 12, 2020:

> A test would be great...

Ok, I will write a test.

```java
// The first word contains all JavaScript regex special characters, which should be escaped individually in text-based search
String queryText = "([{\\\\^$|]})?*+./ word1 word2.";
SearchQuery textQueryWithSpecialChars = new SearchQuery(queryText, false, false);
String pattern = "(\\(\\[\\{\\\\\\^\\$\\|\\]\\}\\)\\?\\*\\+\\.\\/)|(word1)|(word2\\.)";
```
Review comment (Contributor, Author):

I've created tests, but I'm not sure about the readability of this approach.
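On the readability concern, one option is to derive the expected string instead of writing it out by hand: prefix each special character with a single backslash and compare. A sketch, assuming a hypothetical `EscapeTestSketch` class that uses plain `AssertionError` checks in place of JUnit so it runs standalone:

```java
import java.util.regex.Pattern;

// Hypothetical alternative to the hand-escaped expected string in the test above.
final class EscapeTestSketch {

    static final Pattern JS_SPECIAL_CHARS =
            Pattern.compile("[\\.\\*\\+\\?\\^\\$\\{\\}\\(\\)\\|\\[\\]\\\\/]");

    static String escapeForJavaScript(String word) {
        return JS_SPECIAL_CHARS.matcher(word).replaceAll("\\\\$0");
    }

    public static void main(String[] args) {
        // Every character listed in the character class above, once each.
        String specials = "([{\\^$|]})?*+./";

        // Expected: each character preceded by exactly one backslash.
        StringBuilder expected = new StringBuilder();
        for (char c : specials.toCharArray()) {
            expected.append('\\').append(c);
        }

        String actual = escapeForJavaScript(specials);
        if (!actual.equals(expected.toString())) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
        // Same value as the first group of the expected pattern in the PR's test.
        System.out.println(actual.equals("\\(\\[\\{\\\\\\^\\$\\|\\]\\}\\)\\?\\*\\+\\.\\/")); // true
    }
}
```

The loop states the rule being tested ("one backslash per special character") once, so a reader does not have to count backslashes in a single long literal.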

@Siedlerchr (Member) commented Feb 14, 2020:

I know all that escaping looks totally crazy, but it's fine.

@Siedlerchr (Member) left a comment:

Thanks again for creating the tests as well!

@calixtus (Member) commented:

I think this is really great as well! Thank you!

@Siedlerchr merged commit ec93ad3 into JabRef:master Feb 14, 2020
Siedlerchr added a commit that referenced this pull request Feb 19, 2020
* upstream/master:
  followup fix
  Fixfetcher (#5948)
  Bump byte-buddy-parent from 1.10.7 to 1.10.8 (#5952)
  Added MenuButtons to IntegrityCheckDialog (#5955)
  Reimplement custom entry types dialog (#5799)
  Bump unirest-java from 3.4.03 to 3.5.00 (#5953)
  MySQL: Allow public key retrieval (#5909)
  Restructure and improve docs for setting up IntelliJ (#5960)
  Change syntax for Oracle multi-row insert SQL statement (#5837)
  Bump classgraph from 4.8.62 to 4.8.64 (#5954)
  Squashed 'src/main/resources/csl-styles/' changes from c531528..9e81857
  Fix not escaping special characters in search pattern (#5938)
Siedlerchr added a commit that referenced this pull request Mar 6, 2020
* upstream/master:
  Bump classgraph from 4.8.62 to 4.8.64 (#5954)
  Squashed 'src/main/resources/csl-styles/' changes from c531528..9e81857
  Fix not escaping special characters in search pattern (#5938)
  fix missing gson
Labels: status: ready-for-review (Pull Requests that are ready to be reviewed by the maintainers)

Development: successfully merging this pull request may close these issues:
Searching for a DOI results in an ERROR

4 participants