Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escaping backslashes; blacklisting gene name recognition #272

Closed
wants to merge 2 commits into from

Conversation

jvwong
Copy link
Member

@jvwong jvwong commented Nov 30, 2017

  1. The backslashes have to be escaped due to 'pathway-commons' npm package escape of backslashes.
  2. The 'blacklist' was present in pathways-search but removed, but I put it back because hgnSymbols contains stupid gene symbols alternative names (e.g. 'cell')

@@ -16,15 +16,21 @@ const processPhrase = (phrase, collection) => {
'refseq'
];

// Do NOT recognize these gene symbols
const symbolNameBlackList = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it not work without this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch - then for that file, line 15 should be AND rather than OR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -5,7 +5,7 @@ const getHGNCData = require('./hgnc');
const sanitize = (s) => {
// Escape (with '\'), to treat them literally, symbols, such as '*', ':', or space,
// which otherwise play special roles in a Lucene query string.
return s.replace(/([\!\*\+\-\&\|\(\)\[\]\{\}\^\~\?\:\/\\"\s])/g, '\\$1')
return s.replace(/([\!\*\+\-\&\|\(\)\[\]\{\}\^\~\?\:\/\\"\s])/g, '\\\\$1')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so could you give an example of what the query will look like after this?
the goal is that backslashes should be added to the query?

Copy link
Member Author

@jvwong jvwong Nov 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For input 'mu-type opioid' the 'queries' would be the array:

[ '(name:mu\\\\-type\\\\ opioid) OR (name:*mu\\\\-type\\\\ opioid*) OR (name:*mu\\\\-type* AND name:*opioid*)',
  '(name:*mu\\\\-type* OR name:*opioid*)',
  'mu-type opioid' ]

which, in turn, the 'pathway-common' npm package search function code interprets as:
['(name:mu\\-type\\ opioid) OR (name:*mu\\-type\\ opioid*) OR (name:*mu\\-type* AND name:*opioid*)',
...
]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the pathway-commons npm package does this already, should we just not use sanitize?

Does it work if you just make sanitize return the original string?

Copy link
Member

@IgorRodchenkov IgorRodchenkov Nov 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with line 8 change - should stay '\\$1', which e.g., replaces:


- ' ' with '\ '
- '\' with '\\'
- ':' with '\:'
- '  ' with '\ \ '
- etc.

which was correct
(and pathway-commons js client should not do any other tricks but URL-encoding query parameters...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So then it should be changed in the pathway-commons module.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe there is some weird behaviour where qs.stringify and the escapeLucene functions are both being run multiple times on the query... maybe it might make sense to move away from using the pathway-commons npm package.

Copy link
Member

@d2fong d2fong Nov 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Max and Igor's proposed new string:

s.replace(/([\/\\!*+-&":|()\[\]{}\^\~\?\s])/g, '\\$1')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manfred lib changes:

  • add escape spaces for escapeLucene
  • do not escapeLucene if toggle is undefined

https://github.com/PathwayCommons/cpath2-client/blob/master/src/search.js#L102

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

app changes:

  • needs to send toggle to the lib to determine if the toggle should be set to true or not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've (with @maxkfranz) got an idea; see PathwayCommons/cpath2-client#12.

@d2fong
Copy link
Member

d2fong commented Dec 2, 2017

I fixed the hgnc filtering bug here 5defbdd

@jvwong jvwong closed this Dec 7, 2017
@jvwong jvwong deleted the searchLucene branch December 8, 2017 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants