-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Escaping backslashes; blacklisting gene name recognition #272
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,7 +5,7 @@ const getHGNCData = require('./hgnc'); | |
const sanitize = (s) => { | ||
// Escape (with '\'), to treat them literally, symbols, such as '*', ':', or space, | ||
// which otherwise play special roles in a Lucene query string. | ||
return s.replace(/([\!\*\+\-\&\|\(\)\[\]\{\}\^\~\?\:\/\\"\s])/g, '\\$1') | ||
return s.replace(/([\!\*\+\-\&\|\(\)\[\]\{\}\^\~\?\:\/\\"\s])/g, '\\\\$1') | ||
}; | ||
|
||
const processPhrase = (phrase, collection) => { | ||
|
@@ -16,15 +16,21 @@ const processPhrase = (phrase, collection) => { | |
'refseq' | ||
]; | ||
|
||
// Do NOT recognize these gene symbols | ||
const symbolNameBlackList = [ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. blacklist is checked in here: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does it not work without this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nice catch - then for that file, line 15 should be AND rather than OR There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
'CELL' | ||
]; | ||
|
||
const tokens = phrase.split(/\s+/g); | ||
|
||
return tokens | ||
.map(token => { | ||
//if symbol is recognized by at least one source | ||
const blackListed = symbolNameBlackList.indexOf(token.toUpperCase()) > -1; | ||
const recognized = sourceList.some(source => utilities.sourceCheck(source, token)) | ||
|| collection.has(token.toUpperCase()); | ||
const sanitized = sanitize(token); | ||
return recognized ? ( 'xrefid:' + sanitized ) : ( 'name:' + '*' + sanitized + '*' ); | ||
return !blackListed && recognized ? ( 'xrefid:' + sanitized ) : ( 'name:' + '*' + sanitized + '*' ); | ||
}); | ||
}; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so could you give an example of what the query will look like after this?
the goal is that backslashes should be added to the query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For input 'mu-type opioid' the 'queries' would be the array:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if the pathway-commons npm package does this already, should we just not use sanitize?
Does it work if you just make sanitize return the original string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with line 8 change - should stay
'\\$1'
, which e.g., replaces:which was correct
(and pathway-commons js client should not do any other tricks but URL-encoding query parameters...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So then it should be changed in the pathway-commons module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe there is some weird behaviour where qs.stringify and the escapeLucene functions are both being run multiple times on the query... maybe it might make sense to move away from using the pathway-commons npm package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Max and Igor's proposed new string:
s.replace(/([\/\\!*+-&":|()\[\]{}\^\~\?\s])/g, '\\$1')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manfred lib changes:
https://github.com/PathwayCommons/cpath2-client/blob/master/src/search.js#L102
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
app changes:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've (with @maxkfranz) got an idea; see PathwayCommons/cpath2-client#12.