Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

search: exact dataset title match #3486

Closed
tiborsimko opened this issue Dec 5, 2023 · 2 comments · Fixed by #3498
Closed

search: exact dataset title match #3486

tiborsimko opened this issue Dec 5, 2023 · 2 comments · Fixed by #3498
Assignees

Comments

@tiborsimko
Copy link
Member

Current behaviour

Consider a user searching for a CMS dataset with a known dataset title such as /EG/Run2010A-Apr21ReReco-v1/AOD.
The user copies-pastes the dataset name from a paper into the search box and expects to find the record.

(1) In production, an invalid query s obtained: https://opendata.cern.ch/search?page=1&size=20&q=%2FEG%2FRun2010A-Apr21ReReco-v1%2FAOD

oops-prod

(2) We have improved the situation during the previous Invenio upgrade sprint in 2021, see #2930. Using master branch at commit 1b8cd22, the exact dataset title search works nicely: http://127.0.0.1:5000/search?q=%2FEG%2FRun2010A-Apr21ReReco-v1%2FAOD&l=list&order=asc&p=1&s=10&sort=bestmatch

oops-master-old

(3) However, the recent Invenio upgrade (#3450) made things not work again: https://opendata-dev.cern.ch/search?q=%2FEG%2FRun2010A-Apr21ReReco-v1%2FAOD&l=list&order=asc&p=1&s=10&sort=bestmatch

oops-master-new

Expected behaviour

We should find the given dataset when the user copies-pastes the exact dataset title as in 2 above. See remarks in #2930.

@psaiz
Copy link
Contributor

psaiz commented Dec 11, 2023

Assuming that we start from the fix of the AND, something like:

+++ b/cernopendata/config.py
@@ -215,7 +215,7 @@ RECORDS_REST_ENDPOINTS["recid"]["search_index"] = "records"
 def _query_parser_and(qstr=None):
     """Parser that uses the Q() with AND from search engine dsl."""
     if qstr:
-        return dsl.Q("query_string", query=qstr, default_operator="AND")
+        return dsl.Q("query_string", query=qstr.replace("/", "\\/"), default_operator="AND")
     return dsl.Q()

would do the trick.

Note that if the user escapes the characters, then the page will return no results.

Screenshot 2023-12-11 at 17 39 45
Screenshot 2023-12-11 at 17 39 36

@psaiz
Copy link
Contributor

psaiz commented Dec 13, 2023

I was considering several options:

  1. Escaping the /, as mentioned in the previous comment
  2. Changing the "query_string" by a "simple_query_string". Discarded this idea, since it would change some of the other semantics
  3. Do the standard query, and, if it fails, do one of the previous actions. Discarded this idea as well, since it would depend then on the number of / (queries with an even number of those are correct).

So, long story short, the escaping sounds to me as the best option. Incoming PR with that...

@psaiz psaiz self-assigned this Dec 13, 2023
@psaiz psaiz linked a pull request Dec 13, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants