Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SearchQueryParser: Add "OR" search operator #12061

Merged
merged 11 commits into from
Oct 6, 2023

Conversation

fwcd
Copy link
Member

@fwcd fwcd commented Oct 5, 2023

Fixes #8881

This adds | and (uppercase) OR as syntax for the OR operator in search queries. Thus queries such as house | disco or equivalently house OR disco will now yield the combined results from querying house and disco.

Both | and OR have strong precedent, e.g. with Google using these operators1. Especially | is terse, easy to read/write and familiar to technical users.

Footnotes

  1. https://ahrefs.com/blog/google-advanced-search-operators

@fwcd fwcd force-pushed the or-search-operator branch from 2b236ba to e3d9b14 Compare October 5, 2023 00:05
@fwcd fwcd force-pushed the or-search-operator branch from e3d9b14 to 4abc9e6 Compare October 5, 2023 00:09
Copy link
Member

@daschuer daschuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Can you also ass a test for this?

src/library/searchqueryparser.cpp Outdated Show resolved Hide resolved
src/library/searchqueryparser.cpp Outdated Show resolved Hide resolved
@fwcd
Copy link
Member Author

fwcd commented Oct 5, 2023

I've updated the parser to handle quotes correctly (i.e. title:"some | title" | house, for example, will now behave as expected) and added some tests to verify that this works correctly.

@fwcd fwcd force-pushed the or-search-operator branch from 02912b6 to 93b96bc Compare October 5, 2023 11:43
@fwcd
Copy link
Member Author

fwcd commented Oct 5, 2023

I don't understand why the tests are failing in CI, they work on my machine™. Off all the tests, SearchQueryParserTest.EmptyOrOperator, whose implementation is rather trivial, seems to fail due to an interrupt. Any ideas?

@fwcd
Copy link
Member Author

fwcd commented Oct 5, 2023

Couldn't reproduce the test failure in an Arch Linux Docker container, neither on arm64 nor on x86_64.

@fwcd
Copy link
Member Author

fwcd commented Oct 5, 2023

Ah, I see, CI ran into a debug assert which I didn't get since I was building in RelWithDebInfo mode:

critical [0x55a1d4186d30] DEBUG ASSERT: "!m_nodes.empty()" in function virtual bool OrNode::match(const TrackPointer&) const at /home/runner/work/mixxx/mixxx/src/library/searchquery.cpp:127

// An empty OR node would always evaluate to false
// which is inconsistent with the generated SQL query!
VERIFY_OR_DEBUG_ASSERT(!m_nodes.empty()) {
// Evaluate to true even if the correct choice would
// be false to keep the evaluation consistent with
// the generated SQL query.
return true;
}

Might be an interesting question how we should handle this. Apart from the top-level AndNode, empty queries will simply not produce a node at all and queries such as | produce an (empty) OrNode. IMO an empty OrNode, even if it's a bit odd, shouldn't be a semantic error and simply not match anything. Could we update the SQL generation to reflect that?

@daschuer
Copy link
Member

daschuer commented Oct 5, 2023

I think we should consider the intermediate results when typing. I can imagine that showing nothing because of a semantic error feels bad. Is this related here? I have not tested. What happen if you type "ABBA |" ?

The other issue is that we should only assert impossible conditions. Sincea user can trigger it, it must not be an assertion.

@fwcd
Copy link
Member Author

fwcd commented Oct 5, 2023

What happen if you type "ABBA |"?

Empty children are currently filtered out, so this would be equivalent to simply typing ABBA:

for (const QString& rawAndNode : rawAndNodes) {
if (!rawAndNode.isEmpty()) {
pQuery->addNode(parseAndNode(rawAndNode));
}
}

The only real edge case where the user would end up with an empty OrNode would e.g. be |, | | etc. Probably a contrived example, but we wouldn't want to hit this debug assert at all, ideally.

I'd be fine with specifying the empty OR as meaning "match everything", even if it deviates from the common convention to define it as "match nothing" if that makes SQL generation more convenient and/or lines up with what the user expects (i.e. still showing all tracks, when the query is | and thus empty-ish).

The (logically sound) alternative would be to define the empty OR as matching nothing and emit FALSE as SQL. I'd probably prefer that, since it would only show up in edge cases anyway (see above) and be consistent with boolean logic elsewhere.

We should remove this debug assert either way. Wdyt?

@JoergAtGithub
Copy link
Member

we could even support both | and OR, which is what Google does

I think we should support both!

This is consistent with boolean logic and will generally only show up in
edge cases to the user (e.g. when the search query consists of an empty
OR operator, i.e. `|`)
@fwcd
Copy link
Member Author

fwcd commented Oct 5, 2023

@JoergAtGithub I've had a go at that, but parsing it seems to be less trivial than I would've hoped for, particularly the tests around literal or still fail: 3e70205

@fwcd
Copy link
Member Author

fwcd commented Oct 6, 2023

Both | and OR are now supported and I've added a bunch of tests to make sure that especially the latter is handled correctly with all of its edge cases (e.g. different casing or adjacent letters shouldn't cause it to be parsed as an operator).

Additionally, I have implemented the empty-OR-as-false interpretation, which essentially amounted to removing the debug assert and generating FALSE as SQL code if there are no child nodes. As explained in #12061 (comment) this is the logically sound way to handle this and, in practice, only affects edge cases anyway (so unless the user e.g. searches |, i.e. the empty OR operator, the results table isn't necessarily empty).

@fwcd fwcd force-pushed the or-search-operator branch from 885595c to c753ebe Compare October 6, 2023 00:40
@fwcd fwcd force-pushed the or-search-operator branch from c753ebe to adad086 Compare October 6, 2023 01:44
Copy link
Member

@daschuer daschuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works like a charm now and the code LGTM as well.
Special thanks for the extra efforts in testing.

@daschuer daschuer merged commit dfadd3a into mixxxdj:main Oct 6, 2023
@daschuer daschuer added this to the 2.5.0 milestone Oct 6, 2023
@fwcd fwcd deleted the or-search-operator branch October 6, 2023 09:11
@Be-ing
Copy link
Contributor

Be-ing commented Oct 9, 2023

Thanks for implementing my 6 year old feature request.

@ronso0
Copy link
Member

ronso0 commented Dec 21, 2023

I'm wondering: i don't spot any changes to user-facing strings (tr), what was the reason to target 2.5 instead of 2.4?
Do you think it would be worth backporting it to 2.4?

@fwcd
Copy link
Member Author

fwcd commented Dec 21, 2023

My general philosophy was to target main for any non-trivial new features so we can focus on improving stability in 2.4 while giving these changes some time for testing. Since this isn't a huge change, I don't have strong feelings here, but it does change the parsing logic of search queries a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make a way to combine search query nodes with an OR relationship
5 participants