Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow "" to filter for explicite empty fileds #1798

Merged
merged 15 commits into from
Dec 16, 2018
Merged

Conversation

daschuer
Copy link
Member

@daschuer daschuer commented Sep 2, 2018

This allows to filter all Tracks without a genre like that genre:""
It works for all text fields, including crate:""

https://bugs.launchpad.net/mixxx/+bug/1788086

@daschuer daschuer added this to the 2.3.0 milestone Sep 2, 2018
@daschuer
Copy link
Member Author

The Appveyor failure is unrelated to this PR.

@benis
Copy link
Contributor

benis commented Oct 1, 2018

I've tested this on Mac, where it appears to work as intended with no unintended side effects.

@daschuer
Copy link
Member Author

daschuer commented Oct 2, 2018

Thank you for testing.
Do you also have interest to review the code, to get this finally merged?

@benis
Copy link
Contributor

benis commented Oct 2, 2018

I would, but I don't think I know C++ or the syntax well enough for it to be a meaningful review.

bool NullTextFilterNode::match(const TrackPointer& pTrack) const {
for (const auto& sqlColumn: m_sqlColumns) {
QVariant value = getTrackValueForColumn(pTrack, sqlColumn);
if (!value.isValid() || !qVariantCanConvert<QString>(value)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition might cause that different columns are picked by match() and toSql(). toSql() simply picks the first column.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. This is only relevant for "artist" which has two columns. So I think in our case if the "artist" column is empty or invalid it should be enough to match.

if (!value.isValid() || !qVariantCanConvert<QString>(value)) {
continue;
}
// only use the major coloumn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo


QString NullTextFilterNode::toSql() const {
for (const auto& sqlColumn: m_sqlColumns) {
// only use the major coloumn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

}

QString NullTextFilterNode::toSql() const {
for (const auto& sqlColumn: m_sqlColumns) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop is confusing if you just want to pick the first column from a non-empty list.

}

QString NoCrateFilterNode::toSql() const {
return QString("id NOT IN (%1)").arg(CrateStorage::formatQueryForTrackIdsWithCrate());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRATETABLE_ID

@@ -153,8 +170,13 @@ void SearchQueryParser::parseTokens(QStringList tokens,
mixxx::track::io::key::ChromaticKey key =
KeyUtils::guessKeyFromText(argument);
if (key == mixxx::track::io::key::INVALID) {
pNode = std::make_unique<TextFilterNode>(
m_pTrackCollection->database(), m_fieldToSqlColumns[field], argument);
if (argument == "\"\"") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""" should become a constant

m_pCrateStorage->selectAllTracksSorted());

while (crateTracks.next()) {
m_matchingTrackIds.push_back(crateTracks.trackId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Track IDs might be pushed multiple times, which is not necessary. The result contains tuples of (track id, crate id) sorted by track id, but we only need distinct track ids here.

We can also think about adding a new query instead of post-filtering the result.

@uklotzde
Copy link
Contributor

uklotzde commented Oct 3, 2018

Using "" for empty/missing numeric fields feels strange t first, but at least it is consistent. It might even be more intuitive for the common user, who does not consider the restrictions of a strongly typed programming language.

@daschuer
Copy link
Member Author

daschuer commented Oct 4, 2018

Done. Thank you for review.

@@ -16,6 +16,8 @@
#include "util/memory.h"
#include "library/crate/cratestorage.h"

const QString kExpliciteEmpty = "\"\""; // "" searches for an empty string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the constant is not a valid English term and needs to be explained. The comment is misleading, because it applies to all kind of fields and not only strings.

  • Maybe kMissingFieldSearchTerm or something else?
  • The SQL uses "IS NULL" which does not match fields with empty strings!? What is the desired result for string fields? I suggest to cover both NULL and empty strings with this filter.

@@ -91,6 +93,23 @@ class TextFilterNode : public QueryNode {
QString m_argument;
};

class NullTextFilterNode : public QueryNode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming is inappropriate if we want to filter for both NULL and empty strings.

@uklotzde
Copy link
Contributor

uklotzde commented Oct 7, 2018

As expected empty string fields are not covered by this filter yet.

@daschuer
Copy link
Member Author

daschuer commented Oct 8, 2018

Done!

@uklotzde
Copy link
Contributor

uklotzde commented Oct 8, 2018

After a closer look at the parser I really have to asked a question: Why don't we evaluate empty arguments like they are supposed to be?? The odd special case handling at the beginning of SearchQueryParser::getTextArgument() prevents us from using empty arguments for this use case. Instead we now add another special case by introducing this counterintuitive empty quoted argument workaround.

We should instead remove the existing special case and use empty arguments to search for missing or empty fields. Did you notice that only spaces inserted on the right side get this special case handling, but not those on the left side? One more reason for removing this ugly legacy code, even if it changes the behaviour. Only the regex should be responsible for tokenizing the filter strings into field and argument.

@daschuer
Copy link
Member Author

I think there can be a way to keep some of the special case handling and allow to use an empty argument for searching for empty strings. As a last resort, we can also ditch the feature to allow a space after the token.

I think we need to define the big goal and adjust our small steps towards this. I also like to stick with the quoted search as an "exact match" like Google does. After the discussion how to find BPM values, I am convinced that the fuzzy match should be the default and if one wants to have the exact match, he has to use quotes.
This PR is already in line with this.

By the way a singles space after the token like "title: " may also search for all titles that contain a space aka two words.

What do you think?

Conflicts:
	src/library/searchquery.cpp
@uklotzde
Copy link
Contributor

The idea for distinguishing between an exact and a fuzzy search not reflected in the code. Now it's just one more special case that makes refactoring the code harder when implementing the full-featured solution. That's my only (minor) concern. But the code has to be rewritten anyway.

Usability (also a minor issue): The feature is difficult to discover without reading the manual.

It's ok for this small change. But I don't like this way of piling up new features on top without cleaning up the mess below, even if this requires dropping old, inconsistent features.

@daschuer
Copy link
Member Author

Ok, so how to continue here?
Merge it as it is or should I add the blankspace = emptyOrNull case first?

The first candidate for exact and fuzy match can be BPM.
"120" finds only 120.00 tracks wile plain 120 should match 119-121

@@ -535,6 +547,17 @@ CrateTrackSelectResult CrateStorage::selectTracksSortedByCrateNameLike(const QSt
}
}

TrackSelectResult CrateStorage::selectAllTracksSorted() const {
FwdSqlQuery query(m_database, QString(
"SELECT %1 FROM %2 GROUP BY %1 ORDER BY %1").arg(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you use GROUP BY instead of DISTINCT?

bool NoCrateFilterNode::match(const TrackPointer& pTrack) const {
if (!m_matchInitialized) {
TrackSelectResult tracks(
m_pCrateStorage->selectAllTracksSorted());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried, because generally the size of this query's result if of the same magnitude as the size of the whole library.

The only way for preventing this would be to use an adaptive strategy depending on the actual contents of the crate tracks table, i.e. for deciding if most tracks are organized in crates or not. This comes at the cost of one or two additional queries for determining the sizes upfront to chose the optimum query that will return either a blacklist or a whitelist.

But the whole search thing does not scale very well, so let's just keep it as is without introducing more complexity.

@@ -91,6 +93,23 @@ class TextFilterNode : public QueryNode {
QString m_argument;
};

class NullOrEmptyTextFilterNode : public QueryNode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have separate NullOrEmptyTextFilterNode/TextFilterNode and NoCrateFilterNode/FilterNode and the decision which one to use is done by the parser. But there is only one NumericFilterNode which does this decision internally.

I understand the motivation behind this, i.e. to separate and modularize the code for different use cases. But now the design has become inconsistent and responsibilities are scattered across different levels of abstraction.

@uklotzde
Copy link
Contributor

We can keep the functionality as is. But we have to at least fix the design issues to keep this maintainable.

@daschuer
Copy link
Member Author

OK, done.
I have also considered to add new regular expressions, but I think it is much more performant to exam the found matches individually.

Conflicts:
	src/library/searchqueryparser.cpp
@daschuer
Copy link
Member Author

@uklotzde, All comments are addressed, Can we merge this?

QString m_operator;
double m_dOperatorArgument;
bool m_bRangeQuery;
double m_dRangeLow;
double m_dRangeHigh;
};

class NullNumericFilterNode : public QueryNode {
public:
NullNumericFilterNode(const QStringList& sqlColumns);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicit

@@ -102,6 +121,20 @@ class CrateFilterNode : public QueryNode {
mutable std::vector<TrackId> m_matchingTrackIds;
};

class NoCrateFilterNode : public QueryNode {
public:
NoCrateFilterNode(const CrateStorage* pCrateStorage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicit

Copy link
Contributor

@uklotzde uklotzde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add explicit to all single argument constructors. Otherwise I think we can merge this. LGTM.

@daschuer
Copy link
Member Author

Thank you for review.

@daschuer daschuer merged commit 0be1a70 into mixxxdj:master Dec 16, 2018
esbrandt added a commit to mixxxdj/manual that referenced this pull request May 3, 2020
@daschuer daschuer deleted the 1788086 branch September 26, 2021 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants