Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query index improvements #6376

Merged
merged 10 commits into from
Mar 23, 2023
Merged
10 changes: 9 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,18 @@
# NEXT RELEASE

### Enhancements
* <New feature description> (PR [#????](https://github.com/realm/realm-core/pull/????))
* Performance improvement for the following queries ([6376](https://github.com/realm/realm-core/issues/6376)):
* Significant (~75%) improvement when counting (`Query::count()`) the number of exact matches (with no other query conditions) on a string/int/UUID/ObjectID property that has an index. This improvement will be especially noticiable if there are a large number of results returned (duplicate values).
* Significant (~99%) improvement when querying for an exact match on a Timestamp property that has an index.
* Significant (~99%) improvement when querying for a case insensitive match on a Mixed property that has an index.
* Moderate (~25%) improvement when querying for an exact match on a Boolean property that has an index.
* Small (~5%) improvement when querying for a case insensitive match on a Mixed property that does not have an index.
* Expose `Results::is_valid()` in the C API, in order to prevent to use an invalid Results. (PR [#6407](https://github.com/realm/realm-core/pull/6407))

### Fixed
* Fixed a crash when querying a mixed property with a string operator (contains/like/beginswith/endswith) or with case insensitivity. ([6376](https://github.com/realm/realm-core/issues/6376) since introduction of Mixed)
* Querying for equality of a string on an indexed mixed property was returning case insensitive matches. For example querying for `myIndexedMixed == "Foo"` would incorrectly match on values of "foo" or "FOO" etc. ([6376](https://github.com/realm/realm-core/issues/6376) since introduction of Mixed)
* Adding an index to a Mixed property on a non-empty table would crash with an assertion. ([6376](https://github.com/realm/realm-core/issues/6376) since introduction of Mixed)
* `SyncSession::pause()` could hold a reference to the database open after shutting down the sync session, preventing users from being able to delete the realm. ([#6372](https://github.com/realm/realm-core/issues/6372), since v13.3.0)
* `Logger::set_default_logger()` did not perform any locking, resulting in data races if it was called while the default logger was being read on another thread ([PR #6398](https://github.com/realm/realm-core/pull/6398), since v13.7.0).

Expand Down
5 changes: 5 additions & 0 deletions src/realm/index_string.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -507,6 +507,11 @@ size_t StringIndex::count(T value) const
return m_array->index_string_count(Mixed(value), m_target_column);
}

inline Allocator& StringIndex::get_alloc() const noexcept
{
return m_array->get_alloc();
}

inline void StringIndex::destroy() noexcept
{
return m_array->destroy_deep();
Expand Down
21 changes: 14 additions & 7 deletions src/realm/query.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -992,12 +992,14 @@ void Query::aggregate(QueryStateBase& st, ColKey column_key) const
auto node = pn->m_children[best];
if (node->has_search_index()) {
auto keys = node->index_based_keys();
REALM_ASSERT(keys);
// The node having the search index can be removed from the query as we know that
// all the objects will match this condition
pn->m_children[best] = pn->m_children.back();
pn->m_children.pop_back();
for (auto key : keys) {
auto obj = m_table->get_object(key);
const size_t num_keys = keys->size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this loop really making a difference?

for (size_t i = 0; i < num_keys; ++i) {
auto obj = m_table->get_object(keys->get(i));
if (pn->m_children.empty() || eval_object(obj)) {
st.m_key_offset = obj.get_key().value;
st.match(realm::npos, obj.get<T>(column_key));
Expand Down Expand Up @@ -1312,15 +1314,18 @@ void Query::do_find_all(TableView& ret, size_t limit) const
auto best = find_best_node(pn);
auto node = pn->m_children[best];
if (node->has_search_index()) {
auto keys = node->index_based_keys();
REALM_ASSERT(keys);
KeyColumn& refs = ret.m_key_values;

// The node having the search index can be removed from the query as we know that
// all the objects will match this condition
pn->m_children[best] = pn->m_children.back();
pn->m_children.pop_back();

auto keys = node->index_based_keys();
for (auto key : keys) {
const size_t num_keys = keys->size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK it seems it is a pattern, why do we need to do this rather than a simpler for(const auto& key : keys)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iteration pattern here isn't really part of the optimization. The change is because keys was previously a std::vector<ObjKey> and now it is a IndexEvaluator* which doesn't have built in iterator support (I suppose I could add iterator support if you really wanted to keep the loop the same). The change in type allows us to dynamically fetch the keys directly from BPTree one at a time as needed. The previous way was to always evaluate all the results and return them in the vector. But that is entirely unused when we only have one query condition and are doing a count of results. So the optimization here is to fetch the matching object keys as needed because sometimes we don't need them at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the iterator is a "nice to have" feature, but that's ok to leave it as it is, it is up to you.

for (size_t i = 0; i < num_keys; ++i) {
ObjKey key = keys->get(i);
if (limit == 0)
break;
if (pn->m_children.empty()) {
Expand Down Expand Up @@ -1433,13 +1438,15 @@ size_t Query::do_count(size_t limit) const
auto node = pn->m_children[best];
if (node->has_search_index()) {
auto keys = node->index_based_keys();
REALM_ASSERT(keys);
if (pn->m_children.size() > 1) {
// The node having the search index can be removed from the query as we know that
// all the objects will match this condition
pn->m_children[best] = pn->m_children.back();
pn->m_children.pop_back();
for (auto key : keys) {
auto obj = m_table->get_object(key);
const size_t num_keys = keys->size();
for (size_t i = 0; i < num_keys; ++i) {
auto obj = m_table->get_object(keys->get(i));
if (eval_object(obj)) {
++cnt;
if (cnt == limit)
Expand All @@ -1449,7 +1456,7 @@ size_t Query::do_count(size_t limit) const
}
else {
// The node having the search index is the only node
auto sz = keys.size();
auto sz = keys->size();
cnt = std::min(limit, sz);
}
}
Expand Down
30 changes: 20 additions & 10 deletions src/realm/query_conditions.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,7 @@ struct Contains : public HackClass {
{
if (m1.is_null())
return !m2.is_null();
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return operator()(b1, b2, false, false);
Expand Down Expand Up @@ -133,7 +133,7 @@ struct Like : public HackClass {
{
if (m1.is_null() && m2.is_null())
return true;
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return operator()(b1, b2, false, false);
Expand Down Expand Up @@ -186,7 +186,7 @@ struct BeginsWith : public HackClass {

bool operator()(const QueryValue& m1, const QueryValue& m2) const
{
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return b2.begins_with(b1);
Expand Down Expand Up @@ -232,7 +232,7 @@ struct EndsWith : public HackClass {

bool operator()(const QueryValue& m1, const QueryValue& m2) const
{
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return operator()(b1, b2, false, false);
Expand Down Expand Up @@ -395,7 +395,7 @@ struct ContainsIns : public HackClass {
{
if (m1.is_null())
return !m2.is_null();
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return operator()(b1, b2, false, false);
Expand Down Expand Up @@ -479,7 +479,7 @@ struct LikeIns : public HackClass {
{
if (m1.is_null() && m2.is_null())
return true;
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, since you are repeating this over several ifs, you could have added this check inside Mixed::types_are_comparable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that both of the values could be numerics, so Mixed::types_are_comparable would actually return true, but trying to convert to a string will fail. We don't want to change Mixed::types_are_comparable because that is used in other places for other non-string values.

BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return operator()(b1, b2, false, false);
Expand Down Expand Up @@ -544,7 +544,7 @@ struct BeginsWithIns : public HackClass {

bool operator()(const QueryValue& m1, const QueryValue& m2) const
{
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return operator()(b1, b2, false, false);
Expand Down Expand Up @@ -610,7 +610,7 @@ struct EndsWithIns : public HackClass {

bool operator()(const QueryValue& m1, const QueryValue& m2) const
{
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return operator()(b1, b2, false, false);
Expand Down Expand Up @@ -675,8 +675,18 @@ struct EqualIns : public HackClass {

bool operator()(const QueryValue& m1, const QueryValue& m2) const
{
return (m1.is_null() && m2.is_null()) ||
(Mixed::types_are_comparable(m1, m2) && operator()(m1.get_binary(), m2.get_binary(), false, false));
if (m1.is_null() && m2.is_null()) {
return true;
}
else if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary)) {
return operator()(m1.get_binary(), m2.get_binary(), false, false);
}
else {
return m1 == m2;
}
}
return false;
}

template <class A, class B>
Expand Down
Loading