
Allocate arguments for lists #6674

Merged (13 commits) on May 29, 2023

nicola-cab
Member

What, How & Why?

When a list of arguments is passed into a query, we don't allocate memory for those arguments; instead, we just keep a reference to whatever pointer has been passed to core.
I guess this was done for performance reasons, because passing each argument to a query individually does allocate internal memory to clone the argument(s).

But there are at least two drawbacks to this approach:

  1. It obliges the SDK that calls into core to create and run the query to allocate the memory for each argument and keep it alive (this may or may not be easy). Most importantly, if an argument changes between the construction and the execution of the query, the query may return different results without any apparent reason.

  2. The API looks odd, and it is not consistent with what we do when each argument is passed individually: in that case we allocate the memory and copy the data.
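A minimal sketch of drawback 1, under stated assumptions: BorrowingInQuery and OwningInQuery are hypothetical stand-ins (not realm-core types) for a parsed "name IN {...}" query, and std::string_view plays the role of the non-owning pointer that the SDK passes to core. The borrowing query only keeps views, so mutating the caller's strings between construction and execution changes its answer; the owning query copies the arguments up front, as is already done for individually passed arguments.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a parsed `name IN {...}` query (not realm-core types).

// Keeps views into the caller's memory, like the old handling of argument lists.
struct BorrowingInQuery {
    std::vector<std::string_view> args;

    bool matches(std::string_view value) const
    {
        for (std::string_view arg : args)
            if (arg == value)
                return true;
        return false;
    }
};

// Copies every argument when the query is built, like individual arguments.
struct OwningInQuery {
    std::vector<std::string> args;

    bool matches(std::string_view value) const
    {
        for (const std::string& arg : args)
            if (arg == value)
                return true;
        return false;
    }
};

// Builds both queries from the same caller-owned strings, then changes one of
// them before "executing" the queries. Returns {borrowing result, owning result}.
std::pair<bool, bool> run_after_mutation()
{
    std::vector<std::string> caller_args{"alice", "bob"};
    BorrowingInQuery borrowing{{caller_args[0], caller_args[1]}};
    OwningInQuery owning{caller_args};

    caller_args[1][0] = 'r'; // "bob" becomes "rob" between construction and execution

    return {borrowing.matches("bob"), owning.matches("bob")};
}
```

Here run_after_mutation() returns {false, true}: the borrowing query no longer matches "bob", while the owning one does. The price of the owning version is one allocation per argument, which is the performance cost guessed at above.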

Fixes: #6614

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • C-API, if public C++ API changed.

std::shared_ptr<Subexpr> ret;

for (const auto& mixed : mixed_args) {
    if (!mixed.is_null()) {
Contributor

Null isn't being handled correctly here: for a null argument, it inserts the leftover value of ret, which has already been moved from, into the list.
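One plausible reading of this bug, sketched with hypothetical types: std::optional<int> stands in for Mixed (std::nullopt for null) and std::shared_ptr<std::optional<int>> for std::shared_ptr<Subexpr>. A std::shared_ptr is guaranteed to be empty after being moved from, so when the null branch skips the assignment, the loop pushes an empty pointer instead of a node representing null. wrap_all() shows the shape of a fix in which every argument, null included, gets its own node.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins: std::nullopt plays a null Mixed, and a shared_ptr to the
// argument plays std::shared_ptr<Subexpr>.
using Arg = std::optional<int>;
using NodePtr = std::shared_ptr<Arg>;

// Shape of the loop under review: null arguments skip the assignment to ret.
std::vector<NodePtr> wrap_skipping_nulls(const std::vector<Arg>& args)
{
    NodePtr ret;
    std::vector<NodePtr> out;
    for (const Arg& arg : args) {
        if (arg.has_value())
            ret = std::make_shared<Arg>(arg);
        out.push_back(std::move(ret)); // a moved-from shared_ptr is always empty
    }
    return out;
}

// Possible fix: every argument, null included, gets its own node.
std::vector<NodePtr> wrap_all(const std::vector<Arg>& args)
{
    std::vector<NodePtr> out;
    out.reserve(args.size());
    for (const Arg& arg : args)
        out.push_back(std::make_shared<Arg>(arg));
    return out;
}
```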

@@ -1355,6 +1280,136 @@ std::unique_ptr<Subexpr> ConstantNode::visit(ParserDriver* drv, DataType hint)
    return ret;
}

std::vector<std::shared_ptr<Subexpr>> ConstantNode::clone_list_of_args(std::vector<Mixed>& mixed_args)
{
    std::vector<std::shared_ptr<Subexpr>> args_in_list;
Contributor

Instead of this, I'd recommend building up a ConstantMixedList. It should take care of owning the data. That will simplify a lot of code at the call site.

Member Author

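If I understand correctly, ConstantMixedList::set() stores the value and then points it at a per-element buffer owned by the list: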

    void ConstantMixedList::set(size_t n, Mixed val)
    {
        Value<Mixed>::set(n, val);
        (*this)[n].use_buffer(m_buffer[n]);
    }

use_buffer() then deep-copies the data into that buffer only for strings and binary values:

    switch (get_type()) {
        case type_String:
            buf = std::string(string_val);
            string_val = StringData(buf);
            break;
        case type_Binary:
            buf = std::string(binary_val);
            binary_val = BinaryData(buf);
            break;
        default:
            break;
    }

It seems to me that these are the only two types that need a deep copy, so it should be fine. Thanks for the suggestion.
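A rough model of the ownership scheme described above, assuming a std::variant stand-in for Mixed: OwningValueList, Value and BinaryView are hypothetical names, not realm-core API. Each slot owns a std::string buffer and, as in the use_buffer() switch, only string and binary payloads are deep-copied into it; null and integers are stored inline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <variant>
#include <vector>

// Hypothetical stand-in for Mixed: string and binary values only point at memory
// owned by someone else, like StringData and BinaryData.
struct BinaryView {
    std::string_view bytes;
};
using Value = std::variant<std::monostate, std::int64_t, std::string_view, BinaryView>;

// Hypothetical owning list in the spirit of ConstantMixedList: every slot has its
// own buffer, and only string/binary payloads are deep-copied into it.
class OwningValueList {
public:
    explicit OwningValueList(std::size_t size)
        : m_values(size)
        , m_buffers(size)
    {
    }

    // A copy would duplicate views that still point into the source's buffers.
    OwningValueList(const OwningValueList&) = delete;
    OwningValueList& operator=(const OwningValueList&) = delete;

    void set(std::size_t n, Value val)
    {
        assert(n < m_values.size());
        m_values[n] = val;
        use_buffer(m_values[n], m_buffers[n]);
    }

    const Value& get(std::size_t n) const
    {
        return m_values.at(n);
    }

private:
    static void use_buffer(Value& value, std::string& buf)
    {
        if (auto* str = std::get_if<std::string_view>(&value)) {
            buf = std::string(*str);
            *str = buf; // now views the list's own copy
        }
        else if (auto* bin = std::get_if<BinaryView>(&value)) {
            buf = std::string(bin->bytes);
            bin->bytes = buf;
        }
        // null and integers are stored inline, so there is nothing to copy
    }

    std::vector<Value> m_values;
    std::vector<std::string> m_buffers;
};
```

Keeping the value type a cheap view and giving the container the buffers means callers can pass temporary data to set() and discard it afterwards: after list.set(1, std::string_view(caller)), changing caller no longer affects list.get(1).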

    return args_in_list;
}

std::unique_ptr<Subexpr> ConstantNode::clone_arg(ParserDriver* drv, DataType type, size_t arg_no, DataType hint,
Contributor

👍

CHANGELOG.md (outdated)
@@ -9,6 +9,7 @@
* Access token refresh for websockets was not updating the location metadata ([#6630](https://github.com/realm/realm-core/issues/6630), since v13.9.3)
* Fix several UBSan failures which did not appear to result in functional bugs ([#6649](https://github.com/realm/realm-core/pull/6649)).
* Fix an out-of-bounds read in sectioned results when sections are removed by modifying all objects in that section to no longer appear in that section ([#6649](https://github.com/realm/realm-core/pull/6649), since v13.12.0)
* Fix allocate arguments for list in queries. ([#6674](https://github.com/realm/realm-core/pull/6674), since v12.5.0)
Contributor

I'd suggest noting that this only affects the query parser.

nicola-cab requested a review from ironage on May 26, 2023 13:48
@@ -191,6 +191,10 @@ class ConstantNode : public ValueNode {
{
    target_table = table_name.substr(1, table_name.size() - 2);
}

// std::vector<std::shared_ptr<Subexpr>>
Contributor

Expand on this or remove the comment

for (auto& val : mixed_list) {
    values->set(ndx++, val);
}
auto args_in_list = copy_list_of_args(mixed_list);
Contributor

You could assign this to ret immediately and save a line of code :)

std::unique_ptr<ConstantMixedList> args_in_list = std::make_unique<ConstantMixedList>(mixed_args.size());
size_t ndx = 0;
for (const auto& mixed : mixed_args) {
    if (!mixed.is_null())
Contributor

We don't need special handling of null here, as setting a null is a valid thing to do for a Mixed. This happens to work because the list is preallocated to the correct size and default-initialized to nulls. So a query like x IN {0, 1, 2, NULL, 3} happens to work, but you end up with a reordered list with the NULL coming last, because ndx++ is skipped on that iteration of the loop. Not sure if you did this as an optimization, but I'd find it more readable without this check.
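The reordering can be reproduced with a small model, assuming std::optional<int> as a stand-in for Mixed (std::nullopt for NULL) and a preallocated std::vector for the ConstantMixedList; fill_skipping_nulls() and fill_all() are hypothetical names, not the code under review.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in: std::nullopt plays a null Mixed, and the preallocated
// vector plays ConstantMixedList (default-initialized to nulls).
using Arg = std::optional<int>;

// Shape of the loop under review: ndx only advances for non-null arguments.
std::vector<Arg> fill_skipping_nulls(const std::vector<Arg>& args)
{
    std::vector<Arg> list(args.size()); // every slot starts as null
    std::size_t ndx = 0;
    for (const Arg& arg : args) {
        if (arg.has_value())
            list[ndx++] = arg;
    }
    return list;
}

// Suggested shape: setting a null is valid, so each argument keeps its position.
std::vector<Arg> fill_all(const std::vector<Arg>& args)
{
    std::vector<Arg> list(args.size());
    std::size_t ndx = 0;
    for (const Arg& arg : args)
        list[ndx++] = arg;
    return list;
}
```

For {0, 1, 2, NULL, 3}, fill_skipping_nulls() produces {0, 1, 2, 3, NULL}. The list still holds the same values, including the same number of nulls, so an IN comparison gives the same answer, which is why the original loop happens to work.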

Member Author

Actually, I just thought that assigning null to an element of the Mixed list did not make much sense. But that's OK; I removed the check.

nicola-cab requested a review from ironage on May 26, 2023 18:18
    values->set_comparison_type(*m_comp_type);
}
ret = std::move(values);
ret = std::move(copy_list_of_args(mixed_list));
Contributor

This std::move is now unnecessary and causes a warning.
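Why the std::move is redundant, sketched with hypothetical stand-ins for Subexpr and ConstantMixedList (the int parameter of copy_list_of_args() is also an assumption): a function returning std::unique_ptr by value yields a prvalue, so the assignment already selects unique_ptr's converting move assignment.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for Subexpr and ConstantMixedList (not realm-core types).
struct Subexpr {
    virtual ~Subexpr() = default;
    virtual int size() const = 0;
};

struct ConstantList : Subexpr {
    explicit ConstantList(int n) : m_size(n) {}
    int size() const override { return m_size; }
    int m_size;
};

// Hypothetical counterpart of copy_list_of_args(): returns a new owning pointer by value.
std::unique_ptr<ConstantList> copy_list_of_args(int n)
{
    return std::make_unique<ConstantList>(n);
}

std::unique_ptr<Subexpr> build_list_node(int n)
{
    std::unique_ptr<Subexpr> ret;
    // The call is already an rvalue, so unique_ptr's converting move assignment
    // is chosen without std::move. std::unique_ptr is not copyable, so this line
    // would not compile if a copy were required.
    ret = copy_list_of_args(n);
    return ret; // the local is moved out implicitly; no std::move needed here either
}
```

Because std::unique_ptr cannot be copied, the fact that ret = copy_list_of_args(n) compiles at all shows that a move already happens; wrapping the call in std::move only adds the warning the reviewer mentions.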

nicola-cab requested a review from ironage on May 26, 2023 18:39
Contributor

ironage left a comment

👍

nicola-cab merged commit a172b42 into master on May 29, 2023
nicola-cab deleted the nc/allocate_arguments_for_lists branch on May 29, 2023 08:53
kraenhansen added a commit that referenced this pull request Jun 19, 2023
* Updated release notes

* Update catch2 v3.3.2 (#6605)

* Make core infer platform and cpu_arch, while bundle_id must be provided by SDK's (#6612)

* platform and cpu_arch are inferred by core, bundle_id must be provided by SDK's

* update changelog

* Return proper value for X86_64 arch

Co-authored-by: Christian Melchior <[email protected]>

* Get fine-grained platform for Apple devices

* Fix tests

* small fixes

* fix more tests

* Fix mistake in changelog

---------

Co-authored-by: Christian Melchior <[email protected]>

* use consistent rounding, following SERVER-75392 (#6477)

* fix entries that went to the wrong change version (#6632)

* Special-case main thread runloop scheduler

* Improve SectionedResults performance

SectionedResults used a std::map in a few places where the keys are a dense
range (i.e. always [0..map.size())) and so they can be std::vector instead. The
maps keyed on Mixed are now std::unordered_map.

Change notifications now report changes as a `std::vector<IndexSet>` rather
than `std::map<size_t, IndexSet>`. This is slower and uses more memory when the
only sections that changed are near the end of a SectionedResults with a large
number of sections, but is much faster if all sections changed or if the
sections which changed are early in the SectionedResults. Change notifications
now reuse buffers, which increases persistent memory usage slightly but
significantly reduces allocations.

Change notifications for a single section now only compute the changes for that
section rather than computing the full changes and then filtering out the
changes for other sections.

* use static_assert rather than an old home-rolled one

* fix warning of redefine of CHECK macro

* fix unused function warning

* silence warnings in bid128_to_string

* Introduce BPlusTree::for_all

* Prevent program from crashing when removing backlinks

* Fix broken snapshot of collection of objects

* Fix importing Results with deleted collection

The result should be an empty result, not the whole table.

* geospatial validation of polygons (#6607)

* geospatial validation of polygons

* Loop->Ring, added tests

* use std::unique

* changelog

* Benchmark for full-text search

* Allow filtering benchmarks to run only a list of specified names
* Add simple benchmark for fulltext search with index

* Filter out unresolved links in Dictionary::get_any()

* Add support for early exit in BPlusTree::for_all()

* Geospatial feedback (#6645)

* verify local results match a server query

* disallow geowithin on top level tables

* fix geo queries with ANY/ALL/NONE

* geospatial validation of points

* rename GeoCenterSphere -> GeoCircle

* review feedback

* better testing and fix any/all/none geospatial

* format

* Geospatial basic queries benchmarks (#6621)

* Add basic benchmarks for Geospatial type and queries

* Less copying in GeoWithinCompare

* Bring back caching of s2 region into Geospatial

* remove transaction overhead from measurements

* a couple small optimizations

* formatting

* simplify geospatial query evaluations

* changelog

---------

Co-authored-by: James Stone <[email protected]>

* Updated baas server tag for CI (#6650)

* Prepare release

* Updated release notes

* Access token refresh for websockets was not updating the location metadata (#6631)

* Always refresh metadata on app login
* Updated changelog
* Always update location when requested; fix c_api test
* Update test to properly evaluate websocket redirections; added one more test
* Updated changelog and fixed compile warning
* Added location checks back to test
* added mutex locking around location updated state and reworked requesting location update to use flag
* clang format and fix incorrect timeout value
* Reworked update location logic a bit and removed unused function
* Free mutex before calling completion on early exit in init_app_metadata

* maybe fix a race in a test (#6651)

* Use std::optional to store cached leaves in query nodes (#6653)

Our use of aligned_storage was basically a complicated manual version of this.
I was hoping this'd have binary size benefits, but it ended up making the
library 100 bytes larger instead. Nonetheless, it greatly simplifies things.

* Fix a few UBSan failures hit by tests

* Avoid performing unaligned reads in Array::get_chunk()

* Fix a lock order inversion in tests (#6666)

The cycle was DaemonThread::m_running_on_change_mutex =>
RealmCoordinator::m_realm_mutex  => SyncManager::m_mutex  =>
RealmCoordinator::s_coordinator_mutex  =>
DaemonThread::m_running_on_change_mutex, and it happened due to
DaemonThread::remove() being called inside RealmCoordinator::clear_cache()
while holding s_coordinator_mutex. Fortunately we don't actually need to be doing that.

As the cycle required RealmCoordinator::clear_all_caches(), this was only
applicable to tests.

* Allow geo coordinate numeric argument substitutions (#6663)

* allow geo coordinate numeric argument substitutions

* review feedback

* explicit cast to address warning

* Remove catch() clause to prevent truncating stack trace in AsyncOper::do_recycle_and_execute() (#6667)

* Fix an assertion failure if an async write callback ran during a write transaction (#6661)

Between when the callback after acquiring the write lock is scheduled and when
it's invoked a synchronous write transaction can be begun, and if it's not
ended before the next time the scheduler gets to run, the scheduled callback
will be invoked inside the write. When this happens we want to just do nothing.
Ending the synchronous write transaction will take care of rescheduling the
async write it preempted.

* core release 13.13.0

* Updated release notes

* Allocate arguments for lists (#6674)

* Small documentation and code fixes (#6672)

* Fix crash when opening FLX realm after client reset failure (#6671)

* Fix crash when opening FLX realm after client reset failure

* Update changelog

* Don't supersede pending subscriptions in case of a client reset failure

* Add test

* Changes after code review

* Support sorting based on values from a dictionary (#5311)

Co-authored-by: Sebastian Valle <[email protected]>
Co-authored-by: James Stone <[email protected]>

* Filter out external sources from Eclipse (#6682)

Indexer has a hard time dealing with Catch2

* Use cross-compilers instead of CentOS image (#6559)

* Use cross-compilers instead of CentOS image

* changelog

* fix bad merge

* refactor toolchain files

* clarify useToolchain exception circumstances

* Remap github URL to ssh to fix BAAS dependency using https:// (#6685)

* core v13.14.0

* Updated release notes

* Switch to building with Xcode 14 (#6647)

* better fix explanation in the changelog for list of args in the query parser (#6692)

* Remove constructor for GeoPoint and GeoPolygon (#6679)

Co-authored-by: Mathias Stearn <[email protected]>

* Fix failing "sync: non-synced metadata table doesn't result in non-additive schema change" tests (#6697)

* Reporting correct error message on HTTP errors for Browser target

* User/Server API key provider becomes a single 'API key' provider (#6696)

* Allow frozen Realms to be opened with additive schema changes (#6693)

* allow frozen Realms to be opened with additive schema changes

* lint

* strengthen tests and comments

* Update src/realm/object-store/shared_realm.cpp

Co-authored-by: Thomas Goyne <[email protected]>

---------

Co-authored-by: Thomas Goyne <[email protected]>

* Reverted minimum swift version to fix failing CI tests (#6706)

* core release v13.15.0

* Updated release notes

* Fix client reset test with invalid query (#6711)

* Fix SessionWrapper use-after-free crash when tearing down sessions (#6676)

* Changed SessionWrapper pointer to bind_ptr; added session ident history
* Fix teardown if client is destroyed before session
* Session no longer holds bind_ptr to SessionWrapper; reverted some changes
* Fixed return and updated some comments
* Don't process errors if session is shutting down
* Added extra checks for session state
* Updates from review
* Updated some finalized checks
* Rolled back some changes
* Added output to ASSERTS and moved session history to unordered_set
* Remove session history entry on normal close
* Updated comment in sync tests

* Add [baas] and [local] tags to object store sync tests to identify the tests that rely on BAAS or not (#6710)

* Use Columns<Link> when property is Dictionary of links (#6705)

If a Dictionary property has links as value type, we can use Columns<Link> to handle
the links instead of the basic Columns<Dictionary>. This has the effect that when we
compare with a single value, we will optimize to use LinksToNode. So we need to make
LinksToNode handle the Dictionary case.

When we compare with a list of links, we must ensure that the list is converted to
a list of ObjKeys, which is the type that Column<Link> evaluates to.

 Use LinksToNode for lists in QueryParser

* better changelog message for the fix related to queries with list of arguments (#6717)

* Fixes for Emscripten target (Passing header from fetch response. Using Config.path for inMemory Realm) (#6716)

* Fixes for Emscripten target: Passing header for fetch response. Passing the RealmConfig.path to be used for inMemory Realm, this is needed for registering SyncSession

Co-authored-by: Jørgen Edelbo <[email protected]>

* release 13.15.1

* Updated spec.yml to remove User & Server prefix from ApiKey credentials

---------

Co-authored-by: James Stone <[email protected]>
Co-authored-by: realm-ci <[email protected]>
Co-authored-by: Kirill Burtsev <[email protected]>
Co-authored-by: Daniel Tabacaru <[email protected]>
Co-authored-by: Christian Melchior <[email protected]>
Co-authored-by: Thomas Goyne <[email protected]>
Co-authored-by: Thomas Goyne <[email protected]>
Co-authored-by: Jørgen Edelbo <[email protected]>
Co-authored-by: Michael Wilkerson-Barker <[email protected]>
Co-authored-by: Nicola Cabiddu <[email protected]>
Co-authored-by: Sebastian Valle <[email protected]>
Co-authored-by: Yavor Georgiev <[email protected]>
Co-authored-by: Ferdinando Papale <[email protected]>
Co-authored-by: Mathias Stearn <[email protected]>
Co-authored-by: Nabil Hachicha <[email protected]>
Co-authored-by: Finn Schiermer Andersen <[email protected]>
github-actions bot locked as resolved and limited conversation to collaborators on Mar 21, 2024
Successfully merging this pull request may close these issues.

[C-API] Query arguments with list of strings are not copied when query is parsed
3 participants