RCORE-2152 Don't emit transaction log instructions for mutations on newly-created objects #7734

Merged
tgoyne merged 5 commits into master from tg/create-object-repl
Jun 4, 2024

Conversation

tgoyne
Member

@tgoyne tgoyne commented May 24, 2024

The non-sync transaction logs are only used to drive notifications, and notifications don't care about mutations made to an object in the same commit in which that object was created, so we don't need to emit those instructions at all. This significantly cuts the size of the transaction log for commits which are primarily inserting objects.

This does a very basic check for "newly-created" which tracks the most recently created object for each table and skips mutation instructions for that object. This handles recursively creating an object and all of its embedded objects without the overhead of tracking every single object created within a transaction, and insertion workflows will typically not return to an object after creating another object in the same table.
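
The check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `NewObjectTracker`, `TableId`, `ObjectId`, and every method name are hypothetical stand-ins, and clearing the state at the end of a transaction is an assumption about when the tracking is reset.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins for table and object identifiers; not Realm's actual types.
using TableId = std::uint32_t;
using ObjectId = std::int64_t;

// Remembers only the most recently created object in each table.
class NewObjectTracker {
public:
    // Record that `obj` was just created in `table`, replacing any earlier entry.
    void on_create(TableId table, ObjectId obj) { m_last_created[table] = obj; }

    // A mutation needs an instruction unless it targets the most recently
    // created object of its table.
    bool should_emit_mutation(TableId table, ObjectId obj) const
    {
        auto it = m_last_created.find(table);
        return it == m_last_created.end() || it->second != obj;
    }

    // Assumption: "newly created" is per transaction, so forget everything
    // once the transaction ends.
    void on_transaction_end() { m_last_created.clear(); }

private:
    std::unordered_map<TableId, ObjectId> m_last_created;
};

// Walks through a small insertion workflow and returns how many mutation
// instructions would be emitted.
inline int example_workflow()
{
    NewObjectTracker tracker;
    int emitted = 0;
    auto set = [&](TableId t, ObjectId o) {
        if (tracker.should_emit_mutation(t, o))
            ++emitted;
    };

    tracker.on_create(0, 1);  // parent object in table 0
    set(0, 1);                // skipped
    tracker.on_create(1, 10); // embedded child in table 1
    set(1, 10);               // skipped
    set(0, 1);                // still skipped: object 1 is still table 0's latest
    tracker.on_create(0, 2);  // second parent in table 0
    set(0, 2);                // skipped
    set(0, 1);                // emitted: object 1 is no longer table 0's latest
    tracker.on_transaction_end();
    set(0, 2);                // emitted: object 2 predates the current transaction
    return emitted;
}
```

In this sketch, returning to object 1 after creating object 2 emits an instruction that is not strictly needed. That is the trade-off the description accepts: a redundant instruction is harmless, and the tracker stores one entry per table rather than one per created object.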

This requires adding an additional small amount of tracking for embedded objects, as Replication previously didn't know when new embedded objects were created.

The sample Realm which prompted this had a ton of CT history entries for updating the bootstrap store that were mildly annoying to skip over while looking for the actual bootstrap application instructions, so I made the bootstrap store not emit CT history. I don't think this is a meaningful optimization outside of very narrow circumstances (e.g. if the server decides to send us a whole bunch of single-object changesets).

realm-trawler turned out to be broken for sync Realms, so I fixed that.

The existing object-store notification tests do a pretty good job of validating that instructions are emitted when they need to be, so the new tests focus on validating that they aren't emitted when they shouldn't be, which isn't testable at the object-store level (as the point of this is to stop emitting instructions which the object-store transaction log handler was just discarding).

@tgoyne tgoyne self-assigned this May 24, 2024
@cla-bot cla-bot bot added the cla: yes label May 24, 2024

coveralls-official bot commented May 24, 2024

Pull Request Test Coverage Report for Build thomas.goyne_391

Details

  • 358 of 362 (98.9%) changed or added relevant lines in 6 files are covered.
  • 74 unchanged lines in 15 files lost coverage.
  • Overall coverage increased (+0.03%) to 90.876%

Changes Missing Coverage                            Covered Lines   Changed/Added Lines   %
src/realm/sync/noinst/pending_bootstrap_store.cpp   32              36                    88.89%

Files with Coverage Reduction                       New Missed Lines   %
src/realm/query_engine.hpp                          1                  94.11%
src/realm/util/serializer.cpp                       1                  90.43%
test/test_query2.cpp                                1                  98.73%
src/realm/replication.cpp                           2                  80.89%
src/realm/util/file.cpp                             2                  78.73%
test/test_shared.cpp                                2                  96.74%
test/fuzz_group.cpp                                 3                  50.72%
src/realm/sync/instruction_applier.cpp              5                  68.95%
src/realm/sync/noinst/server/server_history.cpp     5                  63.44%
src/realm/sync/noinst/server/server.cpp             6                  73.74%

Totals
Change from base Build 2379: 0.03%
Covered Lines: 214850
Relevant Lines: 236420

💛 - Coveralls

@tgoyne tgoyne force-pushed the tg/create-object-repl branch 5 times, most recently from 5dca5c6 to a3880c3 Compare June 3, 2024 17:39
@tgoyne
Member Author

tgoyne commented Jun 3, 2024

We don't currently appear to have any benchmarks which exercise this very well: everything which measures insertion performance uses only one column, and this change becomes more beneficial the more columns there are. The single-column case is ~5% faster.
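
A rough way to see why the benefit grows with column count is to count instructions per inserted object. The sketch below is a simplified model, not a measurement: it assumes one creation instruction per object plus one set instruction per column, and that every set targets a newly created object that the check recognizes. `InstructionCounts` and `insert_instruction_counts` are hypothetical names.

```cpp
#include <cstddef>

// Instruction counts for a bulk insert, before and after the change (model only).
struct InstructionCounts {
    std::size_t before; // create + one set per column, for every object
    std::size_t after;  // create only; sets on new objects are skipped
};

// Assumption: inserting one object costs 1 create instruction plus 1 set
// instruction per column, and all sets hit the most recently created object.
constexpr InstructionCounts insert_instruction_counts(std::size_t objects, std::size_t columns)
{
    return {objects * (1 + columns), objects};
}

// With a single column the log roughly halves; with 20 columns it shrinks ~21x.
static_assert(insert_instruction_counts(1000, 1).before == 2000, "1 column, before");
static_assert(insert_instruction_counts(1000, 1).after == 1000, "1 column, after");
static_assert(insert_instruction_counts(1000, 20).before == 21000, "20 columns, before");
static_assert(insert_instruction_counts(1000, 20).after == 1000, "20 columns, after");
```

These counts describe log size under the model only. They say nothing direct about runtime, which is what the ~5% figure measures.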

@tgoyne tgoyne force-pushed the tg/create-object-repl branch from a3880c3 to feb080c Compare June 3, 2024 18:20
@tgoyne tgoyne marked this pull request as ready for review June 3, 2024 18:54
@tgoyne tgoyne requested a review from danieltabacaru June 3, 2024 18:54
@tgoyne tgoyne changed the title Don't emit transaction log instructions for mutations on newly-created objects RCORE-2152 Don't emit transaction log instructions for mutations on newly-created objects Jun 3, 2024
Collaborator

@danieltabacaru danieltabacaru left a comment

Nice work!

src/realm/exec/realm_trawler.cpp (outdated, resolved)
src/realm/replication.hpp (outdated, resolved)
src/realm/replication.hpp (outdated, resolved)
@tgoyne tgoyne force-pushed the tg/create-object-repl branch 2 times, most recently from ef23dc3 to 8e51722 Compare June 4, 2024 19:00
tgoyne added 5 commits June 4, 2024 14:51
Nothing observes this table for notifications, so we don't need CT history.
This is a pretty minor optimization unless we get a very large number of very
small changesets from the server.
…d objects

The non-sync transaction logs are only used to drive notifications and
notifications don't care about mutations made to an object in the same commit
in which that object was created, so we don't need to emit those instructions at all. This
significantly cuts the size of the transaction log for commits which are
primarily inserting objects.

This does a very basic check for "newly-created" which tracks the most recently
created object for each table and skips mutation instructions for that object.
This handles recursively creating an object and all of its embedded objects
without the overhead of tracking every single object created within a
transaction, and insertion workflows will typically not return to an object
after creating another object in the same table.

This requires adding an additional small amount of tracking for embedded
objects, as Replication previously didn't know when new embedded objects were
created.
These were used for converting to the file format where ObjKeys were derived
from primary keys.
@tgoyne tgoyne force-pushed the tg/create-object-repl branch from 8e51722 to 5d5e26d Compare June 4, 2024 21:52
@tgoyne tgoyne merged commit 1f78955 into master Jun 4, 2024
39 checks passed
@tgoyne tgoyne deleted the tg/create-object-repl branch June 4, 2024 23:40
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 5, 2024