-
-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix unnecessary queries when batch-removing statuses, 100x faster #15387
Conversation
8dd0c74
to
b3689bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll re-read it but so far it looks good to me.
b3689bb
to
2334add
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'm comfortable with that going into 3.3.0 without another RC first, but it looks good to me so far.
I also have made a bunch of inline comments that I believe would further improve performances, but they can reasonably be postponed to another PR, this one seems fine as is.
# We do not batch all deletes into one to avoid having a long-running | ||
# transaction lock the database, but we use the delete method instead | ||
# of destroy to avoid all callbacks. We rely on foreign keys to | ||
# cascade the delete faster without loading the associations. | ||
statuses_and_reblogs.each(&:delete) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could have some middle-ground here and batch some deletes together?
# Delete multiple statuses and reblogs of them as efficiently as possible | ||
# @param [Enumerable<Status>] statuses An array of statuses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the comment should be updated to note that some side effects aren't handled by this service and should be taken care of separately, e.g. notifications not being cleaned up.
@tags = statuses_and_reblogs.each_with_object({}) { |s, h| h[s.id] = s.tags.map { |tag| tag.name.mb_chars.downcase } } | ||
@json_payloads = statuses_and_reblogs.each_with_object({}) { |s, h| h[s.id] = Oj.dump(event: :delete, payload: s.id.to_s) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, those are only used once per status, so I am unsure why it's pre-computed.
|
||
return unless account.local? | ||
|
||
statuses.each do |status| | ||
FeedManager.instance.unpush_from_home(account, status) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's only ever used in a context where the user gets deleted, I wonder if we could get rid of that completely.
ASSOCIATIONS_WITHOUT_SIDE_EFFECTS = %w( | ||
account_pins | ||
aliases | ||
conversation_mutes | ||
conversations | ||
custom_filters | ||
devices | ||
domain_blocks | ||
featured_tags | ||
follow_requests | ||
identity_proofs | ||
migrations | ||
mute_relationships | ||
muted_by_relationships | ||
notifications | ||
scheduled_statuses | ||
status_pins | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think bookmarks could be in there too (there's Chewy tracking, but the account is gonna be deleted anyway, right?)
@account.statuses.reorder(nil).find_in_batches do |statuses| | ||
statuses.reject! { |status| reported_status_ids.include?(status.id) } if @options[:reserve_username] | ||
BatchedRemoveStatusService.new.call(statuses, skip_side_effects: @options[:skip_side_effects]) | ||
statuses.reject! { |status| reported_status_ids.include?(status.id) } if keep_account_record? | ||
|
||
BatchedRemoveStatusService.new.call(statuses, skip_side_effects: skip_side_effects?) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if the reported_status_ids
shouldn't be a where.not(id: reported_status_ids)
instead and the whole thing an in_batches
rather than find_in_batches
.
Deleting an account speed comparison: