pkp/pkp#8933 Restore original file after cancelling file upload wizard #8941
Conversation
Force-pushed a80dd3b to fc94ba2
Force-pushed 5383bf9 to 5ff9333
Just one longer comment about modifying the schema at run-time like this. I don't know if you'll have time to rethink that approach, but something to think about anyway. Otherwise, it's all small comments. 👍
Force-pushed 379e695 to 9b2f174
$table->dropForeign('event_log_user_id_foreign');
$table->dropIndex('event_log_user_id');
$table->bigInteger('user_id')->nullable()->change();
$table->foreign('user_id')->references('user_id')->on('users')->onDelete('cascade');
Are you sure you want an `ON DELETE CASCADE` here? I would think `ON DELETE SET NULL` would be more appropriate -- the point is to be able to continue to log events when a user account doesn't exist, right?
Correct me if I'm wrong: the only appropriate way to remove a user account is to merge it with another one, and for that specific case we also combine their log entries. Thus, `ON DELETE CASCADE` here would prevent data corruption in unexpected cases. If not, I can set it to null on delete.
Forgot to mention: the idea behind making `user_id` nullable is to allow logging events from the system, automated events, or other cases where a specific user isn't involved.
That makes sense to me, thanks, please go ahead. But please do add a `->comment()` to the `user_id` column in `LogMigration` to explain what a null value means (we'll add more of these in future releases)!
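A minimal sketch of what this might look like at the SQL level on MySQL — the table name and comment text are illustrative, not taken from the PR:

```sql
-- Illustrative only: a nullable user_id with an explanatory column comment,
-- roughly what a Laravel ->nullable()->comment(...)->change() would produce.
ALTER TABLE event_log
  MODIFY COLUMN user_id BIGINT NULL
  COMMENT 'NULL means the event was generated by the system, not by a user';
```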
$mapLocaleWithSettingIds = [];
foreach ($logChunks as $row) {
    // Get locale based on a submission file ID log entry
    $locale = $this->getContextPrimaryLocale($row, $sitePrimaryLocale);
I think you can do this all in SQL for a significant performance boost. Do a `LEFT JOIN` to the appropriate table for each `assoc_type`, then join on `journals` (contexts) using `COALESCE()` around the candidate IDs; that'll give you a single `journals` join regardless of what kind of `assoc_type` was present.
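A sketch of that suggestion — the table names, column names, and `assoc_type` constants here are assumptions about the PKP schema, not code from this PR:

```sql
-- Hypothetical: resolve each log entry's context in a single pass.
-- 1048585 = submission (from the snippet below); 515 = submission file (assumed).
SELECT e.log_id, j.primary_locale
FROM event_log AS e
LEFT JOIN submissions AS s
  ON e.assoc_type = 1048585 AND e.assoc_id = s.submission_id
LEFT JOIN submission_files AS sf
  ON e.assoc_type = 515 AND e.assoc_id = sf.submission_file_id
LEFT JOIN submissions AS sfs
  ON sfs.submission_id = sf.submission_id
JOIN journals AS j
  ON j.journal_id = COALESCE(s.context_id, sfs.context_id);
```

The `COALESCE()` picks whichever candidate context ID is non-NULL, so only one `journals` join is needed no matter which `assoc_type` matched.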
Thinking further about performance: would we be able to turn this around to optimize it? Currently it's...
- for each file upload, edit, or revise event,
- get its context ID's primary locale, and
- update the setting's locale column to match.
If I understand it right, it would probably be a lot faster to go the other way around:
- For each context,
- Get its primary locale, and
- Update any file upload, edit, or revise event log entry settings to match.
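The inverted approach above could be expressed as one multi-table `UPDATE` per event type (MySQL syntax; the `locale` column and constants are assumptions based on this discussion):

```sql
-- Hypothetical: one set-based UPDATE per assoc_type
-- instead of one lookup per log row.
UPDATE event_log_settings AS es
JOIN event_log AS e ON e.log_id = es.log_id
JOIN submissions AS s ON s.submission_id = e.assoc_id
JOIN journals AS j ON j.journal_id = s.context_id
SET es.locale = j.primary_locale
WHERE e.assoc_type = 1048585
  AND es.setting_name = 'filename';
```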
Thanks, that was also on my mind as a belated idea, but I wasn't sure the performance boost would be worth re-implementing it.
Just for reference, there are 3M entries in SciELO's event log table; 567k are in (SUBMISSION_LOG_FILE_UPLOAD, SUBMISSION_LOG_FILE_EDIT, SUBMISSION_LOG_FILE_REVISION_UPLOAD). So the cost of going through these individually would be very high.
I'm getting the context primary locale from only a submission or submission file. The event log doesn't have a `context_id` column, thus I need to iterate over all those entries. I meant to move the statement that selects the context locale out of the loop.
I've optimised the migration where possible. The major change is setting the chunk size to 10000, which means the update statement will run once per 10k entries. On my laptop, 500k records are updated within 15 minutes. If that's not enough, I can try to create a wider join statement, a series of something like:
SELECT * FROM event_log AS e
JOIN event_log_settings AS es ON e.log_id = es.log_id
JOIN submissions AS s ON e.assoc_id = s.submission_id
WHERE e.assoc_type = 1048585
AND es.setting_name = 'filename'
AND s.context_id = '?'
Then the log entries would be updated all together, knowing the associated context and its primary locale. It might be faster but less prone to corrupted data.
That sounds good, thanks -- please go ahead, and I'll do a sanity check with a large dataset once all upgrade-related merges are complete in order to make sure we don't regress on performance.
Force-pushed 87f2613 to f18aae2
Force-pushed f18aae2 to 25123ae