If archive_invalidations is in inconsistent state, fix as getting next archive to process. #16886
Conversation
core/CronArchive/QueueConsumer.php (Outdated)

```php
    $bind[] = $archiveToProcess['report'];
}

$sql = "SELECT DISTINCT period FROM `$table` WHERE idsite = ? AND name = ? AND period > ? AND ? >= date1 AND date2 >= ? $reportClause";
```
btw is the distinct actually needed?
Could do an array_unique as well; I was trying to reduce the amount of data selected, since there could potentially be a lot of invalidations for a site.
All good, should be fine to keep DISTINCT for now.
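The trade-off discussed above (letting the database deduplicate via SELECT DISTINCT versus pulling all rows and collapsing them with array_unique) can be sketched outside PHP. This is a hypothetical Python illustration, not Matomo code:

```python
# Hypothetical illustration: client-side dedup of invalidation periods,
# analogous to PHP's array_unique. With many invalidation rows per site,
# every duplicate row crosses the wire before we collapse them here;
# SELECT DISTINCT would instead have the server send each period once.
rows = [
    {"period": 2}, {"period": 2}, {"period": 3},
    {"period": 2}, {"period": 4}, {"period": 3},
]

unique_periods = sorted({r["period"] for r in rows})
print(unique_periods)  # [2, 3, 4]
```

Either way the result is the same; DISTINCT just moves the work (and the data reduction) to the database.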
core/CronArchive/QueueConsumer.php (Outdated)

```
@@ -278,6 +280,8 @@ public function getNextArchivesToProcess()

$this->logger->debug("Processing invalidation: $invalidationDesc.");

$this->repairInvalidationsIfNeeded($invalidatedArchive);
```
@diosmosis any thoughts about doing this in CronArchive::launchArchivingFor(), after it finishes successfully? Technically we don't need to schedule the higher period if the archive wasn't successful, and it could also prevent a race condition: if the day archive takes many hours, by the time it finishes the higher period may no longer exist, if I see this right?
> any thought about doing this in CronArchive::launchArchivingFor() after it finished successfully?

I figured it would be better to do it here, since when we invalidate through ArchiveInvalidator, we always add all higher periods. So at the start, we always have the whole list of invalidations.

> Technically we don't need to schedule the higher period if it's not successful and also it could prevent a race condition where the day archive takes many hours and by the time it's finished the higher period does not exist anymore if I see this right?

The only case where the higher period would no longer exist is if another archiver was processing the same site and picked it up, but that could happen without this repairing logic too... In that case we'd have to do the intersecting-period logic with a query on archive_invalidations.
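The repair idea being debated here is: given a day invalidation, make sure invalidations for the higher periods containing it (week, month, year) also exist. A hypothetical sketch of computing those containing date ranges (names and helper are illustrative, not Matomo's API; the week is Monday through Sunday):

```python
from datetime import date, timedelta
import calendar

def containing_periods(day):
    """Hypothetical sketch: for a given day, compute the date ranges of
    the higher periods (week, month, year) that contain it, so missing
    invalidation rows for them could be detected and re-inserted."""
    week_start = day - timedelta(days=day.weekday())  # back to Monday
    week_end = week_start + timedelta(days=6)
    last_day = calendar.monthrange(day.year, day.month)[1]
    return {
        "week": (week_start, week_end),
        "month": (date(day.year, day.month, 1),
                  date(day.year, day.month, last_day)),
        "year": (date(day.year, 1, 1), date(day.year, 12, 31)),
    }

periods = containing_periods(date(2020, 12, 9))  # a Wednesday
print(periods["week"])  # (datetime.date(2020, 12, 7), datetime.date(2020, 12, 13))
```

Comparing these ranges against the rows already in archive_invalidations would reveal which higher-period invalidations went missing.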
> So at the start, we always have the whole list of invalidations.

You mean then we can batch insert it? I reckon ideally we'd do it after finishing the archive, just to make sure there can't be a race condition or any other issues. It'd be fine if it's just one insert at a time, I'd say (we could always tweak it when needed).
@tsteur no, I mean: in the invalidations table we normally have all the higher periods before we start archiving, so doing the repair logic here mimics that behavior.

We can still do it after finishing an archive (though there will be less test coverage then).
It'd be great to do it after finishing the archive. Could we keep the method and just call it from CronArchive instead of QueueConsumer, @diosmosis? It's really just to avoid the random edge cases etc. that will happen.
core/CronArchive/QueueConsumer.php (Outdated)

```php
}

// archive is for week that is over two months, we don't need to care about the month
if ($label == 'month'
```
Not sure why we don't need to care about this case? There might still be issues, e.g. when someone imports data in the past?
It's because the week spans two months, so it won't actually be used in archiving either of the parent months.

There could be the case where a day was archived, then for some reason the month/year invalidations disappear and only the week is left. Then I guess we would need new month/year re-archives, but wouldn't be able to trigger them.

It seemed a waste to have to re-archive both months, but I can add it if necessary.
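The edge case above hinges on whether a week falls entirely inside one month. A hypothetical sketch of that check (illustrative only, not the Matomo implementation):

```python
from datetime import date, timedelta

def week_spans_two_months(week_start):
    """Hypothetical sketch: a week whose first and last day fall in
    different months is not contained in either month, so (per the
    discussion above) it is not used when archiving either parent month."""
    week_end = week_start + timedelta(days=6)
    return week_start.month != week_end.month

print(week_spans_two_months(date(2020, 11, 30)))  # True: Nov 30 - Dec 6
print(week_spans_two_months(date(2020, 12, 7)))   # False: fully in December
```

Only weeks where this returns False would need a parent-month re-archive scheduled.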
Got it now 👍 We only need to archive the correct month.

@tsteur updated, there are still tests for it, but some of the effects seen in the queue consumer tests are no longer there.
core/CronArchive.php (Outdated)

```php
$higherPeriods = Db::fetchAll($sql, $bind);
$higherPeriods = array_column($higherPeriods, 'period');
$higherPeriods = array_flip($higherPeriods);
```
Not too important, but could have done an in_array($id, $higherPeriods) below instead.
Sure, I thought this was a bit faster, but for an array with 3 items at most, probably not.
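The micro-optimisation being discussed (PHP's array_flip + isset gives hash lookups, while in_array scans the array) has a direct analogue in other languages. A hypothetical Python illustration:

```python
# Hypothetical illustration: flipping a list into a dict (like PHP's
# array_flip) turns membership checks into O(1) hash lookups, whereas
# searching the list itself (like in_array) is an O(n) linear scan.
# For at most three period IDs the difference is negligible.
higher_periods = [2, 3, 4]

flipped = {p: i for i, p in enumerate(higher_periods)}  # value -> index

print(2 in flipped)          # True  (hash lookup)
print(5 in higher_periods)   # False (linear scan)
```

As the reply notes, with so few elements either approach is fine; the flip only pays off for large arrays checked many times.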
```php
'date1' => $period->getDateStart()->getDatetime(),
'date2' => $period->getDateEnd()->getDatetime(),
'period' => $id,
'ts_invalidated' => $archiveToProcess['ts_invalidated'],
```
Should this maybe have the current date, just to be clearer about when it was inserted? Or is it so it's included to be archived in this archive run (as we maybe do a `ts_invalidated < idSiteStartTime` check somewhere)?
Yes, it's done that way so we process it in this batch (getNextInvalidatedArchive does the `ts_invalidated < idSiteStartTime` check).
core/CronArchive.php (Outdated)

```php
    $bind[] = $archiveToProcess['report'];
}

$sql = "SELECT DISTINCT period FROM `$table` WHERE idsite = ? AND name = ? AND period > ? AND ? >= date1 AND date2 >= ? $reportClause";
```
Wondering: do we also need to include `and status = 0` here?
👍
@diosmosis worked for me. Only need to maybe have a look at https://github.com/matomo-org/matomo/pull/16886/files#r537995820, as the query may need to check status. Otherwise good to merge once that other comment is addressed.
Description:
In case the archive_invalidations table ends up in an inconsistent state, make sure we process invalidations higher than what we find.
cc @tsteur