If archive_invalidations is in an inconsistent state, fix it when getting the next archive to process. #16886
@@ -14,10 +14,12 @@
use Piwik\ArchiveProcessor\Parameters;
use Piwik\ArchiveProcessor\Rules;
use Piwik\CliMulti\RequestParser;
use Piwik\Common;
use Piwik\CronArchive;
use Piwik\DataAccess\ArchiveSelector;
use Piwik\DataAccess\Model;
use Piwik\Date;
use Piwik\Db;
use Piwik\Exception\UnexpectedWebsiteFoundException;
use Piwik\Period;
use Piwik\Period\Factory as PeriodFactory;
@@ -278,6 +280,8 @@ public function getNextArchivesToProcess()

            $this->logger->debug("Processing invalidation: $invalidationDesc.");

            $this->repairInvalidationsIfNeeded($invalidatedArchive);

            $archivesToProcess[] = $invalidatedArchive;
        }

@@ -311,6 +315,66 @@ public function getNextArchivesToProcess()
        return $archivesToProcess;
    }

    // public for tests
    public function repairInvalidationsIfNeeded($archiveToProcess)
    {
        $table = Common::prefixTable('archive_invalidations');

        $bind = [
            $archiveToProcess['idsite'],
            $archiveToProcess['name'],
            $archiveToProcess['period'],
            $archiveToProcess['date1'],
            $archiveToProcess['date2'],
        ];

        $reportClause = '';
        if (!empty($archiveToProcess['report'])) {
            $reportClause = " AND report = ?";
            $bind[] = $archiveToProcess['report'];
        }

        $sql = "SELECT DISTINCT period FROM `$table` WHERE idsite = ? AND name = ? AND period > ? AND ? >= date1 AND date2 >= ? $reportClause";

Review thread on the SELECT query above:

btw is the distinct actually needed?

could do an array_unique as well, was trying to reduce the amount of data selected since there could potentially be a lot of invalidations for a site

all good, should be fine to keep distinct for now
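As a side note on the exchange above, here is a rough sketch of the array_unique variant that was mentioned. It is illustrative only: it assumes it sits inside repairInvalidationsIfNeeded(), where $table, $bind and $reportClause have already been built, and Db::fetchAll() is the same Piwik\Db call used in the diff.

        // Alternative to SELECT DISTINCT: fetch plain rows and de-duplicate in
        // PHP instead. This transfers more rows when a site has many queued
        // invalidations, which is why the diff above keeps DISTINCT in SQL.
        $sql = "SELECT period FROM `$table`
                WHERE idsite = ? AND name = ? AND period > ? AND ? >= date1 AND date2 >= ? $reportClause";

        $higherPeriods = Db::fetchAll($sql, $bind);
        $higherPeriods = array_unique(array_column($higherPeriods, 'period'));
        $higherPeriods = array_flip($higherPeriods); // keeps the cheap isset() lookups below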
The diff continues:

        $higherPeriods = Db::fetchAll($sql, $bind);
        $higherPeriods = array_column($higherPeriods, 'period');
        $higherPeriods = array_flip($higherPeriods);

        $invalidationsToInsert = [];
        foreach (Piwik::$idPeriods as $label => $id) {
            // lower period than the one we're processing or range, don't care
            if ($id <= $archiveToProcess['period'] || $label == 'range') {
                continue;
            }

            if (isset($higherPeriods[$id])) { // period exists in table
                continue;
            }

            // archive is for week that is over two months, we don't need to care about the month
            if ($label == 'month'
                && Date::factory($archiveToProcess['date1'])->toString('m') != Date::factory($archiveToProcess['date2'])->toString('m')
            ) {
                continue;
            }

Review thread on the week-spanning-two-months check above (see the illustration after the end of the method below):

not sure why we don't need to care about this case? There might still be issues, eg when someone imports data in the past etc?

It's because the week will be on two months, so it won't actually be used in archiving either of the parent months. There could be the case where there was a day that was archived, then for some reason the month/year disappear, and it's just the week. Then I guess we would need new month/year re-archives, but wouldn't be able to. It seemed a waste to have to re-archive both months, but I can add it if necessary.

Got it now 👍 We only need to archive the correct month.

            $period = Period\Factory::build($label, $archiveToProcess['date1']);
            $invalidationsToInsert[] = [
                'idarchive' => null,
                'name' => $archiveToProcess['name'],
                'report' => $archiveToProcess['report'],
                'idsite' => $archiveToProcess['idsite'],
                'date1' => $period->getDateStart()->getDatetime(),
                'date2' => $period->getDateEnd()->getDatetime(),
                'period' => $id,
                'ts_invalidated' => $archiveToProcess['ts_invalidated'],
            ];
        }

        $fields = ['idarchive', 'name', 'report', 'idsite', 'date1', 'date2', 'period', 'ts_invalidated'];
        Db\BatchInsert::tableInsertBatch(Common::prefixTable('archive_invalidations'), $fields, $invalidationsToInsert);
    }
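To make the week-spanning case from the review thread above concrete, here is a small standalone illustration. The dates are hypothetical and plain PHP DateTime is used instead of Piwik's Date class.

        // A week running from Monday 2021-03-29 to Sunday 2021-04-04 starts in
        // March and ends in April. Such a week is not used when aggregating
        // either parent month, so no month invalidation is re-created for it.
        $date1 = new DateTime('2021-03-29');
        $date2 = new DateTime('2021-04-04');

        $weekSpansTwoMonths = $date1->format('m') !== $date2->format('m');
        var_dump($weekSpansTwoMonths); // bool(true) -> 'month' is skipped in the loop above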

    private function archiveArrayContainsArchive($archiveArray, $archive)
    {
        foreach ($archiveArray as $entry) {
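To summarise what the new method does end to end, here is a hypothetical example. The data is illustrative only and the dates are shortened; the real rows store full datetimes produced by getDatetime().

        // Suppose only this day invalidation is left queued for a site because
        // the week/month/year rows went missing (the inconsistent state this PR targets):
        $queued = [
            'idsite' => 1, 'name' => 'done', 'report' => null,
            'period' => 1, // day
            'date1' => '2021-03-03', 'date2' => '2021-03-03',
        ];

        // repairInvalidationsIfNeeded() would batch-insert rows for the missing
        // higher periods so they can still be processed once the day archive
        // finishes (period ids: 1 = day, 2 = week, 3 = month, 4 = year):
        $reinserted = [
            ['period' => 2, 'date1' => '2021-03-01', 'date2' => '2021-03-07'], // week
            ['period' => 3, 'date1' => '2021-03-01', 'date2' => '2021-03-31'], // month
            ['period' => 4, 'date1' => '2021-01-01', 'date2' => '2021-12-31'], // year
        ];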

Review thread on the overall approach:

@diosmosis any thought about doing this in CronArchive::launchArchivingFor() after it finished successfully? Technically we don't need to schedule the higher period if it's not successful, and it could also prevent a race condition where the day archive takes many hours and by the time it's finished the higher period does not exist anymore, if I see this right?

I figured it would be better to do it here, since when we invalidate through ArchiveInvalidator, we always add all higher periods. So at the start, we always have the whole list of invalidations.
The only case where the higher period would not exist anymore is if another archiver was processing the same site and picked it up, but this could happen w/o this repairing logic... In that case we'd have to do the intersecting period logic w/ a query on archive_invalidations.

You mean then we can batch insert it? I reckon ideally we do it after finishing the archive just to make sure there can't be a race condition or any other issues. It'd be fine if it's just one insert at a time I would say (we could always tweak when needed).

@tsteur no I mean, in the invalidations table, normally we have all the higher periods before we start archiving, so doing the repair logic here mimics that behavior.
We can still do it after finishing an archive (though there will be less test coverage then).

Be great to do it after finishing the archive. Could keep the method and just call it from CronArchive instead of QueueConsumer @diosmosis? It's really just to avoid random edge cases etc which will happen.
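For reference, here is a purely illustrative sketch of the alternative discussed in this thread: calling the repair step from CronArchive once an archive has finished successfully, instead of in QueueConsumer::getNextArchivesToProcess(). The class, property and parameter names below are made up for illustration and are not Matomo's actual CronArchive code; only repairInvalidationsIfNeeded() comes from this PR.

// Hypothetical sketch, not the real CronArchive implementation.
class CronArchiveSketch
{
    /** @var object assumed to expose repairInvalidationsIfNeeded() as added in this PR */
    private $queueConsumer;

    public function __construct($queueConsumer)
    {
        $this->queueConsumer = $queueConsumer;
    }

    public function launchArchivingFor(array $invalidatedArchive, callable $runArchiving)
    {
        $success = $runArchiving($invalidatedArchive);

        if ($success) {
            // Re-create any missing higher-period invalidations only after the
            // lower period archived successfully, so a failed archive does not
            // schedule extra work and the race condition described above is avoided.
            $this->queueConsumer->repairInvalidationsIfNeeded($invalidatedArchive);
        }
    }
}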