If archive_invalidations is in inconsistent state, fix as getting next archive to process. #16886

Merged · 7 commits · Dec 8, 2020
Changes from 2 commits
5 changes: 0 additions & 5 deletions core/CronArchive.php
@@ -764,11 +764,6 @@ private function getApiToInvalidateArchivedReport()

public function invalidateArchivedReportsForSitesThatNeedToBeArchivedAgain($idSiteToInvalidate)
{
if ($this->model->isInvalidationsScheduledForSite($idSiteToInvalidate)) {
$this->logger->debug("Invalidations currently exist for idSite $idSiteToInvalidate, skipping invalidating for now...");
return;
}

if (empty($this->segmentArchiving)) {
// might not be initialised if init is not called
$this->segmentArchiving = new SegmentArchiving($this->processNewSegmentsFrom, $this->dateLastForced);
64 changes: 64 additions & 0 deletions core/CronArchive/QueueConsumer.php
@@ -14,10 +14,12 @@
use Piwik\ArchiveProcessor\Parameters;
use Piwik\ArchiveProcessor\Rules;
use Piwik\CliMulti\RequestParser;
use Piwik\Common;
use Piwik\CronArchive;
use Piwik\DataAccess\ArchiveSelector;
use Piwik\DataAccess\Model;
use Piwik\Date;
use Piwik\Db;
use Piwik\Exception\UnexpectedWebsiteFoundException;
use Piwik\Period;
use Piwik\Period\Factory as PeriodFactory;
@@ -278,6 +280,8 @@ public function getNextArchivesToProcess()

$this->logger->debug("Processing invalidation: $invalidationDesc.");

$this->repairInvalidationsIfNeeded($invalidatedArchive);
Member:

@diosmosis any thoughts about doing this in CronArchive::launchArchivingFor() after it finishes successfully? Technically we don't need to schedule the higher period if the archive isn't successful, and it could also prevent a race condition: if the day archive takes many hours, the higher period may not exist anymore by the time it's finished, if I see this right?

Member Author:

> any thought about doing this in CronArchive::launchArchivingFor() after it finished successfully?

I figured it would be better to do it here, since when we invalidate through ArchiveInvalidator, we always add all higher periods. So at the start, we always have the whole list of invalidations.

> Technically we don't need to schedule the higher period if it's not successful and also it could prevent a race condition where the day archive takes many hours and by the time it's finished the higher period does not exist anymore if I see this right?

The only case where the higher period would not exist anymore is if another archiver was processing the same site and picked it up, but that could happen without this repairing logic too... In that case we'd have to do the intersecting-period logic with a query on archive_invalidations.

Member:

> So at the start, we always have the whole list of invalidations.

You mean then we can batch insert it? I reckon ideally we'd do it after finishing the archive, just to make sure there can't be a race condition or any other issues. It would be fine if it's just one insert at a time (we could always tweak it when needed).

Member Author:

@tsteur no, I mean: in the invalidations table, we normally have all the higher periods present before we start archiving, so doing the repair logic here mimics that behavior.

We can still do it after finishing an archive (though there will be less test coverage that way).

Member:

It would be great to do it after finishing the archive. We could keep the method and just call it from CronArchive instead of QueueConsumer, @diosmosis? It's really just to avoid the random edge cases etc. that will inevitably happen.
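The repair pass under discussion can be sketched roughly as follows. This is an illustrative Python sketch, not the PR's PHP; the function and parameter names are made up, and the period IDs follow Matomo's Piwik::$idPeriods mapping (day = 1, week = 2, month = 3, year = 4, range = 5):

```python
# Sketch: given the invalidation being processed and the set of period IDs
# that already have matching rows in archive_invalidations, decide which
# higher-period rows need to be re-inserted (repaired).
PERIODS = {"day": 1, "week": 2, "month": 3, "year": 4, "range": 5}

def missing_higher_periods(current_period, existing_periods, spans_two_months=False):
    """Return the period IDs that need a repaired invalidation row."""
    missing = []
    for label, pid in PERIODS.items():
        if pid <= current_period or label == "range":
            continue  # same/lower period, or range: nothing to repair
        if pid in existing_periods:
            continue  # a matching invalidation row already exists
        if label == "month" and spans_two_months:
            continue  # a week straddling two months archives neither month
        missing.append(pid)
    return missing

# A day invalidation with no higher rows present needs week, month and year:
print(missing_higher_periods(1, set()))  # [2, 3, 4]
```

Whether this runs when the invalidation is picked up (as in this revision) or after the archive finishes (as suggested above) only changes the call site, not this logic.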


$archivesToProcess[] = $invalidatedArchive;
}

Expand Down Expand Up @@ -311,6 +315,66 @@ public function getNextArchivesToProcess()
return $archivesToProcess;
}

// public for tests
public function repairInvalidationsIfNeeded($archiveToProcess)
{
$table = Common::prefixTable('archive_invalidations');

$bind = [
$archiveToProcess['idsite'],
$archiveToProcess['name'],
$archiveToProcess['period'],
$archiveToProcess['date1'],
$archiveToProcess['date2'],
];

$reportClause = '';
if (!empty($archiveToProcess['report'])) {
$reportClause = " AND report = ?";
$bind[] = $archiveToProcess['report'];
}

$sql = "SELECT DISTINCT period FROM `$table` WHERE idsite = ? AND name = ? AND period > ? AND ? >= date1 AND date2 >= ? $reportClause";
Member:

By the way, is the DISTINCT actually needed?

Member Author:

We could do an array_unique as well; I was trying to reduce the amount of data selected, since there could potentially be a lot of invalidations for a site.

Member:

All good, should be fine to keep DISTINCT for now.
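For reference, the client-side half of this already exists in the PHP: after fetching, the result is flattened with array_column and flipped with array_flip so that later membership checks are O(1). A minimal Python sketch of that pattern, with illustrative row values:

```python
# Client-side dedup, the array_unique-style alternative to SELECT DISTINCT.
# PHP: array_column($higherPeriods, 'period') then array_flip(...);
# a Python set covers both the flatten-then-flip lookup table in one step.
rows = [{"period": 3}, {"period": 3}, {"period": 4}]  # illustrative result set

periods = [r["period"] for r in rows]   # array_column($higherPeriods, 'period')
existing = set(periods)                 # array_flip(): value -> O(1) lookup
print(sorted(existing))                 # [3, 4]
```

With DISTINCT in the query, the duplicates never leave the database; without it, this dedup would simply see a few more rows.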


$higherPeriods = Db::fetchAll($sql, $bind);
$higherPeriods = array_column($higherPeriods, 'period');
$higherPeriods = array_flip($higherPeriods);

$invalidationsToInsert = [];
foreach (Piwik::$idPeriods as $label => $id) {
// lower period than the one we're processing or range, don't care
if ($id <= $archiveToProcess['period'] || $label == 'range') {
continue;
}

if (isset($higherPeriods[$id])) { // period exists in table
continue;
}

// archive is for week that is over two months, we don't need to care about the month
if ($label == 'month'
Member:

Not sure why we don't need to care about this case? There might still be issues, e.g. when someone imports data in the past?

Member Author:

It's because the week spans two months, so it won't actually be used when archiving either of the parent months.

There could be a case where a day was archived, then for some reason the month/year invalidations disappear and only the week is left. Then we would need new month/year re-archives, but wouldn't be able to trigger them.

It seemed a waste to have to re-archive both months, but I can add it if necessary.

Member:

Got it now 👍 We only need to archive the correct month
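The month-skip condition being discussed can be illustrated with a small sketch (Python's datetime.date standing in for Matomo's Date; the helper name is made up):

```python
from datetime import date

# A week whose start and end dates fall in different months spans two
# months; such a week is not used when archiving either parent month,
# so the repair pass skips inserting a month invalidation for it.
def week_spans_two_months(date1: date, date2: date) -> bool:
    # mirrors Date::factory($date1)->toString('m') != Date::factory($date2)->toString('m')
    return date1.month != date2.month

print(week_spans_two_months(date(2020, 11, 30), date(2020, 12, 6)))   # True
print(week_spans_two_months(date(2020, 12, 7), date(2020, 12, 13)))   # False
```

A week crossing a year boundary (late Dec to early Jan) also has differing month numbers, so it is skipped the same way.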

&& Date::factory($archiveToProcess['date1'])->toString('m') != Date::factory($archiveToProcess['date2'])->toString('m')
) {
continue;
}

$period = Period\Factory::build($label, $archiveToProcess['date1']);
$invalidationsToInsert[] = [
'idarchive' => null,
'name' => $archiveToProcess['name'],
'report' => $archiveToProcess['report'],
'idsite' => $archiveToProcess['idsite'],
'date1' => $period->getDateStart()->getDatetime(),
'date2' => $period->getDateEnd()->getDatetime(),
'period' => $id,
'ts_invalidated' => $archiveToProcess['ts_invalidated'],
];
}

$fields = ['idarchive', 'name', 'report', 'idsite', 'date1', 'date2', 'period', 'ts_invalidated'];
Db\BatchInsert::tableInsertBatch(Common::prefixTable('archive_invalidations'), $fields, $invalidationsToInsert);
}

private function archiveArrayContainsArchive($archiveArray, $archive)
{
foreach ($archiveArray as $entry) {
13 changes: 1 addition & 12 deletions core/DataAccess/Model.php
@@ -713,7 +713,7 @@ public function isSimilarArchiveInProgress($invalidation)
public function getNextInvalidatedArchive($idSite, $archivingStartTime, $idInvalidationsToExclude = null, $useLimit = true)
{
$table = Common::prefixTable('archive_invalidations');
$sql = "SELECT idinvalidation, idarchive, idsite, date1, date2, period, `name`, report
$sql = "SELECT idinvalidation, idarchive, idsite, date1, date2, period, `name`, report, ts_invalidated
FROM `$table`
WHERE idsite = ? AND status != ? AND ts_invalidated <= ?";
$bind = [
@@ -876,15 +876,4 @@ public function resetFailedArchivingJobs()
$query = Db::query($sql, $bind);
return $query->rowCount();
}

public function isInvalidationsScheduledForSite($idSite)
{
$table = Common::prefixTable('archive_invalidations');

$bind = [(int) $idSite];

$sql = "SELECT idsite FROM `$table` WHERE idsite = ? LIMIT 1";
$value = Db::fetchOne($sql, $bind);
return !empty($value);
}
}