-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move Archive.php archive invalidation to Loader… #15616
Conversation
… when about to launch archiving.
Seems a test is failing now. Otherwise guess looks good to merge |
Here's some thoughts maybe that would likely improve our problem but not really 100% sure if it makes sense and would work @diosmosis diff --git a/core/ArchiveProcessor/Loader.php b/core/ArchiveProcessor/Loader.php
index c99dd3d02e..d6f896fee6 100644
--- a/core/ArchiveProcessor/Loader.php
+++ b/core/ArchiveProcessor/Loader.php
@@ -79,15 +79,30 @@ class Loader
{
$this->params->setRequestedPlugin($pluginName);
+ // no valid archive exists, make sure to invalidate existing archives
+ if (($isBrowserArchiveEnabled || $isArchiveTriggered) && $this->invalidateBeforeArchiving) {
+ $this->invalidatedReportsIfNeeded(); // SHOULD INVALIDATE ALL REPORTS
+ }
+
list($idArchive, $visits, $visitsConverted) = $this->loadExistingArchiveIdFromDb();
if (!empty($idArchive)) {
+ // IF REPORT INCLUDES TODAY....
+ // IF ARCHIVE WAS GENERATED WITHIN LAST time_before_today_archive_considered_outdated SECONDS DO NOT RUN
+ // INVALIDATE LOGIC SINCE REPORT WOULD NOT BE REGENERATED ANYWAY
+
+ // IF ARCHIVE WAS GENERATED MORE THAN time_before_today_archive_considered_outdated SECONDS AGO,
+ // RUN INVALIDATION LOGIC AND CALL loadExistingArchiveIdFromDb() again in case it is no longer valid
+
+ // IF REPORT NOT INCLUDES TODAY
+ // IDEALLY WE INVALIDATE REPORTS ONLY THAT AFFECT THAT PERIOD TO NOT SLOW DOWN EG RANGE REQUESTS
+
return $idArchive;
}
- // no valid archive exists, make sure to invalidate existing archives
- if ($this->invalidateBeforeArchiving) {
- $this->invalidatedReportsIfNeeded();
- }
+ // NO NEED TO INVALIDATE IF THE ARCHIVE DOES NOT EXIST SINCE IT WOULDN'T MAKE A DIFFERENCE TO THE RESULT. MEANING
+ // IF WE DID INVALIDATE REPORTS, THEN THE OUTPUT BE STILL THE SAME. EG THIS WAY WE AVOID INVALIDATING REPORTS WHEN
+ // RANGE IS SELECTED AND THE REPORT DOES NOT EXIST YET. AS A RESULT OF THIS WE WOULD ALSO NOT INVALIDATE AND
+ // REARCHIVE ANY SUBPERIODS
/** @var ArchivingStatus $archivingStatus */
$archivingStatus = StaticContainer::get(ArchivingStatus::class); Generally speaking eg on Cloud:
While many other users have browser archiving disabled, segment archiving is still enabled etc. For these users it be great to invalidate all reports only when When browser archiving is enabled, it'll be a bit of a problem since we often have to invalidate reports in order to get accurate results unless some report was generated recently (say within 900s if default config is used). This might be still fine though. It'll just make Matomo even slower but maybe not possible to avoid it until we are in Matomo 4. |
@tsteur I tried making changes to handle every possible case for when archiving, can you take a look? I haven't modified tests yet, you can probably just look at the latest commit. |
core/DataAccess/ArchiveSelector.php
Outdated
if (!isset($result['idarchive']) | ||
|| !isset($result['nb_visits']) | ||
) { | ||
return [false, false, false, true]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw here it returned for me isAnyArchiveExists = true
even though there was no archive with a done flag. Only an archive with a nb_visits
. This can happen if eg the archiving fails before the done flag was written but after nb_visits was written. Not sure it's important.
0117b39
to
b6de30f
Compare
5deb1ac
to
052d824
Compare
052d824
to
2dd4958
Compare
@tsteur noticed an issue w/ this change, if we invalidate only if there is something to invalidate, we may end up archiving more than once in some situations. For example:
We can't remove the date/site in the list in this case since it could also invalidate other archives that are not currently being requested. The only way to do this change would be to run the invalidate query as a SELECT first then invalidate if it returns something, but that would of course be pointless. NOTE: this only affects browser archiving, btw. Also with the 'today' change it may end up being a bit of an edge case... only other issue is that some archives may not be invalidated if they are never requested. Which would only be an issue if the options in the list are removed. We can have a daily task in that case to just run the invalidation I guess. |
…e check in CronArchive.
Daily task for this should work. 👍 |
Or maybe at the beginning of the archiving we still need to invalidate all reports like we used to for cron archiving. For browser archiving handling would need a task indeed. |
We still do this, I haven't changed that (I disabled the Archive.php code for invalidation if browser archiving is enabled, so it's all done by core:archive, which I think is better IMO). |
@tsteur tests should pass now, ready for a review. I may add some more tests in the meantime. |
plugins/CoreAdminHome/Tasks.php
Outdated
@@ -81,6 +88,17 @@ public function schedule() | |||
$this->scheduleTrackingCodeReminderChecks(); | |||
} | |||
|
|||
public function invalidateOutdatedArchives() | |||
{ | |||
if (false && !Rules::isBrowserTriggerEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the false is probably debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is still there @diosmosis ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, must've missed it? not sure how
core/ArchiveProcessor/Loader.php
Outdated
@@ -242,4 +271,33 @@ private function getIdSitesToArchiveWhenNoVisits() | |||
|
|||
return $cache->fetch($cacheKey); | |||
} | |||
|
|||
private function invalidatedReportsIfNeeded() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@diosmosis might be good to add some tests for this maybe? Like it could return the invalidated date/siteIds which we then can test against?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added tests
$isLast = preg_match('/^last([0-9]+)/', $date, $matches); | ||
|
||
// don't do this check for a single period that includes today | ||
if ($isTodayIncluded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@diosmosis not 100% understanding this part. today
would be mostly included right? I understand that we can assume basically there is no valid archive for today because we archive each time anyway. I'm just wondering if it would still need to process below code in case
- Cron Archive starts
- The default date is
last2
- We invalidate all reports
- We are about to archive a site
- This logic runs
- Now if previously some reports from 10 days ago are invalidated, we would need to change it to
last10
basically
Not sure it's clear what I mean? Or maybe this is not needed anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here was if it was a single period like, period = week, day = today
, then we don't check for an existing usable archive. If it is for a date in the past, then we check if there is a usable archive and if so we don't need to archive.
For lastN
values, however, it is more complicated. Here, we want to check for every date except today. So we check for valid archives. If the date is last5
and for the 5th, 4th and 3rd day we have valid archives, then we change it to last2. The only difference is if we have valid archives for every day, we still do last1 (I think the min is last1 now but we can change it back to last2).
Maybe I can add some tests for this code too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added tests, maybe you can better see what it's trying to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fully getting it now 👍
core/CronArchive.php
Outdated
$newDate = 'last' . min($lastNValue, $newRangePeriod->getNumberOfSubperiods()); | ||
} else if ($isTodayIncluded) { | ||
$isThereArchiveForAllPeriods = false; | ||
$newDate = 'last1'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw not sure but might still need to use last2 here to workaround some archiving problems where it would maybe create a range date for today and some plugins may have trouble with it. I suppose maybe shouldn't hurt to use last2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this too, but I wasn't sure... I'll change it just in case.
} else if ($isTodayIncluded) { | ||
$isThereArchiveForAllPeriods = false; | ||
$newDate = 'last1'; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw... I was thinking... if it is last1
(aka today
, this week
, this month
), then we could technically get the date when this archive was last generated, and if it was generated less than time_before_(today|$period)_archive_considered_outdated
seconds ago, return $isThereArchiveForAllPeriods = true
and skip the archive this time.
Only if it's easy to do though. Otherwise could also just launch the archiving. On cloud where we consider archives outdated sometimes only evert 6, 12, 18 hours (depending on config), then it could save us launching quite a few archive launches.
Not sure it's clear what I mean? It be probably similar to what we have in browser archiving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is already done, regardless of whether browser archiving is used or not. In Loader.php we always use the result of getMinTimeArchiveProcessed()
. So it should always look for an archive that is out of date, correct? I could maybe add a debug log for the future...
👍 left another comment @diosmosis few tests are still failing but I suppose you still work in it. eg https://travis-ci.org/matomo-org/matomo/jobs/659112026?utm_medium=notification&utm_source=github_status |
@tsteur fixed some tests, there are some unrelated failures though |
@@ -85,6 +85,10 @@ private function getStackTrace($exception) | |||
|
|||
public static function getWholeBacktrace(\Exception $exception, $shouldPrintBacktrace = true) | |||
{ | |||
if (!$shouldPrintBacktrace) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@diosmosis should this one still be needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, otherwise the backtrace is always printed. This was an unrelated bug I found.
* Move Archive.php archive invalidation to Loader so we only invalidate when about to launch archiving. * Attempt to handle more cases when invalidating before launching archiving. * fix possible sql error * fix possible error * fixing some tests * remove test code * Only invalidate specific archive being requested. * Do not invalidate on today in tracker and avoid existing valid archive check in CronArchive. * more test fixes * Attempt to fix more tests. * Fixing last tests. * another test fix * Invalidate in scheduled task if browser triggered archiving is enabled. * deal with TODO * Get ArchiveSelectorTest to pass. * applying review feedback including new tests * apply review feedback & fix tests * fix couple more tests Co-authored-by: Thomas Steur <[email protected]>
* Move Archive.php archive invalidation to Loader so we only invalidate when about to launch archiving. * Attempt to handle more cases when invalidating before launching archiving. * fix possible sql error * fix possible error * fixing some tests * remove test code * Only invalidate specific archive being requested. * Do not invalidate on today in tracker and avoid existing valid archive check in CronArchive. * more test fixes * Attempt to fix more tests. * Fixing last tests. * another test fix * Invalidate in scheduled task if browser triggered archiving is enabled. * deal with TODO * Get ArchiveSelectorTest to pass. * applying review feedback including new tests * apply review feedback & fix tests * fix couple more tests Co-authored-by: Thomas Steur <[email protected]>
… so we only invalidate when about to launch archiving.
Not sure if there are tests I can add for this just yet.