epoch_interval setting in the journal file needs to be honored in all cases #341

nars1 · 2018-08-14T18:19:49Z

Final Release Note

Description

Below is a test case that demonstrates the issue.

$ cat restart
rm *.gld *.dat *.mj*
mumps -run GDE exit >& gde.out
mupip create >& mu_cre.out
mupip set -journal="enable,on,nobefore,epoch_interval=2" -reg "*" >& setjnl.out
mumps -run x
mupip journal -extract -noverify -detail -forward -fences=none mumps.mjl >& jnlext.out
grep -E "EPOCH|SET" mumps.mjf

$ cat x.m
        set ^x=1 hang 8
        set ^x=2 hang 5
        set ^x=3 hang 1
        set ^x=4 hang 1
        set ^x=5
        quit

Below is the output of the above test

0x000100d0 [0x00b8] :: EPOCH  \64874,50932\1\1175120456\595\0\1\0\98\101\1
0x00010278 [0x0040] :: SET    \64874,50932\1\3485712716\596\0\0\0\0\0\0\^x="1"
0x000102b8 [0x00b8] :: EPOCH  \64874,50938\2\3647418316\596\0\0\0\96\101\1
0x00010370 [0x0040] :: SET    \64874,50940\2\215069095\596\0\0\0\0\0\0\^x="2"
0x000103b0 [0x0040] :: SET    \64874,50945\3\556710171\596\0\0\0\0\0\0\^x="3"
0x000103f0 [0x0040] :: SET    \64874,50946\4\900332931\596\0\0\0\0\0\0\^x="4"
0x00010430 [0x00b8] :: EPOCH  \64874,50947\5\301624417\596\0\0\0\96\101\1
0x000104e8 [0x0040] :: SET    \64874,50947\5\3236711807\596\0\0\0\0\0\0\^x="5"
0x00010528 [0x00b8] :: EPOCH  \64874,50948\6\4135141905\596\0\0\0\96\101\1

Notice that the epoch-interval is set to 2 seconds which means we expect at least one EPOCH record to be written between any updates that are separated by >= 2 seconds. But in the above case, we have the SET of ^x=2 done at 64874,50940 and the SET of ^x=4 done at 64874,50946 which are separated by 6 seconds (a lot more than the epoch-interval of 2 seconds) but still no EPOCH is written.

This is a regression introduced in GT.M V6.3-002 (most likely as part of GTM-8708).

Draft Release Note

The epoch_interval setting in a journal file is honored in all cases. That is, if two updates to the journal file are separated by N seconds where N is the epoch_interval setting, then the two updates are guaranteed to be separated by at least one EPOCH records in the journal file. Previously (starting GT.M V6.3-002), this setting was not honored in some cases (e.g. when an idle/free epoch got written) resulting in delaying the EPOCH by as much as 6 seconds.

The text was updated successfully, but these errors were encountered:

…honored in all cases Code fixes for GTM-8708 in GT.M V6.3-002 wanted to prevent the update helper writer process from unnecessarily making calls to grab_crit_immediate() in case the instance had no updates. It did this by setting the next epoch time (jbp->next_epoch_time) to a huge value (MAXUINT4) the moment a wcs_flu() call was done to write a regular epoch (JRE gvstat) where it noticed an idle epoch (JRI gvstat) had already been written (because of inactivity since the last update). This caused the jbp->next_epoch_time vs jgbl.gbl_jrec_time check in updhelper_writer() to automatically skip invoking the grab_crit_immediate() during periods of idleness. But this introduced the YottaDB#341 regression. It is now possible for a regular epoch to not be written in a timely fashion. This is because once jbp->next_epoch_time gets set to MAXUINT4 as part of an update, if the next update on the same journal file happens say N seconds later (where N could be as high as 5 seconds, the hardcoded inactivity time before an idle epoch kicks in), the call to t_end/tp_tend for the next update finds jbp->next_epoch_time set to MAXUINT4 and resets it to the current time (which is the time of the next update, not the time of the prior update) plus the epoch_interval. This means the next_epoch_time is set to a value that is incorrect by at most N seconds. That is, it is possible for two updates to be separated by as much as N + epoch_interval seconds without having an intervening EPOCH in the journal file. This violates the definition of the epoch_interval setting. The fix for this is to undo all the changes done as part of GTM-8708. So jbp->next_epoch_time is never set to MAXUINT4. Instead, updhelper_writer.c is enhanced to check if an EPOCH is the most recent record in the journal buffer (jbp->post_epoch_freeaddr == jbp->rsrv_freeaddr) and if so we skip checking jbp->next_epoch_time (and invoking grab_crit_immediate()) altogether since we know there is nothing left to be flushed in the journal file. This achieves the objective of GTM-8708. And for YottaDB#341, jbp->next_epoch_time is updated in wcs_flu() at the time when we notice an epoch is the most recent journal record in the journal file. This ensures we honor the epoch_interval setting.

… in all cases (#342) Code fixes for GTM-8708 in GT.M V6.3-002 wanted to prevent the update helper writer process from unnecessarily making calls to grab_crit_immediate() in case the instance had no updates. It did this by setting the next epoch time (jbp->next_epoch_time) to a huge value (MAXUINT4) the moment a wcs_flu() call was done to write a regular epoch (JRE gvstat) where it noticed an idle epoch (JRI gvstat) had already been written (because of inactivity since the last update). This caused the jbp->next_epoch_time vs jgbl.gbl_jrec_time check in updhelper_writer() to automatically skip invoking the grab_crit_immediate() during periods of idleness. But this introduced the #341 regression. It is now possible for a regular epoch to not be written in a timely fashion. This is because once jbp->next_epoch_time gets set to MAXUINT4 as part of an update, if the next update on the same journal file happens say N seconds later (where N could be as high as 5 seconds, the hardcoded inactivity time before an idle epoch kicks in), the call to t_end/tp_tend for the next update finds jbp->next_epoch_time set to MAXUINT4 and resets it to the current time (which is the time of the next update, not the time of the prior update) plus the epoch_interval. This means the next_epoch_time is set to a value that is incorrect by at most N seconds. That is, it is possible for two updates to be separated by as much as N + epoch_interval seconds without having an intervening EPOCH in the journal file. This violates the definition of the epoch_interval setting. The fix for this is to undo all the changes done as part of GTM-8708. So jbp->next_epoch_time is never set to MAXUINT4. Instead, updhelper_writer.c is enhanced to check if an EPOCH is the most recent record in the journal buffer (jbp->post_epoch_freeaddr == jbp->rsrv_freeaddr) and if so we skip checking jbp->next_epoch_time (and invoking grab_crit_immediate()) altogether since we know there is nothing left to be flushed in the journal file. This achieves the objective of GTM-8708. And for #341, jbp->next_epoch_time is updated in wcs_flu() at the time when we notice an epoch is the most recent journal record in the journal file. This ensures we honor the epoch_interval setting.

nars1 added the upstream issue label Aug 14, 2018

nars1 added this to the r124 milestone Aug 14, 2018

nars1 self-assigned this Aug 14, 2018

nars1 mentioned this issue Aug 14, 2018

[#341] epoch_interval setting in the journal file needs to be honored in all cases #342

Merged

nars1 added a commit to nars1/YottaDBtest that referenced this issue Aug 14, 2018

[YottaDB#341] New r124/ydb341 subtest (tests YottaDB/YDB#341)

27b0a14

nars1 added a commit to YottaDB/YDBTest that referenced this issue Aug 14, 2018

[#341] New r124/ydb341 subtest (tests YottaDB/YDB#341) (#375)

bc00302

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

epoch_interval setting in the journal file needs to be honored in all cases #341

epoch_interval setting in the journal file needs to be honored in all cases #341

nars1 commented Aug 14, 2018 •

edited

Loading

epoch_interval setting in the journal file needs to be honored in all cases #341

epoch_interval setting in the journal file needs to be honored in all cases #341

Comments

nars1 commented Aug 14, 2018 • edited Loading

Final Release Note

Description

Draft Release Note

nars1 commented Aug 14, 2018 •

edited

Loading