Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

epoch_interval setting in the journal file needs to be honored in all cases #341

Open
nars1 opened this issue Aug 14, 2018 · 0 comments
Open
Assignees
Milestone

Comments

@nars1
Copy link
Collaborator

nars1 commented Aug 14, 2018

Final Release Note

Description

Below is a test case that demonstrates the issue.

$ cat restart
rm *.gld *.dat *.mj*
mumps -run GDE exit >& gde.out
mupip create >& mu_cre.out
mupip set -journal="enable,on,nobefore,epoch_interval=2" -reg "*" >& setjnl.out
mumps -run x
mupip journal -extract -noverify -detail -forward -fences=none mumps.mjl >& jnlext.out
grep -E "EPOCH|SET" mumps.mjf

$ cat x.m
        set ^x=1 hang 8
        set ^x=2 hang 5
        set ^x=3 hang 1
        set ^x=4 hang 1
        set ^x=5
        quit

Below is the output of the above test

0x000100d0 [0x00b8] :: EPOCH  \64874,50932\1\1175120456\595\0\1\0\98\101\1
0x00010278 [0x0040] :: SET    \64874,50932\1\3485712716\596\0\0\0\0\0\0\^x="1"
0x000102b8 [0x00b8] :: EPOCH  \64874,50938\2\3647418316\596\0\0\0\96\101\1
0x00010370 [0x0040] :: SET    \64874,50940\2\215069095\596\0\0\0\0\0\0\^x="2"
0x000103b0 [0x0040] :: SET    \64874,50945\3\556710171\596\0\0\0\0\0\0\^x="3"
0x000103f0 [0x0040] :: SET    \64874,50946\4\900332931\596\0\0\0\0\0\0\^x="4"
0x00010430 [0x00b8] :: EPOCH  \64874,50947\5\301624417\596\0\0\0\96\101\1
0x000104e8 [0x0040] :: SET    \64874,50947\5\3236711807\596\0\0\0\0\0\0\^x="5"
0x00010528 [0x00b8] :: EPOCH  \64874,50948\6\4135141905\596\0\0\0\96\101\1

Notice that the epoch-interval is set to 2 seconds which means we expect at least one EPOCH record to be written between any updates that are separated by >= 2 seconds. But in the above case, we have the SET of ^x=2 done at 64874,50940 and the SET of ^x=4 done at 64874,50946 which are separated by 6 seconds (a lot more than the epoch-interval of 2 seconds) but still no EPOCH is written.

This is a regression introduced in GT.M V6.3-002 (most likely as part of GTM-8708).

Draft Release Note

The epoch_interval setting in a journal file is honored in all cases. That is, if two updates to the journal file are separated by N seconds where N is the epoch_interval setting, then the two updates are guaranteed to be separated by at least one EPOCH records in the journal file. Previously (starting GT.M V6.3-002), this setting was not honored in some cases (e.g. when an idle/free epoch got written) resulting in delaying the EPOCH by as much as 6 seconds.

@nars1 nars1 added this to the r124 milestone Aug 14, 2018
@nars1 nars1 self-assigned this Aug 14, 2018
nars1 added a commit to nars1/YottaDB that referenced this issue Aug 14, 2018
…honored in all cases

Code fixes for GTM-8708 in GT.M V6.3-002 wanted to prevent the update helper writer process from
unnecessarily making calls to grab_crit_immediate() in case the instance had no updates.
It did this by setting the next epoch time (jbp->next_epoch_time) to a huge value (MAXUINT4)
the moment a wcs_flu() call was done to write a regular epoch (JRE gvstat) where it noticed an
idle epoch (JRI gvstat) had already been written (because of inactivity since the last update).
This caused the jbp->next_epoch_time vs jgbl.gbl_jrec_time check in updhelper_writer() to
automatically skip invoking the grab_crit_immediate() during periods of idleness.

But this introduced the YottaDB#341 regression. It is now possible for a regular epoch to not be written
in a timely fashion. This is because once jbp->next_epoch_time gets set to MAXUINT4 as part of
an update, if the next update on the same journal file happens say N seconds later (where N could
be as high as 5 seconds, the hardcoded inactivity time before an idle epoch kicks in), the call
to t_end/tp_tend for the next update finds jbp->next_epoch_time set to MAXUINT4 and resets it to
the current time (which is the time of the next update, not the time of the prior update) plus
the epoch_interval. This means the next_epoch_time is set to a value that is incorrect by at most
N seconds. That is, it is possible for two updates to be separated by as much as N + epoch_interval
seconds without having an intervening EPOCH in the journal file. This violates the definition of
the epoch_interval setting.

The fix for this is to undo all the changes done as part of GTM-8708. So jbp->next_epoch_time is
never set to MAXUINT4. Instead, updhelper_writer.c is enhanced to check if an EPOCH is the most
recent record in the journal buffer (jbp->post_epoch_freeaddr == jbp->rsrv_freeaddr) and if so
we skip checking jbp->next_epoch_time (and invoking grab_crit_immediate()) altogether since we know
there is nothing left to be flushed in the journal file. This achieves the objective of GTM-8708.
And for YottaDB#341, jbp->next_epoch_time is updated in wcs_flu() at the time when we notice an epoch is
the most recent journal record in the journal file. This ensures we honor the epoch_interval setting.
nars1 added a commit to nars1/YottaDBtest that referenced this issue Aug 14, 2018
nars1 added a commit that referenced this issue Aug 14, 2018
… in all cases (#342)

Code fixes for GTM-8708 in GT.M V6.3-002 wanted to prevent the update helper writer process from
unnecessarily making calls to grab_crit_immediate() in case the instance had no updates.
It did this by setting the next epoch time (jbp->next_epoch_time) to a huge value (MAXUINT4)
the moment a wcs_flu() call was done to write a regular epoch (JRE gvstat) where it noticed an
idle epoch (JRI gvstat) had already been written (because of inactivity since the last update).
This caused the jbp->next_epoch_time vs jgbl.gbl_jrec_time check in updhelper_writer() to
automatically skip invoking the grab_crit_immediate() during periods of idleness.

But this introduced the #341 regression. It is now possible for a regular epoch to not be written
in a timely fashion. This is because once jbp->next_epoch_time gets set to MAXUINT4 as part of
an update, if the next update on the same journal file happens say N seconds later (where N could
be as high as 5 seconds, the hardcoded inactivity time before an idle epoch kicks in), the call
to t_end/tp_tend for the next update finds jbp->next_epoch_time set to MAXUINT4 and resets it to
the current time (which is the time of the next update, not the time of the prior update) plus
the epoch_interval. This means the next_epoch_time is set to a value that is incorrect by at most
N seconds. That is, it is possible for two updates to be separated by as much as N + epoch_interval
seconds without having an intervening EPOCH in the journal file. This violates the definition of
the epoch_interval setting.

The fix for this is to undo all the changes done as part of GTM-8708. So jbp->next_epoch_time is
never set to MAXUINT4. Instead, updhelper_writer.c is enhanced to check if an EPOCH is the most
recent record in the journal buffer (jbp->post_epoch_freeaddr == jbp->rsrv_freeaddr) and if so
we skip checking jbp->next_epoch_time (and invoking grab_crit_immediate()) altogether since we know
there is nothing left to be flushed in the journal file. This achieves the objective of GTM-8708.
And for #341, jbp->next_epoch_time is updated in wcs_flu() at the time when we notice an epoch is
the most recent journal record in the journal file. This ensures we honor the epoch_interval setting.
nars1 added a commit to YottaDB/YDBTest that referenced this issue Aug 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant