Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make touch logic in file_metadata.sh more portable #1648

Merged
merged 1 commit into from
Oct 29, 2024

Conversation

EliahKagan
Copy link
Member

@EliahKagan EliahKagan commented Oct 24, 2024

Following up on #1496, it turns out that, in the fixture script that from_path_no_follow runs, which uses touch to create future and past files with extreme timestamps:

  • The date-time strings, intended to take advantage of nanosecond precision where available by specifying fractions of a second, are unlikely to be helping, because:

    1. The from_path_no_follow test does not require the exactly most extreme values to be useful.
    2. Experimentation with stat implementations that show sub-second precision suggests that, for the extreme values representable in ext4--which the values used there were chosen to cover--there is no benefit over using touch with whole seconds. (I was also not able to verify the correctness and relevance of the specific fractional values, but it may be that further research would confirm them.)
    3. Perhaps most significantly, the date-time strings did not specify a time zone and TZ was not set in the script or in the code that set it, so touch seems often to have been converting from some other time zone, usually local time, to UTC.
  • The fallback date-time string used for future is is not fully portable either, because touch on some systems will reject values that are too large to parse as 32-bit time_t. I have observed this on three platforms:

    1. 32-bit x86 GNU/Linux - Ubuntu 18.04 LTS (ESM), but not on Debian 12
    2. 32-bit x86 Windows 10 - 10.0.19044 (LTSC and non-LTSC tested)
    3. 64-bit x86 illumos - OmniOS CE (32-bit touch and 64-bit GNU touch tested)

    As noted, this does not affect all 32-bit GNU/Linux systems. A 32-bit x86 Debian 12 system I tested was unaffected, allowing timestamps far in the future of 2038 to be set with touch. (This most likely relates to Debian 12 being much newer than Ubuntu 18.04 LTS, and not to the Debian vs. Ubuntu distinction. Recent versions of Ubuntu do not support installation on machines with 32-bit x86 processors, so there is no comparably recent version of Ubuntu to test for close comparison to Debian 12.)
     
    On the affected 32-bit GNU/Linux system (Ubuntu 18.04 LTS), as well as on Windows (in Git Bash), the test suite finds GNU touch and, in both cases, the errors are:

    touch: invalid date format ‘2446-05-10 22:38:55.111111111’
    touch: invalid date format ‘2446-05-11 22:38:56’
    

    On illumos, some users place /usr/gnu/bin early in the path, but I did not make that change, so the illumos touch command was used. The errors were:

    touch: bad time specification
    touch: bad time specification
    

    On all three platforms, manual experimentation confirmed that the second error message was not actually due to the format in which the date-time string was given--though the error messages make it seem that way--but instead was due to it being out of range. The wording is presumably due to this being discovered during parsing.

    That these values would have a low ceiling on a 64-bit illumos system is unintuitive. The error message may be affected by how common illumos tools, including the touch command, are 32-bit builds even on a 64-bit system. But the GNU tools, including GNU touch, which are also installed with the system, are 64-bit builds. The following brief experiment demonstrates that time_t is a different-sized type in 32-bit and 64-bit builds running on the same system, but that the 64-bit GNU touch still refuses to set large values:

    ek@Omni:~/tmp$ TZ=UTC /usr/gnu/bin/touch -d '2038-01-19 03:14:08' future
    touch: setting times of 'future': Value too large for defined data type
    ek@Omni:~/tmp$ TZ=UTC /usr/gnu/bin/touch -d '2038-01-19 03:14:07' future
    ek@Omni:~/tmp$
    
    ek@Omni:~/repos/ttsize [main|✔️]$ cat ttsize.c
    #include <stdio.h>
    #include <time.h>
    
    int main(void)
    {
        printf("%zu\n", sizeof(time_t));
    }
    ek@Omni:~/repos/ttsize [main|✔️]$ gcc -std=c99 -pedantic-errors -Wall -Wextra -o ttsize ttsize.c
    ek@Omni:~/repos/ttsize [main|✔️]$ ./ttsize
    8
    ek@Omni:~/repos/ttsize [main|✔️]$ gcc -m32 -std=c99 -pedantic-errors -Wall -Wextra -o ttsize ttsize.c
    ek@Omni:~/repos/ttsize [main|✔️]$ ./ttsize
    4
    
    ek@Omni:~/repos/ttsize [main|✔️]$ file /usr/gnu/bin/touch
    /usr/gnu/bin/touch:     ELF 64-bit LSB executable AMD64 Version 1, dynamically linked, not stripped, no debugging information available
    ek@Omni:~/repos/ttsize [main|✔️]$ file /usr/bin/touch
    /usr/bin/touch: ELF 32-bit LSB executable 80386 Version 1, dynamically linked, not stripped, no debugging information available
    

I can investigate further, including on this 64-bit OmniOS illumos system, if needed. However, I think the above is sufficient to establish that the existing fallback logic is not portable, that it can be improved while still preserving the tangible benefits of the previous approach, and that this can be done without increasing overall complexity if aspects of the previous approach that were ineffective or unnecessary are removed at the same time.

This PR:

  • Removes the old fallback logic, promoting the old fallback date-time values to be the primary ones attempted.
  • Adds TZ=UTC.
  • Uses the most precise non-fractional timestamps.
  • For future, adds a fallback for 32-bit time_t so we get both portability and, where feasible, the ability to test a very high (probably exactly maximal) ext4 timestamp.

The commit message, as well as the rewritten code comment in the fixture script, have some further information.

In addition to checking that from_path_no_follow test does fail, and in the expected way reported above, on all four affected test systems, I have also verified that the change here makes that test pass on those systems. (There are four systems rather than three because I tested on both LTSC and non-LTSC Windows 10 systems. But the difference between those systems is probably not significant in ways relevant to this behavior, so it may be better to think of them as only three test systems.)

A few more notes about how I tested this:

  • I have not manually tested this on macOS, because this particular fixture script always runs even when GIX_TEST_IGNORE_ARCHIVES is not set, and it does not run git so it will not vary based on whether git is Apple Git, so CI should be sufficient to test it.

  • I have rerun the tests, including these changes, on three unaffected systems to ensure they remain unaffected:

    1. A 64-bit x86 Arch Linux system as a control.
    2. The 32-bit Debian 12 system that was found to be unaffected as described above, to make sure this change did not somehow break that.
    3. The 64-bit Alpine Linux system where the test had failed due to #1491 until the fix in #1496, to ensure that this doesn't introduce a regression of #1496.

    The test passed on both all three systems, both when run individually and when run as part of the entire test suite.

  • The generated archive, while .gitignored, could interfere with testing. I made sure to run gix clean commands sufficient to delete it between all successive tests on the same system.

  • Output of some test runs is saved in this gist, in case for some reason it is of interest. Various other tests failed in some runs, but the changes here produced no new failures on any system. The only failures the change here remedies is from_path_no_follow.

Edit: Minor fixes for clarity in case this is referred to later, with strikethrough to show what was changed (though I think this was already read with the intended meanings when reviewed for merging).

1. Set `TZ=UTC`. This is for accuracy even more than portability;
   it is worthwhile on its own. The effect is not implied by any of
   the environment variables we set when running fixtures, and it
   has not been included in the date strings, though that could be
   done. Doing it this way, rather than by adding something to the
   date strings, makes it slightly easier to verify portability by
   examination. Without `TZ=UTC` or specifying the time zone in the
   date string, an unspecified time zone may be used; this may be
   the local time zone, but even if so, that will be incorrect
   since Unix timestamps stored in filesystem metadata are
   implicitly assumed UTC, so the extreme values we hard-code must
   be interpreted that way to have their intended effects.

2. Stop attempting to specify fractions of a second, which often
   (always?) does not work better and which requires more complex
   logic to express because a fallback is needed for some systems.
   The test currently using this fixture does not appear to need or
   significantly benefit from the most extreme possible dates, even
   if fractional values facilitate that, and the output of `stat`
   implementations that show fractional values do not seem to ever
   show different values when they are used with `touch`.

3. Having made what was the fallback logic (introduced in GitoxideLabs#1496)
   the primary approach, add a fallback touch command for `future`
   to handle how some systems with 32-bit `time_t` remain in use
   and have `touch` commands that treat too-large values as hard
   errors. The test systems where this was produced are a 32-bit
   x86 Ubuntu 18.04 LTS (Extended Security Maintenance) and a
   64-bit x86 OmniOS system (where even the 64-bit GNU `touch`
   binary fails with an error if the fallback is not used).

See the rewritten comment for details.
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the detailed description of the problem and the fix, and apologies for the late response! I somehow missed this PR entirely and found it by chance just yesterday.

It's great to see that thanks to your help gitoxide becomes more portable as well!

@Byron Byron merged commit 417bf95 into GitoxideLabs:main Oct 29, 2024
16 checks passed
@EliahKagan EliahKagan deleted the run-ci/touch-next branch October 29, 2024 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants