Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new --no-use-system-time flag to use a deterministic timestamp in built PEX #722

Merged
merged 32 commits into from
May 8, 2019
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
8b2fd37
Add deterministic_datetime() function
Eric-Arellano May 2, 2019
b440fbf
Proof of concept #1: monkey patch approach that works on 3.7
Eric-Arellano May 2, 2019
4b53a9e
Add new CLI flag --use-system-time
Eric-Arellano May 2, 2019
226c3bc
Approach #2: use writestr()
Eric-Arellano May 2, 2019
baa20c4
Add test for deterministic timestamp
Eric-Arellano May 3, 2019
a27122c
Unksip 6/7 of the acceptance tests!
Eric-Arellano May 3, 2019
e9a067a
Explain reproducibility
Eric-Arellano May 3, 2019
0295ddc
Setup integration tests to use --no-use-system-time
Eric-Arellano May 3, 2019
80adeff
Fix regression to normal case
Eric-Arellano May 3, 2019
5fe4a59
Set default to true to ensure we arent breaking anything
Eric-Arellano May 3, 2019
829d12d
Test deterministic timestamp against all CI
Eric-Arellano May 3, 2019
fdc1dc0
Skip acceptance tests still failing
Eric-Arellano May 3, 2019
80b5008
Fix typo in link
Eric-Arellano May 3, 2019
794084e
Fix bug with permissions not copying
Eric-Arellano May 3, 2019
952cf9e
Revert "Test deterministic timestamp against all CI"
Eric-Arellano May 3, 2019
a7b8558
Ensure env gets cleaned up, even if test fails
Eric-Arellano May 3, 2019
fda9a16
Merge branch 'master' of github.com:pantsbuild/pex into timestamps
Eric-Arellano May 5, 2019
f7364de
Convert deterministic_datetime from a function to a constant
Eric-Arellano May 5, 2019
0835f60
Revert making deterministic_datetime a constant
Eric-Arellano May 5, 2019
0c9bd41
No longer respect SOURCE_DATE_EPOCH
Eric-Arellano May 6, 2019
9774c2a
Improve comments
Eric-Arellano May 6, 2019
477e2eb
Change rationale for not respecting SOURCE_DATE_EPOCH
Eric-Arellano May 6, 2019
631b3ed
Drop overly verbose comment about SOURCE_DATE_EPOCH
Eric-Arellano May 7, 2019
69106c0
Create new zip_info_from_file() function as part of PermPreservingZip…
Eric-Arellano May 7, 2019
d82cd99
Always use zf.writestr()
Eric-Arellano May 7, 2019
b9d36e8
Remove bad extra whitespace
Eric-Arellano May 7, 2019
ebe15c2
Store DETERMINISTIC_DATETIME as a 6-tuple rather than datetime
Eric-Arellano May 7, 2019
099f146
Invert ternary expression
Eric-Arellano May 7, 2019
23c5a6f
Oops, typo in constant
Eric-Arellano May 7, 2019
8bb8ed4
Deduplicate slicing the time.struct_time
Eric-Arellano May 8, 2019
a528eb4
Don't lie about expected input
Eric-Arellano May 8, 2019
adf2726
Fix typo that was causing failures
Eric-Arellano May 8, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 17 additions & 1 deletion pex/bin/pex.py
Original file line number Diff line number Diff line change
Expand Up @@ -302,6 +302,18 @@ def configure_clp_pex_options(parser):
'likely will not be reproducible, meaning that if you were to run `./pex -o` with the '
'same inputs then the new pex would not be byte-for-byte identical to the original.')

group.add_option(
'--use-system-time', '--no-use-system-time',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open to suggestions for a better name.

The main use case I want to optimize for is when deterministic time will be the default. For example, atm --use-deterministic-time would probably be a good name, but once we make the switch then --no-use-deterministic-time wouldn't be very descriptive for what the behavior would end up being.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In pantsbuild/pants#6094 we made a flag -jar-creation-time which, if not set, defaulted to the current time. Not sure whether this is a good or bad default for pex, but it's some prior art :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@illicitonion : I think that we agreed on the other issue that we want this enabled by default, so the default would need to be stable.

@Eric-Arellano : This name looks good I think?

dest='use_system_time',
default=True,
action='callback',
callback=parse_bool,
help='Use the current system time to generate timestamps for the new pex. Otherwise, Pex '
'will first try to read the time from the environment variable SOURCE_DATE_EPOCH and '
'will fallback to midnight on January 1, 1980. By using system time, the generated pex '
'will not be reproducible, meaning that if you were to run `./pex -o` with the '
'same inputs then the new pex would not be byte-for-byte identical to the original.')

parser.add_option_group(group)


Expand Down Expand Up @@ -694,7 +706,11 @@ def main(args=None):
log('Saving PEX file to %s' % options.pex_name, V=options.verbosity)
tmp_name = options.pex_name + '~'
safe_delete(tmp_name)
pex_builder.build(tmp_name, bytecode_compile=options.compile)
pex_builder.build(
tmp_name,
bytecode_compile=options.compile,
deterministic_timestamp=not options.use_system_time
)
os.rename(tmp_name, options.pex_name)
else:
if not _compatible_with_current_platform(options.platforms):
Expand Down
67 changes: 64 additions & 3 deletions pex/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,32 @@
import threading
import zipfile
from collections import defaultdict
from datetime import datetime
from uuid import uuid4


def deterministic_datetime():
"""Return a deterministic UTC datetime object that does not depend on the system time.

First try to read from the standard $SOURCE_DATE_EPOCH, then default to midnight January 1,
Eric-Arellano marked this conversation as resolved.
Show resolved Hide resolved
1980, which is the start of time for MS-DOS time. Per section 4.4.6 of the zip spec at
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT, Zipfiles use MS-DOS time. So, we
use this default to ensure no issues with Zipfiles.

Even though we use MS-DOS to inform the default value, note that we still return a normal UTC
datetime object for flexibility with how the result is used.

For more information on $SOURCE_DATE_EPOCH, refer to
https://reproducible-builds.org/docs/source-date-epoch/."""
return (
Eric-Arellano marked this conversation as resolved.
Show resolved Hide resolved
datetime.utcfromtimestamp(int(os.environ['SOURCE_DATE_EPOCH']))
if "SOURCE_DATE_EPOCH" in os.environ
else datetime(
year=1980, month=1, day=1, hour=0, minute=0, second=0, microsecond=0, tzinfo=None
)
)


def die(msg, exit_code=1):
print(msg, file=sys.stderr)
sys.exit(exit_code)
Expand Down Expand Up @@ -98,7 +121,7 @@ def _chmod(self, info, path):

@contextlib.contextmanager
def open_zip(path, *args, **kwargs):
"""A contextmanager for zip files. Passes through positional and kwargs to zipfile.ZipFile."""
"""A contextmanager for zip files. Passes through positional and kwargs to zipfile.ZipFile."""
with contextlib.closing(PermPreservingZipFile(path, *args, **kwargs)) as zip:
yield zip

Expand Down Expand Up @@ -363,7 +386,45 @@ def __str__(self):
def delete(self):
shutil.rmtree(self.chroot)

def zip(self, filename, mode='w'):
def zip(self, filename, mode='w', deterministic_timestamp=False):
with open_zip(filename, mode) as zf:
for f in sorted(self.files()):
zf.write(os.path.join(self.chroot, f), arcname=f, compress_type=zipfile.ZIP_DEFLATED)
full_path = os.path.join(self.chroot, f)
if not deterministic_timestamp:
zf.write(
full_path,
arcname=f,
compress_type=zipfile.ZIP_DEFLATED
)
else:
Eric-Arellano marked this conversation as resolved.
Show resolved Hide resolved
# This code mostly reimplements ZipInfo.from_file(), which is not available in Python
# 2.7. See https://github.com/python/cpython/blob/master/Lib/zipfile.py#L495.
st = os.stat(full_path)
isdir = stat.S_ISDIR(st.st_mode)
# Determine arcname, i.e. the archive name.
arcname = f
arcname = os.path.normpath(os.path.splitdrive(arcname)[1])
while arcname[0] in (os.sep, os.altsep):
arcname = arcname[1:]
if isdir:
Eric-Arellano marked this conversation as resolved.
Show resolved Hide resolved
arcname += '/'
# Setup ZipInfo with deterministic datetime.
zinfo = zipfile.ZipInfo(
filename=arcname,
# NB: the constructor expects a six tuple of year-month-day-hour-minute-second.
date_time=deterministic_datetime().timetuple()[:6]
)
zinfo.external_attr = (st.st_mode & 0xFFFF) << 16
Eric-Arellano marked this conversation as resolved.
Show resolved Hide resolved
if isdir:
zinfo.file_size = 0
zinfo.external_attr |= 0x10 # MS-DOS directory flag
else:
zinfo.file_size = st.st_size
# Determine the data to write.
if isdir:
data = b''
else:
with open(full_path, 'rb') as open_f:
data = open_f.read()
# Write to the archive.
zf.writestr(zinfo, data, compress_type=zipfile.ZIP_DEFLATED)
5 changes: 3 additions & 2 deletions pex/pex_builder.py
Original file line number Diff line number Diff line change
Expand Up @@ -428,11 +428,12 @@ def freeze(self, bytecode_compile=True):
self._precompile_source()
self._frozen = True

def build(self, filename, bytecode_compile=True):
def build(self, filename, bytecode_compile=True, deterministic_timestamp=False):
Eric-Arellano marked this conversation as resolved.
Show resolved Hide resolved
"""Package the PEX into a zipfile.

:param filename: The filename where the PEX should be stored.
:param bytecode_compile: If True, precompile .py files into .pyc files.
:param deterministic_timestamp: If True, will not use system time for the zipfile timestamps.

If the PEXBuilder is not yet frozen, it will be frozen by ``build``. This renders the
PEXBuilder immutable.
Expand All @@ -450,7 +451,7 @@ def build(self, filename, bytecode_compile=True):
with open(filename + '~', 'ab') as pexfile:
assert os.path.getsize(pexfile.name) == 0
pexfile.write(to_bytes('%s\n' % self._shebang))
self._chroot.zip(filename + '~', mode='a')
self._chroot.zip(filename + '~', mode='a', deterministic_timestamp=deterministic_timestamp)
if os.path.exists(filename):
os.unlink(filename)
os.rename(filename + '~', filename)
Expand Down
9 changes: 4 additions & 5 deletions tests/test_integration.py
Original file line number Diff line number Diff line change
Expand Up @@ -1351,7 +1351,10 @@ def assert_reproducible_build(args):
# from the random seed, such as data structures, as Tox sets this value by default. See
# https://tox.readthedocs.io/en/latest/example/basic.html#special-handling-of-pythonhashseed.
def create_pex(path, seed):
run_pex_command(args + ['-o', path, '--no-compile'], env=make_env(PYTHONHASHSEED=seed))
run_pex_command(
args + ['-o', path, '--no-compile', '--no-use-system-time'],
env=make_env(PYTHONHASHSEED=seed)
)
create_pex(pex1, seed=111)
# We sleep to ensure that there is no non-reproducibility from timestamps or
# anything that may depend on the system time. Note that we must sleep for at least
Expand All @@ -1375,7 +1378,6 @@ def create_pex(path, seed):
assert filecmp.cmp(pex1, pex2, shallow=False)


@pytest.mark.skip("Acceptance test for landing https://github.com/pantsbuild/pex/issues/716.")
def test_reproducible_build_no_args():
assert_reproducible_build([])

Expand All @@ -1391,7 +1393,6 @@ def test_reproducible_build_sdist_requirements():
assert_reproducible_build(['pycparser==2.19', '--no-build'])


@pytest.mark.skip("Acceptance test for landing https://github.com/pantsbuild/pex/issues/716.")
def test_reproducible_build_m_flag():
assert_reproducible_build(['-m', 'pydoc'])

Expand All @@ -1410,11 +1411,9 @@ def test_reproducible_build_c_flag():
assert_reproducible_build([project_dir, '-c', 'my_app'])


@pytest.mark.skip("Acceptance test for landing https://github.com/pantsbuild/pex/issues/716.")
def test_reproducible_build_python_flag():
assert_reproducible_build(['--python=python2.7'])


@pytest.mark.skip("Acceptance test for landing https://github.com/pantsbuild/pex/issues/716.")
def test_reproducible_build_python_shebang_flag():
assert_reproducible_build(['--python-shebang=/usr/bin/python'])
24 changes: 24 additions & 0 deletions tests/test_pex_builder.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,10 @@
# Copyright 2014 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

import datetime
import os
import stat
import zipfile

import pytest

Expand Down Expand Up @@ -177,3 +179,25 @@ def build_and_check(path, copy):

build_and_check(td2, False)
build_and_check(td3, True)


def test_pex_builder_deterministic_timestamp():
def assert_all_files_have_timestamp(timestamp):
pb = PEXBuilder()
with temporary_dir() as td:
target = os.path.join(td, 'foo.pex')
pb.build(target, deterministic_timestamp=True)
with zipfile.ZipFile(target) as zf:
assert all(zinfo.date_time == timestamp for zinfo in zf.infolist())

assert_all_files_have_timestamp((1980, 1, 1, 0, 0, 0))
# Also test that $SOURCE_DATE_EPOCH is respected.
requested_date = (2019, 5, 1, 10, 10, 10)
utc_timestamp = int((
datetime.datetime(*requested_date) - datetime.datetime(1970, 1, 1)
).total_seconds())
os.environ["SOURCE_DATE_EPOCH"] = str(utc_timestamp)
try:
assert_all_files_have_timestamp(requested_date)
finally:
os.environ.pop("SOURCE_DATE_EPOCH")