Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new --no-use-system-time flag to use a deterministic timestamp in built PEX #722

Merged
merged 32 commits into from
May 8, 2019
Merged
Show file tree
Hide file tree
Changes from 30 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
8b2fd37
Add deterministic_datetime() function
Eric-Arellano May 2, 2019
b440fbf
Proof of concept #1: monkey patch approach that works on 3.7
Eric-Arellano May 2, 2019
4b53a9e
Add new CLI flag --use-system-time
Eric-Arellano May 2, 2019
226c3bc
Approach #2: use writestr()
Eric-Arellano May 2, 2019
baa20c4
Add test for deterministic timestamp
Eric-Arellano May 3, 2019
a27122c
Unksip 6/7 of the acceptance tests!
Eric-Arellano May 3, 2019
e9a067a
Explain reproducibility
Eric-Arellano May 3, 2019
0295ddc
Setup integration tests to use --no-use-system-time
Eric-Arellano May 3, 2019
80adeff
Fix regression to normal case
Eric-Arellano May 3, 2019
5fe4a59
Set default to true to ensure we arent breaking anything
Eric-Arellano May 3, 2019
829d12d
Test deterministic timestamp against all CI
Eric-Arellano May 3, 2019
fdc1dc0
Skip acceptance tests still failing
Eric-Arellano May 3, 2019
80b5008
Fix typo in link
Eric-Arellano May 3, 2019
794084e
Fix bug with permissions not copying
Eric-Arellano May 3, 2019
952cf9e
Revert "Test deterministic timestamp against all CI"
Eric-Arellano May 3, 2019
a7b8558
Ensure env gets cleaned up, even if test fails
Eric-Arellano May 3, 2019
fda9a16
Merge branch 'master' of github.com:pantsbuild/pex into timestamps
Eric-Arellano May 5, 2019
f7364de
Convert deterministic_datetime from a function to a constant
Eric-Arellano May 5, 2019
0835f60
Revert making deterministic_datetime a constant
Eric-Arellano May 5, 2019
0c9bd41
No longer respect SOURCE_DATE_EPOCH
Eric-Arellano May 6, 2019
9774c2a
Improve comments
Eric-Arellano May 6, 2019
477e2eb
Change rationale for not respecting SOURCE_DATE_EPOCH
Eric-Arellano May 6, 2019
631b3ed
Drop overly verbose comment about SOURCE_DATE_EPOCH
Eric-Arellano May 7, 2019
69106c0
Create new zip_info_from_file() function as part of PermPreservingZip…
Eric-Arellano May 7, 2019
d82cd99
Always use zf.writestr()
Eric-Arellano May 7, 2019
b9d36e8
Remove bad extra whitespace
Eric-Arellano May 7, 2019
ebe15c2
Store DETERMINISTIC_DATETIME as a 6-tuple rather than datetime
Eric-Arellano May 7, 2019
099f146
Invert ternary expression
Eric-Arellano May 7, 2019
23c5a6f
Oops, typo in constant
Eric-Arellano May 7, 2019
8bb8ed4
Deduplicate slicing the time.struct_time
Eric-Arellano May 8, 2019
a528eb4
Don't lie about expected input
Eric-Arellano May 8, 2019
adf2726
Fix typo that was causing failures
Eric-Arellano May 8, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 16 additions & 1 deletion pex/bin/pex.py
Original file line number Diff line number Diff line change
Expand Up @@ -302,6 +302,17 @@ def configure_clp_pex_options(parser):
'likely will not be reproducible, meaning that if you were to run `./pex -o` with the '
'same inputs then the new pex would not be byte-for-byte identical to the original.')

group.add_option(
'--use-system-time', '--no-use-system-time',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open to suggestions for a better name.

The main use case I want to optimize for is when deterministic time will be the default. For example, atm --use-deterministic-time would probably be a good name, but once we make the switch then --no-use-deterministic-time wouldn't be very descriptive for what the behavior would end up being.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In pantsbuild/pants#6094 we made a flag -jar-creation-time which, if not set, defaulted to the current time. Not sure whether this is a good or bad default for pex, but it's some prior art :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@illicitonion : I think that we agreed on the other issue that we want this enabled by default, so the default would need to be stable.

@Eric-Arellano : This name looks good I think?

dest='use_system_time',
default=True,
action='callback',
callback=parse_bool,
help='Use the current system time to generate timestamps for the new pex. Otherwise, Pex '
'will use midnight on January 1, 1980. By using system time, the generated pex '
'will not be reproducible, meaning that if you were to run `./pex -o` with the '
'same inputs then the new pex would not be byte-for-byte identical to the original.')

parser.add_option_group(group)


Expand Down Expand Up @@ -694,7 +705,11 @@ def main(args=None):
log('Saving PEX file to %s' % options.pex_name, V=options.verbosity)
tmp_name = options.pex_name + '~'
safe_delete(tmp_name)
pex_builder.build(tmp_name, bytecode_compile=options.compile)
pex_builder.build(
tmp_name,
bytecode_compile=options.compile,
deterministic_timestamp=not options.use_system_time
)
os.rename(tmp_name, options.pex_name)
else:
if not _compatible_with_current_platform(options.platforms):
Expand Down
51 changes: 48 additions & 3 deletions pex/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,18 @@
import threading
import zipfile
from collections import defaultdict
from datetime import datetime
from uuid import uuid4


# We use the start of MS-DOS time, which is what zipfiles use (see section 4.4.6 of
# https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT). Note that ZipInfo expects a
# time.struct_time, which we set here.
Eric-Arellano marked this conversation as resolved.
Show resolved Hide resolved
DETERMINISTIC_DATETIME = datetime(
year=1980, month=1, day=1, hour=0, minute=0, second=0, tzinfo=None
).timetuple()


def die(msg, exit_code=1):
print(msg, file=sys.stderr)
sys.exit(exit_code)
Expand Down Expand Up @@ -82,6 +91,34 @@ def teardown(self):
class PermPreservingZipFile(zipfile.ZipFile, object):
"""A ZipFile that works around https://bugs.python.org/issue15795"""

@classmethod
def zip_info_from_file(cls, filename, arcname=None, date_time=None):
"""Construct a ZipInfo for a file on the filesystem.

Usually this is provided directly as a method of ZipInfo, but it is not implemented in Python
2.7 so we re-implement it here. The main divergance we make from the original is adding a
parameter for the datetime (a time.struct_time), which allows us to use a deterministic
timestamp. See https://github.com/python/cpython/blob/master/Lib/zipfile.py#L495."""
st = os.stat(filename)
isdir = stat.S_ISDIR(st.st_mode)
if arcname is None:
arcname = filename
arcname = os.path.normpath(os.path.splitdrive(arcname)[1])
while arcname[0] in (os.sep, os.altsep):
arcname = arcname[1:]
if isdir:
arcname += '/'
if date_time is None:
datetime = time.localtime(st.st_mtime)
zinfo = zipfile.ZipInfo(filename=arcname, date_time=date_time[:6])
zinfo.external_attr = (st.st_mode & 0xFFFF) << 16 # Unix attributes
if isdir:
zinfo.file_size = 0
zinfo.external_attr |= 0x10 # MS-DOS directory flag
else:
zinfo.file_size = st.st_size
return zinfo

def _extract_member(self, member, targetpath, pwd):
result = super(PermPreservingZipFile, self)._extract_member(member, targetpath, pwd)
info = member if isinstance(member, zipfile.ZipInfo) else self.getinfo(member)
Expand All @@ -98,7 +135,7 @@ def _chmod(self, info, path):

@contextlib.contextmanager
def open_zip(path, *args, **kwargs):
"""A contextmanager for zip files. Passes through positional and kwargs to zipfile.ZipFile."""
"""A contextmanager for zip files. Passes through positional and kwargs to zipfile.ZipFile."""
with contextlib.closing(PermPreservingZipFile(path, *args, **kwargs)) as zip:
yield zip

Expand Down Expand Up @@ -363,7 +400,15 @@ def __str__(self):
def delete(self):
shutil.rmtree(self.chroot)

def zip(self, filename, mode='w'):
def zip(self, filename, mode='w', deterministic_timestamp=False):
with open_zip(filename, mode) as zf:
for f in sorted(self.files()):
zf.write(os.path.join(self.chroot, f), arcname=f, compress_type=zipfile.ZIP_DEFLATED)
full_path = os.path.join(self.chroot, f)
zinfo = zf.zip_info_from_file(
filename=full_path,
arcname=f,
date_time=DETERMINISTIC_DATETIME if deterministic_timestamp else None
)
with open(full_path, 'rb') as open_f:
data = open_f.read()
zf.writestr(zinfo, data, compress_type=zipfile.ZIP_DEFLATED)
5 changes: 3 additions & 2 deletions pex/pex_builder.py
Original file line number Diff line number Diff line change
Expand Up @@ -428,11 +428,12 @@ def freeze(self, bytecode_compile=True):
self._precompile_source()
self._frozen = True

def build(self, filename, bytecode_compile=True):
def build(self, filename, bytecode_compile=True, deterministic_timestamp=False):
Eric-Arellano marked this conversation as resolved.
Show resolved Hide resolved
"""Package the PEX into a zipfile.

:param filename: The filename where the PEX should be stored.
:param bytecode_compile: If True, precompile .py files into .pyc files.
:param deterministic_timestamp: If True, will use our hardcoded time for zipfile timestamps.

If the PEXBuilder is not yet frozen, it will be frozen by ``build``. This renders the
PEXBuilder immutable.
Expand All @@ -450,7 +451,7 @@ def build(self, filename, bytecode_compile=True):
with open(filename + '~', 'ab') as pexfile:
assert os.path.getsize(pexfile.name) == 0
pexfile.write(to_bytes('%s\n' % self._shebang))
self._chroot.zip(filename + '~', mode='a')
self._chroot.zip(filename + '~', mode='a', deterministic_timestamp=deterministic_timestamp)
if os.path.exists(filename):
os.unlink(filename)
os.rename(filename + '~', filename)
Expand Down
6 changes: 1 addition & 5 deletions tests/test_integration.py
Original file line number Diff line number Diff line change
Expand Up @@ -1353,7 +1353,7 @@ def assert_reproducible_build(args):
# https://tox.readthedocs.io/en/latest/example/basic.html#special-handling-of-pythonhashseed.
def create_pex(path, seed):
result = run_pex_command(
args + ['-o', path, '--no-compile'],
args + ['-o', path, '--no-compile', '--no-use-system-time'],
env=make_env(PYTHONHASHSEED=seed)
)
result.assert_success()
Expand Down Expand Up @@ -1381,7 +1381,6 @@ def create_pex(path, seed):
assert filecmp.cmp(pex1, pex2, shallow=False)


@pytest.mark.skip("Acceptance test for landing https://github.com/pantsbuild/pex/issues/716.")
def test_reproducible_build_no_args():
assert_reproducible_build([])

Expand All @@ -1397,7 +1396,6 @@ def test_reproducible_build_sdist_requirements():
assert_reproducible_build(['pycparser==2.19', '--no-build'])


@pytest.mark.skip("Acceptance test for landing https://github.com/pantsbuild/pex/issues/716.")
def test_reproducible_build_m_flag():
assert_reproducible_build(['-m', 'pydoc'])

Expand All @@ -1416,11 +1414,9 @@ def test_reproducible_build_c_flag():
assert_reproducible_build([project_dir, '-c', 'my_app'])


@pytest.mark.skip("Acceptance test for landing https://github.com/pantsbuild/pex/issues/716.")
def test_reproducible_build_python_flag():
assert_reproducible_build(['--python=python2.7'])


@pytest.mark.skip("Acceptance test for landing https://github.com/pantsbuild/pex/issues/716.")
def test_reproducible_build_python_shebang_flag():
assert_reproducible_build(['--python-shebang=/usr/bin/python'])
10 changes: 10 additions & 0 deletions tests/test_pex_builder.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@

import os
import stat
import zipfile

import pytest

Expand Down Expand Up @@ -177,3 +178,12 @@ def build_and_check(path, copy):

build_and_check(td2, False)
build_and_check(td3, True)


def test_pex_builder_deterministic_timestamp():
pb = PEXBuilder()
with temporary_dir() as td:
target = os.path.join(td, 'foo.pex')
pb.build(target, deterministic_timestamp=True)
with zipfile.ZipFile(target) as zf:
assert all(zinfo.date_time == (1980, 1, 1, 0, 0, 0) for zinfo in zf.infolist())