Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: strip embedded file name #31080

Open
danbst opened this issue Nov 1, 2017 · 13 comments
Open

Python: strip embedded file name #31080

danbst opened this issue Nov 1, 2017 · 13 comments
Assignees
Labels
2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: python

Comments

@danbst
Copy link
Contributor

danbst commented Nov 1, 2017

Issue description

Python compile .pyc files with source filename built into them. For example, here is result of one decompilation:

$ uncompyle6 $b1/__init__.pyc
# uncompyle6 version 2.13.2
# Python bytecode 2.7 (62211)
# Decompiled from: Python 2.7.13 (default, Dec 17 2016, 20:05:07)
# [GCC 6.4.0]
# Embedded file name: /tmp/nix-build-python2.7-backports.shutil_get_terminal_size-1.0.0.drv-0/backports.shutil_get_terminal_size-1.0.0/dist/tmpbuild/backports.shutil-get-terminal-size/backports/__init__.py
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
# okay decompiling /nix/store/rqwkwjv1jkm4dykymcvvmni0zy2i472y-python2.7-backports.shutil_get_terminal_size-1.0.0/lib/python2.7/site-packages/backports/__init__.pyc

$ uncompyle6 $b2/__init__.pyc
# uncompyle6 version 2.13.2
# Python bytecode 2.7 (62211)
# Decompiled from: Python 2.7.13 (default, Dec 17 2016, 20:05:07)
# [GCC 6.4.0]
# Embedded file name: /build/configparser-3.5.0/dist/tmpbuild/configparser/backports/__init__.py
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
# okay decompiling /nix/store/glny1ajfrqwyvjpn1rgnjqdjsndkc3vl-python2.7-configparser-3.5.0/lib/python2.7/site-packages/backports/__init__.pyc

Both files were produced from same source code and same nixpkgs version, but differ in source file name. We should strip this embedded file name either during compilation (patch Python) or in post-fixup phase.

There is an example of stripping done in codewarrior0/pyinstaller@a770058

Debian overcomes this issue by compiling .pyc on a user machine.

Steps to reproduce

with import <nixpkgs> { };
pythonFull.withPackages (ps: with ps; [
    ipython
    pylint
])

This results into a collision. Source .py files differ indeed, but only in comments.

Technical details

  • Nixpkgs version: 18.03.pre.799435b
  • Sandboxing enabled: no

cc @FRidh

@danbst
Copy link
Contributor Author

danbst commented Nov 1, 2017

My fix is to remove offending .pyc file. It also does some magic related to backports namespace

diff --git a/pkgs/top-level/python-packages.nix b/pkgs/top-level/python-packages.nix
index 0be1ff17ee..b584c90316 100644
--- a/pkgs/top-level/python-packages.nix
+++ b/pkgs/top-level/python-packages.nix
@@ -80,6 +80,19 @@ let
         else throw "Unsupported kind ${kind}");
     in fetcher (builtins.removeAttrs attrs ["format"]) );

+  backportsCommon = ''
+    ## See "absolutely essential rule" in https://pypi.python.org/pypi/backports
+    cat > "$(find $out -regex ".*backports/__init__.py")" <<EOF
+    # A Python "namespace package" http://www.python.org/dev/peps/pep-0382/
+    # This always goes inside of a namespace package's __init__.py
+
+    from pkgutil import extend_path
+    __path__ = extend_path(__path__, __name__)
+    EOF
+    ## .pyc collision fix
+    find $out -regex ".*backports/__init__.pyc" -delete
+  '';
+
 in {

   inherit python bootstrapped-pip pythonAtLeast pythonOlder isPy26 isPy27 isPy33 isPy34 isPy35 isPy36 isPyPy isPy3k mkPythonDerivation buildPythonPackage buildPythonApplication;
@@ -1116,6 +1129,8 @@ in {
       sha256 = "713e7a8228ae80341c70586d1cc0a8caa5207346927e23d09dcbcaf18eadec80";
     };

+    postFixup = backportsCommon;
+
     meta = {
       description = "A backport of the get_terminal_size function from Python 3.3’s shutil.";
       homepage = https://github.com/chrippa/backports.shutil_get_terminal_size;

But still would be great of generic stripping is implemented. Also, ignoreCollisions = true is still another pragmatic solution.

@orivej
Copy link
Contributor

orivej commented Nov 1, 2017

Thank you for the investigation of the source of the collision. However, embedded file names are necessary for a lot of things, such as displaying source code in the backtraces and finding resources near files. The only issue is with __init__.py files of namespace packages, which is solved by ignoreCollisions = true.

@FRidh FRidh self-assigned this Nov 1, 2017
@orivej
Copy link
Contributor

orivej commented Nov 1, 2017

@danbst How does python use these filenames in .pyc files?

@danbst
Copy link
Contributor Author

danbst commented Nov 1, 2017

@orivej

However, embedded file names are necessary for a lot of things, such as displaying source code in the backtraces and finding resources near files.

Hm. I've never seen these embedded names in callstacks. Those refer to build-time files, which don't even belong to /nix/store, essentially dangling links.

This is more of reproducibility. Hydra-built packages and locally-built should result into same derivation. .pyc files from Hydra contain /build... references, .pyc files locally contain /tmp/nix-build-... references.

How does python use these filenames in .pyc files?

I don't know, haven't investigated that deep.

The only issue is with init.py files of namespace packages, which is solved by ignoreCollisions = true

Actually, what I tried was to remove the ignoreCollisions = true here. Also, python.withPackages doesn't offer this parameter, but still suffers (see example in the OP post)

BTW, now I think my patch for backports is a good compromise. I should insert that postFixup phase into every backports package to mitigate collision for __init_.py (not .pyc) file.

@bjornfor
Copy link
Contributor

bjornfor commented Nov 1, 2017

This is more of reproducibility. Hydra-built packages and locally-built should result into same derivation. .pyc files from Hydra contain /build... references, .pyc files locally contain /tmp/nix-build-... references.

I believe this is because Hydra use Nix 1.12 (unstable) which builds in "/build", whereas Nix 1.11 builds in "/tmp/nix-build-*".

@FRidh
Copy link
Member

FRidh commented Nov 1, 2017

The collision is indeed due to #2412 (comment).

If we would want to strip this filename then I think we should patch the interpreter like we did for SOURCE_DATE_EPOCH. We may also choose to rewrite it considering we know $out. That would not solve the collision.

@danbst
Copy link
Contributor Author

danbst commented Nov 1, 2017

Great to know I wasn't first who discovered this issue! So, there are 2 issues here:

  1. Patch .pyc to refer to /nix/store paths instead of build paths. As far as I see co_filename is manipulated in modulefinder library. I haven't found other usecases, still it's good as a general cleanup.

  2. Collision for namepsace modules. There is a trivial fix, which I've posted above, that patches namespace package so it a) doesn't provide .pyc for namespace __init__ and b) ensures __init__.py won't collide with other namespace packages. We better not provide .pyc file here, but rather generate all missing .pyc in buildEnv.postBuild.

@FRidh but I see you marked problem 2 as "wontfix" in a linked issue. What do you think about my solution? This is kind of collision that can be solved at distro level.

@orivej
Copy link
Contributor

orivej commented Nov 1, 2017

The second point is OK, could you submit a pull request?

As for the first point, it should be simpler to compile .pyc after copying .py files into the store.

@danbst
Copy link
Contributor Author

danbst commented Nov 2, 2017

@orivej I guess there are reasons why files aren't compiled in store directly. For example, it's not possible to hook into setup.py to do exactly that.

Probably remove all .pyc in postInstall phase and then run compiler for whole $out? @FRidh you may know why that doesn't happen.

@orivej
Copy link
Contributor

orivej commented Nov 2, 2017

Probably remove all .pyc in postInstall phase and then run compiler for whole $out?

Yes, this is what I imagine (for all outputs).

@danbst
Copy link
Contributor Author

danbst commented Nov 2, 2017

@FRidh @orivej I've tried to do what I've described, but got into trouble. Some backported packages do declare namespace, which makes NOT creating __init__.py/pyc in first place. I'm a bit WTF on this, because absent __init__.py should declare namespace only in Python3... So investigations still required...

@FRidh
Copy link
Member

FRidh commented Nov 25, 2017

Regarding the namespace packages, there are two solutions I could think of:

  1. have an attribute, e.g. namespaces = [ "backports" ];. buildPythonPackage could remove the __init__.py, and buildEnv uses the info to create a __init__.py when it does the integration;
  2. create a hook that removes __init__.py and provides a __init__.py itself.

@stale
Copy link

stale bot commented Jun 5, 2020

Thank you for your contributions.

This has been automatically marked as stale because it has had no activity for 180 days.

If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.

Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse.
  3. Ask on the #nixos channel on irc.freenode.net.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: python
Projects
None yet
Development

No branches or pull requests

4 participants