Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat[tool]: archive format #3891

Merged
merged 73 commits into from
May 8, 2024

Conversation

charles-cooper
Copy link
Member

@charles-cooper charles-cooper commented Mar 24, 2024

What I did

add -f archive, which converts the given file and imported modules to a base64 encoded zip file.

also add -f solc_json, -f integrity and --base64

TODO:

  • compile from archive
  • tests

How I did it

How to verify it

Commit message

this commit adds several output formats to aid with build
reproducibility and source code verification:
- `-f archive`
- `-f solc_json`
- `-f integrity`
- `--base64`

`-f archive` creates a "vyper archive" using the zipfile format. it
emits the metadata associated with the build (settings, search path,
compiler version, integrity hash) in the `MANIFEST/` folder inside the
archive. `--base64` is only usable with `-f archive` and produces a
base64-encoded archive (which is easier to copy-paste).

both the base64 and binary versions of the archive round-trip. that is,
if you provide an archive directly to the vyper compiler (e.g.
`vyper contract.zip` or `vyper contract.zip.b64`), it should produce
exactly the same output as running `vyper contract.vy` on the local
machine with the same settings+environment that produced the archive.

`-f solc_json` is for compatibility with standard json that a lot of
verifiers and tooling currently uses. it uses the same "output bundle"
machinery as `-f archive`, but spits out in "standard-json" format
(consumable by `--standard-json`).

both of these use an `OutputBundle` abstraction, which abstracts
collecting the inputs to the build. these include
- settings (whatever is on the Settings object)
- search path
- compiler version
- integrity hash

importantly, `OutputBundle` recovers and anonymizes search paths used
during compilation. this is done to minimize leaking of user information
in archives. however, it comes with a tradeoff -- because of how the
anonymization works, it is possible to have a build where search paths
are not recoverable (specifically, if a module "escapes" its package
with too many `..`, the resulting anonymized path will be bogus).
several methods were tried to prevent this, but in the end this method
was chosen, which prioritizes minimizing leakage over handling edge
cases.

`-f integrity` produces an "integrity hash", which is basically a
checksum over the source file inputs. it is intended to let consumers
of the compiler know when any input in the dependency tree has changed
and recompilation is necessary. it is conservative by design; it works
by recursively hashing source text as opposed to AST or any other
semantic representation of source code. it can also be used by tooling
as a check to determine if the source tree in an archive is the same as
expected. this would likely be an additional check in addition to
bytecode comparison, since there could differences in source code (e.g.
comments) which affect the integrity hash but not the bytecode.

the integrity hash computation currently depends on all frontend
analysis to complete. in theory, since it only depends on source code,
it could be refactored into another preliminary pass in the compiler,
whose sole job is to resolve (and hash) imports. however, it would be
additional maintenance work. we could revisit if the performance of this
method becomes reported as an issue (note: current numbers are that this
method operates at roughly 2500 lloc per second).

currently, there are two places where build reproducibility might fail -
in checking the integrity hash of an archive or during archive
construction itself (if there is a compile-time failure, this could
happen for example if the user is trying to send a reproduction of an
error). it was decided that the most user-friendly thing to do is to
emit a warning in these cases, rather than adding additional compilation
flags that control whether to bail out or not.

the tentative canonical suffix for vyper archive (the zipfile version)
is `.vyz`, although this is subject to change (several alternatives were
also considered, including `.den` - as in "a den of vypers"!).

Description for the changelog

Cute Animal Picture

Put a link to a cute animal picture inside the parenthesis-->

add an `-f archive` format, which traverses the import graph and
collects all the files into a zipfile, which is then base64 encoded to
stdout. this should be useful for passing around reproducible archives.
@charles-cooper charles-cooper changed the title feat: archive format feat[tool]: archive format Mar 24, 2024
@codecov-commenter
Copy link

codecov-commenter commented Mar 24, 2024

Codecov Report

Attention: Patch coverage is 63.74269% with 124 lines in your changes are missing coverage. Please review.

Project coverage is 53.91%. Comparing base (3af5390) to head (274dc62).

❗ Current head 274dc62 differs from pull request most recent head 6315ec1. Consider uploading reports for the commit 6315ec1 to get more accurate results

Files Patch % Lines
vyper/cli/vyper_compile.py 7.89% 35 Missing ⚠️
vyper/cli/compile_archive.py 35.13% 24 Missing ⚠️
vyper/compiler/output_bundle.py 87.31% 13 Missing and 4 partials ⚠️
vyper/compiler/input_bundle.py 50.00% 15 Missing and 1 partial ⚠️
vyper/compiler/settings.py 64.10% 7 Missing and 7 partials ⚠️
vyper/semantics/types/module.py 42.85% 10 Missing and 2 partials ⚠️
vyper/cli/vyper_json.py 16.66% 5 Missing ⚠️
vyper/semantics/analysis/base.py 66.66% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #3891       +/-   ##
===========================================
- Coverage   90.86%   53.91%   -36.96%     
===========================================
  Files         103      100        -3     
  Lines       14908    15156      +248     
  Branches     3304     3358       +54     
===========================================
- Hits        13546     8171     -5375     
- Misses        941     6352     +5411     
- Partials      421      633      +212     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

vyper/compiler/settings.py Fixed Show fixed Hide fixed
@charles-cooper charles-cooper added this to the v0.4.0 milestone Mar 26, 2024
@charles-cooper charles-cooper mentioned this pull request Mar 26, 2024
33 tasks
@@ -1,4 +1,5 @@
import contextlib
import dataclasses

Check notice

Code scanning / CodeQL

Module is imported with 'import' and 'import from' Note

Module 'dataclasses' is imported with both 'import' and 'import from'.
@charles-cooper
Copy link
Member Author

the output should be the same, right?

(venv) custom[feat/archives*] % vyper tmp.zip -f archive  | vyper /dev/stdin
0x346100325761000c610024565b6100286100366038396100286038f35b604051600255565b5f60405261003061001c565b565b5f80fd5f3560e01c6361bc221a811861002057346100245760015460405260206040f35b5f5ffd5b5f80fd8418288000a1657679706572830004000012
(venv) custom[feat/archives*] % vyper tmp.zip -f solc_json  | vyper /dev/stdin
0x61000361000f6000396100036000f35f5ffd84038000a1657679706572830004000011

no, solc_json output should be used with vyper --standard-json

we could try detecting json input and skip the --standard-json thing altogether, but i think people already use vyper-json and we might as well keep the separation

@charles-cooper
Copy link
Member Author

charles-cooper commented May 5, 2024

(venv) custom[feat/archives*] % vyper test4.vy
Error compiling: test4.vy
vyper.exceptions.ModuleNotFound: ethereum.ercs.erc20

  contract "test4.vy:3", line 3:0 
       2
  ---> 3 from ethereum.ercs import erc20
  -------^
       4

(venv) custom[feat/archives*] % vyper -f archive test4.vy
Error compiling: test4.vy
vyper.exceptions.ModuleNotFound: ethereum.ercs.erc20

  contract "test4.vy:3", line 3:0 
       2
  ---> 3 from ethereum.ercs import erc20
  -------^
       4

Exception ignored in: <function ZipFile.__del__ at 0x101ad62a0>
ValueError: I/O operation on closed file.
(venv) custom[feat/archives*] % cat test4.vy             
# main.vy

from ethereum.ercs import erc20

a: erc20

that's a nice find

EDIT: fixed in 5d0533b fixed for real in 3c8a0ed

@charles-cooper
Copy link
Member Author

(venv) custom[feat/archives*] % vyper test4.vy
Error compiling: test4.vy
vyper.exceptions.ModuleNotFound: ethereum.ercs.erc20

  contract "test4.vy:3", line 3:0 
       2
  ---> 3 from ethereum.ercs import erc20
  -------^
       4

(venv) custom[feat/archives*] % vyper -f archive test4.vy
Error compiling: test4.vy
vyper.exceptions.ModuleNotFound: ethereum.ercs.erc20

  contract "test4.vy:3", line 3:0 
       2
  ---> 3 from ethereum.ercs import erc20
  -------^
       4

Exception ignored in: <function ZipFile.__del__ at 0x101ad62a0>
ValueError: I/O operation on closed file.
(venv) custom[feat/archives*] % cat test4.vy             
# main.vy

from ethereum.ercs import erc20

a: erc20

i looked into this a bit more since it's kind of weird that self._buf would be cleaned up before self.archive. you can reproduce by raising an exception anywhere in the zipfile construction, e.g.

@@ -233,6 +237,7 @@ class VyperArchiveWriter(OutputBundleWriter):
             self.archive.writestr("MANIFEST/cli_settings.txt", "")
 
     def write_integrity(self, integrity_sum: str):
+        raise ValueError
         self.archive.writestr("MANIFEST/integrity", integrity_sum)
 
     def write_compilation_target(self, targets: list[str]):

turns out this is likely a bug in the runtime, see python/cpython#81954

manually ordering the cleanup like follows fixes the issue:

diff --git a/vyper/compiler/output_bundle.py b/vyper/compiler/output_bundle.py
index a54cec106..42d658eb2 100644
--- a/vyper/compiler/output_bundle.py
+++ b/vyper/compiler/output_bundle.py
@@ -217,6 +217,10 @@ class VyperArchiveWriter(OutputBundleWriter):
         method = _get_compression_method()
         self.archive = zipfile.ZipFile(self._buf, mode="w", compression=method, compresslevel=9)
 
+    def __del__(self):
+        del self.archive
+        del self._buf
+
     def write_sources(self, sources: dict[str, CompilerInput]):
         for path, c in sources.items():
             self.archive.writestr(_anonymize(path), c.contents)

@cyberthirst
Copy link
Collaborator

cyberthirst commented May 6, 2024

the output should be the same, right?

(venv) custom[feat/archives*] % vyper tmp.zip -f archive  | vyper /dev/stdin
0x346100325761000c610024565b6100286100366038396100286038f35b604051600255565b5f60405261003061001c565b565b5f80fd5f3560e01c6361bc221a811861002057346100245760015460405260206040f35b5f5ffd5b5f80fd8418288000a1657679706572830004000012
(venv) custom[feat/archives*] % vyper tmp.zip -f solc_json  | vyper /dev/stdin
0x61000361000f6000396100036000f35f5ffd84038000a1657679706572830004000011

no, solc_json output should be used with vyper --standard-json

we could try detecting json input and skip the --standard-json thing altogether, but i think people already use vyper-json and we might as well keep the separation

so why do we output any bytecode whatsoever?

@cyberthirst
Copy link
Collaborator

(venv) custom[feat/archives*] % vyper test4.vy -f solc_json  > tmp.json
(venv) custom[feat/archives*] % vyper tmp.json
AssertionError
(venv) custom[feat/archives*] % cat test4.vy 
c: uint256%         

@charles-cooper charles-cooper merged commit 75c75c5 into vyperlang:master May 8, 2024
157 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants