Investigate what might be causing partial matches from single-file Etherscan contracts #936
Comments
I found the solution. Etherscan was replacing my Unix newlines with Windows line endings (carriage return plus line feed). Also, Etherscan strips trailing newlines (confirmed) and possibly other whitespace at the end of the contract. Newlines and other whitespace might be stripped from the beginning of the contract string too. |
@kuzdogan if I remember correctly, Sourcify already has a function that tries whitespace and trailing-newline variations against the keccak256 hash, right? EDIT: found it: sourcify/packages/lib-sourcify/src/lib/validation.ts, lines 302 to 311 in 1dd0040 |
@marcocastignoli I did get a perfect match manually from Etherscan API output by replacing the newlines and regenerating the metadata from solc. Sourcify was not able to get a verified match on its own from importing from Etherscan, so I think this is still a bug. We should investigate the variators. (I also wrote my own application that tries variations in a similar way, and it got a perfect match on my own contract by using unix line endings + a newline at the end.) I have since learned more about the metadata hash. There are even more issues. Etherscan doesn't know the filename OR if the metadata entered by the user is even correct for single-file uploads. Examples:
These are all reasons why an Etherscan single file upload might result in a partial match. I was able to verify a smart contract on Etherscan with a different filename (e.g. (Probably everyone already knows this... but it's fun to put the thoughts together myself :) |
Sorry, I forgot to mention that that function is not currently called inside the Etherscan verification process. I'm trying to implement that right now. |
I forgot to reply to the message itself: thanks a lot for your research. Very few people actually know about this; I didn't, for example.
Is this application open source? |
I wrote it in about 20 minutes just for this issue. It is just a simple Python script that looks at permutations of transformations to the source code: change line endings to unix/Windows, prepend/append newline. It turns out the way I wrote it is just like Sourcify's variators. My version opens a subprocess and launches solc for each check, so it's not as clean as running solcjs directly. Instead of looking at the keccak256 hash of the source file, I was checking for a perfect bytecode match of the entire smart contract binary. |
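The permutation idea described in this comment can be sketched roughly as follows. This is not the script mentioned above and not Sourcify's variator code; the function names `normalize_line_endings` and `source_variations` are hypothetical, and the set of transformations (Unix/Windows line endings, optional leading newline, optional trailing newline) is taken only from the description in the comment. Each candidate would then be handed to whatever check is in use (a source hash comparison or a full recompilation), which is left out here.

```python
from itertools import product
from typing import List


def normalize_line_endings(source: str, style: str) -> str:
    """Rewrite every line ending in `source` as Unix (LF) or Windows (CRLF)."""
    # Collapse CRLF and lone CR to LF first so the result does not depend
    # on which style the input happened to use.
    unix = source.replace("\r\n", "\n").replace("\r", "\n")
    return unix if style == "unix" else unix.replace("\n", "\r\n")


def source_variations(source: str) -> List[str]:
    """Return de-duplicated candidates: line-ending style x prepend x append."""
    seen = set()
    candidates = []
    for style, prepend, append in product(("unix", "windows"), (False, True), (False, True)):
        newline = "\n" if style == "unix" else "\r\n"
        candidate = normalize_line_endings(source, style)
        if prepend:
            candidate = newline + candidate
        if append:
            candidate = candidate + newline
        if candidate not in seen:
            seen.add(candidate)
            candidates.append(candidate)
    return candidates


# Hypothetical example: a copy with CRLF endings and no trailing newline,
# as might come back from an explorer API.
etherscan_copy = "contract A {\r\n}"
candidates = source_variations(etherscan_copy)
```

With the example input, one of the candidates is the Unix-line-ending text with a trailing newline, which is the kind of variation reported in this thread as producing a perfect match.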
@kuzdogan integrating this to Sourcify will take more time than I expected. I'll close this issue and create a new one explaining how we can implement it. |
@marcocastignoli Why? Isn't it simply putting the Etherscan verification through file variations? |
Potentially yes, but in order to be optimized we need some refactoring |
Alright let's open a new issue without closing this and link the intro text to that. I'll edit it if you can't |
@sealer3 do you have an example of a contract that gets perfectly verified in this case?
@marcocastignoli Yes. Here is one: https://sepolia.etherscan.io/address/0x6F1D75a53a8805DcA5347aE5F3eDE035CAE3CBC1 You can try for yourself by removing the carriage returns and adding a newline at the end before compiling. I submitted the correct metadata to Etherscan and named the file according to the contract name, so a Sourcify that tries the variations should be able to perfectly match it. |
experiment to create file variations before running verification, searching for perfect matches
OK, I pushed this commit (5f79dc2) to a new branch with a proof of concept of this feature, which successfully verifies the contract mentioned above. It's not optimized because it runs the compilation multiple times. In order to find all the files that are part of the same variation I had to group the generated variations per combination so now the @kuzdogan in this proof of concept I'm verifying twice: the first time while searching for the perfect match with the variations, and the second time, once I have found the right variation (giving a perfect match), by passing through the standard Sourcify process for session verification ( |
Thinking about this further, I think this should be something not limited to Etherscan verification. Currently, we take the metadata.json the user provides when verifying as the source of truth. Meaning we generate the variations of the source files and try to match them to the However, it could be that we can generate the metadata with the hash that matches the one in the bytecode. Here we're able to spot one common variation causing full matches to be lost. I know one possible case, for example, when Truffle adds I think this should be its own module that not only tries matching the hashes of the source files (as we do now), but also generates variations of the metadata files to match the CBOR metadata hash. |
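For context on "the CBOR metadata hash": solc appends a CBOR-encoded map to the bytecode, and the last two bytes of the bytecode give that map's length as a big-endian integer. A minimal sketch of pulling it out is shown below. The function names are hypothetical, and `ipfs_multihash` uses a plain byte search for the `ipfs` key followed by a 34-byte byte string instead of a real CBOR decoder, so it is a heuristic for illustration only.

```python
from typing import Optional, Tuple

# CBOR encoding of the text key "ipfs" (0x64 + 4 bytes) followed by the header
# of a 34-byte byte string (0x58 0x22). Assumes the usual solc layout.
IPFS_KEY = b"\x64ipfs\x58\x22"


def split_cbor_metadata(bytecode_hex: str) -> Tuple[bytes, bytes]:
    """Split bytecode into (executable part, CBOR metadata bytes)."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    if len(code) < 2:
        raise ValueError("bytecode too short to contain a metadata length")
    # The final two bytes hold the length of the CBOR section that precedes them.
    cbor_length = int.from_bytes(code[-2:], "big")
    if cbor_length + 2 > len(code):
        raise ValueError("declared metadata length exceeds bytecode size")
    return code[: -(cbor_length + 2)], code[-(cbor_length + 2) : -2]


def ipfs_multihash(cbor: bytes) -> Optional[bytes]:
    """Return the 34-byte IPFS multihash from the CBOR section, if present."""
    index = cbor.find(IPFS_KEY)
    if index == -1:
        return None
    start = index + len(IPFS_KEY)
    digest = cbor[start : start + 34]
    return digest if len(digest) == 34 else None
```

A metadata variation is a candidate for a perfect match only if hashing it reproduces the multihash extracted this way; computing that IPFS hash is a separate step not shown here.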
Summarizing and tidying up what I'm thinking: Currently how we verify is as follows:
This lets us easily find matching sources for a given metadata but we don’t do variations on the metadata itself. Hence we can only get partial matches, in some cases.
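The "find matching sources for a given metadata" step can be illustrated with a small sketch. It relies on the `sources[path].keccak256` field of the Solidity metadata format; everything else is an assumption made for illustration: the function name `find_matching_sources` is hypothetical, and the hash function is passed in as a parameter because Python's standard library has no keccak256 (note that `hashlib.sha3_256` is a different function), so a real run would need a third-party implementation.

```python
from typing import Callable, Dict, Iterable


def find_matching_sources(
    metadata: dict,
    candidates: Iterable[str],
    keccak256_hex: Callable[[bytes], str],
) -> Dict[str, str]:
    """Map each metadata source path to the candidate content whose hash matches.

    `keccak256_hex` must return a 0x-prefixed hex digest, the same form used
    in the metadata's `sources[path]["keccak256"]` entries.
    """
    # Index the expected hashes so each candidate needs only one lookup.
    wanted = {
        entry["keccak256"].lower(): path
        for path, entry in metadata["sources"].items()
    }
    found: Dict[str, str] = {}
    for content in candidates:
        digest = keccak256_hex(content.encode("utf-8")).lower()
        path = wanted.get(digest)
        if path is not None and path not in found:
            found[path] = content
    return found
```

Paths missing from the result are sources for which no candidate hashed to the expected value. As the comment points out, this only ever adjusts the sources to fit a fixed metadata file; it never varies the metadata itself.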
…bytecode WIP * add tryToFindOriginalMetadata to CheckedContract: tries to reconstruct the original metadata by iterating over all file variations * call tryToFindOriginalMetadata after `matchWithDeployedBytecode` if match is not perfect
…hash by updating the source section of the metadata and generating the metadata CID
* I translated the ipfsHash.cpp file from the solidity repo * now that everything is synchronous I simplified the code that replaces the ipfs and bzz hash in the urls array in metadata.sources * I added some comments to clarify the storebyhash grouped by variation
* test in verification.spec.ts
Closing this, implemented with #976 |
While the PR looks good for the case provided here, I wasn't able to find any other examples for a contract that would normally be a "partial" match but gets a "perfect" match thanks to the variations. I am comparing verifying on staging (before the PR is merged) and running the PR locally. We are still getting lots of partial matches from "single file" contract verifications. We should still look for ways to increase our chances of verifying perfectly. Either we need to try more contracts, or we should further investigate why we can't generate a "perfect match" variation. Opening again |
Possible paths of research:
I imagine it will be impossible to match many of the contracts because the metadata (including filenames) is simply not accessible. The real filename might have nothing to do with anything inside the contract, and so it becomes almost impossible to guess. For instance, the contract could be generic while the filename is specific to the contract's application: a generic contract whose Solidity name is I wonder if in the future there will be discoveries of perfect matches like the way people run their computers to discover other things, like new large primes. Each one is a puzzle. |
That said, I think this will be a long process, and I think it makes sense to implement each solution once it is mature enough. If the features implemented in #976 work, then I would merge it without closing this issue. |
* #936 file variation while importing from etherscan: experiment to create file variations before running verification, searching for perfect matches
* #936 generate the metadata with the hash that matches the one in the bytecode WIP
* add tryToFindOriginalMetadata to CheckedContract: tries to reconstruct the original metadata by iterating over all file variations
* call tryToFindOriginalMetadata after `matchWithDeployedBytecode` if match is not perfect
* restore etherscan standard verification
* #936 tests working
* update package-json lib-sourcify
* #936 fix linting issues
* wait 1 second between every etherscan request
* #936 update non-session verify from etherscan controller
* #936 better function naming `getMetadataFromCompiler`
* #936 instead of recompiling each variation, recalculate the metadata hash by updating the source section of the metadata and generating the metadata CID
* #936 fix error on non-existence of the path in sources
* #936 add comments to group by variations to make it more clear
* #936 replace the ipfs calculate hash
* I translated the ipfsHash.cpp file from the solidity repo
* now that everything is synchronous I simplified the code that replaces the ipfs and bzz hash in the urls array in metadata.sources
* I added some comments to clarify the storebyhash grouped by variation
* fix default ipfs gateway
* #936 add test for tryToFindOriginalMetadata
* #936 add swarmBzzr0Hash and test metadata with both ipfs and swarm
* Add comments
* #936 refactor tryToFindOriginalMetadata adding types
* test in verification.spec.ts
* #976 add license type in MetadataSources
* #976 change test name for wrong end of line
* Update packages/lib-sourcify/test/verification.spec.ts Co-authored-by: Kaan Uzdoğan <[email protected]>
* #976 try to find original metadata for each verification method
* refactor `verifyDeployed` so that it uses the new function `tryToFindOriginalMetadataAndMatch`
--------- Co-authored-by: Marco Castignoli <[email protected]>
* instead of verifying if an address is valid, tries to checksum it
Looking into verifying contracts from Etherscan, I notice that we can almost always verify perfectly when the contracts were submitted with standard-json input, which is great.
This leaves me wondering if we can find a pattern in single-file or multi-part contracts that would enable perfect matches, because by contrast we almost always get partial matches with those.
My initial naive guess was the prepended "Submitted to Etherscan at..." part in the code but apparently that's just added on the UI and not in the API.
One way to find out might be:
Example single file: https://etherscan.io/address/0x3446dd70b2d52a6bf4a5a192d9b0a161295ab7f9#code https://etherscan.io/address/0x4691937a7508860f876c9c0a2a617e7d9e945d4b#code