-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactored solc --link to use LinkerObject #12756
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I'll review it in detail later but here's some early feedback. For now just a few superficial things that stood out.
Also, test are failing. Please take a look at the failed CI runs. I think you must have removed some imports that are actually used.
Two more things:
|
@cameel Thanks for the review, I have made another commit addressing your initial issues so hopefully this fixes any issues! |
solc/CommandLineInterface.cpp
Outdated
@@ -931,45 +932,61 @@ void CommandLineInterface::link() | |||
// be just the cropped or '_'-padded library name, but this changed to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this comment any more, at least not at this point in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was still relevant but looking closer at the PR now, it removes support for old-style placeholders and without it it's not necessary.
Are we fine with that though? Modern compiler versions do not produce them so I'd be fine with that but it's technically a breaking change.
solc/CommandLineInterface.cpp
Outdated
continue; | ||
|
||
size_t bytecodeIndex; | ||
// Bytecode can be on line 0 or 3, depending on if file header is included |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to account for the output formatting of the non-file-based commandline interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally yes, because this is required to allow the tests to pass although, I'm away that it's not the most elegant solution. I also considered using a regex to find the index instead of hard coding it but, this seemed simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd do one of these here:
- Assume a rigid file structure and validate it more strictly. For example when you skip files shorter than 3 lines, verify that these lines do not contain the bytecode (i.e. a string consisting only of hex chars without whitespace and possibly placeholders between them)
- Detect the bytecode and ignore everything else. This will be more robust against things like someone just taking the compiler output (with multiple bytecode pieces in it), putting it in a file and running the linker on it.
The current version is neither of these.
Given how the linker has been working so far (basically accepting anything, regardless of it's bytecode or not), I think (2) would be better.
solc/CommandLineInterface.cpp
Outdated
continue; | ||
|
||
size_t bytecodeIndex; | ||
// Bytecode can be on line 0 or 3, depending on if file header is included |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd do one of these here:
- Assume a rigid file structure and validate it more strictly. For example when you skip files shorter than 3 lines, verify that these lines do not contain the bytecode (i.e. a string consisting only of hex chars without whitespace and possibly placeholders between them)
- Detect the bytecode and ignore everything else. This will be more robust against things like someone just taking the compiler output (with multiple bytecode pieces in it), putting it in a file and running the linker on it.
The current version is neither of these.
Given how the linker has been working so far (basically accepting anything, regardless of it's bytecode or not), I think (2) would be better.
solc/CommandLineInterface.cpp
Outdated
if ( | ||
end - it < placeholderSize || | ||
*(it + 1) != '_' || | ||
*(it + placeholderSize - 2) != '_' || | ||
*(it + placeholderSize - 1) != '_' | ||
) | ||
solThrow( | ||
CommandLineExecutionError, | ||
"Error in binary object file " + src.first + " at position " + to_string(it - src.second.begin()) + "\n" + | ||
'"' + string(it, it + min(placeholderSize, static_cast<int>(end - it))) + "\" is not a valid link reference." | ||
); | ||
|
||
string foundPlaceholder(it, it + placeholderSize); | ||
if (librariesReplacements.count(foundPlaceholder)) | ||
string const& name = library.first; | ||
// Bytecode we need is on the third line there's a blank line at the start | ||
bytecode = sourceLines.at(bytecodeIndex); | ||
size_t libraryStartOffset = bytecode.find("__$"); | ||
|
||
for (size_t i = 0; i < placeholderSize; i++) | ||
bytecode[libraryStartOffset + i] = '0'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code used to throw on invalid placeholders. Now it just looks for __$
and assumes that anything it founds must be a valid placeholder. But it could be something else ($__
missing) or it could be truncated and you'll go past the end of the bytecode
array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also reminds me that we have no tests covering this code at all. This is because we can't use our usual command-line tests for this - linker is a bit unusual compared to other input modes in that it modifies files in place and never prints them to stdout.
Recently test/solc/CommandLineInterface.cpp was added though so we could test it there. It would be something similar to cli_input
test but with different options and would need to create input files and then read output from created ones (we have helpers for most of this, see other tests in the file).
Or we could refactor link()
into a static method (taking libraries
and sourceCodes
as arguments) and then test that method. This would make the test much simpler.
In any case, we need at least three tests: input with placeholders, input with no placeholders, input with broken placeholders.
solc/CommandLineInterface.cpp
Outdated
} | ||
|
||
evmasm::LinkerObject linkerObject= { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
evmasm::LinkerObject linkerObject= { | |
evmasm::LinkerObject linkerObject = { |
solc/CommandLineInterface.cpp
Outdated
map<u256, std::pair<std::string, std::vector<size_t>>>(), | ||
std::map<std::string, evmasm::LinkerObject::FunctionDebugData>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should be able to use empty initializer lists here and the compiler should infer the types:
map<u256, std::pair<std::string, std::vector<size_t>>>(), | |
std::map<std::string, evmasm::LinkerObject::FunctionDebugData>() | |
{}, | |
{} |
solc/CommandLineInterface.cpp
Outdated
linkerObject.link(librariesReplacements); | ||
sourceLines.at(bytecodeIndex) = toHex(linkerObject.bytecode); | ||
|
||
std::string resolvedBytecode = boost::algorithm::join(sourceLines, "\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string resolvedBytecode = boost::algorithm::join(sourceLines, "\n"); | |
string resolvedBytecode = boost::algorithm::join(sourceLines, "\n"); |
solc/CommandLineInterface.cpp
Outdated
else | ||
serr() << "Reference \"" << foundPlaceholder << "\" in file \"" << src.first << "\" still unresolved." << endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this being removed?
solc/CommandLineInterface.cpp
Outdated
@@ -931,45 +932,61 @@ void CommandLineInterface::link() | |||
// be just the cropped or '_'-padded library name, but this changed to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was still relevant but looking closer at the PR now, it removes support for old-style placeholders and without it it's not necessary.
Are we fine with that though? Modern compiler versions do not produce them so I'd be fine with that but it's technically a breaking change.
I added the recommended changes and I've added in some unit tests to cover this functionality too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's more feedback. Sorry for a ton of comments but most of these are just small style corrections, improvement suggestions and error message tweaks. Here's a summary of the more important ones:
- More test cases needed.
- This removes support for old-style placeholders.
- This changes how newlines are handled (they're now always rewritten to
\n
). - Maybe we should allow leading/trailing whitespace on bytecode lines as the old implementation did?
solc/CommandLineInterface.cpp
Outdated
@@ -653,7 +654,8 @@ void CommandLineInterface::processInput() | |||
assemble(m_options.assembly.inputLanguage, m_options.assembly.targetMachine); | |||
break; | |||
case InputMode::Linker: | |||
link(); | |||
solAssert(m_options.input.mode == InputMode::Linker, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have recently made the description optional in asserts:
solAssert(m_options.input.mode == InputMode::Linker, ""); | |
solAssert(m_options.input.mode == InputMode::Linker); |
solc/CommandLineInterface.cpp
Outdated
{ | ||
solAssert(m_options.input.mode == InputMode::Linker, ""); | ||
return isxdigit(c) || c == '_' || c == '$'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, we have recently added wrappers for functions like isdigit()
to ensure we always use the C locale (#12740). I see from the docs that isxdigit()
is not affected by locale though so I guess using it directly is fine.
solc/CommandLineInterface.cpp
Outdated
evmasm::LinkerObject linkerObject = { | ||
fromHex(bytecode), | ||
linkReferences, | ||
{}, | ||
{} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
evmasm::LinkerObject linkerObject = { | |
fromHex(bytecode), | |
linkReferences, | |
{}, | |
{} | |
}; | |
evmasm::LinkerObject linkerObject = {fromHex(bytecode), linkReferences, {}, {}}; |
solc/CommandLineInterface.cpp
Outdated
solThrow( | ||
CommandLineExecutionError, | ||
"Error in binary object file " + src.first + " unbounded placeholder without start point __$ at " + to_string(libraryEndOffset) + "\n" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
solThrow( | |
CommandLineExecutionError, | |
"Error in binary object file " + src.first + " unbounded placeholder without start point __$ at " + to_string(libraryEndOffset) + "\n" | |
); | |
solThrow( | |
CommandLineExecutionError, | |
"Error in binary object file " + src.first + " unbounded placeholder without start point __$ at " + to_string(libraryEndOffset) + "\n" | |
); |
solc/CommandLineInterface.cpp
Outdated
if (startMarkerFound && endMarkerFound) | ||
{ | ||
if ((libraryEndOffset - libraryStartOffset) != placeholderSize - 3) | ||
{ | ||
solThrow( | ||
CommandLineExecutionError, | ||
"Error in binary object file " + src.first + " truncated placeholder found at " + to_string(libraryStartOffset) + | ||
" placeholder addresses should be " + to_string(placeholderSize) + " characters long (including address markers __$ and $__)\n" | ||
); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need braces if there's just one instruction inside if
or for
. The code is more concise without them:
if (startMarkerFound && endMarkerFound) | |
{ | |
if ((libraryEndOffset - libraryStartOffset) != placeholderSize - 3) | |
{ | |
solThrow( | |
CommandLineExecutionError, | |
"Error in binary object file " + src.first + " truncated placeholder found at " + to_string(libraryStartOffset) + | |
" placeholder addresses should be " + to_string(placeholderSize) + " characters long (including address markers __$ and $__)\n" | |
); | |
} | |
} | |
if (startMarkerFound && endMarkerFound) | |
if ((libraryEndOffset - libraryStartOffset) != placeholderSize - 3) | |
solThrow( | |
CommandLineExecutionError, | |
"Error in binary object file " + src.first + " truncated placeholder found at " + to_string(libraryStartOffset) + | |
" placeholder addresses should be " + to_string(placeholderSize) + " characters long (including address markers __$ and $__)\n" | |
); |
There are also a few other places like this in the PR.
test/solc/CommandLineInterface.cpp
Outdated
|
||
CommandLineInterface::link(fileReader, libraries, hasError, cerr); | ||
BOOST_TEST(!hasError); | ||
BOOST_TEST(fileReader.sourceUnits().find("bytecodeInput.bin")->second.find(hashString) != string::npos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just compare the result directly? Less chance of false positives by mistake and it's not that long anyway:
BOOST_TEST(fileReader.sourceUnits().find("bytecodeInput.bin")->second.find(hashString) != string::npos); | |
map<string, string> expectedSources = {{ | |
"bytecodeInput.bin", | |
"565b6040516100579190610158565b60405180910390f35b600073" + hashString + "63b3de648b8360405" | |
}}; | |
BOOST_TEST(fileReader.sourceUnits() == expectedSources); |
test/solc/CommandLineInterface.cpp
Outdated
fileReader.setSourceUnits({ | ||
{"bytecodeInput.bin", "565b6040516100579190610158565b60405180910390f35b600073$__b3157e226f2c4dddae135e6cab4ed4e74763b3de648b8360405"} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fileReader.setSourceUnits({ | |
{"bytecodeInput.bin", "565b6040516100579190610158565b60405180910390f35b600073$__b3157e226f2c4dddae135e6cab4ed4e74763b3de648b8360405"} | |
}); | |
fileReader.setSourceUnits({ | |
{"bytecodeInput.bin", "565b6040516100579190610158565b60405180910390f35b600073$__b3157e226f2c4dddae135e6cab4ed4e74763b3de648b8360405"} | |
}); |
test/solc/CommandLineInterface.cpp
Outdated
|
||
bool hasError = false; | ||
|
||
string expectedMessage ="Error in binary object file bytecodeInput.bin unbounded placeholder without start point __$ at 54\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string expectedMessage ="Error in binary object file bytecodeInput.bin unbounded placeholder without start point __$ at 54\n"; | |
string expectedMessage = "Error in binary object file bytecodeInput.bin unbounded placeholder without start point __$ at 54\n"; |
{ | ||
frontend::FileReader fileReader({},{},{}); | ||
fileReader.setSourceUnits({ | ||
{"bytecodeInput.bin", "565b6040516100579190610158565b60405180910390f35b600073__$b3157e226f2c4dddae135e6cab4ed4e74763b3de648b8360405"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a few more test cases? I think that the code might not work correctly with some of them:
- Old-style placeholder:
565b604051610057919__lib.sol:L_____________________________48b8360405
- Two starts, single end:
565b6040516100579190__$b36__$b3157e226f2c4dddae135e6cab4ed4e747$__63b3de648b8360405
- Placeholder in a placeholder:
565b6040516100579190__$b36__$b3157e226f2c4dddae135e6cab4ed4e747$__63b3de$__648b8360405
- Double-sided marker:
I think this was valid in the old implementation and would be treated as an old-style placeholder.
565b604051610057919__$__157e226f2c4dddae135e6cab4ed4e7__$__48b8360405
- Marker without
$
:Also technically a valid old-style placeholder.565b604051610057919___b3157e226f2c4dddae135e6cab4ed4e747___48b8360405
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be supporting the placeholder in a placeholder or double-sided marker cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The double-sized one could be a valid old-style placeholder so yes, unless we drop support for old-style placeholders. The __
on the inside basically counts as part of the library identifier in that case.
But I'm mentioning that case mostly because it's a corner case that looks like it could easily confuse regex based detection.
As for nested ones, if the outer __
pair has the correct distance between markers, then again, it could be a valid old-style placeholder. But if it's the inner one that has the right length then the outer one should be treated as unmatched markers and result in an error.
solc/CommandLineInterface.cpp
Outdated
static bool isBytecodeChar(char c) | ||
{ | ||
solAssert(m_options.input.mode == InputMode::Linker, ""); | ||
return isxdigit(c) || c == '_' || c == '$'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the detection will unfortunately need to be a bit more complex to handle old-style placeholders since they could contain arbitrary chars. So it would be something like this on the outside of __$
and $__
and anything in between.
Maybe let's try this:
- The line has to contain
__
. - All chars before the first and after the last
__
must be valid hex chars. - There can be anything between the first
__
and last__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless we can make our lives easier and actually drop support for these old-style placeholders.
@chriseth Would that be acceptable without a breaking release?
By the way, please don't get discouraged by the comments, I think this is overall going in the right direction :) Especially I like that we now will have this covered with some tests! |
@cameel No probblem, i understand, i'll get these changes done, always worth doing right! |
Seems to be waiting for review @cameel |
@leonardoalt yeah, I'm happy with it so just waiting for review now. |
@cameel Are you happy for me to merge this one now? It's been open for a while just wanted to see if you wanted any more changes. |
@bmoo0 Hello, sorry for not going back to this one sooner but there's just been too much happening. We still have a huge heap of PRs to get through and on top of that right now most of the team (including me) is still at Devconnect. So for now, just a few short comments. I discussed this with @chriseth earlier and the problem is really that this refactor complicates the mechanism more than we expected it would. There were two major conclusions from that discussion:
What do you think? If you agree with these changes I could try to go over the PR again and give some hints on what specifically should be carried over and what would not be necessary when we make the change breaking. |
Sounds good to me. I can actually break it out into two PRs if that's better. One containing the tests branched from the develop branch and the other adding the breaking change which can stem from the breaking branch, if that makes things simpler. |
specify that arguments are encoded using abi.encode in the example code
…tructions before simple terminations.
Great! Tests should go into |
It seems the PR needs a rebase to resolve the current conflicts. |
@@ -918,12 +917,64 @@ void CommandLineInterface::serveLSP() | |||
solThrow(CommandLineExecutionError, "LSP terminated abnormally."); | |||
} | |||
|
|||
static bool isBytecodeChar(char c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think keeping this helper may still be nice.
{ | ||
return isxdigit(c) || c == '_' || c == '$'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR, but:
isxdigit(std::locale) | checks if a character is classified as a hexadecimal digit by a locale(function template)
Perhaps we'd be better off with just a manual check and could make it constexpr
that case.
Since this PR seems stale is going to be split up anyway, I'm closing for now. The code is still available in the branch. |
Hey sorry I was away for awhile count' work on this. I think the tests can still be merged if you think they're useful I just accidentally pulled too many changes into my branch and need to look up how to only merge my new tests. |
I have refactored solc/CommandLineInterface.cpp Fixes #10308