Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent bytecode created for identical contract input using standard-json compiler input #9573

Closed
epheph opened this issue Aug 5, 2020 · 27 comments · Fixed by #9672
Closed
Labels

Comments

@epheph
Copy link

epheph commented Aug 5, 2020

Description

During deployment of Augur, one of the contracts would not verify on Etherscan properly: the deployment process created one bytecode for "ZeroXTrade", while giving the Standard Json to Etherscan and having them compile created bytecode with VERY slightly different bytecode (of different lengths!).

Due to past issues with contract verification, Augur flattens all contracts prior to deployment. The Standard Json that is provided to solidity lists many contracts to be compiled, but all of them are completely flattened and self-contained (with significant code-duplication between them).

The issue is that the bytecode created for a self-contained contract changes depending on what OTHER self-contained contracts are compiled at the same time. In the example below, I have linked two different inputs that are identical except for an additional self-contained contract being compiled at the same time which demonstrates this issue.

Environment

  • Compiler version: 0.5.17
  • Target EVM version (as per compiler settings): 0.5.17
  • Framework/IDE (e.g. Truffle or Remix): n/a
  • EVM execution environment / backend / blockchain client:
  • Operating system: Linux

Steps to Reproduce

standard-json examples that show the issue:

https://gist.github.com/epheph/20428a1ff7163bce1d47be7d7b623344

The only difference between these two files is the "Augur.sol" section (which jq is discarding the output of).

If you run each of these through solc 0.5.17:

$ cat 1-contract.json  | solc-0.5.17 --standard-json \
      | jq -r '.contracts["trading/ZeroXTrade.sol"].ZeroXTrade.evm.bytecode.object' | egrep -o '^.{64}'
60806040526000805461ffff1916905534801561001b57600080fd5b506147c2

$ cat 2-contracts.json  | solc-0.5.17 --standard-json \
     | jq -r '.contracts["trading/ZeroXTrade.sol"].ZeroXTrade.evm.bytecode.object' | egrep -o '^.{64}'
60806040526000805461ffff1916905534801561001b57600080fd5b506147d1

Look at the last characters, c2 vs d1. There are many other slight differences, but this is the first.

ZeroXTrade.sol makes NO reference to Augur.sol, so the fact that Augur.sol is compiled in the same solidity run should not change the ZeroXTrade's bytecode.

/cc @nuevoalex

@chriseth
Copy link
Contributor

chriseth commented Aug 5, 2020

Thanks for reporting this!

Looking at the assembly output, two things are interesting:

  • the source references to internal routines are different, which means that the internal routines themselves are probably different
  • the differences in generated code look like different decisions being made in the optimizer

@chriseth
Copy link
Contributor

chriseth commented Aug 5, 2020

To fix this properly, we have to reset the yul string repository between contracts, which would invalidate the inline assembly part of the AST which is stored in parsed form and thus uses yul strings.

@chriseth
Copy link
Contributor

chriseth commented Aug 5, 2020

Actually the difference is only in the internal routines, so resetting the yul string repository might be all that is needed.

@chriseth
Copy link
Contributor

chriseth commented Aug 5, 2020

It turns out that the problem is that AST IDs are encoded into function names which leads to the optimizing handling them in a different order resulting in a different way functions are inlined.

@chriseth
Copy link
Contributor

chriseth commented Aug 5, 2020

Current plan of attack:

  • rename all identifiers to look nicer (we wanted to do that anyway) and remove AST IDs in the process
  • modify the function inliner such that it inlines functions in an order that does not depend (very much) on the names of the functions: Start with leaves of the call graph ( Handle "leaf functions" first in full inliner. #9314 ) and use the block hasher to make it deterministic

@alcuadrado
Copy link
Member

This bug has really big implications for Buidler's new compilation pipeline. We are now forced to disable some optimizations we've been working on.

Can someone point to the first version of solc with this problem? That would be incredibly valuable.

@chriseth
Copy link
Contributor

The problem should be present whenever you have some Yul code and the yul optimizer is activated. It got activated together with the regular optimizer starting from 0.6.0. You have Yul code whenever you use ABIEncoderV2. I think there are also some other situations here yul code is generated, but it might not have that problem.

Actually I think this only happens if you have multiple files in one compilation unit that contain structs or contracts with the same name. So if you don't just flatten everything ( ;) ), you should be rather safe. The reason is that the names of the Yul functions always have the data type name and its AST ID. If the data type name is already unique, a change in the AST ID should not matter much.

@alcuadrado
Copy link
Member

alcuadrado commented Aug 23, 2020

So if you don't just flatten everything ( ;) ), you should be rather safe.

I think that's not always true. I believe our new compilation pipeline can easily trigger this bug.

Here's an example of what we were planning to do, and why it would fail:

  • Suppose you have files A.sol, B.sol and C.sol. B.sol imports A.sol.
  • Starting with a clean project, you'd compile A.sol, B.sol and C.sol together.
  • Then, you modify B.sol and it gets compiled alongside A.sol.
  • The user then deploys each a contract from B.sol, and forgets to verify it, nor is willing to commit the compilation output to git. Note that this is extremely common. I can't stress enough how common this is.
  • For some reason, like migrating to another pc, the project's cache/artifacts get cleared.
  • The user now wants to verify the contract from B.sol, so they compile the project. i.e. A.sol, B.sol and C.sol together.
  • B.sol's output may be different, and if so, the verification would fail.

@chriseth
Copy link
Contributor

@alcuadrado maybe I didn't understand, but are you saying that a user deploys a contract from a source that is not stored permanently and then cannot verify the contract? If this is so extremely common, then deployment tools should make sure they only work from a clean git repo that is synced to origin or even better, store the sources and the metadata somewhere.

Or did I misunderstand you - I don't see how this is related to importing or flattening, for example.

@fvictorio
Copy link
Contributor

@chriseth the sources (meaning, the Solidity files) are stored permanently. What is not stored permanently is the JSON input used by solc. In the first scenario, this JSON had the contracts [A, B, C], but in the second scenario it only had [B, C].

The point is that, because of this bug, this optimization (ignoring A in a re-compilation because it doesn't affect B and C) cannot be done. Does that make sense?

@chriseth
Copy link
Contributor

@fvictorio as I said, my assumption is that this bug does not manifest itself as long as the files do not contain structs or contracts of the same name, but I will try to verify this assumption.

In general, the optimizer settings, file names, remappings and so on are crucial for verification and thus have to be stored. It would be great if deployment tools could help store the metadata file. If the metadata file is not requested as one of the output artefacts, then it is very much more difficult to source-verify a contract.

@fvictorio
Copy link
Contributor

Oh, yes, I forgot to add that detail: in that example, files A and B have contracts with the same name (and optimization is enabled, etc.) I was able to reproduce this bug that way.

So the thing we want to do is to be able to make the optimization of only compiling B and C if they were modified but A wasn't. With this bug, we won't be able to do that: we'll always need to include A even if it's not necessary. I think the point @alcuadrado was making is that this is an example of this bug manifesting itself while there's no flattening involved and limiting a potentially useful feature.

I agree that this, in theory, this bug could be worked around by the deployment tool, but that doesn't change our situation with respect to the compilation pipeline.

@chriseth
Copy link
Contributor

@fvictorio ok, you are right, I confused the flattening in the initial example with the main problem of two files sharing identifiers of the same name. I assume that if you don't flatten then this will be a very rare case, but I guess we'll just fix this issue and then we are all happy - as long as everyone uses the latest version of Solidity :)

@chriseth
Copy link
Contributor

Disabling the FullInliner results in the same bytecode, so it might be that we only have to fix that component.

@ekpyron I think using the blockHasher does not really help here because it will consider two functions to be different if they call two different but equivalent functions.

@chriseth
Copy link
Contributor

It turns out the actual fix is much easier: The function inliner processes functions according to their YulString sorting order. If they are processed based on their name (or position in the source, which is equivalent) instead, the bug is gone, and even the metadata is fully equal.

@fvictorio
Copy link
Contributor

@chriseth can we assume that this bug will never happen if the optimizer is disabled?

fvictorio added a commit to NomicFoundation/hardhat that referenced this issue Aug 25, 2020
See issue ethereum/solidity#9573
When the solc optimizer is enabled, there might be inconsistencies in
the generated bytecodes. To avoid this we need to make sure that the
files involved are always the same for a given group. We don't need to
do this if the solc optimizer is disabled.
@chriseth
Copy link
Contributor

@fvictorio this specific bug needs the yul optimizer to be enabled.

@chriseth
Copy link
Contributor

It turns out that if you update the example code to Solidity 0.7.0, the bug is not present. I compared the yul code generated and it seems that there are some functions that are duplicated it 0.5.17: One for "memory" and one for "memory pointer", while the distinction between "pointer" or "not pointer" is only relevant for storage and thus it was removed for memory at some point.

I implemented and verified a fix, but we cannot test it due to lack of example input.

@chriseth
Copy link
Contributor

If someone encounters an example that fails for 0.7.0, please report it!

@gitpusha
Copy link
Contributor

Hi. This bug caused me a lot of trouble recently when I tried to verify some contracts that a colleague had compiled using solc-js 0.6.10 and deployed. I understand that this problem should be fixed with solidity 0.7.x. @chriseth - concerning my previously deployed contracts, should I have any other (security?) concerns with regards to this bug other than that verification might be difficult (if we havent stored the metadata)?

@chriseth
Copy link
Contributor

The cause of the bug was just that optimizer steps were applied in a different order to the individual functions, so no reason for concern.

@elenadimitrova
Copy link
Collaborator

Leaving a note here that this affects Argent wallet contracts that are on solc 0.6.12 and using ABIEncoderV2 pragma. When compiling these as part of the overall set of contracts they generate different bytecode than when compiled on their own. Example bytecode differences for ApprovedTransfer contract:

Compiled as part of larger contracts set
https://gist.github.com/elenadimitrova/9073bc073ee72a7264cafe2982823f6b#file-approvedtransfer_bytecode_compiledwithall

Compiled alone
https://gist.github.com/elenadimitrova/4f58d3795d009a05be2c2f56ca77d97c#file-approvedtransfer_bytecode_compiledalone

Affected contracts are: ApprovedTransfer, TokenExchanger, TransferManager, LimitStorage and RelayerManager.
By compiling ApprovedTransfer and RelayerManager independently we've managed to produce the same bytecode as Etherscan for verification however this wasn't successful for TransferManager and TokenExchanger contracts which makes me think there is a larger issue here. I'd like this issue to be reopened and addressed as part of an update to 0.6 (as it is not an issue with 0.7).

@vittominacori
Copy link

Hi @fvictorio @alcuadrado is this issue solved?
I always had that issue and it means that after developing with hardhat I must compile contracts with truffle before deploy otherwise I won't be able to verify them on etherscan.

For instance if I compile SampleContract in this repository I obtain the following (different) bytecode:

Is there any hardhat configuration I'm missing?

@scnale
Copy link

scnale commented Dec 20, 2022

I don't think you can reasonably expect hardhat and truffle to produce the same bytecode since they both implement their own build pipeline. I.e. they define the standard input JSON that is passed to the compiler, which is not necessarily the same.

@fvictorio
Copy link
Contributor

@vittominacori from what I can tell, the difference is just in the metadata (the last bytes) generated by Hardhat and Truffle. I don't quite understand why they are different though. (Btw, I obtained this by actually compiling your project, because the gist files you shared are super different in length, that can't be right.)

This issue is about a difference in the executable part of the bytecode, so I don't think you are running into a regression.

@vittominacori
Copy link

@vittominacori from what I can tell, the difference is just in the metadata (the last bytes) generated by Hardhat and Truffle. I don't quite understand why they are different though. (Btw, I obtained this by actually compiling your project, because the gist files you shared are super different in length, that can't be right.)

This issue is about a difference in the executable part of the bytecode, so I don't think you are running into a regression.

Yes, sorry. Maybe I had old node modules installed.
I've just executed again with a fresh install and I updated the gist files.
They are still different but now I'm able to verify code on etherscan. Thanks

Here the env

  • Hardhat 2.12.4
  • Truffle v5.7.0 (core: 5.7.0)
  • Ganache v7.5.0
  • Solidity - 0.8.17 (solc-js)
  • Node v16.18.1
  • npm 8.19.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants