Improve tree shaking for smaller bundle sizes #3446
Comments
Thanks, this is a fantastic analysis 🤩 🙏, we'll definitely come back to this! Yes, you are totally right, we haven't optimized for tree shaking at all yet. It is a big topic we already had on the radar some time ago but somewhat passed over. It fits super well right now though: we have it on our plate anyhow to continue improving general UX and browser UX in particular. At the same time our breaking release round is not too far away; we'll likely reserve some 6 weeks or so in August/September (to not have to deal with two work branches in parallel for too long). I guess this would be a really good fit to take up during that time, ideally with some preparation before, since it sounds like this might need some heavy refactoring. We will need to decide how far we want to go here and on what; these are likely separate, distinct decisions (e.g. around static constructors), weighing in each case whether backwards compatibility or tree shaking weighs stronger. Right now I tend to be willing, at least theoretically, to go for some somewhat heavier "shake ups" (so: substantially refactor here and there), since this is so fundamental that it will have a very lasting effect on the library quality if we do this - even somewhat - right (so: e.g. "harvest" 70% of the potential in the first round). Anyhow: happy to stick to this - also mid-term - on the sideline. Will do some first experiments during next week! Again: thanks, the write-up above is a great starting point! ❤️ 🙂 |
Thanks again, I am now through with reading a second time on a deeper level. I will open up separate issues on things I find worth pursuing and take it from there! 😃 |
I would also conceptualize this kind of work as being (for the most part) breaking, have therefore added a |
Worth noting as well: there have already been two issues on this topic over the years, #1673 from 2022 and #2718 from 2023, the latter from Paul with some valuable guidance on how to specifically test with resulting build/bundle sizes with |
Making it breaking simplifies some of the suggestions here as well. A few of them attempt to not be breaking but are simpler if they can just be breaking |
If I do not misread our code or have a wrong understanding of "dependency injection", I would think this is what we do with KZG, see here. Or did we miss a code part where we do not apply this? Then let us know!
The EIP topic is definitely a big one! At first thought I was relatively enthusiastic about the topic and had some idea for a proposal to do a three inheritance EVM expansion like: Some relativization came after looking closer into EIP-specific code parts by searching mainly for There it turns out that - for the very most part - EIPs just add functionality and do not replace it (there are exceptions but they are rare). So for the most part it is So after having a closer look I very much have the impression that the win is small and not worth the effort, especially since EIPs additionally need a lot of context from the EVM object or local context from the methods the code is running within. And there seems to be little gain in doing this optimization specifically for older HFs. We do have some level of "problem" for future EIPs right now (mainly Verkle and EOF), in that we merge a relatively large amount of code early on which is rather for research/early testing and not yet for production. Here a If we stay on this topic (EIP integration/separation in EVM) it might still be worth opening a dedicated issue; how to do this in a clean way is generally a somewhat larger topic and there might be some gains "beyond bundle size". Not super sure if we want to prioritize this though, it would need a really solid proposal if we want to pursue it. |
I mixed up kzg wasm with rustbn-wasm |
In addition to what @roninjin10 described, Without More info: https://webpack.js.org/guides/tree-shaking/#mark-the-file-as-side-effect-free |
Thanks, have given this its own issue since it seems pretty important to me that we don't overlook it! |
Compiling a list of the current code and bundle sizes of our packages so that we have a reference later on.
Method
This is using code like below to do a plain import of a respective library and create a bundle with
Note that this testing method does not capture all practical use cases things should eventually be optimized for (at least for some prominent ones) and only provides a rough overview picture as well as some "benchmark" to later compare against!
npm i -g esbuild
cd packages/trie
Create a file
import { Trie } from '@ethereumjs/trie'
const t = new Trie()
console.log(t.root())
The bundle can then be (re-)created with:
npm run build && esbuild --bundle t1.mjs --outfile=out1.mjs
Note: this build command is giving out CJS code I only discovered later on, adding
Note2: for the higher level packages (Blockchain and above) using
Results
RLP
import { RLP } from '@ethereumjs/rlp'
const nestedList = [[], [[]], [[], [[]]]]
const encoded = RLP.encode(nestedList)
const decoded = RLP.decode(encoded)
console.log(decoded)
Size: 6.1 KB
Util
import { hexToBytes } from '@ethereumjs/util'
console.log(hexToBytes('0x1234'))
Size: 73.6 KB
Quick additional test copy-pasting together all examples we have referenced in the "Usage" section of the README https://github.com/ethereumjs/ethereumjs-monorepo/tree/master/packages/util#usage :
Size: 105.9 KB (so this might still leave some room for optimizations, but overall not such a dramatic increase if Util is used more broadly)
Common
import { Chain, Common, Hardfork } from '@ethereumjs/common'
const c = new Common({ chain: Chain.Mainnet, hardfork: Hardfork.Prague })
console.log(c.chainId())
Size: 172.6 KB
Tx
import { Chain, Common, Hardfork } from '@ethereumjs/common'
import { FeeMarketEIP1559Transaction } from '@ethereumjs/tx'
import { bytesToHex } from '@ethereumjs/util'
const common = new Common({ chain: Chain.Mainnet, hardfork: Hardfork.London })
const txData = {
data: '0x1a8451e600000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
gasLimit: '0x02625a00',
maxPriorityFeePerGas: '0x01',
maxFeePerGas: '0xff',
nonce: '0x00',
to: '0xcccccccccccccccccccccccccccccccccccccccc',
value: '0x0186a0',
v: '0x01',
r: '0xafb6e247b1c490e284053c87ab5f6b59e219d51f743f7a4d83e400782bc7e4b9',
s: '0x479a268e0e0acd4de3f1e28e4fac2a6b32a4195e8dfa9d19147abe8807aa6f64',
chainId: '0x01',
accessList: [],
type: '0x02',
}
const tx = FeeMarketEIP1559Transaction.fromTxData(txData, { common })
console.log(bytesToHex(tx.hash())) // 0x6f9ef69ccb1de1aea64e511efd6542541008ced321887937c95b03779358ec8a
Size: 214.0 KB
Blob example from README with KZG:
Size: 633.6 KB
Trie
import { Trie } from '@ethereumjs/trie'
import { bytesToUtf8, MapDB, utf8ToBytes } from '@ethereumjs/util'
async function test() {
const trie = await Trie.create({ db: new MapDB() })
await trie.put(utf8ToBytes('test'), utf8ToBytes('one'))
const value = await trie.get(utf8ToBytes('test'))
console.log(value ? bytesToUtf8(value) : 'not found') // 'one'
}
test()
Size: 300.8 KB
Block
import { Block } from '@ethereumjs/block'
import { Chain, Common, Hardfork } from '@ethereumjs/common'
const common = new Common({ chain: Chain.Mainnet, hardfork: Hardfork.London })
const block = Block.fromBlockData(
{
header: {
baseFeePerGas: BigInt(10),
gasLimit: BigInt(100),
gasUsed: BigInt(60),
},
},
{ common }
)
// Base fee will increase for next block since the
// gas used is greater than half the gas limit
console.log(Number(block.header.calcNextBaseFee())) // 11
// So for creating a block with a matching base fee in a certain
// chain context you can do:
const blockWithMatchingBaseFee = Block.fromBlockData(
{
header: {
baseFeePerGas: block.header.calcNextBaseFee(),
gasLimit: BigInt(100),
gasUsed: BigInt(60),
},
},
{ common }
)
console.log(Number(blockWithMatchingBaseFee.header.baseFeePerGas)) // 11
Size: 547.6 KB (also tested with a tx included but this (astonishingly) does not make a significant difference, only < 1KB added)
Ethash
import { Ethash } from '@ethereumjs/ethash'
import { Block } from '@ethereumjs/block'
import { hexToBytes, MapDB } from '@ethereumjs/util'
const cacheDB = new MapDB()
const ethash = new Ethash(cacheDB)
const validblockRlp =
'0xf90667f905fba0a8d5b7a4793baaede98b5236954f634a0051842df6a252f6a80492fd888678bda01dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347948888f1f195afa192cfee860698584c030f4c9db1a0f93c8db1e931daa2e22e39b5d2da6fb4074e3d544094857608536155e3521bc1a0bb7495628f9160ddbcf6354380ee32c300d594e833caec3a428041a66e7bade1a0c7778a7376099ee2e5c455791c1885b5c361b95713fddcbe32d97fd01334d296b90100000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000000000000000008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000000000400000000000000000000000000000000000000000000000000000008302000001832fefd882560b84559c17b9b9040001020304050607080910111213141516171819202122232410000000000000000000200000000000000000003000000000000000000040000000000000000000500000000000000000006000000000000000000070000000000000000000800000000000000000009000000000000000000010000000000000000000100000000000000000002000000000000000000030000000000000000000400000000000000000005000000000000000000060000000000000000000700000000000000000008000000000000000000090000000000000000000100000000000000000001000000000000000000020000000000000000000300000000000000000004000000000000000000050000000000000000000600000000000000000007000000000000000000080000000000000000000900000000000000000001000000000000000000010000000000000000000200000000000000000003000000000000000000040000000000000000000500000000000000000006000000000000000000070000000000000000000800000000000000000009000000000000000000010000000000000000000100000000000000000002000000000000000000030000000000000000000400000000000000000005000000000000000000060000000000000000000700000000000000000008000000000000000000090000000000000000000100000000000000000001000000
000000000000020000000000000000000300000000000000000004000000000000000000050000000000000000000600000000000000000007000000000000000000080000000000000000000900000000000000000001000000000000000000010000000000000000000200000000000000000003000000000000000000040000000000000000000500000000000000000006000000000000000000070000000000000000000800000000000000000009000000000000000000010000000000000000000100000000000000000002000000000000000000030000000000000000000400000000000000000005000000000000000000060000000000000000000700000000000000000008000000000000000000090000000000000000000100000000000000000001000000000000000000020000000000000000000300000000000000000004000000000000000000050000000000000000000600000000000000000007000000000000000000080000000000000000000900000000000000000001000000000000000000010000000000000000000200000000000000000003000000000000000000040000000000000000000500000000000000000006000000000000000000070000000000000000000800000000000000000009000000000000000000010000000000000000000a09c7b47112a3afb385c12924bf6280d273c106eea7caeaf5131d8776f61056c148876ae05d46b58d1fff866f864800a82c35094095e7baea6a6c7c4c2dfeb977efac326af552d8785012a05f200801ba01d2c92cfaeb04e53acdff2b5d42005ff6aacdb0105e64eb8c30c273f445d2782a01e7d50ffce57840360c57d94977b8cdebde614da23e8d1e77dc07928763cfe21c0'
const validBlock = Block.fromRLPSerializedBlock(hexToBytes(validblockRlp), {
setHardfork: true,
skipConsensusFormatValidation: true,
})
const result = await ethash.verifyPOW(validBlock)
console.log(result) // => true
Size: 570.2 KB (!!!, another indication that the direct Ethash dependency for blockchain needs to be killed with fire and that a minimal but sufficient tree shaking refactor needs to be done!)
Blockchain
Using full example from README here:
Size: 653.8 KB
StateManager
import { DefaultStateManager } from '@ethereumjs/statemanager'
import { Account, Address, hexToBytes } from '@ethereumjs/util'
const main = async () => {
const stateManager = new DefaultStateManager()
const address = new Address(hexToBytes('0xa94f5374fce5edbc8e2a8697c15331677e6ebf0b'))
const account = new Account(BigInt(0), BigInt(1000))
await stateManager.checkpoint()
await stateManager.putAccount(address, account)
await stateManager.commit()
await stateManager.flush()
// Account at address 0xa94f5374fce5edbc8e2a8697c15331677e6ebf0b has balance 1000
console.log(
`Account at address ${address.toString()} has balance ${
(await stateManager.getAccount(address))?.balance
}`
)
}
main()
Size: 1.4 MB
Ok. Here we are getting to the dramatic level of things.
EVM
import { hexToBytes } from '@ethereumjs/util'
import { EVM } from '@ethereumjs/evm'
const main = async () => {
const evm = await EVM.create()
const res = await evm.runCode({ code: hexToBytes('0x6001') }) // PUSH1 01 -- simple bytecode to push 1 onto the stack
console.log(res.executionGasUsed) // 3n
}
main()
With the current code base this gives the following error:
This can be mitigated by going into the referenced
Size: 2.5 MB
Even more dramatic but also not too unexpected, also given the fact that the Blockchain dependency is in here (already).
Verkle
import { VerkleTree } from '@ethereumjs/verkle'
import { bytesToUtf8, utf8ToBytes } from '@ethereumjs/util'
const tree = new VerkleTree()
async function test() {
await tree.put(utf8ToBytes('test'), utf8ToBytes('one'))
const value = await tree.get(utf8ToBytes('test'))
console.log(value ? bytesToUtf8(value) : 'not found') // 'one'
}
test()
Size: 1015.4 KB 🤔
Note: Extremely important that this can be tree-shaken out or, maybe/likely even better, just does not get in by default in the first place!
VM
// ./examples/runTx.ts
import { Address } from '@ethereumjs/util'
import { Chain, Common, Hardfork } from '@ethereumjs/common'
import { LegacyTransaction } from '@ethereumjs/tx'
import { VM } from '@ethereumjs/vm'
const main = async () => {
const common = new Common({ chain: Chain.Mainnet, hardfork: Hardfork.Shanghai })
const vm = await VM.create({ common })
const tx = LegacyTransaction.fromTxData({
gasLimit: BigInt(21000),
gasPrice: BigInt(1000000000),
value: BigInt(1),
to: Address.zero(),
v: BigInt(37),
r: BigInt('62886504200765677832366398998081608852310526822767264927793100349258111544447'),
s: BigInt('21948396863567062449199529794141973192314514851405455194940751428901681436138'),
})
const res = await vm.runTx({ tx, skipBalance: true })
console.log(res.totalGasSpent) // 21000n - gas cost for simple ETH transfer
}
main()
Size: 2.8 MB
I'll spare some drama comments, but: yeah. 🙂
Ok, so first-round conclusion from this: there is a lot of work to do (a lot more than I anticipated actually), but it is extremely rewarding and beneficial, with potential gains to be realized basically all over the place. From some rough estimates for various measures and some grown intuition I would say that it should be possible to get near or below the 1 MB mark for the VM, and we should take that as a first-round goal and for orientation. |
I'm assuming the biggest culprit for EVM/VM/verkle is the WASM deps that are bundled with them. I don't think we ever escape it for |
I wouldn’t be shocked if we could get common down to 1/10 the size if we made it tree shakeable |
After this is finished, I think it makes sense to write detailed comparison in changelog / announcements. Something like:
As for:
The example shouldn't consume 73.6 KB - it should be like 500 bytes when tree-shaken properly. Something is not right |
Great example of how to find things that aren't tree shaking. Just 2 tips for debugging:
|
Hi @paulmillr, the numbers here might be a bit misleading, this is tree shaking "for the whole package" (for the most part, if not otherwise stated), so taking the So this is rather more useful for the higher level libraries, so e.g. for VM, to see how much dependencies and code from the lower level internal dependencies (Tx, Block, Trie, StateManager,...) are drawn in. So for Util e.g. if I point this to e.g. the example in Yes, we will for sure publish a before/after comparison once we are ready, atm we are just in the middle of the work! 🙂 |
(ah, don't be misled by the name of the bundle, this is meaningless and has nothing to do with evm) |
Ok, let's close this now! We have this topic totally on the radar now and most of this issue is realized in one way or the other! 🤩 Still worth watching out for opportunities for sure, to then be tackled as independent and standalone issues. |
(and I will for sure provide numbers along with the releases themselves (so: what we saved, bundle sizes before/after)) |
To make Ethereumjs better for frontend, it's important we maximize code splitting.
Overview
What is tree shaking?
Tree shaking is when a bundler such as vite or esbuild is able to detect and delete unused code.
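A minimal sketch of the kind of library module this refers to. The names `foo`, `bar` and `baz` are placeholders, and the `process.env.NODE_ENV` check assumes a bundler setup that replaces it with a string literal at build time (many setups do this, but it depends on the configuration):

```typescript
// Hypothetical library module, for illustration only.
// Declared locally so the sketch type-checks without Node typings.
declare const process: { env: Record<string, string | undefined> }

export function foo(): string {
  return 'foo'
}

export function bar(): string {
  return 'bar'
}

export function baz(): string {
  // If the bundler substitutes 'production' for process.env.NODE_ENV,
  // this condition becomes 'production' !== 'production', which is
  // always false, so the whole block can be removed from the bundle.
  if (process.env.NODE_ENV !== 'production') {
    console.log('baz called in a non-production build')
  }
  return 'baz'
}
```

All three functions are plain top-level exports without side effects at import time, which is what lets a bundler reason about each of them independently.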
If the above code is a library and we only import baz, a bundler will be smart enough to delete foo and bar as unused. It will also be able to detect production !== production and delete that if block.
When can a bundler tree shake
A bundler can only tree shake when it can safely and reliably know that code is unused. Notably, the following cannot be tree shaken:
code using require and module.exports rather than esm
All types treeshake, so you can use a type (for example if a class is dependency injected) as long as you don't provide a default.
Package analysis
We will go package by package analyzing tree shaking opportunities. The tool used to do this is bundle.js
@ethereumjs/common
Doesn't tree shake at all as we can tell from the specific import being same size as entire package
Unnecessary static methods
Problem: All static methods on the Common class cannot tree shake
Solution: Move static methods to exports
This cannot tree shake even if custom is completely unused
This would tree shake
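To illustrate the two variants, here is a hedged sketch. The class names, the `ChainParams` shape and the `createCustomCommon` helper are hypothetical and are not the actual `@ethereumjs/common` API:

```typescript
// Hypothetical, simplified shapes for illustration only.
export interface ChainParams {
  name: string
  chainId: number
}

// Variant A: factory as a static method. Importing CommonA always
// brings `custom` along, because it is a property of the class itself.
export class CommonA {
  constructor(public readonly params: ChainParams) {}

  static custom(overrides: Partial<ChainParams>): CommonA {
    return new CommonA({ name: 'custom', chainId: 1, ...overrides })
  }
}

// Variant B: the same factory as a standalone export. A bundler can
// drop createCustomCommon when no consumer imports it.
export class CommonB {
  constructor(public readonly params: ChainParams) {}
}

export function createCustomCommon(overrides: Partial<ChainParams>): CommonB {
  return new CommonB({ name: 'custom', chainId: 1, ...overrides })
}
```

A consumer who only ever calls `new CommonB(...)` would then pay nothing for the custom-chain factory.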
Class methods cannot treeshake
Problem: All class methods cannot be tree shaken
Solution: Consider moving class methods to a util function or separate class, especially if they are niche or unlikely to be used
Recommendation: I personally don't see problematic methods so I would skip this one
Individual chains and hardforks are not tree shakeable
Problem: many uses of Ethereumjs common will use a single chain or a single hardfork. But all chain or hardfork specific data is not tree shakable
Solution: Separate all chain data into a single chain config
This is IMO the biggest opportunity the common package has. example, another example
There are many solutions as long as the chain data ends up separate (never on the same class, never on the same object). One is an abstract class
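One possible shape, sketched under assumptions: instead of the abstract class mentioned above, this variant uses plain per-chain config objects passed into the constructor. `ChainConfig`, `SingleChainCommon` and the exported constants are hypothetical names, not the library's API:

```typescript
// Hypothetical config shape for illustration only.
export interface ChainConfig {
  name: string
  chainId: number
}

// Each chain is its own top-level export (ideally in its own module),
// so chains that are never imported can be dropped by the bundler.
export const Mainnet: ChainConfig = { name: 'mainnet', chainId: 1 }
export const Sepolia: ChainConfig = { name: 'sepolia', chainId: 11155111 }

// The class only knows about the one config it was given.
export class SingleChainCommon {
  constructor(private readonly chain: ChainConfig) {}

  chainId(): number {
    return this.chain.chainId
  }

  chainName(): string {
    return this.chain.name
  }
}

// A consumer that only needs mainnet never references Sepolia.
export const mainnetCommon = new SingleChainCommon(Mainnet)
```

The key property is that no single object or class holds all chains at once, which matches the "never on the same class, never on the same object" rule.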
@ethereumjs/evm
Does not tree shake at all
Class includes default implementations of dependency injection
Problem: Because the class instance includes a non tree shakeable default state manager and potentially other classes that are dependency injected, if a custom implementation is provided the unused default code is still included
Solution: Remove defaults in a new class called BaseEVM
The defaults are useful to reduce boilerplate so a pattern I propose is we start also creating more tree shakeable subclasses.
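A rough sketch of how such a split could look. `StateManagerLike`, `InMemoryStateManager` and `ConvenientEVM` are made-up names for illustration, and the real EVM has far more dependencies than the single one shown here:

```typescript
// Hypothetical minimal interface standing in for a state manager.
export interface StateManagerLike {
  getBalance(address: string): number
}

// Base class without defaults: it only depends on the interface (a type),
// so importing it does not pull in any concrete implementation.
export class BaseEVM {
  constructor(protected readonly stateManager: StateManagerLike) {}

  balanceOf(address: string): number {
    return this.stateManager.getBalance(address)
  }
}

// Stand-in for the (potentially large) default implementation.
export class InMemoryStateManager implements StateManagerLike {
  private balances: Record<string, number> = {}

  setBalance(address: string, balance: number): void {
    this.balances[address] = balance
  }

  getBalance(address: string): number {
    return this.balances[address] ?? 0
  }
}

// Convenience subclass that wires in the default to reduce boilerplate.
// Only consumers importing this class pay for InMemoryStateManager.
export class ConvenientEVM extends BaseEVM {
  constructor(stateManager: StateManagerLike = new InMemoryStateManager()) {
    super(stateManager)
  }
}
```

Users with a custom state manager would import `BaseEVM` directly, while everyone else keeps the one-line setup through the subclass.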
KZG related wasm doesn't tree shake
Problem: The KZG related wasm is huge and there are no options to replace with a mock or lazy load it
Solution: Dependency inject KZG
Since this is more verbose, this also fits with the pattern of having tree shakeable and convenient constructors separately
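As a hedged sketch of what injecting KZG could look like: the `KzgLike` interface, its single method and `KzgInjectedEVM` are assumptions for illustration only, and real KZG libraries expose a different and larger API:

```typescript
// Hypothetical, heavily simplified KZG interface.
export interface KzgLike {
  verifyProof(commitment: string, proof: string): boolean
}

export interface KzgEvmOptions {
  // Optional: consumers who never touch blobs do not need to provide it.
  kzg?: KzgLike
}

export class KzgInjectedEVM {
  constructor(private readonly opts: KzgEvmOptions = {}) {}

  verifyBlobProof(commitment: string, proof: string): boolean {
    if (this.opts.kzg === undefined) {
      throw new Error('KZG implementation not provided')
    }
    return this.opts.kzg.verifyProof(commitment, proof)
  }
}

// A lightweight mock, e.g. for tests, that avoids loading any wasm.
export const mockKzg: KzgLike = {
  verifyProof: () => true,
}
```

Because the class only references the interface, the wasm-backed implementation ends up in a bundle only when the consumer imports and passes it explicitly.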
Unused code paths based on common cannot tree shake
Problem: There is code that runs only if a specific EIP is enabled and they cannot tree shake
Solution: Use strategy pattern to inject these via the EIP
What the solution would look like is injecting a strategy function that is on the Common or hardfork or EIP object
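A small sketch of the strategy idea, under stated assumptions: `EipConfig`, `GasStrategy`, `computeGas` and the placeholder EIP with id 9999 are invented for illustration, and the gas adjustment is not real protocol logic:

```typescript
// Hypothetical strategy signature: takes the current gas, returns the adjusted gas.
export type GasStrategy = (gas: number) => number

// Hypothetical EIP object carrying its own behaviour.
export interface EipConfig {
  id: number
  adjustGas?: GasStrategy
}

// Placeholder EIP: the id and the surcharge are made up.
export const placeholderEip: EipConfig = {
  id: 9999,
  adjustGas: (gas) => gas + 100,
}

// The core code has no EIP-specific branches. It just applies whatever
// strategies the active EIPs bring along.
export function computeGas(baseGas: number, activeEips: EipConfig[]): number {
  return activeEips.reduce(
    (gas, eip) => (eip.adjustGas ? eip.adjustGas(gas) : gas),
    baseGas
  )
}
```

With this layout the EIP-specific code lives on the EIP object, so an EIP that is never imported contributes nothing to the bundle.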
Logging and performance related tools do not tree shake
Problem: Rarely used code related to things like debug logging and performance cannot be tree shaken.
Solution: Require them to be dependency injected in create so the code is tree shaken if unused
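A minimal sketch of injecting a logger through `create`. `Logger`, `CreateOpts`, `MiniEVM` and `consoleLogger` are hypothetical names, and the `run` method is a stand-in rather than real execution logic:

```typescript
// Hypothetical logger interface.
export interface Logger {
  debug(message: string): void
}

export interface CreateOpts {
  // Optional: if omitted, no logging code is referenced at all.
  logger?: Logger
}

export class MiniEVM {
  private constructor(private readonly logger?: Logger) {}

  static create(opts: CreateOpts = {}): MiniEVM {
    return new MiniEVM(opts.logger)
  }

  // Stand-in for execution: returns the number of bytes in the hex code.
  run(codeHex: string): number {
    this.logger?.debug(`running code ${codeHex}`)
    return (codeHex.length - 2) / 2
  }
}

// Shipped as a separate export, so it is only bundled when imported.
export const consoleLogger: Logger = {
  debug: (message) => console.log(`[debug] ${message}`),
}
```

Usage would be `MiniEVM.create({ logger: consoleLogger })` for debugging and plain `MiniEVM.create()` otherwise.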