go cat'ing from js is significantly slower than the other combinations + heap out of memory #203
From the call I didn't gather that it actually ran out of memory. That's extremely bad, and it will need to be fixed. @litzenberger can you sync up with @alanshaw when you get online tomorrow? He would like to add that benchmark. Once that is up, we can see what clinic will tell us.
Yes, will write this test now.
Added js -> go and go -> js. Here are the results running locally:

js -> go

go -> js
Clinic reports for js -> go
There are several bottlenecks/slow spots in the benchmarks, and there are a lot of things we can optimize. We did not see the increase in memory and the related crash in our test with 64 MB of RAM; in fact, memory usage was flat. Maybe there is something else that is different in our test vs your integration? @jacobheun @alanshaw can you check?

Analysis

First and foremost, there is a Node Core issue in nodejs/node#25741. This is related to a change we made in 10.x to avoid saturating the thread pool when reading big files, and it is backfiring on us in this benchmark (it was not intended). I'm working on that on the Node.js core side of things: I can work on a PR to expedite the resolution. Ref: https://upload.clinicjs.org/public/d0491b68df5b070fa226600ac4ce0389d04f35ab4e7f2aa0c718d5ae615101e4/34583.clinic-flame.html#selectedNode=8234&zoomedNode=8234&exclude=83c0&merged=true.

Second, there is an issue with the bn.js library in the tree. It uses `new Buffer` somewhere, and the dependency tree must be updated. Ref: https://upload.clinicjs.org/public/d0491b68df5b070fa226600ac4ce0389d04f35ab4e7f2aa0c718d5ae615101e4/34583.clinic-flame.html#selectedNode=7057&zoomedNode=7057&exclude=83c0&merged=true.

Third, we are spending a lot of time in faux asynchronous activity due to libp2p/js-libp2p-mplex#89. Ref: https://upload.clinicjs.org/public/ad48c42cc98c35249690f2a8bb8b7a05880de6e8b3f781a5c720c42e06d0489c/34730.clinic-bubbleprof.html#x71-c42.
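For context on the bn.js finding: `new Buffer(...)` has been deprecated since Node.js 10 and its deprecation machinery adds overhead on every call, which is why it shows up in the flamegraph. A minimal sketch of the documented replacements (standard Node.js `Buffer` factory methods, nothing specific to bn.js):

```javascript
// Deprecated pattern reportedly buried in the bn.js dependency tree:
//   const b = new Buffer('abc')   // emits a deprecation warning, slower path
//
// Documented replacements in the Node.js Buffer API:
const fromString = Buffer.from('abc')         // copies the string's bytes
const fromArray = Buffer.from([1, 2, 3])      // copies an array of byte values
const zeroed = Buffer.alloc(16)               // 16 zero-filled bytes, safe default
const uninitialized = Buffer.allocUnsafe(16)  // faster, but contents must be overwritten

console.log(fromString.toString()) // → abc
```

Fixing this is purely a dependency-tree update; the semantics of `Buffer.from` match the old string/array constructor forms.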
I think the likely culprit here is that the tests we were running when we hit the heap issue were dialing out to the live ipfs network. My guess is that when we're connected to multiple nodes, bitswap starts spamming the network with requests. Perhaps we should add a test that connects the go and js nodes to some other passive nodes to see if there's a significant change?

I am going to start looking into getting this resolved this week.
Should we add such a test ourselves? bitswap was never on our radar. How would you design such a benchmark?
Bitswap is ultimately called by the Block Service any time a block is moved around the network. My initial thought was to connect some number of passive peers, maybe 2 so we can have at least 1 of each implementation, to both the go and js node, and then perform the exact same benchmark. Adding and cat'ing the file should cause network calls to each of the connected peers, unless I am mistaken. This might be too indirect a benchmark, though. Another approach could be to call the block API directly after connecting multiple nodes, but I'm not sure how adequate that would be for a test. @alanshaw do you have any thoughts on another approach for this?
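The shape of the proposed benchmark could look something like the sketch below. This is hypothetical: `benchmarkWithPassivePeers` and the node objects are stand-ins whose `swarm.connect` / `add` / `cat` methods mirror the js-ipfs core API, and everything is stubbable so only the timing harness is illustrated, not real I/O.

```javascript
// Hypothetical harness: connect a set of passive peers to both the
// source and destination nodes so bitswap has extra partners to talk
// to, then time an add/cat round trip of the payload.
async function benchmarkWithPassivePeers (src, dst, passivePeers, payload) {
  // Wire every passive peer to both ends of the transfer.
  for (const peer of passivePeers) {
    await src.swarm.connect(peer.addr)
    await dst.swarm.connect(peer.addr)
  }

  const start = Date.now()
  const { cid } = await src.add(payload) // add on the source node
  await dst.cat(cid)                     // cat from the destination node
  return Date.now() - start              // elapsed milliseconds
}
```

If bitswap is indeed spamming connected peers with want-list requests, the elapsed time (and heap usage) should grow noticeably with the number of passive peers compared to the direct two-node runs.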
Using Node.js v8.15.0 I'm seeing very similar results for the same tests:

```
✓ go -> js: 67.1 MB (5219ms)
✓ js -> go: 67.1 MB (8891ms) <-
✓ js -> js2: 67.1 MB (7646ms)
✓ go -> go2: 67.1 MB (2550ms)
✓ go -> js: 134 MB (11176ms)
✓ js -> go: 134 MB (17821ms) <-
✓ js -> js2: 134 MB (13542ms)
✓ go -> go2: 134 MB (6134ms)
✓ go -> js: 537 MB (43629ms)
✓ js -> go: 537 MB (76079ms) <-
✓ js -> js2: 537 MB (53180ms)
✓ go -> go2: 537 MB (23463ms)
```
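Converting the 537 MB timings above to throughput makes the gap easier to see at a glance. This is just arithmetic over the posted numbers, not a new measurement:

```javascript
// Throughput (MB/s) for the 537 MB transfers reported above.
const results = [
  { pair: 'go -> js', mb: 537, ms: 43629 },
  { pair: 'js -> go', mb: 537, ms: 76079 },
  { pair: 'js -> js2', mb: 537, ms: 53180 },
  { pair: 'go -> go2', mb: 537, ms: 23463 }
]

for (const { pair, mb, ms } of results) {
  const mbps = mb / (ms / 1000) // megabytes per second
  console.log(`${pair}: ${mbps.toFixed(1)} MB/s`)
}
```

By this measure js -> go runs at roughly a third of go -> go2's throughput, consistent with it being the slowest combination in every file-size tier.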
Oh, I understood we couldn't run ipfs on Node 8 anymore. Maybe I understood that wrong and I didn't try. Then it's definitely not nodejs/node#25741 that is causing this. We're still affected by that on Node v10; it's just not causing this specific issue.
These tests don't give us any info by default other than the time it took...
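Since the crash under investigation is a heap out-of-memory, it might be worth having the benchmarks report peak heap as well as elapsed time. A generic Node.js sketch (not part of the existing test suite) that samples `process.memoryUsage()` during a run:

```javascript
// Track peak heap usage by sampling process.memoryUsage() on a timer.
// Call trackPeakHeap() before a transfer and stop() after it to get
// the highest heapUsed value observed, in bytes.
function trackPeakHeap (intervalMs = 100) {
  let peak = process.memoryUsage().heapUsed
  const timer = setInterval(() => {
    peak = Math.max(peak, process.memoryUsage().heapUsed)
  }, intervalMs)
  timer.unref() // don't keep the process alive just for sampling

  return {
    stop () {
      clearInterval(timer)
      return peak
    }
  }
}
```

A flat peak across file sizes would match the "memory usage was flat" observation above, while a peak that scales with file size would point back at the heap issue.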
Last week we encountered an issue running the ipfs/interop tests against the latest 0.34 js-ipfs release. When running the exchange-files test, Windows hit a JS heap out-of-memory issue transferring a 134 MB file; you can see the CI error here.

While investigating the issue I noticed that when cat'ing large files, js -> go is significantly the slowest combination. Considering the go implementation is faster, I'd assume that js -> js and go -> js would likely be slower, but they are not.

The heap issue may not be directly related, but this is an area we should investigate.

Of note, the interop tests were making calls out to the greater network because the nodes were connecting to the bootstrap nodes. This was disabled to avoid connecting to the network on test runs, which mitigated the issue for that test suite. If the heap allocation issue cannot be reproduced with direct tests, adding additional peers may be needed.
Here is the ci log for posterity: