
Feature: stream blockchain #1210

Conversation

MorningLightMountain713
Copy link
Contributor

Allows any Fluxnode to stream the blockchain at breakneck speed, with some heavy caveats.

Designed for UPnP nodes.

Background:

The uncompressed Flux blockchain stands at around 36 GB; compressed with gzip, around 26 GB (approx. a 30% reduction). It is not uncommon to see blockchain downloads take in excess of 1 hour via CDN, depending on an operator's internet connection.

Copying the chain file from one node to another is cumbersome, and the chain goes stale.

Use case:

Your average node owner at home, who has a node or two running already via UPnP, and wants to fire up another. They may or may not have good internet; either way, this will save them time and won't cost them any data usage with their ISP. They can just stream the chain off one of their existing nodes, and the chain is already up to date.

Authentication thoughts:

I have been in two minds about node owner authentication on this endpoint. I decided against it for the following reasons:

  • This endpoint is only available via RFC 1918 private address space, so it is only accessible on the LAN. The upstream is guaranteed to be public address space (no source NAT) and is denied, as FluxOS won't confirm a node if it can't see public addresses for the websockets.
  • There is no Personally Identifiable Information (PII) in the transfer. The tar file only contains the relative directories blocks, chainstate, and determ_zelnodes.
  • The whole point of this is to make it easier on node operators - if they have to authenticate before using the endpoint, it is orders of magnitude more difficult.

The Feature:

Of note - since we are on an express version > 4.16, body-parser is no longer required (it's part of express now), so I have updated this and removed the package. I have also added a new dependency, tar-fs.

The pull introduces a new endpoint: /flux/streamchain. Yes, this is related to the blockchain, which would be the daemon's territory, but it didn't make sense to put it under the daemon endpoint, as we are dealing with files here, not fluxd RPCs.

This endpoint, when called via POST, will stream the chain to you live, and can be called via cURL. POST is used so as to dissuade mistaken calls via a browser.

Example:

curl -X POST http://<Node IP>:<Node port>/flux/streamchain -o flux_explorer_bootstrap.tar

Leverages the fact that a lot of nodes run on the same hypervisor, where they share a bridge or v-switch. In this case, the transfer is as fast as your SSD. Real-life testing showed speeds of 3.2 Gbps on an Evo 980+ SSD - able to download the entire chain UNCOMPRESSED in 90 seconds.

Even on a traditional LAN, most consumer-grade hardware is 1 Gbps, so this should be easily achievable on your average home network.

During normal operation, the flux daemon (fluxd) must NOT be running; this is so the database is in a consistent state when it is read. However, during testing, and WITHOUT compression, as long as the chain transfer is reasonably fast, there is minimal risk of a db compaction happening and corrupting the new data.

This method can transfer data compressed (using gzip) or uncompressed. It is recommended to stream the data uncompressed only. Compressing on the fly uses a lot of CPU and will slow the transfer down by 10-20 times, while only saving ~30% on file size. If the daemon is still running during this time, IT WILL CORRUPT THE NEW DATA. (tested)

Due to this, if compression is used, the daemon MUST NOT be running.

There is an unsafe mode, where a user can transfer the chain while the daemon is still running. Of note, the data being copied will not be corrupted, only the new chain. I have used this over a dozen times without any issue, but use at your own risk.

Only one stream is allowed at a time - a 503 is returned if a stream is already in progress. If passing in options, the Content-Type header must be set to application/json.
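
The single-stream guard can be sketched as a simple in-process lock. This is a hypothetical shape, not the actual handler; the function and variable names are mine:

```javascript
// Hypothetical sketch of the "one stream at a time" guard.
// A module-level flag is enough here: Node handlers run on a
// single event loop, so this check-and-set is race-free.
let streamInProgress = false;

function handleStreamChain(req, res, startStream) {
  if (streamInProgress) {
    // Another transfer is already running: reject with 503.
    res.statusCode = 503;
    res.end('Stream already in progress');
    return false;
  }
  streamInProgress = true;
  // startStream must call its callback when the transfer
  // finishes or errors, releasing the lock.
  startStream(req, res, () => {
    streamInProgress = false;
  });
  return true;
}

// Demo: first caller acquires the stream, second gets 503.
const fakeRes = () => ({ statusCode: 200, end() {} });
let release;
console.log(handleStreamChain({}, fakeRes(), (req, res, done) => { release = done; })); // true
const second = fakeRes();
console.log(handleStreamChain({}, second, () => {})); // false
console.log(second.statusCode); // 503
release(); // transfer finished, lock released
```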

In a future pull - I'll look to tidy up some of the express post handlers, they don't need listeners anymore.

@MorningLightMountain713
Copy link
Contributor Author

MorningLightMountain713 commented Feb 7, 2024

I've found a bug in tar-fs where, if the stream is ended early via a user not adding -o filename and curl immediately closing the connection, it will throw an uncatchable error, which crashes the Node process and causes a restart.

A normal exit from curl with Ctrl+C is fine, as the tar pack has been processed already.

This is a pretty edge case, and I have an issue open on the provider's repo - will see what they come back with.

@MorningLightMountain713
Copy link
Contributor Author

> I've found a bug in tar-fs where, if the stream is ended early via a user not adding -o filename and curl immediately closing the connection, it will throw an uncatchable error, which crashes the Node process and causes a restart.
>
> A normal exit from curl with Ctrl+C is fine, as the tar pack has been processed already.
>
> This is a pretty edge case, and I have an issue open on the provider's repo - will see what they come back with.

I've fixed the issue in the upstream repo; just waiting to see if the author wants a pull or will just update it himself.

@TheTrunk TheTrunk requested review from TheTrunk and Cabecinha84 and removed request for TheTrunk February 8, 2024 15:18
@MorningLightMountain713
Copy link
Contributor Author

Looks like my pull will get merged upstream.

mafintosh/tar-fs#109

Will wait for the next release. If it takes too long, I can just inline tar-fs, as it's not that much code. The heavy lifting is done in tar-stream.

David White added 12 commits February 20, 2024 19:27
Since express version 4.16, you no longer need to import body-parser,
it's part of express, so have updated and removed.

Added tar-fs requirement to stream tarfile

Added streamChain method, with restrictions on when the chain can be
streamed. Usually requires fluxd to NOT be running on the subject host.
@MorningLightMountain713
Copy link
Contributor Author

> Looks like my pull will get merged upstream.
>
> mafintosh/tar-fs#109
>
> Will wait for the next release. If it takes too long, I can just inline tar-fs, as it's not that much code. The heavy lifting is done in tar-stream.

My fix just got merged upstream for tar-fs. I can resolve the merge conflicts here, and we could review and merge? (If you think this is a reasonable feature.)

@Cabecinha84
Copy link
Member

Yes please.

@TheTrunk
Copy link
Member

I think this should have node admin privileges, so the chain can be streamed to the outside, but only the admin of the node can use this feature. Or, another way: introduce a userconfig API key, where every user can set their own API key, which is easier to work with than signing messages.

@MorningLightMountain713
Copy link
Contributor Author

> I think this should have node admin privileges, so the chain can be streamed to the outside, but only the admin of the node can use this feature. Or, another way: introduce a userconfig API key, where every user can set their own API key, which is easier to work with than signing messages.

Interesting. Yeah it makes sense to put some auth on there and open it up. I'll take a look over the weekend and see what I can come up with.

Cheers

@MorningLightMountain713
Copy link
Contributor Author

I'll close this and reopen on flux repo - need to write some tests.
