Reduce bandwidth-requirement on Bisq app startup #25
I just compiled in some more data.
Regarding the following items from the description above:
Do you intend here to check these binary blobs into the main Bisq repository, or something else? I would really like to avoid adding more binary data to the repository (as we're already doing with all the stuff in p2p/src/main/resources).
If checking the blobs in is the intended solution, @freimair, I'd like us to look into doing this properly with Git LFS instead, and at the same time migrating the existing p2p/src/main/resources data stores to it. See:
/cc @wiz as running our own LFS server would be ops territory. Just FYI at this point.
Also, from the introduction:
I'm unfamiliar with this problem, and reading this doesn't help me understand what's really going on. Why would a Bisq node need to send so much data to its seednode(s)? I could understand that it might need to receive quite a bit of data, but I'm clearly missing something. I read through the rest of the description a couple of times and I don't think this is ever made clear. ELI5, please :)
Yes, I intend to check these binary blobs into the main Bisq repository. It is exactly about the stuff in p2p/src/main/resources.
All in all, this project aims to make small steps towards a more reliable service. Rethinking the storage synchronization and locations is a whole other can of worms. Btw., I just checked: we have 110k objects now; at the time of project creation it was 104k, i.e. approx. +5% in 25 days.
The large binary objects in p2p/src/main/resources/ are updated on every Bisq release with the latest network data to avoid the need for new Bisq clients to download all of this information from the network, which would easily overload seed nodes and generally bog down the client. This approach works well enough for its purposes, but comes with the significant downside of storing all of this binary data in Git history forever. The current versions of these binary objects total about 65M, and they grow with every release. In aggregate, this has caused the total size of the repository to grow to 360M, making it cumbersome to clone over a low-bandwidth connection, and slowing down various local Git operations.

To avoid further exacerbating this problem, this commit sets these files up to be tracked via Git LFS. There's nothing we can do about the 360M of files that already exist in history, but we can ensure the repository doesn't grow in this unchecked way going forward. For an understanding of how Git LFS works, see the reference material at [1], and see also the sample project and README at [2].

We are using GitHub's built-in LFS service here, and it's important to understand that there are storage and bandwidth limits in place. We have 1G total storage and 1G per month of bandwidth on the free tier. If we exceed this, we will need to purchase a "data pack" from GitHub, which will get us to 50G storage and bandwidth. These are reasonably priced and not the end of the world if it becomes necessary. In an attempt to avoid this, however, the Travis CI build configuration has been updated to cache Git LFS files, such that they are not re-downloaded on every CI build, as this would very quickly cause us to exceed the free-tier bandwidth limit (see [3] and [4]).

With that out of the way, the variable determining whether we exceed the monthly limit is how many clones we have every month. Only the latest version of LFS-tracked files is downloaded on a given clone, and at the current 65M per clone the free tier covers only about 1G / 65M ≈ 15 clones per month. This is not very many, and we will almost certainly exceed the limit. Our options at that point are to buy a data pack or to run our own LFS server. We would almost certainly do the former to start.

Tracking these files via LFS means that developers will need to have Git LFS installed in order to properly synchronize the files. If a developer does not have LFS installed, cloning and building will complete successfully, but the app will fail when trying to actually load the p2p data store files. For this reason, the build has been updated to proactively check that the p2p data store files have been properly synchronized via LFS, and if not, the build fails with a helpful error message. The docs/build.md instructions have also been updated accordingly.

It is important that we make this change now, not only to avoid growing the repository in the way described above, as we have been doing for many releases, but also because we are now considering adding yet more binary objects to the repository, as proposed at bisq-network/projects#25.

[1]: https://git-lfs.github.com
[2]: https://github.com/cbeams/lfs-test
[3]: https://docs-staging.travis-ci.com/user/customizing-the-build/#git-lfs
[4]: travis-ci/travis-ci#8787 (comment)
The proposal looks well-formed, so I've removed the corresponding label. I am simply not well-informed enough about the details and alternatives to give a meaningful thumbs-up on approving this, but mine is just one voice. Like any other proposal, we should be looking for a broader consensus of interested and informed parties. If you are one of these people (@stejbac?), please provide feedback. The approach here looks pragmatic enough, but it would be good to see other informed opinions. From a budgeting perspective, it appears to me this is 100% dev team budget, so @ripcurlx, I'll leave it to you to weigh in on.
And regarding my comments about Git LFS above, see bisq-network/bisq#4114, which will be treated separately from this project.
This is a critical issue that now reproduces often on slow network connections.
For me this is atm a critical issue for some of our users, and as mentioned, the group of people affected by it is growing every day. So from my side it would be a 👍 to start working on this project.
It would be great to see more engagement on approval, but even though we've gotten only a few comments here, it sounds like there's consensus we should go ahead. @ripcurlx, I'll add the approval label.
@freimair, please move this to the appropriate column on the projects board.
The large binary objects in p2p/src/main/resources/ are updated on every Bisq release with the latest network data to avoid the need for new Bisq clients to download all of this information from the network, which would easily overload seed nodes and generally bog down the client. This approach works well enough for its purposes, but comes with the significant downside of storing all of this binary data in Git history forever. The current versions of these binary objects total about 65M, and they grow with every release. In aggregate, this has caused the total size of the repository to grow to 360M, making it cumbersome to clone over a low-bandwidth connection, and slowing down various local Git operations.

To avoid further exacerbating this problem, this commit sets these files up to be tracked via Git LFS. There's nothing we can do about the 360M of files that already exist in history, but we can ensure the repository doesn't grow in this unchecked way going forward. For an understanding of how Git LFS works, see the reference material at [1], and see also the sample project and README at [2]. The following command was used to track the files:

    $ git lfs track "p2p/src/main/resources/*BTC_MAINNET"
    Tracking "p2p/src/main/resources/AccountAgeWitnessStore_BTC_MAINNET"
    Tracking "p2p/src/main/resources/BlindVoteStore_BTC_MAINNET"
    Tracking "p2p/src/main/resources/DaoStateStore_BTC_MAINNET"
    Tracking "p2p/src/main/resources/ProposalStore_BTC_MAINNET"
    Tracking "p2p/src/main/resources/SignedWitnessStore_BTC_MAINNET"
    Tracking "p2p/src/main/resources/TradeStatistics2Store_BTC_MAINNET"

We are using GitHub's built-in LFS service here, and it's important to understand that there are storage and bandwidth limits there. We have 1G total storage and 1G per month of bandwidth on the free tier. We will certainly exceed this, and so must purchase at least one "data pack" from GitHub, possibly two. One gets us to 50G storage and bandwidth. In an attempt to avoid unnecessary LFS bandwidth usage, this commit also updates the Travis CI build configuration to cache Git LFS files, such that they are not re-downloaded on every CI build (see [3] and [4] below). With that out of the way, the variable determining whether we exceed the monthly limit is how many clones we have every month, and there are many, though it's not clear how many are Travis CI and how many are users / developers.

Tracking these files via LFS means that developers will need to have Git LFS installed in order to properly synchronize the files. If a developer does not have LFS installed, cloning and building will complete successfully, but the app will fail when trying to actually load the p2p data store files. For this reason, the build has been updated to proactively check that the p2p data store files have been properly synchronized via LFS, and if not, the build fails with a helpful error message. The docs/build.md instructions have also been updated accordingly.

It is important that we make this change now, not only to avoid growing the repository in the way described above, as we have been doing for many releases, but also because we are now considering adding yet more binary objects to the repository, as proposed at bisq-network/projects#25.

[1]: https://git-lfs.github.com
[2]: https://github.com/cbeams/lfs-test
[3]: https://docs-staging.travis-ci.com/user/customizing-the-build/#git-lfs
[4]: travis-ci/travis-ci#8787 (comment)
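As an aside, for readers unfamiliar with how such a check can work: when LFS is missing, the data store files are left in the working tree as small text pointers whose first line is the standard LFS pointer header. The following is a minimal, hypothetical Java sketch of that detection logic, not the actual check added to the Bisq Gradle build:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch: detect whether a file is still a Git LFS pointer (i.e. was
// cloned without git-lfs installed) instead of the real binary data store.
public class LfsPointerCheck {
    // Standard first line of a Git LFS pointer file.
    private static final String LFS_POINTER_HEADER = "version https://git-lfs.github.com/spec/v1";

    static boolean looksLikeLfsPointer(Path file) throws IOException {
        // Pointer files are tiny text files; real data stores are multi-MB binaries.
        if (Files.size(file) > 1024)
            return false;
        String firstLine = Files.readAllLines(file).get(0);
        return firstLine.startsWith(LFS_POINTER_HEADER);
    }

    public static void main(String[] args) throws IOException {
        Path store = Path.of("p2p/src/main/resources/DaoStateStore_BTC_MAINNET");
        if (looksLikeLfsPointer(store))
            throw new IllegalStateException(
                    "Git LFS data not synchronized; install git-lfs and run 'git lfs pull'.");
    }
}
```

Wiring something along these lines into the build turns an otherwise confusing runtime failure into an immediate, actionable error message.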
The implementation is currently being prepared to be tested in the production network. @sqrrm will upgrade his explorer seednode to run the new code (that would be v1.3.5 plus the changes of this project) so that a few devs can use it productively and see if anything bad shows up. The plan is to do so for one release cycle. If nothing bad shows up, we will proceed with the rather complex upgrade process.
Is there any update on this?
The project has been completed by bisq-network/bisq#4586.
During startup, Bisq is required to send >4MB of data to seednodes in order to get its initial data. This is an issue because such a large request often cannot be transferred reliably on slow connections, so affected clients never receive their initial data and remain stuck at the loading screen.
The primary goal of this project is to reduce the amount of data to be sent on startup.
Why/why now?
Details: Problem statement, Numbers, Proposal, ...
Problem statement
On startup, a Bisq application first requests up-to-date network data from two seednodes. Once data comes in, the Bisq app jumps from the loading screen to the trading UI. However, if no data arrives, Bisq stays at the loading screen forever.
There are two main reasons why this happens:
Numbers
The numbers below can be translated into an actual request size, since each object is represented by a 20-byte key in the initial outgoing data request, basically saying "I already have that, please do not send it."
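To make that relationship concrete, here is a small, hypothetical Java calculation. The 20-byte key size and the ~110k object count are the figures quoted in this thread, the two-seednode multiplier comes from the startup behaviour described above, and protocol overhead is ignored:

```java
// Back-of-the-envelope estimate: each known object adds one 20-byte key to the
// initial "I already have these, don't send them" request.
public class RequestSizeEstimate {
    private static final int KEY_SIZE_BYTES = 20;   // key size stated in the proposal
    private static final int SEEDNODES_QUERIED = 2; // Bisq asks two seednodes on startup

    public static void main(String[] args) {
        long knownObjects = 110_000; // object count quoted above
        double mbPerRequest = knownObjects * (double) KEY_SIZE_BYTES / 1_000_000;
        System.out.printf("~%.1f MB per request, ~%.1f MB sent on startup in total%n",
                mbPerRequest, mbPerRequest * SEEDNODES_QUERIED);
        // ~2.2 MB per request, ~4.4 MB in total, in line with the >4MB figure above.
    }
}
```

This also shows why the request keeps growing: every new network object adds another 20 bytes, regardless of whether the client obtained it from a bundled data store or from the network.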
Data table
Proposed Solution
By adding the info "I am Bisq v1.2.1" to the list of known objects, we know what objects the client has - namely, objects shipped with the data stores of v1.2.1.
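A minimal, hypothetical sketch of that idea is shown below; class and method names are illustrative only and do not come from the Bisq codebase. The seednode expands the announced version into the set of keys bundled with that release, so the client only has to list keys it obtained after installation:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: resolve a client's announced version into the set of
// object keys bundled with that release, then treat those keys as "already
// known" without the client having to send each 20-byte key individually.
// (A real implementation would use a dedicated key type with proper
// equals/hashCode rather than raw byte[].)
public class KnownObjectsResolver {
    // Keys of the data-store objects shipped with each release, assumed to be
    // derived from the bundled p2p/src/main/resources/* data stores.
    private final Map<String, Set<byte[]>> keysShippedWithRelease;

    public KnownObjectsResolver(Map<String, Set<byte[]>> keysShippedWithRelease) {
        this.keysShippedWithRelease = keysShippedWithRelease;
    }

    /**
     * @param announcedVersion e.g. "1.2.1", sent instead of thousands of keys
     * @param extraKeys        keys the client obtained after installation
     * @return all keys the seednode should exclude from its response
     */
    public Set<byte[]> resolveKnownKeys(String announcedVersion, Set<byte[]> extraKeys) {
        Set<byte[]> known = new HashSet<>(
                keysShippedWithRelease.getOrDefault(announcedVersion, Set.of()));
        known.addAll(extraKeys);
        return known;
    }
}
```

In exchange for the client's request shrinking to a version string plus a short delta, seednodes (or the protocol) would need to know which keys shipped with each release still in use.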
Advantages
Disadvantages
Optional followup projects
Risks
Alternative approaches
Tasks
Criteria for Delivery
Estimates