[SPIKE][P2P] Configuration - Research LibP2P #17
Comments
@andrewnguyen22 @deblasis for your review |
@deblasis I added a new paper in pokt-network/pocket#305 that talks about eclipse attacks in Geth and it's a great starting point for understanding: https://arxiv.org/pdf/1908.10141.pdf. @jessicadaugherty I would say that there are 4 things we can take from LibP2P:
Even if we don't use (1) and (2), we could still potentially leverage (3) & (4). That should be part of the scope of work. |
Updated the description above. |
Thanks @Olshansk! |
@deblasis In addition to presenting the research to the team at a protocol hour at some point, could you also post a "short" version of your research here once it's done? I was thinking of something similar to the SMT evaluation here: pokt-network/pocket#199 (comment). |
TL;DR: As I anticipated to @Olshansk, I think we should move forward with LibP2P.
I have spent some time "playing with it": exploring the source code and reading the available documentation and some of the issues and PRs.
This thought has been bugging me quite a bit and I didn't want to commit to something that would have shown its limitations pretty quickly. The library appears:
Limitations
Out-of-the-box, perhaps it doesn't offer what we are looking for. It requires some adaptations and I was looking specifically at how to minimize the potential blast radius of the changes, so that keeping the codebase aligned with The default implementation for Basically, it has been developed with in mind the requirement of being able to Our use case is simpler:
Luckily, the library is pretty modular and
Code
They used some of the patterns that we use already in our codebase:
I think it shouldn't be too much of a learning curve, and we can actually contribute back as we might eventually bump into unforeseen limitations/bugs.
Peer discovery
The library provides interfaces that can be implemented to
@Olshansk this is also to say that perhaps you gathered incorrectly that the library "provides peer discovery", as you hinted at in one of our conversations. In substance, the core library contains no logic for it, only the interface:
// PeerRouting is a way to find address information about certain peers.
// This can be implemented by a simple lookup table, a tracking server,
// or even a DHT.
type PeerRouting interface {
	// FindPeer searches for a peer with given ID, returns a peer.AddrInfo
	// with relevant addresses.
	FindPeer(context.Context, peer.ID) (peer.AddrInfo, error)
}
Dynamic peer removal/addition churn
The routing table is an implementation detail that depends on the DHT, so it has to be developed accordingly. As a starting point, we have the default implementations, which have to be altered depending on our algorithm(s) of choice. Kademlia DHT uses https://github.com/libp2p/go-libp2p-kbucket as its routing table.
Session management
The core library also implements a
// ConnManager tracks connections to peers, and allows consumers to associate
// metadata with each peer.
//
// It enables connections to be trimmed based on implementation-defined
// heuristics. The ConnManager allows libp2p to enforce an upper bound on the
// total number of open connections.
//
// ConnManagers supporting decaying tags implement Decayer. Use the
// SupportsDecay function to safely cast an instance to Decayer, if supported.
and also introspection, so that every peer can tell (internally or externally) how many sessions are currently open and with which other peers.
Transport layer security (not to be confused with network security)
Security is implemented, again, as a plug-in:
// Security configures libp2p to use the given security transport (or transport
// constructor).
//
// Name is the protocol name.
//
// The transport can be a constructed security.Transport or a function taking
// any subset of this libp2p node's:
// * Public key
// * Private key
// * Peer ID
// * Host
// * Network
// * Peerstore
TLS specifically is quite trivial to implement/use out-of-the-box.
Final thoughts
I would like the team to be aligned on our next steps, especially if there are concerns of any kind. We could also use libp2p just as a framework providing primitives, domain-specific patterns, nomenclature, etc.
Additional research material
I was putting together a presentation that also included some Gemini thoughts. Given the circumstances (Gemini being descoped), I realise that perhaps a better use of everyone's time is to ask you guys to have a read of this 👆 and come up with some questions that I will gladly focus on specifically during a protocol hour. As for everything else, I guess we can start adding some tasks to cover the integration points, and it looks like you guys are already on it. 🚀 |
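To make the "simple lookup table" option from the PeerRouting doc comment concrete, here is a minimal sketch in Go. It is only an illustration under stated assumptions: `PeerID`, `AddrInfo`, `TableRouting`, `ErrPeerNotFound` and `LookupAddrs` are hypothetical names, and the two types are simplified stand-ins for libp2p's `peer.ID` and `peer.AddrInfo` so that the example compiles with the standard library alone. A real integration would implement the library's own interface instead.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// PeerID and AddrInfo are simplified stand-ins (assumptions for this sketch)
// for libp2p's peer.ID and peer.AddrInfo.
type PeerID string

type AddrInfo struct {
	ID    PeerID
	Addrs []string
}

// PeerRouting mirrors the shape of the interface quoted in the thread.
type PeerRouting interface {
	FindPeer(ctx context.Context, id PeerID) (AddrInfo, error)
}

// ErrPeerNotFound is returned when the table has no entry for a peer.
var ErrPeerNotFound = errors.New("peer not found")

// TableRouting is a hypothetical lookup-table implementation of PeerRouting.
type TableRouting struct {
	mu    sync.RWMutex
	peers map[PeerID]AddrInfo
}

func NewTableRouting() *TableRouting {
	return &TableRouting{peers: make(map[PeerID]AddrInfo)}
}

// AddPeer inserts or overwrites the entry for info.ID.
func (t *TableRouting) AddPeer(info AddrInfo) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.peers[info.ID] = info
}

// RemovePeer deletes the entry for id, if any (peer churn).
func (t *TableRouting) RemovePeer(id PeerID) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.peers, id)
}

// FindPeer looks the peer up in the table, honouring context cancellation.
func (t *TableRouting) FindPeer(ctx context.Context, id PeerID) (AddrInfo, error) {
	if err := ctx.Err(); err != nil {
		return AddrInfo{}, err
	}
	t.mu.RLock()
	defer t.mu.RUnlock()
	info, ok := t.peers[id]
	if !ok {
		return AddrInfo{}, ErrPeerNotFound
	}
	return info, nil
}

// Compile-time check that TableRouting satisfies the interface.
var _ PeerRouting = (*TableRouting)(nil)

// LookupAddrs is a convenience wrapper for callers that have no context.
func LookupAddrs(r PeerRouting, id PeerID) ([]string, error) {
	info, err := r.FindPeer(context.Background(), id)
	if err != nil {
		return nil, err
	}
	return info.Addrs, nil
}

func main() {
	rt := NewTableRouting()
	rt.AddPeer(AddrInfo{ID: "validator-1", Addrs: []string{"/ip4/127.0.0.1/tcp/8080"}})
	addrs, err := LookupAddrs(rt, "validator-1")
	fmt.Println(addrs, err)
}
```

Because callers depend only on the interface, a table like this could later be swapped for a DHT-backed implementation without touching them.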
From my rough notes:
These things are very much dependent on the implementation details of the algorithms that we are going to use/develop. From my readings, I spotted a common pattern: the routing table is used in such a way that it's possible to determine a measure of "distance" or closeness between peers. This is achieved either by using bitwise XOR (Kademlia and geth) or by hashing parts of the address to define groups (hatgroups) of peers, as in the case of Gemini, or by other similar approaches. What comes into play as a potential attack vector is all the orchestration related to peer discovery and churn.
Peer IDs
Trust is often reliant on some form of identification, especially in the domain of network communication.
If we simply trust a peer because it says "hi" to us (
If we evict good peers from our
Cryptographic signature verification aims at solving the identification problem, but the fact that we might have malicious (byzantine) actors in the network is often "forgotten naively", or maybe it's just a very hard problem to solve. When derivatives of private keys, like the hash of the public key, are used to determine closeness between peers, the assumption is that all peers are good actors. In my opinion we should be defensive in that sense. There's no free lunch, and additional security could involve more network hops (ideally O(n)), more coordination between peers (more complexity) or some other unforeseen tradeoff. I'd probably consider bounties for white-hat/ethical hackers at some point, after OSS has done its magic with more altruistic personas. These are probably conversations to be held when our P2P stack matures. With a PoC in our calendar, this is probably out-of-scope for now (we are considering happy paths as a starting point), but we need to convey the message, loud and clear, that we are navigating almost uncharted waters: we only know that they are infested by sharks and that there are pirates and sirens. To summarise: it's not going to be an easy feat to pull off, but I think that the team can do it.
Suggested mitigation strategies
My hunch is that we should embed some trust in the identities of the peers that would make it harder, aka more expensive, for attackers to either
|
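The XOR notion of "distance" mentioned in the notes above can be sketched in a few lines of Go. This is an illustration under stated assumptions, not how any particular library does it: `Key`, `KeyFromPeerID`, `XOR`, `CommonPrefixLen` and `Closer` are hypothetical names, and hashing the peer identifier with SHA-256 to obtain a 256-bit key is an assumption made for the example.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"math/bits"
)

// Key is a 256-bit identifier in the XOR keyspace.
type Key [sha256.Size]byte

// KeyFromPeerID hashes an arbitrary peer identifier into the keyspace.
// Using SHA-256 here is an assumption made for this sketch.
func KeyFromPeerID(id string) Key {
	return sha256.Sum256([]byte(id))
}

// XOR returns the bitwise XOR of two keys, which Kademlia-style routing
// interprets as the distance between them.
func XOR(a, b Key) Key {
	var d Key
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// CommonPrefixLen counts the leading bits shared by a and b. A longer
// shared prefix means a smaller XOR distance.
func CommonPrefixLen(a, b Key) int {
	for i := range a {
		if x := a[i] ^ b[i]; x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return len(a) * 8
}

// Closer reports whether a is strictly closer to target than b, comparing
// the XOR distances as big-endian unsigned integers.
func Closer(target, a, b Key) bool {
	da, db := XOR(target, a), XOR(target, b)
	return bytes.Compare(da[:], db[:]) < 0
}

func main() {
	target := KeyFromPeerID("target")
	a, b := KeyFromPeerID("peer-a"), KeyFromPeerID("peer-b")
	fmt.Println("shared prefix bits:", CommonPrefixLen(target, a), CommonPrefixLen(target, b))
	fmt.Println("a closer than b:", Closer(target, a, b))
}
```

Since the key is derived purely from the peer identifier, a peer that can pick its own identifier can also influence where it lands in the keyspace, which is the kind of assumption about "good actors" the notes above warn about.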
@deblasis Appreciate the notes and research, in particular the jokes are 💯. My immediate feedback is that the narrative style covers a lot, but organizing it more into topics and splitting things by problems/solutions or short/long term would have made it easier to understand in what direction the ideas are heading. As discussed offline, getting more feedback on the technicals (networking layer, codebase, projects that use libp2p, testing/code practices, etc...) would have been really helpful. To close this out, I wanted to summarize the next steps in terms of research. We need to answer these questions:
|
🙏 I appreciate the feedback @Olshansk. The truth is that I wasn't ready yet to present my findings: I was preparing a presentation that also partially covered Gemini, using it as an example of swapping the routing algorithm in LibP2P, but then we moved on from that. Regardless, gathering feedback from you and the team was beneficial. This is a team decision, therefore I am very happy to dig deeper for the team's and my own benefit.
Follow-up:
For the record, my analysis has been performed on commit d8d2efaf
First of all, some stats for the data nerds:
Source: go-libp2p-stats.zip
Test coverage
TL;DR, IMHO very good ✅ I sampled the lower values and it appears that they are mostly in mocks and in files that have many interfaces and maybe just one function/method that has no tests.
Methodology
go test -v -shuffle=on -coverprofile=module-coverage.txt -coverpkg=./... ./...
go tool cover -html=module-coverage.txt -o=./cover.html
The HTML at this point can be viewed in the browser. I extracted the above data from the DOM and formatted it in markdown.
Testing procedure
They run tests as part of CI/CD and also they use the
The library per se is not testing any routing or fancy stuff like that because that's handled in the respective packages (for example
This highlights the fact that the core package is very modular and offers the low-level primitives without focusing on the algorithms that application developers like us will integrate/develop separately.
Hotspots
Number of commits by file, generated with
Repository: /Users/alex/CODE/OSS/go-libp2p
Commits Path
This question requires a deep understanding of the library which unfortunately I don't have -yet-, so take these with a grain of salt.
├── config ✅
It's hard to look at the roadmap and think about what we need by milestone. I'd go as far as M1 for now, also because we still haven't decided if we are going to use LibP2P. What we know is that by the end of 2023 Q1 we basically need to build the word "Network" in Pocket Network. Sounds easy! (Famous last words.) I'll try to decompose the features/tasks starting from the bottom up:
M1
These should give us a "Basic LocalNet" using LibP2P and also prepare the ground for what comes next
In parallel (if there's capacity) I would also do:
My intuition regarding the latter is that we could leverage the concept of service/content discovery that is widely used in the context of P2P file sharing within the library. It's a hunch I have but it requires some extra thought.
M2
M3
M4 |
I apologize for the latency on the round trip time it took me to ACK and RESPOND to this message. This is an amazing analysis and exactly what I was hoping to see!
Sorry about that. As discussed offline, let’s hold off the Gemini research right now.
Noted. I very much appreciate you sharing WIP in public! Will aim to do more of that myself as well. 📝
Amazing! I would have never guessed go has native tools to get git stats like this. 🧑🎓
I added a comment with the
Make sure to check out https://www.youtube.com/watch?v=M5gy_-nzcR8. Fancy way to test non fancy code :)
Amazing ⭐
This is a great way to summarize it and I like the use of emojis to make it easier to read. 😃
This 💯 seems like the way to go. Offline we discussed copy-pasting the interface, but now that I have read through all of this, I’m thinking of simply embedding it. See how I imported & embedded
Yea, we’ll definitely use k8s operators for it along with tilt.dev. @okdas is working on it in #186.
Tests are super useful, but I personally would focus more on debugging and visibility. Try to imagine a situation where we need to figure out who is seeing whom and see what’s happening. For example, imagine a way of exporting the address book into neo4j at different heights and/or timestamps and then visualizing it?
Without too much thought, I really like the direction of this idea. Will think / mull on it for a bit and follow up next week. 🤔 |
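For readers less familiar with the "embed rather than copy-paste" idea mentioned above, here is a small Go sketch of interface embedding. Everything in it is hypothetical: `PeerRouting` is a simplified stand-in for an upstream interface (the real libp2p one takes a context and returns `peer.AddrInfo`), and `Network`, `staticNetwork`, `NewStaticNetwork` and `Broadcast` are made-up names for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// PeerRouting is a simplified stand-in for an upstream interface.
type PeerRouting interface {
	FindPeer(id string) (string, error)
}

// Network is a hypothetical module interface. Embedding PeerRouting pulls
// in its method set, so a change to the upstream interface surfaces as a
// compile error here instead of silently drifting from a copied definition.
type Network interface {
	PeerRouting
	Broadcast(msg []byte) int
}

// staticNetwork is a toy implementation backed by a fixed address book.
type staticNetwork struct {
	addrs map[string]string
}

// FindPeer returns the address recorded for id, or an error if unknown.
func (n *staticNetwork) FindPeer(id string) (string, error) {
	addr, ok := n.addrs[id]
	if !ok {
		return "", errors.New("peer not found: " + id)
	}
	return addr, nil
}

// Broadcast pretends to send msg to every known peer and returns how many
// peers it would have reached.
func (n *staticNetwork) Broadcast(msg []byte) int {
	return len(n.addrs)
}

// NewStaticNetwork returns the toy implementation behind the Network interface.
func NewStaticNetwork(addrs map[string]string) Network {
	return &staticNetwork{addrs: addrs}
}

func main() {
	net := NewStaticNetwork(map[string]string{"peer-a": "10.0.0.1:8080"})
	// Any Network value can be used wherever a PeerRouting is expected.
	var routing PeerRouting = net
	addr, err := routing.FindPeer("peer-a")
	fmt.Println(addr, err, net.Broadcast([]byte("hello")))
}
```

The trade-off is a direct dependency on the upstream package in exchange for staying aligned with it automatically.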
@deblasis Do you think there's anything else for us to do here? I feel like you've provided sufficient context for us to start using their interface as a foundation, and we'll learn more along the way. I think we can close this out unless there was more work you wanted to put here specifically. |
@Olshansk: It's super simple, just trying to capture some context and the next steps. If it needs more work LMK. Other than that I believe that this issue can be closed. Link: https://docs.google.com/document/d/1cAWdu8tfeVdMc0xBmUzWUQ53hVTgj_Otz7mdURaTZu4/edit?usp=sharing |
@deblasis A month late, but I finally read the document and really appreciate the summary. In particular, the actionable next steps. I referenced it in #438, which is an umbrella ticket capturing the follow-up work on the P2P module. Going to close this out as complete. |
Objective
Research LibP2P (in parallel with #16) to determine if and/or how we should use this library in V1 P2P with a focus on:
Origin Document
There has been prior research about LibP2P that determined this was not the best solution for peer discovery and churn as part of the P2P research work: V1 P2P discovery
However, as we continue to tackle RainTree tech debt and prepare to integrate the Persistence module with P2P, we should revisit this research to confirm that LibP2P's Kademlia algorithm is not the right solution for Peer Discovery and Churn with a focus on:
Additionally, even if we do not leverage LibP2P for Peer Discovery and Churn, we should revisit this library prior to finishing the complete P2P module in case there are other elements of the library we wish to integrate, and if LibP2P will be compatible with the solution that we do land on for Peer Discovery and Churn with a focus on:
Goals
Deliverable
Non-goals / Non-deliverables
Creator: @jessicadaugherty
Editor: @Olshansk