-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add field to store/add to include commP #353
Comments
@Gozala the client-side calculation is so that a client has a full chain-of-custody, that starts on their own machine. The only thing I am aware of as prior-art is https://github.com/rvagg/js-fil-utils. @rvagg can you chime in wrt completeness / whether you'd recommend it to be used today as-is? A test corpus is available here: https://github.com/filecoin-project/go-fil-commp-hashhash/tree/master/testdata |
Wouldn't same exact CAR file produce same exact |
Hence you want to train your users ( at least these that care about Fil ) to key everything they upload off |
if the client lies about commp, then the data won’t land in a deal. when the miner notices it doesn’t match they invalidate the deal and can notify the broker (spade) why and they can create a new deal that just drops that data. the purpose of the cryptography being used here is to give trustless guarantees between actors who don’t already have the underlying data to calculate the proof. the moment you accept that some actor is “responsible” for validation that isn’t the miner or the client you’ve accept the permanent need for an actor to sit between them with a full copy of the data. there may be some communication necessary between actors to repair the deal, but it must be accomplished using the proofs to avoid actors between the client and the miner having to validate each proof themselves against the deal data because this will always necessitate an additional copy (additional infra) between the client and filecoin forever, or as long as you use that means of deal creation. |
Better - the chain consensus itself ( specifically the PoRep part ) does this. |
@mikeal so if I understand you correctly we basically don't do anything with the What do we do if we have two users uploading same CAR but specify different P.S.: I'm less worried about someone lying, and more about a bug in the client and deploying updates to clients. |
@yusefnapora found this relevant repo https://github.com/rvagg/rust-fil-commp-generate |
@Gozala @yusefnapora note: this contains a pretty old copy of the currently used rust code: https://github.com/filecoin-project/rust-fil-proofs/blob/128f7209/filecoin-proofs/src/api/mod.rs#L322-L353 |
@ribasushi thanks for the pointer! I think creating a wasm wrapper library from there is probably a best way to go about it. Also probably a good self-contained project for hacking on rust/wasm |
So https://github.com/rvagg/js-fil-utils does commp, it's written for Node.js files but that's just cause that's all I cared about at the time. It's written for uint8array iterables so you'd just rip out the Node.js imports and make it take a plain asynciterable and replace the WASM might be overkill? It's really not that complicated a process, maybe WASM gives you a perf boost but the most expensive thing is calling out to sha2-256. Have a look at the JS code, there's 3 main components to it - zero padding to get up to a ^2, fr32 padding to space out 2 bits every 254, and a merkle tree using a slightly truncated/zeroed sha2-256 result. |
Depends on #712 |
I've started modernizing https://github.com/rvagg/js-fil-utils but as I've added tests I uncovered that produces commP had been inconsistent with https://github.com/filecoin-project/go-fil-commp-hashhash/tree/master/cmd/stream-commp#usage-example. After which I end porting some of the code from https://github.com/filecoin-project/go-fil-commp-hashhash/blob/master/commp.go instead. Resulting library had been published here https://github.com/web3-storage/data-segment/ and it has test vectors in place (@ribasushi I can share what bugs I've found / fixed if you want to backport them) ProblemWhile integrating commP into our client lib it became clear that computing it in browser (using web native crypto APIs) had been unacceptably slow ~26s for 46MiB file, while same code with same source input in node was 0.02s. My hypothesis is that browsers rate-limit sha-256 compute to prevent malicious actors from doing crypto mining. I did more investigation on the matter and here some data on the same 46MiB input
I think numbers support my hypothesis, profiler info gathered from browser devtools provided no evidence of it. It could be that hopping between native thread and js thread is a culprit instead. ConclusionsI think it might be warranted to reconsider idea of pre-computing piece cid (a.k.a commp) in the client before and including it with The tradeoff is that client may never submit a |
Something we should flag as an ongoing concern that we need to keep an eye on: doing proofs in the browser means we’re being chased by browser security features that are trying to fight crypto mining. |
Could you elaborate on this (or if already mentioned somewhere just share a link to why)? I think we need to make it as easy as possible today to have what users need for getting data into Filecoin - not enough users care enough about the gory details of Filecoin. Think we should just ship using the efficient CommP generation solution in the client and make it required (though if trivial we can structure sending the CommP to our API using a content claim and make it not required in the future) |
All that said I think I do struggle rationalizing introducing barriers in a name of reducing cost of computation on our end. It seems like we should provide be giving users a choice to:
By decoupling store/add from commP we make above choice possible. By precluding commP we remove such a choice and consequently create a barrier for adoption. |
Got it, thanks! I think the point on us needing to eventually verify it is a key one. But let's just opt for the simple solution for now that's future-proof: just either do it in the client or ourselves (if the latter would love to understand the projected compute cost), and not expose users to this choice. But we can put things in UCAN claims so that if users end up asking us for this choice, we can give it to them (and it sounds like it has additional benefits like being able to parallelize in the client upload experience). Outsourcing the computation, creating an open market around it, etc. are less important today. |
Folks... "computing" commp is dirt cheap. Accessing the data is what's expensive. Just compute it in a spot where the data flows through anyway and call it a day. The original hesitation was around pulling the data back out from "at rest", which is what is expensive, not the computation itself. |
So doing the maths relevant to our shard sizes (using ~35s overhead on the biggest shard we can send - 4GiB. On a more typical chunk (~128MiB) it would be ~1s. For most uploads (very small) it will be negligible. It's not great, but it's not the worst. @Gozala how about this?: Decouple such that submitting a commp is a seperate invocation, but (at least for now) have the client compute and submit it at the same time as the That way we allow ourselves flexibility in the event that it becomes untenable to generate in the browser in the future. We can change the client without changing the protocol.
@ribasushi I don't understand!? @Gozala is explicitly raising that this is not the case in the browser, even if it's being done as the data is being generated/sent. |
@alanshaw I was responding to
Would you phrase things this way if you were speaking about md5? This phrasing frames thing as if the "compute" is significant. I was trying to reframe the discussion as a whole 😅 |
@ribasushi it's was a general sentiment I'm making towards attitude of "we don't want to do this on backend because it's expensive". Most services don't care about it they just charge you for their spend & I'm suggesting that is how we should approach things as well. In this specific case, computing commP in the backend will be very fast, probably good idea to do to ensure we don't get invalid one from client anyway. |
We should add a field to
store/add
capability in order to include commP proof with every write@ribasushi is there js code somewhere we could use to compute it on client side ?
@mikeal computing it on the client side is fine, but we can’t trust client & would need to verify, which makes me wonder if doing it on a client is even worth it, as we may only complicate things by having to handle case where client provides incorrect value.
The text was updated successfully, but these errors were encountered: