Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] feat: PDP #227

Draft
wants to merge 63 commits into
base: feat/market
Choose a base branch
from
Draft

[WIP] feat: PDP #227

wants to merge 63 commits into from

Conversation

magik6k
Copy link
Collaborator

@magik6k magik6k commented Sep 30, 2024

Reading:

Roadmap:

  • Database schema
  • PieceCID version correctness
  • HTTP API
    • Basic structure
    • Auth
    • Data ingest
      • POST /piece
      • PUT /piece/upload/..
      • Call notify
        • make optional
      • Store in long-term storage
    • Data retrieval through /piece
      • Auth
    • PDPTool
      • Creating secret/tokens
      • Ping tool
      • Upload tool
      • Retrieve tool
      • Piece set manipulation
    • Piece sets
      • Create
      • Delete
      • Add root (w subroots)
      • Del Root
      • Get root subroots
    • Proof-set prover
  • UI

Copy link

@hannahhoward hannahhoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've mostly put in comments to check for understanding before I start writing code. But generally LGTM -- looks like it's compliant with the API we agreed on :)

)

// verifyJWTToken extracts and verifies the JWT token from the request and returns the serviceID.
func (p *PDPService) verifyJWTToken(r *http.Request) (string, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@magik6k if I am reading this right, I send a JWT with a claim of "service" = "storacha" and then I need to make sure the JWT is signed by a private key that corresponds to a public key that was previously inserted into the pdp_services table with probably the RPC api below to add PDP Service (i.e the SP called "AddPDPService" locally using their running node -- assuming "AddPDPService" is not available on the public endpoint)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, see

func createJWTToken(serviceName string, privateKey *ecdsa.PrivateKey) (string, error) {

cuhttp/server.go Outdated
@@ -204,5 +215,10 @@ func attachRouters(ctx context.Context, r *chi.Mux, d *deps.Deps) (*chi.Mux, err
}
ipni_provider.Routes(r, ipp)

if sd.EthSender != nil {
pdsvc := pdp.NewPDPService(d.DB, d.LocalStore, must.One(d.EthClient.Get()), sd.EthSender)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is essentially what's adding all the PDP routes correct (i.e. /proof-sets, POST to /piece)

// Collect pieceInfos for subroots
pieceInfos := make([]abi.PieceInfo, len(addRootReq.Subroots))

var totalOffset uint64 = 0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am I correct that reading this code right here provides indirect instructions on how to aggregate all the subroot pieces into a root CID? IOW, I can assume that I can basically want to write aggregation code that will satisfy this code by generating the same RootCID? (which appears to be all the pieces concatenated together).

Also it would appear non-ffi is a fork of go-commp-utils -- is this public?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the subroot aggregation is the same as in filecoin pieces (CommP -> CommD)

The non-ffi thing is the same code as the Rust logic, reimplemented by Riba in Go.

There is one added constraint in PDP - pieces must be sorted by size in descending order - this way no inter-piece padding is needed

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh good call out good to know!


// Parse request body
var req struct {
PieceCID string `json:"pieceCid"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're not requiring piececid v2 here, not in the put handler. Long term, I think we should require this. From the standpoint of the SP, I imagine them to want to filter at the piece level, and I imagine one of the main filter critieria to be size. They should know exactly what size piece they're about to accept before they accept it.

Unless I'm wrong, a PieceCID v1 doesn't specify that, and absent a size parameter, they could be accepting anything from 32 bytes to 256 MB.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, PieceCIDv2 is still todo, can quickly update if that's a blocker

pdp/handlers_upload.go Show resolved Hide resolved
"github.com/filecoin-project/curio/lib/storiface"
)

type PieceParkReader struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what finds and reads PDP pieces for get requests correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, PiecePark is the general "raw piece" storage in Curio, we eventually intend to serve all retrievals from it (instead of from the fr32-padded "unsealed" sector files), but PDP is the first place which will actually do reads directly from it.

@hannahhoward
Copy link

am I correct that the retrieval handler already has been modified to support PDP pieces through the parked piece reader?

@hannahhoward
Copy link

am I correct in believing that the next step on the API is that add root will send a location header to a creation status API for the root, then that will be used to get the root ID which is then possible to pass to the root detail and delete root APIs? Not crazy urgent just want to know.

Copy link
Collaborator Author

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github thinks I have an unsubmitted review here and can't discard it..

@magik6k
Copy link
Collaborator Author

magik6k commented Oct 21, 2024

am I correct that the retrieval handler already has been modified to support PDP pieces through the parked piece reader?

Yes, correct, retrievals work already. No ACL mechanism yet, but it's not going to be hard to implement once we have a spec (you can see that all retrieval requests quite quickly become aware what piece they are reading from + ACLs should be quite cachable)

@magik6k
Copy link
Collaborator Author

magik6k commented Oct 21, 2024

am I correct in believing that the next step on the API is that add root will send a location header to a creation status API for the root, then that will be used to get the root ID which is then possible to pass to the root detail and delete root APIs? Not crazy urgent just want to know.

Other than delete all the things should be there. Operations on PDP proofsets (other than create) you should probably watch through eth events. I can also make the message-sending endpoints output the message hash somehow, but you still want to watch for all the events to not miss any contract changes made by the SP

@magik6k magik6k force-pushed the feat/pdp branch 4 times, most recently from fa8d937 to a3a1c3a Compare October 24, 2024 13:28
@hannahhoward
Copy link

hannahhoward commented Oct 27, 2024

Upon further review, it's possible we have a design issue here that we need a solution for.

Please let me know if I am understanding the sequence of events correctly when a user uploads a pdp piece:

  1. The piece is put in the “Stash”
  2. A parked piece record is created
  3. Asychronously, the piece moves from the stash to permaneant storage (Task: https://github.com/filecoin-project/curio/blob/feat/pdp/tasks/piece/task_park_piece.go)
  4. After that, again asynchronously, the notify task is called, which hits the notify URL. (Task: https://github.com/filecoin-project/curio/blob/feat/pdp/tasks/pdp/notify_task.go)

Why this matters: a basic gaurantee of all hot storage, which storacha tries to offer, is read-on-write. The basic expectation is from the users point of view is that when an upload finishes, it should be retrievable.

One thing that is not clear I'd like to clarify: are steps 3 & 4 likely to run in a couple seconds or much longer? If it's only a few seconds, we can work around this. Fortunately our upload process includes a confirmation step AFTER the upload, and what we can do is essentially block the confirmation until step 4 completes. But that only works if it's a short period of time before the piece is fully retrievable.

If not, we really need to revisit the architecture and find a way to shorten the process.

@hannahhoward
Copy link

Follow-up: a pull based approach for checking for piece cid -- #304

magik6k and others added 3 commits November 1, 2024 12:13
Allows accepting JWTs that are signed with Ed25519 as well as ECDSA
x509 does not parse to pointer type on ed25519
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants