S3 datastore support #1261
Conversation
@jbenet don't you dare git push each this one, I want to run other tests this month.
@tv42 do you have any thoughts on Query for the s3datastore? That's gonna be the hard part, I think. Also, how hard would it be to run a local 's3' server for testing this code? Most of this looks good so far.
@whyrusleeping I don't expect S3 to bring any special difficulties to Query. I do want to whack the Query API with a mallet; whether that happens as part of the continuation of this or not depends on whether I come up with a good enough replacement API soon enough. If not, S3 Query will look a lot like flatfs Query, except it could support more (unused) features.

I'm not aware of any "tiny S3 clones" worth recommending, and I didn't want to start writing one on a whim. Making a true S3 clone is ridiculously hard (been there: https://github.com/ceph/s3-tests/ ), though this code uses a small enough fraction of the features that httptest.Server might be plenty. Also, remember that it's currently only 70 lines of code plus 83 lines to configure that (most of which is validating the JSON, for better error reporting), and maybe 60-100 LoC for a simple Query implementation. It's pretty much all very simple code.

Perhaps some of the sharness tests could be run in a special mode where they talk to actual AWS S3?
Maybe... that's probably okay to do if we make sure they only run when a given env var (or set of env vars) is set.
// All values of DSOpener must have a JSON representation that is
// semantically equivalent to the Datastore section of the config
// file that they were instantiated by, including the "Type" key.
type DSOpener interface {
good idea 👍
@tv42 this PR is great. +1 everything LGTM. sorry for the huge delay on my end.
Either sounds good to me.
Yeah, I think this works for normal tests. And there are many S3 mock things out there that already implement the exact API for us (not that it's complicated...)
Yeah, we can have a special set of sharness tests that can be run against S3. I wouldn't want to put this on CI for all PRs. (We could make a simple side-repo with the right config that pulls go-ipfs and tests on S3, and make sure to pull to this repo before merging to stable; we'd need a way to keep credentials private or whatever.)
There were the following issues with your Pull Request. Guidelines are available to help. Your feedback on GitCop is welcome on this issue. This message was auto-generated by https://gitcop.com
GitCop is wrong (ipfs/infra#46), and my commits have been amended to have the right trailer. Unit tests pass, sharness passes. This is on top of dev0.4.0 (with a fix for a random bug I found in dev0.4.0) and ready to roll! Paging @jbenet @whyrusleeping
I have no idea about those CircleCI & Travis errors in https://circleci.com/gh/ipfs/go-ipfs/640 and https://travis-ci.org/ipfs/go-ipfs/jobs/71014668, but none of that happens locally and none of it seems to be related to these changes.
This used to lead to large refcount numbers, causing Flush to create a lot of IPFS objects, and merkledag to consume tens of gigabytes of RAM. License: MIT Signed-off-by: Jeromy <[email protected]>
OS X sed is documented as "-i SUFFIX", GNU sed as "-iSUFFIX". The one consistent case seems to be "-iSUFFIX", where the suffix cannot be empty (or OS X will parse the next argument as the suffix). This used to leave around files named `refsout=` on Linux, and was just confusing. License: MIT Signed-off-by: Jeromy <[email protected]>
These secondary copies were never actually queried, and didn't contain the indirect refcounts so they couldn't become the authoritative source anyway as is. New goal is to move pinning into IPFS objects. A migration will be needed to remove the old data from the datastore. This can happen at any time after this commit. License: MIT Signed-off-by: Jeromy <[email protected]>
Pinner had method GetManual that returned a ManualPinner, so every Pinner had to implement ManualPinner anyway. License: MIT Signed-off-by: Jeromy <[email protected]>
Platform-dependent behavior is not nice, and negative refcounts are not very useful. License: MIT Signed-off-by: Jeromy <[email protected]>
Rebased again to get on top of latest dev0.4.0.
The generated file went through some changes because of differing go-bindata versions. License: MIT Signed-off-by: Tommi Virtanen <[email protected]>
This gives us a clean slate for the new code, avoiding leftovers. License: MIT Signed-off-by: Tommi Virtanen <[email protected]>
Earlier, it also checked the leveldb directory. That part added no crash safety to the application, and just hardcoded assumptions about the datastore. If anything, this should rely on the absolute last item created by fsrepo.Init, and there should be fsync guarantees about ordering. License: MIT Signed-off-by: Tommi Virtanen <[email protected]>
…m for S3 License: MIT Signed-off-by: Tommi Virtanen <[email protected]>
…"private" License: MIT Signed-off-by: Tommi Virtanen <[email protected]>
To test it, set up an S3 bucket (in an AWS region that is not US Standard, for read-after-write consistency), run `ipfs init`, then edit `~/.ipfs/config` to say "Datastore": { "Type": "s3", "Region": "us-west-1", "Bucket": "mahbukkit", "ACL": "private" }, with the right values. Set `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in the environment and you should be able to run `ipfs add` and `ipfs cat` and see the bucket be populated. No automated tests exist, unfortunately. S3 is thorny to simulate. License: MIT Signed-off-by: Tommi Virtanen <[email protected]>
Closing in favor of #1488 because that one targets dev0.4.0. This one is from before the dev branch existed.
For your reviewing pleasure; do not merge yet. Sorry for the noise in the pull request: this is on top of the pin branch, so start reading from the most recent commit with "pin:" in the commit message. Once pinning is fully merged, I'll rebase this and ask for a merge, but for now living on top of that branch causes less noise in the datastore.

Note that S3 object names are hex, because datastore keys can be binary. That means S3 objects get names like 2f626c6f636b732f122014d8c7e2382fe98c052b96b7a86daa9b176fde6324cbfdd37314617dc1058a7d.