Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(storage/gcs): add support for Google Cloud Storage #2589

Merged
merged 22 commits into from
Jan 7, 2024

Conversation

erka
Copy link
Contributor

@erka erka commented Dec 25, 2023

fixes #2288

Reference for docs with auth options https://cloud.google.com/docs/authentication#service-accounts

I would suggest that object config should be changed in more flat variant (no s3/azblob/gc object). Probably there will be one store and some configuration switch/builder of each object type.

@erka erka requested a review from a team as a code owner December 25, 2023 21:10
Copy link

codecov bot commented Dec 25, 2023

Codecov Report

Attention: 42 lines in your changes are missing coverage. Please review.

Comparison is base (c2e602a) 71.78% compared to head (8f27806) 71.57%.

Files Patch % Lines
internal/storage/fs/object/store.go 71.84% 21 Missing and 8 partials ⚠️
internal/storage/fs/index.go 85.24% 6 Missing and 3 partials ⚠️
internal/storage/fs/snapshot.go 77.77% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2589      +/-   ##
==========================================
- Coverage   71.78%   71.57%   -0.21%     
==========================================
  Files          87       84       -3     
  Lines        8271     8085     -186     
==========================================
- Hits         5937     5787     -150     
+ Misses       1978     1947      -31     
+ Partials      356      351       -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@markphelps markphelps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @erka !! Amazing as always

couple minor naming convention updates requested. I think since gs isnt as well known/used as like s3, then maybe we should use something more clear like googlecloud. wdyt? /cc @GeorgeMac

also really love the refactor to use gocloud across the board!

Copy link
Contributor

@GeorgeMac GeorgeMac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome as ever. One thing from me around slashes in prefixes.

One last take it or leave it: I think you could reduce all the different stores down to just one store implementation with different configuration, as you have with FS (I could be missing something). This could also be a later refactor.

Copy link
Collaborator

@markphelps markphelps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one question and one suggested fix

Copy link
Collaborator

@markphelps markphelps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great! thank you @erka ! I can put up a docs PR and we can try to get this out today or tomorrow in a release

@markphelps markphelps enabled auto-merge (squash) December 27, 2023 15:21
@markphelps markphelps disabled auto-merge December 27, 2023 16:09
@erka
Copy link
Contributor Author

erka commented Dec 27, 2023

looks great! thank you @erka ! I can put up a docs PR and we can try to get this out today or tomorrow in a release

@markphelps It looks like this could take some time to review this PR. Probably it could be included in the next release.

@markphelps markphelps requested a review from GeorgeMac December 31, 2023 15:49
@erka
Copy link
Contributor Author

erka commented Jan 2, 2024

@GeorgeMac could you please suggest how to modify this PR? Should I drop go-cloud and just create Google Cloud Storage implementation?

@erka
Copy link
Contributor Author

erka commented Jan 4, 2024

@markphelps @GeorgeMac This prefix is a wild bit. After going back and forth it looks like s3 prefix never works. If you check grpc.NewObjectStore (v1.32) or store.newObjectStore (main) it doesn't have s3.WithPrefix option applied to the s3 store... Right? Could you please double check that?

@GeorgeMac
Copy link
Contributor

GeorgeMac commented Jan 4, 2024

It looks like you're right @erka 🤦 It was never threaded through. Missed that in review.

It has been this way since day zero.

@GeorgeMac
Copy link
Contributor

GeorgeMac commented Jan 4, 2024

@erka I am looking at how the test is setup here for the bug you're experience and realizing that there is another bug in the old S3 implementation, which means it should never have worked in the first place.

The old implementation appears to not be picking up the .flipt.yml file, and your change has made it so it does pick it up.
Because the .flipt.yml file contains an index which only trusts files named default.yml and production.yml, not the file added during the test features.yml which is added during the test.

This is the testdata pushed to s3 (minio): https://github.com/flipt-io/flipt/tree/main/build/testing/integration/readonly/testdata

Apologies, you've dug up some skeletons in the closet 😂

Update: see my mad rant below, I was barking up the wrong tree.

Copy link
Contributor

@GeorgeMac GeorgeMac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erka I got the following changes to work as expected for me 👇

What do you think about those. I think they're roughly inline with how things worked previously. Though we will never know as we never actually wired it up properly.

@erka
Copy link
Contributor Author

erka commented Jan 4, 2024

@erka I got the following changes to work as expected for me 👇

What do you think about those. I think they're roughly inline with how things worked previously. Though we will never know as we never actually wired it up properly.

@GeorgeMac thank you for the hint.

Probably this is a good time to think how prefix should work. That missing WithPrefix option give the change to make it right.

@GeorgeMac
Copy link
Contributor

GeorgeMac commented Jan 4, 2024

@erka yeah I think even my suggestion here might be subtly broken. If you add a case here:

// namespace shouldn't exist as it has been filtered out by the prefix
require.Error(t, store.View(func(s storage.ReadOnlyStore) error {
_, err := s.GetNamespace(context.TODO(), "production")
return err
}), "production namespace shouldn't be retrieavable")

If you change the assertion at the bottom too:

	// namespace shouldn't exist as it has been filtered out by the prefix
	require.NoError(t, store.View(func(s storage.ReadOnlyStore) error {
		_, err := s.GetNamespace(context.TODO(), "production")
		require.Error(err, "production namespace shouldn't be retrieavable")
		
		_, err := s.GetNamespace(context.TODO(), "prefix")
		require.NoError(err, "prefix namespace should be retrieavable")
		
		return nil
	}))

It fails because the prefix namespace didn't actually get read.
I tried a few other permutations and found that fs.WalkDir is doing to some rewriting of paths if you return FileInfo instances with Names that include directories (e.g.prefix/prefix_namespace.yml) is going to get rewritten.

I've got some ideas on how we could get this all in line so that prefix can be arbitrary.
Otherwise, we could change it to be an actually directory.
I do worry though that some users may be depending on this prefix as a simple way to just reduce the number of items listed, not as a directory name. 🤔 hmmmm
I might quickly try out the implementation of gcblob.Bucket as it already implements io/fs.
I wonder if that works with arbitrary prefixes...

Update: the gblob implementation looks promising, but when used with a non slash terminated prefix it goes into an infinite loop on Walk constantly returning the root current dir .. Can't win haha.

@GeorgeMac
Copy link
Contributor

Thanks for being so patient with me on this @erka ! I might need to sleep on it for today, but I have one idea I would like to explore which is to skip making an fs.FS and instead use:

// SnapshotFromFiles constructs a StoreSnapshot from the provided slice
// of fs.File implementations.
func SnapshotFromFiles(logger *zap.Logger, files ...fs.File) (*Snapshot, error) {
now := flipt.Now()
s := Snapshot{
ns: map[string]*namespace{
defaultNs: newNamespace("default", "Default", now),
},
evalDists: map[string][]*storage.EvaluationDistribution{},
now: now,
}
for _, fi := range files {
defer fi.Close()
docs, err := documentsFromFile(fi)
if err != nil {
return nil, err
}
for _, doc := range docs {
if err := s.addDoc(doc); err != nil {
return nil, err
}
}
}
return &s, nil
}

My thoughts here is that fs.FS is needlessly overcomplicated for a flat keyspace like a blob store.
The OCI package depends on this, because we just flatten your features yaml files there and just put them in a list.

This just takes a list of fs.File which we can fabricate for each key listed in the blob store with an optional prefix.
The one extra caveat is that we (at-least intended) to support the .flipt.yml too and we would need to filter the list of files with the globs in that file first (might need to refactor the listStateFiles function a bit and export parts of it). All a bit involved, happy to make a PR on your PR tomorrow if you like. Or you're welcome to try that too if you dont mind of course. You've put a lot of hard time into this already :D so no pressure.

@GeorgeMac
Copy link
Contributor

Hey @erka I pushed up a branch over here based on your work:
https://github.com/flipt-io/flipt/compare/gm/gocloud-object?expand=1#diff-d0c365d367f4c2650c25d6810f08164236c3e16a89faa5b86d55bce1c7ccfb5f

What do you think of this? I was going to make a PR onto your branch, but I can't write into your fork.
If you like it, you're always welcome to rebase your branch on the head of this one.
That way we get it back into this PR.

I will open it up as a draft for now just to get CI testing it.

The long and short of the change is it just creates a single store implementation under the object package.
Then it just leans into the fact that gcblob can take the different configurations as URL strings.
It drops trying to make a full fs.FS infavour of using storagefs.NewSnapshotFromFiles() which takes list of files.

Copy link
Contributor

@GeorgeMac GeorgeMac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thing and I think its good to go. Just need to add this new IT fs/gcs to the GH workflows list:

"fs/git",
"fs/local",
"fs/s3",
"fs/oci",
"fs/azblob",

@erka erka requested a review from GeorgeMac January 6, 2024 17:40
Copy link
Contributor

@GeorgeMac GeorgeMac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Epic. Thanks so much @erka 🙇‍♂️

@GeorgeMac GeorgeMac merged commit 4b6b9b3 into flipt-io:main Jan 7, 2024
29 of 30 checks passed
@markphelps markphelps added the needs docs Requires documentation updates label Jan 7, 2024
@markphelps markphelps removed the needs docs Requires documentation updates label Jan 23, 2024
@erka erka deleted the gcs branch January 28, 2024 20:55
@markphelps markphelps mentioned this pull request Jan 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for Google Cloud Storage (FS Object Backend)
4 participants