Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix gateway handle list objects versions #7028

Merged
merged 6 commits into from
Dec 3, 2023
Merged

Conversation

nopcoder
Copy link
Contributor

@nopcoder nopcoder commented Nov 22, 2023

Fix #7025

  • Handle getbucketlocation API call based on s3 configured region
  • Configurable option to disable unsupported operations

@nopcoder nopcoder added area/gateway Changes to the gateway include-changelog PR description should be included in next release changelog labels Nov 22, 2023
@nopcoder nopcoder self-assigned this Nov 22, 2023
@nopcoder nopcoder requested review from arielshaqed and a team November 22, 2023 07:47
@itaiad200
Copy link
Contributor

I believe @arielshaqed had a wider gateway's operations scope in #7025, i.e. every S3 operation that is handled by our gateway as a different operation. If it's just this fix, YaY!

@nopcoder
Copy link
Contributor Author

As we love small PRs - so I'll do it incrementally.

From a quick check this solved the bug I found on how we identify different API calls related to the problem described.

The main issue is how to handle unrecognized query string parameters in the S3 protocol.

The S3 protocol relies on HTTP methods and query string parameters to identify the intended operation for each user call. However, the current protocol definition covers around 70 operations. In response to calls with unfamiliar query string parameters, we can either reject them or implement default behaviors in specific cases. The appropriate approach requires careful consideration to maintain compatibility and ensure seamless user experience. The current implementation handle specific cases - like ignore "version" for list, get and etc.

@nopcoder
Copy link
Contributor Author

Switch to draft - will introduce all the required changes

@nopcoder nopcoder marked this pull request as draft November 22, 2023 09:19
@nopcoder nopcoder removed the request for review from a team November 22, 2023 09:58
@arielshaqed
Copy link
Contributor

The S3 protocol relies on HTTP methods and query string parameters to identify the intended operation for each user call. However, the current protocol definition covers around 70 operations. In response to calls with unfamiliar query string parameters, we can either reject them or implement default behaviors in specific cases. The appropriate approach requires careful consideration to maintain compatibility and ensure seamless user experience. The current implementation handle specific cases - like ignore "version" for list, get and etc.

Not sure that I agree. I believe that we should precisely parse the operation, and then apply it. The fact that something like "version" makes our lives harder only means that we need to be more careful here. We should understand all query parameters.

The good news to me is that we only implement a tiny set of operations!

Also we might look again at how MinIO do this.

@nopcoder nopcoder force-pushed the fix/gateway-versining branch from 79e7e22 to 3ef34d9 Compare November 28, 2023 10:02
@nopcoder
Copy link
Contributor Author

re-implemented the solution to report unsupported operation
each operation handler first check if the request supported.

@nopcoder nopcoder marked this pull request as ready for review November 28, 2023 10:04
@nopcoder nopcoder marked this pull request as draft November 28, 2023 10:06
Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These huge lists of params to HandleUnsupported scare me. For one thing they mean that if AWS add any new call to S3, lakeFS will claim to support it.
Is there no way of performing a positive check for whether we support an S3 method?

@nopcoder
Copy link
Contributor Author

These huge lists of params to HandleUnsupported scare me. For one thing they mean that if AWS add any new call to S3, lakeFS will claim to support it. Is there no way of performing a positive check for whether we support an S3 method?

My concern is with additional parameters that may be passed to the same call that currently handled, ex: list objects v2 with metadata and handled by the same operation but without metadata support.

Will open an issue to refactor the s3 gateway router to do a better job in matching the handler to the commands based on the method/query params.

In order not to break something added a flag to disable the unsupported checks that were added.

@nopcoder nopcoder marked this pull request as ready for review November 28, 2023 17:24
@nopcoder nopcoder requested a review from arielshaqed November 28, 2023 17:28
Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only blocking change is the name of the config variable. (But you know me -- I really like my logs, I just won't block this PR on that.)

esti/s3_gateway_test.go Show resolved Hide resolved
pkg/config/config.go Outdated Show resolved Hide resolved
case ref != "" && pth != "":
o.OperationID = pathBasedOperationID(req.Method)
case ref == "" && pth == "":
o.OperationID = repositoryBasedOperationID(req.Method)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're missing a default case here! Even if we keep going in order to fail later, I would really like a log (level TRACE) for each of the four cases. Otherwise debugging a bad request will be that much harder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The o.OperationID = operations.OperationIDOperationNotFound assigned before the switch case, I wanted to log this request as operation not found and let it fall into the handler.
In the previous version, it was odd that we had handler in the switch that returns after w.WriteHeader(http.StatusNotFound) and in case pathBasedOperationID returns OperationIDOperationNotFound we continue to fall and handle it after logging.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still open? Reminder, there are 2 things here

  • (Smaller) There is no default case. I'd rather not have the default here in a default not up on l.202.
  • (Larger) I'd like to log (trace) the params repo, ref, pth, and which type of operation we picked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still open.

Comment on lines +31 to +35
query := req.URL.Query()
if !query.Has("delete") {
_ = o.EncodeError(w, req, nil, gerrors.ERRLakeFSNotSupported.ToAPIErr())
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we HandleUnsupported here? One day we'll add delete-part or something, and even before that this is less uniform so I need to think harder when reading it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing here was that POST on bucket/repo level is postpolicybucket operations (no query parameters) and the only other operation in this level is "deletemultipleobjects (query parameter "delete").
In all other cases HandleUnsupported handled the non-default case, and here we don't handle the default case.
As I didn't wanted to manage a whitelist, at least until we will have a better router for the S3 gateway, I thought this will make more sense.
Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pleased? (Or say no, that also works -- prefer not to leave this open)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really not like to delete someone's object because they did some random POST. But if we cannot lock it down, we cannot lock it down...

@@ -28,6 +28,10 @@ func (controller *GetObject) RequiredPermissions(_ *http.Request, repoID, _, pat
}

func (controller *GetObject) Handle(w http.ResponseWriter, req *http.Request, o *PathOperation) {
// TODO(barak): do we like to reject tagging or return empty tagging?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pick one. I'd say reject, but I definitely don't want get-tags to return the object contents -- which AFAICT the current code does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kept the previous implementation - return empty tagging response

Copy link

github-actions bot commented Nov 29, 2023

♻️ PR Preview d47dc98 has been successfully destroyed since this PR has been closed.

🤖 By surge-preview

@nopcoder nopcoder requested a review from arielshaqed November 29, 2023 09:36
@nopcoder
Copy link
Contributor Author

@arielshaqed thanks - update the code+doc except one comment

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Just a few comments left open.

case ref != "" && pth != "":
o.OperationID = pathBasedOperationID(req.Method)
case ref == "" && pth == "":
o.OperationID = repositoryBasedOperationID(req.Method)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still open? Reminder, there are 2 things here

  • (Smaller) There is no default case. I'd rather not have the default here in a default not up on l.202.
  • (Larger) I'd like to log (trace) the params repo, ref, pth, and which type of operation we picked.

Comment on lines +31 to +35
query := req.URL.Query()
if !query.Has("delete") {
_ = o.EncodeError(w, req, nil, gerrors.ERRLakeFSNotSupported.ToAPIErr())
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pleased? (Or say no, that also works -- prefer not to leave this open)

@nopcoder
Copy link
Contributor Author

@arielshaqed I forgot to submit my comments with my last pr update

@nopcoder
Copy link
Contributor Author

  1. middleware - each function there can return unsupported operation. more information in above the comment you left.
  2. delete object - the tldr, we do not support the default and the reject function check for operations based on query parameters. more information above the comment.

@nopcoder nopcoder requested a review from arielshaqed November 30, 2023 13:26
@nopcoder nopcoder force-pushed the fix/gateway-versining branch from c43792f to 551ab3f Compare November 30, 2023 13:34
Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Do we want a tech debt issue to improve the S3 operation multiplexor?

case ref != "" && pth != "":
o.OperationID = pathBasedOperationID(req.Method)
case ref == "" && pth == "":
o.OperationID = repositoryBasedOperationID(req.Method)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still open.

Comment on lines +31 to +35
query := req.URL.Query()
if !query.Has("delete") {
_ = o.EncodeError(w, req, nil, gerrors.ERRLakeFSNotSupported.ToAPIErr())
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really not like to delete someone's object because they did some random POST. But if we cannot lock it down, we cannot lock it down...

@nopcoder
Copy link
Contributor Author

nopcoder commented Dec 2, 2023

Open #7100 to address router/mux

@nopcoder
Copy link
Contributor Author

nopcoder commented Dec 3, 2023

@arielshaqed about lock deleteobjects.go - it is locked only for "delete" operation (the only case we check) as the default (no query parameters) is the case we do not support.
No other operation currently mapped to this handler.

@nopcoder nopcoder merged commit 847fd07 into master Dec 3, 2023
33 of 34 checks passed
@nopcoder nopcoder deleted the fix/gateway-versining branch December 3, 2023 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/gateway Changes to the gateway include-changelog PR description should be included in next release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

S3 gateway only looks at HTTP verb, not at full S3 action
3 participants