Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reject invalid URIs in URI extract functions #18253

Merged
merged 1 commit into from
Jul 14, 2023

Conversation

findepi
Copy link
Member

@findepi findepi commented Jul 12, 2023

It's not by (any document) design to silently ignore invalid input in URI extracting functions (url_extract_fragment, url_extract_host, url_extract_parameter, url_extract_path, url_extract_port, url_extract_protocol, url_extract_query).

This commit follows least surprise principle: invalid input is explicitly rejected. Users wanting to ignore invalid input should use try(...) expression around the invocations.

This commit does not change docs, since now the functions behave as documented.

Release notes

(x) Release notes are required, with the following suggested text:

# General; potentially breaking change
* Fix URI extraction functions (``uri_extract_*``) not to ignore input which is not valid URI.
  Previously the functions undocumented behavior was to return ``NULL`` in such case.

It's not by (any document) design to silently ignore invalid input in
URI extracting functions (url_extract_fragment, url_extract_host,
url_extract_parameter, url_extract_path, url_extract_port,
url_extract_protocol, url_extract_query).

This commit follows least surprise principle: invalid input is
explicitly rejected. Users wanting to ignore invalid input should use
`try(...)` expression around the invocations.

This commit does not change docs, since now the functions behave as
documented.
Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

% comment


assertTrinoExceptionThrownBy(() -> assertions.function("url_extract_fragment", "'" + url + "'").evaluate())
.hasErrorCode(INVALID_FUNCTION_ARGUMENT)
.hasMessageStartingWith("Cannot parse as URI value '%s': ".formatted(url));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test also try(url_extract_fragment(...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we should be testing try in TestUrlFunctions.java

@findepi
Copy link
Member Author

findepi commented Jul 12, 2023

CI #17436

@findepi findepi merged commit 3befdbb into trinodb:master Jul 14, 2023
@findepi findepi deleted the findepi/uriburi branch July 14, 2023 13:58
@github-actions github-actions bot added this to the 423 milestone Jul 14, 2023
@martint
Copy link
Member

martint commented Aug 9, 2023

It's not by (any document) design to silently ignore ...

Actually, it was by design. It's not an accident that the implementations were doing return (uri == null) ? null : .... These functions were added before try() existed, so it was the only way to handle invalid data without the performance penalty of attempting to parse the URL first (e.g., via some hypothetical function like "isValidUrl()") before extracting the components.

This is a backward incompatible language change that deserves more careful consideration, and possibly a migration path that allows restoring the legacy behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants