Add `array_distance` function #12211
Signed-off-by: Austin Liu <[email protected]> Add `distance` aggregation function Signed-off-by: Austin Liu <[email protected]>
from data
----
5.196152422707
> select distance(sq.column1, sq.column2) from (values (NULL, 2), (0,0)) as sq;
+---------------------------------+
| distance(sq.column1,sq.column2) |
+---------------------------------+
| 0.0 |
+---------------------------------+
I prefer it to be NULL
Sure let's do it this way~
I am unsure how useful this feature would be because it only calculates two scalar column distances. 🤔
@austin362667 I think |
…sted Signed-off-by: Austin Liu <[email protected]>
Got it! Thanks for reviewing.
Title changed: Add `distance` aggregate function → Add `array_distance` function
👍
We could support column values in a follow-up PR.
Nice! Please add some doc here
Thanks, @austin362667, for the contribution, and @jayzhan211 for the review. We'll support the column value in a follow-up PR.
Which issue does this PR close?
Partially closes #8782.
Rationale for this change
Add distance functionality to DataFusion. This function is particularly useful for scenarios involving spatial analysis, clustering, or similarity computations, where distance metrics are crucial.
It might be valuable to add scalar UDFs like `list_distance`/`array_distance`, similar to DuckDB, along with other distance measures (e.g., cosine).
XREF: DuckDB list functions, array functions
What changes are included in this PR?
A new Euclidean distance function, `array_distance(arr1, arr2)`, is added.
Are these changes tested?
Yes, added SQL logic tests.
Are there any user-facing changes?
New function `array_distance(arr1, arr2)` is added.
Examples:
No breaking change.
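As a rough illustration of what a Euclidean `array_distance(arr1, arr2)` computes, here is a standalone Rust sketch. It is not the PR's implementation: `euclidean_distance` is a hypothetical helper name, it works on plain `f64` slices rather than Arrow arrays, and returning `None` for mismatched lengths is an assumption of this sketch rather than confirmed DataFusion behavior.

```rust
/// Euclidean (L2) distance between two equal-length numeric slices.
///
/// Returns `None` when the lengths differ. That choice is an assumption
/// of this sketch, not necessarily how DataFusion reports a mismatch.
fn euclidean_distance(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    // Sum the squared element-wise differences, then take the square root.
    let sum_of_squares: f64 = a
        .iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y) * (x - y))
        .sum();
    Some(sum_of_squares.sqrt())
}
```

For inputs `[1.0, 2.0, 3.0]` and `[4.0, 5.0, 6.0]` the squared differences sum to 27, so the result is the square root of 27, roughly 5.196.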