Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: object tracking #566

Merged
merged 244 commits into from
May 15, 2023
Merged
Show file tree
Hide file tree
Changes from 223 commits
Commits
Show all changes
244 commits
Select commit Hold shift + click to select a range
5033129
feat: tracking using openmm
IshSiva Jan 17, 2023
ca70d69
feat: object tracking
gaurav274 Jan 19, 2023
28b6925
merge msater
gaurav274 Jan 19, 2023
123516d
checkpoint
gaurav274 Jan 20, 2023
c9afc12
adding delete operation
aryan-rajoria Jan 22, 2023
34dfbf7
Adding Insert Statement
aryan-rajoria Jan 25, 2023
9fa9857
checkpoint
aryan-rajoria Jan 25, 2023
ebe26d3
supporting multiple entries
aryan-rajoria Jan 25, 2023
bc722dd
implemented for structured data error
aryan-rajoria Jan 25, 2023
0e29858
adding parser visitor for delete
aryan-rajoria Jan 25, 2023
c1a7864
delete executor
aryan-rajoria Jan 26, 2023
5ac631a
delete plan and rules
aryan-rajoria Jan 26, 2023
3238e95
adding delete to plan executor
aryan-rajoria Jan 26, 2023
02a1d28
change position of LogicalDelete
aryan-rajoria Jan 26, 2023
01181b5
logical delimeter
aryan-rajoria Jan 30, 2023
562a7ca
delete test case
aryan-rajoria Jan 30, 2023
9887732
adding test case
aryan-rajoria Feb 2, 2023
d2a1a3d
adding test case
aryan-rajoria Feb 2, 2023
f09c613
adding delete testcase
aryan-rajoria Feb 3, 2023
79a6168
adding predicate to delete executor
aryan-rajoria Feb 3, 2023
5ce1991
adding delete to Image storage
aryan-rajoria Feb 3, 2023
91d7b06
bug fix in delete
aryan-rajoria Feb 9, 2023
0aac934
fixing testcase
aryan-rajoria Feb 9, 2023
ee48803
adding test case for insert statement
aryan-rajoria Feb 9, 2023
fc2f243
remove order_by from statement_binder.py
aryan-rajoria Feb 9, 2023
343a4a2
better variable names, using Batch
aryan-rajoria Feb 9, 2023
121451f
error message for insert
aryan-rajoria Feb 9, 2023
5b47c15
removing order_by and limit from delete
aryan-rajoria Feb 10, 2023
8c75a5e
remove order_by and limit
aryan-rajoria Feb 10, 2023
6772cd0
use f-string
aryan-rajoria Feb 10, 2023
7a10d67
adding to changelog
aryan-rajoria Feb 10, 2023
1a4204f
removing commit messages
aryan-rajoria Feb 14, 2023
e96d3a4
formatting
aryan-rajoria Feb 14, 2023
640e7ed
fixing comments
aryan-rajoria Feb 14, 2023
cb50de3
formatting
aryan-rajoria Feb 14, 2023
533de9e
eva insert f32 values
aryan-rajoria Feb 15, 2023
e8081e4
Merge branch 'delete-operation' of github.com:Aryan-Rajoria/eva into …
jarulraj Feb 16, 2023
97bfac4
checkpoint
jarulraj Feb 16, 2023
0127ad0
checkpoint
jarulraj Feb 16, 2023
fc353c9
checkpoint
jarulraj Feb 16, 2023
10fd83a
checkpoint
jarulraj Feb 16, 2023
d559fe1
checkpoint
jarulraj Feb 16, 2023
445a710
checkpoint
jarulraj Feb 16, 2023
d246715
checkpoint
jarulraj Feb 16, 2023
17247fe
checkpoint
jarulraj Feb 16, 2023
8a2d5a3
fix: should delete range
aryan-rajoria Feb 16, 2023
365ae2f
delete multiple rows
aryan-rajoria Feb 16, 2023
88e74e3
udf bootstrap
aryan-rajoria Feb 16, 2023
0d4cde2
checkpoint
jarulraj Feb 16, 2023
a8918a7
checkpoint
jarulraj Feb 16, 2023
40bc613
checkpoint
jarulraj Feb 17, 2023
5a2faac
checkpoint
jarulraj Feb 17, 2023
5f4a802
checkpoint
jarulraj Feb 17, 2023
e952beb
checkpoint
jarulraj Feb 17, 2023
247b041
checkpoint
jarulraj Feb 17, 2023
08420d1
checkpoint
jarulraj Feb 17, 2023
536ae74
checkpoint
jarulraj Feb 17, 2023
d30fcbd
checkpoint
jarulraj Feb 17, 2023
31aa92d
checkpoint
jarulraj Feb 17, 2023
db930a4
checkpoint
jarulraj Feb 17, 2023
f76e856
checkpoint
jarulraj Feb 17, 2023
0ae4cd7
checkpoint
jarulraj Feb 17, 2023
c7c763e
checkpoint
jarulraj Feb 18, 2023
2711582
checkpoint
jarulraj Feb 18, 2023
86c0513
checkpoint
jarulraj Feb 18, 2023
4fcc190
checkpoint
jarulraj Feb 18, 2023
94d0ae0
checkpoint
jarulraj Feb 18, 2023
0701eb8
checkpoint
jarulraj Feb 18, 2023
b31ad91
checkpoint
jarulraj Feb 18, 2023
2bf76ff
checkpoint
jarulraj Feb 18, 2023
d72d778
checkpoint
jarulraj Feb 18, 2023
d74791c
checkpoint
jarulraj Feb 18, 2023
b783430
checkpoint
jarulraj Feb 18, 2023
2450485
checkpoint
jarulraj Feb 18, 2023
b78280e
checkpoint
jarulraj Feb 18, 2023
166aa32
checkpoint
jarulraj Feb 18, 2023
22d1386
Updated config.yml
jarulraj Feb 18, 2023
ba6bda9
checkpoint
jarulraj Feb 18, 2023
f9f4a2b
checkpoint
jarulraj Feb 18, 2023
429e513
checkpoint
jarulraj Feb 18, 2023
7e525a2
checkpoint
jarulraj Feb 18, 2023
1afe38a
checkpoint
jarulraj Feb 18, 2023
9744432
checkpoint
jarulraj Feb 18, 2023
cb859db
checkpoint
jarulraj Feb 19, 2023
acf36ce
checkpoint
jarulraj Feb 19, 2023
768a8a2
checkpoint
jarulraj Feb 19, 2023
32da4af
checkpoint
jarulraj Feb 19, 2023
13c7710
checkpoint
jarulraj Feb 19, 2023
ea2cee7
checkpoint
jarulraj Feb 19, 2023
eb56fcc
checkpoint
jarulraj Feb 19, 2023
ad339a6
checkpoint
jarulraj Feb 19, 2023
db1f388
checkpoint
jarulraj Feb 19, 2023
80bf59c
checkpoint
jarulraj Feb 19, 2023
a740112
checkpoint
jarulraj Feb 19, 2023
41c2051
checkpoint
jarulraj Feb 19, 2023
9fe2f81
checkpoint
jarulraj Feb 20, 2023
f37b92c
checkpoint
jarulraj Feb 20, 2023
46e2106
checkpoint
jarulraj Feb 20, 2023
35168e7
checkpoint
jarulraj Feb 20, 2023
e16eb3c
checkpoint
jarulraj Feb 20, 2023
101761e
Merge branch 'master' into coverage
jarulraj Feb 20, 2023
6732aef
checkpoint
jarulraj Mar 15, 2023
7f07e13
checkpoint
jarulraj Mar 16, 2023
cb27222
Merge branch 'master' into coverage
jarulraj Mar 17, 2023
f7f0274
checkpoint
jarulraj Mar 17, 2023
463183c
checkpoint
jarulraj Mar 17, 2023
b230ad9
checkpoint
jarulraj Mar 17, 2023
6ae22c8
checkpoint
jarulraj Mar 17, 2023
61f3ca4
checkpoint
jarulraj Mar 18, 2023
9834ebf
checkpoint
jarulraj Mar 18, 2023
51ad946
checkpoint
jarulraj Mar 18, 2023
b9e5ea6
checkpoint
jarulraj Mar 18, 2023
b9314fb
checkpoint
jarulraj Mar 19, 2023
43c04a4
checkpoint
jarulraj Mar 19, 2023
8e4a32f
checkpoint
jarulraj Mar 19, 2023
1481442
checkpoint
jarulraj Mar 19, 2023
3acbc20
checkpoint
jarulraj Mar 19, 2023
3611298
checkpoint
jarulraj Mar 19, 2023
f6b27a0
checkpoint
jarulraj Mar 19, 2023
11eb92d
checkpoint
jarulraj Mar 19, 2023
c1db3ce
checkpoint
jarulraj Mar 19, 2023
25f48a7
checkpoint
jarulraj Mar 19, 2023
562bcbd
checkpoint
jarulraj Mar 19, 2023
01a1bd4
checkpoint
jarulraj Mar 19, 2023
a1ec020
checkpoint
jarulraj Mar 19, 2023
307d3d4
checkpoint
jarulraj Mar 19, 2023
98733a6
checkpoint
jarulraj Mar 19, 2023
80292fc
checkpoint
jarulraj Mar 19, 2023
a80e44e
checkpoint
jarulraj Mar 19, 2023
27a2a47
checkpoint
jarulraj Mar 19, 2023
2494795
checkpoint
jarulraj Mar 19, 2023
a128b98
checkpoint
jarulraj Mar 19, 2023
7c4aec1
checkpoint
jarulraj Mar 19, 2023
94b892b
checkpoint
jarulraj Mar 19, 2023
b846e82
checkpoint
jarulraj Mar 20, 2023
d1c2c1c
checkpoint
jarulraj Mar 20, 2023
6d61eb1
checkpoint
jarulraj Mar 20, 2023
c774038
checkpoint
jarulraj Mar 20, 2023
5419356
checkpoint
jarulraj Mar 20, 2023
4c22deb
checkpoint
jarulraj Mar 20, 2023
93f6c58
checkpoint
jarulraj Mar 20, 2023
49e6f97
checkpoint
jarulraj Mar 20, 2023
1d82644
checkpoint
jarulraj Mar 20, 2023
8bfa3d5
checkpoint
jarulraj Mar 21, 2023
87dc9a9
checkpoint
jarulraj Mar 21, 2023
d9f5f3d
checkpoint
jarulraj Mar 21, 2023
20ef35d
checkpoint
jarulraj Mar 21, 2023
734fa8a
checkpoint
jarulraj Mar 21, 2023
162cc42
checkpoint
jarulraj Mar 21, 2023
508e4a7
checkpoint
jarulraj Mar 21, 2023
31dcd9a
checkpoint
jarulraj Mar 21, 2023
7ac35d9
checkpoint
jarulraj Mar 21, 2023
53c3fec
checkpoint
jarulraj Mar 21, 2023
356310b
checkpoint
jarulraj Mar 21, 2023
c5b6c77
checkpoint
jarulraj Mar 21, 2023
03c9b50
checkpoint
jarulraj Mar 21, 2023
212e001
checkpoint
jarulraj Mar 21, 2023
1cda24c
checkpoint
jarulraj Mar 21, 2023
bbff19b
checkpoint
jarulraj Mar 21, 2023
c104ec3
checkpoint
jarulraj Mar 21, 2023
36d1946
try to run tests in parallel
jarulraj Mar 22, 2023
f15e0a3
checkpoint
jarulraj Mar 22, 2023
bfb344d
checkpoint
jarulraj Mar 22, 2023
7d89836
checkpoint
jarulraj Mar 22, 2023
b586931
checkpoint
jarulraj Mar 22, 2023
6416091
Merge branch 'master' of github.com:georgia-tech-db/eva
gaurav274 Mar 29, 2023
e1d2391
checkpoint
gaurav274 Mar 29, 2023
60c4b87
renove opencv reader
gaurav274 Mar 29, 2023
aa241c9
checkpoint
gaurav274 Mar 29, 2023
3f4266e
bug: fix readers
gaurav274 Mar 30, 2023
b6e0898
fix reader testcases
gaurav274 Mar 30, 2023
6efb528
fix csv reader
gaurav274 Mar 30, 2023
2cbc817
Merge branch 'master' of github.com:georgia-tech-db/eva
gaurav274 Mar 30, 2023
3a19571
Merge branch 'master' into reader-fixes
gaurav274 Mar 30, 2023
ff0615f
bug fixes
gaurav274 Mar 30, 2023
896de3c
remove old udf
gaurav274 Mar 30, 2023
92c6da3
style: fix linter
gaurav274 Mar 30, 2023
32fab3c
style: fix linter
gaurav274 Mar 30, 2023
a5ff3e7
checkpoint
gaurav274 Jan 20, 2023
7ef1816
merge master
gaurav274 Mar 30, 2023
eaaab84
merge reader fixes branch
gaurav274 Mar 31, 2023
2534a2f
checkpoint
gaurav274 Apr 2, 2023
cf765bc
merge master
gaurav274 Apr 2, 2023
1d5533c
resolve conflicts
gaurav274 Apr 2, 2023
9c7f95b
remove openmm related packages
gaurav274 Apr 2, 2023
4c59da3
obj tracking working using norfair
gaurav274 Apr 3, 2023
9338019
refactor
gaurav274 Apr 3, 2023
78e8fbf
Merge branch 'master' of github.com:georgia-tech-db/eva into minor-fixes
gaurav274 Apr 3, 2023
7b0ae63
minor fix testcase
gaurav274 Apr 3, 2023
f995237
install decord from fork
suryatejreddy Apr 3, 2023
1434889
fix
suryatejreddy Apr 3, 2023
a1c0e6a
Merge branch 'master' into obj-tracking
jarulraj Apr 4, 2023
51b5072
bug: fix pip install
gaurav274 Apr 4, 2023
0c11b6a
merge master
gaurav274 Apr 4, 2023
bf8b42f
revert setup.py
gaurav274 Apr 4, 2023
98dd0ce
merge minor fixes
gaurav274 Apr 4, 2023
1c35f09
merge upstream
gaurav274 Apr 4, 2023
a796d73
merge master
gaurav274 Apr 4, 2023
91e782d
Merge branch 'obj-tracking' of github.com:georgia-tech-db/eva into ob…
gaurav274 Apr 4, 2023
dcda756
clean up
gaurav274 Apr 4, 2023
50c1191
EVATracker handles iterating over batch row by row
gaurav274 Apr 4, 2023
1f84029
minor fixes
gaurav274 Apr 5, 2023
b12e603
generalize Extract object to use existing executors
gaurav274 Apr 5, 2023
bff98ae
add reuse testcase and tracker to bootstrap queries
gaurav274 Apr 5, 2023
a16bbd2
handle jumps in input to tracker
gaurav274 Apr 6, 2023
ecd41f1
Merge branch 'master' of github.com:georgia-tech-db/eva
gaurav274 Apr 6, 2023
4e5f9b3
merge master
gaurav274 Apr 6, 2023
b9f2b7e
Merge branch 'master' of github.com:georgia-tech-db/eva
gaurav274 Apr 6, 2023
a221e2b
merge master
gaurav274 Apr 6, 2023
37f0c50
Merge branch 'master' of github.com:georgia-tech-db/eva into obj-trac…
gaurav274 Apr 6, 2023
9339474
address pr comments
gaurav274 Apr 7, 2023
470a358
merge upstream
gaurav274 Apr 7, 2023
4b0c70c
merge master
gaurav274 Apr 9, 2023
b3e6318
bug fixed in tracker udf definition
gaurav274 Apr 9, 2023
e72c6f9
Merge branch 'master' into obj-tracking
xzdandy May 6, 2023
850919b
Merge branch 'master' into obj-tracking
xzdandy May 8, 2023
2e3a5b9
FIX: YoloV5 -> YOLO
xzdandy May 8, 2023
ebc829a
FIX extract object intergration test
xzdandy May 9, 2023
21b65eb
FIX merge error
xzdandy May 9, 2023
3293b24
Fix reuse test case
xzdandy May 9, 2023
78f5f66
Merge branch 'master' into obj-tracking
xzdandy May 11, 2023
a8cab6b
updates
jarulraj May 12, 2023
55b7a2b
updates
jarulraj May 12, 2023
71ad90a
updates
jarulraj May 12, 2023
6ad1221
updates
jarulraj May 12, 2023
81e4676
Merge branch 'master' into obj-tracking
xzdandy May 13, 2023
7d21d9f
improve extract object reuse testcases.
xzdandy May 13, 2023
d15ff37
updates
jarulraj May 13, 2023
504e748
updates
jarulraj May 13, 2023
3570ecd
updates
jarulraj May 13, 2023
fc6d09d
fix testcases
gaurav274 May 13, 2023
3aa1901
python3.11 support disabled
gaurav274 May 13, 2023
6567777
merge bump-python
gaurav274 May 13, 2023
22d253c
merge master
gaurav274 May 13, 2023
9c11863
fix build
gaurav274 May 14, 2023
9d5972c
fix merge issue
gaurav274 May 14, 2023
6ece740
install opencv in headless mode
gaurav274 May 14, 2023
5652f4a
replace yolov5 with yolo
gaurav274 May 14, 2023
467e105
Merge branch 'obj-tracking' of github.com:georgia-tech-db/eva into ob…
xzdandy May 14, 2023
86c0e6e
reorganize the EVA tracker class into abstract class for all trackers in
xzdandy May 14, 2023
2cbd1b4
Linter
xzdandy May 14, 2023
c1bfc66
fix the input/output signature for trackers
xzdandy May 14, 2023
a863093
LINTER
xzdandy May 14, 2023
91d357b
revert tutorial changes
gaurav274 May 14, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
77 changes: 76 additions & 1 deletion eva/binder/binder_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,11 @@
from typing import TYPE_CHECKING, List

from eva.catalog.catalog_type import TableType
from eva.catalog.catalog_utils import is_string_col, is_video_table
from eva.catalog.catalog_utils import (
get_video_table_column_definitions,
is_string_col,
is_video_table,
)
from eva.expression.function_expression import FunctionExpression
from eva.parser.alias import Alias

Expand Down Expand Up @@ -132,3 +136,74 @@ def resolve_alias_table_value_expression(node: FunctionExpression):
assert len(node.alias.col_names) == len(
node.output_objs
), f"""Expected {len(node.output_objs)} output columns for {node.alias.alias_name}, got {len(node.alias.col_names)}."""


def handle_bind_extract_object_function(
node: FunctionExpression, binder_context: StatementBinderContext
):
"""Handles the binding of extract_object function.
1. Bind the source video data
2. Create and bind the detector function expression using the provided name.
3. Create and bind the tracker function expression.
Its inputs are id, data, output of detector.
4. Bind the EXTRACT_OBJECT function expression and append the new children.
5. Handle the alias and populate the outputs of the EXTRACT_OBJECT function

Args:
node (FunctionExpression): The function expression representing the extract object operation.
binder_context (StatementBinderContext): The context object used to bind expressions in the statement.

Raises:
AssertionError: If the number of children in the `node` is not equal to 3.
"""
assert (
len(node.children) == 3
), f"Invalid arguments provided to {node}. Example correct usage, (data, Detector, Tracker)"

# 1. Bind the source video
video_data = node.children[0]
binder_context.bind(video_data)

# 2. Construct the detector
# convert detector to FunctionExpression before binding
# eg. YoloV5 -> YoloV5(data)
detector = FunctionExpression(None, node.children[1].col_name)
detector.append_child(video_data.copy())
binder_context.bind(detector)

# 3. Construct the tracker
# convert tracker to FunctionExpression before binding
# eg. ByteTracker -> ByteTracker(id, data, labels, bboxes, scores)
tracker = FunctionExpression(None, node.children[2].col_name)
# create the video id expression
columns = get_video_table_column_definitions()
tracker.append_child(
TupleValueExpression(
col_name=columns[1].name, table_alias=video_data.table_alias
)
)
tracker.append_child(video_data.copy())
binder_context.bind(tracker)
# append the bound output of detector
for obj in detector.output_objs:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the object detection be run on every frame? norfair supports the period operator, but I'm curious how we can combine that with our operators (e.g., SAMPLE) to run detections only once every few frames and associate tracks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice observation! Let me think about it.

Copy link
Collaborator

@xzdandy xzdandy Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the current implementation already work with SAMPLE? The frames passed to the object detector are after sampling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will work, however incorrectly. Tracker won't know that we are skipping frames.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was sending frame_id to the tracker for such use cases. Fixed it!

col_alias = "{}.{}".format(obj.udf_name.lower(), obj.name.lower())
child = TupleValueExpression(
obj.name,
table_alias=obj.udf_name.lower(),
col_object=obj,
col_alias=col_alias,
)
tracker.append_child(child)

# 4. Bind the EXTRACT_OBJECT expression and append the new children.
node.children = []
node.children = [video_data, detector, tracker]

# 5. assign the outputs of tracker to the output of extract_object
node.output_objs = tracker.output_objs
node.projection_columns = [obj.name.lower() for obj in node.output_objs]

# 5. resolve alias based on the what user provided
# we assign the alias to tracker as it governs the output of the extract object
resolve_alias_table_value_expression(node)
tracker.alias = node.alias
6 changes: 6 additions & 0 deletions eva/binder/statement_binder.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
check_groupby_pattern,
check_table_object_is_video,
extend_star,
handle_bind_extract_object_function,
resolve_alias_table_value_expression,
)
from eva.binder.statement_binder_context import StatementBinderContext
Expand All @@ -40,6 +41,7 @@
from eva.parser.select_statement import SelectStatement
from eva.parser.statement import AbstractStatement
from eva.parser.table_ref import TableRef
from eva.parser.types import UDFType
from eva.third_party.huggingface.binder import assign_hf_udf
from eva.utils.generic_utils import get_file_checksum, load_udf_class_from_file
from eva.utils.logging_manager import logger
Expand Down Expand Up @@ -234,6 +236,10 @@ def _bind_tuple_expr(self, node: TupleValueExpression):

@bind.register(FunctionExpression)
def _bind_func_expr(self, node: FunctionExpression):
# handle the special case of "extract_object"
if node.name.upper() == str(UDFType.EXTRACT_OBJECT):
handle_bind_extract_object_function(node, self)
return
# bind all the children
for child in node.children:
self.bind(child)
Expand Down
19 changes: 7 additions & 12 deletions eva/executor/apply_and_merge_executor.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,16 +39,11 @@ def __init__(self, node: ApplyAndMergePlan):
def exec(self, *args, **kwargs) -> Iterator[Batch]:
child_executor = self.children[0]
for batch in child_executor.exec(**kwargs):
res = self.func_expr.evaluate(batch)
if not res.empty():
if self.do_unnest:
res.unnest()
func_result = self.func_expr.evaluate(batch)
output = Batch.merge_column_wise([batch, func_result])
if self.do_unnest:
output.unnest(func_result.columns)
# we reset the index as after unnest there can be duplicate index
output.reset_index()

# Merge the results to the input.
# This assumes that the batch index is preserved by the function
# call. Since both the batch and the results are sorted, we could
# perform a sorted merge, though the typical small size of the
# batch and results should not significantly impact performance.
merged_batch = Batch.join(batch, res)
merged_batch.reset_index()
yield merged_batch
yield output
4 changes: 4 additions & 0 deletions eva/expression/abstract_expression.py
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,10 @@ def get_child(self, index: int):
def children(self):
return self._children

@children.setter
def children(self, children):
self._children = children

def append_child(self, child):
self._children.append(child)

Expand Down
40 changes: 40 additions & 0 deletions eva/optimizer/operators.py
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,7 @@ class OperatorType(IntEnum):
LOGICALEXPLAIN = auto()
LOGICALCREATEINDEX = auto()
LOGICAL_APPLY_AND_MERGE = auto()
LOGICAL_EXTRACT_OBJECT = auto()
LOGICALFAISSINDEXSCAN = auto()
LOGICALDELIMITER = auto()

Expand Down Expand Up @@ -918,6 +919,45 @@ def __hash__(self) -> int:
return hash((super().__hash__(), self.func_expr, self.do_unnest, self.alias))


class LogicalExtractObject(Operator):
def __init__(
self,
detector: FunctionExpression,
tracker: FunctionExpression,
alias: Alias,
do_unnest: bool = False,
children: List = None,
):
super().__init__(OperatorType.LOGICAL_EXTRACT_OBJECT, children)
self.detector = detector
self.tracker = tracker
self.do_unnest = do_unnest
self.alias = alias

def __eq__(self, other):
is_subtree_equal = super().__eq__(other)
if not isinstance(other, LogicalExtractObject):
return False
return (
is_subtree_equal
and self.detector == other.detector
and self.tracker == other.tracker
and self.do_unnest == other.do_unnest
and self.alias == other.alias
)

def __hash__(self) -> int:
return hash(
(
super().__hash__(),
self.detector,
self.tracker,
self.do_unnest,
self.alias,
)
)


class LogicalJoin(Operator):
"""
Logical node for join operators
Expand Down
47 changes: 46 additions & 1 deletion eva/optimizer/rules/rules.py
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,7 @@
LogicalDrop,
LogicalDropUDF,
LogicalExplain,
LogicalExtractObject,
LogicalFaissIndexScan,
LogicalFilter,
LogicalFunctionScan,
Expand Down Expand Up @@ -346,7 +347,7 @@ def apply(self, before: LogicalFilter, context: OptimizerContext):
class XformLateralJoinToLinearFlow(Rule):
"""If the inner node of a lateral join is a function-valued expression, we
eliminate the join node and make the inner node the parent of the outer node. This
produces a linear #data flow path. Because this scenario is common in our system,
produces a linear data flow path. Because this scenario is common in our system,
we chose to explicitly convert it to a linear flow, which simplifies the
implementation of other optimizations such as UDF reuse and parallelized plans by
removing the join."""
Expand Down Expand Up @@ -442,6 +443,50 @@ def apply(self, before: LogicalFilter, context: OptimizerContext):
yield root_node


class XformExtractObjectToLinearFlow(Rule):
"""If the inner node of a lateral join is a Extract_Object function-valued
expression, we eliminate the join node and make the inner node the parent of the
outer node. This produces a linear data flow path.
TODO: We need to add a sorting operation after detector to ensure we always provide tracker data in order.
"""

# LogicalApplyAndMerge(tracker)
# LogicalJoin(Lateral) |
# / \ -> LogicalApplyAndMerge(detector)
# A LogicalExtractObject |
# A

def __init__(self):
xzdandy marked this conversation as resolved.
Show resolved Hide resolved
pattern = Pattern(OperatorType.LOGICALJOIN)
pattern.append_child(Pattern(OperatorType.DUMMY))
pattern.append_child(Pattern(OperatorType.LOGICAL_EXTRACT_OBJECT))
super().__init__(RuleType.XFORM_EXTRACT_OBJECT_TO_LINEAR_FLOW, pattern)

def promise(self):
return Promise.XFORM_EXTRACT_OBJECT_TO_LINEAR_FLOW

def check(self, before: LogicalJoin, context: OptimizerContext):
if before.join_type == JoinType.LATERAL_JOIN:
return True
return False

def apply(self, before: LogicalJoin, context: OptimizerContext):
A: Dummy = before.children[0]
logical_extract_obj: LogicalExtractObject = before.children[1]

detector = LogicalApplyAndMerge(
logical_extract_obj.detector, alias=logical_extract_obj.detector.alias
)
tracker = LogicalApplyAndMerge(
logical_extract_obj.tracker,
alias=logical_extract_obj.alias,
do_unnest=logical_extract_obj.do_unnest,
)
detector.append_child(A)
tracker.append_child(detector)
yield tracker


class CombineSimilarityOrderByAndLimitToFaissIndexScan(Rule):
"""
This rule currently rewrites Order By + Limit to a Faiss index scan.
Expand Down
3 changes: 2 additions & 1 deletion eva/optimizer/rules/rules_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ class RuleType(Flag):

# REWRITE RULES TOP DOWN APPLY FIRST (LOGICAL -> LOGICAL)
XFORM_LATERAL_JOIN_TO_LINEAR_FLOW = auto()
XFORM_EXTRACT_OBJECT_TO_LINEAR_FLOW = auto()
TOP_DOWN_DELIMETER = auto()

# REWRITE RULES BOTTOM UP APPLY SECOND (LOGICAL -> LOGICAL)
Expand Down Expand Up @@ -82,7 +83,6 @@ class RuleType(Flag):
LOGICAL_CREATE_INDEX_TO_FAISS = auto()
LOGICAL_APPLY_AND_MERGE_TO_PHYSICAL = auto()
LOGICAL_FAISS_INDEX_SCAN_TO_PHYSICAL = auto()

IMPLEMENTATION_DELIMETER = auto()

NUM_RULES = auto()
Expand Down Expand Up @@ -139,6 +139,7 @@ class Promise(IntEnum):
# REWRITE RULES
EMBED_FILTER_INTO_GET = auto()
EMBED_SAMPLE_INTO_GET = auto()
XFORM_EXTRACT_OBJECT_TO_LINEAR_FLOW = auto()
XFORM_LATERAL_JOIN_TO_LINEAR_FLOW = auto()
PUSHDOWN_FILTER_THROUGH_JOIN = auto()
PUSHDOWN_FILTER_THROUGH_APPLY_AND_MERGE = auto()
Expand Down
2 changes: 2 additions & 0 deletions eva/optimizer/rules/rules_manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,7 @@
PushDownFilterThroughApplyAndMerge,
PushDownFilterThroughJoin,
ReorderPredicates,
XformExtractObjectToLinearFlow,
XformLateralJoinToLinearFlow,
)
from eva.optimizer.rules.rules_base import Rule
Expand All @@ -86,6 +87,7 @@ def __init__(self):

self._stage_one_rewrite_rules = [
XformLateralJoinToLinearFlow(),
XformExtractObjectToLinearFlow(),
]

self._stage_two_rewrite_rules = [
Expand Down
20 changes: 15 additions & 5 deletions eva/optimizer/statement_to_opr_convertor.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
LogicalDrop,
LogicalDropUDF,
LogicalExplain,
LogicalExtractObject,
LogicalFilter,
LogicalFunctionScan,
LogicalGet,
Expand Down Expand Up @@ -57,6 +58,7 @@
from eva.parser.show_statement import ShowStatement
from eva.parser.statement import AbstractStatement
from eva.parser.table_ref import TableRef
from eva.parser.types import UDFType
from eva.utils.logging_manager import logger


Expand All @@ -78,11 +80,19 @@ def visit_table_ref(self, table_ref: TableRef):

elif table_ref.is_table_valued_expr():
tve = table_ref.table_valued_expr
self._plan = LogicalFunctionScan(
func_expr=tve.func_expr,
alias=table_ref.alias,
do_unnest=tve.do_unnest,
)
if tve.func_expr.name.lower() == str(UDFType.EXTRACT_OBJECT).lower():
self._plan = LogicalExtractObject(
detector=tve.func_expr.children[1],
tracker=tve.func_expr.children[2],
alias=table_ref.alias,
do_unnest=tve.do_unnest,
)
else:
self._plan = LogicalFunctionScan(
func_expr=tve.func_expr,
alias=table_ref.alias,
do_unnest=tve.do_unnest,
)

elif table_ref.is_select():
# NestedQuery
Expand Down
4 changes: 4 additions & 0 deletions eva/parser/types.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,3 +67,7 @@ class FileFormatType(EVAEnum):
class ShowType(EVAEnum):
UDFS # noqa: F821
TABLES # noqa: F821


class UDFType(EVAEnum):
EXTRACT_OBJECT # noqa: F821
17 changes: 16 additions & 1 deletion eva/udfs/decorators/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,22 @@ def load_io_from_udf_decorators(
Type[UdfIOCatalogEntry]: UdfIOCatalogEntry object created from the input decorator in setup
"""
tag_key = "input" if is_input else "output"
io_signature = udf.forward.tags[tag_key]
io_signature = None
if hasattr(udf.forward, "tags") and tag_key in udf.forward.tags:
io_signature = udf.forward.tags[tag_key]
else:
# Attempt to populate from the parent class and stop at the first parent class
# where the required tags are found.
for base_class in udf.__bases__:
if hasattr(base_class, "forward") and hasattr(base_class.forward, "tags"):
if tag_key in base_class.forward.tags:
io_signature = base_class.forward.tags[tag_key]
break

assert (
io_signature is not None
), f"Cannot infer io signature from the decorator for {udf}."

result_list = []
for io in io_signature:
result_list.extend(io.generate_catalog_entries(is_input))
Expand Down
Loading