-
Notifications
You must be signed in to change notification settings - Fork 326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Is_In
and Not_In
to Filter_Condition
#3790
Add support for Is_In
and Not_In
to Filter_Condition
#3790
Conversation
06ecf9f
to
0c10218
Compare
@@ -1190,7 +1190,6 @@ lazy val parser = (project in file("lib/scala/parser")) | |||
s"-Djava.library.path=$root/target/rust/debug" | |||
}, | |||
libraryDependencies ++= Seq( | |||
"com.storm-enroute" %% "scalameter" % scalameterVersion % "bench", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this parser
project is used at all, I think it was a prototype of the Rust integration, superseded by the new Rust parser. Can we just remove the whole project? cc: @hubertp @jdunkerley
I removed this line in particular as bench
config is not available for this project and cleaning it didn't work because of that. But if possible maybe we should remove it altogether if it is indeed unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this can be removed. But maybe @JaroslavTulach can confirm since he is tinkering with the integration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it builds after removing the line feel free to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line yep, I was meaning about removing the whole subproject - it seems to me it's some old Rust parser integration effort (it's not the old parser still in use today, that's syntax
) - which was superseded by yours effort and I don't think it is used anywhere. So I think we should ditch it altogether to avoid the maintenance.
expected_max_time_ms = init.first.total_milliseconds * 2 | ||
check_timing name ~action = | ||
res = Duration.time_execution action | ||
runtime_ms = res.first.total_milliseconds | ||
if runtime_ms > expected_max_time_ms then | ||
Test.fail "Expected `Is_In` on "+name+" to be efficient, but it took "+runtime_ms.to_text+"ms while initialization itself took just "+expected_max_time_ms.to_text+"ms." | ||
|
||
check_timing "integers" <| | ||
t.filter "X" (Filter_Condition.Is_In vec) . at "X" . to_vector . should_equal expected_vector | ||
|
||
check_timing "booleans" <| | ||
t.filter (t.at "X" % 2 == 0) (Filter_Condition.Is_In bool_vec) . at "X" . to_vector . should_equal expected_vector_2 | ||
|
||
check_timing "dates" <| | ||
t.filter date_col (Filter_Condition.Is_In dates_vec) . at "X" . to_vector . should_equal expected_vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a timing check just to have a decent comparison and:
init = 501
integers = 31
booleans = 28
dates = 35
so init*2
seems to still have a decent buffer.
For example if I disable the vectorized variant for booleans it gives:
- [FAILED] should perform `Is_In` efficiently for builtin types [14626ms]
Reason: Expected `Is_In` on booleans to be efficient, but it took 13852ms while initialization itself took just 1472ms.
So we can see that 2*init is a decent buffer - for good implementation it gives enough time, but the wrong implementation exceeds that greatly anyway. Let's see how it behaves on the CI though.
f260e78
to
c319934
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing major a few comments to improve but looks good.
distribution/lib/Standard/Database/0.0.0-dev/src/Internal/Base_Generator.enso
Show resolved
Hide resolved
std-bits/table/src/main/java/org/enso/table/data/column/operation/map/bool/BooleanIsInOp.java
Show resolved
Hide resolved
* <p>It indicates whether the vector contained a null value and contains a hashmap of the vector | ||
* elements for faster contains checks. | ||
*/ | ||
public record CompactRepresentation(HashSet<Object> coercedValues, boolean hasNulls) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do a generic version? I don't think will make any real difference but wondered if nicer to have HashSet<String>
etc for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried. It was a bit more complex than we'd like, but maybe the changes could open venues for writing more type-aware operations in the future.
I first made the SpecializedIsInOp
generic as asked ( 2430d46 ), to do so I needed to introduce some type markers to make stuff compile.
Then in 5eb9c3f I have removed the temporarily added TypedStorage<T>
marker interface and just made Storage
know its type Storage<T>
. This required some more work though.
Please tell me if you think it's worth keeping both of these changes or only one of them or none - I'll amend as necessary and then get the PR merged finally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 re sbt change
4c3bdee
to
5eb9c3f
Compare
…and numeric columns
Marking a bug in BoolStorage - TODO fix it
ef188f2
to
bc5b520
Compare
5913848
to
ae84e89
Compare
Pull Request Description
Implements https://www.pivotaltracker.com/story/show/183389945
Important Notes
Checklist
Please include the following checklist in your PR:
Scala,
Java,
and
Rust
style guides.
./run ide build
and./run ide watch
.