-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Demonstrate usage of foreign arrow function #9150
Conversation
Including arrow language in the distribution by default. Added a basic example for creating an Arrow array.
build.sbt
Outdated
@@ -3021,7 +3020,8 @@ buildEngineDistribution := { | |||
val _ = (`engine-runner` / assembly).value | |||
updateLibraryManifests.value | |||
val modulesToCopy = componentModulesPaths.value.map(_.data) | |||
val engineModules = Seq(file("runtime.jar")) | |||
val arrow = Seq((`runtime-language-arrow`/Compile/packageBin ).value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is correct. Let's not assemble runtime-language-arrow
, as that is unnecessary.
Arrow specification will enforce padding during serialization. This change is mostly copying from existing Java reference implementation. Also while the padding is now correct, allocation of separate `ByteDirect` buffers is not. TBC
The builder accepts an Arrow array and allows for populating the array via the `set` method. Once the builder is `built` no more modifications are allowed.
Arrow spec expects (informally) that memory allocated to data and bitmaps is continous. With this change, when allocating memory for some array, we allocate full chunk of memory and then slice **from it** chunks for data and bitmap, rather than allocating separate memory chunks. TODO: add some reference counting in the future.
distribution/lib/Standard/Base/0.0.0-dev/src/Internal/Array_Like_Helpers.enso
Outdated
Show resolved
Hide resolved
distribution/lib/Standard/Base/0.0.0-dev/src/Data/Arrow_Array_Builder.enso
Outdated
Show resolved
Hide resolved
...java/org/enso/interpreter/node/expression/builtin/interop/generic/WriteArrayElementNode.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/enso/interpreter/node/expression/builtin/mutable/SetBuiltinVectorNode.java
Outdated
Show resolved
Hide resolved
engine/runtime/src/main/java/org/enso/interpreter/runtime/data/vector/ArrowArrayBuilder.java
Outdated
Show resolved
Hide resolved
engine/runtime/src/main/java/org/enso/interpreter/runtime/data/vector/ArrowArrayBuilder.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have designed, implemented and packaged the Arrow language as a separate, Enso independent entity. This PR puts Arrow Array Builder into Standard.Base
. That's not right!
Arrow isn't a core concept of Enso. As such there should be no builtins related to Arrow support.
Should the Arrow language remain independent from Enso language, then the builder has to be provided by the Arrow language, exposing its functionality via InteropLibrary
. Simplest idea is to require two arguments to new_arrow
constructor (created after parsing new[Int8]
by the Arrow language) - the first being the size and the second an isExecutable
TruffleObject
(for example an Enso lambda function) that will be execute(builder)
where builder
is another TruffleObject
reacting to invokeMember("set")
& co.
Or we can decide that Arrow is natural part of Enso. But then we shouldn't have a separate Arrow language JAR & co. Needless to say, I'd rather keep Enso small and have Arrow separate - unless there are inherent performance or UX limitations.
distribution/lib/Standard/Base/0.0.0-dev/src/Data/Arrow_Array_Builder.enso
Outdated
Show resolved
Hide resolved
...java/org/enso/interpreter/node/expression/builtin/interop/generic/WriteArrayElementNode.java
Outdated
Show resolved
Hide resolved
engine/runtime/src/main/java/org/enso/interpreter/runtime/data/vector/ArrowArrayBuilder.java
Outdated
Show resolved
Hide resolved
Despite the name, there is nothing Arrow-specific in that module. And I think it would make sense to make it generic. I will update the PR. |
@JaroslavTulach As I mentioned, there was nothing in the builder that made it arrow-specific. In fact this can serve as a generic polyglot array builder. The changes now reflect that. |
foreign arrow new_arrow = """ | ||
new[Int8] | ||
|
||
Polyglot_Array_Builder.set self index value = Polyglot_Array_Builder.set_builtin self index value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove this if people object. This is mostly for testing the functionality and it is not part of the official API
builder.set 1 0 . should_fail_with Unsupported_Argument_Types | ||
Polyglot.write_array_element array_10 1 0 | ||
builder.at 1 . should_equal 0 | ||
v.at 1 . should_equal 0 # dangerous but allowed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JaroslavTulach This demonstrates the potential consequences of using Polyglot API i.e. updating the builder after the fact that makes it "sealed".
/cc @jdunkerley
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
demonstrates the potential consequences of using Polyglot API
This demonstrates that the sealing only happens in the builder, but not on the actual array. If we want object coming from our arrow language to support sealing, why don't we put the sealed
flag into ArrowFixedArrayInt
& co.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I continue to have an API design architectural problem with this PR.
was nothing in the builder that made it arrow-specific. In fact this can serve as a generic polyglot array builder. The changes now reflect that.
Whatever was @jdunkerley reason for requesting the builder, it is not achieved as the usage of the new builder is completely optional and there are other means to achieve the same (and more) without the builder.
Proper API design shall start with description of the use-case. As a reviewer I am not supposed to guess the use-case. However I have to: If the use-case was encapsulation of the write operation, then it is not achieved! If the use-case is something else, then please mention it in the Pull Request Description - as currently the justification for majority of the code (I mean the bloat related to all the new builder builtins) in this PR is missing (or at last I cannot find it).
But I also added a case where you can use Polyglot API to do the modifications in place. Then the sky is the limit (the potential damage as well).
Right. So what is your use-case related to Polyglot.write_array_element
? Should one be allowed to do the modifications in place from Enso language or not? The original decision when working on #5011 and related PRs was that we don't want to expose write operations in Enso.
Using builder pattern is a good way to avoid exposing write operations, however it has to be done properly - e.g. (in current architectural design) the builder has to be provided by the Arrow language.
|
||
package org.enso.interpreter.arrow.runtime; | ||
|
||
public class RoundingUtil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the class have to be public at all? Isn't it enough to be package private?
err.enter(); | ||
Builtins builtins = EnsoContext.get(this).getBuiltins(); | ||
throw new PanicException(builtins.error().makeInvalidArrayIndex(array, index), this); | ||
} catch (UnsupportedTypeException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UnsupportedMessageException
leads to makeTypeError
while UnsupportedTypeException
leads to panic. Isn't that strange? Shouldn't UnsupportedTypeException
lead to some type error?
@@ -165,6 +168,10 @@ public Atom makeNoSuchConversion(Object target, Object that, UnresolvedConversio | |||
return noSuchConversion.newInstance(target, that, conversion); | |||
} | |||
|
|||
public Atom makeInvalidOperation(Object message) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Object message
type is pretty generic and potentially wrong. We cannot pass java.lang.String
there - that'd be an error. Recently I changed one of such signatures to Text
. Other alternative is EnsoObject
or TruffleObject
or overloaded methods supporting also long
and double
.
I can see the Object
argument is accepted all around, but as I say, it is technically not correct.
stdlibName = "Standard.Base.Data.Polyglot_Array_Builder.Polyglot_Array_Builder") | ||
public abstract class PolyglotArrayBuilder implements EnsoObject { | ||
|
||
protected boolean sealed = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected
? Better to make (package) private
.
|
||
protected boolean sealed = false; | ||
|
||
public void seal() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
.
builder.set 1 0 . should_fail_with Unsupported_Argument_Types | ||
Polyglot.write_array_element array_10 1 0 | ||
builder.at 1 . should_equal 0 | ||
v.at 1 . should_equal 0 # dangerous but allowed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
demonstrates the potential consequences of using Polyglot API
This demonstrates that the sealing only happens in the builder, but not on the actual array. If we want object coming from our arrow language to support sealing, why don't we put the sealed
flag into ArrowFixedArrayInt
& co.?
v.at 8 . should_equal 127 | ||
builder.append 21 . should_fail_with Invalid_Operation | ||
builder.set 1 21 . should_fail_with Invalid_Operation | ||
Polyglot.write_array_element array_10 1 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me recap what we achieved so far:
- we expose
new[Int8]
to public - we let public to create
array_10
and hold it indefinitly - we let public to modify the array any time with
Polyglot.write_array_element array_10
- on top of that we created some fake facade called
Polyglot_Array_Builder
- the facade provides no additional functionality
- every operation can already be done on
array_10
by other means
I guess I am missing the point of the whole builder exercise. Whatever it tried to achieve (probably some encapsulation or giving the creator more rights) clearly isn't achieved. Btw. Give the Creator of an Object More Rights pattern is described at page 79 of the Practical API Design book.
engine/runtime/src/main/java/org/enso/interpreter/runtime/data/vector/ArrowArrayBuilder.java
Outdated
Show resolved
Hide resolved
Assuming @jdunkerley main requirement is the: don't let mutable arrays float thru the computation, then ...
Change the syntax of the Arrow language to create the builder itself: foreign arrow new_array_int8 = """
new[Int8]
b10 = new_array_int8 10
b10.append 2
b10.append 4
arr = b10.build # or to_array?
arr.length . should_equal 10
arr.at 0 . should_equal 2
arr.at 1 . should_equal 4
arr.at 5 . is_nothing . should_be_true # maybe?
Polyglot.write_array_element arr 6 33 # raises an exception as the arr is already immutable if this is what we want, then it all can be implemented in the Arrow language itself without any need for so many builtins in the Enso engine. As a result of the changes the test cases that use new would need to be updated to use the builder (newly returned from the factory method). Arrow Array is already Enso ArrayPlease note that one can apply all Enso Array methods on the b3 = new_array_int8 3
b3.append 1
b3.append 2
b3.append 3
arr = b3.build
v1 = arr.map _.to_text # yields ['1', '2', '3']
v2 = arr.fold 10 (+) # yields 16
v3 = Meta.type_of arr # yields Array |
Avoiding the introduction of various builtins by pushing builder pattern into allow language.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an architecture perspective the change is looking good now. It is located in the right place (e.g. runtime-language-arrow
), it doesn't require any changes to engine/runtime
. It fits perfectly the vision why Arrow language was founded as a (loosely) separated project.
I have doubts about the decision to make builder : Array
. Possibly that's unnecessary leakage of mutability into the Enso world. However that may stay as it is and remain a subject of future discussions.
I believe we should improve error reporting by using better subclasses of InteropException. Given the typed array nature of Arrow buffers, I'd use UnsupportedTypeException more.
I haven't verified the change from the point of performance. I assume there will be some benchmarking work in the future.
...ime-language-arrow/src/main/java/org/enso/interpreter/arrow/runtime/ArrowFixedArrayDate.java
Outdated
Show resolved
Hide resolved
|
||
@ExportMessage | ||
public boolean hasArrayElements() { | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "builder" is in fact also an (Enso) Array
and it can be mutated until somebody calls build
on it. I'd be interested to see a use-cases for such a design decision...
...guage-arrow/src/main/java/org/enso/interpreter/arrow/runtime/ArrowFixedSizeArrayBuilder.java
Outdated
Show resolved
Hide resolved
...guage-arrow/src/main/java/org/enso/interpreter/arrow/runtime/ArrowFixedSizeArrayBuilder.java
Outdated
Show resolved
Hide resolved
...guage-arrow/src/main/java/org/enso/interpreter/arrow/runtime/ArrowFixedSizeArrayBuilder.java
Outdated
Show resolved
Hide resolved
...guage-arrow/src/main/java/org/enso/interpreter/arrow/runtime/ArrowFixedSizeArrayBuilder.java
Show resolved
Hide resolved
...guage-arrow/src/main/java/org/enso/interpreter/arrow/runtime/ArrowFixedSizeArrayBuilder.java
Show resolved
Hide resolved
...guage-arrow/src/main/java/org/enso/interpreter/arrow/runtime/ArrowFixedSizeArrayBuilder.java
Outdated
Show resolved
Hide resolved
Follow up on #9150 - making sure that Arrow builder is not accidentally treated as an Array by disallowing reading elements. # Important Notes Also making sure that the length of the resulting Arrow Array is consistent with what user requested.
Pull Request Description
Including arrow language in the distribution by default. Added a basic example for creating an Arrow array.
Making sure that memory layout agrees with Arrow specification (padding, continuous allocation of memory chunks).
Related to #9118.
This should unblock work on allowing serialization/deserialization to/from Parquet but I'd like to delay it to a follow up ticket as it is going to be a significant amount of specialized work.
Important notes
As requested, Arrow array can currently be only modified via a builder pattern. Once the builder is sealed, no more modifications are allowed. At least not via official API. We all know that one can bypass those via a foreign function in JS or Python.
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.