Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More efficient conversion of Vec to JsArray #2058

Closed
lastmjs opened this issue May 3, 2022 · 9 comments
Closed

More efficient conversion of Vec to JsArray #2058

lastmjs opened this issue May 3, 2022 · 9 comments
Labels
enhancement New feature or request

Comments

@lastmjs
Copy link
Contributor

lastmjs commented May 3, 2022

I just want to bring this up and get some preliminary feedback. My project Azle is a TS/JS environment for the Internet Computer, a Wasm environment. It is relatively resource-constrained compared with other cloud solutions at the moment, and all computation is metered with a unit called cycles. Each RPC call into a canister has a cycle limit.

Users will need to deal with large arrays of Vec<u8> to deal with raw binary data. Right now it seems like the conversion between Vec<u8> and JsArray (to get the user's raw binary data into the canister in the RPC call) is extremely inefficient, insomuch that the cycles limit on the computation is reached and the call fails. I am reaching the limit on arrays of length between 10,000 and 25,000 (~10-20kb).

What can be done to make the conversion more efficient? And btw I am not 100% it's Boa's conversion that is causing the problem, I think my code can be optimized to some extent. But I fear the BigO of the JsArray code (like from_iter) will just be insufficient. What can be done if this is the case? Is there a possibility of just sharing a memory location with a boa context so that we do not need to iterate through entire arrays to convert them over?

@Razican
Copy link
Member

Razican commented May 4, 2022

The thing is that a JS Array is an array of values of any type, which means that every value in the Rust array must be converted to a JS value. I don't think there is much room for improvement here (maybe pre-allocation, which we might be already doing)

But, the ideal JS type for a Vec<u8> is a TypedArray, which uses a Rust Vec backend to be stored. So, I would say that the next step should be to implement easy conversion from Rust Vec to TypedArray.

@lastmjs
Copy link
Contributor Author

lastmjs commented May 6, 2022

Awesome, I will experiment with that next, thanks!

@lastmjs
Copy link
Contributor Author

lastmjs commented Jun 18, 2022

I'm ready to tackle this, can you show me some code for what is currently possible converting a Vec<u8> into a JsUint8Array and vice versa? If it isn't currently possible to do this without iterating over every item, I can work on a pull request to do it efficiently.

@Razican
Copy link
Member

Razican commented Jun 20, 2022

Most of this work would be done here: https://github.com/boa-dev/boa/blob/main/boa_engine/src/object/jstypedarray.rs

The idea is to add another From implementation in the JsTypedArrayType! macro (which should probably be renamed for Rust standard naming convention, in snake_case).

This From would receive a Vec<$element>, and create a $name object. For this, it will need to first create an ArrayBuffer like this:

let my_vector: Vec<u8> = vec![];
let array_buffer = boa_engine::builtins::array_buffer::ArrayBuffer {
  array_buffer_data: Some(my_vector),
  array_buffer_byte_length: 8,
  array_buffer_detach_key: JsValue::undefined()
};
let array_buffer_js_value: JsValue = array_buffer.into();

It might be useful to have a JsArrayBuffer API too, and create the From constructor from that API object.

Then, you can create the internal JsObject with JsObject::from_proto_and_data. This should solve issues of some of the information here being private to boa_engine.

The last part, of converting a JsUint8Array (or any other sub-type) to a Rust Vec can be trickier. The issue here is that the Vec is referenced from multiple places (that's why we need the garbage collector). But this also means that you cannot just "move" out of it.

You can use Option::take() to take the vector, but this would then make the array unusable in JavaScript. You can also clone it out, but honestly, I would leave that to the user.

I think the best approach here is to have a 3-method interface: take(), borrow() and borrow_mut(). The first would take the vector, moving it, without any performance penalty, but would actually change the the JS structure (and therefore break it). It's still useful in case the context won't run anything else, which is a valid use case. The borrow() and borrow_mut() could just return a reference to the vector, making it possible to clone() it if needed, or they can actually modify it from Rust, if they select to do that, and make the changes available in JS. This would call to the Gc APIs, and could panic if it's borrowed elsewhere.

@nekevss
Copy link
Member

nekevss commented Jun 21, 2022

@Razican Can you really implement a borrow() and borrow_mut() for array_buffer_data? I had been playing around at how an API wrapper might work for ArrayBuffer for fun when I saw the above comment. I thought I'd try to add those methods but struggled to get it to work. Ultimately, I ended up with the below.

https://github.com/nekevss/boa/blob/feature/impl-dataview-arraybuffer-wrappers/boa_engine/src/object/jsarraybuffer.rs

@lastmjs
Copy link
Contributor Author

lastmjs commented Jun 21, 2022

I've attempted to follow your guidance here but run into a major problem early on. Here's what I'm trying to add to https://github.com/boa-dev/boa/blob/main/boa_engine/src/object/jstypedarray.rs:

impl From<Vec<$element>> for $name {
    #[inline]
    fn from(o: Vec<$element>) -> Self {
        let o_len = o.len();

        let array_buffer = crate::builtins::array_buffer::ArrayBuffer {
            array_buffer_data: Some(o),
            array_buffer_byte_length: o_len,
            array_buffer_detach_key: JsValue::undefined()
        };            

        let uint8_array_js_object = crate::object::JsObject::from_proto_and_data(
            Context::default().intrinsics().constructors().$constructor_object().prototype(),
            crate::object::ObjectData {
                kind: crate::object::ObjectKind::ArrayBuffer(array_buffer),
                internal_methods: &crate::object::internal_methods::ORDINARY_INTERNAL_METHODS
            }
        );

        // TODO not finished here
    }
}

The problem is that array_buffer_data must be a Vec<u8>, and I'm not sure how to proceed with converting all of the different types of Vec<numberType> to a Vec<u8>, and based on our conversations in the Discord this might not be feasible.

Here's some of the conversation from Discord:

Person 1:

Do you have suggestions for how the best way to convert a Vec<u16>, Vec<u32> etc into an ArrayBuffer?

Person 2:

Hm, that's a bit more complicated
Because JS only knows about u8 buffer arrays, the other typed arrays are just bit reinterpretations of the same buffer
Could be possible if you can transmute from Vec<u8> to Vec<numType> safely
Hm, doesn't seem like a transmute is possible
The alternative would be to change the definition of ArrayBuffer to take an Option<NumVec>, where NumVec is an enum containing every possible Vec<numType>
But that's a lot more work which involves rewriting a lot of the internals of ArrayBuffer

@HalidOdat HalidOdat linked a pull request Jul 6, 2022 that will close this issue
bors bot pushed a commit that referenced this issue Jul 6, 2022
This PR implements an optimization done by V8 and spidermonkey. Which stores indexed properties in two class storage methods dense and sparse. The dense method stores the property in a contiguous array ( `Vec<T>` ) where the index is the property key. This storage method is the default. While on the other hand we have sparse storage method which stores them in a hash map with key `u32` (like we currently do). This storage method is a backup and is slower to access because we have to do a hash map lookup, instead of an array access.

In the dense array we can store only data descriptors that have a value field and are `writable`, `configurable` and `enumerable` (which by default array elements are). Since all the fields of the property descriptor are the same except value field, we can omit them an just store the `JsValue`s in `Vec<JsValue>` this decreases the memory consumption and because it is smaller we are less likely to have cache misses.

There are cases where we have to convert from dense to sparse (the slow case):
- If we insert index elements in a non-incremental way, like `array[1000] = 1000` (inserting an element to an index that is already occupied only replaces it does not make it sparse)
- If we delete from the middle of the array (making a hole), like `delete array[10]`  (only converts if there is actualy an element there, so calling delete on a non-existent index property will do nothing)

Once it becomes sparse is *stays* sparse there is no way to convert it again. (the computation needed to check whether it can be converted outweighs the benefits of this optimization)

I did some local benchmarks and on array creation/pop and push there is ~45% speed up and the impact _should_ be bigger the more elements we have. For example the code below on `main` takes `~21.5s` while with this optimization is `~3.5s` (both on release build).

```js
let array = [];
for (let i = 0; i < 50000; ++i) {
   array[i] = i;
}
```

In addition I also made `Array::create_array_from_list` do a direct setting of the properties (small deviation from spec but it should have the same behaviour), with this #2058 should be fixed, conversion from `Vec` to `JsArray`, not `JsTypedArray` for that I will work on next :)
bors bot pushed a commit that referenced this issue Jul 7, 2022
This PR adds the `ArrayBuffer` rust wrapper. It also provides a capability to construct a `JsArrayBuffer` from a user defined blob of data ( `Vec<u8>` ) and it is not cloned, it is directly used as the internal buffer.

This allows us to replace the inifficent `Vec<u8>` to `JsArray` then to `TypedArray` (in typed arrays `from_iter`), with a `JsArrayBuffer` created from user data to `TypedArray`. With this `Vec<u8>` to `JsTypedArray` should be fully fixed as discussed in #2058.
@Razican
Copy link
Member

Razican commented Oct 10, 2022

I think this was solved in #2170, right? Can you confirm @lastmjs ?

@lastmjs
Copy link
Contributor Author

lastmjs commented Oct 11, 2022

I haven't tested it yet, our solution is still a bit hacky. But this looks like it would probably solve the problem. I plan to incorporate this eventually.

@Razican
Copy link
Member

Razican commented Oct 18, 2022

I will close this, since the solution seems to be there. If anyone finds that this is still a problem, feel free to comment here, and we can re-open it.

@Razican Razican closed this as completed Oct 18, 2022
bors bot pushed a commit that referenced this issue Nov 24, 2022
<!---
Thank you for contributing to Boa! Please fill out the template below, and remove or add any
information as you feel necessary.
--->

This Pull Request is related to the #2058 and the discussion in the discord chat.

It changes the following:

- Adds a `take` method to `JsArrayBuffer`
- Builds out `JsArrayBuffer` docs
- Adds a `JsArrayBuffer::take()` example to `jsarraybuffer.rs` in `boa_examples`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants