Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: simplify flatten function using Array.prototype.flat #3354

Open
wants to merge 11 commits into
base: develop
Choose a base branch
from

Conversation

dvd101x
Copy link
Collaborator

@dvd101x dvd101x commented Jan 22, 2025

Just a suggestion to use the native method of Arrays.

@josdejong
Copy link
Owner

That is nice. I would like to do that, though looking at support of Array.prototype.flat we may be a bit early: https://caniuse.com/mdn-javascript_builtins_array_flat. So, since this doesn't really "hurt", I think we can wait a bit until there is more broad support for it.

@dvd101x
Copy link
Collaborator Author

dvd101x commented Jan 24, 2025

Hi Jos, thanks for reviewing this.

Ok, maybe later, also noticed this is a bit faster.

@josdejong
Copy link
Owner

Great that it is faster, that makes sense!

What we can do for now is: use Array.prototype.flat when it is available, and otherwise fallback on the custom implementation?

@dvd101x
Copy link
Collaborator Author

dvd101x commented Jan 24, 2025

It makes sense.

In the latest commit I included the fallback and also changed to use for loops thus closing the gap in performance.

Tested for performance

method improvement
current (reference) 0%
flat (Infinity) 29%
flatten with for ( fallback ) 16 %

I've been reviewing the algorithms in utils/array.js and I think there is an area of opportunity for arrays that are in matrix form like [[1, 2], [3, 4]] unlike [[1, 2], [3]] or [[1, 2], 3].

I'm testing a flatten function that skips some validations if the array is in matrix form that shows like 2X improvement. Will give it a try next and make a new commit in a few days.

@dvd101x
Copy link
Collaborator Author

dvd101x commented Jan 25, 2025

By changing the arr.map to a for it did have a benefit but only in chromium. Maybe I should leave it as map for readability?

@dvd101x
Copy link
Collaborator Author

dvd101x commented Jan 26, 2025

Now there is a faster flattening algorithm for Matrices and Arrays with an homogenous size.

@dvd101x
Copy link
Collaborator Author

dvd101x commented Jan 27, 2025

The algorithm for arrays with homogeneous sizes works faster, but verifying its applicability is costly. DenseMatrix instances are already validated. However, for arrays, validation is still required. This presents a tradeoff: it's advantageous for large arrays but not for small ones.

Considerations:

  • Should the size of the array be experimentally tested to determine the optimal size and even test the cost of checking the size?
  • Perhaps expose the isHomogeneous flag to the user and pass it on?
  • Assume that arrays might not have a homogeneous size and use the faster algorithm only for DenseMatrix.

@dvd101x
Copy link
Collaborator Author

dvd101x commented Jan 28, 2025

The latest commit return the algorithms in a way that they should always be faster regardless of the case, matrix, array with homogeneous size and array with inhomogeneous size. There is still an area of opportunity to check if the array has homogenous size that might be faster for large arrays.

@josdejong
Copy link
Owner

Ah, I see what you did: two different implementations. So how much performance difference does it yield in the end? I think we should be careful overoptimizing this, since a different browser engine or the next update of Chrome may do things differently again. So, the additional code and complexity should be really worth it.

Is there a benchmark test already so I can do some comparisons on my machine?

@dvd101x
Copy link
Collaborator Author

dvd101x commented Jan 30, 2025

Hi Jos, thank you for reviewing this.

Here are the results I got and I'm including the code at the end:

test time dev error dev time this error this improvement
flatten(Matrix) 500.75 µs ±0.73% 469.69 µs ±0.59% 6.3%
flatten(numberMatrix) 505.30 µs ±0.96% 480.30 µs ±0.88% 4.9%
flatten(arrayWithHomogeneousSize) 149.00 µs ±0.25% 125.90 µs ±0.39% 15.5%
flatten(arrayWithNonHomogeneousSize) 25.91 µs ±0.73% 24.21 µs ±0.30% 6.6%

flatten

Those are very valid concerns

  • Regarding the two different implementations: The optimization assumes that if the first element of the array is an array, then all elements are arrays, eliminating the need for individual checks.
  • Regarding additional complexity: Changing from map to a for loop might affect readability with no performance benefit in Safari but some in Chrome. I'm not entirely certain about this, I saw in the codebase some interest in changing to for loops for performance and that's why I went that way but it certainly could be removed.

I don't have a strong opinion about this, I could either change to map and test again, include flatten as a method for denseMatrix to avoid the use of .valueOf and of course I would wait on your take on this after testing.

Appendix

import { Bench } from 'tinybench'
import { DenseMatrix, flatten, random } from '../../lib/esm/index.js'
import { formatTaskResult } from './utils/formatTaskResult.js'

const genericMatrix = new DenseMatrix(random([10, 10, 10, 10]))
const numberMatrix = new DenseMatrix(genericMatrix, 'number')
const arrayWithHomogeneousSize = genericMatrix.valueOf()
const arrayWithNonHomogeneousSize = [random([10, 10, 10]), random([10, 9, 9]), random()]

const bench = new Bench({ time: 500, iterations: 100 })
  .add('flatten(Matrix)', () => {
    flatten(genericMatrix)
  })
  .add('flatten(numberMatrix)', () => {
    flatten(numberMatrix)
  })
  .add('flatten(arrayWithHomogeneousSize)', () => {
    flatten(arrayWithHomogeneousSize)
  })
  .add('flatten(arrayWithNonHomogeneousSize)', () => {
    flatten(arrayWithNonHomogeneousSize)
  })
  .add('flatten(array).map(abs)', () => {
    numberMatrix.map(abs)
  })

bench.addEventListener('cycle', (event) => console.log(formatTaskResult(bench, event.task)))
await bench.run()

@josdejong
Copy link
Owner

Thanks for testing and for sharing your benchmark.

I've did run the benchmark too, but in my case the results are worse than develop 🤔 (running Node.js 20)

# develop

flatten(Matrix)                        367.89 µs   ±1.04%
flatten(numberMatrix)                  329.75 µs   ±1.14%
flatten(arrayWithHomogeneousSize)      114.19 µs   ±0.50%
flatten(arrayWithNonHomogeneousSize)    20.76 µs   ±0.33%

# PR #3354

flatten(Matrix)                        386.34 µs   ±1.11%
flatten(numberMatrix)                  391.91 µs   ±1.59%
flatten(arrayWithHomogeneousSize)      746.43 µs   ±0.43%
flatten(arrayWithNonHomogeneousSize)   116.00 µs   ±0.34%

I don't understand the big differences in flattening an array between your machine and mine, maybe I'm doing something wrong.

Anyway, even in the most positive case (your results) I have the feeling that the differences are so small that it is not worth the additional complexity of two different implementations. What do you think?

@dvd101x
Copy link
Collaborator Author

dvd101x commented Jan 30, 2025

That's very interesting. I did test using node v23.6.1 using a windows machine with 32 GB RAM and a Core i9, I have another machine available, will try to run the same benchmark on that one.

To develop these algorithms I was mostly using firefox and jsbench. I had high hopes for the skip made for the case of the homogenous size as it made a significant improvement in my tests.

I'm including a link on jsbench.
https://jsbm.dev/uwFEkhJLyceFK

@josdejong
Copy link
Owner

👍 please don't spend too much time on this, we shouldn't overengineer or micro opimize this 😅

Copy link
Collaborator Author

@dvd101x dvd101x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jos

I did spend a little time testing on another computer and even the method of arr.flatten(Infinity) wasn't a consistent improvement in speed in node v22.11.0. I think what I was seeing as impropvement was more Firefox and Safari related.

So I reverted back to the original form, the only notable improvement in speed I found was on the call to flatten on a matrix, it sped up by using valueOf() instead of toArray(). Maybe it is a matter of safety, but since the flatten method makes no assignments I thought it may be ok.

If it indees should be toArray() then this PR could be only for some spelling in comments or closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants