Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow (large) Uint8Array comparison #1915

Open
appy-one opened this issue Jun 4, 2021 · 7 comments
Open

Slow (large) Uint8Array comparison #1915

appy-one opened this issue Jun 4, 2021 · 7 comments
Labels

Comments

@appy-one
Copy link

appy-one commented Jun 4, 2021

While creating unit tests for AceBase, I had to compare whether stored binary data equals the generated data, which is 5MB large. The expect comparison method is very slow on this value, takes about 5 seconds:

// Generate data
let data = new Uint8Array(5 * 1000 * 1000);
for(let i = 0 ; i < data.length; i++) { data[i] = i % 255; }
// .. Store in the database & retrieve it again into variable 'stored'
expect(stored).toEqual(data); // <-- SLOW

If I compare the data myself with a loop, this is way faster (takes few ms):

let isEqual = true;
for(let i = 0; i < data.length && isEqual; i++) {
   isEqual = data[i] === stored[i]; 
}
expect(isEqual).toBeTrue();

EDIT: I'm using jasmine v3.7.0, jasmine-core v3.7.1

@sgravrock
Copy link
Member

This is more interesting (read: harder) than I thought it was going to be. I expected that the major sources of overhead would be the recursive calls to MatchersUtil#eq_ and type checks for each array element. But it turns out that those sources barely show up in the profiler at all. Here's where the time is actually going:

  • The bookkeeping needed to build object diffs costs about half a second.
  • Calling Object.keys on each array costs about 3/4 second.
  • Typed arrays go through the general object comparison path, rather than the array-specific path which is about twice as fast. (I'm not sure whether running typed arrays through the array-specific code path would be correct.)
  • There's a fair bit of "death by a thousand cuts" going on that doesn't add up to any obvious optimization opportunities.

It's possible that we could send typed arrays down the array code path without causing any problems. That would save about half the time, and if someone wants to do the work of figuring out whether that's correct, I'd be happy to review a PR. But that still leaves us in multiple seconds territory, which I suspect isn't going to make you much happier than you are now. I don't see any obvious way to improve things beyond that without taking away features that already exist (custom equality tester support, checking for extra properties) or ruling out obvious future enhancements (showing diffs for typed arrays like we do for plain arrays).

If you know that you don't add any extra properties to your typed arrays and you don't have any relevant custom equality testers, you can implement the fast version in a custom equality tester:

  jasmine.addCustomEqualityTester(function(a, b) {
    if (a.constructor.name !== 'Uint8Array' || b.constructor.name !== 'Uint8Array') {
      return undefined;
    }

    if (a.length !== b.length) {
      return false;
    }

    for (let i = 0; i < a.length; i++) {
      if (a[i] !== b[i]) {
        return false;
      }
    }

    return true;
  });

@sgravrock sgravrock transferred this issue from jasmine/jasmine-npm Jun 5, 2021
@appy-one
Copy link
Author

appy-one commented Jun 7, 2021

For typed arrays there is no need to check each entry for type equality since there is no way you can store something else than a number in a typed array. Your custom testing code above would be the perfect check for any typed array, maybe you should consider implementing their own route instead of going through the Object or Array routes? I understand the issue with devs maybe defining custom properties on their typed arrays, but they should be the ones requiring custom testing methods, not those using typed arrays the way they should be used 😉. I'll add your workaround code for now, thx!

@sgravrock
Copy link
Member

I'm glad the workaround worked for you.

I understand the issue with devs maybe defining custom properties on their typed arrays, but they should be the ones requiring custom testing methods, not those using typed arrays the way they should be used 😉.

People sometimes say that we should break someone else's code to deliver an improvement they want. I've never yet heard anyone say that we should break their code to deliver an improvement that someone else wants.

Comparing typed arrays that have extra properties on them is an obscure but legitimate use case, much like comparing really huge typed arrays or wanting a spec to run for more than 28 days. If we broke that, anyone who depended on that capability would be rightly upset. All the more so because the change would turn a failing spec into an incorrectly passing spec, which is one of the worst kinds of breaking change a testing framework can make.

We could switching to a faster comparison algorithm for typed arrays in the next major release. But we'd need to weigh the amount of user interest in faster comparisons against the costs: It would cut typed arrays off from the various customizations that the matcher system supports, and we'd have to forever treat them as special cases for diff building.

@aleen42
Copy link

aleen42 commented Jan 13, 2022

Recently, I have found that if I used PhantomJS headless integrated with jasmine to test the following case, it should block
and keep executing until timeout:

it('test Uint8Array', () => {
    expect(new Uint8Array([])).toEqual(new Uint8Array([]));
});

I think it is due to the JS engine of PhantomJS, which has scaled up the performance problem.

The possible workaround is:

beforeEach(() => {
    jasmine.addCustomEqualityTester((a, b) => {
        if ([
            Int8Array, Uint8Array, Uint8ClampedArray,
            Int16Array, Uint16Array,
            Int32Array, Uint32Array, Float32Array,
            Float64Array,
        ].some(i => a instanceof i || b instanceof i)) {
            return Array.from(a).every((v, i) => v === b[i]);
        }
    });
});

But I hope that Jasmine can use fast-deep-equal to enhance performance when comparing two objects recursively.

@sgravrock
Copy link
Member

PhantomJS is not supported. Its last release was in 2016 and development ceased in 2018. We haven't been able to test Jasmine against it since version 3.4 (almost three years ago) due to Selenium dropping support for it, and we removed the last PhantomJS support code in Jasmine 4.0. I recommend switching to a maintained headless browser such as headless Chrome.

I don't think it makes sense to replace Jasmine's equality testing with a package like fast-deep-equal. Even if it was perfectly compatible with Jasmine's existing equality logic, we'd have to remove features that have hooks into equality like object diffs, custom equality testers, asymmetric equality testers, and custom object formatters.

@sgravrock
Copy link
Member

See #1966. The popular state management framework mobx adds non-numeric properties to arrays. This suggests that it's actually fairly common for arrays to have extra properties. Not only that, but users might often not know that their arrays have extra properties. I think this rules out the idea of switching to a faster but less accurate comparison algorithm. I don't think it would even be safe to offer one as an opt-in global configuration setting.

I could see adding something like an asymmetric equality tester that compares only properties in the 0 to length-1 range. That would be opt-in for each array being compared, which limits the potential harm caused by people using it when they don't know their arrays have extra properties.

@sgravrock sgravrock added the bug label Jun 30, 2022
@mm-jpoole
Copy link

If you are looking to compare binary data, you could just hash it then compare the hashes rather than checking every byte for equality. That should be quite a lot faster.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants