Faster whitespace detection #26

non · 2015-02-11T15:03:32Z

At nescala Matthias suggested there was a faster way to check for possible whitespace characters using bitwise logic. We should look into that.

softprops · 2015-02-11T18:36:54Z

fancy

propensive · 2015-02-11T19:36:00Z

Did he explain it to you? He talked me through it - it's simple, cunning and opportunistic. :)

non · 2015-02-11T20:19:38Z

I don't remember exactly how it worked. I will try to reverse engineer it (or ask him) later.

propensive · 2015-02-11T21:25:49Z

The basic idea, IIRC, was that because all the whitespace characters are low in the ASCII table, you can shift 1 left by the value of each byte, AND the result with a bitmask containing a few strategically-placed 1s, and if the answer is nonzero, then it's whitespace. I forget how much of a saving this was, but Matthias had worked out exactly what the difference was at the machine level, and it saved a tiny amount in a particularly hot hotspot...

non · 2015-02-11T21:41:56Z

@propensive If you can link to some code that does this, I'd be appreciative.

My instinct was first to just try an initial c <= 32 test to find any potential whitespace candidate (since any char that is 32 or less is either whitespace or illegal).
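For illustration, the range-first idea could be sketched roughly as below. The object and method names are hypothetical and not taken from the project; the sketch only assumes the four JSON whitespace characters (space, tab, line feed, carriage return).

```scala
object WhitespacePrefilter {
  // Exact check for the four JSON whitespace characters.
  private def isJsonWhitespace(c: Char): Boolean =
    c == ' ' || c == '\n' || c == '\r' || c == '\t'

  // Cheap range test: anything above ' ' (32) can never be whitespace.
  def isWhitespaceCandidate(c: Char): Boolean = c <= ' '

  // Most characters in a document fail the range test,
  // so the exact check runs only for rare candidates.
  def isWhitespace(c: Char): Boolean =
    isWhitespaceCandidate(c) && isJsonWhitespace(c)
}
```

A candidate that passes the range test but fails the exact check (for example `'\u0001'`) is a control character, which a parser would presumably report as an error rather than skip.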

propensive · 2015-02-11T22:22:02Z

I don't remember exactly -- I only saw the code on his computer, but my recollection is that it was something like this:

def isWhitespace(b: Byte) = ((1L << b) & 4294977024L) != 0

where I calculated that magic number from List('\n', '\r', ' ', '\t').map(1L << _).sum.
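One caveat with the one-liner as recalled: on the JVM a `Long` shift uses only the low 6 bits of the shift count, so bytes of 64 or more alias onto lower bits (`'I'` is 73 = 64 + 9 and would look like a tab). A minimal sketch with a range guard added, under the assumption that arbitrary bytes may reach the check; the object name and the guard are illustrative, not Matthias's original code:

```scala
object BitmaskWhitespace {
  // One bit per whitespace code point: '\t' (9), '\n' (10), '\r' (13), ' ' (32).
  val Mask: Long = List('\t', '\n', '\r', ' ').map(c => 1L << c.toInt).sum

  // The guard rejects negative bytes and bytes >= 64, because the JVM
  // takes a Long shift count modulo 64 and would otherwise alias them.
  def isWhitespace(b: Byte): Boolean =
    (b & ~63) == 0 && ((1L << b.toInt) & Mask) != 0
}
```

If the caller has already established that the byte is at most 32 (for instance with a `c <= 32` test), the guard is redundant and the bare shift-and-mask is enough. Whether either form beats plain comparisons would need to be measured.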

Though do you need it to fail fast? Could you do the shift (as above), and OR the result from each byte into an accumulator which you check once at the end of parsing, something like this? Just throwing ideas out there... I've no idea if this actually works, or profiles well...

i10416 mentioned this issue Sep 25, 2023

Question: does whitespace detection by bitwise operation make jsoniter-scala faster? plokhotnyuk/jsoniter-scala#1086

Open

