Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize generating UUIDs from strings #7030

Merged
merged 2 commits into from
Nov 13, 2018

Conversation

jgaskins
Copy link
Contributor

@jgaskins jgaskins commented Nov 6, 2018

I'm building a driver for Neo4j and wanted to check on the overhead of generating UUIDs from strings since it's common to store UUIDs as strings in Neo4j. While it was still really good (1M UUIDs/sec), I noticed that UUID incurred a bunch of heap allocations for hardcoded arrays and intermediate strings, so I wanted to see if it could be improved.

In total, generating a UUID from a 36-char string involved creating 2 intermediate arrays and 32 intermediate strings (each hex pair in the hardcoded array has one within the loop and another in string_has_hex_pair_at!). Moving these to use stack-based values instead of heap, I was able to get a 4x performance boost.

➜  Code crystal build --release benchmark_optimized_uuid.cr
➜  Code ./benchmark_optimized_uuid
59c87908-4c45-4a94-b45f-2d338c2b2e17
59c87908-4c45-4a94-b45f-2d338c2b2e17
   string   1.03M (970.15ns) (± 0.44%)  688 B/op   4.11× slower
optimized   4.23M (236.14ns) (± 0.59%)    0 B/op        fastest

Benchmark code

@straight-shoota
Copy link
Member

straight-shoota commented Nov 6, 2018

There is even more room for improvement.

The conversion of hex pairs is duplicatesdin the main method and string_has_hex_pair_at!. The helper method should directly return the byte value if it is successfully read.

@asterite
Copy link
Member

asterite commented Nov 6, 2018

Awesome! I don't know how those hardcoded arrays passed a previous review.

By the way, for those who wondered about a use case for to_i and friends from a slice instead of a string, here's one :-)

@asterite
Copy link
Member

asterite commented Nov 6, 2018

(though the current code will probably be faster than to_u8 on a slice because it goes straight to the point, but this is just a guess)

@ysbaddaden
Copy link
Contributor

Or, since strings are immutable, maybe String#[offset, length] could return a sub String with a pointer to the parent String allocation, instead of always allocate/copy? It would be simpler than going through a Slice.

@asterite
Copy link
Member

asterite commented Nov 6, 2018

@ysbaddaden The problem is that a String knows its bytesize and utf-8 size, and that's right at the beginning of a String. Another idea would be to have String be a sort of a union (as in C union) where it can be an inlined string or a string reference to some other string. But then the problem with that is taking a small substring from a huge string and never being able to free the huge string...

I think for performance it's better to go to Slice with to_slice.

This improves performance by another 2x, which makes sense because it's
doing the conversion once instead of twice.
@jgaskins
Copy link
Contributor Author

jgaskins commented Nov 6, 2018

to_i and friends from a slice instead of a string

This is actually kinda where I was trying to go with this, but I figured I'd go with the smaller change. :-)

The conversion of hex pairs is duplicatesdin the main method and string_has_hex_pair_at!. The helper method should directly return the byte value if it is successfully read.

I found so many roadblocks trying to do that the first time (basically, I felt I had to choose between two potential nil values with to_u8? or a different error with to_u8) but it turns out this does indeed double performance again — we're now up to 8x.

➜  Code crystal build --release benchmark_optimized_uuid.cr && ./benchmark_optimized_uuid
df817978-7772-49fb-8b2f-6293a8e212db
df817978-7772-49fb-8b2f-6293a8e212db
   string   1.04M ( 961.5ns) (± 0.37%)  688 B/op   8.19× slower
optimized   8.52M (117.33ns) (± 0.46%)    0 B/op        fastest

@asterite
Copy link
Member

asterite commented Nov 6, 2018

Another optimization could be to work directly with the return value from string.to_slice instead of string. For example doing string[i] will check if the string is ascii-only and then index fast into it, otherwise it will have to go utf-8 char per char to check that. Most probably the string will be ascii-only, but why do the check all the time if we can skip that? (the only problem is that you'll have a byte, but you can turn that into a char with chr, or maybe it's faster not do use to_u8?, but at that point the performance gain won't be much, I think)

@yxhuvud
Copy link
Contributor

yxhuvud commented Nov 6, 2018

Or, since strings are immutable, maybe String#[offset, length] could return a sub String with a pointer to the parent String allocation, instead of always allocate/copy? It would be simpler than going through a Slice.

@ysbaddaden One of the main downsides with this is that it silently can use up a lot of memory, if the substrings have a longer life time than the parent string. Java used to have substrings behave like you suggests here but moved away from it.

Copy link
Member

@sdogruyol sdogruyol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jgaskins 👍

@sdogruyol sdogruyol merged commit 3a2b940 into crystal-lang:master Nov 13, 2018
@sdogruyol sdogruyol added this to the 0.27.1 milestone Nov 13, 2018
@jgaskins jgaskins deleted the optimize-uuid branch November 13, 2018 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants