feat(NODE-5861): optimize parsing basic latin strings #642

Merged 9 commits on Jan 31, 2024

@nbbeeken nbbeeken commented Jan 29, 2024

Description

What is changing?

  • BSON's UTF-8 parsing logic now attempts to decode with String.fromCharCode when every byte is below 128.
  • UTF-8 validation moves to after that attempt: if every byte is below 128, there is nothing to validate.
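
A minimal sketch of the ordering described in the two bullets above, not the code from this PR. The names `tryBasicLatin` and `decodeUtf8` are hypothetical, and `TextDecoder` stands in for whatever fallback decoder and validator the library actually uses.

```typescript
// Hypothetical helper: returns the decoded string when every byte in
// [start, end) is basic latin (< 128), otherwise null.
function tryBasicLatin(bytes: Uint8Array, start: number, end: number): string | null {
  const codes: number[] = [];
  for (let i = start; i < end; i++) {
    const byte = bytes[i];
    if (byte > 127) return null; // part of a multi-byte sequence: caller must fall back
    codes.push(byte);
  }
  return String.fromCharCode(...codes);
}

// Assumed fallback decoders. `fatal: true` throws a TypeError on invalid UTF-8.
const strictDecoder = new TextDecoder('utf-8', { fatal: true });
const lenientDecoder = new TextDecoder('utf-8', { fatal: false });

function decodeUtf8(bytes: Uint8Array, start: number, end: number, validate: boolean): string {
  // Fast path first: pure basic latin input cannot be invalid UTF-8,
  // so validation is skipped entirely when this succeeds.
  const fast = tryBasicLatin(bytes, start, end);
  if (fast !== null) return fast;

  // Only input with at least one byte >= 128 reaches the validating decoder.
  const slice = bytes.subarray(start, end);
  return (validate ? strictDecoder : lenientDecoder).decode(slice);
}
```

With this shape, a key such as `_id` never touches the validator, while a string containing `é` or an invalid byte takes the slower, checked path.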

Is there new documentation needed for these changes?

  • No

What is the motivation for this change?

This has been shown to perform much better for strings that fit in the basic latin range, which is quite often the case for BSON keys.

Release Highlight

BSON short basic latin string parsing performance improved!

The BSON library's string decoding logic now attempts to optimize for basic latin (ASCII) characters. This applies to both BSON keys and BSON values that are or contain strings. For strings shorter than 6 bytes we observed a speed increase of approximately 100%, while at around 15 bytes performance was about 30% better. For strings that contain any non-basic latin bytes, or that are 20 bytes or longer, the BSON library continues to use Node.js' Buffer.toString API.
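
A rough way to see the length cutoff in action, sketched under assumptions: `decodeKey`, `FAST_PATH_MAX_BYTES`, and `runBenchmark` are hypothetical names, the 20-byte cutoff is taken from the paragraph above, and the loop is a naive timing harness rather than the benchmark used for this PR. It requires Node.js for `Buffer`, and the printed timings will vary by machine and runtime.

```typescript
// Cutoff from the release highlight: 20 bytes or more goes to Buffer.toString.
const FAST_PATH_MAX_BYTES = 20;

// Hypothetical decoder: fast path for short basic latin input,
// Buffer.toString for everything else.
function decodeKey(buf: Buffer, start: number, end: number): string {
  if (end - start < FAST_PATH_MAX_BYTES) {
    let out = '';
    let i = start;
    for (; i < end; i++) {
      const byte = buf[i];
      if (byte > 127) break; // non-basic latin byte: abandon the fast path
      out += String.fromCharCode(byte);
    }
    if (i === end) return out; // every byte was below 128
  }
  return buf.toString('utf8', start, end);
}

// Naive timing loop; prints elapsed milliseconds for `iterations` calls.
function timeIt(label: string, fn: () => string, iterations: number): void {
  const t0 = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  console.log(label, (performance.now() - t0).toFixed(1), 'ms');
}

function runBenchmark(iterations = 200_000): void {
  // Inputs of 3, 15, and 25 bytes: below, near, and above the cutoff.
  for (const text of ['_id', 'fifteen_bytes__', 'a'.repeat(25)]) {
    const buf = Buffer.from(text, 'utf8');
    timeIt(`fast path  len=${buf.length}`, () => decodeKey(buf, 0, buf.length), iterations);
    timeIt(`toString   len=${buf.length}`, () => buf.toString('utf8', 0, buf.length), iterations);
  }
}

runBenchmark();
```

For the 25-byte input both timings exercise `Buffer.toString`, so any difference there reflects only the extra length check.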

The intent is to target the deserialization of BSON keys, which are often short and use only basic latin. Et tu, _id?

Double check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

@nbbeeken nbbeeken force-pushed the NODE-5861-string-perf branch from e6a0721 to d956aea Compare January 29, 2024 21:16
@nbbeeken nbbeeken marked this pull request as ready for review January 30, 2024 22:03
addaleax
addaleax previously approved these changes Jan 31, 2024

@addaleax addaleax left a comment

🚀

src/utils/latin.ts (outdated review thread, resolved)
@durran durran self-assigned this Jan 31, 2024
@durran durran added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Jan 31, 2024
@nbbeeken nbbeeken requested a review from durran January 31, 2024 19:33
durran
durran previously approved these changes Jan 31, 2024
@durran durran added Team Review Needs review from team and removed Primary Review In Review with primary reviewer, not yet ready for team's eyes labels Jan 31, 2024

@baileympearson baileympearson left a comment

just one small suggestion, otherwise looks good

src/utils/latin.ts (outdated review thread, resolved)
@durran durran merged commit cdb779b into main Jan 31, 2024
4 checks passed
@durran durran deleted the NODE-5861-string-perf branch January 31, 2024 21:07