-
-
Notifications
You must be signed in to change notification settings - Fork 414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String interning #279
Comments
More specifically rustc interner. Rustc Symbol doc:
Which makes comparisons between symbols trivial, it is just a comparison between two u32. The cost is when interning a string, since it requires to do a HashMap lookup. I found a implementation of this string-interner |
Yep, this is interesting. I had a conversation the with Spidermonkey devs about how we use strings for internal slots, but later want to move to something more faster. I think they do something like this. I don't know very much about https://crates.io/crates/string-interner so this will be new to me, but im happy to let this issue be the conversation point. |
Nice! This is pretty mush how |
I think that the This could probably allow avoiding the use of |
I will give this a look, and maybe #336 too, at the same time. |
Hi @Razican Correct me if I'm wrong!!! String interning only has a good effect if the string that's it is interning is supposed to be immutable, like identifiers. var x = "";
for(var i = 0; i < 100; i++) {
x += "1"
} Without string interning in |
Yes, I was just trying a couple of things, but it does seem that way! |
This might partially conflict with #736. We might need to create our own interner that allows using UTF-16 strings. |
This Pull Request is part of #279. It adds a string interner to Boa, which allows many types to not contain heap-allocated strings, and just contain a `NonZeroUsize` instead. This can move types to the stack (hopefully I'll be able to move `Token`, for example, maybe some `Node` types too. Note that the internet is for now only available in the lexer. Next steps (in this PR or future ones) would include also using interning in the parser, and finally in execution. The idea is that strings should be represented with a `Sym` until they are displayed. Talking about display. I have changed the `ParseError` type in order to not contain anything that could contain a `Sym` (basically tokens), which might be a bit faster, but what is important is that we don't depend on the interner when displaying errors. The issue I have now is in order to display tokens. This requires the interner if we want to know identifiers, for example. The issue here is that Rust doesn't allow using a `fmt::Formatter` (only in nightly), which is making my head hurt. Maybe someone of you can find a better way of doing this. Then, about `cursor.expect()`, this is the only place where we don't have the expected token type as a static string, so it's failing to compile. We have the option of changing the type definition of `ParseError` to contain an owned string, but maybe we can avoid this by having a `&'static str` come from a `TokenKind` with the default values, such as "identifier" for an identifier. I wanted for you to think about it and maybe we can just add that and avoid allocations there. Oh, and this depends on the VM-only branch, so that has to be merged before :) Another thing to check: should the interner be in its own module?
This Pull Request is part of #279. It adds a string interner to Boa, which allows many types to not contain heap-allocated strings, and just contain a `NonZeroUsize` instead. This can move types to the stack (hopefully I'll be able to move `Token`, for example, maybe some `Node` types too. Note that the internet is for now only available in the lexer. Next steps (in this PR or future ones) would include also using interning in the parser, and finally in execution. The idea is that strings should be represented with a `Sym` until they are displayed. Talking about display. I have changed the `ParseError` type in order to not contain anything that could contain a `Sym` (basically tokens), which might be a bit faster, but what is important is that we don't depend on the interner when displaying errors. The issue I have now is in order to display tokens. This requires the interner if we want to know identifiers, for example. The issue here is that Rust doesn't allow using a `fmt::Formatter` (only in nightly), which is making my head hurt. Maybe someone of you can find a better way of doing this. Then, about `cursor.expect()`, this is the only place where we don't have the expected token type as a static string, so it's failing to compile. We have the option of changing the type definition of `ParseError` to contain an owned string, but maybe we can avoid this by having a `&'static str` come from a `TokenKind` with the default values, such as "identifier" for an identifier. I wanted for you to think about it and maybe we can just add that and avoid allocations there. Oh, and this depends on the VM-only branch, so that has to be merged before :) Another thing to check: should the interner be in its own module?
I think we can close this issue, as we are now using an interner in most of the engine. |
Will there be a string interner for more affective memory use and string comparisons.
Like rustc does it.
The text was updated successfully, but these errors were encountered: