optimize generated tables ; clean up unicode.py #73
There was an easy opportunity to better optimize the tables generated by unicode.py. Not sure why I didn't catch this long ago, but in any case the tables are now substantially smaller, which may improve performance slightly. There was also some dead code sitting in unicode.py that I pulled out.
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @huonw (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see CONTRIBUTING.md for more information.
Could you give a high-level summary of how the tables changed? (GitHub isn't showing the diff and I'm finding it hard to visualise from just the Python changes.)
My suspicion is that this script was once used to generate other Unicode data that wasn't used by @kwantam I'm still interested in a high level summary though. I think I wrote the original version of this over a year ago for
Quick background: the property tables that Now, in some cases, the tables that were previously generated contained ordered pairs that should have been merged together. For example,
Note that, for example,
Net, In case you're wondering, the reason this was happening is because of the way that the Unicode-provided tables were formatted. The function that imports these tables,
@BurntSushi I think the reason Anyhow, as you say, a lot of the stuff that this patch removes from
Nice! Thanks @kwantam!
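For readers following along without the diff, the merging described above can be sketched roughly as follows. This is an illustration only, not the code from unicode.py: it assumes each table entry is an inclusive `(lo, hi)` pair of code points, and the function name `merge_adjacent_ranges` and the sample values are hypothetical.

```python
def merge_adjacent_ranges(ranges):
    """Merge inclusive (lo, hi) pairs that overlap or sit directly next to each other.

    Assumption: each pair is an inclusive code point range, so (0x41, 0x5A)
    followed by (0x5B, 0x60) can be stored as the single pair (0x41, 0x60).
    """
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            # This pair overlaps or touches the previous one: extend it.
            prev_lo, prev_hi = merged[-1]
            merged[-1] = (prev_lo, max(prev_hi, hi))
        else:
            # There is a gap before this pair: start a new entry.
            merged.append((lo, hi))
    return merged


# Hypothetical input: the first two pairs are contiguous, the third is not.
table = [(0x41, 0x5A), (0x5B, 0x60), (0x70, 0x7A)]
print(merge_adjacent_ranges(table))
```

Fewer pairs means a smaller generated table and fewer entries to search during a lookup, which is presumably where the slight performance gain would come from.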
Apply optimization described in rust-lang/regex#73 (comment) to rust's copy of `unicode.py`. This shrinks librustc_unicode's tables.rs from 479kB to 456kB, and should improve performance slightly for related operations (e.g., is_alphabetic(), is_xid_start(), etc.). In addition, pull in fix from @dscorbett's commit d25c39f86568a147f9b7080c25711fb1f98f056a in regex, which makes `load_properties()` more tolerant of whitespace in the Unicode tables. (This fix does not result in any changes to tables.rs, but could if the Unicode tables change in the future.)
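As a rough illustration of what whitespace tolerance can look like when reading a Unicode property file, here is a minimal sketch. It is not the actual `load_properties()` code or the referenced fix. It assumes the usual Unicode data file layout of `code point or range ; property name # comment`, and the names `LINE_RE` and `parse_property_line` are hypothetical.

```python
import re

# Optional whitespace is allowed before the code points and on both sides
# of the ';' separator. Anything after the property name is ignored.
LINE_RE = re.compile(r"^\s*([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(\w+)")


def parse_property_line(line):
    """Return (property, (lo, hi)) for a data line, or None for other lines."""
    m = LINE_RE.match(line)
    if m is None:
        # Blank lines and '#' comment lines do not match the pattern.
        return None
    lo = int(m.group(1), 16)
    # A single code point is treated as a range of length one.
    hi = int(m.group(2), 16) if m.group(2) else lo
    return m.group(3), (lo, hi)


print(parse_property_line("0041..005A    ; Alphabetic # L&"))
print(parse_property_line("00AA;Alphabetic"))
```

Both calls yield the same shape of result even though the spacing around `;` differs, which is the kind of robustness the commit message describes.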