Serialize the newline_list to avoid recomputing it again later #2381
b88ef1d to b836f03
Should be green and ready for review now.
Something bothers me about putting Source into ParseResult. I feel like we should consider only putting LineOffsets into it. Our end goal is to not have any source by the time we get our serialized source back. We just have not made it to that point yet.
If this is true then Source is nothing more than startLine and the source byte[]. Do we actually need startLine, or could this be folded into the linecalcs as well? I did not read that, but I wonder if we ever need that value after parsing is over. If so then Source can go away as well.
@eregon Also, there is a simple error in build-java failing in CI, but you obviously will fix that.
@enebo I think it's a lot cleaner and less error-prone this way, and it's also exactly how it works in the Ruby API (how
It cannot be stored in the lineOffsets (which are offsets in the byte[]/char* source, so 0-based), it needs to be a separate field.
Regarding once we don't need the source bytes anymore for Loader, I think that's fine, then one would be able to just pass
It is also something this PR helps significantly, since the lineOffsets now come from the serialized form and there is no longer a need to use the
If it's problematic later for some unforeseen reason we can of course always rework this, it's pretty easy to change, b836f03 is a pretty small commit. Are you OK with that?
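A minimal sketch of why the start line has to live next to the line offsets rather than inside them. `LineIndexSketch` and its method names are hypothetical and are not Prism's Source or Loader API; the newline scan stands in for the recomputation that a serialized newline list would avoid.

```ruby
# Hypothetical stand-in for illustration only, not Prism's Source class.
class LineIndexSketch
  attr_reader :start_line, :line_offsets

  # line_offsets are 0-based byte offsets of each line start, so they
  # cannot also encode which line number the first line should have.
  def initialize(start_line, line_offsets)
    @start_line = start_line
    @line_offsets = line_offsets
  end

  # Recompute the offsets by scanning for "\n" (byte 10). This is the
  # work that reading the offsets from a serialized form would skip.
  def self.compute_line_offsets(bytes)
    offsets = [0]
    bytes.each_byte.with_index { |byte, index| offsets << index + 1 if byte == 10 }
    offsets
  end

  # Map a 0-based byte offset to a line number, shifted by start_line.
  def line(byte_offset)
    # Index of the first line that starts after byte_offset, or the
    # number of lines if the offset falls on the last line.
    index = line_offsets.bsearch_index { |offset| offset > byte_offset } || line_offsets.length
    start_line + index - 1
  end
end

code = "a=1\nb=2\nc"
offsets = LineIndexSketch.compute_line_offsets(code)
index = LineIndexSketch.new(10, offsets)
index.line(5) # byte 5 is on the second line, reported relative to start_line
```

The offsets stay valid whatever `start_line` is, which is why the two are kept as separate fields in this sketch.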
b836f03 to 1fac105
@eregon yeah, I forgot we are just getting an index as well, so I did not think through the startLine comment. Will you be keeping the source? I don't see anyone doing this with location fields removed, but FFI consumers who have all the location data probably do want the source (although they do already have it). I am OK with landing this, though. We can argue out how we deal with source bytes once we actually don't need them for anything. I am isolating this code outside the JRuby codebase so these APIs can change as much as they want and it won't require a new JRuby release. Just a new gem release containing a newly made provider.
@eregon I am only approving the Java changes, as I did not look very closely at the rest.
Until TruffleRuby can cache the serialized form on disk (TBD where/how, not for the 24.0 release) it needs to read the source as a byte[] anyway to pass it to parseAndSerialize().
Don't you vendor a specific version/commit of Prism in JRuby when using it as the parser for Ruby code? Or do you also use the Java Loader when Prism is used as a gem? I wouldn't expect that since what's tested in CI on JRuby is the Ruby API using FFI (so not using the Java Loader).
I am isolating the Java part as an artifact, and it is included in a Ruby gem which is effectively a "barely" fork of the prism gem that has been renamed. That gem includes the artifact and will compile the source with the ENVs we want when the gem is installed (later we will probably pre-compile to remove the compile requirement on the main platforms). The rationale for doing this is that if there is a CVE or just a good fix, there is a high likelihood I only have to do a gem release and not a full JRuby release.
I see, makes sense. Could be worth documenting at https://github.com/ruby/prism/blob/main/docs/build_system.md#building-prism-as-part-of-jruby
@kddnewton Could you review & merge this?
Generally fine with the changes, just a couple of things I'd like to see changed.
* That way there is no half-initialized Source object visible to the caller.
* Consistent with Prism::ParseResult in Ruby.
2e6c772 to cace09f
* As the user should not set these.
* Use #instance_variable_set/rb_iv_set() instead internally.
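A rough sketch of the idea in this commit message, on the Ruby side only: expose readers but no public setters, and fill the values in internally with `#instance_variable_set`. The class `ReadOnlySourceSketch` and the helper `fill_in_sketch` are hypothetical names for illustration and are not taken from the PR.

```ruby
# Hypothetical stand-in for illustration only, not Prism's Source class.
class ReadOnlySourceSketch
  # Readers only: callers can inspect these values but get no setter.
  attr_reader :source, :start_line, :offsets

  def initialize(source)
    @source = source
    @start_line = 1
    @offsets = []
  end
end

# Hypothetical internal helper: fills in the values after parsing
# without exposing public setters on the object.
def fill_in_sketch(source, start_line, offsets)
  source.instance_variable_set(:@start_line, start_line)
  source.instance_variable_set(:@offsets, offsets)
  source
end

sketch = fill_in_sketch(ReadOnlySourceSketch.new("1 + 2"), 1, [0])
sketch.respond_to?(:offsets=) # no public writer is defined
```

From a C extension the equivalent call would be `rb_iv_set()`, as the commit message notes.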
@kddnewton I think I addressed all your comments, could you merge?
50924b0 to b2c3222
…ension
* Faster that way:
  $ ruby -Ilib -rprism -rbenchmark/ips -e 'Benchmark.ips { |x| x.report("parse") { Prism.parse("1 + 2") } }'
  195.722k (± 0.5%) i/s
  rb_iv_set(): 179.609k (± 0.5%) i/s
  rb_funcall(): 190.030k (± 0.3%) i/s
  before this PR: 183.319k (± 0.4%) i/s
b2c3222 to eaf7c2f
Part of #2402