-
-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pathname objects are treated as IO and corrupt input on parse #2110
Comments
Hi @doriantaylor! Thanks for opening this issue. There's already a PR open -- see #1821 -- which describes the problem and also outlines some of the challenges in solving it well. Maybe we can have the conversation about this approach there? |
ah damn shoulda checked. ;) okay schlepping over there… |
Reopening this issue because that other patch is silly. Looks like |
@doriantaylor Hey again.
Well, it wasn't what I wanted and I said so, but I'd like to encourage you to avoid using terms that imply absolute judgement on other folks' work, particularly when it's a submission from a first-time contributor. I want people to have a thoroughly enjoyable time when they start contributing! Thanks for understanding.
I'm not sure I understand what you mean, master is green except for experimental TruffleRuby builds that are non-blocking: https://ci.nokogiri.org/teams/nokogiri-core/pipelines/nokogiri It looks like you submitted a PR at #2111 so we can continue the conversation there! |
Apologies, I just meant there wasn't enough of it to save. As for |
- no need to resolve Pathname in Nokogiri.parse - added test coverage for XML::Document and HTML::Document#parse - moved resolution of Pathname to after `url` is set in HTML::Document#parse - open files with implict "r" because "b" assumes ASCII-8BIT encoding Part of sparklemotion#2110, sparklemotion#1821
- no need to resolve Pathname in Nokogiri.parse - added test coverage for XML::Document and HTML::Document#parse - moved resolution of Pathname to after `url` is set in HTML::Document#parse - open files with implict "r" because "b" assumes ASCII-8BIT encoding Part of sparklemotion#2110, sparklemotion#1821
- added test coverage for objects that respond to `#path` - added test coverage for `Pathname` specifically Related to sparklemotion#2110.
- added test coverage for objects that respond to `#path` - added test coverage for `Pathname` specifically Related to sparklemotion#2110.
I just ran into a situation where
Pathname
objects get treated asIO
instead of a file path.The problem is that a
Pathname
object will be passed toio_read_callback
whereread
is called on it, but sincePathname
objects have no state, the first ~4000 (4096?) bytes of the file get read in over and over until silently kiboshed by the parser (assuming default flags).The solution would be to put a branch in the parser logic to pull out
Pathname
objects for special treatment, either by flattening them to locations or explicitly opening them as file handles. This is a straightforward fix I am offering to do; I am registering this issue beforehand to gauge interest.The text was updated successfully, but these errors were encountered: