Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StringScanner::scan behaviour differs wrt MRI #1544

Closed
pepijnve opened this issue Jan 22, 2019 · 11 comments
Closed

StringScanner::scan behaviour differs wrt MRI #1544

pepijnve opened this issue Jan 22, 2019 · 11 comments

Comments

@pepijnve
Copy link

Tested with 1.0-RC11

Difference in behaviour is (I think) due to the anchor in /^\s+/. This test passes on MRI, fails with Truffle.

require 'strscan'

describe "StringScanner" do
  it 'should behave like MRI' do
    scanner = StringScanner.new("a b")
    expect(scanner.scan(/[a-z]+/)).to(eq("a"))
    expect(scanner.scan(/^\s+/)).to(eq(" "))
    expect(scanner.scan(/[a-z]+/)).to(eq("b"))
  end
end
@chrisseaton
Copy link
Collaborator

require 'strscan'
scanner = StringScanner.new("a b")
p scanner.scan(/[a-z]+/)
p scanner.scan(/^\s+/)
p scanner.scan(/[a-z]+/)

@chrisseaton chrisseaton self-assigned this Jan 22, 2019
@chrisseaton
Copy link
Collaborator

I think this is possibly covered by this failing spec

fails:StringScanner#scan treats ^ as matching from the beginning of the current position

I'll fix that and the other failing specs for StringScanner and then examine this bug again.

@chrisseaton
Copy link
Collaborator

Fix on the way.

@chrisseaton
Copy link
Collaborator

Fixed in df10d7a. It's not the best fix in the world because it has to create a new pattern in some cases, but it now passes the spec.

@eregon eregon added this to the 1.0.0-rc12 milestone Jan 23, 2019
@presidentbeef
Copy link

@chrisseaton - the fix for this is a bit limited. If the regex does not begin with ^ in the very first position, the fix will not work.

However, there are plenty of ways to introduce ^ that violate this assumption. For example:

/(^.)/
/a|^b/

etc.

Using a modified version of the code in the original issue:

truffleruby-1.0.0-rc14 :001 > require 'strscan'
 => true
truffleruby-1.0.0-rc14 :002 > scanner = StringScanner.new("a b")
 => #<StringScanner 0/3 @ "a b">
truffleruby-1.0.0-rc14 :003 > scanner.scan(/[a-z]+/)
 => "a"
truffleruby-1.0.0-rc14 :004 > scanner.scan(/(^\s+)/)
 => nil
truffleruby-1.0.0-rc14 :005 > scanner.scan(/^\s+/)
 => " "

I bring this up because I'd like to test Brakeman on TruffleRuby. Brakeman depends on ruby_parser, which uses StringScanner extensively, and happens to have (at least one) regex where this occurs.

Let me know if you'd like this to be filed as a separate issue.

@chrisseaton
Copy link
Collaborator

Yes the fix was a bit of a hack! We'll keep thinking.

@chrisseaton
Copy link
Collaborator

We're working on this.

@eregon eregon modified the milestones: 1.0.0-rc12, 1.0.0-rc16 Apr 10, 2019
@aardvark179
Copy link
Contributor

A proper fix for this is currently going through CI.at the moment.

@aardvark179
Copy link
Contributor

The fix has been merged into master

@presidentbeef
Copy link

Hi,

Sorry to continue poking at this, but...

$ irb
truffleruby-19.0.0 :001 > require 'strscan'
 => true
truffleruby-19.0.0 :002 > s = StringScanner.new("ab")
 => #<StringScanner 0/2 @ "ab">
truffleruby-19.0.0 :003 > s.scan(/#{/^[a-z]/}/)
 => "a"
truffleruby-19.0.0 :004 > s.scan(/#{/^[a-z]/}/)
 => nil
$ irb
2.6.1 :001 > require 'strscan'
 => true
2.6.1 :002 > s = StringScanner.new("ab")
 => #<StringScanner 0/2 @ "ab">
2.6.1 :003 > s.scan(/#{/^[a-z]/}/)
 => "a"
2.6.1 :004 > s.scan(/#{/^[a-z]/}/)
 => "b"

Please let me know if you'd like this in a separate issue. Thanks!

@chrisseaton
Copy link
Collaborator

For information, the reason it's so complicated is that we try to avoid copying strings. The regular expression implementation we use (JONI) isn't set up for our persistent data structures and we're having a bit of trouble matching the two up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants