Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nil assertion failed when accessing $~ #4776

Open
hugoabonizio opened this issue Aug 1, 2017 · 40 comments
Open

Nil assertion failed when accessing $~ #4776

hugoabonizio opened this issue Aug 1, 2017 · 40 comments
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. kind:docs topic:compiler:semantic

Comments

@hugoabonizio
Copy link
Contributor

When a regex match fails, accessing $~ raises an exception: https://carc.in/#/r/2gks

puts $~
Nil assertion failed (Exception)
0x5627dc75aa80: not_nil! at /usr/lib/crystal/class.cr 65:0
0x5627dc73254e: __crystal_main at /usr/lib/crystal/thread/mutex.cr 1:1
0x5627dc741989: main at /usr/lib/crystal/main.cr 12:15
0x7fc2284c44ca: __libc_start_main at ??
0x5627dc731e6a: _start at ??
0x0: ??? at ??

However, I think the Ruby's behavior would be more useful in this case which is return nil: https://carc.in/#/r/2gkt

Maybe related: #3978 #3648

$ crystal -v
Crystal 0.23.1 [e2a1389] (2017-07-13) LLVM 3.8.1
@straight-shoota
Copy link
Member

The documentation says:

After performing a match, the special variable $~ will be an instance of Regex::MatchData if it matched, nil otherwise.

So $~ is only valid after a match, and then it is either MatchData object or nil if it did not matched.
I find it reasonable to throw an exception if it is not accessed after a match, although it could have a specific error message.

@hugoabonizio
Copy link
Contributor Author

@straight-shoota I understand, but this behavior makes impossible to use $~ whithin a condition expression after trying to match, for example: https://carc.in/#/r/2gma

"a" =~ /b/ || $~

@drosehn
Copy link

drosehn commented Aug 1, 2017

Hmm. I think that works if you use an explicit if, but not if you're trying to use conditionals in a one-liner (as in your example). I'd generally use:

if "a" =~ /b/
    $~    # or whatever
end

This might be a situation where crystal could be smarter about the one-liner case.

hugoabonizio referenced this issue in crystal-community/autolink.cr Aug 2, 2017
@hugoabonizio
Copy link
Contributor Author

@drosehn I ran up into this incompatibility when trying to port this Ruby code:

def auto_linked?(left, right)
  (left =~ AUTO_LINK_CRE[0] and right =~ AUTO_LINK_CRE[1]) or
    (left.rindex(AUTO_LINK_CRE[2]) and $' !~ AUTO_LINK_CRE[3])
end

I found a workaround, but the same happens for $1 and $1?.

IMO $~ should be nil by default, but if it's not possible, the error message should be more accurate at least. Now it gives no clue on what's happening since I'm not calling #not_nil! in my code.

@drosehn
Copy link

drosehn commented Aug 2, 2017

I'm just another crystal user here, not a developer of the compiler. This seems to me like a situation where the compiler could do a better job, but I don't have the experience or time to investigate/fix the compiler itself. I was just suggesting one tactic you could try to work around it.

As something of an aside: Note that crystal never promises to be completely compatible with ruby. In some situations it absolutely cannot be (because the goal is to compile into very fast code). In other situations the people who are doing all the work to write crystal happen to have their own opinions on the best way for the language to work. They might agree completely with my earlier comment, or they might not. However I know that I don't have the time to write my own compiler (not even if I spent the rest of my life trying to do it!).

Cheerio.

@makenowjust
Copy link
Contributor

@hugoabonizio Sring#rindex(Regex) on Crystal doesn't update $~. I guess your issue comes from this. In regard to this point, "$~ should be nil" and etc are irrelevant.

But I actually agree with the part of your opinion: "the error message should be more accurate at least."

@hugoabonizio
Copy link
Contributor Author

@drosehn I understand the Crystal goals, just pointed this situation where - agreeing with you - "the compiler could do a better job". Thank you for your help! 😃

@makenowjust I understand, probably just fixing this inconsistency with other regex related methods could fix the problem. For now I found a workaround and moved on, but a better error message implemented by someone with more knoledge than me on the compiler world be nice!

@rdp
Copy link
Contributor

rdp commented Oct 12, 2018

It would be sweet if the compiler would somehow force you to check for a match instead of blowing up later with an NPE...since crystal in other places is "nil safe" as it were. If possible...

@jhass
Copy link
Member

jhass commented Aug 6, 2019

Given

"foo" =~ /bar/
puts $~

raises this is at least a documentation bug, though I would prefer the documented behavior too. I wonder if we can't rewrite something like

puts $~ if "foo" =~ /bar/

to something like:

"foo" =~ /bar/
__temp123 = assign_match
puts __temp123 if __temp123

to preserve the "no nil check inside if" behavior of $~ while making it nilable.

@jhass jhass added kind:bug A bug in the code. Does not apply to documentation, specs, etc. kind:docs topic:compiler:semantic labels Aug 6, 2019
@straight-shoota
Copy link
Member

It would be sweet if the compiler would somehow force you to check for a match instead of blowing up later with an NPE...

That might actually be a valid option. The comment in https://github.com/crystal-lang/crystal/blob/master/src/compiler/crystal/semantic/main_visitor.cr#L576-L581 states that it adds not_nil! because it would be annoying to ask the user to always check for nil. That might be useful in a scripting context when you're certain that the match data result will be populated after a regex match. But as @rdp stated, this essentially contradicts Crystal's Nil safety and in introduces a null pointer exception in Crystal's core library.

not_nil! should only be used when it is safe to assume the value can't be nil for reasons the compiler is incapable to figure out or if you don't care whether the code blows up in case of an unexpected state. Neither of these reasons is valid for accessing global variables $~ and $?.

So I suppose we should remove that not_nil! call.

@rdp
Copy link
Contributor

rdp commented Nov 20, 2019

Maybe could introduce nn! or !! so that not_nil's aren't as...many characters to type? :) . Just brainstorming. Cheers!

@straight-shoota
Copy link
Member

No, that defeats the purpose. not_nil! is supposed to be inconvenient because it's usually to be avoided.

@j8r
Copy link
Contributor

j8r commented Nov 20, 2019

This can be one reason to remove the dollar sign notation (#6969).

@asterite
Copy link
Member

Sorry, could someone write a "tl;dr" summary of the problem and proposed solution?

$~ is nilable because when you do "foo" =~ /bar/ the match data might be nil. Now in Ruby one uses $~, $1 and others like this:

if "foo" =~ /bar/
  # Because the `if` check guarantees that $~ will be set, this will never fail
  puts $~
end

Or:

case "foo"
when /bar/
  puts $1
end

which is essentially the same.

But note that $~ can be nil, it's just that you don't usually use it outside an if that checks for a match.

That's why the compiler inserts not_nil! for you, because otherwise you'll always have to do:

case "foo"
when /bar/
  # Ugh, why am I checking for not nil if I know this is not nil?
  puts $~.not_nil!
end

So what are the proposals to improve this? (or why is this a problem in the first place? the code from OP seems like ficticious code, I never had NilAssertionFailed raised because of a misuse of $~, what about you all?)

@j8r
Copy link
Contributor

j8r commented Nov 20, 2019

Note it can also be reproduced with all dollar variants $0, $1 etc.
One can do this:

foo =~ /bar/
puts $~

Reversing the order, by mistake, returns an error.

Another issue is this variable get leaked:

if "foo" =~ /foo/
  puts $~
end

puts $~

Logically, the variable should be scoped, and not available before using a regex match.
One goal of the language is to catch the errors early at the compilation time.

@straight-shoota
Copy link
Member

You're right, these global variables have some special use cases and one could consider them usually guarded by a regex match. I think I somehow mixed up $~ and $1. $~ is always available when a regex match was successful, but $1 etc. only if such a capture group exists.

The issue discussed here is that when not used in a guarded branch, $~ might be nil and immediately raises. The error message is confusing to the user because it is not clear where the nil assertion comes from (it's added by the compiler and doesn't exist in code).

As long as you only use $~ as it is supposed to, that nil assertion should never fail. But it can easily happen that something goes wrong, especially if you're not familiar with the exact semantics like being visible only one scope upwards. Refactoring may also break things.

In case it goes wrong, the user should be presented with a meaningful error message.
So the suggested improvement would be to replace .not_nil! with a customized exception such as No match data available.

As mentioned in #4776 (comment) the documentation currently states that $~ would be nil if there is no match.

So it would be an option to actually change the implementation to what is documented. But given the special nature of these global variables, I suppose this is not a good idea as per @asterite's comment. In any case, the documentation should reflect the actual behaviour.

@asterite
Copy link
Member

Improving the error messages is a good idea 👍

@ysbaddaden
Copy link
Contributor

I never had any issue. I always access $ variables after verifying that there was a match. I'd be very annoyed if suddenly I had nilable $ match variables.

Now, the injected not nil check message could be more explicit about having no match (unexpected). Similar to how [] accessors raise an explicit message / exception.

@j8r
Copy link
Contributor

j8r commented Nov 20, 2019

At least, $~.as Regex::MatchData and $0.as String would be better.

@straight-shoota
Copy link
Member

@j8r cast from Nil to Regex::MatchData failed is not much of an improvement.

@j8r
Copy link
Contributor

j8r commented Nov 20, 2019

You know it's about a regex syntax @straight-shoota , compared to nothing at all - it helps.

@straight-shoota
Copy link
Member

It might help a bit, but it's not enough. If we're changing it, we should change it to a good error message, not a slightly less shitty one.

@jhass
Copy link
Member

jhass commented Nov 20, 2019

Honestly, I wouldn't be suuuper opposed to just dropping the whole affair (including =~) and forcing users to

if match = foo.match(/bar/)
  puts match[1]
end

Yes these are nice for little one of scripts, but I don't see that as Crystal's domain anyway. For more serious software the above trumps in clarity.

@straight-shoota
Copy link
Member

I'd also support removing that feature entirely an alternative option. This was discussed before in #6969 and decided to keep them (though not a strong decision).

When matching a regex as a when condition it would be more complicated to rewrite without $~.

@asterite
Copy link
Member

I wouldn't mind dropping them and having to write it like in #6969 (comment) :

case
when match = line_to_parse.match(/host=(.*)/)
  hostname = match[1]
when match = line_to_parse.match(/credentials=(.*?):(.*)/)
  user = match[1]
  password = match[2]
end

vs.

case line_to_parse
when /host=(.*)/
  hostname = $1
when /credentials=(.*?):(.*)/
  user = $1
  password = $2
end

it's a bit more verbose but: no magic, simpler language, simpler compiler. But it almost reduces the main use case of ===.

@straight-shoota
Copy link
Member

@asterite It's going to be even more verbose when the when branches not only match on regular expressions, but also do comparisons, method calls etc. on the target.

But it almost reduces the main use case of ===.

Oh yes. It might not be a bad move if we could remove that as well... It's always difficult to grasp the difference between === and related operators and that it looks similar to equality comparison but is actually not commutative.
Helping to get rid of === would actually be supporting argument for removing $~.

But if we want to continue in this direction, we should have a dedicated discussion about that, and especially look closer at the consequences such a change would have.

@j8r
Copy link
Contributor

j8r commented Nov 20, 2019

I really tried to find an user friendly alternative, there is this option.
It's a bit like IO, the result is passed around by reference.

@rdp
Copy link
Contributor

rdp commented Nov 20, 2019

I think ruby uses === for range's but...that doesn't mean it has to stay I suppose...

@asterite
Copy link
Member

Sorry, I meant === is pretty useful for regex and combining it with $~, but === will stay in the language forever. I don't think that needs to be discussed.

@RX14
Copy link
Contributor

RX14 commented Nov 27, 2019

Can someone actually detail what the issue is and code to reproduce it here? Is it to do with the compiler flow checking regex matches and replacing $~ with $~.not_nil!? Is the code good or bad?

Either way the error message is bad, but i'm not sure if it's occurring when it should or not? And if it's occuring in sane code?

@straight-shoota
Copy link
Member

straight-shoota commented Nov 27, 2019

Yes, it's the implicit .not_nil! inserted by the compiler that causes unexpected errors. This typically happens when the code is faulty, but it's a mistake that can happen easily. Especially since the documentation states that $~
returns nil instead of raising.

A simple reproducible sample already mentioned above is "a" =~ /b/; puts $~.

@RX14
Copy link
Contributor

RX14 commented Dec 2, 2019

@straight-shoota is $~ ever typed as nil??

@straight-shoota
Copy link
Member

AFAIK the special variable $~ is actually nilable, but only internally. The main visitor replaces each access to that variable with $~.not_nil!, so its not nilable when used in user code.

@RX14
Copy link
Contributor

RX14 commented Dec 5, 2019

what the fuck

@asterite
Copy link
Member

asterite commented Dec 5, 2019

Haha, yeah, there's a bit of magic going on there. The $~ variable is parsed as a global variable, and then when it's typed it's replaced with $~.not_nil!, but $~ in that expression is actually a local variable.

But I don't think it matters how it's implemented as long as it behaves correctly. I do think we can and should improve the error message. I started working on this but stopped because I have limited time, but once I have more time I can fix it.

@straight-shoota
Copy link
Member

To drive this forward, I think we can make a simple improvement to the error message. This should give the user more context information which helps locate the error.

  1. When accessing $~ and $? as nil values, raise NilAssertionError with a custom error message that refers to the name of the variable, instead of the generic "Nil assertion failed".

  2. The numbered variables $0, $1 etc. expand to $~[0], which eventually expands to @$~.not_nil![0] (@ is used to indicate it's a variable, but it's a local variable). When 1. is implemented, the .not_nil! would be changed and the error message refer to $~. This is already an improvement. But perhaps it can be better. If we substitute directly with a read to the variable $~, the nil error message can be customized to the specific global variable. It could look like this: (@$~ || raise NilAssertionError.new("$0 is not defined. $~ is nil"))[0]

For implementing both of this, it would be useful to have a parameter on #not_nil! to customize the error message (currently this only works by initializing NilAssertionError directly).

@zw963
Copy link
Contributor

zw963 commented Sep 29, 2022

A little wired when code only one line.

puts $0
╰─ $ cr run 1.cr
Unhandled exception: Nil assertion failed (NilAssertionError)
  from /home/zw963/Crystal/share/crystal/src/nil.cr:108:5 in 'not_nil!'
  from /home/zw963/Crystal/share/crystal/src/crystal/once.cr:44 in '__crystal_main'
  from /home/zw963/Crystal/share/crystal/src/crystal/main.cr:115:5 in 'main_user_code'
  from /home/zw963/Crystal/share/crystal/src/crystal/main.cr:101:7 in 'main'
  from /home/zw963/Crystal/share/crystal/src/crystal/main.cr:127:3 in 'main'
  from /usr/lib/libc.so.6 in '??'
  from /usr/lib/libc.so.6 in '__libc_start_main'
  from ../sysdeps/x86_64/start.S:117 in '_start'
  from ???

No any clue point out what happen.

Although, add one line before it, will point out the correct line in backtrace.

x = 100
puts $0
 ╰─ $ 130  cr 1.cr 
Unhandled exception: Nil assertion failed (NilAssertionError)
  from /home/zw963/Crystal/share/crystal/src/nil.cr:108:5 in 'not_nil!'
  from 1.cr:2:6 in '__crystal_main'
  from /home/zw963/Crystal/share/crystal/src/crystal/main.cr:115:5 in 'main_user_code'
  from /home/zw963/Crystal/share/crystal/src/crystal/main.cr:101:7 in 'main'
  from /home/zw963/Crystal/share/crystal/src/crystal/main.cr:127:3 in 'main'
  from /usr/lib/libc.so.6 in '??'
  from /usr/lib/libc.so.6 in '__libc_start_main'
  from ../sysdeps/x86_64/start.S:117 in '_start'
  from ???

@asterite
Copy link
Member

I think that might be a bug with the stacktrace decoder.

@asterite
Copy link
Member

I would probably use a different exception than NilAssertionFailed for this...

@straight-shoota
Copy link
Member

I think that might be a bug with the stacktrace decoder.

Yes, probably. It's definitely unrelated to this problem (although I see how it increases confusion; but it's unlikely to have $0 in the first line, so it's not that relevant in practice -- assuming the cause is simply based on line 1).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. kind:docs topic:compiler:semantic
Projects
None yet
Development

No branches or pull requests