Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-raising Char::Reader API #14009

Open
HertzDevil opened this issue Nov 22, 2023 · 1 comment
Open

Non-raising Char::Reader API #14009

HertzDevil opened this issue Nov 22, 2023 · 1 comment

Comments

@HertzDevil
Copy link
Contributor

HertzDevil commented Nov 22, 2023

Char::Reader#next_char raises when there are no more characters to read, so one must call #has_next? first, even though this check is already effectively the first thing #next_char calls. #previous_char is the same, whereas #current_char cannot distinguish between a null byte and the end of the string, and also needs #has_next? (#12279 is due to this). These frequent combined uses suggest that there should be a more ergonomic API that takes advantage of nilable types:

struct Char::Reader
  def current_char? : Char?
    has_next? ? current_char : nil
  end

  # note: returns `nil` instead of `'\0'` if `self` is at the last character
  def next_char? : Char?
    # not exception-free, but can be refactored to make it so
    @pos + @current_char_width < @string.bytesize ? next_char : nil
  end

  def previous_char? : Char?
    # ditto
    has_previous? ? previous_char : nil
  end
end

Apart from ergonomics, #current_char? and #next_char? will never return the null terminator. IMO this makes code using them easier to reason about.

As an example, this snippet:

crystal/src/path.cr

Lines 1222 to 1239 in f8fafc3

while true
char = reader.current_char
break if separators.includes?(char)
if char == '%'
# percent encoded character
return unless reader.has_next?
reader.next_char
return unless reader.current_char.ascii_number?
return unless reader.has_next?
reader.next_char
return unless reader.current_char.ascii_number?
else
# unreserved / sub-delims
return unless char.ascii_alphanumeric? || char.in?('_', '.', '-', '~', '!', '$', ';', '=') || char.in?('&'..',')
end
return unless reader.has_next?
reader.next_char
end

can be simplified to:

char = reader.current_char
while char && !separators.includes?(char)
  if char == '%'
    # percent encoded character
    return unless char = reader.next_char?
    return unless char.ascii_number?
    return unless char = reader.next_char?
    return unless char.ascii_number?
  else
    # unreserved / sub-delims
    return unless char.ascii_alphanumeric? || char.in?('_', '.', '-', '~', '!', '$', ';', '=') || char.in?('&'..',')
  end
  char = reader.next_char?
end

which is more readable, especially since the original used #next_char in an imperative manner where the return value was discarded and only its side effect was desired.


Going further, we could introduce one more method that combines #current_char and #next_char:

struct Char::Reader
  def shift? : Char?
    # this `next_char` is also guaranteed not to raise
    has_next? ? current_char.tap { next_char } : nil
  end
end

This would allow a concise form of external iteration, albeit with a minor difference compared to #each:

reader = Char::Reader.new("abc")
reader.each do |char|
  # `reader` is not advanced yet, so `char == reader.current_char`
  # (technically this is not documented, but is a reasonable assumption)
end

reader = Char::Reader.new("abc")
while char = reader.shift?
  # `reader` is advanced, so `reader.current_char?` may return `nil`
end
@straight-shoota
Copy link
Member

Sounds very reasonable 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants