Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISBN Verifier #503

Closed
gpucce opened this issue May 5, 2022 · 12 comments
Closed

ISBN Verifier #503

gpucce opened this issue May 5, 2022 · 12 comments

Comments

@gpucce
Copy link
Contributor

gpucce commented May 5, 2022

Hello,

In the ISBN verifier exercise, in the way the tests are designed I think there is no way of getting around having two occurrences of replace(s, "-"=>"") or equivalent without a try ... catch statement.

If this is the case and I am not missing something, maybe it is worth writing an hint in the description or changing the tests so it is possible?

I can make the PR if it is something that would make sense.

@cmcaine
Copy link
Contributor

cmcaine commented May 5, 2022

Hello!

Why do you particularly want to avoid calling replace in two places?

If you do want to call it just the once you can do so without a try/catch. I think what's tripping you up is that you can't reuse ISBN(s) in isvalid(ISBN, s) without a try/catch and yet you don't want to duplicate the code in the ISBN function. But what if you had a third function?

Does that help?

Regarding a hint in the description, I don't think we are pushing students to use replace only once or avoid try/catch, so I'm not sure we need one?

Anyone wanting to improve their solution can always request mentoring :)

Looking at the description, I think our main issue is that we don't specify the problem very completely (you have to read the tests for that).

We also suggest a truly horrendous bonus task: "Implement show(io::IO, isbn::ISBN) to print the ISBN in the standard format with dashes." Where to place the dashes depends on pages and pages of data about different worldwide publishers, etc cos each part of the ISBN is variable length.

We should probably remove that, tho I think that bit is only visible in the instructions.md that doesn't get used? I forget how that works.

@gpucce
Copy link
Contributor Author

gpucce commented May 5, 2022

Hi thanks for the answer! I had not thought about that indeed, I really wanted to use those two together :)

However, just so I don't miss the point again with a third function I would need to use that function twice or can I avoid that?

The strange bonus track I have not noticed honestly so it should be hidden.

For sure I had to read the tests while doing the exercises otherwise the DomainError would have never crossed my mind.

@cmcaine
Copy link
Contributor

cmcaine commented May 5, 2022

ISBN(s) and isvalid(ISBN, s) will each need to call your third function at least once each. Further details (not too spoilery) below.

The removal of the try/catch is simply by replacing a function that will throw an error with one that returns an error type.

I would implement this by defining Base.tryparse(::Type{ISBN}, s), but of course you don't need to add a method to that existing function you could write something like tryparse_isbn(s) or isbn_or_nothing(s) or whatever you'd like to call it.

@gpucce
Copy link
Contributor Author

gpucce commented May 5, 2022

Thanks so much! This is the solution I didn't know existed :)

@cmcaine
Copy link
Contributor

cmcaine commented May 5, 2022

Here's my solution just written and a comment:

# It doesn't make any sense to me for an ISBN to be a String - basically none of the String API makes sense for an ISBN.
# But the tests require this, so whatever.
struct ISBN{T} <: AbstractString
    s::T
end

Base.string(s::ISBN) = s.s

function Base.tryparse(::Type{ISBN}, s)
    acc = 0
    coeff = 10
    for c in s
        if c == '-'
            continue
        elseif isdigit(c)
            # c - '0' is a much faster version of parse(Int, c) when we know c is in '0':'9'.
            acc += (c - '0') * coeff
            coeff -= 1
        # Only the check "digit" is allowed to be 'X' (10)
        elseif c == 'X' && coeff == 1
            acc += 10
            coeff -= 1
        else
            # Invalid character
            return nothing
        end
    end
    if coeff != 0
        # String did not contain 10 "digits"
        return nothing
    elseif acc % 11 != 0
        # Check digit invalid
        return nothing
    else
        # Canonicalise the string in the simplest way:
        s = replace(s, '-' => "")
        return ISBN{typeof(s)}(s)
    end
end

function ISBN(s)
    x = tryparse(ISBN, s)
    if isnothing(x)
        throw(DomainError(s, "Invalid ISBN (wrong length, wrong checksum or invalid characters)"))
    else
        x
    end
end

Base.isvalid(::Type{ISBN}, s) = tryparse(ISBN, s) != nothing

macro isbn_str(s)
    ISBN(s)
end

Note that by using tryparse we lose the context for why parsing has failed. Arguably, the more natural way of expressing this in Julia would be to have ISBN(s) do the parsing and then just use a try/catch.

If you are particular fan of returning rather than throwing errors (e.g. if you come from Haskell or something) then you could do that by returning a more descriptive error type than nothing. E.g. you could just return DomainError(s, "invalid character") or something (using the exception struct as an error value).

And if you are iterating over the characters in the string you don't need to use replace in isvalid at all, so here's another way you can write it:

# It doesn't make any sense to me for an ISBN to be a String - basically none of the String API makes sense for an ISBN.
# But the tests require this, so whatever.
struct ISBN{T} <: AbstractString
    s::T

    function ISBN(s)
        if !isvalid(ISBN, s)
            throw(DomainError(s, "Invalid ISBN (wrong length, wrong checksum or invalid characters)"))
        else
            # Canonicalise the string in the simplest way:
            s = replace(s, '-' => "")
            new{typeof(s)}(s)
        end
    end
end

Base.string(s::ISBN) = s.s

function Base.isvalid(::Type{ISBN}, s)
    acc = 0
    coeff = 10
    for c in s
        if c == '-'
            continue
        elseif isdigit(c)
            # c - '0' is a much faster version of parse(Int, c) when we know c is in '0':'9'.
            acc += (c - '0') * coeff
            coeff -= 1
        # Only the check "digit" is allowed to be 'X' (10)
        elseif c == 'X' && coeff == 1
            acc += 10
            coeff -= 1
        else
            # Invalid character
            return false
        end
    end
    # s must contain 10 "digits" and check digit must be valid.
    return coeff == 0 && acc % 11 == 0
end

macro isbn_str(s)
    ISBN(s)
end

@SaschaMann
Copy link
Contributor

# It doesn't make any sense to me for an ISBN to be a String - basically none of the String API makes sense for an ISBN.
# But the tests require this, so whatever.

isvalid is only defined on strings by default, which is why I think I added that back then. But I agree there's no real need for it, feel free to remove that test.

@cmcaine
Copy link
Contributor

cmcaine commented May 6, 2022 via email

@SaschaMann
Copy link
Contributor

Personally, I would define only a constructor and have it throw errors.

That sounds reasonable. Do you want to submit a PR to change it?

@gpucce
Copy link
Contributor Author

gpucce commented May 6, 2022

If you want and there is no rush I can try to make the PR, I think I might have understood the point. Otherwise I will solve the exercise again after you change it!

cmcaine added a commit that referenced this issue May 6, 2022
As I wrote in #503

Thinking about it some more, I think that's because `isvalid` isn't really
the right function for us. `isvalid` checks if a string or character
contains only valid UTF-8 codepoints. I think it only makes sense for types
where the representation of the type (the bits in memory) can encode
something invalid in normal use. This is the case for strings because Julia
does not validate that strings or characters are correctly encoded when
they are created, it just treats them as opaque bytes.

For most Julia types, however, you cannot create an invalid value of the
type, except through silly use of pointers.

I think in real code the advice would be to parse the string into an ISBN
type, and if that is not possible (e.g. because the string does not encode
a valid ISBN), the ISBN constructor (or `parse(ISBN, s)`, whichever makes
more sense) should throw an error. If a non-throwing option is required,
`tryparse(ISBN, s)` is the right function to define.

Personally, I would define only a constructor and have it throw errors.
@cmcaine cmcaine mentioned this issue May 6, 2022
@cmcaine
Copy link
Contributor

cmcaine commented May 9, 2022

I made a PR at #504. Can't remember if @gpucce would have been notified about that.

@gpucce
Copy link
Contributor Author

gpucce commented May 9, 2022

I think yes because I knew you had done it :) @cmcaine should I close the issue?

@cmcaine
Copy link
Contributor

cmcaine commented May 9, 2022

I'll do it. Thanks for engaging!

@cmcaine cmcaine closed this as completed May 9, 2022
cmcaine added a commit that referenced this issue May 16, 2022
* Change problem definition

As I wrote in #503

Thinking about it some more, I think that's because `isvalid` isn't really
the right function for us. `isvalid` checks if a string or character
contains only valid UTF-8 codepoints. I think it only makes sense for types
where the representation of the type (the bits in memory) can encode
something invalid in normal use. This is the case for strings because Julia
does not validate that strings or characters are correctly encoded when
they are created, it just treats them as opaque bytes.

For most Julia types, however, you cannot create an invalid value of the
type, except through silly use of pointers.

I think in real code the advice would be to parse the string into an ISBN
type, and if that is not possible (e.g. because the string does not encode
a valid ISBN), the ISBN constructor (or `parse(ISBN, s)`, whichever makes
more sense) should throw an error. If a non-throwing option is required,
`tryparse(ISBN, s)` is the right function to define.

Personally, I would define only a constructor and have it throw errors.

* Improve docstring

* Use the vararg form of print

It's slightly faster and maybe better practice.

* Remove unused isbn-verifier/.meta/description.md

* isbn: update tests and instructions

Simplifies the public API of `ISBN` to something that's arguably more
realistic and includes enough detail in the description to solve the
exercise without reading the tests.

* Define an explicit `==` method in isbn-verifier example

* write a design.md

* better test_nothrows

* Clarify where to put the bonus tests

Co-authored-by: Sascha Mann <[email protected]>

Co-authored-by: Sascha Mann <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants