Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance error messages of conversion errors #1212

Merged

Conversation

Askaholic
Copy link
Contributor

@Askaholic Askaholic commented Oct 3, 2020

Adds the name of arguments that generate conversion errors as mentioned in bullet 1 of #1055 and changes the message of PyDowncastError to match more closely with existing python errors. I'm very new to procedural macros so this is likely not the best way to go about things. Here's what it looks like now:

#[pyfunction]
fn tuple_func(foo: (i64, &str)) {}
>>> tuple_func("bar")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument 'foo': 'str' object cannot be converted to 'PyTuple'

However I did notice an oddity. When converting to an integer type it doesn't seem to trigger the same pyo3 code and instead generates a different error from within python.

#[pyfunction]
fn int_func(foo: i64) {}
>>> int_func("bar")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument 'foo': 'str' object cannot be interpreted as an integer

I looked for possible sources of the error and it might be coming from here:
https://github.com/python/cpython/blob/fb0a4651f1be4ad936f8277478f73f262d8eeb72/Objects/abstract.c#L1265
But there are actually a number of different places where this same text is present in an error string.

@Askaholic Askaholic force-pushed the issue/#1055-add-arg-name-to-conversion-error branch from a80c86f to f3b8257 Compare October 3, 2020 22:03
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for this PR, very pleased to see it! I'll do my best to find time to review it tomorrow evening.

@Askaholic Askaholic force-pushed the issue/#1055-add-arg-name-to-conversion-error branch from f3b8257 to f0c5dc7 Compare October 3, 2020 22:36
@kngwyu
Copy link
Member

kngwyu commented Oct 4, 2020

So.... what is the motivation?
I don't think this is very helpful.

@davidhewitt
Copy link
Member

I think for users of libraries this can be very nice when figuring out what went wrong. C++'s pybind11 has similar.

I'm afraid I didn't find time to do any reviewing today, will do my best to find time asap.

@kngwyu
Copy link
Member

kngwyu commented Oct 5, 2020

But I think users don't know argument names...

@Askaholic
Copy link
Contributor Author

Askaholic commented Oct 7, 2020

So.... what is the motivation?

Go look at the issue ticket #1055

I don't think this is very helpful.

I entirely disagree. This is one of the first features that I noticed was missing when I started using PyO3. It's extremely annoying to debug your code when your function calls are throwing type errors, but they don't tell you what the cause is. Like, imagine if you have some code that takes function arguments from the user:

some_args = get_from_cli()

pyo3_function(**some_args)
# raises TypeError: str cannot be interpreted as int

That doesn't tell the user which one of their arguments has an error. There's lots of crazy stuff you can do in python, and there are lots of situations in which you may have a function with a lot of arguments, or a few arguments with the same type. TBH even if I have 2 arguments like:

def print_range(lo: int, hi: int):
    ...

lo, hi = do_some_stuff()
print_range(lo, hi)
# raises TypeError: str cannot be interpreted as int

It's annoying to have to go back and debug your code just to figure out which argument has the wrong type, when the library really has access to that information already, but refuses to tell you.

Just as you may look at this code:

def foo(bar, **kwargs):
    pass

foo(1, bar=10)
# raises TypeError: foo() got multiple values for argument 'bar'

and say, "well the error message really doesn't need to include the name of the function or the name of the argument because it's obvious from the code," well, there's a reason why the python developers chose to include all of that information in the error message: people will do crazy things, context will get lost, and it helps if your error messages tell you the complete reason for an error.

Rant over. For now...

C++'s pybind11 has similar.

Yes, although there are a few things I don't like about pybind11's approach. For instance I don't like that it prints out the value of the argument and not just the type. This is pretty annoying for arguments that have large string representations e.g. numpy arrays, as a simple type error can suddenly print out 50 lines of text.

Also I don't think pybind11 handles function signatures properly, it just includes annotations in the doc string.

@davidhewitt BTW what would be a good file to add unit tests in? I'm also not sure how the spec.is_args branch is taken. For my simple test example it returned false and I'm not exactly sure when it's supposed to be true as there's no doc comment for that function.

And lastly, is it normal for some of the tests to fail? Even the master branch had a few failing tests when I ran them locally after cloning.

@kngwyu
Copy link
Member

kngwyu commented Oct 7, 2020

OK, thank you for the explanation.

But I feel TypeError: argument 'foo': Can't convert 'bar' to PyTuple confusing. What is bar?
I think the original one by David (TypeError: A to B (argument a)) is clearer.
If you want to include the local variable name, please make it clear that it is not an argument name.

BTW what would be a good file to add unit tests in?

I recommend tests/test_pyfunction.rs.

And lastly, is it normal for some of the tests to fail?

No. The failure is because you changed the error message.
Please change this test: https://github.com/PyO3/pyo3/blob/0a346dfa7cb45ebc04aec54e2168699f3ad84bee/tests/test_string.rs

@Askaholic
Copy link
Contributor Author

Oh I see. I thought you were objecting to including the argument name in error messages. I also agree that the error message of conversion errors isn't ideal, but that part of the message already exists on the master, it's not part of my changes. I think again, the main problem is that the error message includes the value of the argument instead of the type. If I were to write the error, I would model it off of the format of the builtin python errors which afaik only use object types:

None.foo
# AttributeError: 'NoneType' object has no attribute 'foo'

"foo" + 1
# TypeError: can only concatenate str (not "int") to str

1 + "foo"
# TypeError: unsupported operand type(s) for +: 'int' and 'str'

"".join([1, 2])
# TypeError: sequence item 0: expected str instance, int found

So I'd probably go with something like TypeError: argument 'foo': 'str' object cannot be converted to 'PyTuple'. Not sure if quotes around the types are necessary as the python native errors are not consistent on this.

No. The failure is because you changed the error message.

Well as I said, even when building the master branch locally (before I made any changes), I got 2 errors.

@kngwyu
Copy link
Member

kngwyu commented Oct 7, 2020

I also agree that the error message of conversion errors isn't ideal, but that part of the message already exists on the master, it's not part of my changes.

Oops, I'm sorry I misunderstood it!

TypeError: argument 'foo': 'str' object cannot be converted to 'PyTuple'

Yeah, I agree to use the type name instead.

Well as I said, even when building the master branch locally (before I made any changes), I got 2 errors.

I know there happenes some errors with pyenv, but if not, please file a separate issue with error messages.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks again for implementing this, and sorry for my delay in reviewing. Finally had a chance to think about this properly!

I'm definitely persuaded that using the type name is better than using the repr. I was thinking about this further and it's easy enough for debugging to print the problematic value. We should change this in PyDowncastError.

It seems we have two different arrangements for the argument name already proposed:

  • TypeError: argument 'foo': 'str' object cannot be converted to 'PyTuple'

  • TypeError: 'str' object cannot be converted to 'PyTuple' (for argument 'foo')

Thinking about this further, I also quite like the idea of using Python exception chaining to add context. I think if we did this right, the effect could be something like the below:

TypeError: 'str' object cannot be converted to 'PyTuple'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
    ...
ValueError: invalid value for argument 'foo'

Which style does everyone prefer?

pyo3-derive-backend/src/pymethod.rs Outdated Show resolved Hide resolved
@Askaholic
Copy link
Contributor Author

I thought about the exception chaining before, but I decided that it's uglier. Having a chain of exceptions kindof implies that there are multiple pieces of code that are failing, but really there is only one: the argument conversion. It feels a bit more like an implementation detail that the argument name isn't available further down the chain, and therefore needs to be injected into the error message later. So, I prefer just modifying the error message.

@kngwyu
Copy link
Member

kngwyu commented Oct 12, 2020

I agree @Askaholic. I feel the chained message confusing.

@davidhewitt
Copy link
Member

👍 happy to agree with that. I guess the choice is then between the argument name at the front, or at the end.

I am personally not fond of multiple colons in exception messages, which is why I suggested adding (for argument 'foo') at the end of the message. But I am not sure what is best, so would like to listen to everyone's opinion.

@Askaholic Askaholic force-pushed the issue/#1055-add-arg-name-to-conversion-error branch from f0c5dc7 to 067c5d0 Compare October 13, 2020 07:42
@Askaholic
Copy link
Contributor Author

I prefer the multiple colon way, even though it's kindof weird having multiple layers of colons. It's almost like a replacement for the error chaining where it gives you the same context as the error chain, but more condensed. It also exists in some native python error messages like the one I posted earlier:

"".join([1, 2])
# TypeError: sequence item 0: expected str instance, int found

This brings up another interesting point, which is that in order for the approach we're taking here (catch an internal error and reformat like "{context}: {old_msg}") to look right, it makes sense to always start error messages with a lower case letter, as seen above.

@Askaholic Askaholic force-pushed the issue/#1055-add-arg-name-to-conversion-error branch 3 times, most recently from 3929845 to e0a80e4 Compare October 13, 2020 07:58
@davidhewitt
Copy link
Member

It also exists in some native python error messages like the one I posted earlier:

👍 I am persuaded that the colon way is the right style!

it makes sense to always start error messages with a lower case letter, as seen above

Agreed. Feel free to amend the change the existing message for PyDowncastError as part of this PR (I guess you will do so anyway to change from repr to type name):

"Can't convert {} to {}",

@Askaholic Askaholic force-pushed the issue/#1055-add-arg-name-to-conversion-error branch from e0a80e4 to a006efd Compare October 13, 2020 08:03
@Askaholic
Copy link
Contributor Author

Yea, I wasn't sure if I should do that in this PR or not, but I totally can. I'll just have to expand the scope a bit and rename it.

Askaholic added a commit to Askaholic/pyo3 that referenced this pull request Oct 13, 2020
Displays type(obj) instead of repr(obj) and uses `cannot` instead of `can't`
to be more consistent with existing python error messages.

See discussion at PyO3#1212.
@Askaholic Askaholic force-pushed the issue/#1055-add-arg-name-to-conversion-error branch from a006efd to 01e22f0 Compare October 13, 2020 08:20
Askaholic added a commit to Askaholic/pyo3 that referenced this pull request Oct 13, 2020
Displays type(obj) instead of repr(obj) and uses `cannot` instead of
`can't`
to be more consistent with existing python error messages.

See discussion at PyO3#1212.
@Askaholic Askaholic force-pushed the issue/#1055-add-arg-name-to-conversion-error branch from 01e22f0 to 31b64cd Compare October 13, 2020 21:41
@Askaholic
Copy link
Contributor Author

I'm running into a bit of a tricky situation whenever extract returns something other than a TypeError. For example the test with the bad unicode string returns a UnicodeEncodeError, however, this does not store it's error messages as a single string, but rather as a tuple of the relevant components that are formatted to the error message in __str__:

>>> e = UnicodeEncodeError('utf-8', '\ud800', 0, 1, 'surrogates not allowed')
>>> str(e)
"'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed"

I see 3 possible solutions:

  • Error chain everything. If we just create a new error that's caused by the other error then we can handle anything, but the messages are uglier and harder to read especially for cases where there is just a simple TypeError.
  • Error chain for non-TypeErrors. Use the colon in method for TypeErrors and error chaining for everything else. The downside is then the error reporting becomes somewhat inconsistent, although I would guess that TypeErrors would be the most common error type.
  • Only add argument context for TypeErrors. We just leave all other error types untouched as we don't know if we can manipulate their error messages easily.

I'm not really sure what the best choice is.

Askaholic added a commit to Askaholic/pyo3 that referenced this pull request Oct 13, 2020
Displays type(obj) instead of repr(obj) and uses `cannot` instead of
`can't`
to be more consistent with existing python error messages.

See discussion at PyO3#1212.
@Askaholic Askaholic force-pushed the issue/#1055-add-arg-name-to-conversion-error branch from 31b64cd to 3b6a5d0 Compare October 13, 2020 22:02
@Askaholic Askaholic changed the title Add argument name to error messages in conversion errors Enhance error messages of conversion errors Oct 13, 2020
@kngwyu
Copy link
Member

kngwyu commented Oct 14, 2020

Only add argument context for TypeErrors

👍 for this

@davidhewitt
Copy link
Member

Only add argument context for TypeErrors

+1 for this. The only other built-in exception commonly used for invalid arguments is ValueError. Up to you if you want to handle that case also or leave that for the future - I don't think we have many uses of it in pyO3 for argument conversion (if any) so much less important to handle than TypeError

@Askaholic
Copy link
Contributor Author

Just thought of another possible solution:

Create our own wrapper exception ArgumentError which displays the way we want. This error could either take the error message as a single string argument, or it could have 2 fields arg_name (or just arg?) and cause and format them in the __str__ method. Using 2 fields would allow the user to more easily interact with the error programmatically, although I imagine the use cases for doing so would be quite rare.

This wrapper exception basically does the same thing as python error chaining, but it displays nicer. The only thing is if the user is trying to handle exceptions within a try-except block, they can't directly catch the underlying error and instead need to catch ArgumentError first and then check the type of cause manually.

@davidhewitt
Copy link
Member

davidhewitt commented Oct 14, 2020

Have to say I'm 👎 on creating a custom exception type here. I know we've got a couple already in the codebase (and I kind of wish we didn't, but they're necessary for other reasons). It's very awkward for Python users to be able to import these exceptions in Python (see #805). And being unable to import them makes it very difficult for Python users to figure out even what super type they have that they can catch.

Having used boost.Python and pybind11 in C++, it was so much nicer to just catch TypeError in pybind11 compared to digging through the boost.Python C++ source code to eventually find boost.python.ArgumentError inherited from TypeError and I could catch that.

So in summary I'm in favour of keeping the design here as simple as possible.

(If we were to create a new argument type, I think the correct design is to release a pyo3 Python package on pypi and expect users to import the exception that way. However that has its own complications - we would never be able to have two different PyO3 python packages in the same project, so we'd have to keep the contents of that package as stable as possible.)

Askaholic added a commit to Askaholic/pyo3 that referenced this pull request Oct 15, 2020
Displays type(obj) instead of repr(obj) and uses `cannot` instead of
`can't`
to be more consistent with existing python error messages.

See discussion at PyO3#1212.
@Askaholic Askaholic force-pushed the issue/#1055-add-arg-name-to-conversion-error branch from 3b6a5d0 to ca908ba Compare October 15, 2020 01:30
@Askaholic
Copy link
Contributor Author

Alright, I'll stick with only adding the argument name to TypeErrors.

Displays type(obj) instead of repr(obj) and uses `cannot` instead of
`can't`
to be more consistent with existing python error messages.

See discussion at PyO3#1212.
@Askaholic Askaholic force-pushed the issue/#1055-add-arg-name-to-conversion-error branch from ca908ba to 6724783 Compare October 15, 2020 01:32
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, this LGTM! Just needs an ## Added entry in the changelog please.

I see this is still in draft; anything else you want to do to this PR before it's merged?

.map(|s| format!("{} ({})", s.to_string_lossy(), type_name))
.unwrap_or_else(|_| type_name.to_string());
let err_msg = format!("Can't convert {} to {}", from, #error_names);
let err_msg = format!("'{}' object cannot be converted to '{}'", type_name, #error_names);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

Copy link
Member

@kngwyu kngwyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think it's now nicely shaped up, too.

if !err.matches($py, $py.get_type::<pyo3::exceptions::$err>()) {
panic!("Expected {} but got {:?}", stringify!($err), err)
}
err
}};
($py:expr, $val:ident, $code:expr, $err:ident, $err_msg:expr) => {{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this 👍

@Askaholic
Copy link
Contributor Author

Yea last bullet would be:

  • docs to all new functions and / or detail in the guide

Anything I should add there?

@Askaholic Askaholic marked this pull request as ready for review October 15, 2020 20:03
@davidhewitt
Copy link
Member

davidhewitt commented Oct 16, 2020

docs to all new functions and / or detail in the guide

Ah yes good memory. I would suggest just doing a quick search of the repository for the old error message "Can't convert" and replace all uses in docs with the equivalent new error message. That should be enough 👍

Thanks again for this PR!

@Askaholic
Copy link
Contributor Author

I can't find any more references to the old error message. I don't think it was included in the docs/guide anywhere so I guess this is ready.

It would be awesome if you could add a "hacktoberfest-accepted" label so this will count towards my hacktoberfest PR's. https://hacktoberfest.digitalocean.com/details#details

@davidhewitt
Copy link
Member

Perfect. Thanks very much again, will definitely add the label!

@davidhewitt davidhewitt merged commit 27f2d0e into PyO3:master Oct 18, 2020
@davidhewitt davidhewitt mentioned this pull request Nov 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants