-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Unicode related error messages #11573
Conversation
@@ -39,7 +29,7 @@ function next(s::UTF16String, i::Int) | |||
elseif length(s.data)-1 > i && utf16_is_lead(s.data[i]) && utf16_is_trail(s.data[i+1]) | |||
return utf16_get_supplementary(s.data[i], s.data[i+1]), i+2 | |||
end | |||
throw(ArgumentError("invalid UTF-16 character index")) | |||
throw(UnicodeError(UTF_ERR_INVALID_INDEX,0,0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the GC frame concern still apply here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuyichao Please get that fixed! 😀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -84,6 +84,8 @@ include("iterator.jl") | |||
include("osutils.jl") | |||
|
|||
# strings & printing | |||
include("utferror.jl") | |||
include("utftype.jl") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
totally superficial, sorry... I think this would be a tiny bit better as utftypes.jl
?
p.s. thank you thank you thank you for opening this separately. 👍
Would be nice to test at least a few of these error paths if we can do that yet. Address the comments inline and I'd say this is an easy merge, thanks! |
UTF32String(data::Vector{UInt32}) = UTF32String(reinterpret(Char, data)) | ||
|
||
const empty_utf16 = UTF16String(UInt16[0]) | ||
const empty_utf32 = UTF32String(UInt32[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these wait to be added until the commit that first uses them?
The full testing really happens when you have |
Yes, that makes sense that the coverage of this will go way up with the more extensive changes that will come separately. Tests of the error paths would've caught the typos like having Would be nice to make your commit messages a bit more descriptive too, if possible. |
@@ -0,0 +1,30 @@ | |||
# This file is a part of Julia. License is MIT: http://julialang.org/license | |||
|
|||
##\brief Error messages for Unicode / UTF support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you did there. At least the backslash is less confusing here than @
. @jakebolewski what do you think, can we live with this? Or merge anyway and delete it afterwards if we really don't like the doxygen-style comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I see the point about looking like a macro... that was done because in Python, for a docstring, you must use @
instead of \
, because \
is an escape character inside a string.
Maybe for Julia, to put doxygen information into doc strings, we'd need to use a different start character, to avoid the confusion with macros, and then convert it to @
before passing them to
the doxygen processor.
The only difference now between "plain text" and working doxygen (it actually is the valid Python doxygen style), is the added \
. Whether or not doxygen use becomes prevalent (see #11123) in the rest of julia, that information is very useful, so I'd hope you'd accept having a few extra \
characters...
If nobody starts documenting the code in Base, it never is going to happen!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments are awesome. Extra annotations in the comments that don't do anything yet and don't mean anything to the majority of us who've probably never used doxygen are just noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but to me, they help a lot when working on the code, even without the very nice stuff that doxygen builds from them... (have you ever tried doxygen on a very large project? It really is helpful...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, but you're contributing to a pre-existing large project that doesn't use doxygen (yet, maybe that will change but that's not for this PR to worry about), and at least 2 core contributors have told you to knock it off with the doxygen annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@one-more-minute You've convinced me... (as a matter of fact, I've worked for a while doing just that, pulling information out of unstructured data... I should have thought of that!).
However, don't you need at least some standardization of tags, so you could pick out which part documents the parameters, which part the return value, which part the errors thrown (if any)?
For example: I've seen ### Arguments:
, ### Returns:
, Inputs:
, Outputs:
, ### Arguments
, ### Keyword Arguments
, etc... as you see, no consistency...
It doesn't even have to be complete consistency, it can be a many-to-one mapping... fairly free form, as long as there aren't too many options for the metadata parser to figure out, or for people to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would at least give other Julia contributors time to evaluate the format, good or bad.
Well I've been evaluating by spending more time than I'd like reviewing your PR's, and so far I don't like the doxygen markup. But I don't care enough to let it block this PR which is otherwise worth merging, so I'm just going to do that now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I was just going to change this to using something more along the lines of Mike's idea... 😀 I guess I can do that in a later PR, as soon as the dust as settled a bit on the parsing Markdown ideas... Thanks! (and thanks for taking the time, and pushing back on the doxygen format, because now I will push for something better for documentation, which will still allow me to generate output for doxygen, but which fits better with Julia, and which I hope you do like)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, don't you need at least some standardization of tags
Sure, I think it's completely reasonable to settle on some conventions eventually (these things usually happen naturally anyway). I think @catawbasam is right about side issues, though, so let's keep that discussion focussed in #8966 so that things can move forward here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tkelman and I've been spending more time than I'd like changing things so frequently 😀 but nobody has been holding a gun to either of our heads to make you review, or me to respond to your reviews. A long process yes, and I'm sorry for the time it has taken, but I think in the end, the final result for julia is much better. Thanks again for your time.
Alright aside from the doxygen nits I think we can merge this one anyway. Anyone have objections or additional feedback? |
By the way, reading through my old emails, here's one pretty easy way of making multi-line commit messages from the command line:
|
Improve Unicode related error messages
This improves the error messages for the Unicode support, in
utf8.jl
,utf16.jl
, andutf32.jl
They will save the position in the string where the error occurred, as well as the character or code unit that caused the error, and the
show
method for theUnicodeError
exception type displays that extra information.