-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
num_get::do_get(in, end, double) broken if double is followed by character #18156
Comments
assigned to @mclow |
I did some additional testing; I tested all the characters from ' ' to ' Clearly, there's some hex stuff going on here, but I don't know what's up with 'I', 'N', 'P', and 'X'. '-4.9A' failed to convert to double |
Test case showing another condition of it happening |
*** Bug llvm/llvm-bugzilla-archive#19354 has been marked as a duplicate of this bug. *** |
Another example (from 19354): #include int main() libstdc++ prints 0, libc++ prints 3 |
I have also filed rdar://17766732 as this occurs in shipping OS X versions. |
I think the behaviour is correct for a-f and x, via this route:
As for the others, I think they're wrong, but they're understandable according to what strtold accepts:
Presumably libc++ is taking some shortcut in the required steps. |
Well, not any kind of shortcut it turns out. It's an explicit choice in locale.cpp:
Everything else seems to pretty logically follow. |
You may be interested in the conversation between myself and Marshall in another bug (http://llvm.org/bugs/show_bug.cgi?id=19611). The conversation here seems to be related. |
My reading of
|
Hmm. I think it might be undefined behaviour in the standard itself (does that mean reading the pdf itself might invoke nasal demons?). That sentence assumes that "c" has been properly initialised, but it looks suspiciously like there's no check the std::find did actually succeed. |
*** Bug llvm/llvm-bugzilla-archive#22372 has been marked as a duplicate of this bug. *** |
LWG issue #2381 (http://cplusplus.github.io/LWG/lwg-active.html#2381) also reads on this. |
I have a similar problem like this where I am parsing floats from strings using istringstream. Parsing "10.0f" as a float with value 10.0 works with libstdc++ but not with libc++. Looks like stdc++ just stops reading characters after invalid characters like 'f' are reached and converts string till invalid character as float. Where as libc++ just fails here. As per standard, look like libstdc++ is right here, so wanted to know whether libc++ intentionally fails here or it is a bug? example: where str is const "std::string &" and value is "float *" |
Arun, what do you mean by "As per standard, look like libstdc++ is right here"? Duncan commented earlier that this seems to be correct per the standard in comment #9. If you disagree, please state your argument. |
Hi, I was talking about my interpretation of some reference documentation given below. As per some documentation, character reading is stopped by numeric extractor once invalid character is reached, so may be that f should be ignored. This could also vary across platforms as you are getting different results with your sample. Here is the reference: http://www.cplusplus.com/reference/locale/num_get/get/ I had faced this problem a year back when I was solving one chromium bug - https://bugs.chromium.org/p/angleproject/issues/detail?id=1046 and we finally added a workaround in our code itself. |
I just ran into this on FreeBSD (10.3). The code was like:
The result (eof) is false on GNU libstdc++, but true on LLVM libc++. The former seems correct, by my reading of LWG #2381, particularly the example:
Is that the consensus? |
I propose the following fix: |
Emphasis added to the key part of that statement: """ As Tim already mentioned above, a-f CAN be part of A valid numerical expression (eg: 0xbadf00d), so they are accumulated. libc++ seems to be following the standard correctly. libstdc++ seems to have deviated from the standard in a way that makes practical sense. Given its history, one might argue that the behavior (also as seeming expected / relied upon by multiple projects) might establish a defacto standard and may indeed be preferable to the codified standard. However, we are not here to argue about which is better, we are here to determine which is correct, and as I read it, the libc++ implementation is correct. This does, however, represent a binary compatibility regression between the two runtimes, and libstdc++'s implementation is potentially more desirable to clients. I would suggest that if we want to pursue this change, we should get clarification added to the C++ standard to indicate if that "a" should indeed be "the". |
Marshall Clow already submitted LWG #2381. The proposed example clarifies that the validation is sensitive to the context of "the" expression. |
*** Bug llvm/llvm-bugzilla-archive#36099 has been marked as a duplicate of this bug. *** |
I think you are misreading this. As noted by Duncan, the key phrase is:
But you are ignoring the words "the next". They imply context dependence. If the intent of the author of that sentence was to ignore the context of the number you are currently parsing then the words "the next" wouldn't be there. Further, your current implementation doesn't conform to your interpretation of the specification either. If it did, then you would accept any arbitrary sequence of digits, plus/minus signs and decimal points but this doesn't seem to be the case. For example, under your interpretation "21-42" should be consumed as a single run of characters that are "allowed as the next character of an input field" and then rejected as an invalid double. If libc++ did this I suspect it would break a lot of software. |
I recently found this bug report after discovering that both libstdc++ and libc++ have their own issues reading doubles/floats from a stream (fwiw, my notes and examples from that exercise are documented at https://github.com/tardate/LittleCodingKata/blob/master/cpp/DoubleTrouble/README.md). Am I correct in understanding the current state of the bug is basically stalled, pending approval of LWG #2381, or perhaps activity on C++20 that might supersede it? I'm a little concerned that LWG #2381 as it stands may not fully resolve the issues, as it appears limited to adding "p" to numeric character set and providing a clarifying example of how a hex float should be accumulated. Specifically, it doesn't address (for example) failing cases such as "abc" as mentioned on Bug 19354 - merged as a duplicate - which remain subject to interpretation and potentially requiring further clarification to get right. I do agree with the intent of LWG#2381, which I think covers two main points. To paraphrase:
This is I believe "common sense" - and basically the libstdc++ approach (notwithstandanding its other issues) - but not the current libc++ implementation. But I don't think it is possible to conclude one or other is correct under the C++17 standard, because as it stands the spec is ambiguous and incomplete on these points. Looking at C++20 draft n4791, 27.4.2.1.2 does not yet seem to reflect LWG #2381 - it's basically the same as C++17, and still has the ambiguities and gaps and so still arguably contradicts LWG#2381. It seems that without clarity on the spec front, I doubt if this bug can be resolved - is that the general concensus? Is there hope for successfully addressing the specification issue at least in C++20? NB: if my reading of LWG#2381 is correct per the two points above, from my (very inexpert) reading of the code it appears that the libc++ implementation could be "fixed" with minor surgery:
|
mentioned in issue llvm/llvm-bugzilla-archive#19354 |
mentioned in issue llvm/llvm-bugzilla-archive#22372 |
mentioned in issue llvm/llvm-bugzilla-archive#36099 |
mentioned in issue llvm/llvm-bugzilla-archive#46948 |
because its semantics differ between libstdc++ and libc++ llvm/llvm-project#18156 (fixes #240)
Behavior of parsing doubles from streams is different between GCC and Clang [1]. [1] llvm/llvm-project#18156
Is this bug fixed now? Using LLVM 19.1.0, we see different behavior in this code below, compared to when running with LLVM 18. The code references this bug ticket, so I decided to follow up. https://github.com/danmar/cppcheck/blob/main/lib/mathlib.cpp#L498 In particular, these tests no longer fail |
Extended Description
When extracting doubles from stringstream the character following the double affects if extraction works or not (see attached code example). If the double is followed by a space or underscore extraction works, if a letter follows extraction fails. It seems to be related to the extraction of the double performed by num_get::do_get().
If I read the standard correctly (ISO/IEC 14882:2003(E), 22.2.2.1.2 num_get virtual functions [lib.facet.num.get.virtuals]; sorry no 2011 at hand) this should work independently of the character following the double.
Tested with:
$ c++ --version
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.0
Thread model: posix
Steps to reproduce:
Actual result:
'-4.9' converted to -4.9
'-4.9 X' converted to -4.9
'-4.9_' converted to -4.9
'-4.9d' failed to convert to double
'-4.9X' failed to convert to double
Expected result: (produced with c++ --stdlib=libstdc++ streamtest.cxx)
'-4.9' converted to -4.9
'-4.9 X' converted to -4.9
'-4.9_' converted to -4.9
'-4.9d' converted to -4.9
'-4.9X' converted to -4.9
Build Date & Platform: 2013-11-01 on Mac OS 10.9
The text was updated successfully, but these errors were encountered: