Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

num_get::do_get(in, end, double) broken if double is followed by character #18156

Open
llvmbot opened this issue Nov 2, 2013 · 28 comments
Open
Assignees
Labels
bugzilla Issues migrated from bugzilla confirmed Verified by a second party libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. locale issues related to localization

Comments

@llvmbot
Copy link
Member

llvmbot commented Nov 2, 2013

Bugzilla Link 17782
Version unspecified
OS All
Attachments Test program documenting the problem.
Reporter LLVM Bugzilla Contributor
CC @brycelelbach,@dexonsmith,@vangyzen,@hfinkel,@jeremyhu,@mclow,@seanm,@TNorthover

Extended Description

When extracting doubles from stringstream the character following the double affects if extraction works or not (see attached code example). If the double is followed by a space or underscore extraction works, if a letter follows extraction fails. It seems to be related to the extraction of the double performed by num_get::do_get().

If I read the standard correctly (ISO/IEC 14882:2003(E), 22.2.2.1.2 num_get virtual functions [lib.facet.num.get.virtuals]; sorry no 2011 at hand) this should work independently of the character following the double.

Tested with:

$ c++ --version
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.0
Thread model: posix

Steps to reproduce:

  1. Compile the attached test program (c++ --stdlib=libc++ streamtest.cxx)
  2. Run the attached test program

Actual result:

'-4.9' converted to -4.9
'-4.9 X' converted to -4.9
'-4.9_' converted to -4.9
'-4.9d' failed to convert to double
'-4.9X' failed to convert to double

Expected result: (produced with c++ --stdlib=libstdc++ streamtest.cxx)

'-4.9' converted to -4.9
'-4.9 X' converted to -4.9
'-4.9_' converted to -4.9
'-4.9d' converted to -4.9
'-4.9X' converted to -4.9

Build Date & Platform: 2013-11-01 on Mac OS 10.9

@llvmbot
Copy link
Member Author

llvmbot commented Nov 2, 2013

assigned to @mclow

@mclow
Copy link
Contributor

mclow commented Nov 4, 2013

I did some additional testing; I tested all the characters from ' ' to '' (so, testing all the strings from "-4.9 " to "-4.9". The only ones that failed were were the ones that ended with certain alpha characters (note: lower case failed also; same as upper case)

Clearly, there's some hex stuff going on here, but I don't know what's up with 'I', 'N', 'P', and 'X'.

'-4.9A' failed to convert to double
'-4.9B' failed to convert to double
'-4.9C' failed to convert to double
'-4.9D' failed to convert to double
'-4.9E' failed to convert to double
'-4.9F' failed to convert to double
'-4.9G' converted to -4.9
'-4.9H' converted to -4.9
'-4.9I' failed to convert to double
'-4.9J' converted to -4.9
'-4.9K' converted to -4.9
'-4.9L' converted to -4.9
'-4.9M' converted to -4.9
'-4.9N' failed to convert to double
'-4.9O' converted to -4.9
'-4.9P' failed to convert to double
'-4.9Q' converted to -4.9
'-4.9R' converted to -4.9
'-4.9S' converted to -4.9
'-4.9T' converted to -4.9
'-4.9U' converted to -4.9
'-4.9V' converted to -4.9
'-4.9W' converted to -4.9
'-4.9X' failed to convert to double
'-4.9Y' converted to -4.9
'-4.9Z' converted to -4.9

@llvmbot
Copy link
Member Author

llvmbot commented Dec 28, 2013

Test case showing another condition of it happening
Displays both inputs as "'hello' 'world' 0", instead of "'hello' 'abcdefworld' 0"

@mclow
Copy link
Contributor

mclow commented Apr 7, 2014

*** Bug llvm/llvm-bugzilla-archive#19354 has been marked as a duplicate of this bug. ***

@mclow
Copy link
Contributor

mclow commented Apr 7, 2014

Another example (from 19354):

#include
#include

int main()
{
std::stringstream ss;
double f;
ss << "abc";
ss >> f;
if(ss.fail()) {
std::cout << "Failed!" << std::endl;
ss.clear();
}
std::cout << "Position: " << ss.tellg() << std::endl;
}

libstdc++ prints 0, libc++ prints 3

@seanm
Copy link

seanm commented Jul 22, 2014

I have also filed rdar://17766732 as this occurs in shipping OS X versions.

@TNorthover
Copy link
Contributor

I think the behaviour is correct for a-f and x, via this route:

  1. operator>> skips space and sends the result on to num_get
  2. num_get accumulates as many characters from the set "0123456789abcdefxABCDEFX+-" (+ decimal separator) as it can (Stage 2).
  3. num_get sends the result to strtold (Stage 3).
  4. strtold decides there's a valid floating-point constant with nonsense at the end (according to C99 spec).
  5. This means an error for num_get (Stage 3 still).

As for the others, I think they're wrong, but they're understandable according to what strtold accepts:

  • 'I' for INF
  • 'N' for NAN
  • 'P' for C99 floating-point hex constants: "0x1ff.5p3a".

Presumably libc++ is taking some shortcut in the required steps.

@TNorthover
Copy link
Contributor

Well, not any kind of shortcut it turns out. It's an explicit choice in locale.cpp:

const char __num_get_base::__src[33] = "0123456789abcdefABCDEFxX+-pPiInN";

Everything else seems to pretty logically follow.

@llvmbot
Copy link
Member Author

llvmbot commented Jul 24, 2014

  1. This means an error for num_get (Stage 3 still).

You may be interested in the conversation between myself and Marshall in another bug (http://llvm.org/bugs/show_bug.cgi?id=19611). The conversation here seems to be related.

@dexonsmith
Copy link
Mannequin

dexonsmith mannequin commented Jul 24, 2014

As for the others, I think they're wrong, but they're understandable
according to what strtold accepts:

My reading of do_get() stage 2 (22.4.2.1.2 in the C++11 n3337) is that this is correct behaviour.

If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1. If so, it is accumulated.

@TNorthover
Copy link
Contributor

Hmm. I think it might be undefined behaviour in the standard itself (does that mean reading the pdf itself might invoke nasal demons?).

That sentence assumes that "c" has been properly initialised, but it looks suspiciously like there's no check the std::find did actually succeed.

@mclow
Copy link
Contributor

mclow commented Jan 28, 2015

*** Bug llvm/llvm-bugzilla-archive#22372 has been marked as a duplicate of this bug. ***

@mclow
Copy link
Contributor

mclow commented Jan 28, 2015

LWG issue #​2381 (http://cplusplus.github.io/LWG/lwg-active.html#2381) also reads on this.

@llvmbot
Copy link
Member Author

llvmbot commented Jun 17, 2015

I have a similar problem like this where I am parsing floats from strings using istringstream.

Parsing "10.0f" as a float with value 10.0 works with libstdc++ but not with libc++.

Looks like stdc++ just stops reading characters after invalid characters like 'f' are reached and converts string till invalid character as float. Where as libc++ just fails here.

As per standard, look like libstdc++ is right here, so wanted to know whether libc++ intentionally fails here or it is a bug?

example:
std::istringstream stream(str);
stream >> (*value);

where str is const "std::string &" and value is "float *"

@jeremyhu
Copy link
Mannequin

jeremyhu mannequin commented Jun 4, 2016

Arun, what do you mean by "As per standard, look like libstdc++ is right here"?

Duncan commented earlier that this seems to be correct per the standard in comment #​9. If you disagree, please state your argument.

@llvmbot
Copy link
Member Author

llvmbot commented Jun 7, 2016

Hi, I was talking about my interpretation of some reference documentation given below.

As per some documentation, character reading is stopped by numeric extractor once invalid character is reached, so may be that f should be ignored. This could also vary across platforms as you are getting different results with your sample.

Here is the reference:
http://www.cplusplus.com/reference/istream/istream/operator%3E%3E/
<-"...Then (if good), it calls num_get::get (using the stream's selected locale) to perform both the extraction and the parsing operations..."

http://www.cplusplus.com/reference/locale/num_get/get/
<- "...The function stops reading characters from the sequence as soon as one character cannot be part of a valid numerical expression (or end is reached)..."

I had faced this problem a year back when I was solving one chromium bug - https://bugs.chromium.org/p/angleproject/issues/detail?id=1046 and we finally added a workaround in our code itself.

@vangyzen
Copy link

I just ran into this on FreeBSD (10.3).

The code was like:

std::istringstream ss("123abc");
double retVal;
ss >> retVal;
bool result = ss.eof();

The result (eof) is false on GNU libstdc++, but true on LLVM libc++. The former seems correct, by my reading of LWG #​2381, particularly the example:

http://cplusplus.github.io/LWG/lwg-active.html#2381

Is that the consensus?

@vangyzen
Copy link

I propose the following fix:

https://reviews.llvm.org/D26920

@jeremyhu
Copy link
Mannequin

jeremyhu mannequin commented Feb 10, 2017

http://www.cplusplus.com/reference/locale/num_get/get/
<- "...The function stops reading characters from the sequence as soon as
one character cannot be part of a valid numerical expression (or end is
reached)..."

Emphasis added to the key part of that statement:

"""
... The function stops reading characters from the sequence as soon as one character cannot be part of A valid numerical expression (or end is reached)...
"""

As Tim already mentioned above, a-f CAN be part of A valid numerical expression (eg: 0xbadf00d), so they are accumulated.

libc++ seems to be following the standard correctly. libstdc++ seems to have deviated from the standard in a way that makes practical sense. Given its history, one might argue that the behavior (also as seeming expected / relied upon by multiple projects) might establish a defacto standard and may indeed be preferable to the codified standard. However, we are not here to argue about which is better, we are here to determine which is correct, and as I read it, the libc++ implementation is correct.

This does, however, represent a binary compatibility regression between the two runtimes, and libstdc++'s implementation is potentially more desirable to clients. I would suggest that if we want to pursue this change, we should get clarification added to the C++ standard to indicate if that "a" should indeed be "the".

@vangyzen
Copy link

I would suggest that if we want to pursue this change, we
should get clarification added to the C++ standard to indicate if that "a"
should indeed be "the".

Marshall Clow already submitted LWG #​2381. The proposed example clarifies that the validation is sensitive to the context of "the" expression.

@mclow
Copy link
Contributor

mclow commented Jan 26, 2018

*** Bug llvm/llvm-bugzilla-archive#36099 has been marked as a duplicate of this bug. ***

@llvmbot
Copy link
Member Author

llvmbot commented Aug 5, 2018

I think you are misreading this. As noted by Duncan, the key phrase is:

If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1. If so, it is accumulated.

But you are ignoring the words "the next". They imply context dependence. If the intent of the author of that sentence was to ignore the context of the number you are currently parsing then the words "the next" wouldn't be there.

Further, your current implementation doesn't conform to your interpretation of the specification either. If it did, then you would accept any arbitrary sequence of digits, plus/minus signs and decimal points but this doesn't seem to be the case. For example, under your interpretation "21-42" should be consumed as a single run of characters that are "allowed as the next character of an input field" and then rejected as an invalid double. If libc++ did this I suspect it would break a lot of software.

@llvmbot
Copy link
Member Author

llvmbot commented Jan 7, 2019

I recently found this bug report after discovering that both libstdc++ and libc++ have their own issues reading doubles/floats from a stream (fwiw, my notes and examples from that exercise are documented at https://github.com/tardate/LittleCodingKata/blob/master/cpp/DoubleTrouble/README.md).

Am I correct in understanding the current state of the bug is basically stalled, pending approval of LWG #​2381, or perhaps activity on C++20 that might supersede it?

I'm a little concerned that LWG #​2381 as it stands may not fully resolve the issues, as it appears limited to adding "p" to numeric character set and providing a clarifying example of how a hex float should be accumulated. Specifically, it doesn't address (for example) failing cases such as "abc" as mentioned on Bug 19354 - merged as a duplicate - which remain subject to interpretation and potentially requiring further clarification to get right.

I do agree with the intent of LWG#2381, which I think covers two main points. To paraphrase:

  1. Conversion should follow the same rules as strtold, meaning that it terminates at the first character that is not "... allowed as the next character..." (even it is from the set of valid numeric characters)

  2. All characters not used in a valid conversion are left on the stream (i.e. no magic discards per current libc++ implementation)

This is I believe "common sense" - and basically the libstdc++ approach (notwithstandanding its other issues) - but not the current libc++ implementation. But I don't think it is possible to conclude one or other is correct under the C++17 standard, because as it stands the spec is ambiguous and incomplete on these points.

Looking at C++20 draft n4791, 27.4.2.1.2 does not yet seem to reflect LWG #​2381 - it's basically the same as C++17, and still has the ambiguities and gaps and so still arguably contradicts LWG#2381.

It seems that without clarity on the spec front, I doubt if this bug can be resolved - is that the general concensus? Is there hope for successfully addressing the specification issue at least in C++20?

NB: if my reading of LWG#2381 is correct per the two points above, from my (very inexpert) reading of the code it appears that the libc++ implementation could be "fixed" with minor surgery:

  • if __num_get_float fails, then __do_get_floating_point should rollback the stream pointer to unwind all the characters extracted in stage 2
  • if __num_get_float can perform a partial conversion (i.e. not all of the characters extracted in stage 2):
    • treat this as success, return the extracted value
    • and pass back the position in the stream that it was converted up to (__p2) - so __do_get_floating_point can adjust the stream position accordingly.

@mclow
Copy link
Contributor

mclow commented Nov 26, 2021

mentioned in issue llvm/llvm-bugzilla-archive#19354

@mclow
Copy link
Contributor

mclow commented Nov 26, 2021

mentioned in issue llvm/llvm-bugzilla-archive#22372

@mclow
Copy link
Contributor

mclow commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#36099

@llvmbot
Copy link
Member Author

llvmbot commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#46948

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021
@llvmbot llvmbot added the confirmed Verified by a second party label Jan 26, 2022
mgieseki added a commit to mgieseki/dvisvgm that referenced this issue Apr 30, 2023
because its semantics differ between libstdc++ and libc++
llvm/llvm-project#18156
(fixes #240)
@philnik777 philnik777 added the locale issues related to localization label Jul 15, 2023
andre-schulz added a commit to andre-schulz/mve that referenced this issue May 11, 2024
Behavior of parsing doubles from streams is different between GCC and
Clang [1].

[1] llvm/llvm-project#18156
@heshpdx
Copy link

heshpdx commented Oct 9, 2024

Is this bug fixed now? Using LLVM 19.1.0, we see different behavior in this code below, compared to when running with LLVM 18. The code references this bug ticket, so I decided to follow up.

https://github.com/danmar/cppcheck/blob/main/lib/mathlib.cpp#L498

In particular, these tests no longer fail
https://github.com/danmar/cppcheck/blob/main/test/testmathlib.cpp#L660
@danmar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla confirmed Verified by a second party libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. locale issues related to localization
Projects
None yet
Development

No branches or pull requests

7 participants