num_get::do_get(in, end, double) broken if double is followed by character #18156

llvmbot · 2013-11-02T10:35:23Z


Bugzilla Link	17782
Version	unspecified
OS	All
Attachments	Test program documenting the problem.
Reporter	LLVM Bugzilla Contributor
CC	@brycelelbach,@dexonsmith,@vangyzen,@hfinkel,@jeremyhu,@mclow,@seanm,@TNorthover

Extended Description

When extracting doubles from stringstream the character following the double affects if extraction works or not (see attached code example). If the double is followed by a space or underscore extraction works, if a letter follows extraction fails. It seems to be related to the extraction of the double performed by num_get::do_get().

If I read the standard correctly (ISO/IEC 14882:2003(E), 22.2.2.1.2 num_get virtual functions [lib.facet.num.get.virtuals]; sorry no 2011 at hand) this should work independently of the character following the double.

Tested with:

$ c++ --version
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.0
Thread model: posix

Steps to reproduce:

Compile the attached test program (c++ --stdlib=libc++ streamtest.cxx)
Run the attached test program

Actual result:

'-4.9' converted to -4.9
'-4.9 X' converted to -4.9
'-4.9_' converted to -4.9
'-4.9d' failed to convert to double
'-4.9X' failed to convert to double

Expected result: (produced with c++ --stdlib=libstdc++ streamtest.cxx)

'-4.9' converted to -4.9
'-4.9 X' converted to -4.9
'-4.9_' converted to -4.9
'-4.9d' converted to -4.9
'-4.9X' converted to -4.9

Build Date & Platform: 2013-11-01 on Mac OS 10.9

llvmbot · 2013-11-02T10:35:23Z

assigned to @mclow

mclow · 2013-11-04T17:10:47Z

I did some additional testing; I tested all the characters from ' ' to '~~' (so, testing all the strings from "-4.9 " to "-4.9~~". The only ones that failed were were the ones that ended with certain alpha characters (note: lower case failed also; same as upper case)

Clearly, there's some hex stuff going on here, but I don't know what's up with 'I', 'N', 'P', and 'X'.

'-4.9A' failed to convert to double
'-4.9B' failed to convert to double
'-4.9C' failed to convert to double
'-4.9D' failed to convert to double
'-4.9E' failed to convert to double
'-4.9F' failed to convert to double
'-4.9G' converted to -4.9
'-4.9H' converted to -4.9
'-4.9I' failed to convert to double
'-4.9J' converted to -4.9
'-4.9K' converted to -4.9
'-4.9L' converted to -4.9
'-4.9M' converted to -4.9
'-4.9N' failed to convert to double
'-4.9O' converted to -4.9
'-4.9P' failed to convert to double
'-4.9Q' converted to -4.9
'-4.9R' converted to -4.9
'-4.9S' converted to -4.9
'-4.9T' converted to -4.9
'-4.9U' converted to -4.9
'-4.9V' converted to -4.9
'-4.9W' converted to -4.9
'-4.9X' failed to convert to double
'-4.9Y' converted to -4.9
'-4.9Z' converted to -4.9

llvmbot · 2013-12-28T00:55:55Z

Test case showing another condition of it happening
Displays both inputs as "'hello' 'world' 0", instead of "'hello' 'abcdefworld' 0"

mclow · 2014-04-07T17:31:35Z

*** Bug llvm/llvm-bugzilla-archive#19354 has been marked as a duplicate of this bug. ***

mclow · 2014-04-07T17:32:41Z

Another example (from 19354):

#include
#include

int main()
{
std::stringstream ss;
double f;
ss << "abc";
ss >> f;
if(ss.fail()) {
std::cout << "Failed!" << std::endl;
ss.clear();
}
std::cout << "Position: " << ss.tellg() << std::endl;
}

libstdc++ prints 0, libc++ prints 3

seanm · 2014-07-22T21:18:23Z

I have also filed rdar://17766732 as this occurs in shipping OS X versions.

TNorthover · 2014-07-24T13:06:51Z

I think the behaviour is correct for a-f and x, via this route:

operator>> skips space and sends the result on to num_get
num_get accumulates as many characters from the set "0123456789abcdefxABCDEFX+-" (+ decimal separator) as it can (Stage 2).
num_get sends the result to strtold (Stage 3).
strtold decides there's a valid floating-point constant with nonsense at the end (according to C99 spec).
This means an error for num_get (Stage 3 still).

As for the others, I think they're wrong, but they're understandable according to what strtold accepts:

'I' for INF
'N' for NAN
'P' for C99 floating-point hex constants: "0x1ff.5p3a".

Presumably libc++ is taking some shortcut in the required steps.

TNorthover · 2014-07-24T13:21:01Z

Well, not any kind of shortcut it turns out. It's an explicit choice in locale.cpp:

const char __num_get_base::__src[33] = "0123456789abcdefABCDEFxX+-pPiInN";

Everything else seems to pretty logically follow.

llvmbot · 2014-07-24T13:56:05Z

This means an error for num_get (Stage 3 still).

You may be interested in the conversation between myself and Marshall in another bug (http://llvm.org/bugs/show_bug.cgi?id=19611). The conversation here seems to be related.

dexonsmith · 2014-07-24T17:59:28Z

As for the others, I think they're wrong, but they're understandable
according to what strtold accepts:

My reading of do_get() stage 2 (22.4.2.1.2 in the C++11 n3337) is that this is correct behaviour.

If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1. If so, it is accumulated.

TNorthover · 2014-07-24T18:35:31Z

Hmm. I think it might be undefined behaviour in the standard itself (does that mean reading the pdf itself might invoke nasal demons?).

That sentence assumes that "c" has been properly initialised, but it looks suspiciously like there's no check the std::find did actually succeed.

mclow · 2015-01-28T23:32:06Z

*** Bug llvm/llvm-bugzilla-archive#22372 has been marked as a duplicate of this bug. ***

mclow · 2015-01-28T23:37:14Z

LWG issue #2381 (http://cplusplus.github.io/LWG/lwg-active.html#2381) also reads on this.

llvmbot · 2015-06-17T11:34:12Z

I have a similar problem like this where I am parsing floats from strings using istringstream.

Parsing "10.0f" as a float with value 10.0 works with libstdc++ but not with libc++.

Looks like stdc++ just stops reading characters after invalid characters like 'f' are reached and converts string till invalid character as float. Where as libc++ just fails here.

As per standard, look like libstdc++ is right here, so wanted to know whether libc++ intentionally fails here or it is a bug?

example:
std::istringstream stream(str);
stream >> (*value);

where str is const "std::string &" and value is "float *"

jeremyhu · 2016-06-04T22:50:58Z

Arun, what do you mean by "As per standard, look like libstdc++ is right here"?

Duncan commented earlier that this seems to be correct per the standard in comment #9. If you disagree, please state your argument.

llvmbot · 2016-06-07T10:08:14Z

Hi, I was talking about my interpretation of some reference documentation given below.

As per some documentation, character reading is stopped by numeric extractor once invalid character is reached, so may be that f should be ignored. This could also vary across platforms as you are getting different results with your sample.

Here is the reference:
http://www.cplusplus.com/reference/istream/istream/operator%3E%3E/
<-"...Then (if good), it calls num_get::get (using the stream's selected locale) to perform both the extraction and the parsing operations..."

http://www.cplusplus.com/reference/locale/num_get/get/
<- "...The function stops reading characters from the sequence as soon as one character cannot be part of a valid numerical expression (or end is reached)..."

I had faced this problem a year back when I was solving one chromium bug - https://bugs.chromium.org/p/angleproject/issues/detail?id=1046 and we finally added a workaround in our code itself.

vangyzen · 2016-11-10T21:44:44Z

I just ran into this on FreeBSD (10.3).

The code was like:

std::istringstream ss("123abc");
double retVal;
ss >> retVal;
bool result = ss.eof();

The result (eof) is false on GNU libstdc++, but true on LLVM libc++. The former seems correct, by my reading of LWG #2381, particularly the example:

http://cplusplus.github.io/LWG/lwg-active.html#2381

Is that the consensus?

vangyzen · 2016-11-21T20:01:58Z

I propose the following fix:

https://reviews.llvm.org/D26920

jeremyhu · 2017-02-10T20:59:33Z

http://www.cplusplus.com/reference/locale/num_get/get/
<- "...The function stops reading characters from the sequence as soon as
one character cannot be part of a valid numerical expression (or end is
reached)..."

Emphasis added to the key part of that statement:

"""
... The function stops reading characters from the sequence as soon as one character cannot be part of A valid numerical expression (or end is reached)...
"""

As Tim already mentioned above, a-f CAN be part of A valid numerical expression (eg: 0xbadf00d), so they are accumulated.

libc++ seems to be following the standard correctly. libstdc++ seems to have deviated from the standard in a way that makes practical sense. Given its history, one might argue that the behavior (also as seeming expected / relied upon by multiple projects) might establish a defacto standard and may indeed be preferable to the codified standard. However, we are not here to argue about which is better, we are here to determine which is correct, and as I read it, the libc++ implementation is correct.

This does, however, represent a binary compatibility regression between the two runtimes, and libstdc++'s implementation is potentially more desirable to clients. I would suggest that if we want to pursue this change, we should get clarification added to the C++ standard to indicate if that "a" should indeed be "the".

vangyzen · 2017-02-10T21:39:43Z

I would suggest that if we want to pursue this change, we
should get clarification added to the C++ standard to indicate if that "a"
should indeed be "the".

Marshall Clow already submitted LWG #2381. The proposed example clarifies that the validation is sensitive to the context of "the" expression.

mclow · 2018-01-26T05:02:46Z

*** Bug llvm/llvm-bugzilla-archive#36099 has been marked as a duplicate of this bug. ***

llvmbot · 2018-08-05T17:42:30Z

I think you are misreading this. As noted by Duncan, the key phrase is:

If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1. If so, it is accumulated.

But you are ignoring the words "the next". They imply context dependence. If the intent of the author of that sentence was to ignore the context of the number you are currently parsing then the words "the next" wouldn't be there.

Further, your current implementation doesn't conform to your interpretation of the specification either. If it did, then you would accept any arbitrary sequence of digits, plus/minus signs and decimal points but this doesn't seem to be the case. For example, under your interpretation "21-42" should be consumed as a single run of characters that are "allowed as the next character of an input field" and then rejected as an invalid double. If libc++ did this I suspect it would break a lot of software.

llvmbot · 2019-01-07T15:25:13Z

I recently found this bug report after discovering that both libstdc++ and libc++ have their own issues reading doubles/floats from a stream (fwiw, my notes and examples from that exercise are documented at https://github.com/tardate/LittleCodingKata/blob/master/cpp/DoubleTrouble/README.md).

Am I correct in understanding the current state of the bug is basically stalled, pending approval of LWG #2381, or perhaps activity on C++20 that might supersede it?

I'm a little concerned that LWG #2381 as it stands may not fully resolve the issues, as it appears limited to adding "p" to numeric character set and providing a clarifying example of how a hex float should be accumulated. Specifically, it doesn't address (for example) failing cases such as "abc" as mentioned on Bug 19354 - merged as a duplicate - which remain subject to interpretation and potentially requiring further clarification to get right.

I do agree with the intent of LWG#2381, which I think covers two main points. To paraphrase:

Conversion should follow the same rules as strtold, meaning that it terminates at the first character that is not "... allowed as the next character..." (even it is from the set of valid numeric characters)
All characters not used in a valid conversion are left on the stream (i.e. no magic discards per current libc++ implementation)

This is I believe "common sense" - and basically the libstdc++ approach (notwithstandanding its other issues) - but not the current libc++ implementation. But I don't think it is possible to conclude one or other is correct under the C++17 standard, because as it stands the spec is ambiguous and incomplete on these points.

Looking at C++20 draft n4791, 27.4.2.1.2 does not yet seem to reflect LWG #2381 - it's basically the same as C++17, and still has the ambiguities and gaps and so still arguably contradicts LWG#2381.

It seems that without clarity on the spec front, I doubt if this bug can be resolved - is that the general concensus? Is there hope for successfully addressing the specification issue at least in C++20?

NB: if my reading of LWG#2381 is correct per the two points above, from my (very inexpert) reading of the code it appears that the libc++ implementation could be "fixed" with minor surgery:

if __num_get_float fails, then __do_get_floating_point should rollback the stream pointer to unwind all the characters extracted in stage 2
if __num_get_float can perform a partial conversion (i.e. not all of the characters extracted in stage 2):
- treat this as success, return the extracted value
- and pass back the position in the stream that it was converted up to (__p2) - so __do_get_floating_point can adjust the stream position accordingly.

mclow · 2021-11-26T19:06:16Z

mentioned in issue llvm/llvm-bugzilla-archive#19354

mclow · 2021-11-26T20:04:18Z

mentioned in issue llvm/llvm-bugzilla-archive#22372

mclow · 2021-11-27T00:14:34Z

mentioned in issue llvm/llvm-bugzilla-archive#36099

llvmbot · 2021-11-27T03:28:01Z

mentioned in issue llvm/llvm-bugzilla-archive#46948

because its semantics differ between libstdc++ and libc++ llvm/llvm-project#18156 (fixes #240)

Behavior of parsing doubles from streams is different between GCC and Clang [1]. [1] llvm/llvm-project#18156

heshpdx · 2024-10-09T23:29:04Z

Is this bug fixed now? Using LLVM 19.1.0, we see different behavior in this code below, compared to when running with LLVM 18. The code references this bug ticket, so I decided to follow up.

https://github.com/danmar/cppcheck/blob/main/lib/mathlib.cpp#L498

In particular, these tests no longer fail
https://github.com/danmar/cppcheck/blob/main/test/testmathlib.cpp#L660
@danmar

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021

llvmbot added the confirmed Verified by a second party label Jan 26, 2022

kiryph mentioned this issue Apr 27, 2023

macOS: XML error: error in path data: number expected (premature end of data) mgieseki/dvisvgm#240

Closed

mgieseki added a commit to mgieseki/dvisvgm that referenced this issue Apr 30, 2023

don't use operator >> to parse floating point numbers

f7ce2d7

because its semantics differ between libstdc++ and libc++ llvm/llvm-project#18156 (fixes #240)

philnik777 added the locale issues related to localization label Jul 15, 2023

al42and mentioned this issue Feb 13, 2024

Strange behavior of the postfix operator on macOS beltoforion/muparser#123

Closed

andre-schulz added a commit to andre-schulz/mve that referenced this issue May 11, 2024

Test alternative pasing of doubles

0a63b5c

Behavior of parsing doubles from streams is different between GCC and Clang [1]. [1] llvm/llvm-project#18156

hly2019 mentioned this issue Jul 30, 2024

stringstream acts incorrectly with double type value when some characters are followed emscripten-core/emscripten#22292

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

num_get::do_get(in, end, double) broken if double is followed by character #18156

num_get::do_get(in, end, double) broken if double is followed by character #18156

llvmbot commented Nov 2, 2013

llvmbot commented Nov 2, 2013

mclow commented Nov 4, 2013

llvmbot commented Dec 28, 2013

mclow commented Apr 7, 2014

mclow commented Apr 7, 2014

seanm commented Jul 22, 2014

TNorthover commented Jul 24, 2014

TNorthover commented Jul 24, 2014

llvmbot commented Jul 24, 2014

dexonsmith mannequin commented Jul 24, 2014

TNorthover commented Jul 24, 2014

mclow commented Jan 28, 2015

mclow commented Jan 28, 2015

llvmbot commented Jun 17, 2015

jeremyhu mannequin commented Jun 4, 2016

llvmbot commented Jun 7, 2016

vangyzen commented Nov 10, 2016

vangyzen commented Nov 21, 2016

jeremyhu mannequin commented Feb 10, 2017

vangyzen commented Feb 10, 2017

mclow commented Jan 26, 2018

llvmbot commented Aug 5, 2018

llvmbot commented Jan 7, 2019

mclow commented Nov 26, 2021

mclow commented Nov 26, 2021

mclow commented Nov 27, 2021

llvmbot commented Nov 27, 2021

heshpdx commented Oct 9, 2024 •

edited

Loading

num_get::do_get(in, end, double) broken if double is followed by character #18156

num_get::do_get(in, end, double) broken if double is followed by character #18156

Comments

llvmbot commented Nov 2, 2013

Extended Description

llvmbot commented Nov 2, 2013

mclow commented Nov 4, 2013

llvmbot commented Dec 28, 2013

mclow commented Apr 7, 2014

mclow commented Apr 7, 2014

seanm commented Jul 22, 2014

TNorthover commented Jul 24, 2014

TNorthover commented Jul 24, 2014

llvmbot commented Jul 24, 2014

dexonsmith mannequin commented Jul 24, 2014

TNorthover commented Jul 24, 2014

mclow commented Jan 28, 2015

mclow commented Jan 28, 2015

llvmbot commented Jun 17, 2015

jeremyhu mannequin commented Jun 4, 2016

llvmbot commented Jun 7, 2016

vangyzen commented Nov 10, 2016

vangyzen commented Nov 21, 2016

jeremyhu mannequin commented Feb 10, 2017

vangyzen commented Feb 10, 2017

mclow commented Jan 26, 2018

llvmbot commented Aug 5, 2018

llvmbot commented Jan 7, 2019

mclow commented Nov 26, 2021

mclow commented Nov 26, 2021

mclow commented Nov 27, 2021

llvmbot commented Nov 27, 2021

heshpdx commented Oct 9, 2024 • edited Loading

heshpdx commented Oct 9, 2024 •

edited

Loading