-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More principled handling of string<->numeric conversions. #1796
Conversation
This PR is a counter-proposal to #1785. In this one, we are not adding any API calls that take explicit locales, instead just changing the documentation on existing APIs to specify that OIIO uses "C" locale exclusively for formatting numerical values. Internally, of course we still play some games with locales to get that right. And we still fix all the bugs discovered with string_view and whatnot that were part of the other proposal. |
OIIO_CHECK_EQUAL (Strutil::stoi("+123"), 123); | ||
OIIO_CHECK_EQUAL (Strutil::stoi(" 123 "), 123); | ||
OIIO_CHECK_EQUAL (Strutil::stoi("123.45"), 123); | ||
OIIO_CHECK_EQUAL (Strutil::stoi("12345678901234567890"), std::numeric_limits<int>::max()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add some overflow cases too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is an overflow case on line 392. What did you mean?
src/libutil/strutil.cpp
Outdated
// Locale-independent quickie ASCII digit and alphanum tests, good enough | ||
// for our parsing. | ||
inline int isupper (char c) { return c >= 'A' && c <= 'Z'; } | ||
inline int islower (char c) {return c >= 'a' && c <= 'z'; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: missing space
LGTM Eventually I would love to see us roll our own strtod/strtof -- but they are harder to write, so this seems like a good start. |
This refactor comes after a belated realization that our numeric conversion routines (principally from_string<> and format) are "locale-dependent" via their use of strtod, as well as atof() scattered through the code. In particular, this means that when software using this code is running with a locale in which the decimal mark is something other than '.' (e.g., many European locales that use ',' for decimal mark), these functions will incorrectly parse text floating point numbers that use the standard period as decimal mark. For example, if the locale is "fr_FR", atof("123.45") will return 123.0, rather than the correct 123.45. After much debate, and a few implementations, we are declaring that since the whole point of OIIO is about reading and writing persistent data (image files!), that in order to be consistent, ALL of OIIO's string parsing and formatting, including that done by utility library functions, will be locale-independent by using the "C" locale (in particular, we will use the usual North American default of '.' for decimal separator). Additionally: * Added a new stof(), stoi(), stoul() to replace the clunkier template syntax of from_string<>. As explained, these are hard-coded to "C" locale, not the native/global locale. * Much more attention to string_view safety, finding a number of subtle bugs (including Strutil::parse_int, etc.) where it was definitely not safe to assume that the string_view passed was the null-terminated edge of the string, and so, for example if a string_view pointed to the characters "12" but the very next characters were "345" parse_int would return 12345 instead of the correct 12. * Strutil::format and ustring::format (and our copy of tinyformat underneath) were touched up so that the versions of format() that return strings will use the "C" locale, versions added that return strings and use an explicitly-passed locale, and the versions that append their results to existing streams continue to operate as before by honoring the existing locale conventions of the streams they are using. We may revise this later depending on what the upstream tinyformat chooses to do about the issue. * Several places where we assembled strings using std::ostringstream, I forced the stream to use classic "C" locale in cases where it was likely to have any floating-point data output. * For maketx and oiiotool, globally set the locale to classic. In these cases, we control the whole app, so this is safe to do. If we ever decide that we want versions of these functions that take explicit locales, it's not hard to add.
We can easily drop in an alternate strtod implementation if we find a suitable one. Suitable means:
|
…twareFoundation#1796) This refactor comes after a belated realization that our numeric conversion routines (principally from_string<> and format) are "locale-dependent" via their use of strtod, as well as atof() scattered through the code. In particular, this means that when software using this code is running with a locale in which the decimal mark is something other than '.' (e.g., many European locales that use ',' for decimal mark), these functions will incorrectly parse text floating point numbers that use the standard period as decimal mark. For example, if the locale is "fr_FR", atof("123.45") will return 123.0, rather than the correct 123.45. After much debate, and a few implementations, we are declaring that since the whole point of OIIO is about reading and writing persistent data (image files!), that in order to be consistent, ALL of OIIO's string parsing and formatting, including that done by utility library functions, will be locale-independent by using the "C" locale (in particular, we will use the usual North American default of '.' for decimal separator). Additionally: * Added a new stof(), stoi(), stoul() to replace the clunkier template syntax of from_string<>. As explained, these are hard-coded to "C" locale, not the native/global locale. * Much more attention to string_view safety, finding a number of subtle bugs (including Strutil::parse_int, etc.) where it was definitely not safe to assume that the string_view passed was the null-terminated edge of the string, and so, for example if a string_view pointed to the characters "12" but the very next characters were "345" parse_int would return 12345 instead of the correct 12. * Strutil::format and ustring::format (and our copy of tinyformat underneath) were touched up so that the versions of format() that return strings will use the "C" locale, versions added that return strings and use an explicitly-passed locale, and the versions that append their results to existing streams continue to operate as before by honoring the existing locale conventions of the streams they are using. We may revise this later depending on what the upstream tinyformat chooses to do about the issue. * Several places where we assembled strings using std::ostringstream, I forced the stream to use classic "C" locale in cases where it was likely to have any floating-point data output. * For maketx and oiiotool, globally set the locale to classic. In these cases, we control the whole app, so this is safe to do. If we ever decide that we want versions of these functions that take explicit locales, it's not hard to add.
Would be great if this could be backported to 1.8 and maybe 1.7 too (I'm stuck on 1.7 on Windows because of #1795). |
This refactor comes after a belated realization that our numeric
conversion routines (principally from_string<> and format) are
"locale-dependent" via their use of strtod, as well as atof() scattered
through the code.
In particular, this means that when software using this code is running
with a locale in which the decimal mark is something other than '.'
(e.g., many European locales that use ',' for decimal mark), these
functions will incorrectly parse text floating point numbers that use
the standard period as decimal mark. For example, if the locale is
"fr_FR", atof("123.45") will return 123.0, rather than the correct
123.45.
After much debate, and a few implementations, we are declaring that
since the whole point of OIIO is about reading and writing persistent
data (image files!), that in order to be consistent, ALL of OIIO's
string parsing and formatting, including that done by utility library
functions, will be locale-independent by using the "C" locale (in
particular, we will use the usual North American default of '.' for
decimal separator).
Additionally:
Added a new stof(), stoi(), stoul() to replace the clunkier template
syntax of from_string<>. As explained, these are hard-coded to "C" locale,
not the native/global locale.
Much more attention to string_view safety, finding a number of subtle
bugs (including Strutil::parse_int, etc.) where it was definitely not
safe to assume that the string_view passed was the null-terminated edge
of the string, and so, for example if a string_view pointed to the
characters "12" but the very next characters were "345" parse_int would
return 12345 instead of the correct 12.
Strutil::format and ustring::format (and our copy of tinyformat
underneath) were touched up so that the versions of format() that return
strings will use the "C" locale, versions added that return strings and
use an explicitly-passed locale, and the versions that append their
results to existing streams continue to operate as before by honoring
the existing locale conventions of the streams they are using.
We may revise this later depending on what the upstream tinyformat
chooses to do about the issue.
Several places where we assembled strings using std::ostringstream,
I forced the stream to use classic "C" locale in cases where it was
likely to have any floating-point data output.
For maketx and oiiotool, globally set the locale to classic. In these
cases, we control the whole app, so this is safe to do.
If we ever decide that we want versions of these functions that take
explicit locales, it's not hard to add.