-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
overly relaxed whitespace parsing leads to unexpected parsing success (was Local.datetime_from_str errantly ignoring format whitespace) #660
Comments
In case anyone needs a hack workaround. (full code with tests) /// workaround for chrono Issue #660 https://github.com/chronotope/chrono/issues/660
/// match spaces at beginning and ending of inputs
/// TODO: handle all Unicode whitespace.
/// This fn is essentially counteracting an errant call to `std::string:trim`
/// within `Local.datetime_from_str`.
/// `trim` removes "Unicode Derived Core Property White_Space".
/// This implementation handles three whitespace chars. There are twenty-five whitespace
/// chars according to
/// https://en.wikipedia.org/wiki/Unicode_character_property#Whitespace .
pub fn datetime_from_str_workaround_Issue660(value: &str, pattern: &str) -> bool {
let spaces = " ";
let tabs = "\t";
let lineends = "\n\r";
// match whitespace forwards from beginning
let mut v_sc: u32 = 0; // `value` spaces count
let mut v_tc: u32 = 0; // `value` tabs count
let mut v_ec: u32 = 0; // `value` line ends count
let mut v_brk: bool = false;
for v_ in value.chars() {
if spaces.contains(v_) {
v_sc += 1;
} else if tabs.contains(v_) {
v_tc += 1;
} else if lineends.contains(v_) {
v_ec += 1;
} else {
v_brk = true;
break;
}
}
let mut p_sc: u32 = 0; // `pattern` space count
let mut p_tc: u32 = 0; // `pattern` tab count
let mut p_ec: u32 = 0; // `pattern` line ends count
let mut p_brk: bool = false;
for p_ in pattern.chars() {
if spaces.contains(p_) {
p_sc += 1;
} else if tabs.contains(p_) {
p_tc += 1;
} else if lineends.contains(p_) {
p_ec += 1;
} else {
p_brk = true;
break;
}
}
if v_sc != p_sc || v_tc != p_tc || v_ec != p_ec {
return false;
}
// match whitespace backwards from ending
v_sc = 0;
v_tc = 0;
v_ec = 0;
if v_brk {
for v_ in value.chars().rev() {
if spaces.contains(v_) {
v_sc += 1;
} else if tabs.contains(v_) {
v_tc += 1;
} else if lineends.contains(v_) {
v_ec += 1;
} else {
break;
}
}
}
p_sc = 0;
p_tc = 0;
p_ec = 0;
if p_brk {
for p_ in pattern.chars().rev() {
if spaces.contains(p_) {
p_sc += 1;
} else if tabs.contains(p_) {
p_tc += 1;
} else if lineends.contains(p_) {
p_ec += 1;
} else {
break;
}
}
}
if v_sc != p_sc || v_tc != p_tc || v_ec != p_ec {
return false;
}
return true;
}
pub fn main() {
for (dts, pattern) in &[
// should not match
("2000-01-01 02:03:01", "%Y-%m-%d %H:%M:%S "),
("2000-01-01 02:03:02", " %Y-%m-%d %H:%M:%S "),
("2000-01-01 02:03:03\n", "%Y-%m-%d %H:%M:%S "),
// should match
("2000-01-01 02:03:04 ", "%Y-%m-%d %H:%M:%S "),
(" 2000-01-01 02:03:05", " %Y-%m-%d %H:%M:%S"),
] {
let dt = match Local.datetime_from_str(dts, pattern) {
Ok(val) => {
// HACK: workaround chrono Issue #660 by checking for matching begin, end of `dts`
// and `pattern`
if !datetime_from_str_workaround_Issue660(dts, pattern) {
None
} else {
Some(val)
}
}
Err(err) => {None}
};
println!("{:?}", dt)
}
} console prints
|
I think I agree that we should fix this. Would you be able to submit a PR to fix this, along with some test cases similar to the ones you already have here? That would be great! |
`format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt` Issue chronotope#660
@djc I posted a proof-of-concept PR at #786 This does fix this Issue #660. However, while working on this Issue, I noticed the tendency to trim arbitrary amounts of whitespace happens in many places:
(I have only glanced at each statement listed). It seems like the "right solution" is to change this chrono behavior "silently consume arbitrary whitespace" to "be exacting about whitespace characters". Do you have any thoughts on this? My own preference is that whitespace and character literals should matter when doing "matching" type things. Silently accepting arbitrary noise is imprecise and unexpected. My concern is how much user code could break. (again, I haven't taken the time to deeply understand all chrono whitespace handling in the git grep listing, only skimmed through some of it). |
`format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt` Issue chronotope#660
`format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt` Issue chronotope#660
Well, some uses of trimming are legit (that is, legal in terms of the grammar for the relevant format). But yes, I do share your worry that we might break too many users. Maybe this is something we should only change in the 0.5 release. @esheppa what do you think? |
I think waiting for One thing that I'd like to do is improve the error messages from the parser, which would be ideal to combine with any increasing strictness, to ensure that if/when we do break user code, at least fixing it is easier. (For example the parser could allocate a |
Also for |
Be exact about whitespace in parsing. This drastically changes pattern matching in `format::parse::parse`. Be more exacting about colons and whitespace around timezones. Issue chronotope#660
Be exact about whitespace in parsing. This drastically changes pattern matching in `format::parse::parse`. Be more exacting about colons and whitespace around timezones. Issue chronotope#660
Be exact about whitespace in parsing. This drastically changes pattern matching in `format::parse::parse`. Be more exacting about colons and whitespace around timezones. Issue chronotope#660
Be exact about whitespace in parsing. This drastically changes pattern matching in `format::parse::parse`. Be more exacting about colons and whitespace around timezones. Issue chronotope#660
Be exact about whitespace in parsing. This drastically changes pattern matching in `format::parse::parse`. `format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt` Be more exacting about colons and whitespace around timezones. Issue chronotope#660
Be exact about whitespace in parsing. This drastically changes pattern matching in `format::parse::parse`. `format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt` Be more exacting about colons and whitespace around timezones. Issue chronotope#660
Be exact about whitespace in parsing. This drastically changes pattern matching in `format::parse::parse`. `format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt` Be more exacting about colons and whitespace around timezones. Issue chronotope#660
Be exact about whitespace in parsing. This drastically changes pattern matching in `format::parse::parse` as it does not allow arbitrary whitespace before, after, or between the datetime. `format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt` Also be more exacting about colons and whitespace around timezones. Instead of unlimited colons and whitespace, only match a more limited possible set of leading colons and whitespace. Issue chronotope#660
Be exact about whitespace in parsing. This changes pattern matching in `format::parse::parse` as it does not allow arbitrary whitespace before, after, or between the datetime specifiers. `format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt`. Also be more exacting about colons and whitespace around timezones. Instead of unlimited colons and whitespace, only match a more limited possible set of leading colons and whitespace. Issue chronotope#660
Be exact about whitespace in parsing. This changes pattern matching in `format::parse::parse` as it does not allow arbitrary whitespace before, after, or between the datetime specifiers. `format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt`. Also be more exacting about colons and whitespace around timezones. Instead of unlimited colons and whitespace, only match a more limited possible set of leading colons and whitespace. Issue chronotope#660
Be exact about whitespace in parsing. This changes pattern matching in `format::parse::parse` as it does not allow arbitrary whitespace before, after, or between the datetime specifiers. `format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt`. Also be more exacting about colons and whitespace around timezones. Instead of unlimited colons and whitespace, only match a more limited possible set of leading colons and whitespace. Issue chronotope#660
Be exact about whitespace in parsing. This changes pattern matching in `format::parse::parse` as it does not allow arbitrary whitespace before, after, or between the datetime specifiers. `format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt`. Also be more exacting about colons and whitespace around timezones. Instead of unlimited colons and whitespace, only match a more limited possible set of leading colons and whitespace. Issue chronotope#660
Be exact about whitespace in parsing. This changes pattern matching in `format::parse::parse` as it does not allow arbitrary whitespace before, after, or between the datetime specifiers. `format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt`. Also be more exacting about colons and whitespace around timezones. Instead of unlimited colons and whitespace, only match a more limited possible set of leading colons and whitespace. Issue chronotope#660
Be exact about whitespace in parsing. This changes pattern matching in `format::parse::parse` as it does not allow arbitrary whitespace before, after, or between the datetime specifiers. `format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt`. Also be more exacting about colons and whitespace around timezones. Instead of unlimited colons and whitespace, only match a more limited possible set of leading colons and whitespace. Issue chronotope#660
Be exact about whitespace in parsing. This changes pattern matching in `format::parse::parse` as it does not allow arbitrary whitespace before, after, or between the datetime specifiers. `format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt`. Also be more exacting about colons and whitespace around timezones. Instead of unlimited colons and whitespace, only match a more limited possible set of leading colons and whitespace. Issue chronotope#660
Be exact about whitespace in parsing. This changes pattern matching in `format::parse::parse` as it does not allow arbitrary whitespace before, after, or between the datetime specifiers. `format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt`. Also be more exacting about colons and whitespace around timezones. Instead of unlimited colons and whitespace, only match a more limited possible set of leading colons and whitespace. Issue chronotope#660
Be exact about whitespace in parsing. This changes pattern matching in `format::parse::parse` as it does not allow arbitrary whitespace before, after, or between the datetime specifiers. `format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt`. Also be more exacting about colons and whitespace around timezones. Instead of unlimited colons and whitespace, only match a more limited possible set of leading colons and whitespace. Issue chronotope#660
Be exact about whitespace in parsing. This changes pattern matching in `format::parse::parse` as it does not allow arbitrary whitespace before, after, or between the datetime specifiers. `format/parse.rs:datetime_from_str` is exact about whitespace in the passed data `s` and passed strftime format `fmt`. Also be more exacting about colons and whitespace around timezones. Instead of unlimited colons and whitespace, only match a more limited possible set of leading colons and whitespace. Issue chronotope#660
Constrain timezone middle-colon separator string from infinite intermixed whitespace and colons to possible patterns `":"`, `" "`, `" :"`, `": "`, or `" : "`. A reasonable trade-off of previous extreme flexibility for a little flexbility and concise input. Issue chronotope#660
Rename colon_or_space to maybe_colon_or_space. Refactor fn maybe_colon_or_space to be simpler. PR chronotope#807 feedback from @esheppa. Issue chronotope#660
Add more varying testing for most parsing functions. Tests emphasize whitespace, literals, timezones, and timezone delimiters (colons and whitespace). Add tests for multiple-byte characters and combining characters in and around data and parsing formats. These tests are added to aid humans verifying the next commit that changes parsing behavior. Issue chronotope#660
Add more varying testing for most parsing functions. Tests emphasize whitespace, literals, timezones, and timezone delimiters (colons and whitespace). Add tests for multiple-byte characters and combining characters in and around data and parsing formats. These tests are added to aid humans verifying the next commit that changes parsing behavior. Issue chronotope#660
Be exact about allowed whitespace around and between data and parsing formats for all parsing. Except RFC 2822 which explicitly allows arbitrary whitespace. Issue chronotope#660
Constrain timezone middle-colon separator string from infinite intermixed whitespace and colons to possible patterns `":"`, `" "`, `" :"`, `": "`, or `" : "`. A reasonable trade-off of previous extreme flexibility for a little flexbility and concise input. Issue chronotope#660
Rename colon_or_space to maybe_colon_or_space. Refactor fn maybe_colon_or_space to be simpler. PR chronotope#807 feedback from @esheppa. Issue chronotope#660
Be exact about allowed whitespace around and between data and parsing formats for all parsing. Except RFC 2822 which explicitly allows arbitrary whitespace. Issue chronotope#660
Constrain timezone middle-colon separator string from infinite intermixed whitespace and colons to possible patterns `":"`, `" "`, `" :"`, `": "`, or `" : "`. A reasonable trade-off of previous extreme flexibility for a little flexbility and concise input. Issue chronotope#660
Add more varying testing for most parsing functions. Tests emphasize whitespace, literals, timezones, and timezone delimiters (colons and whitespace). Add tests for multiple-byte characters and combining characters in and around data and parsing formats. These tests are added to aid humans verifying the next commit that changes parsing behavior. Issue #660
Be exact about allowed whitespace around and between data and parsing formats for all parsing. Except RFC 2822 which explicitly allows arbitrary whitespace. Issue #660
Constrain timezone middle-colon separator string from infinite intermixed whitespace and colons to possible patterns `":"`, `" "`, `" :"`, `": "`, or `" : "`. A reasonable trade-off of previous extreme flexibility for a little flexbility and concise input. Issue #660
Constrain timezone parsing further. Only allow optional colon char `:` between timezone hour offset and timezone minute offset. Issue chronotope#660
Constrain timezone parsing further. Only allow optional colon char `:` between timezone hour offset and timezone minute offset. Issue #660
Add more varying testing for most parsing functions. Tests emphasize whitespace, literals, timezones, and timezone delimiters (colons and whitespace). Add tests for multiple-byte characters and combining characters in and around data and parsing formats. These tests are added to aid humans verifying the next commit that changes parsing behavior. Issue chronotope#660
Add more varying testing for most parsing functions. Tests emphasize whitespace, literals, timezones, and timezone delimiters (colons and whitespace). Add tests for multiple-byte characters and combining characters in and around data and parsing formats. These tests are added to aid humans verifying the next commit that changes parsing behavior. Issue chronotope#660
Be exact about allowed whitespace around and between data and parsing formats for all parsing. Except RFC 2822 which explicitly allows arbitrary whitespace. Issue chronotope#660
Constrain timezone middle-colon separator string from infinite intermixed whitespace and colons to possible patterns `":"`, `" "`, `" :"`, `": "`, or `" : "`. A reasonable trade-off of previous extreme flexibility for a little flexbility and concise input. Issue chronotope#660
Constrain timezone parsing further. Only allow optional colon char `:` between timezone hour offset and timezone minute offset. Issue chronotope#660
Add more varying testing for most parsing functions. Tests emphasize whitespace, literals, timezones, and timezone delimiters (colons and whitespace). Add tests for multiple-byte characters and combining characters in and around data and parsing formats. These tests are added to aid humans verifying the next commit that changes parsing behavior. Issue #660
Be exact about allowed whitespace around and between data and parsing formats for all parsing. Except RFC 2822 which explicitly allows arbitrary whitespace. Issue #660
Constrain timezone middle-colon separator string from infinite intermixed whitespace and colons to possible patterns `":"`, `" "`, `" :"`, `": "`, or `" : "`. A reasonable trade-off of previous extreme flexibility for a little flexbility and concise input. Issue #660
Constrain timezone parsing further. Only allow optional colon char `:` between timezone hour offset and timezone minute offset. Issue #660
Add more varying testing for most parsing functions. Tests emphasize whitespace, literals, timezones, and timezone delimiters (colons and whitespace). Add tests for multiple-byte characters and combining characters in and around data and parsing formats. These tests are added to aid humans verifying the next commit that changes parsing behavior. Issue chronotope#660
Add more varying testing for most parsing functions. Tests emphasize whitespace, literals, timezones, and timezone delimiters (colons and whitespace). Add tests for multiple-byte characters and combining characters in and around data and parsing formats. These tests are added to aid humans verifying the next commit that changes parsing behavior. Issue #660
Function
chrono::Local.datetime_from_str
ignores leading or trailing whitespace in the passed variablefmt
.Given code
results in unexpected successful parsing.
I expected
The passed variable
fmt
has a surrounding whitespace that is not present in the passeds
, yetdatetime_from_str
returnsOk
. These are errant matches.Using
chrono
version0.4.19
.Similar to Issue #548
The text was updated successfully, but these errors were encountered: