Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datetime_from_str is too lenient, doesn't roundtrip #548

Open
koivunej opened this issue Mar 22, 2021 · 3 comments
Open

datetime_from_str is too lenient, doesn't roundtrip #548

koivunej opened this issue Mar 22, 2021 · 3 comments
Labels

Comments

@koivunej
Copy link

koivunej commented Mar 22, 2021

For another crate cargo-fuzz found me the input "4\t3\t\u{c}\t\t3\t\u{c}\t5\t3\t60Z" which is parsed successfully with Utc.datetime_from_str(s, "%Y%m%d%H%M%S%.3fZ"). Minimal test case for version 0.4.19:

#[test]
fn fuzzed_roundtrip() {
    let fmt = "%Y%m%d%H%M%S%.3fZ";
    let input = "4\t3\t\u{c}\t\t3\t\u{c}\t5\t3\t60Z";
    let ts = Utc.datetime_from_str(input, fmt).unwrap();
    let fmted = ts.format(fmt).to_string(); // 00040303050360.000Z
    assert_eq!(input, fmted);
}

There are certainly more of these with fuzzing target:

#![no_main]
use libfuzzer_sys::fuzz_target;

use chrono::{TimeZone, Utc};

static FMT: &str = "%Y%m%d%H%M%S%.3fZ";

fuzz_target!(|data: &[u8]| {
    if let Ok(s) = std::str::from_utf8(data) {
        if let Ok(ts) = Utc.datetime_from_str(s, FMT) {
            let rt = ts.format(FMT).to_string();

            assert_eq!(s, rt);
        }
    }
});

As far as I understand by looking at the documentation, "4\t3" should not get through "%Y%m" as "%Y" should be zero padded 4 positive digits. Overall I think the format string in question should only allow input which is formatted exactly same, though I read that the %.xf might be lenient for the number of zeroes. I'll try to see if I can offer any help with this. Could be that what I am after is out of scope of chrono.

@koivunej
Copy link
Author

koivunej commented Mar 22, 2021

It looks like the \t and \u{c} are gotten rid of by str::trim_left in

s = s.trim_left();

Not sure why for just %Y the "4" is not accepted but with the larger format string it works ok. Nevermind the NotEnough comes from the to_naive_{date,time} meaning the fmt string is insufficient.

@koivunej
Copy link
Author

koivunej commented Mar 23, 2021

Looking with git blame the trim_left has been with us for 6 years so I guess it aims to give a more lenient than strict parsing, same goes with accepting 1..=4 digits for %Y instead. Still given the age of the s = s.trim_left() being the first step of Item::Numeric it does feel a bit strange as Item::Space(_) seems to exist only for this reason. I wonder if use cases where it is wanted to accept more lenient whitespace, shouldn't this Item::Space be used for that reason?

Just tested with against my local glibc (2.31-0ubuntu9.2) how does strptime react to "leading whitespace": it just skips over, and doesn't mind about trailing characters and overall is surprisingly fuzzy. I did not know this, which must be the origin for the whitespace behaviour.

For my case, I will probably match it to a regex before passing it over to Utc.datetime_from_str with some added machinery to same length string output under #[cfg(fuzzing)] or similar.

I still think the docs would probably need to changed at https://docs.rs/chrono/0.4.19/chrono/format/strftime/index.html:

  • %Y is specified as The full proleptic Gregorian year, zero-padded to 4 digits. whereas it will be parsed with or without leading zeros, and can be less than 4 digits
  • %Y also refers to a footnote %Y: Negative years are allowed in formatting but not in parsing. whereas in code sign prefixed values are accepted, and having a sign prefix makes the accepted digit count to be usize::MAX (I can't yet understand why) -- could it be that some later part in parsing denies the negative years?

Apologies if these are already collected somewhere.

@koivunej
Copy link
Author

I was looking at the failures if I were to comment the s = s.trim_left() out and found this

// `s` and `s_` may differ, but `s.parse()` and `s_.parse()` must be same

Sadly it does not make for a good property to fuzz given the negative %Y year parsing to up to unlimited digits or %.3f accepting more than 3 digits, which would be truncated using the format string "%Y%m%d%H%M%S%.3fZ" and the two parsed values wouldn't be equal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants