Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking change in RFC 3339 parser #1239

Closed
amousset opened this issue Aug 31, 2023 · 11 comments
Closed

Breaking change in RFC 3339 parser #1239

amousset opened this issue Aug 31, 2023 · 11 comments

Comments

@amousset
Copy link

amousset commented Aug 31, 2023

e985f08 changed the behavior of the RFC3339 parsers to switch to a relaxed implementation. The modifies notably the behavior of the parser, e.g. the following code fails before and succeeds starting from 0.4.27 (which actually broke some code using it in my case).

let d = "\n2019-05-09T13:36:46+00:00";
DateTime::<FixedOffset>::parse_from_str(d, "%+").unwrap();

I'm not sure if this impact was expected but it is probably worth a warning in the release notes (it currently only says "Small fixes to the RFC 3339 parsers").

@djc
Copy link
Member

djc commented Aug 31, 2023

Can you say more about the code it broke and how bad the impact is? Generally my operating assumption is that making parsers more relaxed is usually okay, but of course this is one of those areas where semver is tricky to navigate.

@pitdicker can you propose some changes to the release notes for this?

@amousset
Copy link
Author

It broke a nom-based parser for log files, that tries to parse a datetime at the beginning of the lines, and that ended up eating an unexpected \n in a test case. It has no serious impact and is easy to workaround in our case, and I agree semver is tricky in those cases.
I'd be happy with a clear message in the release notes.

@pitdicker
Copy link
Collaborator

pitdicker commented Aug 31, 2023

@amousset In your case the problem is the "\n" at the start, right? And not in the fields within the RFC3339 format?
I had to think about the other issues we have about accepting too much whitespace when parsing, #660 and #548.

Still parsing with the %+ specifier or the RFC3339 formatting item is not strict anymore. Is using the strict parsing in DateTime::<FixedOffset>::from_rfc3339 a solution?

@amousset
Copy link
Author

Is using the strict parsing in DateTime::::from_rfc3339 a solution?

Yes, it restores the behavior we were relying on 👍

@pitdicker
Copy link
Collaborator

@pitdicker can you propose some changes to the release notes for this?

"The parser for the %+ formatting specifier and the RFC3339 formatting item is switched from a strict to a relaxed parser. This matches the existing documentation, and the parser used by DateTime::from_str. If you need to validate the input, consider using DateTime::from_rfc3339."

@djc
Copy link
Member

djc commented Aug 31, 2023

Sounds good!

@pitdicker
Copy link
Collaborator

Updated the release notes.

@amousset
Copy link
Author

Thanks!

@tyranron
Copy link

tyranron commented Sep 9, 2023

Just another relevant note here for those who may seek answers:

I have a code breakage too.

Now it started to allow 1996-12-19 14:23:43Z input without T in-between. Looking at RFC 3339, it seems that T could be relaxed on choice:

NOTE: ISO 8601 defines date and time separated by "T".
Applications using this syntax may choose, for the sake of
readability, to specify a full-date and full-time separated by
(say) a space character.

@djc
Copy link
Member

djc commented Sep 9, 2023

Right, that is actually correct per the RFC.

@pitdicker
Copy link
Collaborator

pitdicker commented Sep 9, 2023

NOTE: ISO 8601 defines date and time separated by "T". Applications using this syntax may choose, for the sake of readability, to specify a full-date and full-time separated by (say) a space character.

My understanding of this paragraph is:

  • An ISO 8601 parser will only accept "T" as a separator. We don't need that paragraph to say so, that is specified so in ISO 8601:

    The character [T] shall be used as time designator to indicate the start of the representation of the time of day component in these expressions.

    NOTE By mutual agreement of the partners in information interchange, the character [T] may be omitted in applications where there is no risk of confusing a date and time of day representation with others defined in this International Standard.

    ISO 8601 is a bit annoying because it specifies something, and then often puts in a note than you can deviated if both parties agree.

  • "Applications using this syntax" is about RFC 3339, which besides "T" as separator also allows other characters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants