Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add custom time format implementations #5123

Merged

Conversation

straight-shoota
Copy link
Member

@straight-shoota straight-shoota commented Oct 15, 2017

The format patterns from Time::Format allow to define simple date formats but they are quite limited in flexibility. Many standardized formats include slight variations which must be taken into account for parsing such formats in a standards-compliant way.

This PR adds custom format implementations for some well-used time formats.
For a few it adds adds short-cut methods to Time class (like Time.parse_iso8601).

  • Time::Format::ISO_8601_DATETIME: Parser for standard time format with many variations. It should support most of the standard format including week dates, ordinal dates and optional separators. Uses RFC 3339 as formatter.
  • Time::Format::RFC_3339: More strict variation of ISO 8601, recommended for exchanging time data in computer systems.
  • Time::Format::RFC_2822: Format compatible to RFC 822 and RFC 1123 used for example in HTTP protocols.
  • Time::Format::HTTP_DATE: Proper implementation of HTTP.parse_date supporting time formats expressed as RFC 1123/RFC 2822, RFC 850 or asctime which are used by many HTTP protocols. Uses RFC_2822 as formatter.
  • Time::Format::YAML_DATE: Refactored implementation of parser for YAML datetime type.

Surpasses #5084, #4729

/cc @asterite @RX14

when Time::Kind::Local
time_zone_rfc2822
else
raise "invalid timezone"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize exception message, please.

@straight-shoota straight-shoota force-pushed the jm-time-formats-simplified branch 2 times, most recently from ef5042d to 216efdb Compare October 15, 2017 11:00
@straight-shoota straight-shoota force-pushed the jm-time-formats-simplified branch 4 times, most recently from 92e737f to be405fb Compare December 15, 2017 14:58
@straight-shoota
Copy link
Member Author

I've rebased this PR and refactored a bit. It would be great to get some feedback.

@straight-shoota straight-shoota force-pushed the jm-time-formats-simplified branch from be405fb to a98da2a Compare December 15, 2017 16:34
@straight-shoota straight-shoota force-pushed the jm-time-formats-simplified branch from a98da2a to 575f4eb Compare March 7, 2018 22:48
@straight-shoota
Copy link
Member Author

I've rebased this PR to incorporate the Time::Location API added in #5324.

Requires #5786

@straight-shoota straight-shoota changed the title Add custom time format implementations (simplified) Add custom time format implementations Mar 7, 2018
@straight-shoota straight-shoota force-pushed the jm-time-formats-simplified branch 3 times, most recently from 18fde5a to 1e1b76d Compare March 13, 2018 20:48
Copy link
Member

@bcardiff bcardiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@straight-shoota

  1. may I ask to rebase this PR on master?

  2. if Ensure time parser raises if timezone is missing #5382 is merged we are mostly ok since we will have proper timezone parsing exception but adds a quick-hack, and it would be more proper to include this PR instead of Ensure time parser raises if timezone is missing #5382. Am I right?

  3. Is there anything pending that you have in mind as a follow up for this PR?

These adds some breaking-changes so having a rebase would be helpful to check the impact.

it "without time zone" do
time = Time.utc(1994, 11, 6, 8, 49, 37, nanosecond: 0)
HTTP.rfc1123_date(time).should eq("Sun, 06 Nov 1994 08:49:37 GMT")
HTTP.format_time(time).should eq("Sun, 6 Nov 1994 08:49:37 GMT")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preferred format is with two day digits as of https://tools.ietf.org/html/rfc7231#section-7.1.1.1 and https://tools.ietf.org/html/rfc2616#page-21

Somehow I would prefer rfc/iso in the function names for the format.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name was changed because the methods doesn't implement one exact standard format but generally format and parse time instances for use in a HTTP protocol context (see previous comments in #4729 (comment)).

I'll look into the preferred digit format.

end

# DEPRECATED: Use `HTTP.format_time` instead.
def self.rfc1123_time(time : Time) : String
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is left and marked as deprecated and rfc1123_date dropped completely? I think is either both or none.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it needs to be removed.

@@ -124,7 +124,7 @@ end

struct Time
def to_json(json : JSON::Builder)
json.string(Time::Format::ISO_8601_DATE_TIME.format(self))
json.string(Time::Format::RFC_3339.format(self))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change if ISO_8601 is still defined?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is mostly negligible, but RFC 3339 is generally the better (stricter) format for this purpose. ISO_8601_DATE_TIME by default formats a offset as HHmm while RFC 3339 requires a colon: HH:mm

The JSON specification doesn't include a Date data type, so it's up to the implementation. JavaScript's Date.toJSON() actually is RFC 3339 compliant (YYYY-MM-DDTHH:mm:ss.sssZ).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The references I found said ISO8601 is the format used in Javascript. I think we should stick to 8601 here. Actually been less stricter is better IMO since an API built on top of this would be less pedantic to the user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, this is a bit confusing because RFC 3339 actually is ISO 8601. But RFC 3339 is better suited as machine-parsed exchange format. That's why I'm pretty convinced we should use RFC 3339 as output format for #to_json.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left RFC_3339 here but it returns the exact same result as ISO_8601_DATE_TIME.

@straight-shoota
Copy link
Member Author

Thanks for taking a look at this!

  1. Done.
  2. Exactly. Ensure time parser raises if timezone is missing #5382 was supposed to be a quick bug fix, but it's been stalled for so long it would probably be better to just forget it and merge this one directly.
  3. I don't think I had any further plans. Maybe I forgot... it's been a long time.

@@ -169,7 +169,7 @@ describe "JSON serialization" do
end

it "deserializes Time" do
Time.from_json(%("2016-11-16T09:55:48-0300")).to_utc.should eq(Time.utc(2016, 11, 16, 12, 55, 48))
Time.from_json(%("2016-11-16T09:55:48-03:00")).to_utc.should eq(Time.utc(2016, 11, 16, 12, 55, 48))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't break this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to #5123 (comment)
I agree. The JSON format should be ISO8601 by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output should be the RFC 3339 flavour of ISO 8601, the parser should accept more variants. That's probably the best solution.

@bcardiff
Copy link
Member

bcardiff commented Jun 6, 2018

@straight-shoota

  1. Any further arguments on Add custom time format implementations #5123 (comment) ? Why emitting in RFC3339 is better since the default in JavaScript is ISO8601? Luckily in utc they both use Z suffix. Although the added : is still ISO8601 compatible, we should be fine. @RX14 do you agree?

  2. I noticed that now nanoseconds will be rendered in to_json while before only seconds were used. This was also aligned with go behaviour that includes a separata formatter for just seconds and nano seconds. Could you change the formatter to emit only seconds as before? I don't have a strong opinion to add a 3339Nano formatter but the extra precision might be problematic in some system where just milliseconds are supported.

After 2 is addressed is a 👍 from me (and a big 🙇 for all the time re work)

@straight-shoota
Copy link
Member Author

  1. There is no difference. ISO_8601_DATE_TIME would also use a colon in the offset, and that's how it should be. JavaScript's Date.toJSON() serializes in UTC only, by the way.
  2. Good catch! We need more specs for these things 😆 But: should the default format entirely omit sub-second precision or round to milliseconds? (that's what Date.toJSON() does). Though it probably won't be much of a difference, applications that understand milliseconds should probably be smart enough to digest a few extra digits.
    There is no specification for any of this anyway, so we might as well use the best prevision available. If that doesn't work for some use case, it's easy to employ a custom formatter with a different format.

@bcardiff
Copy link
Member

bcardiff commented Jun 6, 2018

  1. Ok
  2. I would prefer the default to be the option that will broadly work. So seconds without fraction. If needed later a 3339Nano formatter can be added. If a spec to ensure sub-second are drop is added definitely will be a plus

If you prefer me to made the changes in 2 let me know.

@straight-shoota
Copy link
Member Author

Done. To accomplish this, I added a fraction_digits parameter to RFC_3339.format which I think would be generally useful and could be expanded. But that's probably something for a subsequent PR because this needs proper thought.

@straight-shoota straight-shoota force-pushed the jm-time-formats-simplified branch from e569afc to cb2e4fa Compare June 6, 2018 20:07
Copy link
Contributor

@RX14 RX14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that there are changes to time/format/formatter.cr and time/format/parser.cr, and I don't like that there are new methods just moneypatched in to the existing parser and formatter, instead of being inherited or stuff being put in a module.

# * **%3N**, **%L**: milliseconds, zero padded (000, 001, ..., 999)
# * **%6N**: microseconds, zero padded (000000, 000001, ..., 999999)
# * **%9N**: nanoseconds, zero padded (000000000, 000000001, ..., 999999999)
# * **%N**: second fraction, zero padded. (Same as `%9N` but may consume more than 9 digits while parsing)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this doc change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It helps discoverability if the directives are lexically ordered. It's unrelated, though. If you like, I can make a separate PR for this.

@@ -23,6 +23,10 @@ struct Time::Format
io << time.year / 100
end

def full_or_short_year
year
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why an alias?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser has a different implementation.

@@ -79,6 +83,12 @@ struct Time::Format
io << get_short_day_name.upcase
end

def short_day_name_with_comma?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a duplicate definition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where's the duplicate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@straight-shoota
Copy link
Member Author

straight-shoota commented Jun 7, 2018

@RX14 You mean more like #5084? That design has been abandoned for good, I hope. This is the best I can think of and I don't see any problems with monkey patching a few specific methods nor any benefits in extracting them to modules.
We could put the specialized methods directly into the formatter.cr/parser.cr types to avoid monkey patching, but I think having these implementations in separate files is better code organization. All pattern methods defined in a custom files are only used within that file and namespaces with the format name.

Copy link
Contributor

@RX14 RX14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't get why time/format/formatter.cr and time/format/parser.cr are changed. If these still have the same functionality, why do they suddenly have extra unused methods? If you're going tor refactor core time formatter stuff, it should at the very least be in a seperate commit.

# * **%m**: month number, zero padded (01, 02, ..., 12)
# * **%_m**: month number, blank padded (" 1", " 2", ..., "12")
# * **%-m**: month number (1, 2, ..., 12)
# * **%M**: minute, zero padded (00, 01, 02, ..., 59)
# * **%3N**, **%L**: milliseconds, zero padded (000, 001, ..., 999)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate definition of L

@straight-shoota
Copy link
Member Author

They provide additional methods that are used by several specific formats. So there is added functionality for the core formatter/parser feature. Refactoring is actually in 0c242b0.

@RX14
Copy link
Contributor

RX14 commented Jun 9, 2018

Needs a rebase though

@bcardiff bcardiff force-pushed the jm-time-formats-simplified branch from cb2e4fa to fee9ec4 Compare June 9, 2018 17:00
@bcardiff
Copy link
Member

bcardiff commented Jun 9, 2018

@straight-shoota I rebase the changes already.

@bcardiff bcardiff merged commit 36647c8 into crystal-lang:master Jun 10, 2018
@bcardiff bcardiff added this to the 0.25.0 milestone Jun 10, 2018
@straight-shoota straight-shoota deleted the jm-time-formats-simplified branch June 10, 2018 11:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants