More robust Date/Time format patterns parsing #7826

Merged: 79 commits, Sep 22, 2023
Changes from 55 commits
Commits
35ccbed
Adding structure for date time format
radeusgd Sep 12, 2023
dc0dec4
updates
radeusgd Sep 12, 2023
4aa2d62
clarifications
radeusgd Sep 13, 2023
68dfa25
improvements
radeusgd Sep 13, 2023
265c9f5
improvements 2
radeusgd Sep 13, 2023
51913c9
initial tokenizer implementation
radeusgd Sep 13, 2023
dbf5233
make it compile
radeusgd Sep 13, 2023
9ecf7ee
updates to doc
radeusgd Sep 13, 2023
c8cefa3
updates 2
radeusgd Sep 13, 2023
5f3ac78
checkpoint
radeusgd Sep 14, 2023
5ec972e
Implemented the parser
radeusgd Sep 14, 2023
f3f206e
Interpret IR as DateTimeFormatter using a builder (some stuff WIP), i…
radeusgd Sep 14, 2023
cc40663
export
radeusgd Sep 15, 2023
f6663ae
mark stuff as PRIVATE
radeusgd Sep 15, 2023
2ef283d
make_formatter does not have to be in Core
radeusgd Sep 15, 2023
2b767b8
mark make_formatter as deprecated
radeusgd Sep 15, 2023
db50444
checkpoint - adding builtin formats
radeusgd Sep 15, 2023
73a4447
checkpoint - reworked Formatter structure, got Date parsing using new…
radeusgd Sep 16, 2023
66cfa8c
fixing imports, compile errors, updating widgets WIP
radeusgd Sep 16, 2023
d78624c
checkpoint: fixes, widgets
radeusgd Sep 16, 2023
85ac640
checkpoint - widgets show up, but some weird issue with truncating me…
radeusgd Sep 18, 2023
a0737d7
Simple temporary workaround for https://github.com/enso-org/enso/issu…
radeusgd Sep 18, 2023
47c96c2
new format for Time_Of_Day
radeusgd Sep 18, 2023
aa10183
fixing imports
radeusgd Sep 18, 2023
1e0677d
make AM/PM actually textual
radeusgd Sep 18, 2023
221d9ea
add `locale` to `from_java`, add examples with locale on `from Text`
radeusgd Sep 18, 2023
31e27b0
update example after discussion: avoiding confusing `a`
radeusgd Sep 18, 2023
42233ba
New types, widgets for Data_Formatter.enso
radeusgd Sep 19, 2023
5340b39
fix: we were catching Any panic, but calling `getMessage` on it - tha…
radeusgd Sep 19, 2023
6678193
migrate underlying formatters/parsers to Date_Time_Formatter (EnsoDat…
radeusgd Sep 19, 2023
8ed22c0
update docs of parse
radeusgd Sep 19, 2023
c66a6da
update docs of format
radeusgd Sep 19, 2023
e9d6c03
update Text extensions for date parsing
radeusgd Sep 19, 2023
ce3de17
fixing tests
radeusgd Sep 19, 2023
79aa370
more fixes
radeusgd Sep 19, 2023
a2e74a4
more fixes 2
radeusgd Sep 19, 2023
7be1770
resolve Date_Time_Formatter conversion at Text Extensions definition …
radeusgd Sep 20, 2023
c917b2e
Make default format without 'T' but keep it flexible on parsing
radeusgd Sep 20, 2023
0ee8472
fixing Data_Formatter tests and some others too
radeusgd Sep 20, 2023
2a73ff6
updating Column.format
radeusgd Sep 20, 2023
c1a46a8
fix frame offset in Test lib
radeusgd Sep 20, 2023
2ff78ab
fixing Column.format
radeusgd Sep 20, 2023
13cfd3c
fix some tests
radeusgd Sep 20, 2023
b31968c
change default formatter
radeusgd Sep 20, 2023
2e0fa48
change default formatter pt. 2 (to_text)
radeusgd Sep 20, 2023
a837e1c
update test
radeusgd Sep 20, 2023
9e69c8c
cleanup
radeusgd Sep 20, 2023
5404c53
cleanup 2
radeusgd Sep 20, 2023
e9f665a
fixes - edge cases
radeusgd Sep 20, 2023
2dad866
javafmtAll
radeusgd Sep 20, 2023
a4c9fcd
remove separate conversions file - obsolete since
radeusgd Sep 20, 2023
76db70c
testing various new edge cases - parsing formats, customizing
radeusgd Sep 20, 2023
5ed449c
Conversions from polyglot symbols seem to not be allowed?
radeusgd Sep 20, 2023
91ee52a
fixes
radeusgd Sep 20, 2023
eb25515
fixes 2
radeusgd Sep 20, 2023
0945fb6
Merge branch 'develop' into wip/radeusgd/7461-safer-date-format-parser
radeusgd Sep 21, 2023
eaf8404
Remove workaround for #7824 - once #7845 got merged it is no longer n…
radeusgd Sep 21, 2023
74370c7
fix
radeusgd Sep 21, 2023
8d1c125
CR1
radeusgd Sep 21, 2023
fd65d09
CR2
radeusgd Sep 21, 2023
55aed40
a few more tests
radeusgd Sep 21, 2023
fb548a6
CR3
radeusgd Sep 21, 2023
df49e12
CR4
radeusgd Sep 21, 2023
ab74d6a
CR5
radeusgd Sep 21, 2023
e837fe2
CR6
radeusgd Sep 21, 2023
f597ecd
changelog
radeusgd Sep 21, 2023
b1f7332
fix test: default locale uses short day names even for long form...
radeusgd Sep 21, 2023
6e75815
add tests for week-based year (TODO), fix obsolete docs
radeusgd Sep 21, 2023
b9ec35e
move )
radeusgd Sep 21, 2023
fd9b867
Improving zone offset customization
radeusgd Sep 21, 2023
95a9897
add a test
radeusgd Sep 21, 2023
12f149c
Merge branch 'develop' into wip/radeusgd/7461-safer-date-format-parser
radeusgd Sep 21, 2023
c982793
Fix parsing year-week without day (defaulting to Monday)
radeusgd Sep 21, 2023
be58e5c
fix a test
radeusgd Sep 21, 2023
a21b94e
fix another test
radeusgd Sep 21, 2023
1d27c79
fix
radeusgd Sep 21, 2023
5ac1258
fix parsing quarters
radeusgd Sep 21, 2023
b3efabc
fix offset docs and test
radeusgd Sep 22, 2023
58a48f0
Merge branch 'develop' into wip/radeusgd/7461-safer-date-format-parser
radeusgd Sep 22, 2023
@@ -18,6 +18,7 @@ import project.Data.Text.Text_Sub_Range.Codepoint_Ranges
import project.Data.Text.Text_Sub_Range.Text_Sub_Range
import project.Data.Time.Date.Date
import project.Data.Time.Date_Time.Date_Time
import project.Data.Time.Date_Time_Formatter.Date_Time_Formatter
import project.Data.Time.Time_Of_Day.Time_Of_Day
import project.Data.Time.Time_Zone.Time_Zone
import project.Data.Vector.Vector
@@ -1473,17 +1474,10 @@ Text.parse_json self = Json.parse self

Converts text containing a date into a Date object.

Arguments:
- format: An optional format describing how to parse the text.

Returns a `Time_Error` if `self` cannot be parsed using the provided
`format`.
This method will return a `Time_Error` if the provided time cannot be parsed.

? Format Syntax
A custom format string consists of one or more custom date and time format
specifiers. For example, "d MMM yyyy" will format "2011-12-03" as
"3 Dec 2011". See https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/time/format/DateTimeFormatter.html
for a complete format specification.
Arguments:
- format: The format to use for parsing the input text.

? Default Date Formatting
Unless you provide a custom format, the text must represent a valid date
@@ -1500,6 +1494,34 @@ Text.parse_json self = Json.parse self
- Two digits for the day-of-month. This is pre-padded by zero to ensure two
digits.

? Pattern Syntax
If the pattern is provided as `Text`, it is parsed using the format
described below. See `Date_Time_Formatter` for more options.
- y: Year. The number of pattern letters determines the minimum number of
digits.
- y: The year using any number of digits.
- yy: The year, using at most two digits. The default range is
1950-2049, but this can be changed by including the end year in
braces e.g. `yy{2099}`.
- yyyy: The year, using exactly four digits.
- M: Month of year. The number of pattern letters determines the format:
- M: Any number (1-12).
- MM: Month number with zero padding required (01-12).
- MMM: Short name of the month (Jan-Dec).
- MMMM: Full name of the month (January-December).
The month names depend on the selected locale.
- d: Day. The number of pattern letters determines the format:
- d: Any number (1-31).
- dd: Day number with zero padding required (01-31).
- ddd: Short name of the day of week (Mon-Sun).
- dddd: Full name of the day of week (Monday-Sunday).
The weekday names depend on the selected locale.
Both day of week and day of month may be included in a single pattern;
in that case the day of week is used as a sanity check.
- Q: Quarter of year.
If only year and quarter are provided in the pattern, when parsing a
date, the result will be the first day of that quarter.

> Example
Parse the date of 23rd December 2020.

@@ -1533,32 +1555,27 @@ Text.parse_json self = Json.parse self
date = "1999-1-1".parse_date "yyyy-MM-dd"
date.catch Time_Error (_->Date.new 2000 1 1)
@format make_date_format_selector
@locale Locale.default_widget
Text.parse_date : Text -> Locale -> Date ! Time_Error
Text.parse_date self format:Text="" locale:Locale=Locale.default = Date.parse self format locale
Text.parse_date : Date_Time_Formatter -> Date ! Time_Error
Text.parse_date self format:Date_Time_Formatter=Date_Time_Formatter.iso_date =
Date.parse self format

## ALIAS date_time from text
GROUP Conversions

Obtains an instance of `Date_Time` from a text such as
"2007-12-03T10:15:30+01:00 Europe/Paris".

This method will return a `Time_Error` if the provided time cannot be parsed.

Arguments:
- format: The format to use for parsing the input text.
- locale: The locale in which the format should be interpreted.

? Format Syntax
A custom format string consists of one or more custom date and time format
specifiers. For example, "d MMM yyyy" will format "2011-12-03" as
"3 Dec 2011". See https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/time/format/DateTimeFormatter.html
for a complete format specification.

? Default Date_Time Format
The text must represent a valid date-time as defined by the ISO-8601
format. (See https://en.wikipedia.org/wiki/ISO_8601.) If a time zone is
present, it must be in the ISO-8601 Extended Date/Time Format (EDTF).
(See https://en.wikipedia.org/wiki/ISO_8601#EDTF.) The time zone format
consists of:
Unless you provide a custom format, the text must represent a valid
date-time as defined by the ISO-8601 format (see https://en.wikipedia.org/wiki/ISO_8601).
If a time zone is present, it must be in the ISO-8601 Extended Date/Time
Format (EDTF) (see https://en.wikipedia.org/wiki/ISO_8601#EDTF). The time
zone format consists of:

- The ISO offset date time.
- If the zone ID is not available or is a zone offset then the format is
@@ -1568,8 +1585,45 @@ Text.parse_date self format:Text="" locale:Locale=Locale.default = Date.parse se
sensitive.
- A close square bracket ']'.

This method will return a `Time_Error` if the provided time cannot be parsed
using the above format.
? Pattern Syntax
If the pattern is provided as `Text`, it is parsed using the format
described below. See `Date_Time_Formatter` for more options.
- y: Year. The number of pattern letters determines the minimum number of
digits.
- y: The year using any number of digits.
- yy: The year, using at most two digits. The default range is
1950-2049, but this can be changed by including the end year in
braces e.g. `yy{2099}`.
- yyyy: The year, using exactly four digits.
- M: Month of year. The number of pattern letters determines the format:
- M: Any number (1-12).
- MM: Month number with zero padding required (01-12).
- MMM: Short name of the month (Jan-Dec).
- MMMM: Full name of the month (January-December).
The month names depend on the selected locale.
- d: Day. The number of pattern letters determines the format:
- d: Any number (1-31).
- dd: Day number with zero padding required (01-31).
- ddd: Short name of the day of week (Mon-Sun).
- dddd: Full name of the day of week (Monday-Sunday).
The weekday names depend on the selected locale.
Both day of week and day of month may be included in a single pattern;
in that case the day of week is used as a sanity check.
- Q: Quarter of year.
If only year and quarter are provided in the pattern, when parsing a
date, the result will be the first day of that quarter.
- H: 24h hour of day (0-23).
- h: 12h hour of day (0-12). The `a` pattern is needed to disambiguate
between AM and PM.
- m: Minute of hour.
- s: Second of minute.
- f: Fractional part of the second. The number of pattern letters
determines the number of digits. If one letter is used, any number of
digits will be accepted.
- a: AM/PM marker.
- T: If repeated 3 or fewer times, the time zone ID (e.g. Europe/Warsaw, Z,
-08:30); otherwise, the time zone name (e.g. Central European Time, CET).
- Z: Zone offset (e.g. +0000, -0830, +08:30:15).

> Example
Parse UTC time.
@@ -1621,31 +1675,24 @@ Text.parse_date self format:Text="" locale:Locale=Locale.default = Date.parse se
example_parse =
"06 of May 2020 at 04:30AM".parse_date_time "dd 'of' MMMM yyyy 'at' hh:mma"
@format make_date_time_format_selector
@locale Locale.default_widget
Text.parse_date_time : Text -> Locale -> Date_Time ! Time_Error
Text.parse_date_time self format:Text="" locale:Locale=Locale.default = Date_Time.parse self format locale
Text.parse_date_time : Date_Time_Formatter -> Date_Time ! Time_Error
Text.parse_date_time self format:Date_Time_Formatter=Date_Time_Formatter.default_enso_zoned_date_time =
Date_Time.parse self format

## ALIAS time_of_day from text, to_time_of_day
GROUP Conversions

Obtains an instance of `Time_Of_Day` from a text such as "10:15".

This method will return a `Time_Error` if the provided time cannot be parsed.

Arguments:
- format: The format to use for parsing the input text.
- locale: The locale in which the format should be interpreted.

Returns a `Time_Error` if the provided text cannot be parsed using the
default format.

? Format Syntax
A custom format string consists of one or more custom date and time format
specifiers. For example, "d MMM yyyy" will format "2011-12-03" as
"3 Dec 2011". See https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/time/format/DateTimeFormatter.html
for a complete format specification.

? Default Time Format
The text must represent a valid time and is parsed using the ISO-8601
extended local time format. The format consists of:
Unless you provide a custom format, the text must represent a valid time
and is parsed using the ISO-8601 extended local time format.
The format consists of:

- Two digits for the hour-of-day. This is pre-padded by zero to ensure two
digits.
@@ -1662,6 +1709,19 @@ Text.parse_date_time self format:Text="" locale:Locale=Locale.default = Date_Tim
- One to nine digits for the nano-of-second. As many digits will be output
as required.

? Pattern Syntax
If the pattern is provided as `Text`, it is parsed using the format
described below. See `Date_Time_Formatter` for more options.
- H: 24h hour of day (0-23).
- h: 12h hour of day (0-12). The `a` pattern is needed to disambiguate
between AM and PM.
- m: Minute of hour.
- s: Second of minute.
- f: Fractional part of the second. The number of pattern letters
determines the number of digits. If one letter is used, any number of
digits will be accepted.
- a: AM/PM marker.

> Example
Get the time 15:05:30.

@@ -1692,9 +1752,9 @@ Text.parse_date_time self format:Text="" locale:Locale=Locale.default = Date_Tim

example_parse = "4:30AM".parse_time_of_day "h:mma"
@format make_time_format_selector
@locale Locale.default_widget
Text.parse_time_of_day : Text -> Locale -> Time_Of_Day ! Time_Error
Text.parse_time_of_day self format:Text="" locale:Locale=Locale.default = Time_Of_Day.parse self format locale
Text.parse_time_of_day : Date_Time_Formatter -> Time_Of_Day ! Time_Error
Text.parse_time_of_day self format:Date_Time_Formatter=Date_Time_Formatter.iso_time =
Time_Of_Day.parse self format

## ALIAS time_zone from text, to_time_zone
GROUP Conversions
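
Note on the `h` / `a` patterns documented above: the docs say `h` is a 12h hour that needs the `a` marker to disambiguate AM from PM. A minimal sketch of that disambiguation, written in Python only to illustrate the rule. The function name `to_24h` is hypothetical, and treating an hour of 0 like 12 is an assumption, not something this PR confirms about the Enso implementation.

```python
def to_24h(hour_12: int, marker: str) -> int:
    """Combine a 12h clock hour with an AM/PM marker into a 0-23 hour.

    Illustrative only: not the Enso implementation.
    """
    if not 0 <= hour_12 <= 12:
        raise ValueError(f"hour out of range for a 12h clock: {hour_12}")
    marker = marker.strip().upper()
    if marker not in ("AM", "PM"):
        raise ValueError(f"expected AM or PM, got: {marker!r}")
    # Assumption: 12 (and 0) map to the start of the half-day.
    base = hour_12 % 12
    return base + 12 if marker == "PM" else base
```

Under these assumptions, the text "4:30AM" from the `h:mma` example above corresponds to hour 4, while the same digits with `PM` would correspond to hour 16.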
104 changes: 69 additions & 35 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Data/Time/Date.enso
@@ -8,6 +8,7 @@ import project.Data.Text.Text
import project.Data.Time.Date_Period.Date_Period
import project.Data.Time.Date_Range.Date_Range
import project.Data.Time.Date_Time.Date_Time
import project.Data.Time.Date_Time_Formatter.Date_Time_Formatter
import project.Data.Time.Day_Of_Week.Day_Of_Week
import project.Data.Time.Day_Of_Week_From
import project.Data.Time.Duration.Duration
@@ -105,17 +106,11 @@ type Date

Arguments:
- text: The text to try and parse as a date.
- pattern: An optional pattern describing how to parse the text.
- locale: The locale in which the pattern should be interpreted.
- format: A pattern describing how to parse the text,
or a `Date_Time_Formatter`.

Returns a `Time_Error` if the provided `text` cannot be parsed using the
provided `pattern`.

? Pattern Syntax
A custom pattern string consists of one or more custom date and time
format specifiers. For example, "d MMM yyyy" will format "2011-12-03"
as "3 Dec 2011". See https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/time/format/DateTimeFormatter.html
for a complete format specification.
provided `format`.

? Default Date Formatting
Unless you provide a custom format, the text must represent a valid date
@@ -132,6 +127,34 @@ type Date
- Two digits for the day-of-month. This is pre-padded by zero to ensure two
digits.

? Pattern Syntax
If the pattern is provided as `Text`, it is parsed using the format
described below. See `Date_Time_Formatter` for more options.
- y: Year. The number of pattern letters determines the minimum number of
digits.
- y: The year using any number of digits.
- yy: The year, using at most two digits. The default range is
1950-2049, but this can be changed by including the end year in
braces e.g. `yy{2099}`.
- yyyy: The year, using exactly four digits.
- M: Month of year. The number of pattern letters determines the format:
- M: Any number (1-12).
- MM: Month number with zero padding required (01-12).
- MMM: Short name of the month (Jan-Dec).
- MMMM: Full name of the month (January-December).
The month names depend on the selected locale.
- d: Day. The number of pattern letters determines the format:
- d: Any number (1-31).
- dd: Day number with zero padding required (01-31).
- ddd: Short name of the day of week (Mon-Sun).
- dddd: Full name of the day of week (Monday-Sunday).
The weekday names depend on the selected locale.
Both day of week and day of month may be included in a single pattern;
in that case the day of week is used as a sanity check.
- Q: Quarter of year.
If only year and quarter are provided in the pattern, when parsing a
date, the result will be the first day of that quarter.

> Example
Parse the date of 23rd December 2020.

@@ -164,17 +187,10 @@ type Date
example_parse_err =
date = Date.parse "1999-1-1" "yyyy-MM-dd"
date.catch Time_Error (_->Date.new 2000 1 1)
@pattern make_date_format_selector
@locale Locale.default_widget
parse : Text -> Text -> Locale -> Date ! Time_Error
parse text:Text pattern:Text="" locale:Locale=Locale.default =
result = Panic.recover Any <|
formatter = if pattern.is_empty then Time_Utils.default_date_formatter else
Time_Utils.make_formatter pattern locale.java_locale
Time_Utils.parse_date text.trim formatter
result . map_error <| case _ of
err : JException -> Time_Error.Error err.getMessage
ex -> ex
@format make_date_format_selector
parse : Text -> Date_Time_Formatter -> Date ! Time_Error
parse text:Text format:Date_Time_Formatter=Date_Time_Formatter.iso_date =
format.parse_date text

## GROUP Metadata
Get the year field.
@@ -709,15 +725,36 @@ type Date
Format this date using the provided format specifier.

Arguments:
- pattern: The text specifying the format for formatting the date.
- locale: The locale in which the format should be interpreted.
(Defaults to Locale.default.)
- format: A pattern describing how to format the text,
or a `Date_Time_Formatter`.

? Pattern Syntax
A custom pattern string consists of one or more custom date and time
format specifiers. For example, "d MMM yyyy" will format "2011-12-03"
as "3 Dec 2011". See https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/time/format/DateTimeFormatter.html
for a complete format specification.
If the pattern is provided as `Text`, it is parsed using the format
described below. See `Date_Time_Formatter` for more options.
- y: Year. The number of pattern letters determines the minimum number of
digits.
- y: The year using any number of digits.
- yy: The year, using at most two digits. The default range is
1950-2049, but this can be changed by including the end year in
braces e.g. `yy{2099}`.
- yyyy: The year, using exactly four digits.
- M: Month of year. The number of pattern letters determines the format:
- M: Any number (1-12).
- MM: Month number with zero padding required (01-12).
- MMM: Short name of the month (Jan-Dec).
- MMMM: Full name of the month (January-December).
The month names depend on the selected locale.
- d: Day. The number of pattern letters determines the format:
- d: Any number (1-31).
- dd: Day number with zero padding required (01-31).
- ddd: Short name of the day of week (Mon-Sun).
- dddd: Full name of the day of week (Monday-Sunday).
The weekday names depend on the selected locale.
Both day of week and day of month may be included in a single pattern;
in that case the day of week is used as a sanity check.
- Q: Quarter of year.
If only year and quarter are provided in the pattern, when parsing a
date, the result will be the first day of that quarter.

> Example
Format "2020-06-02" as "2 Jun 2020"
@@ -749,14 +786,11 @@ type Date
> Example
Format "2020-06-21" with French locale as "21. juin 2020"

example_format = Date.new 2020 6 21 . format "d. MMMM yyyy" (Locale.new "fr")
@pattern (value-> make_date_format_selector value)
@locale Locale.default_widget
format : Text -> Locale -> Text
format self pattern:Text locale=Locale.default =
formatter = if pattern.is_empty then Time_Utils.default_date_formatter else
Time_Utils.make_formatter pattern locale.java_locale
Time_Utils.date_format self formatter
example_format = Date.new 2020 6 21 . format (Date_Time_Formatter.from "d. MMMM yyyy" (Locale.new "fr"))
@format (value-> make_date_format_selector value)
format : Date_Time_Formatter -> Text
format self format:Date_Time_Formatter=Date_Time_Formatter.iso_date =
format.format_date self

## PRIVATE
week_days_between start end =
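
Note on the date patterns documented in this file: three of the rules (the `yy` range, the `Q` default day, and the day-of-week sanity check) can be sketched as below. This is a Python illustration of the documented behaviour, not the Enso code from this PR. All function names are hypothetical, and it assumes the `yy` range is the 100 years ending at the year given in braces, which matches the documented default of 1950-2049 but is otherwise an inference.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def resolve_two_digit_year(two_digits: int, end_year: int = 2049) -> int:
    """Map a `yy` value into the 100-year window ending at end_year (assumed)."""
    if not 0 <= two_digits <= 99:
        raise ValueError(f"not a two-digit year: {two_digits}")
    start_year = end_year - 99
    # Try the century that contains the window start, then bump if too early.
    candidate = start_year - start_year % 100 + two_digits
    if candidate < start_year:
        candidate += 100
    return candidate


def first_day_of_quarter(year: int, quarter: int) -> date:
    """Year plus quarter only: the result is the first day of that quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got: {quarter}")
    return date(year, 3 * (quarter - 1) + 1, 1)


def check_day_of_week(parsed: date, weekday_name: str) -> date:
    """Use a parsed day-of-week name purely as a consistency check."""
    expected = WEEKDAYS[parsed.weekday()]
    if weekday_name != expected:
        raise ValueError(f"{parsed} is a {expected}, not a {weekday_name}")
    return parsed
```

With the default window, `resolve_two_digit_year(49)` falls at the end of the range and `resolve_two_digit_year(50)` at its start; passing `end_year=2099` mirrors the `yy{2099}` form from the docs.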