Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] Adding support for locale into our time parsing library #32977

Closed
michalpristas opened this issue May 10, 2024 · 3 comments
Closed
Labels
enhancement New feature or request pkg/ottl

Comments

@michalpristas
Copy link
Contributor

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

This is a followup to #32140

When timestamp is in non-english language normal golang parsing will fail.

example:
when the tool i'm observing is producing Italian logs in a form of mercoledì set 4 2024 parsing with a layout Monday Jan 2 2006 should yield 4th September 2024 correctly

Describe the solution you'd like

Similar support as implemented in https://github.com/ruang-guru/monday for our stanza time library

We could use CLDR data and generate mapping using ca-generic.json file from specified locale to English.

then use golang standard library to parse translated English date

Describe alternatives you've considered

No response

Additional context

No response

@michalpristas michalpristas added enhancement New feature or request needs triage New item requiring triage labels May 10, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@michalpristas michalpristas changed the title [pkg/ottl] Adding support for locale into our time library [pkg/ottl] Adding support for locale into our time parsing library May 10, 2024
@TylerHelmuth TylerHelmuth removed the needs triage New item requiring triage label May 10, 2024
@edmocosta
Copy link
Contributor

Hi there! may I get this issue assigned? I'm working on a lib for doing exactly this and could be submitting a PR soon.

TylerHelmuth pushed a commit that referenced this issue Sep 6, 2024
…34353)

**Description:** <Describe what has changed.>
Added support for localized time parsing into the
`coreinternal/timeutils` package.

The [tracking
issue](#32977)
is a following up to
#32140,
and the added function (`ParseLocalizedStrptime`) can be used later to
[add locale
support](#32978)
to the ottl `Time` function.

As discussed in the related issues, the plan is to have a similar
support as implemented by the library
[monday](https://github.com/goodsign/monday), making it possible to
parse non-english time strings into `time.Time` objects.
Elastic has built a new OSS library for that same purpose
([lunes](https://github.com/elastic/lunes)), that considering the
discussed requirements, seems to be a better option than `monday`. Both
libraries are just wrapper to the `time.Parse` and
`time.ParseInLocation` features. They work by translating the foreign
language value to English before calling the standard parsing functions.

The main `lunes` differences are:

1 - Performance is considerably better, ~13x faster for complete .Parse
operations:

```
BenchmarkParseLunes-10                2707008	       429.7 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseMonday-10                212086	      5630   ns/op	    3753 B/op	     117 allocs/op

BenchmarkParseInLocationLunes-10      2777323	       429.4 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseInLocationMonday-10      210357	      5596   ns/op	    3754 B/op	     117 allocs/op
```

Given `ParseLocalizedStrptime` uses `lunes.Translate` under the hood,
its performance is similar to the existing `ParseStrptime`, adding an
extra overhead of ~303 ns/op for translating the value before parsing:

```
BenchmarkTranslate-10            3591234	       302.4 ns/op	     220 B/op	       5 allocs/op
```

```
BenchmarkParseLocalizedStrptime-10    	  821572	    1405 ns/op	     429 B/op	       9 allocs/op
BenchmarkParseStrptime-10             	 1000000	    1082 ns/op	     186 B/op	       6 allocs/op
```

2 - Translations are based on the [CLDR](https://cldr.unicode.org/)
project, and it does support 900+ locales (vs 45+), including locales in
draft stage. Those lunes translations are
[generated](https://github.com/elastic/lunes/blob/main/generator.go)
from a CLDR core file, and does not require manually changes.

3 - Replicates all the relevant standard `time.format_test.go` test
cases, parsing foreign language values with and without layout
replacements in all available locales and supported layouts
(https://github.com/elastic/lunes/blob/main/lunes_test.go#L154).

4 - It is actively maintained (hosted under elastic repo).

**Link to tracking Issue:**
#32977

**Testing:** 
- Added unit tests

For now, the only way of manually testing this functionality is by
invoking the `ParseLocalizedStrptime` function manually through tests.

**Documentation:**
@edmocosta
Copy link
Contributor

@michalpristas @TylerHelmuth, I think we should close this issue as it was done by #34353. Thanks!

f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this issue Sep 12, 2024
…pen-telemetry#34353)

**Description:** <Describe what has changed.>
Added support for localized time parsing into the
`coreinternal/timeutils` package.

The [tracking
issue](open-telemetry#32977)
is a following up to
open-telemetry#32140,
and the added function (`ParseLocalizedStrptime`) can be used later to
[add locale
support](open-telemetry#32978)
to the ottl `Time` function.

As discussed in the related issues, the plan is to have a similar
support as implemented by the library
[monday](https://github.com/goodsign/monday), making it possible to
parse non-english time strings into `time.Time` objects.
Elastic has built a new OSS library for that same purpose
([lunes](https://github.com/elastic/lunes)), that considering the
discussed requirements, seems to be a better option than `monday`. Both
libraries are just wrapper to the `time.Parse` and
`time.ParseInLocation` features. They work by translating the foreign
language value to English before calling the standard parsing functions.

The main `lunes` differences are:

1 - Performance is considerably better, ~13x faster for complete .Parse
operations:

```
BenchmarkParseLunes-10                2707008	       429.7 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseMonday-10                212086	      5630   ns/op	    3753 B/op	     117 allocs/op

BenchmarkParseInLocationLunes-10      2777323	       429.4 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseInLocationMonday-10      210357	      5596   ns/op	    3754 B/op	     117 allocs/op
```

Given `ParseLocalizedStrptime` uses `lunes.Translate` under the hood,
its performance is similar to the existing `ParseStrptime`, adding an
extra overhead of ~303 ns/op for translating the value before parsing:

```
BenchmarkTranslate-10            3591234	       302.4 ns/op	     220 B/op	       5 allocs/op
```

```
BenchmarkParseLocalizedStrptime-10    	  821572	    1405 ns/op	     429 B/op	       9 allocs/op
BenchmarkParseStrptime-10             	 1000000	    1082 ns/op	     186 B/op	       6 allocs/op
```

2 - Translations are based on the [CLDR](https://cldr.unicode.org/)
project, and it does support 900+ locales (vs 45+), including locales in
draft stage. Those lunes translations are
[generated](https://github.com/elastic/lunes/blob/main/generator.go)
from a CLDR core file, and does not require manually changes.

3 - Replicates all the relevant standard `time.format_test.go` test
cases, parsing foreign language values with and without layout
replacements in all available locales and supported layouts
(https://github.com/elastic/lunes/blob/main/lunes_test.go#L154).

4 - It is actively maintained (hosted under elastic repo).

**Link to tracking Issue:**
open-telemetry#32977

**Testing:** 
- Added unit tests

For now, the only way of manually testing this functionality is by
invoking the `ParseLocalizedStrptime` function manually through tests.

**Documentation:**
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this issue Oct 4, 2024
…pen-telemetry#34353)

**Description:** <Describe what has changed.>
Added support for localized time parsing into the
`coreinternal/timeutils` package.

The [tracking
issue](open-telemetry#32977)
is a following up to
open-telemetry#32140,
and the added function (`ParseLocalizedStrptime`) can be used later to
[add locale
support](open-telemetry#32978)
to the ottl `Time` function.

As discussed in the related issues, the plan is to have a similar
support as implemented by the library
[monday](https://github.com/goodsign/monday), making it possible to
parse non-english time strings into `time.Time` objects.
Elastic has built a new OSS library for that same purpose
([lunes](https://github.com/elastic/lunes)), that considering the
discussed requirements, seems to be a better option than `monday`. Both
libraries are just wrapper to the `time.Parse` and
`time.ParseInLocation` features. They work by translating the foreign
language value to English before calling the standard parsing functions.

The main `lunes` differences are:

1 - Performance is considerably better, ~13x faster for complete .Parse
operations:

```
BenchmarkParseLunes-10                2707008	       429.7 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseMonday-10                212086	      5630   ns/op	    3753 B/op	     117 allocs/op

BenchmarkParseInLocationLunes-10      2777323	       429.4 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseInLocationMonday-10      210357	      5596   ns/op	    3754 B/op	     117 allocs/op
```

Given `ParseLocalizedStrptime` uses `lunes.Translate` under the hood,
its performance is similar to the existing `ParseStrptime`, adding an
extra overhead of ~303 ns/op for translating the value before parsing:

```
BenchmarkTranslate-10            3591234	       302.4 ns/op	     220 B/op	       5 allocs/op
```

```
BenchmarkParseLocalizedStrptime-10    	  821572	    1405 ns/op	     429 B/op	       9 allocs/op
BenchmarkParseStrptime-10             	 1000000	    1082 ns/op	     186 B/op	       6 allocs/op
```

2 - Translations are based on the [CLDR](https://cldr.unicode.org/)
project, and it does support 900+ locales (vs 45+), including locales in
draft stage. Those lunes translations are
[generated](https://github.com/elastic/lunes/blob/main/generator.go)
from a CLDR core file, and does not require manually changes.

3 - Replicates all the relevant standard `time.format_test.go` test
cases, parsing foreign language values with and without layout
replacements in all available locales and supported layouts
(https://github.com/elastic/lunes/blob/main/lunes_test.go#L154).

4 - It is actively maintained (hosted under elastic repo).

**Link to tracking Issue:**
open-telemetry#32977

**Testing:** 
- Added unit tests

For now, the only way of manually testing this functionality is by
invoking the `ParseLocalizedStrptime` function manually through tests.

**Documentation:**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request pkg/ottl
Projects
None yet
Development

No branches or pull requests

3 participants