Import very slow #533

skjerns · 2019-06-28T07:51:10Z

I noticed that importing this library is very slow (500-1000 ms on my system).
Given that the library is probably less complex than, e.g., numpy or similarly large packages: do you think the import time can be optimized in some way?
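
For anyone who wants to reproduce such numbers, here is a minimal sketch of one way to measure import time. It starts a fresh interpreter so that a module already present in `sys.modules` does not skew the result. The helper name `time_import` is made up for this example, and `json` is only a stand-in for `dateparser` so the snippet runs without extra packages.

```python
import subprocess
import sys


def time_import(module_name: str) -> float:
    """Return the seconds needed to import module_name in a fresh interpreter."""
    # The timing happens inside the child process, so interpreter start-up
    # itself is not counted, only the import statement.
    snippet = (
        "import time; start = time.perf_counter(); "
        f"import {module_name}; "
        "print(time.perf_counter() - start)"
    )
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        check=True,
    )
    return float(result.stdout.strip())


if __name__ == "__main__":
    # "json" is a stand-in; use "dateparser" if it is installed.
    print(f"{time_import('json') * 1000:.1f} ms")
```

CPython 3.7+ can also print a per-module breakdown with `python -X importtime -c "import dateparser"`, which helps to find the submodule that dominates.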

noviluni · 2020-05-20T14:01:04Z

Hi @skjerns, could you confirm that this still happens with the latest version? I can't reproduce it.

noviluni · 2020-05-20T14:02:20Z

On the other hand, the import is a little slower than in other libraries because it initializes some parsers up front so that parsing itself is faster, so I don't think there is much we can do.

skjerns · 2020-05-20T17:21:36Z

@noviluni it's faster now (280ms), but I've also upgraded my PC, so I can't really tell ;-)

using python: 100ms (kind of acceptable)
using ipython: 280ms (this is still slower than scipy or numpy)

Maybe you could implement lazy loading for the parsers? Or are they all needed?

noviluni · 2020-11-26T15:47:48Z

I tried the latest version of dateparser (1.0.0) and it takes less time to import than pandas, so I don't think it should be considered an issue.

I will close it. Feel free to comment or reopen it if you think there is something that should be improved 🙂

skjerns · 2020-11-27T14:51:01Z

Yes! Seems to be much faster now :) Thanks a lot!

Natureshadow · 2023-07-03T19:39:09Z

dateparser 1.1.8 takes around 600ms to import here.

skjerns · 2023-07-13T08:47:49Z

> dateparser 1.1.8 takes around 600ms to import here.

Cannot replicate: `Elapsed context: 275 ms` on v1.1.8

Natureshadow · 2023-07-13T09:16:29Z

> cannot replicate: `Elapsed context: 275 ms`

So, your system is faster than mine. A 275 ms import time is still unacceptable (basically everything >10 ms on an average system is a bug).

beda42 · 2023-08-11T12:52:50Z

I have to agree that hundreds of milliseconds are a problem. For me the import takes about 250 ms, and about 200 ms of that is the generation of _search_regex_ignorecase and related intermediate values at the end of timezone_parser.py:

_search_regex_parts = []
_tz_offsets = list(build_tz_offsets(_search_regex_parts))
_search_regex = re.compile('|'.join(_search_regex_parts))
_search_regex_ignorecase = re.compile(
    '|'.join(_search_regex_parts), re.IGNORECASE)

(Of that, 140 ms is just _tz_offsets = list(build_tz_offsets(_search_regex_parts)), but that does not really matter.)

From my point of view, it would be better not to precompute this list on import, but rather to do so on first use of pop_tz_offset_from_string, which is the only function using it. That way the import time would be significantly shortened, and the overall running time of an app that uses pop_tz_offset_from_string would not be affected.

Would it be interesting for you if I attempted this change and sent a pull request?

beda42 · 2023-09-07T12:10:25Z

OK, I created an MR for this issue. It postpones the compilation of regexps in timezone_parser.py and reduces the import time to about 20 % of the original import time.

Any feedback is welcome.

this is different from pr scrapinghub#1181. that pr only makes import faster but still incurs cost on the first usage. this one leverages an optional cache. closes scrapinghub#533
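
As a rough illustration of what an optional on-disk cache could look like in general, here is a minimal sketch. It does not reflect how the referenced PR is implemented: `build_offsets`, `load_offsets` and the cache path are hypothetical, and pickle is just one possible serialization format (it should only be used for files you trust).

```python
import pickle
from pathlib import Path


def build_offsets():
    """Hypothetical stand-in for an expensive computation."""
    return [("UTC", 0), ("CET", 3600), ("EST", -18000)]


def load_offsets(cache_path: Path):
    """Return cached offsets if the cache file is readable, else rebuild them."""
    try:
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    except (OSError, pickle.UnpicklingError, EOFError):
        pass  # cache missing or unreadable: fall through and rebuild

    offsets = build_offsets()
    try:
        with cache_path.open("wb") as fh:
            pickle.dump(offsets, fh)
    except OSError:
        pass  # the cache is optional; a read-only location is not an error
    return offsets
```

With this shape, the first call pays the full cost and writes the file, while later processes only pay for reading it. If the file cannot be written, the function still returns a correct result.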

Gallaecio added the performance label Jul 3, 2019

noviluni closed this as completed Nov 26, 2020

beda42 mentioned this issue Sep 7, 2023

postpone timezone regex evaluation until first use - shaves off time from package import #1181

Open

tobymao mentioned this issue Jan 30, 2025

feat: add caching for timezone offsets, significantly speeds up import #1250

Open

tobymao added a commit to tobymao/dateparser that referenced this issue Feb 5, 2025

feat: add caching for timezone offsets, significantly speeds up import

dbda38d

this is different from pr scrapinghub#1181. it builds a cache at install time which can be distributed. closes scrapinghub#533

tobymao added a commit to tobymao/dateparser that referenced this issue Feb 5, 2025

feat: add caching for timezone offsets, significantly speeds up import

f423124

this is different from pr scrapinghub#1181. it builds a cache at install time which can be distributed. closes scrapinghub#533

tobymao added a commit to tobymao/dateparser that referenced this issue Feb 5, 2025

feat: add caching for timezone offsets, significantly speeds up import

1683155

this is different from pr scrapinghub#1181. it builds a cache at install time which can be distributed. closes scrapinghub#533

tobymao added a commit to tobymao/dateparser that referenced this issue Feb 5, 2025

feat: add caching for timezone offsets, significantly speeds up import

08ce4e2

this is different from pr scrapinghub#1181. it builds a cache at install time which can be distributed. closes scrapinghub#533

tobymao added a commit to tobymao/dateparser that referenced this issue Feb 10, 2025

feat: add caching for timezone offsets, significantly speeds up import

3c1307b

this is different from pr scrapinghub#1181. it builds a cache at install time which can be distributed. closes scrapinghub#533

tobymao added a commit to tobymao/dateparser that referenced this issue Feb 10, 2025

feat: add caching for timezone offsets, significantly speeds up import

3b98ab4

this is different from pr scrapinghub#1181. it builds a cache at install time which can be distributed. closes scrapinghub#533

tobymao added a commit to tobymao/dateparser that referenced this issue Feb 10, 2025

feat: add caching for timezone offsets, significantly speeds up import

6b7d31e

this is different from pr scrapinghub#1181. it builds a cache at install time which can be distributed. closes scrapinghub#533

tobymao added a commit to tobymao/dateparser that referenced this issue Feb 10, 2025

feat: add caching for timezone offsets, significantly speeds up import

1426199

this is different from pr scrapinghub#1181. it builds a cache at install time which can be distributed. closes scrapinghub#533

tobymao added a commit to tobymao/dateparser that referenced this issue Feb 13, 2025

feat: add caching for timezone offsets, significantly speeds up import

1f6c7c6

this is different from pr scrapinghub#1181. it builds a cache at install time which can be distributed. closes scrapinghub#533
