Import very slow #533

Closed
skjerns opened this issue Jun 28, 2019 · 10 comments · May be fixed by #1250

Comments

@skjerns

skjerns commented Jun 28, 2019

I noticed that importing this library is very slow (500-1000 ms on my system).
Given that the library is probably less complex than, e.g., numpy or similarly large packages, do you think the import time could be optimized in some way?
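
For anyone who wants to reproduce numbers like these, here is a minimal sketch of one way to measure import time. It assumes that starting a fresh interpreter per measurement is acceptable, so that modules already cached in `sys.modules` do not skew the result. The helper name `time_import` is made up for this example and is not part of dateparser. CPython 3.7+ also offers `python -X importtime -c "import dateparser"`, which prints a per-module breakdown.

```python
import subprocess
import sys
import time


def time_import(module_name: str, repeats: int = 3) -> float:
    """Estimate the import cost of a module in seconds (hypothetical helper).

    Each measurement runs in a fresh interpreter. The interpreter's own
    startup time is measured separately and subtracted.
    """

    def run(code: str) -> float:
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        return time.perf_counter() - start

    # Best-of-N reduces noise from other processes on the machine.
    baseline = min(run("pass") for _ in range(repeats))
    with_import = min(run(f"import {module_name}") for _ in range(repeats))
    return max(with_import - baseline, 0.0)


if __name__ == "__main__":
    # "json" is used so the example runs anywhere; substitute "dateparser".
    print(f"json: {time_import('json') * 1000:.1f} ms")
```

Results depend heavily on hardware, disk cache state, and whether `.pyc` files already exist, so compare numbers only when they were taken on the same machine.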

@noviluni
Collaborator

noviluni commented May 20, 2020

Hi @skjerns, could you confirm that this is happening with the latest version? I can't reproduce it.

@noviluni
Collaborator

On the other hand, the import is a little slower than for other libraries because it initializes some parsers up front so that parsing is faster later, so I don't think there is much we can do.

@skjerns
Author

skjerns commented May 20, 2020

@noviluni it's faster now (280ms), but I've also upgraded my PC, so I can't really tell ;-)

using python: 100ms (kind of acceptable)
using ipython: 280ms (this is still slower than scipy or numpy)

Maybe you could implement some lazy loading for the parsers? Or are they all needed at import time?

@noviluni
Collaborator

I tried with the latest version of dateparser (1.0.0) and it takes less time to import than pandas, so I think it shouldn't be considered an issue.

I will close it. Feel free to comment or reopen it if you think there is something that should be improved 🙂

@skjerns
Author

skjerns commented Nov 27, 2020

Yes! Seems to be much faster now :) thanks a lot!

@Natureshadow

dateparser 1.1.8 takes around 600ms to import here.

@skjerns
Author

skjerns commented Jul 13, 2023

> dateparser 1.1.8 takes around 600ms to import here.

cannot replicate: Elapsed context: 275 ms on v1.1.8

@Natureshadow

Natureshadow commented Jul 13, 2023 via email

@beda42

beda42 commented Aug 11, 2023

I have to agree that hundreds of milliseconds are a problem. For me the import takes about 250 ms, and about 200 ms of that is the generation of _search_regex_ignorecase and related intermediate values at the end of timezone_parser.py:

_search_regex_parts = []
_tz_offsets = list(build_tz_offsets(_search_regex_parts))
_search_regex = re.compile('|'.join(_search_regex_parts))
_search_regex_ignorecase = re.compile(
    '|'.join(_search_regex_parts), re.IGNORECASE)

(140 ms of that is just _tz_offsets = list(build_tz_offsets(_search_regex_parts)), but that does not really matter).

From my point of view, it would be better not to precompute this list on import, but rather do so on first use of pop_tz_offset_from_string which is the only function using it. That way the import time would be significantly shortened and the overall running time of an app using the pop_tz_offset_from_string function won't be affected.

Would it be interesting for you if I attempted this change and sent a pull request?

@beda42

beda42 commented Sep 7, 2023

OK, I created an MR for this issue. It postpones the compilation of regexps in timezone_parser.py and reduces the import time to about 20 % of the original import time.

Any feedback is welcome.

tobymao added a commit to tobymao/dateparser that referenced this issue Jan 30, 2025
this is different from pr scrapinghub#1181. that pr only makes import faster but
still incurs cost on the first usage. this one leverages an optional
cache.

closes scrapinghub#533
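
The commit message above does not spell out how its cache works, so the following is only a generic sketch of the "optional cache" idea, not the code from that commit. It loads precomputed data from disk when available and otherwise falls back to building it. `load_or_build` and the cache path are hypothetical names. Pickling a compiled `re.Pattern` only stores the pattern string and flags, so the sketch caches the built data rather than the compiled regex.

```python
import pickle
from pathlib import Path


def load_or_build(cache_path: Path, build):
    """Return cached data from cache_path, or call build() and try to cache it.

    The cache is optional: if the file is missing, unreadable, or the
    directory is not writable, the function still returns freshly built data.
    """
    try:
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    except (OSError, EOFError, pickle.UnpicklingError):
        value = build()

    try:
        with cache_path.open("wb") as fh:
            pickle.dump(value, fh)
    except OSError:
        pass  # A read-only install location simply means no cache.
    return value
```

Only unpickle files from a trusted location, since loading a pickle can execute arbitrary code. A cache shipped with or generated by the package itself fits that constraint. A cache in a world-writable directory does not.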
tobymao added a commit to tobymao/dateparser that referenced this issue Feb 5, 2025
this is different from pr scrapinghub#1181. it builds a cache at install time which
can be distributed.

closes scrapinghub#533