[BUG] Querying a string column is very slow #2972

Open
1 task done
moonvalley-matt opened this issue Oct 18, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@moonvalley-matt

Severity

P1 - Urgent, but non-breaking

Current Behavior

I have a dataset of ~1M rows that has a column of np.str_ in the metadata. Loading this column takes about 4 seconds per 1,000 records, while integer columns load all 1,000,000 records in a matter of seconds.

Steps to Reproduce

Create a dataset of 1,000,000 rows whose metadata contains a mixture of string and integer columns.
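A minimal sketch of the reproduction data, assuming NumPy only: it builds a 1,000,000-row string column (fixed-width `np.str_`) and a matching integer column. The helper names `make_mixed_columns` and `time_load` are hypothetical, and the step that writes these columns into a deeplake dataset is left as a comment, because that API differs between deeplake versions.

```python
import time

import numpy as np

N_ROWS = 1_000_000  # matches the dataset size in the report


def make_mixed_columns(n_rows: int, seed: int = 0):
    """Build one string column and one integer column of equal length.

    Hypothetical helper: the values are synthetic, not the reporter's data.
    """
    rng = np.random.default_rng(seed)
    ints = rng.integers(0, 1_000_000, size=n_rows, dtype=np.int64)
    # Element-wise concatenation yields a fixed-width np.str_ ("<U...") array
    strs = np.char.add("item_", ints.astype(str))
    return strs, ints


def time_load(load_fn, label: str) -> float:
    """Run load_fn once and report wall-clock time in seconds."""
    start = time.perf_counter()
    load_fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f} s")
    return elapsed


if __name__ == "__main__":
    strs, ints = make_mixed_columns(N_ROWS)
    print(strs.dtype, strs.shape, ints.dtype, ints.shape)
    # Hypothetical next step: write `strs` and `ints` into a dataset with the
    # deeplake API of your installed version, then compare read times, e.g.
    #   time_load(lambda: read_string_column(), "string column")
    #   time_load(lambda: read_int_column(), "integer column")
```

Timing both columns with the same helper keeps the comparison (4 s per 1,000 string records versus seconds for 1,000,000 integers) on equal footing.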

Expected/Desired Behavior

Strings should load approximately as fast as integers. If that isn't possible, are there other recommendations? I'm trying to understand the nature of the problem.

Python Version

No response

OS

No response

IDE

No response

Packages

No response

Additional Context

No response

Possible Solution

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR (Thank you!)
moonvalley-matt added the bug label on Oct 18, 2024
@davidbuniat
Member

Thanks @moonvalley-matt. I believe this has also been reported by other users. If you change the column to htype="text", it should be much faster.

@levonohanyan is looking into making the performance uniformly fast across all string types.

@levonohanyan

Hi @moonvalley-matt,

It seems the issue is not generally reproducible and may depend on the specific versions of deeplake, Python, or NumPy. Could you please provide details about the versions you used? A reproducible script would be even better.

Regards,
Levon
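One way to gather the version details requested above is a short standard-library script like the following. This is a sketch, not a deeplake tool: `collect_versions` is a hypothetical helper that only reads the interpreter version, the OS string, and installed package metadata.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages=("deeplake", "numpy")) -> dict:
    """Return interpreter, OS, and package versions for a bug report."""
    info = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            # Report missing packages instead of failing
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

The output can be pasted directly into the "Python Version", "OS", and "Packages" fields of the issue template.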
