[BUG] Querying a string column is very slow #2972

moonvalley-matt · 2024-10-18T23:57:19Z

Severity

P1 - Urgent, but non-breaking

Current Behavior

I have a dataset of ~1M rows with a metadata column of type np.str_. Loading this column takes about 4 seconds per 1,000 records, whereas integer columns load all 1,000,000 records in a matter of seconds.

Steps to Reproduce

Create a dataset of 1,000,000 rows whose metadata contains a mixture of string and integer columns, then load each column and compare the timings.
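A minimal sketch of how the test data and timing could be set up, for anyone trying to reproduce this. It only uses NumPy and the standard library; `make_metadata` and `time_load` are hypothetical helper names, and the actual deeplake ingestion and column read are deliberately left out because the issue does not state which API calls or version were used.

```python
import time

import numpy as np


def make_metadata(n_rows=1_000_000, seed=0):
    """Build one integer column and one np.str_ column of equal length."""
    rng = np.random.default_rng(seed)
    ints = rng.integers(0, 1_000, size=n_rows)
    # Fixed-width unicode array; indexing it yields np.str_ scalars.
    strs = np.array([f"label_{i}" for i in ints], dtype=np.str_)
    return ints, strs


def time_load(load_fn, n_records):
    """Run a zero-argument loader once and return seconds per 1,000 records."""
    start = time.perf_counter()
    load_fn()
    elapsed = time.perf_counter() - start
    return elapsed / n_records * 1_000
```

To compare the two columns, pass a callable that performs the real column read, for example `time_load(lambda: my_string_column_read(), 1_000_000)`, where `my_string_column_read` stands in for whatever call loads the column from the dataset.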

Expected/Desired Behavior

Strings should load approximately as fast as integers. If that is not possible, are there other recommendations? I am trying to understand the nature of the problem.

Python Version

No response

OS

No response

IDE

No response

Packages

No response

Additional Context

No response

Possible Solution

No response

Are you willing to submit a PR?

I'm willing to submit a PR (Thank you!)

davidbuniat · 2024-10-25T16:29:27Z

Thanks @moonvalley-matt, I believe this has also been reported by other users. If you change the column to htype == "text", loading should be much faster.

@levonohanyan is looking into making the performance uniformly fast across all string types.

levonohanyan · 2024-10-26T18:05:12Z

Hi @moonvalley-matt,

It seems the issue is not generally reproducible and depends on the specific version of deeplake, Python, or NumPy. Could you please provide more details about the versions you used? A reproducible script would be even better.

Regards,
Levon
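One way to collect the requested version details is sketched below. It relies only on the standard library; `report_versions` is a hypothetical helper name, and the package list assumes the distributions are installed under the names `deeplake` and `numpy`.

```python
import platform
from importlib import metadata


def report_versions(packages=("deeplake", "numpy")):
    """Return the Python version plus the installed version of each package."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of raising.
            info[name] = "not installed"
    return info


for name, version in report_versions().items():
    print(f"{name}: {version}")
```

The printed lines can be pasted directly into the Python Version and Packages fields of the report.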

moonvalley-matt added the bug Something isn't working label Oct 18, 2024

davidbuniat assigned levonohanyan Oct 25, 2024
