[Bug] ArrowInvalid: cannot construct ChunkedArray from empty vector and omitted type #3633
Comments
Hi @nick-youngblut! It's definitely the case that there aren't new rows to add, given what you're registering with and that your input didn't modify it. So that's one issue to sort out if you want to add more data. The other issue, though: you shouldn't be getting the ArrowInvalid error at all. I'll investigate.
A third possible issue: append-mode ingest is for adding more data with all the same column schema. Does your second dataset have the same schema as the first?
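A quick way to sanity-check that before appending (a minimal sketch; the file names are placeholders, not from this thread):

```python
import anndata as ad

a1 = ad.read_h5ad("first.h5ad")   # the already-ingested dataset
a2 = ad.read_h5ad("second.h5ad")  # the dataset to append

# Append-mode ingest expects the same obs/var column schema across datasets
same_obs = list(a1.obs.columns) == list(a2.obs.columns) and a1.obs.dtypes.equals(a2.obs.dtypes)
same_var = list(a1.var.columns) == list(a2.var.columns) and a1.var.dtypes.equals(a2.var.dtypes)
print(f"obs schema matches: {same_obs}; var schema matches: {same_var}")
```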
Thanks @johnkerl for the detailed feedback!
It does. Still, I will double check. I'll also do some investigation on my end. If you need the data, I could provide it, since the data is already published on the SRA (hence, the SRX accessions).
@nick-youngblut yes, if it's not too much to ask, having access to the data would indeed be super-helpful 🙏
I haven't been able to reproduce this issue. I'm not sure why it happened, but it hasn't happened since. 🤷
I'm seeing the same issue (after fixing #3641) in a similar write-then-append scenario. All datasets have the same obs column schema and the same gene IDs (padded across datasets, which introduces NaNs in X and in the var columns for the padded genes). I haven't been able to come up with minimal datasets yet, but will post here once that happens.
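For context, the padding step looks roughly like this (a sketch only; the helper name is made up and it assumes a dense X):

```python
import anndata as ad
import numpy as np
import pandas as pd

def pad_to_gene_set(adata: ad.AnnData, all_genes: pd.Index) -> ad.AnnData:
    """Reindex an AnnData to a shared gene set, filling missing genes with NaN."""
    missing = all_genes.difference(adata.var_names)
    pad_X = np.full((adata.n_obs, len(missing)), np.nan)             # NaNs in X for padded genes
    pad_var = pd.DataFrame(index=missing, columns=adata.var.columns)  # NaNs in var columns
    padded = ad.AnnData(
        X=np.hstack([adata.X, pad_X]),
        obs=adata.obs,
        var=pd.concat([adata.var, pad_var]),
    )
    return padded[:, all_genes]  # same gene order for every dataset
```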
Here are some updates, using the code linked in the above PR (just re-written to use append-mode ingest). Update: it worked for those two files, but appending others subset in the same way started failing with the ChunkedArray issue again.
@cbrueffer Thanks! If we at TileDB can get an on-demand repro I think we can solve this pretty quickly ... 🤔
I think it's more likely there's a threshold being crossed (total number of bytes, say) wherein the number of chunks in a particular column ends up being zero. We and other customers have ingested a huge range of data sizes over the years and yet I haven't seen this particular symptom before, so there must be something engagingly/puzzlingly corner-casey going on here...
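For reference, the error in the title is easy to trigger directly in pyarrow whenever a column ends up with zero chunks and no explicit type (an illustration only, not the actual tiledbsoma code path):

```python
import pyarrow as pa

# With no chunks and no explicit type, Arrow cannot infer the column type:
try:
    pa.chunked_array([])
except pa.ArrowInvalid as e:
    print(e)  # cannot construct ChunkedArray from empty vector and omitted type

# Passing the type explicitly works even with zero chunks:
empty = pa.chunked_array([], type=pa.string())
print(empty.num_chunks, empty.type)  # 0 string
```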
@nick-youngblut @cbrueffer can you please share your package versions?
Thanks John, I'll keep trying to come up with a good reproduction case. My package versions:
Thanks @cbrueffer! We'll keep trying for a repro as well.
I wasn't able to reproduce the issue, but I think I identified at least part of it. We are converting sufficiently large categorical pandas data to non-categorical, but then we need to convert it back when we append it to the existing SOMA dataframe. The Arrow array for the categorical column is also getting dropped in your case; I wasn't able to reproduce that part, but I suspect it is caused by the casting to non-categorical and back again. I'm working on a fix to remove this round-trip, and hopefully that will fix the bug.
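An illustrative round-trip in plain pandas/pyarrow (not the actual tiledbsoma internals; the column name and values are made up):

```python
import pandas as pd
import pyarrow as pa

obs = pd.DataFrame({"sample": pd.Series(["SRX1", "SRX2", "SRX1"], dtype="category")})

# Sufficiently large categoricals get converted to plain values before writing ...
decat = obs["sample"].astype(str)

# ... and have to be re-categorized to match the existing schema on append.
# Rebuilding the categorical from the data alone can lose the original dictionary,
# which is where a mismatch against the stored Arrow schema can creep in.
recat = decat.astype("category")

print(pa.Array.from_pandas(obs["sample"]).type)  # dictionary<values=string, ...>
print(pa.Array.from_pandas(recat).type)          # dictionary rebuilt from the data
```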
That's great news, Julia! I can reproduce the issue, so I should be able to verify whether a fix works or not. I'm currently working on clearing the datasets in question internally, so I may be able to share them soon.
Hi @jp-dark , thanks for looking into this issue! Here are links to the two test files (both small, 1 cell x 40505 genes each): https://insilicoconsultingab-my.sharepoint.com/:u:/g/personal/christian_brueffer_insilico_consulting/EWqTilqnmEJOsON1tcl_DK8BrnA81zMx_jTWE8uvUgOXog?e=EBxWLo Using
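For readers following along, the write-then-append flow being discussed looks roughly like this (a sketch from memory; the URI and measurement name are placeholders, and argument names may differ between tiledbsoma versions):

```python
import anndata as ad
import tiledbsoma.io

EXPERIMENT_URI = "my-experiment"  # placeholder
a1 = ad.read_h5ad("dataset1.h5ad")
a2 = ad.read_h5ad("dataset2.h5ad")

# Initial write
tiledbsoma.io.from_anndata(EXPERIMENT_URI, a1, measurement_name="RNA")

# Register the second dataset against the existing experiment, then append
rd = tiledbsoma.io.register_anndatas(
    EXPERIMENT_URI,
    [a2],
    measurement_name="RNA",
    obs_field_name="obs_id",
    var_field_name="var_id",
)
tiledbsoma.io.from_anndata(
    EXPERIMENT_URI, a2, measurement_name="RNA", registration_mapping=rd
)
```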
Thanks @cbrueffer! I was able to reproduce the issue with this data.
We merged in a fix that will go out with the next release.
I am hoping for our 1.16.0 sometime next week.
Thanks a lot everyone!
Describe the bug
The error when running tiledbsoma.io.from_anndata:

To Reproduce
Versions (please complete the following information):
Additional context
The data is ingested, but the error still occurs. So, I have to use:
I'm wondering if the issue is due to the 2nd dataset's gene set matching perfectly with the 1st dataset's gene set, so there are zero rows to add (empty vector).
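A quick way to check that hypothesis (a sketch; file names are placeholders):

```python
import anndata as ad

a1 = ad.read_h5ad("dataset1.h5ad")  # already ingested
a2 = ad.read_h5ad("dataset2.h5ad")  # being appended

# If the 2nd gene set is identical to the 1st, the append contributes zero new var rows
new_genes = a2.var_names.difference(a1.var_names)
new_cells = a2.obs_names.difference(a1.obs_names)
print(f"new var rows: {len(new_genes)}; new obs rows: {len(new_cells)}")
```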