Calling tabix on S3 files via jupyter notebook fails #1625
Comments
The "Resource temporarily unavailable" message comes from You could try running Otherwise, if I were debugging this I'd attempt to find out what's going on at the point where
Did you manage to get any more information on this problem?
Hi, I tried increasing the verbosity, but I don't think there is much that's helpful there, the connection just seems to close (I replaced sensitive info and hashes by If it's indeed not much help, I can go on adding tracers and recompiling tabix, but this is likely to take a long time on my side because I have to juggle between several projects at the moment.
Unfortunately there's not much to go on there. It does look like the connection got closed before the "Invalid BGZF header" message was printed, but it's not clear why. It looks like it hangs up very quickly. Is that the case? One other mystery is that the request said:
but the error shows offset 3565615. Where did the 677 go? Was it lost when copying the messages over to GitHub? Is the version of libcurl you're using the same when running on the command line and in the Jupyter notebook? 7.58.0 is quite old now, so it might be worth trying a more up-to-date version. Apart from that, I fear you may have to resort to using
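One way to follow up on the command line versus Jupyter question is to compare the environment each process actually sees, since a different `PATH` or `LD_LIBRARY_PATH` could pull in a different libcurl. The sketch below is only an illustration and not part of htslib: the helper names `parse_env` and `diff_env` are made up here, and it assumes you have captured the output of `env` in both contexts (for example via `system("env")` from R in each one).

```python
def parse_env(text):
    """Parse the output of the `env` command into a dict.

    Lines without '=' (e.g. continuation lines of multi-line values)
    are skipped, so multi-line values are only partially captured.
    """
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            env[name] = value
    return env


def diff_env(shell_env, kernel_env):
    """Return {name: (shell_value, kernel_value)} for variables that differ.

    A variable that is missing on one side is reported as None there.
    """
    diffs = {}
    for name in sorted(set(shell_env) | set(kernel_env)):
        shell_value = shell_env.get(name)
        kernel_value = kernel_env.get(name)
        if shell_value != kernel_value:
            diffs[name] = (shell_value, kernel_value)
    return diffs
```

Feeding the two captured `env` dumps through `parse_env` and then `diff_env` lists only the variables that differ, which is usually a much shorter list to read through than the full environment.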
Hopefully this was fixed by #1676. Please add a comment if you're still having problems.
I am experiencing a very peculiar issue. I am using a Jupyter notebook to test some R code that calls tabix to query some files stored in S3 buckets.
The following command is an example of what I would be running on the bash command line:
The above command works. Now wrapping it in R (4.1.2):
This also works.
But running the exact same code in a Jupyter notebook with IRKernel crashes. The kernel log reports the following:
I am querying hundreds of files, but this happens only for a handful of them, systematically, for seemingly any region. I tried reindexing the file, but the bug still occurs. I tried with htslib 1.12 and the latest 1.17, with the same result.
This is probably going to be difficult to debug because there are many moving parts: tabix, R, and Jupyter. Do you have any idea why calling tabix specifically from Jupyter would cause the error above? I cannot send the tsv files as they are confidential (and very large), but I am happy to do more exploration/debugging if you tell me what to do.
May be related to #1037.
TIA!