gunzip won't unzip a symlink #2
Comments
Interesting.
I did a little test:
[RichardsMacBook2021:Projects/Darwin/SAMPLES] rd% ln -s FOOBAR.gz ZZ.gz
[RichardsMacBook2021:Projects/Darwin/SAMPLES] rd% gunzip ZZ.gz
gunzip: ZZ.gz is not a regular file
[RichardsMacBook2021:Projects/Darwin/SAMPLES] rd% gzip -dc ZZ.gz > ZZ
So you are correct that gunzip will not uncompress a symlink, but you can write a temporary unzipped file with gzip -dc even from a symlink.
I prefer this solution anyway. Gene can write the temporary file into the local temp directory, which he has anyway for other operations, either /tmp by default or set by user. Then when he is done he can remove this file. That doesn’t touch the original. I suggest this pattern in all cases where he currently decompresses. It is not nice to decompress the input file in situ.
Richard
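For illustration, the temp-file pattern suggested above could look roughly like the sketch below. It is not FastK's code: the scratch directory name, the file contents, and the `ZZ.unpacked` name are assumptions made up for the demo. It only shows that `gzip -dc` reads through a symlink and that the output location can be chosen freely.

```shell
#!/bin/sh
# Sketch: decompress a (possibly symlinked) .gz into a temp directory,
# use the copy, then remove it. The original is never modified.
set -eu

# Scratch directory for the demo: $TMPDIR if set, /tmp otherwise.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/gzdemo.XXXXXX")
trap 'rm -rf "$workdir"' EXIT

# Build a small gzipped file and a symlink to it
# (stand-ins for FOOBAR.gz and ZZ.gz in the test above).
printf 'ACGT\nTTGA\n' > "$workdir/FOOBAR"
gzip "$workdir/FOOBAR"                 # leaves FOOBAR.gz
ln -s FOOBAR.gz "$workdir/ZZ.gz"

# gzip -dc follows the symlink and writes to stdout,
# so the redirect decides where the unpacked copy lives.
tmpfile="$workdir/ZZ.unpacked"         # hypothetical temp file name
gzip -dc "$workdir/ZZ.gz" > "$tmpfile"

# Stand-in for the real work done on the unpacked copy.
lines=$(wc -l < "$tmpfile" | tr -d ' ')
echo "unpacked $lines lines"

# Done: remove the temporary copy; FOOBAR.gz and ZZ.gz stay as they were.
rm -f "$tmpfile"
```

Because the unpacked copy is written under the scratch directory, this also works when the directory holding the `.gz` file is read-only.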
… On 1 Apr 2021, at 14:04, Bob Harris ***@***.***> wrote:
(This doesn't seem like it needs to be addressed in the short term, if at all).
I tried "FastK orange.fa.gz" where orange.fa.gz was a symlink. The result is
gunzip: ./orange.fa.gz is not a regular file
FastK: Cannot get stats for ./orange.fa
Apparently gunzip doesn't like symlinks, so it fails to create the unzipped file. The downstream code in FastK, I guess, doesn't notice that gunzip failed, but a later sanity check notices the unzipped file doesn't exist.
This would be an issue for the use case where the user has read access to a shared directory of gzipped read data or assemblies, but doesn't have write access. They can't give FastK a path to the original (because, I presume, it would try to write the unzipped file in that directory). Traditionally a symlink would be the 'right' solution, to avoid wasted disk space. But perhaps disk deduplication technology makes this less of an issue?
I suspect this is only an issue for gzip'd files. I assume for the other compressed formats you are able to decompress on-the-fly and don't need to write an uncompressed file.
Bob H
|
Yep. In my own stuff I usually use gzip -dc and pipe it into the tool, so there's no file created. When the program expects a filename, this can still be accomplished with But those only give the program the uncompressed file as a read-once stream, anyway. |
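A rough sketch of the streaming approach, with `wc` and `grep` standing in for "the tool". The named pipe in the second half is one POSIX way to hand a stream to a program that insists on a filename; it is offered as an assumption and not as a reconstruction of what this comment originally showed. File names here are invented for the demo.

```shell
#!/bin/sh
# Sketch: feed decompressed data to a tool without creating an unpacked file.
set -eu

workdir=$(mktemp -d "${TMPDIR:-/tmp}/gzstream.XXXXXX")
trap 'rm -rf "$workdir"' EXIT

# Small gzipped input for the demo.
printf 'ACGT\nTTGA\nGGCC\n' | gzip -c > "$workdir/reads.gz"

# 1) Plain pipe: the tool reads standard input.
piped=$(gzip -dc "$workdir/reads.gz" | wc -l | tr -d ' ')

# 2) Named pipe (FIFO): the tool is given a path, but the data behind it
#    is still a read-once stream, so seeking or re-reading will not work.
mkfifo "$workdir/reads.fifo"
gzip -dc "$workdir/reads.gz" > "$workdir/reads.fifo" &
named=$(grep -c '' "$workdir/reads.fifo")   # counts lines read from the FIFO
wait                                        # let the background gzip finish

echo "pipe: $piped lines, fifo: $named lines"
```

Either way the program sees the data only once, front to back, which is why this does not help a tool that wants to split one file among several threads.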
Hmmm. Yes, it's only an issue for gzip'd files, and only in the case when there are fewer than 3. So you are hitting the jackpot for corner cases.
If there are more than 3 gzip files, then I assign a thread to each one and stream the entire file, uncompressing it as I go. But if there are just one or two gzip files, then I try to unzip them first and then assign as many threads as specified on *parts* of the uncompressed files.
I will look a bit deeper into this and try to fix it so that when I can't do the unpack for some reason, I just do the best I can with a single thread. If I may, I will put it on my todo list, and hopefully for now you are willing to work around this?
Thanks, Gene
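The fallback described above (try to unpack, and stream if the unpack fails) might be sketched in shell as follows. This is only an illustration of the control flow, not FastK's implementation: `count_records` is a hypothetical helper, line counting stands in for the real work, and the missing scratch directory is just a convenient way to make the unpack fail. The point is that the exit status of the unpack step is checked, instead of discovering later that the file does not exist.

```shell
#!/bin/sh
# Sketch: unpack to a scratch file when possible, otherwise stream.
set -u

workdir=$(mktemp -d "${TMPDIR:-/tmp}/gzfallback.XXXXXX")
trap 'rm -rf "$workdir"' EXIT

# Demo input: a gzipped file reached through a symlink.
printf 'ACGT\nTTGA\n' | gzip -c > "$workdir/orange.fa.gz"
ln -s orange.fa.gz "$workdir/link.fa.gz"

# count_records INPUT SCRATCHDIR   (hypothetical helper)
# Prints "unpacked N" if INPUT could be unpacked into SCRATCHDIR,
# or "streamed N" if the unpack failed and the data was streamed instead.
count_records() {
    input=$1
    unpacked="$2/unpacked.$$"
    # The braces let 2>/dev/null also hide the shell's own complaint
    # when the output file cannot be created.
    if { gzip -dc "$input" > "$unpacked"; } 2>/dev/null; then
        n=$(wc -l < "$unpacked" | tr -d ' ')
        rm -f "$unpacked"
        echo "unpacked $n"
    else
        rm -f "$unpacked"
        n=$(gzip -dc "$input" | wc -l | tr -d ' ')
        echo "streamed $n"
    fi
}

ok=$(count_records "$workdir/link.fa.gz" "$workdir")
fallback=$(count_records "$workdir/link.fa.gz" "$workdir/no-such-dir")
echo "$ok"
echo "$fallback"
```

In the real tool the "unpacked" branch is where several threads could work on parts of the file, while the "streamed" branch corresponds to the single-thread best effort.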
|
I'm absolutely willing to work around this! I only reported it so it would be a known issue. (See the first line in the first post). I'm probably hitting the edge cases because, as I get my feet wet with the package, I've been running small, not-very-realistic test cases so I can quickly understand how the pieces of the package fit together. I'm just about done with that now. |
Thanks Richard, you gave me enough insight to fix this problem. FastK now uses -c (I thought -k would signal the [...]
Bob, Richard, let me know if there are any further problems. |