
BCF code makes no attempt to account for endianness #355

Closed
jmarshall opened this issue Mar 16, 2016 · 23 comments · Fixed by #1023

Comments

@jmarshall
Member

jmarshall commented Mar 16, 2016

The vcf.c bcf_hdr_read() and bcf_hdr_write() functions work only on a little-endian host — hlen is read/written assuming that the host and on-disk representations are the same.

(I have not audited the other BCF code, e.g., record reading/writing.)

See also bcftools/vcfconcat.c's naive_concat() which similarly reads hlen without accounting for endianness.

@jmarshall
Member Author

The particular hlen bugs previously identified appear to have been ameliorated by @daviesrob's 37c85e4 and 70622cf, which presumably were part of PR #459.

Has the BCF code been further audited to check whether BCF I/O is now free from endianness bugs?

@jkbonfield
Contributor

We did have functioning big-endian variants of htslib back in 2014 (e.g. see comments in #99), but I don't know what happened to those PRs. I know I did a SPARC branch, which was subsequently replaced by Rob's complete rewrite, but I'm not sure either ever made it in as completed work.

Realistically it all needs retesting to be sure.

@jmarshall
Member Author

While the HTSlib code is improved (as previously noted: #355 (comment)), note for example over in BCFtools on current develop bcftools/vcfconcat.c's naive_concat() still reads hlen without accounting for endianness.

@jkbonfield
Contributor

See https://github.com/daviesrob/htslib/tree/SPARC

I'm not sure why this work was never put in. I did a lot of work on this, subsequently followed up by Rob, but no PR ever got made. Odd.

@daviesrob
Member

sam/bam/cram reading should be OK, but there are almost certainly issues with vcf/bcf.

Unfortunately my big-endian testing platform (a NetBSD SPARC VM) lost its ability to talk to GitHub (maybe endian issues in some of the newer encryption code?), so I haven't been able to do much on this recently.

@jmarshall
Member Author

Please don't get sidetracked at this time 😄

That comment was just to record that the BCFtools situation is unchanged, with trivial endianness infelicities.

@junaruga
Contributor

Shall we add a big-endian case (arch: s390x) to Travis, to make this kind of issue easier to debug and maintain? The feature was announced yesterday:
https://blog.travis-ci.com/2019-11-12-multi-cpu-architecture-ibm-power-ibm-z

@daviesrob
Member

It looks like a Travis alpha feature, so I'm not sure if we would want to do that at the moment (and it would make all our builds break due to the existing endianness issues). But it might be worth considering in the future.

@junaruga
Contributor

it would make all our builds break due to the existing endianness issues

Why do you think the feature would break all the (existing) builds?
If we have unit tests verifying each supported architecture, adding the s390x case does not break the existing test cases running on x86_64 (amd64) in Travis.

@daviesrob
Member

It won't break the amd64 test cases, but it will break the build as a whole. Once the remaining issues have been tracked down and fixed it will be useful to turn this on so that they don't regress, but it would be rather distracting at the moment.

@junaruga
Contributor

but it will break the build as a whole.

No, it does not break the build as a whole.
There is an allow_failures setting to ignore the result of the s390x test case, and a fast_finish: true option so the build can finish without waiting for the s390x case.

I sent a pull-request just in case.
#970
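The options described above look roughly like this in .travis.yml (a sketch only, not necessarily the exact contents of #970):

```yaml
arch:
  - amd64
  - s390x          # big-endian job added alongside the usual builds

jobs:
  fast_finish: true        # report the build result without waiting for slow jobs
  allow_failures:
    - arch: s390x          # an s390x failure does not fail the build as a whole
```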

@junaruga
Contributor

It looks like a Travis alpha feature, so I'm not sure if we would want to do that at the moment (and it would make all our builds break due to the existing endianness issues). But it might be worth considering in the future.

I verified the Travis multiple-CPU-architecture feature today, using the following repository, which is a simple example of the feature:
https://github.com/junaruga/ci-multi-arch-native-test

So far, I have not seen a big problem. The performance is slow, and I occasionally saw a job fail to start, as reported on the Travis community forum:
https://travis-ci.community/c/environments/multi-cpu-arch

@junaruga
Contributor

My suggesting this feature was pushy rather than inclusive, so I closed the PR.
I am sorry about that. Please feel free to pick it up whenever the timing suits you in the future.

@mr-c

mr-c commented Dec 16, 2019

I can confirm that the 1.10 release causes a regression for s390x https://buildd.debian.org/status/logs.php?pkg=htslib&arch=s390x

@mp15
Member

mp15 commented Jan 29, 2020

We should probably consider having another look at this issue as it is now preventing 1.10 from progressing into Ubuntu Focal Fossa too. This indeed seems to be a regression from 1.9. I'm inclined to agree we should be testing this from time to time as we aren't noticing these breaks.

@jmarshall
Member Author

The BCF code still doesn't work on big-endian hosts (for example, bcf_read1_core() makes no attempt to account for endianness) and never has done, so this is not a library regression per se. Presumably there has been a test added more recently (after 1.9 perhaps) that now exposes the failure — which could be rather impractically termed a test regression.

Samtools 1.10-3 had been stuck on Debian due to something downstream wanting to build on s390x but that has now been walloped and 1.10-3 is now in Debian testing apparently. Perhaps this means it will soon progress in Ubuntu too (I am not familiar with how closely Ubuntu packages track Debian packages these days…).

So the choices would be:

  1. Fix the remaining endianness bugs. Obviously the ideal and moral approach, but is anyone really motivated to spend time making these tools work on s390x? Does anyone actually run bioinformatics analyses on big-endian hosts?

  2. Have distro packagers disable htslib/samtools/bcftools on s390x and anything that depends on them (unfortunately that's a pretty long tail!). Maybe on big-endian targets, HTSlib should even fail at configure time (too harsh really as the BAM/CRAM code is in pretty good shape) or just unconditionally return -5 from bcf_read1_core() etc?

  3. Identify the failing test and comment it out, so distro packagers can continue to pretend everything is fine on s390x like they used to. 😄

@mr-c

mr-c commented Jan 30, 2020

On Debian, the s390x build of htslib (and all packages that depend on htslib, and so forth) was removed from the testing distribution. We still attempt to build on s390x, but it no longer blocks migrations. So this is a version of option 2, though other distributions will have to put in similar work.

However, option 1 would be better.

@daviesrob
Member

I've fired up a big-endian machine and will see how far I get.

(Currently that's not far - it failed while testing tabix...)

@jmarshall jmarshall changed the title BCF header code makes no attempt to account for endianness BCF code makes no attempt to account for endianness Jan 30, 2020
@LocutusOfBorg

After a git bisect, it looks like the first commit introducing the issue is
418a183

@jkbonfield
Contributor

Interesting. There are lots of changes in that commit to things like bcf_read1_core, but neither before nor after it is there any attempt at endianness correction. I suspect VCF support never worked on big-endian systems, rather than being broken by that specific commit.

@daviesrob
Member

See #1023 which sorts out all of the remaining issues I've found.

@nealef

nealef commented May 27, 2020

In case this is of interest, I built from master on s390x today:

Number of tests:
    total   .. 154
    passed  .. 154
    failed  .. 0

@jkbonfield
Contributor

Thanks for the independent confirmation. We now have s390x in our Travis config too.
