Nondeterministic TBI index creation #213

Closed
holtgrewe opened this issue Oct 24, 2023 · 1 comment
holtgrewe commented Oct 24, 2023

I believe the use of HashMap in the following place in noodles-csi/src/index/reference_sequence.rs creates nondeterministic behaviour when creating TBI indices. I observe this when indexing the same .vcf.gz file multiple times and comparing the resulting binary index files: they are not byte-for-byte identical.

/// A CSI reference sequence.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct ReferenceSequence {
    bins: HashMap<usize, Bin>,
    linear_index: Vec<bgzf::VirtualPosition>,
    metadata: Option<Metadata>,
}

Edit -- indexmap::IndexMap would probably be a better choice here?
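
To illustrate the suspected cause, here is a minimal sketch using only the standard library. It assumes a simplified layout in which each bin is a `(usize, u64)` pair; `serialize_bins` and `build` are hypothetical helpers, not noodles functions. Because `std::collections::HashMap` uses a randomly seeded hasher by default, two maps with identical contents compare equal but may iterate in different orders, so anything written in iteration order may differ between runs.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for index serialization: writes every
/// (bin id, payload) pair in the map's iteration order.
fn serialize_bins(bins: &HashMap<usize, u64>) -> Vec<u8> {
    let mut out = Vec::new();
    for (id, payload) in bins {
        out.extend_from_slice(&(*id as u32).to_le_bytes());
        out.extend_from_slice(&payload.to_le_bytes());
    }
    out
}

/// Builds the same logical contents every time it is called.
fn build() -> HashMap<usize, u64> {
    // Each `HashMap::new()` gets its own randomly seeded hasher.
    let mut bins = HashMap::new();
    for id in 0..64usize {
        bins.insert(id, (id as u64) * 100);
    }
    bins
}

fn main() {
    let a = build();
    let b = build();

    // Equality ignores iteration order, so the maps compare equal ...
    assert_eq!(a, b);

    // ... but the serialized bytes are not guaranteed to match.
    let same = serialize_bins(&a) == serialize_bins(&b);
    println!("identical bytes: {same}");
}
```

This would also explain why the derived `PartialEq` on the struct does not catch the difference: the in-memory values are equal, only the written order varies.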

@zaeleus zaeleus added the csi label Oct 24, 2023
@zaeleus zaeleus self-assigned this Oct 24, 2023
@zaeleus zaeleus added the enhancement New feature or request label Oct 24, 2023

zaeleus commented Oct 24, 2023

Agreed, this can be changed to an ordered map to preserve insertion order, which will allow indices to be (re)serialized in the same way. Thanks for the suggestion!
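
As a rough sketch of why an insertion-ordered map makes the output reproducible, the following uses a tiny hand-rolled ordered container built from standard library types. `OrderedBins` and `build_ordered` are hypothetical names for illustration only; this is a stand-in for an ordered map such as `indexmap::IndexMap`, not the change made in noodles. The `HashMap` here is used only for lookups and is never iterated, so its random seed cannot affect the written bytes.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal insertion-ordered map of bins.
#[derive(Default)]
struct OrderedBins {
    /// (bin id, payload) pairs in insertion order.
    entries: Vec<(usize, u64)>,
    /// bin id -> position in `entries`; used for lookup only.
    positions: HashMap<usize, usize>,
}

impl OrderedBins {
    /// Inserts a bin, or updates the payload in place if the id exists.
    fn insert(&mut self, id: usize, payload: u64) {
        match self.positions.get(&id).copied() {
            Some(i) => self.entries[i].1 = payload,
            None => {
                self.positions.insert(id, self.entries.len());
                self.entries.push((id, payload));
            }
        }
    }

    fn get(&self, id: usize) -> Option<u64> {
        self.positions.get(&id).map(|&i| self.entries[i].1)
    }

    /// Writes bins in insertion order, independent of any hasher seed.
    fn serialize(&self) -> Vec<u8> {
        let mut out = Vec::new();
        for (id, payload) in &self.entries {
            out.extend_from_slice(&(*id as u32).to_le_bytes());
            out.extend_from_slice(&payload.to_le_bytes());
        }
        out
    }
}

/// Builds the same bins in the same order on every call.
fn build_ordered() -> OrderedBins {
    let mut bins = OrderedBins::default();
    for id in [4681usize, 585, 73, 9, 1, 0] {
        bins.insert(id, id as u64 * 100);
    }
    bins
}

fn main() {
    // Same insertion order in, same bytes out.
    assert_eq!(build_ordered().serialize(), build_ordered().serialize());
    println!("ok");
}
```

The output is only as deterministic as the insertion order itself, so this helps when bins are always added in the same sequence for a given input file.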
