Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Search] Shapes are the wrong way around #2832

Open
MitraDarja opened this issue Sep 24, 2021 · 8 comments
Open

[Search] Shapes are the wrong way around #2832

MitraDarja opened this issue Sep 24, 2021 · 8 comments
Assignees
Labels
question a user question how to do certain things

Comments

@MitraDarja
Copy link
Contributor

MitraDarja commented Sep 24, 2021

I think the kmer hash view does not work with views, which have shapes in the form 1101 or 1011.

#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/search/views/kmer_hash.hpp>

int main()
{
    using namespace seqan3::literals;

    std::vector<seqan3::dna4> const text{"ACGTC"_dna4};
    auto const shape1011 = seqan3::shape{seqan3::bin_literal{0b1011}};
    auto const shape1101 = seqan3::shape{seqan3::bin_literal{0b1101}};

    seqan3::debug_stream << "shape1011\nExpected: [11,29]\nActual  : "
                         << (text | seqan3::views::kmer_hash(shape1011)) << "\n\n";

    seqan3::debug_stream << "shape1101\nExpected: [7,25]\nActual  : "
                         << (text | seqan3::views::kmer_hash(shape1101)) << '\n';
}
shape1011
Expected: [11,29]
Actual  : [7,25]

shape1101
Expected: [7,25]
Actual  : [11,29]

My assumption is that the shift does not work correctly with these shapes, the only shapes we tested so far are shapes of the sort 1(any number of zeros)1.

@MitraDarja MitraDarja added the bug faulty or wrong behaviour of code label Sep 24, 2021
@eseiler
Copy link
Member

eseiler commented Sep 24, 2021

The values seem correct to me?

1011
ACGTC
 1011

------------------------------

16 4  1           16 4  1 
A  G  T           C  T  C
0  2  3           1  3  1

------------------------------

0 + 8 + 3 = 11    16 + 12 + 1 = 29
1101
ACGTC
 1101

------------------------------

16 4  1           16 4  1 
A  C  T           C  G  T
0  1  3           1  2  1

------------------------------

0 + 4 + 3 = 7     16 + 8 + 1 = 25

@MitraDarja
Copy link
Contributor Author

@eseiler

Maybe, I misunderstand the shape, isn't 0b1101 1101 (so your second example) and we get the result of the first and vice visa?

@eseiler
Copy link
Member

eseiler commented Sep 24, 2021

Ok, I get it now that the snippet is correct :)

The problem is that this inherits from the dynamic_bitset (which is just a constexpr-capable version of std::bitset, and implements the same API).

This means that

seqan3::dynamic_bitset t1{0b1011'1000'1111};
seqan3::debug_stream << t1 << '\n'; // 1011'1000'1111
seqan3::debug_stream << t1[1] << '\n'; // 1

I.e., it prints left to right, but the indices go right to left (unless this is the actual bug, would need to check std::bitset).

So, we could just call this->flip() in the shape-ctor https://github.com/seqan/seqan3/blob/master/include/seqan3/search/kmer_index/shape.hpp#L105
but then the shape would be printed the wrong way around, except when we overload.

@MitraDarja
Copy link
Contributor Author

Sorry, for the wrong snippet, I got myself confused. 😅

Maybe, we mention this in the documentation?

@eseiler
Copy link
Member

eseiler commented Sep 28, 2021

I reopen this because that shapes are "the wrong way around" might be a problem :)

@eseiler eseiler reopened this Sep 28, 2021
@eseiler eseiler changed the title [BUG] Kmer_hash does not work with every shape Shapes are the wrong way around Sep 28, 2021
@eseiler eseiler added question a user question how to do certain things and removed bug faulty or wrong behaviour of code labels Sep 28, 2021
@SGSSGene
Copy link
Contributor

Maybe just adding some documentation would be enough? #2981

@eseiler
Copy link
Member

eseiler commented May 16, 2022

Maybe just adding some documentation would be enough? #2981

Yes, at the very least 👍 . We can talk in the meeting about it.
Either we find that this is enough, or we want some special behaviour that makes it behave like you would expect :D

Edit:

How do shapes work in SeqAn2?

@eseiler eseiler self-assigned this Oct 20, 2022
@eseiler eseiler changed the title Shapes are the wrong way around [Search] Shapes are the wrong way around Oct 20, 2022
@smehringer
Copy link
Member

Might be of interest when discussing this seqan/product_backlog#388

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question a user question how to do certain things
Projects
None yet
Development

No branches or pull requests

4 participants