Duplicated Sequence for MJN #87

Chatchamew · 2024-03-11T15:50:23Z

Excuse me, Mr. Emmanuel. I am trying to use your function mjn(), but it always tells me “maybe there are duplicated sequences in your data”. My data is a set of 108 sequences, some of which are identical to each other. Does this mean that each haplotype must be represented by only one sequence? Or am I missing something? Thank you in advance.

emmanuelparadis · 2024-03-12T04:36:01Z

Hello,
Try the function haplotype() (also in pegas) on your 108 sequences. Once you have done that, you can check that all sequences are distinct with:

h <- haplotype(<<your sequence data name>>)
all(dist.dna(h, "n") > 0)

If the result is TRUE, you should be able to use mjn(h).
Best,
E.
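
For readers who want to see this check on a small case, here is a minimal sketch. The toy alignment and the object name x are made up for illustration; only haplotype() and dist.dna() come from the reply above.

```r
library(ape)
library(pegas)

# Made-up alignment of four sequences; s2 is an exact copy of s1.
x <- as.DNAbin(rbind(
  s1 = c("a", "c", "g", "t", "a"),
  s2 = c("a", "c", "g", "t", "a"),
  s3 = c("a", "c", "g", "t", "c"),
  s4 = c("a", "t", "g", "t", "a")))

# Collapse identical sequences into haplotypes.
h <- haplotype(x)

# Count of differing sites between every pair of haplotypes.
d <- dist.dna(h, "N")

# TRUE means no two haplotypes are at distance 0.
all(d > 0)
```

With gap-free data like this, the duplicate s2 is merged into the same haplotype as s1, so the check passes. Gaps and ambiguous bases are where it can fail.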

Chatchamew · 2024-03-12T08:52:37Z

It came out as “FALSE”. I have already used “strict = TRUE” in the haplotype() function. By the way, some haplotypes differ only by deletions. Is this also why mjn() didn’t work?

emmanuelparadis · 2024-03-12T11:26:46Z

I suggest you try:

all(dist.dna(h, "n", pairwise.deletion = TRUE) > 0)

And also:

image(h)

It seems you have read the help page ?haplotype, so you understand that trailing/leading gaps are a problem when identifying haplotypes. The above commands should help you to assess the situation with your data.
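
A small sketch of how gaps interact with this check, using a made-up three-sequence alignment (the names x, a, b and c are hypothetical). It assumes that dist.dna() with model "N" does not count a site as a difference when one of the two sequences has a gap there, whether or not pairwise.deletion is set, so two sequences that differ only by a gap end up at distance 0. The last lines list which pairs are at distance 0.

```r
library(ape)

# Made-up alignment: 'b' differs from 'a' only by an internal gap.
x <- as.DNAbin(rbind(
  a = c("a", "c", "g", "t", "a", "c"),
  b = c("a", "c", "-", "t", "a", "c"),
  c = c("a", "c", "g", "t", "t", "c")))

# Default: any column containing a gap is dropped for all pairs.
d_global <- dist.dna(x, "N")

# Pairwise deletion: a gap site is skipped only for the pair concerned.
d_pair <- dist.dna(x, "N", pairwise.deletion = TRUE)

# List the pairs that are at distance 0 despite being different rows.
m <- as.matrix(d_pair)
zero <- which(m == 0 & upper.tri(m), arr.ind = TRUE)
zero_pairs <- data.frame(seq1 = rownames(m)[zero[, "row"]],
                         seq2 = colnames(m)[zero[, "col"]])
zero_pairs
```

Running the same listing on real haplotypes shows which pairs cause all(... > 0) to return FALSE.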

Chatchamew · 2024-03-12T12:09:20Z

I have tried all() and the result came up as “FALSE”, unfortunately. I have also tried image() and there are several gaps and degenerate bases (namely N and Y). I also tried trailingGapAsN = TRUE, and the all() still came up as “FALSE”. Any advice? I’m so sorry for wasting your time.

emmanuelparadis · 2024-03-13T10:25:34Z

It seems you have a very difficult data set, so your inferences will be necessarily limited.

Chatchamew · 2024-03-13T10:43:14Z

Roger that. I have dived into the data and found out that all pairs of haplotypes that have dist.dna = 0 differ only in either deletion or insertion. I really wish that you might try giving us an option to use mjn() despite the deletion, because the haplotype() function can separate them nicely.

emmanuelparadis · 2024-03-13T11:10:00Z

Have a look at this function in ape: DNAbin2indel. With it, you can then create a binary matrix indicating presence/absence of indels. pegas::mjn() can also analyse binary (0/1) data.
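
A rough sketch of that route, on a made-up alignment. It assumes DNAbin2indel() returns a matrix holding the gap length at the first site of each gap and 0 elsewhere; the object names (x, ind, b) and the steps that drop empty columns and duplicated rows are illustrative choices, not part of the suggestion above.

```r
library(ape)

# Made-up alignment: 'b' and 'c' each carry one gap, 'a' and 'd' have none.
x <- as.DNAbin(rbind(
  a = c("a", "c", "g", "t", "a", "c"),
  b = c("a", "c", "-", "-", "a", "c"),
  c = c("a", "-", "g", "t", "a", "c"),
  d = c("a", "c", "g", "t", "a", "c")))

# Gap start positions (assumed: gap length at the first gap site, else 0).
ind <- DNAbin2indel(x)

# Turn the lengths into presence/absence (1/0) of a gap starting there.
b <- (ind > 0) * 1L
rownames(b) <- rownames(x)

# Keep only columns where at least one gap starts.
b <- b[, colSums(b) > 0, drop = FALSE]

# Rows with the same indel pattern are duplicates for this purpose.
b <- unique(b)
b

# Possible next step, not run here:
# net <- pegas::mjn(b)
```

Note that this matrix carries only indel information; the substitutions are not part of it.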

Chatchamew · 2024-03-14T05:56:23Z

I have tried a subset of my data with only sixteen sequences. I have checked the all(dist.dna()) function. If I used pairwise.deletion = TRUE, the all() function came up as TRUE. If I used pairwise.deletion = FALSE, the all() function came up as FALSE. When I used mjn(), it said “duplicate” again. Is there anything I can do?

By the way, can mjn() consider both base differences and deletions/insertions at the same time? I have tried DNAbin2indel(), and the matrix I got only covers deletions/insertions. Can I combine this matrix with a base difference matrix and pass it to mjn()?

emmanuelparadis · 2024-03-21T09:37:05Z

Two other functions from ape that could help you with your data: latag2n() and solveAmbiguousBases() (maybe you already found them in the meantime).

You can try rmst() (also in pegas): it requires distances (unlike mjn()). It's not the same algorithm of course but it can sometimes give the same network (see the last example in ?rmst).
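
A minimal sketch of the rmst() route on a made-up alignment. Adding the substitution count to an indel count from dist.dna(model = "indelblock") is an assumption of this sketch, one possible way to make sequences that differ only by a gap distinct; it is not something proposed in this thread. All object names are hypothetical.

```r
library(ape)
library(pegas)

# Made-up alignment: 'b' differs from 'a' only by an internal gap,
# and 'e' starts with a leading gap.
x <- as.DNAbin(rbind(
  a = c("a", "c", "g", "t", "a", "c", "g", "t"),
  b = c("a", "c", "-", "t", "a", "c", "g", "t"),
  c = c("a", "c", "g", "t", "t", "c", "g", "t"),
  d = c("a", "t", "g", "t", "t", "c", "g", "t"),
  e = c("-", "c", "g", "t", "a", "c", "g", "a")))

# Recode leading/trailing gaps as N so they are treated as missing data.
x <- latag2n(x)

# Number of substitutions, ignoring sites missing in either sequence.
d_sub <- dist.dna(x, "N", pairwise.deletion = TRUE)

# Number of indel events (contiguous gaps counted as one unit).
d_indel <- dist.dna(x, "indelblock")

# Combined distance (an assumption of this sketch, see above).
d <- as.dist(as.matrix(d_sub) + as.matrix(d_indel))

# Randomized minimum spanning tree; quiet = TRUE suppresses progress output.
net <- rmst(d, quiet = TRUE)
net
```

The result can be drawn with plot(net). Unlike mjn(), this network contains no median vectors.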

Chatchamew · 2024-03-22T04:33:50Z

Thank you so much for your response. I am now using rmst(). The only problem I have is that it doesn't generate median vectors. I will try to tackle more sequences in the future. By the way, I would love to try an mjn() that considers both base differences and base deletions/insertions at the same time. That would be revolutionary!

emmanuelparadis · 2024-04-08T01:14:56Z

> By the way, I would love to try mjn() that considers both base difference and base deletion/insertion at the same time. That would be revolutionary!

One difficulty is that indels within sequences and indels at the head/tail of sequences should be treated differently. Another is that indels may also be accompanied by substitutions (either before the deletion or after the insertion). This could affect the time-reversibility of the model, but I'm not sure how critical this is for the median vectors.
The revolution has to wait a bit!

emmanuelparadis mentioned this issue Oct 28, 2024

Haplotype network gaps #92

Open
