Multiple fasta sequences, but of different lengths #93

Open
SheepwormJM opened this issue Dec 3, 2024 · 1 comment

Comments

@SheepwormJM

Hi,

I think I may know the answer to my question, but I'd like to check that I'm not about to do something stupid. I'm wondering whether I need to create a multiple sequence alignment (MSA) file rather than using an unaligned multi-FASTA. My ultimate aim is to produce a haplotype network.

I have a locus which I've amplified, but which has lots of small indels within an intronic part of the sequence.

I have used the following code to load a multi-FASTA file into R, but I cannot convert it to a matrix because the sequences have different lengths. I can't see any way to load a matrix with 'NA' for gaps. I'm also not sure whether pegas would subsequently re-align the sequences, or assume they were already aligned, if the matrix padded the end of each sequence with NAs rather than placing them internally.

library("apex")
library("adegenet")
library("pegas")
library("mmod")
library("poppr")

# To get a SINGLE fasta file in: 
myseq<-read.FASTA("ASV_multifasta.fa")
myseq # Provides the summary information of the file
2172 DNA sequences in binary format stored in a list.

Mean sequence length: 329.453 
   Shortest sequence: 294 
    Longest sequence: 350 

Labels:
ASV3 BO_04_M
ASV3 BO_04_M
ASV3 BO_04_M
ASV3 BO_04_M
ASV3 BO_04_M
ASV3 BO_04_M
...

Base composition:
    a     c     g     t 
0.282 0.204 0.190 0.324 
(Total: 715.57 kb)
# We need to make it as a matrix:
myseqmatrix<-as.matrix(myseq)

Then I get an error telling me it won't work because the sequences have different lengths.

Error in as.matrix.DNAbin(myseq) : 
  DNA sequences in list not of the same length.
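
The error can be reproduced on toy data, which may help confirm that unequal lengths are the only cause. This is a minimal sketch: the three sequences and their names are made up for illustration and are not the real ASV data.

```r
library(ape)

# Hypothetical unaligned sequences of different lengths (not the real data)
seqs <- as.DNAbin(list(
  s1 = c("a", "c", "g", "t", "a", "c"),
  s2 = c("a", "c", "g", "t"),
  s3 = c("a", "c", "g", "t", "a")
))

# Length of each sequence; more than one distinct value means the
# list cannot be turned into a matrix as it stands
len <- sapply(seqs, length)
print(table(len))

# Capture the error message instead of stopping the script
result <- tryCatch(as.matrix(seqs), error = function(e) conditionMessage(e))
print(result)
```

Padding the ends with missing values would make the lengths match but would not place the indels where they belong, so the usual route is to align first. ape ships wrappers such as `clustal()` and `muscle()` for this, which need the corresponding external program installed.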

If I make a multi-FASTA file that contains each sequence as it appears in a multiple sequence alignment (gaps included) rather than the raw sequence, would that work for pegas and a haplotype network? Or would it change the output? Would the conversion to a matrix even work?

What do people do with uneven sequence lengths?

Many thanks!

@SheepwormJM
Author

Ah sorry, I had totally missed issue #92 raised by FischHa. From looking at their data, I'm assuming that loading an MSA might work to get the sequences into a matrix. So I guess my remaining question is whether it will then work to plot a haplotype network, or whether indels etc. will be discarded?

Thanks :)
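
For reference, a minimal sketch of the aligned-input route asked about above, using a made-up toy alignment rather than the real data. As far as the pegas documentation describes it, `haploNet()` defaults to an uncorrected distance with pairwise deletion of missing data, so gap positions may not contribute to the network unless a custom distance is passed through its `d` argument. The combined distance below (substitutions plus one unit per indel block, via the `"indelblock"` model of `dist.dna()`) is an assumption chosen for illustration, not a recommendation from the pegas maintainers.

```r
library(ape)
library(pegas)

# Hypothetical toy alignment standing in for an aligned FASTA file.
# With real data, read.dna("aligned.fa", format = "fasta") (file name
# hypothetical) should give a matrix when all aligned sequences have
# the same length.
aln <- as.DNAbin(rbind(
  s1 = c("a", "c", "g", "t", "a", "c", "g", "t", "a", "c"),
  s2 = c("a", "c", "g", "t", "a", "c", "g", "t", "a", "c"),
  s3 = c("a", "c", "g", "-", "-", "c", "g", "t", "a", "c"),  # internal indel
  s4 = c("a", "c", "g", "t", "a", "t", "g", "t", "a", "c"),
  s5 = c("a", "c", "g", "t", "a", "t", "g", "t", "a", "a"),
  s6 = c("t", "c", "g", "t", "a", "c", "g", "t", "a", "c")
))

# Collapse identical sequences into haplotypes
h <- haplotype(aln)
print(h)

# Assumed custom distance: substitution count (gap sites skipped pairwise)
# plus the number of indel blocks that differ between two haplotypes
d <- dist.dna(h, "N", pairwise.deletion = TRUE) + dist.dna(h, "indelblock")

# Build and draw the network; circle size reflects haplotype frequency
net <- haploNet(h, d = d)
plot(net, size = attr(net, "freq"))
```

Whether indels should count as characters at all, and how heavily, is a judgment call for the locus in question; comparing the network with and without the indel term would show how much they change the picture.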
