QUESTION: Is GenomeFasta reverse complementing negative strand sequences on write? #13

Open
adamklie opened this issue Dec 22, 2024 · 2 comments

@adamklie
Collaborator

I noticed that in the following lines, we seem to automatically reverse complement reverse (-) strand sequences on write to Zarr:

https://github.com/ML4GLand/SeqData/blob/main/seqdata/_io/readers/fasta.py#L220
https://github.com/ML4GLand/SeqData/blob/main/seqdata/_io/readers/fasta.py#L249

I wanted to better understand the logic here. If I pass in a bed file with strand information, are we effectively writing everything as if it were on the forward (+) strand?

I think this may be a problem if so. If I read back in this same dataset, I'm going to get the (+) strand returned, but the metadata for the dataset is going to indicate that it is coming from the original strand, which could be (-).
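
To make the round-trip concern concrete, here is a minimal sketch in plain Python. It assumes the writer stores the reverse complement for (-) strand intervals; `reverse_complement`, the toy reference, and the interval are made up for illustration and are not SeqData's API.

```python
# Hypothetical helper for illustration; not SeqData's actual API.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


# Toy reference and a BED-style interval on the (-) strand (made-up values).
reference = "AAACCCGGGT"
start, end, strand = 2, 8, "-"

# Bases as they appear on the (+) strand of the reference.
forward_slice = reference[start:end]

# What would be written if (-) strand intervals are reverse complemented.
stored = reverse_complement(forward_slice) if strand == "-" else forward_slice

# A reader that wants the (+) strand bases back has to undo the RC itself,
# using the strand recorded in the metadata.
recovered = reverse_complement(stored) if strand == "-" else stored

print(forward_slice, stored, recovered)
```

Under that assumption, the stored sequence and the strand column stay consistent only if readers know the sequence has already been oriented to the annotated strand.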

@adamklie adamklie added the question Further information is requested label Dec 22, 2024
@adamklie
Collaborator Author

It also seems like we assume the positive (+) strand if the strand column isn't provided. I think this is fine, but we should document it clearly:

https://github.com/ML4GLand/SeqData/blob/main/seqdata/xarray/seqdata.py#L345
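
As a rough illustration of that default, here is a sketch of a BED line parser that falls back to `+` when the strand column (the sixth BED field) is missing. `Interval` and `parse_bed_line` are hypothetical names, and treating `.` the same as a missing strand is an assumption of this sketch, not something confirmed for SeqData.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str


def parse_bed_line(line: str, default_strand: str = "+") -> Interval:
    """Parse one tab-separated BED line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Strand is the 6th BED column. Assumption: fall back to the default
    # when the column is absent or holds anything other than "+" or "-".
    has_strand = len(fields) >= 6 and fields[5] in {"+", "-"}
    strand = fields[5] if has_strand else default_strand
    return Interval(chrom, start, end, strand)


print(parse_bed_line("chr1\t100\t200"))
print(parse_bed_line("chr1\t100\t200\tpeak1\t0\t-"))
```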

@d-laub
Collaborator

d-laub commented Dec 25, 2024

Yes, the intent is that negative-stranded sequences are reverse complemented (RC'd) and their values are reversed. Agreed that this should be documented clearly.
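
A minimal sketch of that intent, assuming a sequence and a per-base value track (for example coverage) of the same length. `orient_to_strand` is a hypothetical name used only for illustration; it is not the function in `fasta.py`.

```python
from typing import List, Sequence, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def orient_to_strand(
    seq: str, values: Sequence[float], strand: str
) -> Tuple[str, List[float]]:
    """Return (seq, values) oriented 5'->3' on the given strand."""
    if len(seq) != len(values):
        raise ValueError("seq and values must have the same length")
    if strand == "-":
        # Reverse complement the bases and reverse the values, so each
        # value stays aligned with the position it was measured at.
        return seq.translate(_COMPLEMENT)[::-1], list(values)[::-1]
    return seq, list(values)


seq, cov = orient_to_strand("ACCG", [1.0, 2.0, 3.0, 4.0], "-")
print(seq, cov)
```

Values are only reversed, not otherwise transformed, since they describe positions rather than bases.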
