Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read Length fix #78

Closed
wants to merge 1 commit into from
Closed

Read Length fix #78

wants to merge 1 commit into from

Conversation

kachulis
Copy link

Max read length is tested against alignment span (end - start), so can end up being incorrect. This PR fixes this, but testing against true read length.

@agraubert
Copy link
Collaborator

alignmentSize is really only used for checking an edge case in legacy mode.
You may notice that on line 277 we use alignmentSize rather than alignment.Length() for overriding the readLength, which I also admit is strange, but I can't convince myself that it changes the behavior in any meaningful way (other than perhaps overwriting readLength more often than is necessary). readLength itself is then only used at the end where it is reported directly.

@agraubert agraubert closed this May 17, 2023
@kachulis
Copy link
Author

@agraubert I think that is correct, IF the data being used has a constant read length. If that is not the case, due to trimming, or a technology which produces variable length reads, then the readLength metric will end up being somewhat randomly chosen from the distribution of observed read lengths.

I guess the question is what a good behavior would be in these variable read length cases. This PR is more intended to address the trimming case, with the thought being that probably the length of the longest observed read is the most meaningful value for "read length", with all shorter reads having been trimmed down from there. In the case where the sequencing tech itself produces variable read length, this still doesn't quite work, but I'm not sure there's really a good option for representing read length with a single metric in that case anyway.

@agraubert
Copy link
Collaborator

I see your point. It's intended to be the maximum observed read length, but as currently implemented, I suppose it's the read length of the maximum spanning read. We never ran into problems during testing because this was designed against fixed-length 150bp illumina reads. I'll patch that line when I get a chance

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants