Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New: added Barcode qualities keeping in FastqToBam #932

Closed
wants to merge 1 commit into from

Conversation

galaxy001
Copy link

For issue #931.

This is my first Java code.
The unit test is a bit difficult for me. I tried on real data with and without -b, the output bam seems fine.

Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@galaxy001: this is fantastic, and you've done a really good job.

I'd change the command line option to be a boolean (default false) to opt-in to adding the sample barcode base qualities, and then hardcode the tag as QT as per the SAM spec (just like BC). Thoughts?

And then once we have this approved, I can add a unit test so don't worry about that.

@@ -92,6 +92,7 @@ class FastqToBam
@arg(flag='s', doc="If true, queryname sort the BAM file, otherwise preserve input order.") val sort: Boolean = false,
@arg(flag='u', doc="Tag in which to store molecular barcodes/UMIs.") val umiTag: String = ConsensusTags.UmiBases,
@arg(flag='q', doc="Tag in which to store molecular barcode/UMI qualities.") val umiQualTag: Option[String] = None,
@arg(flag='b', doc="Tag in which to store sample barcode qualities.") val smpQualTag: Option[String] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we have a short flag, it's not the worst to have a fully qualified long name

Suggested change
@arg(flag='b', doc="Tag in which to store sample barcode qualities.") val smpQualTag: Option[String] = None,
@arg(flag='b', doc="Tag in which to store sample barcode qualities.") val sampleBarcodeQualTag: Option[String] = None,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you thin about just making this a boolean value, and then hard code QT below (like we do for BC) since thats the recommended tag from the SAM spec?

@@ -157,6 +158,7 @@ class FastqToBam
try {
val subs = fqs.iterator.zip(structures.iterator).flatMap { case(fq, rs) => rs.extract(fq.bases, fq.quals) }.toIndexedSeq
val sampleBarcode = subs.iterator.filter(_.kind == SampleBarcode).map(_.bases).mkString("-")
val smpQual = subs.iterator.filter(_.kind == SampleBarcode).map(_.quals).mkString(" ")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. updated the name as per above
  2. Assuming you'll use QT as per the SAM spec, in which case we want it hyphen delimited:

The recommended implementation concatenates all the barcodes and places a hyphen (‘-’) between the barcodes from the same template.

Suggested change
val smpQual = subs.iterator.filter(_.kind == SampleBarcode).map(_.quals).mkString(" ")
val sampleBarcodeQualTag = subs.iterator.filter(_.kind == SampleBarcode).map(_.quals).mkString("-")

@@ -181,6 +183,7 @@ class FastqToBam
}

if (sampleBarcode.nonEmpty) rec("BC") = sampleBarcode
if (smpQual.nonEmpty) smpQualTag.foreach(rec(_) = smpQual)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd hard code to QT as per above

@galaxy001
Copy link
Author

galaxy001 commented Aug 30, 2023

Hard code to QT is fine since STAR allow setting the input tag name, and the spec is here.
Using hyphen (‘-’) between is recommended for BC, and for quality values, the spec always recommends to use space as in QT, QX, CY and BZ.

@galaxy001
Copy link
Author

And for -q, the spec says QX.
I find -u is default to RX but remains configurable, is this better to follow this manner ?

@nh13
Copy link
Member

nh13 commented Aug 30, 2023

Lets follow the spec as closely as possible, and I misread it! I also want to keep backwards compatibility at this point.

How about:

  • BC: sample barcode bases, hyphen-delimited (keep-hard-coded)
  • QT: sample barcode qualities, space delimited (make hard-coded in this PR, boolean command line option to opt-in)
  • RX: molecular barcode bases, hyphen-delimited (keep-hard-coded default as a command line option)
  • QX: molecular barcode qualities, space-delimited (keep opt-in with as a command line option)

@galaxy001
Copy link
Author

galaxy001 commented Aug 31, 2023

Lets follow the spec as closely as possible, and I misread it! I also want to keep backwards compatibility at this point.

How about:

* BC: sample barcode bases, hyphen-delimited (keep-hard-coded)

* QT: sample barcode qualities, space delimited (make hard-coded in this PR, boolean command line option to opt-in)

* RX: molecular barcode bases, hyphen-delimited (keep-hard-coded default as a command line option)

* QX: molecular barcode qualities, space-delimited (keep opt-in with as a command line option)

If we choose to follow the spec recommendation, the BC - QT pair should be hard-coded, but the RX - QX pair is not, since there is also OX -BZ pair for Raw (uncorrected) UMI. The spec in fact stats two sets of UMI tags, thus it is better to make them at least switchable.

I think the OX -BZ pair is less common just because there is no famous correcting tools available. So considering the future, we should provide them.

So, there are two options:

  1. offer the default value and allow user to configure, either to OX -BZ or to anything else.
  2. hard-coded four value, add another option for raw UMI to switch RX - QX to OX -BZ.

@nh13
Copy link
Member

nh13 commented Aug 31, 2023

It's probably easier for me to explain in code, how about this: #933?

@galaxy001
Copy link
Author

It is fine to consider barcode only this time. And barcode is preferred to use hard-coded tag name.

I made a comment on L95.

@nh13 nh13 closed this Aug 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants