Super high accuracy basecalling eliminate read reversal? #257

jennieli421 · 2023-09-17T19:35:38Z

jennieli421
Sep 17, 2023

I ran a simulation with super high accuracy basecalling mode and with readfish enrichment on. My read length histogram looks almost identical to the "no enrichment" condition because only a few reads were rejected. I also noticed many more "skipped" reads in the log, and the total number of reads sequenced is almost half that of the "no enrichment" condition. It would be helpful to know why this might happen.

Answered by mattloose

mattloose · 2023-09-17T20:05:24Z

mattloose
Sep 17, 2023
Maintainer

Hi,

There are a few issues here.

Firstly, super accuracy basecalling is not fast enough to call the data in real time. In practice, the basecaller has to be faster than real time for adaptive sampling to keep up. If not, you will end up processing reads once they are already complete, i.e. they have already finished sequencing.

In addition, the histogram of the bulk file above seems to contain only short data. Adaptive sampling is not going to be particularly effective on this sample as most reads are too short.

The reason you are getting a lot of skipped reads is that if MinKNOW can't call the reads quickly enough it will skip them for later calling. If you are using super accuracy you are going to see lots of these skipped reads.
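
As a rough illustration of the "faster than real time" requirement, here is a minimal sketch that compares the rate at which signal is produced with the rate at which a basecaller can consume it. The 512 channels (a MinION flow cell) and the 5 kHz sample rate (taken from the config name used in this thread) are the inputs; the basecaller throughput figures are hypothetical placeholders, not measurements, and the function name `keeps_up` is made up for this example.

```python
def keeps_up(active_channels: int, sample_rate_hz: int,
             basecaller_samples_per_s: float) -> bool:
    """Return True if the basecaller consumes signal at least as fast
    as the flow cell produces it (worst case: every channel is called)."""
    produced_samples_per_s = active_channels * sample_rate_hz
    return basecaller_samples_per_s >= produced_samples_per_s


if __name__ == "__main__":
    channels = 512        # MinION flow cell
    sample_rate = 5000    # 5 kHz, as in "..._5khz_fast"

    # Hypothetical throughputs, for illustration only.
    hypothetical_fast = 5_000_000.0
    hypothetical_sup = 1_000_000.0

    print("fast keeps up:", keeps_up(channels, sample_rate, hypothetical_fast))
    print("sup keeps up:", keeps_up(channels, sample_rate, hypothetical_sup))
```

When the check fails, reads finish sequencing before a decision can be made, which matches the skipped reads and low rejection counts described above.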

jennieli421 · 2023-09-19T20:45:48Z

jennieli421
Sep 19, 2023
Author

Forgot to clarify that I only changed the basecall mode in MinKNOW, not in Readfish. The TOML still sets config as fast.

[caller_settings]
config_name = "dna_r10.4.1_e8.2_400bps_5khz_fast"

I do not understand why changing the mode in MinKNOW would affect Readfish's decisions. Are the two basecalling processes separate? Will the MinKNOW setting override the Readfish setting?
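
For readers comparing settings between MinKNOW and the TOML, the accuracy tier is encoded as the last underscore-separated field of the config name. The sketch below extracts it; the helper name `accuracy_tier` is hypothetical, and the assumption that names end in `fast`, `hac` or `sup` is based on the config name shown above and the usual naming of these models.

```python
def accuracy_tier(config_name: str) -> str:
    """Return 'fast', 'hac' or 'sup' from a basecaller config name,
    or 'unknown' if the name does not end in one of those tiers."""
    name = config_name.strip()
    # Tolerate an optional ".cfg" file extension.
    if name.endswith(".cfg"):
        name = name[:-4]
    suffix = name.rsplit("_", 1)[-1]
    return suffix if suffix in {"fast", "hac", "sup"} else "unknown"


if __name__ == "__main__":
    print(accuracy_tier("dna_r10.4.1_e8.2_400bps_5khz_fast"))
```

This only reports what the TOML asks for. It says nothing about whether readfish and MinKNOW share one basecall server.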

mattloose · 2023-09-19T20:50:09Z

mattloose
Sep 19, 2023
Maintainer

Whether or not the two basecalling processes are separate will depend on your setup. I suspect that they are not separate. Readfish will use the existing guppy server unless you have configured an independent server. The super accuracy basecalling will slow everything down. If you have multiple GPUs you could configure one for running readfish and the other for running basecalling for MinKNOW, but this is quite advanced.

Unfortunately the super accuracy mode is really GPU intensive. To test this you could repeat the same experiment and use fast base calling everywhere. My expectation is that you will have fewer skipped reads, but you may not see any improvement in read unblocking due to the short reads you are looking at.

jennieli421 · 2023-10-17T02:26:44Z

jennieli421
Oct 17, 2023
Author

I have thought about this and think I should turn off the basecalling option in MinKNOW when I use readfish, and basecall the fast5 files post-sequencing. Hopefully this would avoid any uncertainty.
Also, I have noticed in my data and in several papers that the false rejection rate is high, so I would like to ask your opinion on using the high accuracy caller setting for readfish. I am considering it because it might increase the on-target rate by decreasing false rejections. Would that also significantly slow everything down (I don't think so)?

Adoni5 · 2023-10-17T10:18:07Z

Adoni5
Oct 17, 2023
Maintainer

HAC base-calling should be fast enough depending on what compute you are running on. You will probably see some reduction in false rejections.

Turning off real time base calling in MinKNOW again will definitely help minimise the time to base-call, align and make a decision for a read, and will definitely help in the event that you are using HAC and don't have a lot of extra GPU power lying around.

1 reply

jennieli421 Oct 17, 2023
Author

I'm using a computer with good GPU spec NVIDIA RTX A4000 + 16GB memory, which should be fast enough.

mattloose · 2023-10-17T10:30:38Z

mattloose
Oct 17, 2023
Maintainer

It's worth thinking about what you mean by "False Rejection Rate" as well.

A false rejection is a read which is unblocked by the sequencer which was on target and so should have been sequenced.

Things that can cause this are:

1. Incorrect identification of a molecule by the fast basecaller compared with the hac basecaller (this is very rare in my experience).
2. A short chunk of a read that maps to multiple places and may be unblocked as a consequence (this is user setting dependent).
3. A short chunk of a read that doesn't map to the correct location at the point of analysis by read until but then a few more bases of sequence allows it to later map and appear to be sequenced.
4. A read that was so short it was never seen by the read until API and is misinterpreted as being unblocked when in fact it was just short.

This last case (4) is usually the dominant problem in many papers and reports, followed by incorrect configuration (2) followed by (3).

You will see a small improvement as Rory says above but I would be surprised if it made a significant difference.

Other options like changes to alignment parameters (coming soon) may improve things more.

5 replies

jennieli421 Oct 17, 2023
Author

In case (4), how short does a read have to be to cause this issue?

mattloose Oct 17, 2023
Maintainer

Something less than 400 bases.

mattloose Oct 17, 2023
Maintainer

Just to add - if you want to explore different mapping strategies you can configure this in the TOML file - please have a look here for more guidance: https://looselab.github.io/readfish/toml.html#minimap2

jennieli421 Oct 17, 2023
Author

I originally thought if a read is too short it should be sequenced without being classified by the software, so it would not be in the rejection list. But what you said is the opposite. Slightly confused.

mattloose Oct 17, 2023
Maintainer

It depends how you are calculating your rejected reads. If you are looking at the unblocked read id list written out from readfish then that read has been seen. If you are using any other method to identify unblocked reads it may be incorrect.
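
One way to apply the distinction discussed in this thread is sketched below: on-target reads are labelled as false rejections only if their IDs appear in the unblocked read ID list from readfish, and short reads that are not in that list are counted separately. This is a minimal sketch under assumptions: the function name, the read IDs and the lengths are all hypothetical, the set of unblocked IDs is assumed to have been loaded from readfish's list beforehand, and the 400-base threshold is the approximate figure quoted above.

```python
from typing import Dict, Set


def classify_on_target_reads(read_lengths: Dict[str, int],
                             unblocked_ids: Set[str],
                             min_seen_length: int = 400) -> Dict[str, str]:
    """Label each on-target read.

    'unblocked'    : readfish saw the read and rejected it (a false rejection).
    'short_unseen' : not in the unblocked list and shorter than the threshold,
                     so it was probably never seen rather than rejected.
    'sequenced'    : everything else.
    """
    labels = {}
    for read_id, length in read_lengths.items():
        if read_id in unblocked_ids:
            labels[read_id] = "unblocked"
        elif length < min_seen_length:
            labels[read_id] = "short_unseen"
        else:
            labels[read_id] = "sequenced"
    return labels


if __name__ == "__main__":
    # Hypothetical on-target reads (read ID -> length in bases).
    lengths = {"read_a": 350, "read_b": 380, "read_c": 5200}
    unblocked = {"read_a"}  # hypothetical contents of the unblocked ID list
    print(classify_on_target_reads(lengths, unblocked))
```

Classifying by length alone would have counted both `read_a` and `read_b` as rejected. Using the ID list keeps only `read_a`.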

jennieli421 · 2023-10-17T20:41:59Z

jennieli421
Oct 17, 2023
Author

Following up on turning off basecalling in MinKNOW, I realized I cannot turn it off:

While I could deselect "output fastq", I am not sure that actually disables basecalling:

2 replies

mattloose Oct 17, 2023
Maintainer

It doesn't make sense that you can't turn it off.

I think you need to raise this problem with Nanopore customer support.

jennieli421 Oct 17, 2023
Author

I figured that out! It was because I was using the settings from a previous run. After clicking "reload script" several times, all settings went back to default, and then I could turn off basecalling.
