Super high accuracy basecalling eliminate read reversal? #257
-
Hi, there are a few issues here. Firstly, super accuracy basecalling is not fast enough to call the data in real time. For adaptive sampling to keep up, the basecaller has to run faster than real time. If it does not, you end up processing reads once they are already complete, i.e. they have already finished sequencing. In addition, the histogram of the bulk file above seems to contain only short reads. Adaptive sampling is not going to be particularly effective on this sample, as most reads are too short. The reason you are getting a lot of skipped reads is that if MinKNOW can't call the reads quickly enough, it skips them for later calling. With super accuracy you are going to see lots of these skipped reads.
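To make the timing argument concrete, here is a minimal sketch of why both slow basecalling and short reads hurt adaptive sampling. It is not readfish or MinKNOW code: the function names, the 450 bases/s translocation speed and the 2 second decision latency are all assumptions chosen for illustration, so substitute values measured on your own setup.

```python
def read_duration_s(read_length_bases: int, bases_per_second: float) -> float:
    """Approximate time a read spends in the pore, in seconds."""
    return read_length_bases / bases_per_second


def decision_arrives_in_time(
    read_length_bases: int,
    decision_latency_s: float,
    bases_per_second: float = 450.0,  # assumed translocation speed
) -> bool:
    """True if the unblock decision is made before the read has finished."""
    return decision_latency_s < read_duration_s(read_length_bases, bases_per_second)


def bases_remaining_at_decision(
    read_length_bases: int,
    decision_latency_s: float,
    bases_per_second: float = 450.0,  # assumed translocation speed
) -> float:
    """Bases still unsequenced when the decision arrives (0 if the read is done)."""
    sequenced = decision_latency_s * bases_per_second
    return max(0.0, read_length_bases - sequenced)


# Hypothetical 2 s latency (basecall + align + decide):
print(decision_arrives_in_time(500, 2.0))        # False: a 500 base read is already finished
print(decision_arrives_in_time(10_000, 2.0))     # True: most of a 10 kb read is still to come
print(bases_remaining_at_decision(10_000, 2.0))  # 9100.0
```

Under these assumed numbers, any read shorter than about 900 bases has finished before a decision can be made, so a slower basecaller raises that threshold and a library of short reads leaves little to unblock.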
-
Forgot to clarify that I only changed the basecall mode in MinKNOW, not in Readfish. The TOML still sets the config as fast.
I do not understand why changing the mode in MinKNOW would affect Readfish's decisions. Are the two basecalling processes separate? Will the MinKNOW setting override the Readfish setting?
-
Whether or not the two basecalling processes are separate will depend on your setup. I suspect that they are not separate. Readfish will use the existing guppy server unless you have configured an independent server, so the super accuracy basecalling will slow everything down. If you have multiple GPUs you could configure one for running readfish and the other for running basecalling for MinKNOW, but this is quite advanced. Unfortunately the super accuracy mode is really GPU intensive. To test this, you could repeat the same experiment and use fast basecalling everywhere. My expectation is that you will have fewer skipped reads, but you may not see any improvement in read unblocking due to the short reads you are looking at.
-
I have thought about this and think I should turn off the basecalling option in MinKNOW when I use readfish, and basecall the fast5 files post-sequencing. Hopefully this will avoid any uncertainty.
-
HAC basecalling should be fast enough, depending on what compute you are running on. You will probably see some reduction in false rejections. Again, turning off real-time basecalling in MinKNOW will definitely help minimise the time to basecall, align and make a decision for a read, particularly if you are using HAC and don't have a lot of extra GPU power lying around.
-
It's worth thinking about what you mean by "False Rejection Rate" as well. A false rejection is an on-target read that the sequencer unblocked even though it should have been sequenced. Things that can cause this are:
This last case (4) is usually the dominant problem in many papers and reports, followed by incorrect configuration (2), followed by (3). You will see a small improvement, as Rory says above, but I would be surprised if it made a significant difference. Other options like changes to alignment parameters (coming soon) may improve things more.
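As a rough illustration of the definition above, the sketch below computes a false rejection rate from per-read outcomes. The `ReadOutcome` record and the choice of denominator (all on-target reads) are assumptions made for this example, not something readfish reports directly. In practice the `on_target` label would have to come from your own ground truth, for example aligning the reads after the run.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReadOutcome:
    """Hypothetical per-read record, for illustration only."""
    read_id: str
    on_target: bool  # ground truth: the read belongs to a target region
    unblocked: bool  # the sequencer ejected the read


def false_rejection_rate(reads: Iterable[ReadOutcome]) -> float:
    """Fraction of on-target reads that were unblocked (assumed definition)."""
    on_target = [r for r in reads if r.on_target]
    if not on_target:
        return 0.0  # no on-target reads, so nothing could be falsely rejected
    falsely_rejected = sum(1 for r in on_target if r.unblocked)
    return falsely_rejected / len(on_target)


reads = [
    ReadOutcome("read_1", on_target=True, unblocked=False),
    ReadOutcome("read_2", on_target=True, unblocked=True),   # false rejection
    ReadOutcome("read_3", on_target=True, unblocked=False),
    ReadOutcome("read_4", on_target=False, unblocked=True),  # correct rejection
]
print(false_rejection_rate(reads))  # 1 of 3 on-target reads was unblocked
```

Being explicit about the denominator matters when comparing runs: a rate over on-target reads and a rate over all unblocked reads can differ widely on the same data.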
-
Following up on turning off basecalling in MinKNOW: I realize I cannot turn it off. I can deselect "output fastq", but I am not sure whether that actually disables basecalling.