This repository has been archived by the owner on Oct 28, 2022. It is now read-only.
One cancer use case is mapping reads to both human and viral reference genomes simultaneously to detect the presence of viral DNA or RNA in samples.
A similar use case is including decoy sequences of known human genomic DNA that have not been incorporated into the reference genome. This helps prevent incorrect mapping of reads to homologous DNA that is in the reference sequence.
The current approach to both of these issues with file-based references is to create a combined reference file. For instance, TCGA creates composite reference genomes that consist of GRCh37-lite and a number of viral genomes, for example GRCh37-lite_WUGSC_variant_2:
https://browser.cghub.ucsc.edu/help/assemblies/#GRCh37-lite_WUGSC_variant_2
This results in a proliferation of composite references that have no standard method of identification.
A cleaner and more robust approach would be to support multiple mapping targets in the API. Instead of a single ReferenceSet as the mapping target, a list of ReferenceSets could address this issue.
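To make the proposal concrete, here is a rough sketch of what a list-valued mapping target might look like. The `Reference`, `ReferenceSet`, and `MappingTarget` classes are simplified, hypothetical stand-ins and not the actual API records; the IDs and lengths are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified stand-ins for the API's records.
@dataclass(frozen=True)
class Reference:
    id: str
    name: str
    length: int  # illustrative values only


@dataclass
class ReferenceSet:
    id: str
    references: List[Reference] = field(default_factory=list)


@dataclass
class MappingTarget:
    # A list of reference sets instead of a single one.
    reference_sets: List[ReferenceSet] = field(default_factory=list)

    def all_references(self) -> List[Reference]:
        """Flatten every reference from every set, in list order."""
        return [ref for rs in self.reference_sets for ref in rs.references]


human = ReferenceSet("GRCh37-lite", [Reference("ref-1", "1", 1000)])
viral = ReferenceSet("viral-panel", [Reference("ref-2", "HPV16", 100)])

target = MappingTarget([human, viral])
names = [ref.name for ref in target.all_references()]
print(names)
```

With this shape, the human assembly and the viral panel keep their own identities, and the pairing is expressed on the mapping target rather than baked into a new composite file.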
If we go with this approach, then we have to enforce that reference IDs are used when constructing position requests, because we no longer have uniqueness guarantees for the reference names used. #616
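A small sketch of why names stop being enough once several sets are combined. Everything here (the `Reference` and `Position` classes, the IDs, the set names) is hypothetical and only meant to illustrate the point, not to mirror the real schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Reference:
    id: str
    name: str
    reference_set_id: str


@dataclass(frozen=True)
class Position:
    reference_id: str  # keyed by ID rather than by name
    position: int


refs = [
    Reference("id-a", "MT", "human-set"),
    Reference("id-b", "MT", "decoy-set"),  # same name, different set
]

# Lookup by ID is unambiguous.
by_id: Dict[str, Reference] = {r.id: r for r in refs}

# Lookup by name can return more than one candidate.
by_name: Dict[str, List[Reference]] = {}
for r in refs:
    by_name.setdefault(r.name, []).append(r)


def resolve(pos: Position) -> Reference:
    """Resolve a position request to exactly one reference via its ID."""
    return by_id[pos.reference_id]


print(len(by_name["MT"]))
print(resolve(Position("id-b", 10)).reference_set_id)
```

The name `MT` matches two references, while the ID-based request resolves to a single one.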
An alternative would be to document that when multiple references are used, the reference names must be unique across the superset.
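If the uniqueness rule were only documented, a server or client could still check it up front. A minimal sketch, assuming each reference set is given as a plain list of reference names; `find_name_collisions` is a made-up helper, not an existing API call.

```python
from collections import defaultdict
from typing import Dict, List


def find_name_collisions(reference_sets: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return each reference name that appears more than once across the
    superset, mapped to the IDs of the sets that contain it."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for set_id, names in reference_sets.items():
        for name in names:
            owners[name].append(set_id)
    return {name: sets for name, sets in owners.items() if len(sets) > 1}


collisions = find_name_collisions({
    "human": ["1", "2", "MT"],
    "viral": ["HPV16", "MT"],  # "MT" here is a deliberate toy collision
})
print(collisions)
```

An empty result would mean the combination satisfies the documented constraint and names can be used safely.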
To immediately support this use case you might prepare a reference set that contains both the viral sequences and the base assembly. Although that link is now dead, I suspect that is what folks do in practice: make a FASTA with everything they want to align to and go.
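For reference, the "one FASTA with everything" workaround amounts to something like the sketch below. It is plain Python with no external libraries; `combine_fasta` is a hypothetical helper and the sequences are toy data. It also refuses duplicate sequence names, since the combined file only works if names stay unique.

```python
from typing import Iterable, List, Set


def combine_fasta(sources: Iterable[Iterable[str]]) -> List[str]:
    """Concatenate FASTA line streams, rejecting duplicate sequence names."""
    seen: Set[str] = set()
    out: List[str] = []
    for lines in sources:
        for raw in lines:
            line = raw.rstrip("\n")
            if not line:
                continue  # skip blank lines between records
            if line.startswith(">"):
                # The sequence name is the first token after '>'.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                if name in seen:
                    raise ValueError(f"duplicate sequence name: {name}")
                seen.add(name)
            out.append(line)
    return out


# Toy inputs; real use would pass open file handles instead of lists.
human = [">1 human chr1 (toy)\n", "ACGT\n"]
viral = [">HPV16 (toy)\n", "GGCC\n"]

combined = combine_fasta([human, viral])
print("\n".join(combined))
```

This is exactly the kind of ad hoc composite that the issue describes: it works for alignment, but nothing in the resulting file identifies which assemblies went into it.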