Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"No reference-sense paths available" in minigraph/cactus graphs #4

Open
jdidion opened this issue Aug 25, 2022 · 3 comments
Open

"No reference-sense paths available" in minigraph/cactus graphs #4

jdidion opened this issue Aug 25, 2022 · 3 comments

Comments

@jdidion
Copy link

jdidion commented Aug 25, 2022

I am trying to use the GRCh38 minigraph/cactus graph to align reads and output them in GRCh38 coordinates. However, I get the following warning when running vg giraffe: warning:[vg::get_sequence_dictionary] No reference-sense paths available in the graph; falling back to generic paths. This is a bit cryptic, but I take it to mean that the reference paths cannot be found in the graph files and so the output will not be in GRCh38 coordinates. Is this the case, and if so is there a fix?

For reference, here is my command line:

vg giraffe \
  -m hprc-v1.0-mc-grch38.min \
  -d hprc-v1.0-mc-grch38.dist \
  -g hprc-v1.0-mc-grch38.gg \
  -H hprc-v1.0-mc-grch38.gbwt \
  -x hprc-v1.0-mc-grch38.xg \
  -f my.fastq.gz \
  -P -o SAM -t 14 \
| samblaster --ignoreUnmated \
| samtools sort -@2 -m 4G -M -o my.bam
@jdidion
Copy link
Author

jdidion commented Aug 25, 2022

It looks like you just have to pass the reference path names using the --ref-paths of vg giraffe.

For anyone else that runs into this issue, the command to generate these path names is:

vg paths -g hprc-v1.0-mc-grch38.gbwt -L | grep GRCh38 > GRCh38_refpaths.txt

It might be useful to provide the reference paths file for both GRCh38 and CHM13 in the table on this repo's readme page.

@glennhickey
Copy link
Collaborator

@jdidion Thanks for the feedback. I imagine this is due to an issue of compatibility with these indexes (which were produced with an older version of vg) and recent versions of vg which are beginning to introduce some new notiions of path metadata.

@adamnovak What's the best thing to do here? Can that warning be ignored? I know that I for one need to updatte minigraph-cactus to use the latest vg, but that won't help the published indexes referenced above.

@adamnovak
Copy link

The warning just means that the graph doesn't have any paths that are explicitly marked as representing a reference. If all the paths in the graph are from the same reference anyway, you can just ignore it.

If the graph has paths from multiple references, like both GRCh38 and CHM13, and if neither sample is marked as being "the" reference, the SAM output will freely mix paths from GRCh38 and CHM13 to explain read coordinates. If you don't want that, you would need to use --ref-paths like you show to explain which paths you want the SAM in.

The best argument to --ref-paths is probably an HTSLib sequence dictionary file, which captures path order in the assembly and also total path length independent of how much of the path is covered in the graph; it would be a good idea to post .dict files for the two assemblies used here. @glennhickey Do you have those around?

As describe in https://github.com/vgteam/vg/wiki/VG-GBWT-Subcommand#setting-tags, you can also make a copy of the GBWT that marks the sample you want as a reference. Something like:

vg gbwt --set-tag "reference_samples=GRCh38" -o hprc-v1.0-mc-grch38-grch38ref.gbwt hprc-v1.0-mc-grch38.gbwt

vg might need a way to let you tell Giraffe and vg surject at runtime to just consider a sample a reference and override the file tags, and also a succinct way to use just one particular reference when a file has multiple reference samples stored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants