.anchor file format #141

hyl317 · 2019-06-20T17:58:25Z

Hi, I understand that the first two columns are the homologs identified via LAST, but what does the third column (the integers) of the .anchors file mean?

GSVIVT01012028001 ppa011886m 515
GSVIVT01012027001 ppa026797m 297
GSVIVT01012026001 ppa006860m 609
GSVIVT01012023001 ppa000608m 2780
GSVIVT01012018001 ppa012865m 123
GSVIVT01012018001 ppa025457m 93
GSVIVT01012012001 ppa010496m 568
GSVIVT01012008001 ppa002064m 1180

tanghaibao · 2019-06-20T19:09:02Z

@hyl317

These are scores from LAST. They are used in steps prior to writing the .anchors file, such as filtering based on C-score or prioritizing matches in a series of matches among tandem repeats.

Sometimes the .anchors file is the output of "liftover", which enriches the synteny signal (i.e. a .lifted.anchors file). In that case certain pairs can have a third column ending in L, for example:

GSVIVT01012008001 ppa002064m 1180L

The L simply highlights the fact that these are low quality anchors close to high quality anchors.
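
For anyone who wants to read these files programmatically, here is a minimal sketch of a parser for the three-column layout described above. It is not the jcvi reader; the names `Anchor` and `parse_anchors` are made up for illustration, and skipping lines that start with `#` (block separators or comments) is an assumption about the file layout.

```python
from typing import Iterable, Iterator, NamedTuple


class Anchor(NamedTuple):
    gene_a: str
    gene_b: str
    score: int
    lifted: bool  # True when the score column ended in "L"


def parse_anchors(lines: Iterable[str]) -> Iterator[Anchor]:
    """Yield one Anchor per data line (hypothetical helper, not part of jcvi)."""
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and lines starting with "#" carry no gene pair.
        if not line or line.startswith("#"):
            continue
        gene_a, gene_b, raw_score = line.split()[:3]
        lifted = raw_score.endswith("L")
        yield Anchor(gene_a, gene_b, int(raw_score.rstrip("L")), lifted)


example = """\
GSVIVT01012018001 ppa012865m 123
GSVIVT01012008001 ppa002064m 1180L
"""
anchors = list(parse_anchors(example.splitlines()))
```

Filtering on `anchor.lifted` then separates the original high-quality anchors from the ones added by liftover.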

Haibao

tanghaibao · 2019-09-01T17:17:33Z

Adding a link in the wiki in case someone else has the same question.
Closing issue.

ohdongha · 2021-01-29T03:05:27Z

Dear Haibao ( @tanghaibao ),
First, thank you for providing this excellent toolkit - the clarity of graphics is amazing.

Regarding this thread, could you explain a bit more about C-score and the criteria to "lift" a pair with weaker alignment?

After running jcvi.compara.catalog, following the instructions for creating a microsynteny visualization, I can see that there is a step that filters alignments based on "cscore>=0.70", and I wonder what this means. Is it something similar to the chain score mentioned in the MCscan paper, or something totally different, like the coverage of the alignment over the (shorter of the two) protein length?

Also, I wonder what the criteria are for "lifting" a weaker alignment. I guess that if a pair can be included in a co-linear block, a weaker alignment is allowed. Is the "weak" alignment based on a more relaxed e-value or bit-score cutoff? Does it have something to do with "dist=10"?

Basically, when we generate a nice microsynteny plot (like the one comparing a stretch of co-linear genes between grape, cacao, etc. in the tutorial), what can we say in the figure legend? It would be nice if we could say "Gray ribbons connect co-linear ortholog pairs identified based on [criteria]." And the [criteria] could be "C-score >= 0.7" (with an explanation or reference about C-score in the Methods section) or "e-value < 1E-5", etc.

Thanks again!

tanghaibao · 2021-01-29T06:58:04Z

@ohdongha

C-score = score(A, B) / max(score(A,), score(,B)), which ranges from 0 to 1.
In other words, it measures how the score of the current pair A-B compares against the best score among all gene pairs that touch either A or B.
C-score generalizes the idea of the reciprocal best match: the reciprocal best match has a C-score of 1, and anything weaker than the reciprocal best is lower than 1. The default cutoff in jcvi.compara.catalog is 0.7, which is considered "strong" enough.
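
The formula above can be sketched in a few lines of Python. This is only an illustration of the definition, not the jcvi code: the function name `cscores` is hypothetical, scores are assumed to be positive, and the two example hits are taken from the listing at the top of this thread (a real run would consider all LAST hits, not just these two).

```python
from collections import defaultdict


def cscores(pairs):
    """Map (a, b) -> score(a, b) / max(best score of a, best score of b)."""
    best_a = defaultdict(float)  # best score seen for each gene in genome A
    best_b = defaultdict(float)  # best score seen for each gene in genome B
    for a, b, score in pairs:
        best_a[a] = max(best_a[a], score)
        best_b[b] = max(best_b[b], score)
    return {(a, b): score / max(best_a[a], best_b[b]) for a, b, score in pairs}


pairs = [
    ("GSVIVT01012018001", "ppa012865m", 123),
    ("GSVIVT01012018001", "ppa025457m", 93),
]
result = cscores(pairs)
# Keep only the pairs that pass the default cutoff mentioned above.
strong = {pair for pair, c in result.items() if c >= 0.7}
```

In this toy input the first pair is the best hit on both sides, so its C-score is 1; the second pair scores 93/123, which still passes the 0.7 cutoff.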

So the initial synteny block is defined over "strong" pairs (C-score >= 0.7, as you saw). Then the "liftover" adds more gene pairs that are weaker (C-score < 0.7) but sufficiently close to the high-quality synteny chain (within a distance of 10, by default). This second step aims at adding more synteny signal.

Finally, check out, for example, a genome paper here: https://www.nature.com/articles/ng.3435
Figure 3c is a microsynteny plot; consult the figure legend there. And yes, the C-score cutoff typically goes in the Methods section.
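
To make the liftover idea concrete, here is a rough sketch under stated assumptions. It is not how jcvi implements it: genes are represented by their rank (order) along each genome, "within a distance of 10" is interpreted as at most `dist` ranks away from some strong anchor on both axes, and the function name `lift` is hypothetical.

```python
def lift(strong, weak, dist=10):
    """Return the weak pairs lying within `dist` ranks of any strong anchor.

    Both inputs are lists of (rank_in_genome_A, rank_in_genome_B) tuples.
    Assumption: closeness is checked independently on each axis.
    """
    lifted = []
    for i, j in weak:
        if any(abs(i - ai) <= dist and abs(j - aj) <= dist for ai, aj in strong):
            lifted.append((i, j))
    return lifted


strong = [(100, 200), (101, 201), (103, 204)]
weak = [(102, 203), (150, 202), (500, 900)]
lifted = lift(strong, weak)
```

Only the first weak pair sits next to the strong chain on both genomes, so it is the only one lifted; such pairs would be the ones written with a trailing L in a .lifted.anchors file.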

ohdongha · 2021-01-29T07:43:45Z

@tanghaibao thanks a lot.

C-score sounds like a clever way to rank BLAST-type hits. Is there a reference I can cite when mentioning this concept (C-score filter + liftover) in the Methods? I searched for "c-score" but couldn't find a paper immediately. Is this mentioned in the 2008 MCscan paper (and did I miss it)?

tanghaibao · 2021-01-29T12:52:34Z

@ohdongha

I did not invent the use of c-score, or c-value, although arguably I was among the earliest to use it in the context of synteny inference. Reference:
https://www.pnas.org/content/107/1/472

The initial credit goes to the Amphioxus genome paper:
https://pubmed.ncbi.nlm.nih.gov/18563158/

As for the liftover approach, it is an implementation detail, and I tend to gloss over it in the various genome papers that I have worked on over the years.

ohdongha · 2021-01-29T15:49:39Z

@tanghaibao Cool! Thanks again for the quick replies,
Dong-Ha

tanghaibao closed this as completed Sep 1, 2019

tanghaibao mentioned this issue Jun 13, 2020

Can I use jcvi to plot the MCSCanX #244

Closed

tanghaibao reopened this Jan 29, 2021

tanghaibao closed this as completed Jan 29, 2021

Li-Tianran mentioned this issue May 21, 2022

--cscore=1: A total of 0 anchor was found #471

Closed
