.anchor file format #141
Comments
These are scores coming from LAST, which were used in steps prior to the Sometimes when the
The L simply highlights the fact that these are low-quality anchors close to high-quality anchors. Haibao
Adding a link in the wiki in case someone else has the same question.
Dear Haibao ( @tanghaibao ), Regarding this thread, could you explain a bit more about the C-score and the criteria used to "lift" a pair with a weaker alignment? After running jcvi.compara.catalog, following the instructions to create a microsynteny visualization, I can see that there was a step to filter alignments based on "cscore>=0.70", and I wonder what this means. Is it something similar to the chain score mentioned in the MCscan paper, or something totally different, like the coverage of the alignment over the (shorter of the two) protein length?

Also, I wonder what the criteria are for "lifting" a weaker alignment. I guess that if a pair can be included in a co-linear block, a weaker alignment is allowed. Will the "weak" alignment be based on a more relaxed e-value or bit-score cutoff? Does it have something to do with "dist=10"?

Basically, when we generate a nice microsynteny plot (like the one comparing a stretch of co-linear genes between grape, cacao, etc. in the tutorial), what can we say in the figure legend? It would be nice if we can say "Gray ribbons connect co-linear ortholog pairs identified based on ." And the could be "C-score >= 0.7" (with an explanation or reference about C-score, in the Methods section) or "e-value <1E-5" etc. Thanks again!
C-score = score(A, B) / max(score(A,), score(,B)), which ranges from 0 to 1. So the initial synteny block is defined over "strong" pairs (C-score >= 0.7, as you saw). Then the "liftover" adds more gene pairs that are weaker (C-score < 0.7) but are sufficiently close to the high-quality synteny chain (within a distance of 10, by default). This second step aims at adding more synteny signal. Finally, check out, for example, a genome paper here: https://www.nature.com/articles/ng.3435
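For readers who want to see the C-score filter concretely, here is a minimal Python sketch. It is not the jcvi implementation. It assumes that score(A,) and score(,B) mean the best score of A (respectively B) against any partner in the hit list, and the function names `cscores` and `filter_by_cscore` and the gene names in the usage note are hypothetical.

```python
from collections import defaultdict


def cscores(hits):
    """Attach a C-score to each (query, subject, score) hit.

    Assumption: score(A,) is the best score of query A against any subject,
    and score(,B) is the best score of subject B against any query.
    """
    hits = list(hits)
    best_query = defaultdict(float)    # best score seen for each query gene
    best_subject = defaultdict(float)  # best score seen for each subject gene
    for query, subject, score in hits:
        best_query[query] = max(best_query[query], score)
        best_subject[subject] = max(best_subject[subject], score)
    return [
        (query, subject, score,
         score / max(best_query[query], best_subject[subject]))
        for query, subject, score in hits
    ]


def filter_by_cscore(hits, cutoff=0.7):
    """Keep only the "strong" pairs, i.e. those with C-score >= cutoff."""
    return [
        (query, subject, score)
        for query, subject, score, c in cscores(hits)
        if c >= cutoff
    ]
```

With the made-up hits `[("A", "B", 100), ("A", "C", 60), ("D", "C", 80)]`, the pair (A, C) gets a C-score of 60 / max(100, 80) = 0.6 and is dropped at the 0.7 cutoff, while (A, B) and (D, C) both score 1.0 and are kept. A pair filtered out this way is the kind of weaker pair that the liftover step may add back if it sits close to a high-quality chain.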
@tanghaibao thanks a lot. C-score sounds like a clever way to rank BLAST-type hits. Is there a reference I can cite when mentioning this concept (C-score filter + liftover) in the Methods? I searched around with “c-score” but couldn’t find a paper immediately. Is this mentioned in the 2008 MCscan paper (and I missed it)?
I did not invent the use of c-score, or c-value, although arguably I was among the earliest to use it in the context of synteny inference. Reference: The initial credit goes to the Amphioxus genome paper: For the lift-over approach, it is an implementation detail and I tend to gloss over it in various genome papers that I worked on over the years.
@tanghaibao Cool! Thanks again for the quick replies.
Hi, I understand that the first two columns are the homologs identified via LAST, but I am wondering what the third column (those integer numbers) of .anchor file means.
GSVIVT01012028001 ppa011886m 515
GSVIVT01012027001 ppa026797m 297
GSVIVT01012026001 ppa006860m 609
GSVIVT01012023001 ppa000608m 2780
GSVIVT01012018001 ppa012865m 123
GSVIVT01012018001 ppa025457m 93
GSVIVT01012012001 ppa010496m 568
GSVIVT01012008001 ppa002064m 1180
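To make the three-column layout concrete, here is a small Python sketch that reads one line of the listing above. It is an illustration, not part of jcvi. It assumes the columns are whitespace-separated, that the "L" mentioned in the replies is a suffix on the score of a lifted pair (for example `93L`), and that lines starting with `#` are separators to skip. The function name `parse_anchor_line` is hypothetical.

```python
def parse_anchor_line(line):
    """Parse one anchor line into (gene_a, gene_b, score, lifted).

    Returns None for blank lines and for lines starting with '#'
    (assumed here to be block separators or comments).
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    gene_a, gene_b, raw_score = line.split()[:3]
    # Assumption: a trailing "L" marks a lower-quality pair added by liftover.
    lifted = raw_score.endswith("L")
    score = int(raw_score.rstrip("L"))
    return gene_a, gene_b, score, lifted
```

For the first line of the listing, `parse_anchor_line("GSVIVT01012028001 ppa011886m 515")` yields the two gene identifiers, the integer score 515, and `lifted=False`.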