Extract annotations from IntAct / ComplexPortal #25
Comments
I'm here! Just a small, political correction: there are 10 members of the IMEx consortium that curate into IntAct. MINT and IntAct itself are just 2 of them. E.g., DIP have also contributed many SARS publications in this last month. And yes, we have also decided to annotate to the longer polyprotein in SARS-CoV and SARS-CoV-2 (e.g. R1AB, P0DTD1), except for the small protein nsp11, which is only translated from the short polyprotein of SARS-CoV-2 (R1A, P0DTC1). The long polyprotein codes for nsp12 at the ribosomal slippage site. For Complex Portal you can find the data via our organism page: Any questions, please ask! Slack ID is the same as GitHub.
@all-contributors please add @bmeldal for ideas, content
I've put up a pull request to add @bmeldal! 🎉
Thank you!
@gtauriello so the annotations are only for the virus proteins, right?
IntAct & ComplexPortal cover both virus and human proteins.
@D-Barradas also unsure about the question. Personally, I would start by looking at all interactions returned in the query above (or the download) and extract any positional data you can find. The query should restrict it to coronavirus-relevant interactions. The annotation system works for any UniProtKB AC and not just the virus proteins. So you can safely have annotations mapped e.g. on structures for the human proteins involved in those interactions...
Hi @gtauriello @bmeldal :
Yes, for the polyproteins you will need to do some extra mapping. Assuming you have a position within P0DTD1-PRO_... (or P0DTC1-PRO_...) you need to proceed as follows:
As an example: say you have position 10 in P0DTD1-PRO_0000449623. From UniProt you see that PRO_0000449623 covers positions 3264-3569. That means that position 10 in P0DTD1-PRO_0000449623 corresponds to position 3273 in P0DTD1 (3264 + 10 - 1). Also, any position that you find in P0DTC1 should be mapped to P0DTD1 as long as it's not in the "Non-structural protein 11" (i.e. position >= 4393 of P0DTC1). Technically you could also duplicate all those annotations, but it's easier to have them just once... @bmeldal I am assuming above that your positions are 1-indexed, i.e. that the first AA of a protein is at position "1" and not "0". Is that correct?
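In case it helps, here is a minimal sketch of that mapping in Python. The function names are made up for illustration, the chain boundaries have to be looked up in UniProt by the caller, and it assumes 1-indexed positions and that P0DTC1 and P0DTD1 share the same numbering everywhere outside nsp11.

```python
from typing import Optional

# First position of "Non-structural protein 11" in P0DTC1, as quoted in this thread.
NSP11_START_IN_P0DTC1 = 4393


def chain_to_polyprotein(pos_in_chain: int, chain_start: int, chain_end: int) -> int:
    """Map a 1-indexed position inside a PRO_ chain to the full polyprotein.

    chain_start and chain_end are the 1-indexed, inclusive boundaries of the
    chain in the polyprotein, as listed in UniProt.
    """
    chain_length = chain_end - chain_start + 1
    if not 1 <= pos_in_chain <= chain_length:
        raise ValueError(
            f"position {pos_in_chain} is outside the chain (length {chain_length})"
        )
    return chain_start + pos_in_chain - 1


def p0dtc1_to_p0dtd1(pos: int) -> Optional[int]:
    """Map a 1-indexed P0DTC1 position to P0DTD1.

    Returns None for positions inside nsp11, which should stay on P0DTC1.
    Assumption: numbering is identical in both polyproteins before nsp11.
    """
    if pos < 1:
        raise ValueError("positions are 1-indexed")
    if pos >= NSP11_START_IN_P0DTC1:
        return None
    return pos


# Example from above: position 10 in P0DTD1-PRO_0000449623 (covers 3264-3569).
print(chain_to_polyprotein(10, 3264, 3569))
```

For a position given in P0DTC1-PRO_..., you would first call `chain_to_polyprotein` with the chain boundaries in P0DTC1 and then pass the result to `p0dtc1_to_p0dtd1`.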
Morning! Yes, that is all correct!
A nice example is here (thx @D-Barradas for pointing me to it). I quickly turned it manually into an annotation (see project link here):
I will make sure that on our side we can nicely display annotations on both subunits of heteromers (currently you can see either ACE2 or spike annotations but not both at the same time). A script that scans IntAct and automatically extracts a CSV like the one above (with some clever coloring logic) would be a really useful addition.
As a starting point, here are some files (thx @D-Barradas): Archive.zip. It contains:
Still TODO:
So we ended up writing another script to extract PPIs between SARS-CoV-2 and human proteins from IntAct. The script is loosely based on the one above and attached here: PPI-IntAct.zip. The result is a dedicated page on our server listing the structural coverage for all those interaction partners: https://swissmodel.expasy.org/repository/species/2697049/interactions
There's a typo on https://swissmodel.expasy.org/repository/species/2697049: "IntAct lists interactions derived from literature curation or direct user submissions. We extracted those interactions and list the ones between SARS-CoV-2 and human host proteins with their structural coverage in a decicated interaction page." It should read "dedicated". Freudian slip??? I know the data is not yet saturated... ;-) Great work! Please remember to cite IntAct in any resulting manuscripts.
Feature suggestion for the interactions page (https://swissmodel.expasy.org/repository/species/2697049/interactions): allow the user to collapse the list for a given protein again without having to open another one. When the list is long (e.g. spike) it becomes difficult to navigate the page.
Oops, good point with the typo. I must have been thirsty when I wrote that... ;-)
The two EBI resources IntAct and ComplexPortal contain curated data on experimentally observed interactions between proteins.
From the EBI webpage you can find links to query the IntAct webpage or download the IntAct data in PSI-MI TAB format here: [ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi25/datasets/Coronavirus.zip].
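As a rough starting point for working with that download, the sketch below pulls virus-human pairs out of PSI-MI TAB lines. It relies only on the standard MITAB 2.5 layout (tab-separated, interactor IDs in columns 1 and 2, taxonomy IDs in columns 10 and 11); the function names are hypothetical, and it is not the script used for the pages linked in this thread. The SARS-CoV-2 taxonomy ID 2697049 is the one in the SWISS-MODEL URLs here, and 9606 is human.

```python
import re
from typing import Iterable, Iterator, List, Set, Tuple

SARS_COV_2_TAXID = "2697049"
HUMAN_TAXID = "9606"


def uniprot_acs(id_field: str) -> List[str]:
    """Extract UniProtKB accessions from a MITAB ID field such as 'uniprotkb:P0DTC2'."""
    return [
        entry.split(":", 1)[1]
        for entry in id_field.split("|")
        if entry.startswith("uniprotkb:")
    ]


def taxids(taxid_field: str) -> Set[str]:
    """Extract NCBI taxonomy IDs from a field such as 'taxid:9606(human)'."""
    return set(re.findall(r"taxid:(-?\d+)", taxid_field))


def virus_human_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (virus_ac, human_ac) for each SARS-CoV-2 / human interaction line."""
    for line in lines:
        if line.startswith("#"):  # header line
            continue
        row = line.rstrip("\n").split("\t")
        if len(row) < 11:
            continue
        acs_a, acs_b = uniprot_acs(row[0]), uniprot_acs(row[1])
        if not acs_a or not acs_b:
            continue  # at least one interactor has no UniProtKB accession
        tax_a, tax_b = taxids(row[9]), taxids(row[10])
        if SARS_COV_2_TAXID in tax_a and HUMAN_TAXID in tax_b:
            yield acs_a[0], acs_b[0]
        elif HUMAN_TAXID in tax_a and SARS_COV_2_TAXID in tax_b:
            yield acs_b[0], acs_a[0]
```

Usage would be along the lines of `pairs = set(virus_human_pairs(open(path)))`, where `path` points to a file extracted from the zip. Accessions can carry a chain suffix (e.g. P0DTD1-PRO_...), so the polyprotein mapping discussed in this thread still applies afterwards.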
Notes:
Also: Birgit Meldal from the IntAct / ComplexPortal team is available in the Slack channel for questions and I will update this comment if we get new input and links that can be of general use.