T008: Current query misses existing RCSB ligands #248
Comments
Hi @BJWiley233,

Thanks a lot for raising this issue, 3ZLS is an interesting case!

As a general remark: compared to what we are currently doing in T008, I have the feeling there should be a nicer way to check for all non-polymer residues in a PDB entry. For example, parsing all non-polymer

But, building on what we already have in T008 and what you suggested here, we could use both queries and filter the results by ligand size (to filter out solvents and ions). Ligand size could be a user-defined value here.

import pypdb
import requests
from bs4 import BeautifulSoup

# Example PDBs
pdb_id = "5UG9"
pdb_id = "3ZLS"


def get_ligands(pdb_id, ligand_min_size=100):
    """
    RCSB has not provided a new endpoint for ligand information yet. As a
    workaround we are obtaining extra information from ligand-expo.rcsb.org,
    using HTML parsing. Check Talktorial T011 for more info on this technique!
    """
    info = pypdb.get_info(pdb_id)
    # Extract ligands (not-so-nice workaround)
    # - marked as non-polymer in the `rcsb_entry_info` field
    # - since not all non-polymers are listed in the `rcsb_entry_info` field,
    #   also look for residues with unchecked bond angle geometry
    #   (see discussion at https://github.com/volkamerlab/teachopencadd/issues/248)
    _nonpolymers1 = info.get("rcsb_entry_info", {}).get("nonpolymer_bound_components", [])
    _nonpolymers2 = info.get("pdbx_vrpt_summary", {}).get("restypes_notchecked_for_bond_angle_geometry", [])
    nonpolymers = list(set(_nonpolymers1 + _nonpolymers2))
    # Extract ligand annotations from ligand-expo.rcsb.org
    ligands = {}
    for ligand_expo_id in nonpolymers:
        url = f"http://ligand-expo.rcsb.org/reports/{ligand_expo_id[0]}/{ligand_expo_id}/"
        print(url)
        r = requests.get(url)
        r.raise_for_status()
        # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
        html = BeautifulSoup(r.text, "html.parser")
        info = {}
        for table in html.find_all("table"):
            for row in table.find_all("tr"):
                cells = row.find_all("td")
                if len(cells) != 2:
                    continue
                key, value = cells
                if key.string and key.string.strip():
                    info[key.string.strip()] = "".join(value.find_all(string=True))
        # Only keep ligands that
        # - are of component type non-polymer
        # - have a molecular weight of at least `ligand_min_size` Da
        if info["Component type"].lower() == "non-polymer" and float(info["Molecular weight"].split()[0]) >= ligand_min_size:
            ligands[ligand_expo_id] = info
    print(_nonpolymers1)
    print(_nonpolymers2)
    print(ligands.keys())
    return ligands


get_ligands(pdb_id)

For reference:
I have actually been using GraphQL:

import json

import urllib3

http = urllib3.PoolManager()
query = '''{
  entry(entry_id: "%s") {
    nonpolymer_entities {
      pdbx_entity_nonpoly {
        comp_id
        name
        rcsb_prd_id
      }
    }
  }
}''' % pdb_id
# Pass the query via `fields` so that urllib3 URL-encodes it
r = http.request("GET", "https://data.rcsb.org/graphql", fields={"query": query})
r.status
my_json = json.loads(r.data)
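To get from the decoded JSON to a list of ligand IDs, something like the following sketch could work. It assumes the response mirrors the shape of the query above, wrapped in the usual GraphQL `data` key; the helper name `extract_nonpolymer_components` and the example dictionary are made up for illustration and are not real RCSB output.

```python
def extract_nonpolymer_components(response):
    """Return (comp_id, name) tuples from a decoded GraphQL response dict."""
    # Any level may be missing or null, e.g. for entries without non-polymers
    entry = (response.get("data") or {}).get("entry") or {}
    entities = entry.get("nonpolymer_entities") or []
    components = []
    for entity in entities:
        nonpoly = entity.get("pdbx_entity_nonpoly") or {}
        comp_id = nonpoly.get("comp_id")
        if comp_id:
            components.append((comp_id, nonpoly.get("name")))
    return components


# Hand-written stand-in for `my_json` (hypothetical values, not real RCSB output)
example_response = {
    "data": {
        "entry": {
            "nonpolymer_entities": [
                {"pdbx_entity_nonpoly": {"comp_id": "92P", "name": "example ligand", "rcsb_prd_id": None}},
                {"pdbx_entity_nonpoly": {"comp_id": "NA", "name": "SODIUM ION", "rcsb_prd_id": None}},
            ]
        }
    }
}

print(extract_nonpolymer_components(example_response))
```

The `or {}` / `or []` fallbacks are there because a key can be present with a null value, in which case a plain `.get(key, default)` would still return `None`.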
Thanks @BJWiley233 and Rachel Green for the fix.
Thanks, @BJWiley233, using GraphQL is a great idea! Will incorporate the changes to T008 (#259).
T008: Use GraphQL for PDB ligand search #248
Hi @dominiquesydow,

When I run the same code for pdb_id = '3POZ', it gives the following error:

Traceback (most recent call last):

How could this be an empty sequence when there are two bound ligands, 03P and SO4, of which 03P is the larger one?
Did you pull the latest changes from the

The updated get_ligands("3POZ") returns the following:
@dominiquesydow Thanks. I hadn't seen the recent changes in the code. It's working fine now.
Hi,

For Talktorial 8 (accessing RCSB), I think RCSB has an issue: not all ligands are listed under the field ["rcsb_entry_info"]["nonpolymer_bound_components"]. I am using MAP2K1 as an example, but I'm sure it affects other proteins as well. For entry https://data.rcsb.org/rest/v1/core/entry/3ZLS there is only a ["NA"] value for nonpolymer_bound_components, even though the structure has '92P' as the largest ligand. I found an alternative under ["pdbx_vrpt_summary"]["restypes_notchecked_for_bond_angle_geometry"], so the following should work better as an alternative in Talktorial 8.

It is also kind of interesting to find out that most ligands are not checked for bond angle geometry. All the more reason to use TinkerTools/poltype2 for ligand geometry optimization! 😄
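As an illustration of combining the two fields, here is a minimal sketch that works on an entry dictionary that has already been fetched. The helper name `collect_nonpolymer_ids` and the example dictionary are hypothetical; the values only mimic the 3ZLS situation described above and were not copied from the actual entry.

```python
def collect_nonpolymer_ids(entry_info):
    """Merge ligand IDs from both fields, dropping duplicates but keeping order."""
    bound = entry_info.get("rcsb_entry_info", {}).get("nonpolymer_bound_components") or []
    unchecked = entry_info.get("pdbx_vrpt_summary", {}).get(
        "restypes_notchecked_for_bond_angle_geometry"
    ) or []
    # dict.fromkeys keeps the first occurrence of each ID in insertion order
    return list(dict.fromkeys(bound + unchecked))


# Hand-written stand-in for an entry record (hypothetical values)
example_entry = {
    "rcsb_entry_info": {"nonpolymer_bound_components": ["NA"]},
    "pdbx_vrpt_summary": {"restypes_notchecked_for_bond_angle_geometry": ["92P", "NA"]},
}

print(collect_nonpolymer_ids(example_entry))
```

Using `dict.fromkeys` instead of `set` is just a design choice here: it keeps the result order stable between runs, which makes the output easier to compare.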