pdbsearch is a Python library for searching for PDB structures using the RCSB web services.
>>> import pdbsearch
>>> codes = pdbsearch.search(limit=5, ligand_name="CU")
>>> codes
['3HW7', '2WKO', '2WOF', '2WOH', '2WO0']
pdbsearch can be installed using pip (you may need to use pip3):
$ pip install pdbsearch
If you get permission errors, try using sudo:
$ sudo pip install pdbsearch
The repository for pdbsearch, containing the most recent iteration, can be found at https://github.com/samirelanduk/pdbsearch. To clone the pdbsearch repository directly from there, use:
$ git clone https://github.com/samirelanduk/pdbsearch.git
pdbsearch requires requests.
To test a local version of pdbsearch, cd to the pdbsearch directory and run:
$ python -m unittest discover tests
You can opt to only run unit tests or integration tests:
$ python -m unittest discover tests.unit
$ python -m unittest discover tests.integration
You can get all PDB codes without any particular search expression like so:
>>> import pdbsearch
>>> codes = pdbsearch.search(limit=None)
>>> len(codes)
174994
This will take a few seconds, and requires downloading a rather large JSON object over the network. Generally it is better to paginate the results:
>>> first_ten_codes = pdbsearch.search(limit=10)
>>> second_ten_codes = pdbsearch.search(start=10, limit=10)
>>> third_ten_codes = pdbsearch.search(start=20, limit=10)
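The pagination pattern above can be wrapped in a generator. This is a minimal sketch, not part of pdbsearch: iter_codes is a hypothetical helper, and it assumes the search function returns a short or empty list once the results run out. A stand-in search function is used so the example runs without network access.

```python
from itertools import count


def iter_codes(search, page_size=100, **criteria):
    """Yield codes page by page until a short or empty page comes back.

    `search` is any callable that accepts `start` and `limit` keyword
    arguments. This helper is hypothetical, not part of pdbsearch.
    """
    for page in count():
        codes = search(start=page * page_size, limit=page_size, **criteria)
        yield from codes
        # Assumption: fewer results than requested means there are no more.
        if len(codes) < page_size:
            break


# Stand-in for a real search function, so the sketch runs offline.
FAKE_CODES = [f"{n:04d}" for n in range(25)]


def fake_search(start=0, limit=10, **criteria):
    return FAKE_CODES[start:start + limit]


all_codes = list(iter_codes(fake_search, page_size=10))
print(len(all_codes))
```

If the assumption about short pages holds for the real library, passing pdbsearch.search in place of fake_search should give the same behaviour.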
You can sort the results by any of the terms at https://search.rcsb.org/structure-search-attributes.html:
>>> most_recent_codes = pdbsearch.search(sort="rcsb_accession_info.deposit_date")
>>> earliest_codes = pdbsearch.search(sort="-rcsb_accession_info.deposit_date")
As these are somewhat cumbersome, some of them have a shorthand:
>>> pdbsearch.search(limit=5, sort="code")
['9XIM', '9XIA', '9WGA', '9RUB', '9RSA']
>>> pdbsearch.search(limit=5, sort="-resolution")
['3NIR', '5D8V', '1EJG', '3P4J', '5NW3']
You can sort by multiple criteria:
>>> pdbsearch.search(limit=5, sort=["-atoms", "released"])
['1ANP', '6UOU', '6UOW', '1Q7O', '6QTF']
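The examples above suggest that a plain term sorts in descending order (sort="code" returns codes starting from 9XIM) and that a leading "-" reverses this. The following is a minimal sketch of that convention only. parse_sort is a hypothetical helper, and the direction mapping is inferred from the example outputs rather than taken from the library's source.

```python
def parse_sort(sort):
    """Turn "name" or "-name" terms into (attribute, direction) pairs.

    Hypothetical helper; the directions are inferred from the README
    examples, where a bare term appears to sort in descending order.
    """
    if isinstance(sort, str):
        sort = [sort]  # accept a single term as well as a list
    parsed = []
    for term in sort:
        if term.startswith("-"):
            parsed.append((term[1:], "asc"))
        else:
            parsed.append((term, "desc"))
    return parsed


print(parse_sort(["-atoms", "released"]))
```

Earlier terms take priority, so in this reading "-atoms" is the primary key and "released" only breaks ties.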
You can search by passing keywords to the search function:
>>> pdbsearch.search(limit=5, ligand_name="ZN")
['3HW7', '3I7I', '3I7G', '2WFX', '2WGT']
You can modify the operator used with double underscores:
>>> pdbsearch.search(limit=5, ligand_name__in=["ZN", "CU"])
['3HW7', '3I7I', '3I7G', '2WFX', '2WGT']
>>> pdbsearch.search(limit=5, resolution__lt=2)
['3HW3', '3I83', '3HVS', '3HW4', '3HW5']
>>> pdbsearch.search(limit=5, atoms__within=[200, 300])
['2WH9', '2WPY', '395D', '396D', '2X8Q']
The keywords used so far are shorthands, but you can search by any of the terms in the list linked above by replacing each dot in the term with a double underscore:
>>> pdbsearch.search(limit=5, citation__rcsb_authors="Sula, A.")
['4CAH', '4CAI', '4X8A', '4X88', '4X89']
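Double underscores therefore do two jobs: they stand in for dots in an attribute name, and they separate a trailing operator. One way to picture how such a keyword could be split is sketched below. split_keyword is a hypothetical helper, not the library's implementation, and the operator set contains only the suffixes shown in the examples above (the library may accept others).

```python
# Operator suffixes that appear in the README examples; assumed incomplete.
KNOWN_OPERATORS = {"in", "lt", "within"}


def split_keyword(keyword):
    """Split a search keyword into (dotted attribute, operator or None).

    Hypothetical helper illustrating the naming convention only.
    """
    parts = keyword.split("__")
    operator = None
    # Only treat the last part as an operator if it is a known suffix.
    if len(parts) > 1 and parts[-1] in KNOWN_OPERATORS:
        operator = parts.pop()
    return ".".join(parts), operator


for keyword in ["ligand_name", "resolution__lt", "citation__rcsb_authors"]:
    print(keyword, "->", split_keyword(keyword))
```

Single underscores, as in ligand_name or rcsb_authors, are left untouched because only the double underscore is treated as a separator.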
If you use more than one term, they will be combined with AND operators:
>>> pdbsearch.search(limit=5, ligand_name="ZN", atoms__within=[200, 300])
['3WUP', '3ZNF', '2YTA', '2YTB', '2YSV']
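To illustrate what AND chaining might look like at the request level, here is a rough sketch of a query tree in which every condition must match. Whether pdbsearch builds exactly this structure is an assumption. The node layout is modelled on the RCSB Search API v2 request format and should be checked against its documentation. The helpers terminal and all_of are hypothetical, as is the choice of rcsb_entry_info.resolution_combined to stand for the resolution shorthand.

```python
import json


def terminal(attribute, operator, value):
    """One condition on a single attribute (hypothetical helper)."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {
            "attribute": attribute,
            "operator": operator,
            "value": value,
        },
    }


def all_of(*nodes):
    """Combine conditions so that every one of them must match."""
    if len(nodes) == 1:
        return nodes[0]  # a single condition needs no group wrapper
    return {"type": "group", "logical_operator": "and", "nodes": list(nodes)}


# Attribute and operator names here are illustrative assumptions.
query = all_of(
    terminal("citation.rcsb_authors", "exact_match", "Sula, A."),
    terminal("rcsb_entry_info.resolution_combined", "less", 2),
)
print(json.dumps(query, indent=2))
```

Each extra keyword passed to the search function would, in this picture, add one more terminal node to the same group.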
24 Jul 2022
- Updated library for v2 of the RCSB search API.
29 May 2021
- Added search criteria.
- Added AND chaining for search criteria.
25 April 2021
- Added ability to sort results.
- Created shorthand system for common sort criteria.
2 March 2021
- Started library.
- Added ability to fetch all PDB codes.
- Basic pagination.