An SDF parser written in Java that runs from the command-line interface (CLI). SDFEater not only parses your SDF files but can also add additional data to the output.
If you need more detailed information, take a look at these publications and resources. There you will find a detailed description of the parser, performance tests, and example Cypher outputs.
- Ł. Szeremeta, "SDFEater: A Parser for Chemoinformatics Formats," Sep. 2018. [Online]. Available: https://doi.org/10.26434/chemrxiv.7123193.
- D. Tomaszuk and Ł. Szeremeta, "Named Property Graphs," in Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. Maciaszek, and M. Paprzycki, Eds., vol. 15. IEEE, 2018, pp. 173–177. [Online]. Available: http://dx.doi.org/10.15439/2018F103.
- Ł. Szeremeta and D. Tomaszuk, "SDFParser example Cypher outputs," figshare, 10-May-2018. [Online]. Available: https://doi.org/10.6084/m9.figshare.6249962.
Simply download one of the ready-to-use JAR files from the project releases. You can also clone this repository and build the project yourself.
- Clone this repository:
git clone https://github.com/lszeremeta/SDFEater.git
- Build SDFEater using Apache Maven:
cd SDFEater
mvn clean package
Built JAR files can be found in the target directory.
java -jar SDFEater-version-jar-with-dependencies.jar -i ../examples/chebi_special_char_test.sdf -f cypher -up
The example above reads the SDF input file, adds periodic table data for atoms, tries to replace chemical database IDs with URLs, and produces Cypher output.
In the examples directory you can find example SDF files based on data from the ChEBI (CC BY 4.0) and DrugBank open structures (CC0 1.0) databases.
Running SDFEater without parameters displays help.
-i,--input <arg> - input SDF file path (required)
-f,--format <arg> - output format (cypher, cvme, smiles, inchi) (required)
-p,--periodic - add additional atoms data from periodic table (for cypher output format)
-u,--urls - try to generate full database URLs instead of IDs (enabled in cvme)
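If you want to call SDFEater from another Java program, the options above can be assembled into a command line. The following is only a minimal sketch: the class name `SdfEaterCommand` and the `build` helper are hypothetical and not part of SDFEater, and the JAR and input paths are the placeholders used in the usage example.

```java
import java.util.ArrayList;
import java.util.List;

public class SdfEaterCommand {
    // Hypothetical helper: builds the argument list for running the SDFEater JAR
    // using only the documented options (-i, -f, -p, -u).
    static List<String> build(String jarPath, String inputPath, String format,
                              boolean periodic, boolean urls) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.add("-i");
        cmd.add(inputPath);
        cmd.add("-f");
        cmd.add(format);
        if (periodic) {
            cmd.add("-p"); // add periodic table data (for cypher output)
        }
        if (urls) {
            cmd.add("-u"); // try to generate full database URLs
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("SDFEater-version-jar-with-dependencies.jar",
                "../examples/chebi_special_char_test.sdf", "cypher", true, true);
        // Print the assembled command line.
        System.out.println(String.join(" ", cmd));
        // To actually run it (the JAR must exist at the given path):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list avoids shell quoting problems when the input path contains spaces.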
You can specify the output format using -f,--format. Available output formats:
- cypher - Cypher compound, atoms, bonds and relation ready to import to the Neo4j graph database,
- cvme - CVME file format based on SKOS,
- smiles - plain text SMILES (if available in the compound property),
- inchi - plain text InChI (if available in the compound property).
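To illustrate what "available in the compound property" means, here is a rough sketch of reading the data items of a single SDF record, where each item is a `> <NAME>` header followed by value lines and a blank line. This is not SDFEater's actual implementation: the class `SdfProperties` is hypothetical, the record is a shortened illustrative fragment rather than a complete molfile, and the property names `SMILES` and `InChI` are assumptions about how the input file labels these fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SdfProperties {
    // Collects the data items of one SDF record into a name -> value map.
    // A data item is a header line such as "> <NAME>", followed by one or
    // more value lines and terminated by a blank line.
    static Map<String, String> parse(String record) {
        Map<String, String> props = new LinkedHashMap<>();
        String name = null;
        StringBuilder value = new StringBuilder();
        for (String line : record.split("\\R")) {
            if (line.startsWith(">")) {
                // Store an unterminated previous item before starting a new one.
                if (name != null) {
                    props.put(name, value.toString());
                    name = null;
                }
                int open = line.indexOf('<');
                int close = line.indexOf('>', open + 1);
                if (open >= 0 && close > open) {
                    name = line.substring(open + 1, close);
                    value.setLength(0);
                }
            } else if (name != null) {
                if (line.trim().isEmpty() || line.startsWith("$$$$")) {
                    // Blank line or record delimiter ends the current item.
                    props.put(name, value.toString());
                    name = null;
                } else {
                    if (value.length() > 0) {
                        value.append('\n');
                    }
                    value.append(line);
                }
            }
        }
        if (name != null) {
            props.put(name, value.toString());
        }
        return props;
    }

    public static void main(String[] args) {
        // Illustrative fragment only: the atom and bond block is omitted.
        String record = "M  END\n"
                + "> <SMILES>\nCCO\n\n"
                + "> <InChI>\nInChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3\n\n"
                + "$$$$\n";
        Map<String, String> props = parse(record);
        System.out.println(props.get("SMILES"));
        System.out.println(props.get("InChI"));
    }
}
```

If a record has no matching property, `props.get(...)` returns `null`, which corresponds to the "if available" condition above.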
- Apache Commons CLI as CLI controller (Apache License 2.0),
- Gson as periodic table JSON parser (Apache License 2.0),
- periodic-table - base JSON periodic table file (ISC License).
The sample SDF files in the examples directory are based on data from the ChEBI (CC BY 4.0) and DrugBank open structures (CC0 1.0) databases.
Would you like to improve SDFEater? Great! We are waiting for your help and suggestions. If you are new to open source contributions, read How to Contribute to Open Source.
Distributed under the MIT license.