SDFEater

SDFEater is an SDF parser written in Java that runs from the command-line interface (CLI). It not only parses your SDF files but can also add additional data to the output.

Publications and resources

If you need more detailed information, take a look at these publications and resources. There you will find a detailed description of the parser, performance tests, and example Cypher outputs.

  1. Ł. Szeremeta, "SDFEater: A Parser for Chemoinformatics Formats," ChemRxiv, Sep. 2018 [Online]. Available: https://doi.org/10.26434/chemrxiv.7123193.
  2. D. Tomaszuk and Ł. Szeremeta, "Named Property Graphs," in Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. Maciaszek, and M. Paprzycki, Eds., vol. 15. IEEE, 2018, pp. 173–177 [Online]. Available: http://dx.doi.org/10.15439/2018F103.
  3. Ł. Szeremeta and D. Tomaszuk, "SDFParser example Cypher outputs," figshare, 10-May-2018 [Online]. Available: https://doi.org/10.6084/m9.figshare.6249962.

How to start?

Simply download one of the ready-to-use JAR files from the project releases. You can also clone this repository and build the project yourself.

Build the project yourself

  1. Clone this repository:
     git clone https://github.com/lszeremeta/SDFEater.git
  2. Build SDFEater using Apache Maven:
     cd SDFEater
     mvn clean package

Built JAR files can be found in the target directory.

Example usage

java -jar SDFEater-version-jar-with-dependencies.jar -i ../examples/chebi_special_char_test.sdf -f cypher -up

The example above reads the SDF input file, adds periodic table data for atoms, tries to replace chemical database IDs with URLs, and produces a Cypher file as output.
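
The same invocation can be spelled out with long options, which may be easier to read in scripts. This is a minimal sketch that assumes the combined -up is equivalent to --periodic --urls, as the description of the example suggests, and that long options behave like their short forms. The JAR name is the placeholder used in this README, so substitute the version you downloaded or built.

```shell
# Placeholder JAR name from this README; replace "version" with the real one.
JAR="SDFEater-version-jar-with-dependencies.jar"
INPUT="../examples/chebi_special_char_test.sdf"

# Assemble the command with long options instead of -i, -f and -up.
set -- java -jar "$JAR" --input "$INPUT" --format cypher --periodic --urls
CMD="$*"

# Show the command that would be run.
echo "$CMD"

# Execute it only when the JAR is actually present in the current directory.
if [ -f "$JAR" ]; then
  "$@"
fi
```

Printing the command first makes it easy to check paths and flags before the parser is run.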

In the examples directory you can find example SDF files based on data from the ChEBI (CC BY 4.0) and DrugBank open structures (CC0 1.0) databases.

CLI options

Running SDFEater without parameters displays help.

  • -i,--input <arg> - input SDF file path (required)
  • -f,--format <arg> - output format (cypher, cvme, smiles, inchi) (required)
  • -p,--periodic - add additional atom data from the periodic table (for the cypher output format)
  • -u,--urls - try to generate full database URLs instead of IDs (enabled in cvme)

Output formats

You can specify the output format using -f,--format. Available output formats:

  • cypher - Cypher statements for compounds, atoms, bonds and relations, ready to import into the Neo4j graph database,
  • cvme - CVME file format based on SKOS,
  • smiles - plain text SMILES (if available in the compound property),
  • inchi - plain text InChI (if available in the compound property).
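
To compare the formats on the same input, one command per format can be generated in a loop. The following is a dry-run sketch: it only prints the commands and does not execute them. The JAR name is the placeholder used in this README, and the input path is taken from the example usage. Note that smiles and inchi produce output only when the corresponding property exists in the compound.

```shell
# Placeholder JAR name from this README; replace "version" with the real one.
JAR="SDFEater-version-jar-with-dependencies.jar"
INPUT="../examples/chebi_special_char_test.sdf"

# Build one command line per output format listed above.
COMMANDS=""
for FORMAT in cypher cvme smiles inchi; do
  COMMANDS="${COMMANDS}java -jar $JAR -i $INPUT -f $FORMAT
"
done

# Dry run: print the commands instead of executing them.
printf '%s' "$COMMANDS"
```

Each printed line can be copied into a terminal once the JAR is in place.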

Used open source projects

The sample SDF files in the examples directory are based on data from the ChEBI (CC BY 4.0) and DrugBank open structures (CC0 1.0) databases.

Contribution

Would you like to improve SDFEater? Great! We are waiting for your help and suggestions. If you are new to open source contributions, read How to Contribute to Open Source.

License

Distributed under the MIT license.