OpenPIL is a non-profit organisation with an AI at its core. The AI, maintained and developed by Malik Ahmed (MPharm), extracts essential drug information from Summary of Product Characteristics (SmPC) documents, which hold the information that doctors and pharmacists rely on when making prescribing decisions. The OpenPIL AI requires only one line of code and the path to an SmPC .pdf file. It processes the natural language in the document using datasets curated by Malik, sourced from copyright-free libraries (see references below), and reports the active substances, active excipients, formulation, drug-drug interactions, and drug-class interactions. Extracting that information manually into an Excel spreadsheet took the OpenPIL team of clinical advisors about 1 hour per SmPC on average; the AI takes approximately 4 minutes for a medium-length SmPC document.
Currently, this essential clinical medication information is highly privatised, which restricts access for the healthcare technology developers who need it to create ground-breaking products for patients. This restriction holds back healthcare technology and indirectly puts people's health at greater risk. It is of particular concern for those in developing and war-torn countries, whose access to up-to-date medicinal information is limited, even though it doesn't have to be. The aim of making the OpenPIL AI open source is to accelerate the development of affordable drug databases and healthcare technology around the world!
These are the instructions to install the OpenPIL AI locally and get started with analysing Summary of Product Characteristics documents (.pdf). NOTE: The AI currently only works for SmPCs in European format.
The OpenPIL AI is really easy to install. Simply type the below command into your terminal.
pip install OpenPIL
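If this doesn't work, make sure you have the dependencies listed below.
You will need a recent version of Python 3. Python itself cannot be upgraded with pip, so install it from python.org or your system's package manager, then confirm the version:
python3 --version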
You will need the following modules (nltk, PyPDF2, pdftotext):
pip install nltk
pip install PyPDF2
pip install pdftotext
All other modules should come pre-installed with Python 3. They are listed here in case you are missing any:
- re
- string
- math
- ctypes
- sys
- platform
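To check quickly which of the modules above are available in your environment, something like the following sketch may help. The helper name missing_modules is illustrative and not part of OpenPIL; it relies only on the standard-library importlib.

```python
import importlib.util

# Module names taken from the lists above.
THIRD_PARTY = ["nltk", "PyPDF2", "pdftotext"]
STANDARD_LIBRARY = ["re", "string", "math", "ctypes", "sys", "platform"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import.

    Hypothetical helper for illustration; not part of the OpenPIL package.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Print one line per module that still needs installing.
    for name in missing_modules(THIRD_PARTY + STANDARD_LIBRARY):
        print(f"Missing module: {name}")
```

Any third-party name it prints can then be installed with the matching pip command above.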
The OpenPIL AI requires only one line of code to run, so it's really easy! Here is how to set it up in a Python environment.
from OpenPIL import OpenPIL
data = OpenPIL.AI("/path/to/the/SmPC.pdf")
print(data)
Approximately 4 minutes later, you should see this in your Python terminal!
Compiling positive class interactions...
Compiling negative class interactions...
Compiling caution classes...
Compiling caution drugs...
Compiling positive interaction drugs...
Compiling negative interaction drugs...
SmPC Complete!
{
'SMPC NAME': '/path/to/the/SmPC.pdf',
'BRAND NAME': "drug's brand name",
'ACTIVE SUBSTANCE(S)': ['array of all active substances in drug'],
'ACTIVE EXCIPIENT(S)': ['array of all active excipients in drug'],
'FORMULATION': ['form of drug e.g. tablet'],
'INTERACTIVE DRUG CLASSES': ['array of any drug-classes that interact with the drug'],
'INTERACTIVE DRUGS': ['comprehensive array of all drugs that interact, including those contained within each drug-class that interacts'],
'CAUTIONS': ['array of drugs that are cautioned for use']
}
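Once you have the result, you will probably want to inspect or store it. The sketch below assumes the returned value is a plain Python dictionary with the keys shown in the example output above; the helper names summarise and to_json, and the placeholder values, are hypothetical and not part of OpenPIL.

```python
import json


def summarise(result):
    """Count the entries under each list-valued key of a result dictionary."""
    return {key: len(value) for key, value in result.items() if isinstance(value, list)}


def to_json(result):
    """Serialise a result dictionary to a JSON string for storage."""
    return json.dumps(result, indent=2, ensure_ascii=False)


# Placeholder values for illustration only; real output comes from OpenPIL.AI.
example = {
    "SMPC NAME": "/path/to/the/SmPC.pdf",
    "BRAND NAME": "ExampleBrand",
    "ACTIVE SUBSTANCE(S)": ["substance-a"],
    "ACTIVE EXCIPIENT(S)": ["excipient-a", "excipient-b"],
    "FORMULATION": ["tablet"],
    "INTERACTIVE DRUG CLASSES": ["class-a"],
    "INTERACTIVE DRUGS": ["drug-a", "drug-b", "drug-c"],
    "CAUTIONS": [],
}

print(summarise(example))
print(to_json(example))
```

The JSON string can be written to a file or loaded into whatever database you are building.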
And that's it! Get a group of Summary of Product Characteristics documents in .pdf format stored locally, run a simple for-loop through them, sit back 🪑😎, wait, and then BOOM 💥🤯! Your very own clinical drug-information database!
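The for-loop described above could look roughly like this. It is a minimal sketch: build_database is a hypothetical helper, not part of OpenPIL, and the extraction function is passed in as an argument so the loop itself makes no assumptions about the package beyond "one call per .pdf path".

```python
from pathlib import Path


def build_database(pdf_dir, extract):
    """Run `extract` on every .pdf file in `pdf_dir`.

    Returns two dictionaries keyed by file name: successful results and
    error messages for documents that failed. Hypothetical helper.
    """
    results, failures = {}, {}
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        try:
            results[pdf_path.name] = extract(str(pdf_path))
        except Exception as error:
            # Record the failure and carry on with the remaining documents.
            failures[pdf_path.name] = str(error)
    return results, failures
```

With OpenPIL installed, you would import it as shown earlier (from OpenPIL import OpenPIL) and call build_database("/path/to/smpc/folder", OpenPIL.AI). At roughly 4 minutes per document, a large folder will take a while, which is why the sketch records failures instead of stopping at the first one.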
Please note that the accuracy and reliability of the AI have not been fully tested yet, although OpenPIL is working on a research paper that will verify the current results. OpenPIL therefore makes no guarantees about the safety of the information extracted and does not recommend its use in clinical practice. The Apache License 2.0 applies.
The datasets used for the OpenPIL AI were curated by Malik Ahmed; their sources and licensing are listed at the end of this document. The project roadmap is as follows:
- Add Active Substance Detection
- Add Active Excipient Detection
- Add Formulation Detection
- Add Drug-Class Interaction Detection
- Add Drug-Drug Interaction Detection
- Replace Python similarity algorithm with C to improve performance from ~40 minutes/SmPC to ~4 minutes/SmPC
- Launch OpenPIL AI open source!
- Add Side-Effects Detection
- Add Use in Pregnancy and Breastfeeding Detection
- Add Storage Conditions Detection
- Publish peer-reviewed research to validate the accuracy and reliability of the AI
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/CoolFeature)
- Commit your Changes (git commit -m 'Add some CoolFeature')
- Push to the Branch (git push origin feature/CoolFeature)
- Open a Pull Request
Distributed under the Apache License 2.0. See LICENSE.txt for more information.
Malik Ahmed - [email protected]
Project Link: https://github.com/OpenPIL/OpenPIL
Below are the resources used to compile the OpenPIL AI datasets, with their respective licensing information as of January 27, 2022.
- drugNameDataset.py was compiled by extracting the drug and supplement names listed by the European Medicines Agency, OpenFDA NDC (CC0), Drugs@FDA (CC0), NHS BSA (Open Government Licence), and the Netherlands Medicines Agency (Re-use of Government Information Act).
- drugClassSynonymDataset.py was compiled using ChEBI, whose User Manual lists 'Synonyms' under 'CC0'.
- drugClassDataset.py was compiled using the OpenFDA NDC API (CC0) and the OpenFDA Drugs@FDA API (CC0).
- malik_similarity_algorithm.c includes two sources of external code: the Jaro-Winkler distance algorithm (GNU General Public License v3 or later) and the Ratcliff/Obershelp distance algorithm (the Unlicense).
All project code other than that mentioned above was written by Malik Ahmed and is hereby placed under the Apache License 2.0.