Commit

Merge pull request #131 from monarch-initiative/pubmed_retrieve
New PubMed eutil functions
caufieldjh authored Jun 15, 2023
2 parents 245a124 + d318df0 commit 21985c6
Showing 6 changed files with 392 additions and 60 deletions.
10 changes: 9 additions & 1 deletion README.md
@@ -29,13 +29,21 @@ Currently three different strategies for knowledge extraction have been implemen
* OpenAI account
* Optionally, [BioPortal](https://bioportal.bioontology.org/) account (for grounding)

You will need to set both API keys using the [Ontology Access Kit](https://github.com/INCATools/ontology-access-kit)
You will need to set API keys using the [Ontology Access Kit](https://github.com/INCATools/ontology-access-kit):

```bash
poetry run runoak set-apikey -e openai <your openai api key>
poetry run runoak set-apikey -e bioportal <your bioportal api key>
poetry run runoak set-apikey -e ncbi-email <your email address>
poetry run runoak set-apikey -e ncbi-key <your NCBI api key>
```

The OpenAI key is necessary for using OpenAI's GPT models. This is a paid API and you will be charged based on usage. If you do not have an OpenAI account, [you may sign up here](https://platform.openai.com/signup).

The BioPortal key is necessary for using ontologies from [BioPortal](https://bioportal.bioontology.org/). You may get a key by signing up for an account on their web site.

The NCBI email address and API key are used for retrieving text and metadata from PubMed. You may still access these resources without identifying yourself, but you may encounter rate limiting and errors. [Details on NCBI accounts and keys are here.](https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/)

## Setup

For feature development and contributing to the package:
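The README's NCBI paragraph says the email address and API key identify you to NCBI and reduce rate limiting. Below is a minimal sketch of how an `esearch` URL can carry those credentials. It assumes the public E-utilities endpoint and uses only the standard library. It also includes a simple pacing helper based on NCBI's published limits: 3 requests per second without a key, 10 with one. `esearch_url` and `Throttle` are hypothetical names and are not OntoGPT code.

```python
import time
from typing import Optional
from urllib.parse import urlencode

# Assumed public E-utilities base URL.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, email: Optional[str] = None,
                api_key: Optional[str] = None, retmax: int = 20) -> str:
    """Build a PubMed esearch URL, identifying the caller when details are given."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    if email:
        params["email"] = email  # lets NCBI contact you if your requests cause problems
    if api_key:
        params["api_key"] = api_key  # raises the permitted request rate
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


class Throttle:
    """Keep calls at or below NCBI's limit: 3/s without an API key, 10/s with one."""

    def __init__(self, has_api_key: bool) -> None:
        self.min_interval = 1 / 10 if has_api_key else 1 / 3
        self._last = float("-inf")  # so the first call never sleeps

    def wait(self) -> None:
        # Sleep only as long as needed to keep min_interval between calls.
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
```

A caller would invoke `throttle.wait()` before each request and could then pass the URL to `requests.get`, which this commit adds to `pyproject.toml`.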
16 changes: 8 additions & 8 deletions poetry.lock


2 changes: 1 addition & 1 deletion pyproject.toml
@@ -17,7 +17,6 @@ jsonlines = "^3.1.0"
python-multipart = "^0.0.5"
linkml-owl = "^0.2.7"
beautifulsoup4 = "^4.11.1"
eutils = "^0.6.0"
class-resolver = ">=0.4.2"
inflect = "^6.0.2"
bioc = "^2.0.post5"
@@ -50,6 +49,7 @@ langchain = "^0.0.167"
pygpt4all = {version = "^1.1.0", extras = ["gpt4all"], optional = true}
streamlit = "^1.22.0"
gpt4 = "^0.0.1"
requests = "^2.31.0"

[tool.poetry.dev-dependencies]
pytest = "^7.1.2"
14 changes: 7 additions & 7 deletions src/ontogpt/cli.py
@@ -120,8 +120,8 @@ def write_extraction(
"-m",
"--model",
help="Model name to use, e.g. openai-text-davinci-003."
" The first part of this name must be the source of the model."
" The second part must be the model name.",
" The first part of this name must be the source of the model."
" The second part must be the model name.",
)
prompt_template_option = click.option(
"--prompt-template", help="Path to a file containing the prompt."
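The `--model` help text describes names of the form `<source>-<model name>`. Model names such as `text-davinci-003` contain hyphens themselves, so one plausible reading is to split only at the first hyphen. The sketch below uses a hypothetical helper, `split_model_name`, to illustrate that reading. It is not necessarily how OntoGPT resolves model names.

```python
from typing import Tuple


def split_model_name(name: str) -> Tuple[str, str]:
    """Split '<source>-<model name>' at the first hyphen only (hypothetical helper)."""
    source, sep, model = name.partition("-")
    if not sep or not source or not model:
        raise ValueError(f"Expected '<source>-<model name>', got {name!r}")
    return source, model
```

For example, `split_model_name("openai-text-davinci-003")` returns `("openai", "text-davinci-003")`, which keeps the hyphens inside the model name intact.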
@@ -270,7 +270,7 @@ def extract(
@output_format_options
@click.argument("pmid")
def pubmed_extract(pmid, template, output, output_format, **kwargs):
"""Extract knowledge from a pubmed ID."""
"""Extract knowledge from a single PubMed ID."""
logging.info(f"Creating for {template}")
pmc = PubmedClient()
text = pmc.text(pmid)
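`PubmedClient.text` is called here with a single PMID. In `pubmed_annotate` it is called with a list. For orientation, below is a minimal sketch of an E-utilities `efetch` request for one or several PMIDs, built with the standard library only. The helper name `efetch_url` and the choice of `rettype=abstract` and `retmode=text` are assumptions for illustration. They do not describe PubmedClient's actual request.

```python
from typing import Iterable, Optional, Union
from urllib.parse import urlencode

# Assumed public E-utilities base URL.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(pmids: Union[str, Iterable[str]], api_key: Optional[str] = None) -> str:
    """Build a PubMed efetch URL for plain-text abstracts of one or more PMIDs."""
    ids = [pmids] if isinstance(pmids, str) else list(pmids)
    params = {
        "db": "pubmed",
        "id": ",".join(ids),  # efetch accepts a comma-separated list of IDs
        "rettype": "abstract",
        "retmode": "text",
    }
    if api_key:
        params["api_key"] = api_key
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)
```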
@@ -288,14 +288,13 @@ def pubmed_extract(pmid, template, output, output_format, **kwargs):
@output_format_options
@click.argument("search")
def pubmed_annotate(search, template, output, output_format, **kwargs):
"""Retrieve pubmed IDs for a search term, then annotate them using a template."""
"""Retrieve a collection of PubMed IDs for a search term, then annotate them using a template."""
logging.info(f"Creating for {template}")
pmc = PubmedClient()
pmids = pmc.get_pmids(search)
# for pmid in pmids:
textlist = pmc.text(pmids)
for index in range(25):
# text = pmc.text(str(pmid))
text = pmc.text(str(pmids[index]))
text = textlist[index]
ke = SPIRESEngine(template, **kwargs)
logging.debug(f"Input text: {text}")
results = ke.extract_from_text(text)
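In the new `pubmed_annotate` loop, `range(25)` indexes into `textlist` directly. A search that yields fewer than 25 texts would therefore raise `IndexError`, assuming `pmc.text(pmids)` returns a list, as the indexing suggests. Below is a minimal sketch of a bounded alternative. It also includes a helper for splitting a long PMID list into smaller batches before retrieval. Both `first_n` and `chunked` are hypothetical helpers, not part of OntoGPT.

```python
from itertools import islice
from typing import Iterator, List, Sequence


def first_n(texts: Sequence[str], limit: int = 25) -> List[str]:
    """Take up to `limit` texts without raising IndexError on shorter results."""
    return list(islice(texts, limit))


def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])
```

With these, the loop could read `for text in first_n(textlist):`, which simply processes fewer items when the search returns fewer than 25.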
@@ -1159,5 +1158,6 @@ def list_models():
else:
print(modelname[0])


if __name__ == "__main__":
main()