Adding MRCOC to the graph for the selected articles linked to clinical trials.
sandeepkunkunuru committed Jan 30, 2022
1 parent 4341b07 commit e682601
Showing 7 changed files with 125 additions and 13 deletions.
41 changes: 41 additions & 0 deletions data/open_knowledge_graph_on_clinical_trials/README.md
@@ -27,3 +27,44 @@
- Import the .csv file from the local machine into PostgreSQL using the following command:

`\COPY "table_name" FROM '/file_location/file_name.csv' DELIMITER ',' CSV QUOTE AS '"' HEADER;`


## MRCOC

- EDA - exploratory data analysis

```
$ wc -l data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021.txt
1734959115 data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021.txt
$ cut -d '|' -f1,9,15 data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021.txt > data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_selected_fields.txt
$ sort -u data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_selected_fields.txt > data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_selected_fields_sorted.txt
$ wc -l data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_selected_fields_sorted.txt
1734950157 data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_selected_fields_sorted.txt
$ wc -l data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_selected_fields.txt
1734959115 data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_selected_fields.txt
$ cut -d'|' -f 7 data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021.txt | grep 2020 | wc -l
65919771
$ grep -F "|2020|" data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021.txt > data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_with_Mesh_year_2020.txt
$ wc -l data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_with_Mesh_year_2020.txt
65919771 data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_with_Mesh_year_2020.txt
$ cut -d '|' -f1,9,15 data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_with_Mesh_year_2020.txt > data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_with_Mesh_year_2020_selected_fields.txt
$ wc -l data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_with_Mesh_year_2020_selected_fields.txt
65919771 data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_with_Mesh_year_2020_selected_fields.txt
$ head -n 5 data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_with_Mesh_year_2020_selected_fields.txt
31493778|D000001|D000067128
30557043|D000001|D000069261
31926265|D000001|D000073658
31926265|D000001|D000074662
30557043|D000001|D000075702
```
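
The field selection and year filter in the session above can also be written as a field-aware check. The Java sketch below is illustrative only and is not part of this commit: the class `MrcocFieldSketch`, its method names, and the placeholder values (`f2` ... `f14`, `PMID_X`, `DUI_A`, `DUI_B`) are assumptions. It also assumes, as the `cut -f 7 | grep 2020` step suggests, that field 7 holds the year. Unlike `grep -F "|2020|"`, it only accepts a record when field 7 itself equals the year. The two counts above agree, so for this file the distinction appears not to matter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of this commit.
class MrcocFieldSketch {

    // Mirrors `cut -d '|' -f1,9,15`; assumes the record has at least 15 fields.
    static String selectFields(String record) {
        String[] f = record.split("\\|", -1); // -1 keeps trailing empty fields
        return f[0] + "|" + f[8] + "|" + f[14];
    }

    // True only when field 7 (1-based) equals the year exactly.
    static boolean hasYear(String record, String year) {
        String[] f = record.split("\\|", -1);
        return f.length >= 7 && f[6].equals(year);
    }

    // Keeps the records for the given year and projects them to fields 1, 9, 15.
    static List<String> filterAndSelect(List<String> records, String year) {
        List<String> out = new ArrayList<>();
        for (String r : records) {
            if (hasYear(r, year)) {
                out.add(selectFields(r));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Fields 1, 9, 15 of the first record come from the `head` output above;
        // every other value, and the whole second record, is a placeholder.
        List<String> records = Arrays.asList(
            "31493778|f2|f3|f4|f5|f6|2020|f8|D000001|f10|f11|f12|f13|f14|D000067128",
            "PMID_X|f2|2020|f4|f5|f6|2019|f8|DUI_A|f10|f11|f12|f13|f14|DUI_B");
        // The second record contains "|2020|" in field 3 but is rejected
        // because field 7 is 2019.
        filterAndSelect(records, "2020").forEach(System.out::println);
    }
}
```

Run as written, `main` prints the single line `31493778|D000001|D000067128`.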
6 changes: 6 additions & 0 deletions docs/open_knowledge_graph_on_clinical_trials/README.md
@@ -49,6 +49,12 @@ Below is a brief specification
- PostgreSQL [upsert](https://www.postgresqltutorial.com/postgresql-upsert/) statement
- PostgreSQL - pg_restore - restore only one selected [schema](https://stackoverflow.com/a/970491/294552)
- PostgreSQL - [Array functions](https://www.postgresql.org/docs/8.4/functions-array.html)
- [Escape the pipe character in grep](https://stackoverflow.com/a/23772497/294552)
- Execute PostgreSQL using psql [non-interactively](https://stackoverflow.com/a/6405296/294552)
- Check if string exists in file in [bash](https://stackoverflow.com/a/4749368/294552)
- Save psql inline query output to a [file](https://stackoverflow.com/a/11870348/294552)
- In PostgreSQL formulate a query to get [all items](https://stackoverflow.com/a/34592639/294552) in an array column
- Prevent grep from exiting when match is [not](https://unix.stackexchange.com/a/330662/47615) found

# MeSH

15 changes: 15 additions & 0 deletions src/main/bash/MRCOC_for_trial_linked_articles.sh
@@ -0,0 +1,15 @@
#!/bin/bash
set -ex

PGPASSWORD=0Jg7GdFObf psql -U postgres -h 10.240.64.9 -d aact -t -A -F"|" -c "select p from public.trial_article , unnest(pubmed_articles) p order by p asc" > data/open_knowledge_graph_on_clinical_trials/pubmed_articles.txt

cut -d '|' -f1,9,15 data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021.txt > data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_selected_fields.txt
sort -u data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_selected_fields.txt > data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_selected_fields_sorted.txt

# https://unix.stackexchange.com/a/330662/47615
# Inefficient way of searching for articles in MRCOC. The file co-parsing approach suggested by https://www.linkedin.com/in/mandarapu-madhulatha-72bb6b2a/ is used in Java instead.
while IFS= read -r f; do
    article=$(echo "${f}" | cut -d'|' -f 1)
    echo "${article}|"
    # The pattern is anchored at line start so a PMID cannot match the tail of a longer PMID or of a MeSH DUI field.
    grep "^${article}|" data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_selected_fields_sorted.txt >> data/open_knowledge_graph_on_clinical_trials/filtered_co_occurrence.txt || true
done < data/open_knowledge_graph_on_clinical_trials/pubmed_articles.txt
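
The file co-parsing idea mentioned in the script comment above can be sketched as a single-pass merge over two sorted inputs. This is a minimal in-memory illustration, not the code in `App.java`: the class `CoParseSketch` and its method names are hypothetical, and it assumes both inputs are ordered ascending by numeric PMID. Note that `sort -u` without `-n` orders lines as text, so in practice the article list and the co-occurrence file would need the same ordering on both sides for a merge like this to be valid. The sample lines are the five rows shown in the README `head` output, reordered by PMID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a merge-style filter; not part of this commit.
class CoParseSketch {

    // Returns the PMID (first pipe-delimited field) of a "PMID|DUI1|DUI2" line.
    static long pmidOf(String line) {
        return Long.parseLong(line.substring(0, line.indexOf('|')));
    }

    // Both inputs must be sorted ascending by numeric PMID.
    // Duplicate article ids are tolerated; each matching line is emitted once.
    static List<String> filter(List<Long> articles, List<String> lines) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < articles.size() && j < lines.size()) {
            long wanted = articles.get(i);
            long seen = pmidOf(lines.get(j));
            if (seen < wanted) {
                j++; // line belongs to an article that was not requested
            } else if (seen > wanted) {
                i++; // no (more) lines for this article
            } else {
                out.add(lines.get(j));
                j++; // stay on this article; it may have further lines
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> articles = Arrays.asList(30557043L, 30557043L, 31926265L);
        List<String> lines = Arrays.asList(
            "30557043|D000001|D000069261",
            "30557043|D000001|D000075702",
            "31493778|D000001|D000067128",
            "31926265|D000001|D000073658",
            "31926265|D000001|D000074662");
        // Prints the four lines for 30557043 and 31926265; 31493778 is skipped.
        filter(articles, lines).forEach(System.out::println);
    }
}
```

Each input is read once, so the work grows with the combined size of the two inputs instead of requiring one full `grep` scan of the co-occurrence file per article.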
69 changes: 59 additions & 10 deletions src/main/java/com/vaidhyamegha/data_cloud/kg/App.java
@@ -33,6 +33,9 @@ public class App {
@Option(name = "-v", aliases = "--mesh-vocab-rdf", usage = "Path to the downloaded MeSH Vocabulary Turtle file.", required = false)
private String meshVocab = "data/open_knowledge_graph_on_clinical_trials/vocabulary_1.0.0.ttl";

@Option(name = "-co", aliases = "--mrcoc-sorted-file", usage = "Path to the sorted MRCOC detailed co-occurrence file for selected fields.", required = false)
private String mrcoc = "data/open_knowledge_graph_on_clinical_trials/detailed_CoOccurs_2021_selected_fields_sorted.txt";

@Option(name = "-m", aliases = "--mesh-rdf", usage = "Path to the downloaded MeSH RDF file.", required = false)
private String meshRDF = "data/open_knowledge_graph_on_clinical_trials/mesh2022.nt";

@@ -69,7 +72,7 @@ public void doMain(String[] args) throws IOException {

Model model = ModelFactory.createDefaultModel();

addAllTrials(model, prop);
addAllTrials(model);

FileManager.getInternal().addLocatorClassLoader(cl);

@@ -79,12 +82,13 @@ public void doMain(String[] args) throws IOException {
Model meshModel = ModelFactory.createDefaultModel();
meshModel.read(meshRDF, "NT");

addTrialConditions(model, meshModel, prop);
addTrialInterventions(model, meshModel, prop);
addTrialConditions(model, meshModel);
addTrialInterventions(model, meshModel);

addMeSHCoOccurrences(model, meshModel);
RDFDataMgr.write(new FileOutputStream(out), model, Lang.NT);
} else {
throw new UnsupportedOperationException("Query mode is not yet supported");
throw new UnsupportedOperationException("Non-build modes are not yet supported");
}
} catch (CmdLineException e) {
System.err.println(e.getMessage());
@@ -96,7 +100,49 @@ public void doMain(String[] args) throws IOException {
}
}

private void addAllTrials(Model model, Properties prop) throws IOException {
private void addMeSHCoOccurrences(Model model, Model meshModel) { //TODO: we will use meshModel more appropriately soon to pick the RDF node directly from there.
    Property pMeSHDUI = model.createProperty("MeSH_DUI");
    String qAllArticles = prop.getProperty("all_articles");
    String line = "";

    try (BufferedReader br = new BufferedReader(new FileReader(mrcoc));
         Connection conn = DriverManager.getConnection(prop.getProperty("aact_url"),
                 prop.getProperty("user"), prop.getProperty("password"));
         PreparedStatement sAllArticles = conn.prepareStatement(qAllArticles); ) {

        ResultSet resultSet = sAllArticles.executeQuery();

        while (resultSet.next()) {
            String prefix = resultSet.getString("article") + "|"; // each MRCOC line is "PMID|DUI1|DUI2", so match on the "PMID|" prefix

            // Skip ahead to this article's first line; assumes the file and the query results share one sort order.
            while (line != null && !line.startsWith(prefix)) line = br.readLine();

            if (line == null) break;

            do {
                String[] ids = line.split("\\|");

                Resource r = createResource(model, ids[0], RESOURCE.PUBMED_ARTICLE);
                Resource dui1 = createResource(model, ids[1], RESOURCE.MESH_DUI);
                Resource dui2 = createResource(model, ids[2], RESOURCE.MESH_DUI);

                model.add(r, pMeSHDUI, dui1);
                model.add(r, pMeSHDUI, dui2);

                line = br.readLine();
            } while (line != null && line.startsWith(prefix));
        }
    } catch (SQLException e) {
        System.err.format("SQL State: %s\n%s", e.getSQLState(), e.getMessage());
        throw new RuntimeException("Sorry, unable to connect to database", e);
    } catch (IOException e) {
        e.printStackTrace();
        throw new RuntimeException("Sorry, couldn't read MeSH co-occurrence links", e);
    }
}

private void addAllTrials(Model model) {
Property pTrialId = model.createProperty("TrialId");
String qTrialIds = prop.getProperty("trial_ids");
String qTrialArticles = prop.getProperty("select_trial_articles");
@@ -151,6 +197,9 @@ private Resource createResource(Model model, String rId, RESOURCE rType) {
case PUBMED_ARTICLE:
uri = "https://pubmed.ncbi.nlm.nih.gov/" + rId;
return model.createResource(uri);
case MESH_DUI:
uri = "https://meshb.nlm.nih.gov/record/ui?ui=" + rId;
return model.createResource(uri);
default:
throw new RuntimeException("Unsupported resource type " + rType);
}
@@ -196,21 +245,21 @@ private void insertTrialPubMedArticles(String trialId, List<Integer> s) {
}
}

private void addTrialConditions(Model model, Model meshModel, Properties prop) {
private void addTrialConditions(Model model, Model meshModel) {
String query = prop.getProperty("aact_browse_conditions");
Property p = model.createProperty("Condition");

addTrialToMeSHLinks(model, meshModel, prop, query, p);
addTrialToMeSHLinks(model, meshModel, query, p);
}

private void addTrialInterventions(Model model, Model meshModel, Properties prop) {
private void addTrialInterventions(Model model, Model meshModel) {
String query = prop.getProperty("aact_browse_interventions");
Property p = model.createProperty("Intervention");

addTrialToMeSHLinks(model, meshModel, prop, query, p);
addTrialToMeSHLinks(model, meshModel, query, p);
}

private void addTrialToMeSHLinks(Model model, Model meshModel, Properties prop, String query, Property p) {
private void addTrialToMeSHLinks(Model model, Model meshModel, String query, Property p) {
try (Connection conn = DriverManager.getConnection(prop.getProperty("aact_url"),
prop.getProperty("user"), prop.getProperty("password"));
PreparedStatement preparedStatement = conn.prepareStatement(query)) {
2 changes: 1 addition & 1 deletion src/main/java/com/vaidhyamegha/data_cloud/kg/RESOURCE.java
@@ -1,3 +1,3 @@
package com.vaidhyamegha.data_cloud.kg;

enum RESOURCE {TRIAL, PUBMED_ARTICLE}
enum RESOURCE {TRIAL, PUBMED_ARTICLE, MESH_DUI}
3 changes: 2 additions & 1 deletion src/main/resources/config.properties
@@ -9,4 +9,5 @@ select_trial_articles=select trial, pubmed_articles from trial_article;
insert_trial_articles=INSERT INTO trial_article(trial, pubmed_articles) VALUES (?,?) \
ON CONFLICT (trial) \
DO \
UPDATE SET pubmed_articles = EXCLUDED.pubmed_articles;
UPDATE SET pubmed_articles = EXCLUDED.pubmed_articles;
all_articles=select p as article from public.trial_article , unnest(pubmed_articles) p order by p asc
@@ -20,9 +20,9 @@ public void tearDown() throws Exception {
}

public void testTestGetPubMedIds() {

ESearchResult r = EntrezClient.getPubMedIds("NCT01874691");
System.out.println(r);
System.out.println(Arrays.toString(r.getIdList().toArray()));
assertEquals(9, r.getIdList().toArray().length);
}
}
