- Import the TA1 AIF into the default graph of a GraphDB repository. This will be the source repo (TA2 input).
  - Make sure the repo has read access enabled.
- Import the TA1 AIF into the default graph of another GraphDB repository. This will be the destination repo (TA2 output).
  - Make sure the repo has read and write access enabled.

On the TA2 output repo, the TA1 AIF is imported into the default graph. Each clustering run on the same TA1 output is kept in the same TA2 repo, in a different named graph.
Run the pipeline with a parameter file:

python ta2_runner.py <input.param>

Example parameter file:
[DEFAULT]
endpoint=http://gaiadev01.isi.edu:7200/repositories # graphdb endpoint
repo_src=jchen-test-ta1 # repo for TA1 output (read only)
repo_dst=jchen-test-ta2 # repo where TA2 output will be inserted
graph=http://www.isi.edu/002 # graphdb named graph for the clustering
version=002 # string to distinguish different clustering version runs
delete_existing_clusters=False # True will delete all the existing clusters in TA2 repo (including named graph)
outdir=/nas/home/jchen/store_data/jchen-test-ta2 # a directory where output files will be stored (dataframes, clustering files)
cluster_nb=/lfs1/jupyterhub_data_dir/share/yixiangy/ta2-er.ipynb # Notebook used to run clustering
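
The parameter file above is INI-style with trailing `#` comments on each line. How `ta2_runner.py` actually reads it is not shown here; the following is only a minimal sketch, assuming Python's standard `configparser` is an acceptable way to load such a file. The helper name `load_params` and the inlined `SAMPLE` text are illustrative, not part of the repo. Note that `configparser` keeps trailing comments as part of the value unless `inline_comment_prefixes` is set.

```python
import configparser

# Sample text mirroring part of the parameter file above (values copied from it).
SAMPLE = """\
[DEFAULT]
endpoint=http://gaiadev01.isi.edu:7200/repositories # graphdb endpoint
repo_src=jchen-test-ta1 # repo for TA1 output (read only)
repo_dst=jchen-test-ta2 # repo where TA2 output will be inserted
graph=http://www.isi.edu/002 # graphdb named graph for the clustering
version=002 # string to distinguish different clustering version runs
delete_existing_clusters=False # True will delete all the existing clusters
"""


def load_params(text):
    """Parse INI-style parameter text, dropping trailing ' # ...' comments."""
    # Hypothetical helper: the real runner may parse its input differently.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return parser["DEFAULT"]


params = load_params(SAMPLE)
print(params["repo_src"])
print(params.getboolean("delete_existing_clusters"))
```

Values such as `version` stay strings (`002` keeps its leading zeros), and `getboolean` converts `False` to a real boolean.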
The pipeline does the following:
- Generate dataframe from source repo [Done from Jupyter Notebook]
- Add translation columns to dataframe [Done from Jupyter Notebook]
- Clustering [Done from the Jupyter Notebook specified in the param file]
- Generate singleton event clusters (gen_event_clusters.py)
- Generate AIF (gaia-knowledge-graph/update_kg/Updater.py). Triples are created and inserted into the specified named graph of the TA2 output repo. Use updater.run_all() to insert all the data:
  - Delete existing clusters in the TA2 repo if delete_existing_clusters is true. This should only be done once on the repo: updater.run_delete_ori()
  - Add TA2 system: updater.run_system()
  - Insert clusters (entity, event, relation): updater.run_clusters()
  - Insert cluster prototypes: updater.run_insert_proto()
  - Insert superedges: updater.run_super_edge()
  - Insert informative justifications for clusters: updater.run_inf_just_nt()
  - Insert links for entity clusters: updater.run_links_nt()
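
The individual updater steps listed above can also be driven one at a time. The sketch below is a hypothetical helper, not code from the repo: it assumes an `updater` object that exposes the step methods named in this README, and it assumes the list order is the intended call order. Whether `run_all()` performs exactly these calls in this order is not confirmed here. `RecordingUpdater` is a stand-in used only to show the sequence without touching a real repository.

```python
def run_updater_steps(updater, delete_existing_clusters=False):
    """Call the updater steps in the order they are listed in this README.

    Hypothetical helper: `updater` is assumed to expose the named methods.
    """
    steps = []
    if delete_existing_clusters:
        # Destructive step; the README says to do this only once per repo.
        steps.append("run_delete_ori")
    steps += [
        "run_system",
        "run_clusters",
        "run_insert_proto",
        "run_super_edge",
        "run_inf_just_nt",
        "run_links_nt",
    ]
    for name in steps:
        getattr(updater, name)()
    return steps


class RecordingUpdater:
    """Stand-in that only records which step was called (illustration only)."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Reached only for attributes that are not otherwise defined.
        def step():
            self.calls.append(name)
        return step


fake = RecordingUpdater()
run_updater_steps(fake, delete_existing_clusters=True)
print(fake.calls)
```

Keeping the delete step behind an explicit flag mirrors the delete_existing_clusters parameter and makes it harder to wipe existing clusters by accident.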
To export from the GraphDB GUI:
- Select the TA2 output repository
- Go to Explore --> Graphs overview
- Check "The default graph" and the named graph (graph specified in the parameter file)
- Click on the "Export repository" dropdown and choose Turtle
- Select Yes on the warning dialog and the AIF will download
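
The same export can likely be scripted rather than clicked through. The sketch below assumes the GraphDB endpoint serves the standard RDF4J REST API, where `GET /repositories/<repo>/statements` accepts repeated `context` parameters, `context=null` selects statements without a context, and a named graph is passed as an N-Triples-encoded IRI. Treating `context=null` as equivalent to "The default graph" in the GUI is an assumption, and the function names are illustrative. The endpoint, repo, and graph values are copied from the example parameter file.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_export_url(endpoint, repo, named_graph):
    """Build a statements URL covering the default graph and one named graph."""
    query = urlencode([
        ("context", "null"),              # statements with no context
        ("context", f"<{named_graph}>"),  # N-Triples-encoded graph IRI
    ])
    return f"{endpoint.rstrip('/')}/{repo}/statements?{query}"


def export_turtle(url, out_path):
    """Download the selected statements as Turtle (needs network access)."""
    request = Request(url, headers={"Accept": "text/turtle"})
    with urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())


url = build_export_url(
    "http://gaiadev01.isi.edu:7200/repositories",
    "jchen-test-ta2",
    "http://www.isi.edu/002",
)
print(url)
# To actually download (requires read access to the repo):
# export_turtle(url, "export.ttl")
```

Turtle has no syntax for named graphs, so the downloaded file contains the union of the selected graphs.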
# Either build the docker image:
$ docker build . -t gaiaaida/ta2:latest
# or pull the image from dockerhub
$ docker pull gaiaaida/ta2:latest
# Prepare data:
$ mkdir ./input
$ mkdir ./output
$ cp ./test/for_ta2_pipeline_test.zip ./input/
$ cp ./test/kg_test.zip ./input/
$ cd ./input/
$ unzip kg_test.zip
$ tree
.
├── for_ta2_pipeline_test.zip
├── kg
│   ├── alternate_names.tab
│   ├── entities.tab
│   └── member_states.tab
└── kg_test.zip
1 directory, 5 files
$ cd ..
# Run the docker container:
$ docker run --name=ta2-test -v $(pwd)/input/:/input/ -v $(pwd)/output/:/output/ gaiaaida/ta2:latest
# Observe the results:
$ tree ./output/
./output/
├── export.ttl
└── ta2-test-ta2
    ├── document_001.h5
    ├── entity_valid_001.h5
    ├── entity_with_labels_001.h5
    ├── GenerateDataframe2019.out.ipynb
    ├── relation_001.h5
    └── relation_roles_001.h5
1 directory, 7 files
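
A quick way to confirm that a container run produced the files listed above is to check the output directory from Python. This is only a convenience sketch: the function name `summarize_output` is hypothetical, and it checks nothing beyond the presence of `export.ttl` and the names of any `.h5` files under the directory.

```python
from pathlib import Path


def summarize_output(output_dir):
    """Report whether export.ttl exists and which .h5 files were written."""
    root = Path(output_dir)
    h5_files = sorted(p.name for p in root.rglob("*.h5")) if root.is_dir() else []
    return {
        "has_export": (root / "export.ttl").is_file(),
        "h5_files": h5_files,
    }


# "./output" is the directory mounted into the container in the steps above.
print(summarize_output("./output"))
```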