-
Notifications
You must be signed in to change notification settings - Fork 3
/
README.md.old
158 lines (142 loc) · 7.63 KB
/
README.md.old
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
### <a href="https://arxiv.org/abs/2303.02166"> KGNET</a> Accepted at <a href="https://icde2023.ics.uci.edu/papers-special-track/">ICDE-2023</a>
# KGNET - A GML-Enabled RDF Engine
<p><b>Note: We are working on a full release of KGNET with all working functionalities by end of Augest</b></p>
<img alt="kgnet_architecture" src="docs/imgs/kgnet.png" width="500"/>
<div style="text-align: justify">
<p>This vision paper proposes KGNet, an on-demand graph machine learning (GML) as a service on top of RDF engines to support GML-enabled SPARQL queries. KGNet automates the training of GML models on a KG by identifying a task-specific subgraph. This helps reduce the task-irrelevant KG structure and properties for better scalability and accuracy. While training a GML model on KG, KGNet collects metadata of trained models in the form of an RDF graph called KGMeta, which is interlinked with the relevant subgraphs in KG. Finally, all trained models are accessible via a SPARQL-like query. We call it a GML-enabled query and refer to it as SPARQLML. KGNet supports SPARQLML on top of existing RDF engines as an interface for querying and inferencing over KGs using GML models. The development of KGNet poses research opportunities in several areas, including meta-sampling for identifying task-specific subgraphs, GML pipeline automation with computational constraints, such as limited time and memory budget, and SPARQLML query optimization. KGNet supports different GML tasks, such as node classification, link prediction, and semantic entity matching. We evaluated KGNet using two real KGs of different application domains. Compared to training on the entire KG, KGNet significantly reduced training time and memory usage while maintaining comparable or improved accuracy. The KGNet source-code1 is available for further study</p></div>
## SPARQL-ML Demo Vedio
<a href="https://www.youtube.com/watch?v=DJaDhJ-OW-Q&ab_channel=hussienshahata" target="_blank"><img alt="SPARQL-ML Demo" src="docs/imgs/SPARQLML-Demo.png" width="300"/></a>
<br/>
<a href="https://colab.research.google.com/drive/1mBVNdAGYi7V6giMYCrYhz7oCvZpTvDyI?usp=sharing" target="_blank">
<img style="vertical-align:middle" src="https://img.icons8.com/?size=512&id=lOqoeP2Zy02f&format=png" width="40"/>
<span style="vertical-align:top" >SAPRQL-ML Demo Colab notebook.</span> </a>
## Installation
* Clone the `kgnet` repo
* Create `kgnet` Conda environment (Python 3.8) and install pip requirements.
* Activate the `kgnet` environment
```commandline
conda activate kgnet
```
## Quickstart
<ul>
<li>
<b>Install <a href="https://github.com/openlink/virtuoso-opensource">openlink Virtuoso Version 07.20.3229 </a> and load the knowledge graphs used in this paper. We use DBLP version <a href="https://dblp.org/rdf/release/">2022-06-01</a>, and <a href="https://data.deepai.org/FB15K.zip">FB15K</a>. </b>
</li>
<li>
<b> Prepaare endpoint URI for each graph to be used with kgnet. </b>
</li>
</ul>
<b>Generating the Meta-sampled graph dataset:</b>
1. under dataSampling the node sampler and edge sampler methods are used to sample large datasets for node classification and link prediction respectively. the sampling methods take the sull graph as input and the traget node or target edge and generate the task sampled subgraph
```python
# run node sampler
python nodesampler.py --KG DBLP --targetNode https://dblp.org/rdf/schema-publishedIn
python edgesampler.py --KG FB15K --targetEdge /people/person/profession
```
2. Run data transformer [GML dataTransformer](/DataTransform/kgnet_data_transformer.py)
transform your dataset into adjaceny metrics for node classification task by providing
1. the triples (CSV/TSV) dataset
2. the splitting criterria i.e. https://dblp.org/rdf/schema#yearOfEvent
3. the target nodes labels i.e. https://dblp.org/rdf/schema#publishedIn
The generated splits are
```
knoweldge graph dataset /
├── mapping
│ └── nodex_entid2name.csv
│ └── ....
├── raw
│ └── node-label
│ └── node-feat
│ └── relations
│ └── nodex_rel_nodey
│ └── ....
├── split
│ └── train
│ └── test
│ └── valid
...
```
```python
# run data transformer sampler
python kgnet_data_transformer.py --KG DBLP --splittingEdge https://dblp.org/rdf/schema-yearOfEvent --labelsEdge https://dblp.org/rdf/schema-publishedIn
```
3. Run your GML-Queries [GML Queries](/GMLOperators_old)
- GML **Insert (Model Train)** Query
```python
gml_query=""" prefix dblp:<https://www.dblp.org/>
prefix kgnet:<https://www.kgnet.com/>
Insert into <kgnet> { ?s ?p ?o }
where {select * from kgnet.TrainGML(
{modelName: 'MAG_Paper-Venue_Classifer',
GML-Operator: kgnet:NodeClassifier,
TargetNodes: dblp:publication,
NodesLables: dblp:venue,
aggregator: 'mean',
activationFunction: 'sigmoid',
hyperParameters:{ batchSize: 50,
n-layers:3,
h-dim:100 },
budget:{ MaxMemory:50GB,
MaxTime:1h,
Priority:ModelScore} } )}"""
python InsertOperator.py --query gml_query
```
- GML **Node Classifier** Query
```python
gml_query="""prefix dblp: <https://www.dblp.org/>
prefix kgnet: <https://www.kgnet.com/>
select ?title ?venue
where {
?paper a dblp:Publication.
?paper dblp:title ?title.
?paper ?NodeClassifier ?venue.
?NodeClassifier a kgnet:NodeClassifier.
?NodeClassifier kgnet:TargetNode dblp:paper.
?NodeClassifier kgnet:NodeLabel dblp:venue.}"""
python NodeClassifier.py --query gml_query
```
- GML **Delete** Query
```python
gml_query="""prefix dblp:<https://www.dblp.org/>
prefix kgnet:<https://www.kgnet.com/>
delete {?NodeClassifier ?p ?o}
where {
?NodeClassifier a kgnet:NodeClassifier.
?NodeClassifier kgnet:classifierTarget dblp:publication.
?NodeClassifier kgnet:classifierLabel dblp:venue.}"""
python InsertOperator.py --query gml_query
```
- GML **Link Predictor** Query
```python
gml_query="""prefix dblp: <https://www.dblp.com/>
prefix kgnet: <https://www.kgnet.com/>
select ?person ?affiliation
where { ?person a fb:person.
?person ?LinkPredictor ?affiliation.
?LinkPredictor a kgnet:LinkPredictor.
?LinkPredictor kgnet:SourceNode dblp:person.
?LinkPredictor kgnet:DestinationNode dblp:affiliation.
?LinkPredictor kgnet:TopK-Links 10.}"""
python LinkPredictor.py --query gml_query
```
## Using the Kgnet Web Interface
Kgnet provides predefined operators in form of python apis that allow seamless integration with a conventional data science pipeline.
Checkout our [rep](https://github.com/hussien/KGNet-Interface) and [KGNET APIs](GMLWebServiceApis)
## kgnet APIs
See the full list of supported GML-Operators [here](docs/kgnet_gml_operators.md).
## Citing Our Work
If you find our work useful, please cite it in your research:
<br>
@INPROCEEDINGS{10184515,
author={Abdallah, Hussein and Mansour, Essam},
booktitle={2023 IEEE 39th International Conference on Data Engineering (ICDE)},
title={Towards a GML-Enabled Knowledge Graph Platform},
year={2023},
volume={},
number={},
pages={2946-2954},
doi={10.1109/ICDE55515.2023.00225}}
## Publicity
This repository is part of our submission to CIDR-2023. We will make it available to the public research community upon acceptance.
## Questions
For any questions please contact us at:<br/> [email protected] , [email protected]