<!DOCTYPE html>
<html lang="en">
<head>
<title>FoodKG</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.0/css/bootstrap.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.0/js/bootstrap.min.js"></script>
<link rel="stylesheet" href="css/custom.css">
</head>
<body>
<nav class="navbar navbar-inverse mx-auto">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#myNavbar">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="#"></a>
</div>
<div class="collapse navbar-collapse" id="myNavbar">
<ul class="nav navbar-nav">
<li><a href="index.html">Home</a></li>
<li><a href="foodkg.html">FoodKG Construction</a></li>
<li><a href="whattomake.html">What To Make Application</a></li>
<li class="active"><a href="kbqa.html">Answering Natural Language Questions over FoodKG</a></li>
<li><a href="contact.html">Contact</a></li>
</ul>
</div>
</div>
</nav>
<div class="container-fluid text-center">
<div class="row content">
<div class="col-sm-2 sidenav">
<div class="bg"></div>
<span class="caption">
source: <a href="https://www.freeimages.com/search/food-question-mark">https://www.freeimages.com/search/food-question-mark</a>.
</span>
</div>
<div class="col-sm-8 text-left">
<a href="https://github.com/foodkg/foodkg.github.io"><img style="position: absolute; top: 0; right: 0; border: 0;"
src="https://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png" alt="Fork me on GitHub"></a>
<!-- Content -->
<h1>Answering Natural Language Questions over FoodKG</h1>
<br/>
<p>
We demonstrate one potential use of our FoodKG: answering natural language questions over a knowledge graph, a task also known as knowledge base question answering (KBQA).
Given a natural language question such as "what Indian dishes can I make with chicken and garlic?", our goal is to automatically find the answers in the FoodKG.
We believe this is a natural way to access a large-scale knowledge graph, especially for non-expert users.
It also lets the FoodKG serve users in a user-friendly way, by providing nutrition facts for ingredients and diverse recipe options.
To this end, we built the application <b>Answering Natural Language Questions over FoodKG</b>.
We first create a synthetic Q&A dataset from the FoodKG using a set of manually designed question templates.
We then train <a href="https://arxiv.org/abs/1903.02188">BAMnet</a>, a state-of-the-art neural network-based KBQA model, on this Q&A dataset.
Once trained, the KBQA model should be able to answer similar natural language questions over the FoodKG.
</p>
<hr>
<section>
<h2>Prerequisites</h2>
To run the following experiments, first download the <a href="https://github.com/foodkg/foodkg.github.io">code</a>.
All relevant code is in the <b>app_kbqa</b> folder.
The code is written in Python 3, and you will need to install a few Python packages to run it.
We recommend using <code>virtualenv</code> to manage your Python packages and environments.
Take the following steps to create a Python virtual environment.
<ol>
<li>If you have not installed virtualenv, install it with <code>pip install virtualenv</code>.</li>
<li>Create a virtual environment with <code>virtualenv venv</code>.</li>
<li>Activate the virtual environment with <code>source venv/bin/activate</code>.</li>
<li>Install the package requirements with <code>pip install -r requirements.txt</code>.</li>
</ol>
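<p>Before running any of the scripts below, you may want to confirm that the virtual environment is actually active. The following is a minimal sketch that uses only the Python standard library; it is not part of the FoodKG code.</p>

```python
import sys
from typing import List


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # Legacy virtualenv (< 20) sets sys.real_prefix; the stdlib venv module and
    # virtualenv >= 20 instead make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def check_environment() -> List[str]:
    """Return a list of problems with the current Python setup (empty if none)."""
    problems = []
    if sys.version_info < (3,):
        problems.append("Python 3 is required, found %d.%d" % tuple(sys.version_info[:2]))
    if not in_virtualenv():
        problems.append("no virtual environment is active; run: source venv/bin/activate")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("OK: Python %d.%d in %s" % (sys.version_info[0], sys.version_info[1], sys.prefix))
```

<p>Run it with the same <code>python</code> you will use for the experiments; if it reports that no virtual environment is active, activate <code>venv</code> before installing the requirements.</p>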
<hr>
</section>
<section>
<h2>Create a synthetic Q&A dataset</h2>
<h4><b>Fetch KG subgraphs from a remote KG stored in Blazegraph</b></h4>
<p>We assume you have already loaded the FoodKG into <a href="https://www.blazegraph.com/">Blazegraph</a>.
If not, please follow the instructions in the <a href="https://wiki.blazegraph.com/wiki/index.php/Quick_Start">User Guide</a> to download and install Blazegraph and load the FoodKG RDF data into the Blazegraph endpoint on your system.
Please also confirm that the variable <code>USE_ENDPOINT_URL</code> hard-coded in <b>data_builder/src/config/data_config.py</b> matches the URL and namespace of your Blazegraph instance.
</p>
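<p>To check that the endpoint in <code>USE_ENDPOINT_URL</code> is reachable and actually holds data, you can send it a simple SPARQL query. The sketch below uses only the standard library and the SPARQL 1.1 protocol (a GET request with a <code>query</code> parameter, JSON results). It assumes Blazegraph's usual <code>/blazegraph/namespace/&lt;namespace&gt;/sparql</code> URL layout on port 9999; the namespace <code>foodkg</code> is a hypothetical placeholder, so substitute your own value.</p>

```python
import json
import urllib.parse
import urllib.request

# Hypothetical value: replace with the USE_ENDPOINT_URL from data_config.py.
# Blazegraph serves one SPARQL endpoint per namespace at
# http://HOST:PORT/blazegraph/namespace/NAMESPACE/sparql
ENDPOINT_URL = "http://localhost:9999/blazegraph/namespace/foodkg/sparql"

COUNT_QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_count(result: dict) -> int:
    """Extract ?n from a SPARQL JSON result: results.bindings[i][var]["value"]."""
    return int(result["results"]["bindings"][0]["n"]["value"])


def count_triples(endpoint: str, timeout: float = 60.0) -> int:
    """Return the number of triples stored in the namespace behind `endpoint`."""
    with urllib.request.urlopen(build_request(endpoint, COUNT_QUERY), timeout=timeout) as resp:
        return parse_count(json.load(resp))


if __name__ == "__main__":
    print("triples:", count_triples(ENDPOINT_URL))
```

<p>A connection error points to a wrong host or port, while a count of zero usually means the namespace in the URL is not the one the FoodKG data was loaded into. On a large store the count may take a while.</p>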
<ol>
<li>
<h5>In the data_builder/src folder, run the following cmds:</h5>
<code>
python usda.py -o [qas_dir]
</code><br>
<code>
python recipe.py -o [qas_dir]
</code>
<br><br>
In the [qas_dir] folder, you will see two JSON files: <b>usda_subgraphs.json</b> which is for the USDA data and <b>recipe_kg.json</b> which is for the Recipe1M data.
</li><br>
<li>
<h5>Note that in the recipe data, a few tags contain more than 5,000 recipes, which might cause out-of-memory errors when running the KBQA system. You may therefore want to create a smaller recipe dataset by randomly keeping at most 2,000 recipes under each recipe tag. To do that, run the following cmd:</h5>
<code>
python filterout_recipe.py -recipe [qas_dir/recipe_kg.json] -o [qas_dir/sampled_recipe_kg.json] -max_dish_count_per_tag 2000
</code>
</li><br>
<li>
<h5>Now, you can merge the above two files into a single file using the following cmd:</h5>
<code>
cat [qas_dir/usda_subgraphs.json] [qas_dir/sampled_recipe_kg.json] > [qas_dir/foodkg.json]
</code><br><br>
This is the local KG file which will be accessed by a KBQA system.<br>
</li>
</ol><br>
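<p>The per-tag down-sampling from step 2 can be sketched as follows. This is not the code of <code>filterout_recipe.py</code>: since the layout of <b>recipe_kg.json</b> is not documented on this page, the sketch assumes, for illustration only, that the recipes have already been grouped into an in-memory dictionary mapping each tag to a list of recipe IDs.</p>

```python
import random
from typing import Dict, List, Optional


def cap_recipes_per_tag(
    recipes_by_tag: Dict[str, List[str]],
    max_per_tag: int = 2000,
    seed: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Keep at most `max_per_tag` randomly chosen recipes under each tag.

    Tags with `max_per_tag` or fewer recipes are kept unchanged.
    """
    rng = random.Random(seed)  # a seeded generator makes the sample reproducible
    capped = {}
    for tag, recipes in recipes_by_tag.items():
        if len(recipes) > max_per_tag:
            # Sample positions without replacement, then keep the original order.
            keep = set(rng.sample(range(len(recipes)), max_per_tag))
            capped[tag] = [r for i, r in enumerate(recipes) if i in keep]
        else:
            capped[tag] = list(recipes)
    return capped


if __name__ == "__main__":
    demo = {"indian": ["r%d" % i for i in range(5000)], "vegan": ["a", "b"]}
    small = cap_recipes_per_tag(demo, max_per_tag=2000, seed=0)
    print({tag: len(rs) for tag, rs in small.items()})  # {'indian': 2000, 'vegan': 2}
```

<p>Sampling without replacement guarantees that no recipe is duplicated within a tag, and fixing the seed makes the smaller dataset reproducible across runs.</p>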
<h4><b>Generate synthetic questions</b></h4>
<ol>
<li>
<h5>Run the following cmd:</h5>
<code>
python generate_qa.py -usda [qas_dir/usda_subgraphs.json] -recipe [qas_dir/sampled_recipe_kg.json] -o [qas_dir] -sampling_prob 0.05 -num_qas_per_tag 20
</code><br><br>
Note that if your machine does not have a large amount of RAM (e.g., more than 16 GB), you can lower <code>sampling_prob</code> and <code>num_qas_per_tag</code> to create a smaller dataset.
</li>
</ol><br>
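<p>To illustrate how template-based question generation works in general, here is a toy sketch. It is not <code>generate_qa.py</code>: the template, the toy KG layout (tag, then recipe, then ingredients), and the reading of <code>sampling_prob</code> as a per-candidate keep probability and <code>num_qas_per_tag</code> as a per-tag cap are all assumptions made for illustration. Check the script itself for the actual templates and flag semantics.</p>

```python
import itertools
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical template modeled on the example question in the introduction;
# the real templates used by generate_qa.py are not reproduced here.
TEMPLATE = "what {tag} dishes can I make with {ing1} and {ing2}?"


def generate_questions(
    recipes_by_tag: Dict[str, Dict[str, List[str]]],
    sampling_prob: float = 0.05,
    num_qas_per_tag: int = 20,
    seed: Optional[int] = None,
) -> List[Tuple[str, List[str]]]:
    """Fill TEMPLATE from a toy KG and return (question, answers) pairs.

    recipes_by_tag maps a tag to {recipe name: [ingredient, ...]}. In this
    sketch, each candidate ingredient pair is kept with probability
    `sampling_prob`, and at most `num_qas_per_tag` questions are kept per tag.
    """
    rng = random.Random(seed)
    qas = []
    for tag, recipes in recipes_by_tag.items():
        ingredients = sorted({ing for ings in recipes.values() for ing in ings})
        emitted = 0
        for ing1, ing2 in itertools.combinations(ingredients, 2):
            if emitted >= num_qas_per_tag:
                break
            if rng.random() >= sampling_prob:
                continue  # candidate dropped by random sampling
            # Answers: every recipe under this tag that uses both ingredients.
            answers = [name for name, ings in recipes.items() if ing1 in ings and ing2 in ings]
            if answers:
                qas.append((TEMPLATE.format(tag=tag, ing1=ing1, ing2=ing2), answers))
                emitted += 1
    return qas


if __name__ == "__main__":
    toy_kg = {
        "Indian": {
            "butter chicken": ["chicken", "garlic", "butter"],
            "chicken tikka": ["chicken", "garlic", "yogurt"],
        }
    }
    for question, answers in generate_questions(toy_kg, sampling_prob=1.0):
        print(question, "->", answers)
```

<p>Because the answers are read directly from the KG when each template is filled, every generated question comes with gold answers, which is what makes the synthetic dataset usable for training.</p>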
</section>
<section>
<h2>Run a KBQA system</h2>
<h4><b>Preprocess the Q&A dataset</b></h4>
<ol>
<li>
<h5>In the BAMnet/src folder, run the following cmd:</h5>
<code>
python build_all_data.py -data_dir [qas_dir] -kb_path [qas_dir/foodkg.json] -out_dir [qas_dir]
</code>
In the printed output, you will see some data statistics such as <b>vocab_size</b>, <b>num_ent_types</b>, and <b>num_relations</b>. You will need these numbers later when modifying the config file.
</li>
</ol><br>
<h4><b>Load pretrained word embeddings</b></h4>
<ol>
<li>
<h5>Download the pretrained GloVe word embeddings <a href="http://nlp.stanford.edu/data/wordvecs/glove.840B.300d.zip">glove.840B.300d.zip</a>.
</h5>
</li><br>
<li>
<h5>
Unzip the file and convert the GloVe text format to word2vec format using the following cmd:</h5>
<code>
python -m gensim.scripts.glove2word2vec --input glove.840B.300d.txt --output glove.840B.300d.w2v
</code>
</li><br>
<li>
<h5>Fetch the pretrained GloVe vectors for the words in our vocabulary:
</h5>
<code>
python build_pretrained_w2v.py -emb glove.840B.300d.w2v -data_dir [qas_dir] -out [qas_dir/glove_pretrained_300d_w2v.npy] -emb_size 300
</code>
</li><br>
</ol>
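<p>Conceptually, the last step builds an embedding matrix whose row <i>i</i> holds the pretrained vector of the vocabulary word with index <i>i</i>. The sketch below shows that idea; it is not <code>build_pretrained_w2v.py</code>. It assumes the vocabulary is available as a dict mapping each word to a contiguous index starting at 0 (how <code>build_all_data.py</code> actually stores it is not described here), and it initializes words missing from GloVe with small uniform random values, which may differ from what the script does. Loading uses gensim's <code>KeyedVectors.load_word2vec_format</code>.</p>

```python
import random
from typing import Dict, List, Mapping, Sequence


def build_embedding_matrix(
    vocab: Dict[str, int],
    vectors: Mapping[str, Sequence[float]],
    emb_size: int = 300,
    seed: int = 0,
) -> List[List[float]]:
    """Return a len(vocab) x emb_size matrix; row i embeds the word with index i.

    Words found in `vectors` get their pretrained vector; all other words get
    small random values so the model can still learn them during training.
    Assumes vocab indices are exactly 0 .. len(vocab) - 1.
    """
    rng = random.Random(seed)
    matrix: List[List[float]] = [[] for _ in range(len(vocab))]
    for word, idx in vocab.items():
        if word in vectors:
            row = [float(x) for x in vectors[word]]
            if len(row) != emb_size:
                raise ValueError("vector for %r has size %d, expected %d" % (word, len(row), emb_size))
        else:
            row = [rng.uniform(-0.1, 0.1) for _ in range(emb_size)]
        matrix[idx] = row
    return matrix


def save_pretrained_matrix(w2v_path: str, vocab: Dict[str, int], out_path: str) -> None:
    """Load word2vec-format vectors with gensim and save the matrix as .npy."""
    # Imported here so build_embedding_matrix runs without these packages.
    import numpy as np
    from gensim.models import KeyedVectors

    kv = KeyedVectors.load_word2vec_format(w2v_path, binary=False)
    # KeyedVectors supports `word in kv` and `kv[word]`, as the builder expects.
    matrix = build_embedding_matrix(vocab, kv, emb_size=kv.vector_size)
    np.save(out_path, np.asarray(matrix, dtype=np.float32))
```

<p>In gensim 4.0 and later, <code>load_word2vec_format(..., binary=False, no_header=True)</code> can read the unzipped GloVe text file directly, which would make the separate conversion step optional. Loading all 840B vectors requires a lot of memory either way.</p>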
<h4><b>Train/test a KBQA system</b></h4>
<ol>
<li>
Modify the config file <b>BAMnet/src/config/kbqa.yml</b> to suit your needs. You can start by changing only the data paths and the dataset statistics printed by <b>build_all_data.py</b> (i.e., <b>data_dir</b>, <b>kb_path</b>,
<b>pre_word2vec</b>, <b>vocab_size</b>, <b>num_ent_types</b>, <b>num_relations</b>) and leave the other variables as they are.
</li><br>
<li>
<h5>Train the KBQA model.</h5>
<code>
python train.py -config config/kbqa.yml
</code>
</li><br>
<li>
<h5>Test the KBQA model:</h5>
<code>
python run_online.py -config config/kbqa.yml
</code>
</li>
</ol><br>
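<p>If you prefer to update the config programmatically, the following sketch sets the keys listed in step 1. It assumes that <b>kbqa.yml</b> is a flat YAML mapping with these keys at the top level, that the file names match the ones used earlier on this page, and that PyYAML is installed (<code>yaml.safe_load</code> / <code>yaml.safe_dump</code>). The statistics passed in must be the values printed by <b>build_all_data.py</b>.</p>

```python
import os
from typing import Any, Dict


def update_kbqa_config(config: Dict[str, Any], qas_dir: str, stats: Dict[str, int]) -> Dict[str, Any]:
    """Return a copy of `config` that points at qas_dir and uses the dataset statistics."""
    updated = dict(config)  # shallow copy; the caller's dict is left unchanged
    updated["data_dir"] = qas_dir
    updated["kb_path"] = os.path.join(qas_dir, "foodkg.json")
    updated["pre_word2vec"] = os.path.join(qas_dir, "glove_pretrained_300d_w2v.npy")
    # Statistics printed by build_all_data.py during preprocessing.
    for key in ("vocab_size", "num_ent_types", "num_relations"):
        updated[key] = int(stats[key])
    return updated


def rewrite_config(path: str, qas_dir: str, stats: Dict[str, int]) -> None:
    """Rewrite a YAML config file in place (comments in the file are not preserved)."""
    import yaml  # PyYAML, imported here so update_kbqa_config has no dependencies

    with open(path) as f:
        config = yaml.safe_load(f)
    with open(path, "w") as f:
        yaml.safe_dump(update_kbqa_config(config, qas_dir, stats), f,
                       default_flow_style=False, sort_keys=False)
```

<p>Because <code>yaml.safe_dump</code> drops comments, editing the file by hand is preferable if you want to keep the explanations that may be present in the original config.</p>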
<!-- Note that we use the BAMnet model as our KBQA system in this application. For more details about the BAMnet model, please refer to the <a href="https://arxiv.org/abs/1903.02188">original paper</a>. -->
</section><br>
<!--
<ol>
<li>Acquire the data listed above</li>
<li>Use /src/prep-scripts/ to join automatically-acquired data with the manual data</li>
<li>Use /src/recipe-handler/ to generate a knowledge graph from the prepared data</li>
<li>Use /src/verify to generate statistics from the result</li>
</ol>
-->
</div>
</div>
</div>
<!-- <footer class="container-fluid text-center">
<p></p>
</footer> -->
</body>
</html>