kbqa.html

<!DOCTYPE html>
<html lang="en">
<head>
  <title>FoodKG</title>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.0/css/bootstrap.min.css">
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.0/js/bootstrap.min.js"></script>
  <link rel="stylesheet" href="css/custom.css">
 </head>
<body>


<nav class="navbar navbar-inverse mx-auto">
  <div class="container">
    <div class="navbar-header">
      <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#myNavbar">
        <span class="icon-bar"></span>
        <span class="icon-bar"></span>
        <span class="icon-bar"></span>
      </button>
      <a class="navbar-brand" href="#"></a>
    </div>
    <div class="collapse navbar-collapse" id="myNavbar">
      <ul class="nav navbar-nav">
        <li><a href="index.html">Home</a></li>
        <li><a href="foodkg.html">FoodKG Construction</a></li>
        <li><a href="whattomake.html">What To Make Application</a></li>
        <li class="active"><a href="kbqa.html">Answering Natural Language Questions over FoodKG</a></li>
        <li><a href="contact.html">Contact</a></li>
      </ul>
    </div>
  </div>
</nav>

<div class="container-fluid text-center">
  <div class="row content">
    <div class="col-sm-2 sidenav">
        <div class="bg"></div>
        <span class="caption">
          source: <a href="https://www.freeimages.com/search/food-question-mark">https://www.freeimages.com/search/food-question-mark</a>.
        </span>
      </div>

    <div class="col-sm-8 text-left">
        <a href="https://github.com/foodkg/foodkg.github.io"><img style="position: absolute; top: 0; right: 0; border: 0;"
          src="https://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png" alt="Fork me on GitHub"></a>
      <!-- Content -->
      <h1>Answering Natural Language Questions over FoodKG</h1>
      <br/>
      <p>
        We demonstrate a potential use of our FoodKG for answering natural language questions over knowledge graphs, aka, knowledge base question answering (KBQA).
        Given questions in natural language such as "what Indian dishes can I make with chicken and garlic?", our goal here is to automatically find answers from the FoodKG.
        We believe this is a natural way to access a large-scale knowledge graph, especially for non-experts users.
        Moreover, in this way, our FoodKG is able to benefit users by providing nutrition facts of ingredients and diverse recipe options in a user-friendly way.
        To this end, we build this application which is <b>Answering Natural Language Questions over FoodKG</b>.
        We first create a synthetic Q&A dataset based on our FoodKG using a set of manually designed question templates.
        Then we train a state-of-the-art neural network-based KBQA model called <a href="https://arxiv.org/abs/1903.02188">BAMnet</a> on the Q&A dataset.
        After training the KBQA model, it is supposed to answer similar natural language questions based on the FoodKG.
      </p>
      <hr>

<section>
      <h2>Prerequisites</h2>
      In order to run the following experiments, you will need to first download the <a href="https://github.com/foodkg/foodkg.github.io">code</a>.
      All relevant code is in the <b>app_kbqa</b> folder.
      This code is written in python 3. You will need to install a few python packages in order to run the code.
      We recommend you to use <code>virtualenv</code> to manage your python packages and environments.
      Please take the following steps to create a python virtual environment.
        <ol>
      <li>If you have not installed virtualenv, install it with <code>pip install virtualenv</code>.</li>
      <li>Create a virtual environment with <code>virtualenv venv</code>.</li>
      <li>Activate the virtual environment with <code>source venv/bin/activate</code>.</li>
      <li>Install the package requirements with <code>pip install -r requirements.txt</code>.</li>
        </ol>
        <hr>
</section>


<section>
      <h2>Create a synthetic Q&A dataset</h2>
      <h4><b>Fetch KG subgraphs from a remote KG stored in Blazegraph</b></h4>
      <p>We assume you have already loaded the FoodKG into <a href="https://www.blazegraph.com/">Blazegraph</a>.
        If not, please follow the instructions in the <a href="https://wiki.blazegraph.com/wiki/index.php/Quick_Start">User Guide</a> to download, install, and load the FoodKG RDF data in to the Blazegraph endpoint on your system.
        Please also confirm that the variable <code>USE_ENDPOINT_URL</code> hard-coded in <b>data_builder/src/config/data_config.py</b> matches the URL and namespace of your Blazegraph instance.
        </p>
      <ol>
      <li>
      <h5>Go to the data_builder/src folder, run the following cmd:</h5>
        <code>
          python usda.py -o [qas_dir]
        </code><br>
        <code>
          python recipe.py -o [qas_dir]
        </code>
        <br><br>

        In the [qas_dir] folder, you will see two JSON files: <b>usda_subgraphs.json</b> which is  for the USDA data and <b>recipe_kg.json</b> which is for the Recipe1M data.
      </li><br>


        <li>
        <h5>Note that in the recipe data, a few tags contain more than 5000 recipes which might cause Out of Memory issue when running the KBQA system. So you may want to create a smaller recipe dataset by randomly keeping at most 2000 recipes under each recipe tag. In order to do that, run the following cmd:</h5>
          <code>
          python filterout_recipe.py -recipe [qas_dir/recipe_kg.json] -o [qas_dir/sampled_recipe_kg.json] -max_dish_count_per_tag 2000
          </code>
        </li><br>


        <li>
        <h5>Now, you can merge the above two files into a single file using the following cmd:</h5>
          <code>
          cat [qas_dir/usda_subgraphs.json] [qas_dir/sampled_recipe_kg.json] > [qas_dir/foodkg.json]
         </code><br><br>
          This is the local KG file which will be accessed by a KBQA system.<br>
        </li>
    </ol><br>


<h4><b>Generate synthetic questions</b></h4>

<ol>
<li>
<h5>Run the following cmd:</h5>
  <code>
  python generate_qa.py -usda [qas_dir/usda_subgraphs.json] -recipe [qas_dir/sampled_recipe_kg.json] -o [qas_dir] -sampling_prob 0.05 -num_qas_per_tag 20
  </code><br><br>
  Note that if your machine does not have large RAM (e.g., > 16GB), you can set smaller sampling ratios (i.e., sampling_prob and num_qas_per_tag) when creating the dataset.
</li>
</ol><br>
</section>


<section>
  <h2>Run a KBQA system</h2>
<h4><b>Preprocess the Q&A dataset</b></h4>

<ol>
  <li>
  <h5>Go to the BAMnet/src folder, run the following cmd:</h5>

  <code>
  python build_all_data.py -data_dir [qas_dir] -kb_path [qas_dir/foodkg.json] -out_dir [qas_dir]
   </code>
  In the message printed out, your will see some data statistics such as <b>vocab_size</b>, <b>num_ent_types</b>, <b>num_relations</b>. These numbers will be used later when modifying the config file.
 </li>
</ol><br>


<h4><b>Load pretrained word embeddings</b></h4>


<ol>
  <li>
<h5>Download the pretrained Glove word ebeddings <a href="http://nlp.stanford.edu/data/wordvecs/glove.840B.300d.zip">glove.840B.300d.zip</a>.
</h5>
</li><br>


<li>
<h5>
Unzip the file and convert glove format to word2vec format using the following cmd:</h5>
 <code>
  python -m gensim.scripts.glove2word2vec --input glove.840B.300d.txt --output glove.840B.300d.w2v
   </code>
</li><br>


<li>
<h5> Fetch the pretrained Glove vectors for our vocabulary.
</h5>
 <code>
  python build_pretrained_w2v.py -emb glove.840B.300d.w2v -data_dir [qas_dir] -out [qas_dir/glove_pretrained_300d_w2v.npy] -emb_size 300
  </code>
</li><br>
</ol>


<h4><b>Tran/test a KBQA system</b></h4>

<ol>
  <li>
  Modify the config file <b>BAMnet/src/config/kbqa.yml</b> to suit your needs. Note that you can start with modifying only the data folder and vocab size (e.g., <b>data_dir</b>, <b>kb_path</b>,
<b>pre_word2vec</b>, <b>vocab_size</b>, <b>num_ent_types</b>, <b>num_relations</b>), and leave other variables as they are.
</li><br>


<li>
<h5>Train the KBQA model.</h5>
  <code>
  python train.py -config config/kbqa.yml
   </code>

</li><br>

<li>
<h5>Test the KBQA model:</h5>
  <code>
  python run_online.py -config config/kbqa.yml
   </code>
</li>
</ol><br>
<!-- Note that we use the BAMnet model as our KBQA system in this application. For more details about the BAMnet model, please refer to the <a href="https://arxiv.org/abs/1903.02188">original paper</a>. -->
</section><br>


<!--
      <ol>
        <li>Acquire the data listed above</li>
        <li>Use /src/prep-scripts/ to join automatically-acquired data with the manual data</li>
        <li>Use /src/recipe-handler/ to generate a knowledge graph from the prepared data</li>
        <li>Use /src/verify to generate statistics from the result</li>
      </ol>
 -->


    </div>

  </div>
</div>

<!-- <footer class="container-fluid text-center">
  <p></p>
</footer> -->

</body>
</html>