Automated QA-based chatbot for our CS 3110 midterm project.

NATE: Nonhuman Abstract Tech Expert

Team Members: Jane Zhang, Andrei Kozyrev, Shreeya Gad, Junan Qu

Introduction

For our CS 3110 midterm project, the four of us built a chatbot in OCaml that answers questions about computer science.

The bot draws its knowledge from an extensive text corpus of Wikipedia articles about computer science, covering people (e.g. Alan Turing), places (e.g. Silicon Valley), subfields (e.g. Artificial Intelligence), and companies (e.g. Google). It first reads the user's input (the question), then uses the term frequency–inverse document frequency (TF-IDF) algorithm to pick the document in the corpus that is most relevant to that question. Within that document, it has two separate mechanisms for choosing the answer sentence: Jaccard similarity, which compares the question directly to each sentence in the document, and cosine similarity, which first embeds the question and the document's sentences as vectors and then computes similarity scores. The bot responds with the sentence that has the highest similarity score.
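The document-retrieval step can be sketched as follows. This is a minimal illustration of TF-IDF ranking using only the standard library, not the project's actual code: the function names (`tokenize`, `tf`, `idf`, `best_document`), the smoothed IDF formula, and the representation of the corpus as a list of `(title, text)` pairs are all assumptions made for the example.

```ocaml
(* Hypothetical sketch of TF-IDF document ranking; names and the
   smoothing choice are assumptions, not the project's implementation. *)

(* Lowercase the text and split it into alphanumeric word tokens. *)
let tokenize (s : string) : string list =
  let s = String.lowercase_ascii s in
  let buf = Buffer.create 16 in
  let words = ref [] in
  let flush () =
    if Buffer.length buf > 0 then begin
      words := Buffer.contents buf :: !words;
      Buffer.clear buf
    end
  in
  String.iter
    (fun c ->
      match c with
      | 'a' .. 'z' | '0' .. '9' -> Buffer.add_char buf c
      | _ -> flush ())
    s;
  flush ();
  List.rev !words

(* Number of times [w] occurs in [tokens]. *)
let count (w : string) (tokens : string list) : int =
  List.length (List.filter (fun t -> t = w) tokens)

(* Term frequency: the share of the document's tokens that equal [w]. *)
let tf (w : string) (tokens : string list) : float =
  match tokens with
  | [] -> 0.0
  | _ -> float_of_int (count w tokens) /. float_of_int (List.length tokens)

(* Inverse document frequency, smoothed so a term that appears in no
   document does not cause a division by zero. *)
let idf (w : string) (docs : string list list) : float =
  let n = float_of_int (List.length docs) in
  let df =
    float_of_int (List.length (List.filter (fun d -> List.mem w d) docs))
  in
  log ((1.0 +. n) /. (1.0 +. df))

(* Score of one document: sum of TF-IDF over the distinct question words. *)
let score (question : string list) (docs : string list list)
    (doc : string list) : float =
  List.fold_left
    (fun acc w -> acc +. (tf w doc *. idf w docs))
    0.0
    (List.sort_uniq compare question)

(* Title of the highest-scoring document, or None for an empty corpus.
   Ties keep the earlier document. *)
let best_document (question : string) (corpus : (string * string) list)
    : string option =
  let tokenized = List.map (fun (title, body) -> (title, tokenize body)) corpus in
  let docs = List.map snd tokenized in
  let q = tokenize question in
  let best =
    List.fold_left
      (fun best (title, doc) ->
        let s = score q docs doc in
        match best with
        | Some (_, s') when s' >= s -> best
        | _ -> Some (title, s))
      None tokenized
  in
  match best with
  | Some (title, _) -> Some title
  | None -> None
```

Words that occur in many documents (such as "is") get a low IDF weight, so the ranking is driven by the rarer words in the question.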

Example: Who is Mark Zuckerberg? -> Mark Elliot Zuckerberg, born May 14, 1984, is an American technology entrepreneur and philanthropist
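The sentence-selection step behind an answer like the one above can be sketched as follows. This is an illustrative sketch, not the project's code: the names `words`, `jaccard`, `cosine`, and `best_sentence` are hypothetical, tokenization is a plain split on spaces, and the "embedding" used for cosine similarity is assumed to be a simple bag-of-words count vector, which may differ from what the bot actually uses.

```ocaml
(* Hypothetical sketch of answer-sentence selection; the bag-of-words
   embedding for cosine similarity is an assumption. *)

(* Lowercase a sentence and split it on spaces, dropping empty tokens.
   Punctuation handling is omitted for brevity. *)
let words (s : string) : string list =
  String.split_on_char ' ' (String.lowercase_ascii s)
  |> List.filter (fun w -> w <> "")

(* Jaccard similarity: size of the intersection of the two word sets
   divided by the size of their union. *)
let jaccard (a : string list) (b : string list) : float =
  let a = List.sort_uniq compare a and b = List.sort_uniq compare b in
  let inter = List.length (List.filter (fun w -> List.mem w b) a) in
  let union = List.length a + List.length b - inter in
  if union = 0 then 0.0 else float_of_int inter /. float_of_int union

(* Cosine similarity between word-count vectors over the shared vocabulary. *)
let cosine (a : string list) (b : string list) : float =
  let vocab = List.sort_uniq compare (a @ b) in
  let count w l = float_of_int (List.length (List.filter (fun x -> x = w) l)) in
  let dot =
    List.fold_left (fun acc w -> acc +. (count w a *. count w b)) 0.0 vocab
  in
  let norm l =
    sqrt (List.fold_left (fun acc w -> acc +. (count w l ** 2.0)) 0.0 vocab)
  in
  let d = norm a *. norm b in
  if d = 0.0 then 0.0 else dot /. d

(* Return the sentence most similar to the question under [sim].
   Ties keep the earlier sentence; an empty list gives None. *)
let best_sentence (sim : string list -> string list -> float)
    (question : string) (sentences : string list) : string option =
  let q = words question in
  let best =
    List.fold_left
      (fun best s ->
        let score = sim q (words s) in
        match best with
        | Some (_, b) when b >= score -> best
        | _ -> Some (s, score))
      None sentences
  in
  match best with
  | Some (s, _) -> Some s
  | None -> None
```

Passing the similarity function as an argument lets the same selection loop be used with either measure, for example `best_sentence jaccard question sentences` or `best_sentence cosine question sentences`.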

The bot can also detect misspelled words in the user's question and suggest possible corrections with its autocorrection feature. If a word in the input cannot be found in the corpus, the bot searches the corpus for the candidate words with the lowest edit (Levenshtein) distance to it. Additionally, given the input question, the bot can recommend related topics for the user to learn about, drawn from a set of topic clusters.
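The autocorrection idea can be sketched as follows. This is a minimal example of the standard dynamic-programming Levenshtein distance, not the project's code: the names `levenshtein` and `suggest`, and the representation of the corpus vocabulary as a plain list of words, are assumptions made for the example.

```ocaml
(* Hypothetical sketch of edit-distance-based spelling suggestions. *)

let min3 (a : int) (b : int) (c : int) : int = min a (min b c)

(* Levenshtein distance: the minimum number of single-character
   insertions, deletions, and substitutions that turn [s] into [t].
   Only two rows of the dynamic-programming table are kept. *)
let levenshtein (s : string) (t : string) : int =
  let m = String.length s and n = String.length t in
  (* prev.(j): distance between the first i-1 chars of s and the first j of t *)
  let prev = Array.init (n + 1) (fun j -> j) in
  let curr = Array.make (n + 1) 0 in
  for i = 1 to m do
    curr.(0) <- i;
    for j = 1 to n do
      let cost = if s.[i - 1] = t.[j - 1] then 0 else 1 in
      curr.(j) <-
        min3
          (prev.(j) + 1)         (* delete a character from s *)
          (curr.(j - 1) + 1)     (* insert a character into s *)
          (prev.(j - 1) + cost)  (* substitute, or match at no cost *)
    done;
    Array.blit curr 0 prev 0 (n + 1)
  done;
  prev.(n)

(* If [word] is already in the vocabulary, suggest nothing; otherwise
   return every vocabulary word at the minimum edit distance from it. *)
let suggest (word : string) (vocabulary : string list) : string list =
  if List.mem word vocabulary then []
  else
    let scored = List.map (fun v -> (levenshtein word v, v)) vocabulary in
    let best = List.fold_left (fun acc (d, _) -> min acc d) max_int scored in
    List.filter (fun (d, _) -> d = best) scored |> List.map snd
```

Comparing the input against every vocabulary word is linear in the vocabulary size, which is simple but may be slow for a large corpus.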

All algorithms were designed from scratch (no external libraries or packages other than built-in OCaml modules), and the implementations are our own work unless noted otherwise.

We named our bot after our professor, Dr. Nate Foster, whom we thank for structuring the Spring 2019 course, developing our assignments, and teaching us about functional programming paradigms.

Usage

  1. git clone https://github.com/jz393/NATE-QA
  2. make build
  3. make bot
