
A COVID document search engine that uses a self-implemented AVL tree to store stemmed words and a self-implemented hash table to store authors. Documents are ranked by term frequency-inverse document frequency (TF-IDF) and queried through a command-line interface.


senseisub/SearchEngine


Suberu Herman

Info: Data Structures Final Project at Southern Methodist University.

Kick Covid in the _____

Description

This is a COVID document search engine written in C++ and built with CMake. It uses a self-implemented AVL tree to store stemmed words and a self-implemented hash table to store authors. Documents are ranked by term frequency-inverse document frequency (TF-IDF) and queried through a command-line interface.
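
The README names TF-IDF as the ranking metric but does not give the exact formula used. As a minimal sketch, assuming the common variant where term frequency is the term count divided by document length and inverse document frequency is the natural log of total documents over documents containing the term, the score could be computed as follows. The function name `tfIdf` and its parameters are hypothetical and are not taken from the project.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: TF-IDF score of one term in one document.
//   termCount    - occurrences of the term in the document
//   docLength    - total number of words in the document
//   totalDocs    - number of documents in the corpus
//   docsWithTerm - number of documents that contain the term
double tfIdf(std::size_t termCount, std::size_t docLength,
             std::size_t totalDocs, std::size_t docsWithTerm) {
    assert(docLength > 0 && docsWithTerm > 0 && docsWithTerm <= totalDocs);
    // Term frequency: how prominent the term is within this document.
    double tf = static_cast<double>(termCount) / static_cast<double>(docLength);
    // Inverse document frequency: rarer terms across the corpus score higher.
    double idf = std::log(static_cast<double>(totalDocs) /
                          static_cast<double>(docsWithTerm));
    return tf * idf;
}
```

Under this variant, a term that appears in every document scores zero, so very common words contribute nothing to the ranking.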

How to use

Set up:
Link to Dataset for covid information files

The program takes one argument: the path to the directory that contains the files to index.
For example:

$ mkdir build
$ cd build
$ cmake ..
$ cmake --build .
$ ./SearchEngineTemplates {directory}

Once started, the program loads and indexes all the files, after which it is ready for queries. The search engine recognizes four keywords: "AUTHOR", "AND", "OR", and "NOT". Only AUTHOR and NOT can appear at the beginning of a query; AND and OR must have a search word both before and after them.
AUTHOR: Returns all documents by the given author. When AUTHOR is preceded by a word, returns the documents by that author that contain that word.
AND: Returns documents that contain both the word before and the word after the keyword.
OR: Returns documents that contain either the word before or the word after the keyword.
NOT: Returns all documents that do not contain the given word. Can be combined with AND or OR, but must be the last keyword in the query.

Examples (THESE QUERIES ARE NOT GUARANTEED TO RETURN RESULTS):

AUTHOR Suess

covid AND chicken

coronavirus OR pizza

NOT covid
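
The AND, OR, and NOT keywords described above can be read as set operations on document identifiers. The following is a sketch under assumptions, not the project's implementation: it represents each result as a `std::set` of string IDs for brevity, and the names `DocSet`, `queryAnd`, `queryOr`, and `queryNot` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// Hypothetical representation: a result is a sorted set of document IDs.
using DocSet = std::set<std::string>;

// AND: documents present in both result sets (intersection).
DocSet queryAnd(const DocSet& a, const DocSet& b) {
    DocSet out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.end()));
    return out;
}

// OR: documents present in either result set (union).
DocSet queryOr(const DocSet& a, const DocSet& b) {
    DocSet out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(out, out.end()));
    return out;
}

// NOT: every document in the corpus that is absent from `excluded`.
DocSet queryNot(const DocSet& allDocs, const DocSet& excluded) {
    // Precondition: the excluded documents must belong to the corpus.
    assert(std::includes(allDocs.begin(), allDocs.end(),
                         excluded.begin(), excluded.end()));
    DocSet out;
    std::set_difference(allDocs.begin(), allDocs.end(),
                        excluded.begin(), excluded.end(),
                        std::inserter(out, out.end()));
    return out;
}
```

With these helpers, a query such as "covid AND chicken" would correspond to `queryAnd` applied to the document sets of the two stemmed words.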

HashTable

Co-Written and Co-Implemented by Seun Suberu
Used for storing and retrieving author information.
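
The README does not describe how the HashTable resolves collisions. As an illustration only, the sketch below assumes separate chaining with a fixed bucket count and maps an author name to the IDs of that author's documents. The class name `AuthorTable` and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: author name -> IDs of the documents by that author.
class AuthorTable {
    using Entry = std::pair<std::string, std::vector<std::string>>;

public:
    explicit AuthorTable(std::size_t bucketCount = 101) : buckets_(bucketCount) {
        assert(bucketCount > 0);
    }

    // Record that `author` wrote the document `docId`.
    void add(const std::string& author, const std::string& docId) {
        auto& chain = buckets_[indexOf(author)];
        for (auto& entry : chain) {
            if (entry.first == author) {
                entry.second.push_back(docId);
                return;
            }
        }
        // First document seen for this author: start a new chain entry.
        chain.emplace_back(author, std::vector<std::string>{docId});
    }

    // Return the document IDs for `author`, or nullptr if the author is unknown.
    const std::vector<std::string>* find(const std::string& author) const {
        for (const auto& entry : buckets_[indexOf(author)]) {
            if (entry.first == author) return &entry.second;
        }
        return nullptr;
    }

private:
    // Map a key to its bucket using the standard string hash.
    std::size_t indexOf(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets_.size();
    }

    std::vector<std::list<Entry>> buckets_;
};
```

A query such as "AUTHOR Suess" would then amount to a single `find("Suess")` lookup. This sketch does not resize, so a real table would also need rehashing as it fills.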

HashSet

Written and Implemented by Seun Suberu
Used for storing and retrieving stop word values.

AVLTree

Written and Implemented by Seun Suberu
Used for storing and retrieving stemmed words with their associated document identifiers for indexing.
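
For readers unfamiliar with AVL trees, the following is a minimal sketch of insertion with rebalancing, keyed on the stemmed word alone. It is not the project's code: the names `AvlNode`, `heightOf`, `rotateLeft`, `rotateRight`, and `avlInsert` are hypothetical, and the document identifiers that the real tree stores with each word are omitted. It requires C++14 for `std::make_unique`.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node: one stemmed word per node.
struct AvlNode {
    std::string word;
    int height = 1;
    std::unique_ptr<AvlNode> left, right;
    explicit AvlNode(std::string w) : word(std::move(w)) {}
};
using NodePtr = std::unique_ptr<AvlNode>;

int heightOf(const NodePtr& n) { return n ? n->height : 0; }

void updateHeight(AvlNode& n) {
    n.height = 1 + std::max(heightOf(n.left), heightOf(n.right));
}

// Right rotation: the left child becomes the new subtree root.
NodePtr rotateRight(NodePtr root) {
    assert(root && root->left);
    NodePtr pivot = std::move(root->left);
    root->left = std::move(pivot->right);
    updateHeight(*root);
    pivot->right = std::move(root);
    updateHeight(*pivot);
    return pivot;
}

// Left rotation: the right child becomes the new subtree root.
NodePtr rotateLeft(NodePtr root) {
    assert(root && root->right);
    NodePtr pivot = std::move(root->right);
    root->right = std::move(pivot->left);
    updateHeight(*root);
    pivot->left = std::move(root);
    updateHeight(*pivot);
    return pivot;
}

// Insert `word` and rebalance on the way back up; duplicates are ignored.
NodePtr avlInsert(NodePtr root, const std::string& word) {
    if (!root) return std::make_unique<AvlNode>(word);
    if (word < root->word) {
        root->left = avlInsert(std::move(root->left), word);
    } else if (word > root->word) {
        root->right = avlInsert(std::move(root->right), word);
    } else {
        return root;
    }
    updateHeight(*root);
    int balance = heightOf(root->left) - heightOf(root->right);
    if (balance > 1) {
        // Left-right case: straighten the left child before rotating right.
        if (word > root->left->word) root->left = rotateLeft(std::move(root->left));
        return rotateRight(std::move(root));
    }
    if (balance < -1) {
        // Right-left case: straighten the right child before rotating left.
        if (word < root->right->word) root->right = rotateRight(std::move(root->right));
        return rotateLeft(std::move(root));
    }
    return root;
}
```

Because the tree rebalances after every insertion, its height stays logarithmic in the number of words even when words arrive in sorted order.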

Article Class

Written and Implemented by Seun Suberu
Used for storing and retrieving document information.

Author Class

Written and Implemented by Seun Suberu
Used for storing and retrieving author information.

Word Class

Written and Implemented by Seun Suberu
Wrapper class for a stemmed word. It also holds a collection of InnerDoc objects that track which documents contain the stemmed word.

InnerDoc Class

Written and Implemented by Seun Suberu
Utility class used by the Word class to store information about a specific document.
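
The relationship between Word and InnerDoc might look roughly like the sketch below. The README does not list the fields of either class, so everything here is an assumption: `docId`, `frequency`, `addOccurrence`, and the accessors are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: per-document data kept for one stemmed word.
struct InnerDoc {
    std::string docId;
    std::size_t frequency;  // assumed: times the word occurs in this document
};

// Hypothetical: a stemmed word plus the documents that contain it.
class Word {
public:
    explicit Word(std::string stem) : stem_(std::move(stem)) {
        assert(!stem_.empty());
    }

    // Count one more occurrence of this word in the document `docId`.
    void addOccurrence(const std::string& docId) {
        for (auto& doc : docs_) {
            if (doc.docId == docId) {
                ++doc.frequency;
                return;
            }
        }
        docs_.push_back(InnerDoc{docId, 1});
    }

    const std::string& stem() const { return stem_; }
    const std::vector<InnerDoc>& docs() const { return docs_; }

    // Ordering by stem lets Word objects serve as keys in a search tree.
    bool operator<(const Word& other) const { return stem_ < other.stem_; }

private:
    std::string stem_;
    std::vector<InnerDoc> docs_;
};
```

Keeping a per-document count next to each word is one way to have the term counts available at query time for a frequency-based ranking.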
