COVID Search Engine

Seun Suberu

Info: Data Structures Final Project at Southern Methodist University. Remade in December 2022 as a remodel of the original project.

Kick Covid in the Ass

Description

This is a COVID document search engine written in C++ and built with CMake. It stores stemmed words in a self-implemented AVLTree and authors in a self-implemented HashTable. Documents are indexed into these self-made data structures and ranked by the term frequency-inverse document frequency (TF-IDF) metric. A command-line user interface is provided.
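As a rough illustration of TF-IDF ranking, the sketch below scores one term in one document. The function name `tfIdf`, its parameters, and the exact weighting (raw term frequency times the natural log of the inverse document frequency) are assumptions chosen for illustration; the project's actual formula may differ.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper, not taken from the project.
// termCount:    occurrences of the term in the document
// docLength:    total number of words in the document
// totalDocs:    number of documents in the corpus
// docsWithTerm: number of documents that contain the term
double tfIdf(std::size_t termCount, std::size_t docLength,
             std::size_t totalDocs, std::size_t docsWithTerm) {
    if (docLength == 0 || docsWithTerm == 0) {
        return 0.0;  // nothing to score for an empty document or an unseen term
    }
    double tf = static_cast<double>(termCount) / static_cast<double>(docLength);
    double idf = std::log(static_cast<double>(totalDocs) /
                          static_cast<double>(docsWithTerm));
    return tf * idf;
}
```

With this weighting, a term that appears in every document scores zero, while a term that appears in only a few documents scores higher, so rarer words push a document up the ranking.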

How to use

Set up:
Link to Dataset for covid information files

The program takes one argument: the path to the directory that contains the files to index.
For example:

$ mkdir build
$ cd build
$ cmake ..
$ cmake --build .
$ ./SearchEngine {directory}

From there the program should run as expected. It loads and indexes all the files, after which it is ready for queries. The keywords recognized by this search engine are "AUTHOR", "AND", "OR", and "NOT". Only AUTHOR and NOT can appear at the beginning of a query. The others must have a search word before and after them.
AUTHOR: Returns all documents by the given author. When AUTHOR is preceded by a word, returns the documents by that author that contain the word.
AND: Returns documents that contain both the word before and the word after the keyword.
OR: Returns documents that contain either the word before or the word after the keyword.
NOT: Returns all documents that do not contain the given word. Can be combined with AND or OR, but must be the last keyword used in the query and cannot be the only query.

Examples (THESE QUERIES ARE NOT GUARANTEED TO RETURN RESULTS):

AUTHOR Suess

covid AND chicken

coronavirus OR pizza

pizza NOT covid

pizza AUTHOR grant

pizza
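One way to picture how such queries could be evaluated is as set operations on document identifiers. The sketch below is an assumption for illustration, not the project's QueryProcessor: it uses `std::map` and `std::set` in place of the self-made structures, the names `DocSet`, `Index`, `lookup`, and `evaluate` are hypothetical, and it skips stemming, ranking, and a leading NOT (which would need the full set of documents).

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <sstream>
#include <string>

using DocSet = std::set<int>;                 // hypothetical: document ids
using Index = std::map<std::string, DocSet>;  // hypothetical: key -> documents

// Look up a key, returning an empty set when it is absent.
DocSet lookup(const Index& index, const std::string& key) {
    auto it = index.find(key);
    return it == index.end() ? DocSet{} : it->second;
}

DocSet intersect(const DocSet& a, const DocSet& b) {
    DocSet out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.begin()));
    return out;
}

DocSet unite(const DocSet& a, const DocSet& b) {
    DocSet out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(out, out.begin()));
    return out;
}

DocSet subtract(const DocSet& a, const DocSet& b) {
    DocSet out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.begin()));
    return out;
}

// Evaluate left to right: an optional leading word, then KEYWORD/operand pairs.
DocSet evaluate(const std::string& query, const Index& words, const Index& authors) {
    std::istringstream in(query);
    std::string token;
    DocSet result;
    bool first = true;
    while (in >> token) {
        if (token == "AND" || token == "OR" || token == "NOT" || token == "AUTHOR") {
            std::string operand;
            if (!(in >> operand)) break;  // keyword without an operand: stop here
            if (token == "AUTHOR") {
                DocSet byAuthor = lookup(authors, operand);
                result = first ? byAuthor : intersect(result, byAuthor);
            } else if (token == "AND") {
                result = intersect(result, lookup(words, operand));
            } else if (token == "OR") {
                result = unite(result, lookup(words, operand));
            } else {  // NOT: drop documents that contain the operand
                result = subtract(result, lookup(words, operand));
            }
        } else {
            // Assumption: a bare word after the first one is treated like AND.
            result = first ? lookup(words, token) : intersect(result, lookup(words, token));
        }
        first = false;
    }
    return result;
}
```

Under these assumptions, `pizza NOT covid` becomes a set difference and `pizza AUTHOR grant` becomes an intersection of the word's documents with the author's documents.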

HashTable

Co-Written and Co-Implemented by Seun Suberu
Used for storing and retrieving author information.
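For readers unfamiliar with the structure, here is a minimal sketch of a hash table that uses separate chaining. The class name `ChainedHashTable`, its interface, and the choice of chaining are assumptions for illustration; the project's HashTable may be organized differently, and this sketch never resizes.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical separate-chaining table, not the project's implementation.
template <typename K, typename V>
class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t bucketCount = 64)
        : buckets_(bucketCount == 0 ? 1 : bucketCount) {}

    // Returns the value stored for key, default-constructing it on first access.
    V& operator[](const K& key) {
        auto& bucket = buckets_[std::hash<K>{}(key) % buckets_.size()];
        for (auto& entry : bucket) {
            if (entry.first == key) return entry.second;
        }
        bucket.emplace_back(key, V{});
        ++size_;
        return bucket.back().second;
    }

    // Returns a pointer to the stored value, or nullptr when key is absent.
    const V* find(const K& key) const {
        const auto& bucket = buckets_[std::hash<K>{}(key) % buckets_.size()];
        for (const auto& entry : bucket) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

    std::size_t size() const { return size_; }

private:
    std::vector<std::list<std::pair<K, V>>> buckets_;  // one chain per bucket
    std::size_t size_ = 0;
};
```

An author index could then map an author's name to the documents they wrote, for example `ChainedHashTable<std::string, std::vector<std::string>>`, with `table["grant"].push_back(id)` during indexing and `table.find("grant")` at query time.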

HashSet

Written and Implemented by Seun Suberu
Used for storing and retrieving stop word values.
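A stop-word set only needs membership tests, so a compact sketch can use open addressing with linear probing. Everything here is an assumption for illustration: the class name `StopWordSet`, the fixed capacity, and the use of the empty string as the free-slot marker are not taken from the project's HashSet.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical fixed-capacity set of non-empty strings using linear probing.
class StopWordSet {
public:
    explicit StopWordSet(std::size_t capacity = 1024)
        : slots_(capacity < 2 ? 2 : capacity) {}

    // Inserts word. Returns false if the word is empty or already present,
    // or if the table is full.
    bool insert(const std::string& word) {
        if (word.empty()) return false;
        if (size_ + 1 >= slots_.size()) return false;  // keep one slot free so probes end
        std::size_t i = std::hash<std::string>{}(word) % slots_.size();
        while (!slots_[i].empty()) {
            if (slots_[i] == word) return false;
            i = (i + 1) % slots_.size();  // probe the next slot
        }
        slots_[i] = word;
        ++size_;
        return true;
    }

    bool contains(const std::string& word) const {
        if (word.empty()) return false;
        std::size_t i = std::hash<std::string>{}(word) % slots_.size();
        while (!slots_[i].empty()) {
            if (slots_[i] == word) return true;
            i = (i + 1) % slots_.size();
        }
        return false;
    }

private:
    std::vector<std::string> slots_;  // an empty string marks a free slot
    std::size_t size_ = 0;
};
```

During indexing, a word would be skipped whenever `contains(word)` returns true, so common words such as "the" never reach the word index.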

AVLTree

Written and Implemented by Seun Suberu
Used for storing and retrieving stemmed words with their associated document identifiers for indexing.
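The core of an AVL tree is rebalancing by rotation after each insertion. The sketch below shows that technique on plain string keys. The names `AvlNode`, `NodePtr`, `insert`, and `contains` are hypothetical, and the project's AVLTree stores richer values than a bare key.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node type, not the project's implementation.
struct AvlNode {
    std::string key;
    int height = 1;
    std::unique_ptr<AvlNode> left, right;
    explicit AvlNode(std::string k) : key(std::move(k)) {}
};
using NodePtr = std::unique_ptr<AvlNode>;

int height(const NodePtr& n) { return n ? n->height : 0; }
void update(AvlNode& n) { n.height = 1 + std::max(height(n.left), height(n.right)); }

// Rotate right around y; y's left child becomes the new subtree root.
NodePtr rotateRight(NodePtr y) {
    NodePtr x = std::move(y->left);
    y->left = std::move(x->right);
    update(*y);
    x->right = std::move(y);
    update(*x);
    return x;
}

// Rotate left around x; x's right child becomes the new subtree root.
NodePtr rotateLeft(NodePtr x) {
    NodePtr y = std::move(x->right);
    x->right = std::move(y->left);
    update(*x);
    y->left = std::move(x);
    update(*y);
    return y;
}

// Insert key and rebalance on the way back up. Duplicates are ignored.
NodePtr insert(NodePtr node, const std::string& key) {
    if (!node) return std::make_unique<AvlNode>(key);
    if (key < node->key) {
        node->left = insert(std::move(node->left), key);
    } else if (key > node->key) {
        node->right = insert(std::move(node->right), key);
    } else {
        return node;
    }
    update(*node);
    int balance = height(node->left) - height(node->right);
    if (balance > 1) {  // left-heavy
        if (height(node->left->left) < height(node->left->right)) {
            node->left = rotateLeft(std::move(node->left));  // left-right case
        }
        return rotateRight(std::move(node));
    }
    if (balance < -1) {  // right-heavy
        if (height(node->right->right) < height(node->right->left)) {
            node->right = rotateRight(std::move(node->right));  // right-left case
        }
        return rotateLeft(std::move(node));
    }
    return node;
}

bool contains(const NodePtr& root, const std::string& key) {
    const AvlNode* cur = root.get();
    while (cur) {
        if (key < cur->key) cur = cur->left.get();
        else if (key > cur->key) cur = cur->right.get();
        else return true;
    }
    return false;
}
```

Because the tree stays balanced, lookups remain logarithmic in the number of stored words even when words arrive in sorted order.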

Article Class

Written and Implemented by Seun Suberu
Used for storing and retrieving document information.

Author Class

Written and Implemented by Seun Suberu
Used for storing and retrieving author information.

Word Class

Written and Implemented by Seun Suberu
Wrapper class for stemmed word which also contains a collection of InnerDoc objects for keeping track of documents that contain the stemmed word.
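A minimal sketch of such a wrapper might look like the following. The member names, the per-document occurrence count inside `InnerDoc`, and the comparison operator are assumptions for illustration; only the names `Word` and `InnerDoc` come from the description above.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout: one entry per document that contains the stemmed word.
struct InnerDoc {
    std::string docId;  // assumed: document identifier
    int count;          // assumed: occurrences of the word in that document
};

class Word {
public:
    explicit Word(std::string stem) : stem_(std::move(stem)) {}

    // Record one occurrence of this word in the given document.
    void addOccurrence(const std::string& docId) {
        for (auto& doc : docs_) {
            if (doc.docId == docId) {
                ++doc.count;
                return;
            }
        }
        docs_.push_back(InnerDoc{docId, 1});
    }

    const std::string& stem() const { return stem_; }
    const std::vector<InnerDoc>& docs() const { return docs_; }

    // Ordering by stem lets a Word serve as the key of a search tree.
    bool operator<(const Word& other) const { return stem_ < other.stem_; }

private:
    std::string stem_;
    std::vector<InnerDoc> docs_;
};
```

Keeping a count per document is one way to supply the term frequency that a TF-IDF ranking needs.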

