emmannyyy/Search-Engine

Web Crawler and Inverted Indexer

This project is a web crawler and inverted indexer that extracts information from web pages and creates an inverted index for efficient keyword-based searching.

Prerequisites

To compile and run this project, you need the dependencies listed below. The JAR files are included when you clone the repo; the JDK must be installed separately.

  • Java Development Kit (JDK)
  • htmlparser.jar
  • jsoup-1.17.2.jar
  • jdbm-1.0.jar

Compilation

To compile the project, use the following command:

javac -cp htmlparser.jar:jsoup-1.17.2.jar:jdbm-1.0.jar:. *.java

Then copy the class files into the PROJECT folder so that the crawler can be run:

cp *.class ./PROJECT

Also move the class files into the web app's classes folder:

mv *.class ./apache-tomcat-10.1.20/webapps/comp4321/WEB-INF/classes/PROJECT

Running the Crawler and Inverter

To run the crawler and inverter on a page, use the following command:

java -cp htmlparser.jar:jsoup-1.17.2.jar:jdbm-1.0.jar:. PROJECT.Inverter "https://www.cse.ust.hk/~kwtleung/COMP4321/testpage.htm"

Then move the generated database files to the web app's database folder:

mv *.lg *.db ./apache-tomcat-10.1.20/webapps/comp4321/WEB-INF/database/

Optional: run other Java classes for testing (e.g. SearchEngine):

java -cp htmlparser.jar:jsoup-1.17.2.jar:jdbm-1.0.jar:. PROJECT.SearchEngine
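The run commands above repeat the same classpath. One way to avoid retyping it is sketched below; `CP` and `run_class` are hypothetical names that are not part of the project, and the `:` separator assumes a Unix-like shell (on Windows the classpath separator is `;`).

```shell
# Hypothetical helper, not part of the project: keep the classpath in one place.
CP="htmlparser.jar:jsoup-1.17.2.jar:jdbm-1.0.jar:."

# run_class forwards its arguments (class name, then program arguments) to java.
run_class() {
  java -cp "$CP" "$@"
}

# Usage (same effect as the commands above):
#   run_class PROJECT.Inverter "https://www.cse.ust.hk/~kwtleung/COMP4321/testpage.htm"
#   run_class PROJECT.SearchEngine
```

The function only wraps the existing commands, so it must be run from the same directory as the JAR files.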

Using the Web Interface

First, set the following environment variables:

  • Set CATALINA_HOME to {path to this project}/apache-tomcat-10.1.20/
  • Set JAVA_HOME to {path to your JDK}
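In a Unix-like shell, the two variables could be set as in the sketch below. It assumes the commands are run from the root of the cloned project, and `/path/to/your/jdk` is a placeholder rather than a real location.

```shell
# Assumption: the current directory is the root of the cloned project.
CATALINA_HOME="$(pwd)/apache-tomcat-10.1.20"
export CATALINA_HOME

# Keep JAVA_HOME if it is already set; otherwise fall back to a placeholder
# that must be replaced with the actual path to your JDK.
JAVA_HOME="${JAVA_HOME:-/path/to/your/jdk}"
export JAVA_HOME
```

These exports last only for the current shell session. With CATALINA_HOME set, the Tomcat scripts can also be called from any directory, for example as "$CATALINA_HOME/bin/startup.sh".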

Next, change to Tomcat's bin directory and run the startup.sh script:

cd ./apache-tomcat-10.1.20/bin
./startup.sh

Then open http://localhost:8080/comp4321/ in your browser.

You can then run searches from that page.

Shutting down Apache Tomcat

When you are done, shut down Apache Tomcat from the same bin directory:

./shutdown.sh

License

This project is licensed under the MIT License.
