A word similarity search tool written in Java 17 that ranks words or phrases against a 50-dimensional (50d) word embedding dataset. You provide a 50d dataset of word embeddings and a word or phrase to search for. The program then returns the highest-ranked words by similarity, using the algorithm of your choice.
- Specify your own 50d dataset that will be used for similarity searching
- Parse a single word or an entire sentence, and the system will try to provide the best result for you
- Save your findings in your own specified output file and share it with your friends
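
As a rough illustration of what loading a 50d dataset involves, the sketch below parses one line of an embeddings file into a word and its vector. The line layout (a word followed by 50 numbers, separated by commas and/or whitespace) is an assumption, and `EmbeddingLineParser`, `Embedding` and `parseLine` are hypothetical names that are not taken from this project's source.

```java
/** Hypothetical sketch: parses one line of a 50d embeddings file. */
final class EmbeddingLineParser {
    static final int DIMENSIONS = 50;

    /** A word together with its embedding vector. */
    record Embedding(String word, double[] vector) {}

    private EmbeddingLineParser() {}

    static Embedding parseLine(String line) {
        // Assumed layout: "word, v1, v2, ..., v50" (commas and/or whitespace).
        String[] parts = line.trim().split("[,\\s]+");
        if (parts.length != DIMENSIONS + 1) {
            throw new IllegalArgumentException(
                "Expected a word and " + DIMENSIONS + " values, got "
                + parts.length + " fields");
        }
        double[] vector = new double[DIMENSIONS];
        for (int i = 0; i < DIMENSIONS; i++) {
            vector[i] = Double.parseDouble(parts[i + 1]);
        }
        return new Embedding(parts[0], vector);
    }
}
```

Rejecting lines that do not contain exactly 50 values makes a malformed dataset fail early instead of producing misleading rankings later.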
The available algorithms include:
- Dot Product
- Euclidean Distance
- Cosine Distance
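
The three measures listed above can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the project's actual code: `SimilarityMeasures` and its method names are hypothetical, and cosine distance is assumed here to mean 1 minus the cosine similarity.

```java
/** Hypothetical sketch of the three measures on equal-length vectors. */
final class SimilarityMeasures {
    private SimilarityMeasures() {}

    /** Sum of element-wise products; larger means more similar. */
    static double dotProduct(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Straight-line distance; smaller means more similar. */
    static double euclideanDistance(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Assumed definition: 1 - cos(angle); smaller means more similar. */
    static double cosineDistance(double[] a, double[] b) {
        double norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
        if (norms == 0.0) {
            throw new IllegalArgumentException("Cosine is undefined for a zero vector");
        }
        return 1.0 - dotProduct(a, b) / norms;
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
    }
}
```

Note that the measures sort in opposite directions: a higher dot product indicates a closer match, while a lower Euclidean or cosine distance does.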
Ensure that you have Java SDK 17 or higher installed.
First compile the src/ directory using the following command:
javac src/ie/atu/sw/*.java -d out/
Then run the program with
cd out/
java ie.atu.sw.Runner
You will then be presented with options as follows:
- Provide file path for 50d word embeddings dataset: Choose the 50d .txt file to act as your model
- Print total count of words in model: View the total number of words in the model
- Provide file path for output: Specify the output file for your results
- Cycle similarity search algorithm: Cycle through the available algorithms mentioned above
- Change number of words to show in similarity ranking: Change the number of words shown in the output file
- Enable/Disable weight details (false): Toggle word similarity score visibility
- Begin word similarity search: Enter a word sequence (e.g. apple banana cheese) and start the word similarity search
- Quit: Exit the application
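
To show how a search with a configurable number of results might work, here is a minimal sketch that ranks every word in a model against a query word and keeps the top `n`. It assumes the model is held in a `Map` from word to vector and scores with cosine similarity. `TopNRanking`, `Scored` and `topN` are hypothetical names, and the project's own ranking logic may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: rank model words by cosine similarity to a query. */
final class TopNRanking {
    /** A candidate word and its similarity score. */
    record Scored(String word, double score) {}

    private TopNRanking() {}

    static List<Scored> topN(Map<String, double[]> model, String query, int n) {
        double[] q = model.get(query);
        if (q == null) {
            throw new IllegalArgumentException("Unknown word: " + query);
        }
        List<Scored> scored = new ArrayList<>();
        for (Map.Entry<String, double[]> entry : model.entrySet()) {
            if (entry.getKey().equals(query)) {
                continue; // Skip the query word itself.
            }
            scored.add(new Scored(entry.getKey(), cosine(q, entry.getValue())));
        }
        // Highest similarity first.
        scored.sort((x, y) -> Double.compare(y.score(), x.score()));
        return new ArrayList<>(scored.subList(0, Math.min(n, scored.size())));
    }

    /** Cosine similarity; assumes equal-length, non-zero vectors. */
    private static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Sorting the full list is the simplest approach. For a large model and a small `n`, a bounded priority queue would avoid sorting every entry.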