davidecristiani/txt_similarity

TXT Similarity

A small Dockerized Python script that measures the similarity between two groups of txt files using the Python library textdistance.

The inputs are the txt files in these directories:

  • /app/input_txt_a/
  • /app/input_txt_b/

The outputs are csv files written to this directory:

  • /app/output_csv/
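
The input/output flow described above can be sketched with the standard library alone. This is only an illustration, not the script's actual code: the function name `compare_dirs`, the CSV column names, the all-pairs comparison, and the default metric (`difflib.SequenceMatcher`) are all assumptions made for the example.

```python
import csv
from difflib import SequenceMatcher
from pathlib import Path


def compare_dirs(dir_a, dir_b, out_csv, metric=None):
    """Score every txt file in dir_a against every txt file in dir_b.

    Writes one CSV row per pair and returns the number of pairs written.
    Hypothetical helper: the real script may pair files differently.
    """
    if metric is None:
        # Placeholder metric from the standard library (an assumption).
        metric = lambda a, b: SequenceMatcher(None, a, b).ratio()

    files_a = sorted(Path(dir_a).glob("*.txt"))
    files_b = sorted(Path(dir_b).glob("*.txt"))

    out_csv = Path(out_csv)
    out_csv.parent.mkdir(parents=True, exist_ok=True)

    pairs = 0
    with out_csv.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        # Column names are hypothetical, chosen for this sketch.
        writer.writerow(["file_a", "file_b", "similarity"])
        for fa in files_a:
            text_a = fa.read_text(encoding="utf-8")
            for fb in files_b:
                text_b = fb.read_text(encoding="utf-8")
                score = metric(text_a, text_b)
                writer.writerow([fa.name, fb.name, f"{score:.4f}"])
                pairs += 1
    return pairs
```

Inside the container this could be called as `compare_dirs("/app/input_txt_a", "/app/input_txt_b", "/app/output_csv/similarity.csv")`, where the file name `similarity.csv` is again made up for the example.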

The script uses 5 different algorithms to measure the similarities:

  • Hamming distance
  • Levenshtein distance
  • Jaro-Winkler
  • Jaccard index
  • Ratcliff-Obershelp
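
To give a feel for what these measures compute, here is a minimal standard-library sketch of four of them. It is not how the script or its library implements them. The function names are hypothetical, the Hamming variant counts a length difference as extra mismatches (the classic definition only covers equal-length strings), the Jaccard index is taken over whitespace-separated tokens, and `difflib.SequenceMatcher` stands in as a close relative of Ratcliff-Obershelp. Jaro-Winkler is left out to keep the example short.

```python
from difflib import SequenceMatcher


def hamming_distance(a, b):
    """Number of positions where a and b differ.

    Assumption of this sketch: a length difference counts as extra mismatches.
    """
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def jaccard_index(a, b):
    """Size of the intersection over size of the union of the token sets."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def ratcliff_obershelp(a, b):
    """Similarity in [0, 1] based on matching blocks (difflib's variant)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()
```

For example, `hamming_distance("karolin", "kathrin")` and `levenshtein_distance("kitten", "sitting")` both return 3, and `jaccard_index("a b c", "b c d")` returns 0.5. The first two are distances (lower means more similar), while the last two are similarities (higher means more similar).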
