Given a large text corpus, find the longest common substring that is repeated most often.
The sample corpus I was working on in the R file consists of Swiss Supreme Court rulings, which frequently copy exact paragraphs from past decisions. The goal was to understand whether the law is getting more complex over time.
The open data can be found here: https://www.bger.ch/ext/eurospider/live/de/php/clir/http/index_atf.php?lang=de
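
To make the task stated at the top concrete, here is a minimal sketch in Python. It is not the R code from this project, only an illustration under stated assumptions: each ruling is one string, text is compared word by word (split on whitespace), "repeated" means the passage appears in at least two rulings, and among the longest such passages the one found in the most rulings wins. The function names `repeated_ngrams` and `longest_most_repeated` are made up for this example.

```python
from collections import Counter


def repeated_ngrams(docs, n):
    """Return the word n-grams that occur in at least two documents,
    mapped to the number of documents containing them."""
    counts = Counter()
    for doc in docs:
        words = doc.split()
        # A set ensures each document counts at most once per n-gram.
        seen = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(seen)
    return {gram: c for gram, c in counts.items() if c >= 2}


def longest_most_repeated(docs):
    """Return (passage, document_count) for the longest shared word sequence,
    preferring the one found in the most documents; None if nothing is shared."""
    best = None
    n = 1
    # If a passage of n words is shared, so is its (n-1)-word prefix,
    # so we can stop at the first length with no shared n-gram.
    while True:
        shared = repeated_ngrams(docs, n)
        if not shared:
            break
        gram, count = max(shared.items(), key=lambda kv: kv[1])
        best = (" ".join(gram), count)
        n += 1
    return best


if __name__ == "__main__":
    # Invented toy sentences, not taken from the actual rulings.
    rulings = [
        "the court holds that the appeal is dismissed with costs",
        "in this case the appeal is dismissed with costs as before",
        "the appeal is dismissed",
    ]
    print(longest_most_repeated(rulings))
```

This brute-force version rescans the corpus once per passage length, so it is only suitable for small samples. For the full set of rulings, a suffix array or similar index would likely be needed.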