Given a large text corpus, find the longest common substring that is repeated most often.
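
The problem statement can be read in more than one way. As a rough illustration, the Python sketch below assumes one reading: find the longest substring that occurs at least `min_count` times anywhere in the corpus, and break ties in length by the higher count. This is not the R code from this project; the function names `window_counts` and `longest_repeated_substring` and the parameter `min_count` are hypothetical and chosen only for this example.

```python
from collections import Counter


def window_counts(docs, length):
    """Count every substring of exactly `length` characters across all documents.

    Overlapping occurrences and occurrences inside the same document
    are all counted.
    """
    counts = Counter()
    for doc in docs:
        for start in range(len(doc) - length + 1):
            counts[doc[start:start + length]] += 1
    return counts


def longest_repeated_substring(docs, min_count=2):
    """Return (substring, count) for the longest substring seen at least
    `min_count` times. Ties in length go to the higher count.

    Assumption: "repeated" means at least `min_count` occurrences in total.
    """
    lo, hi = 0, max((len(doc) for doc in docs), default=0)
    # Binary search on the length: every prefix of a repeated substring
    # is repeated at least as often, so feasibility is monotonic.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if any(c >= min_count for c in window_counts(docs, mid).values()):
            lo = mid
        else:
            hi = mid - 1
    if lo == 0:
        return "", 0
    counts = window_counts(docs, lo)
    best = max(counts, key=counts.get)
    return best, counts[best]


if __name__ == "__main__":
    rulings = ["abcabc", "xabcx"]  # toy stand-in for real documents
    print(longest_repeated_substring(rulings))
```

This brute-force version stores every window of a given length, so it is only practical for small inputs. For a corpus of full court rulings, a suffix array or suffix automaton would likely scale better.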
The sample corpus I was working on in the R file consisted of Swiss Supreme Court rulings, which frequently copy exact paragraphs from past decisions. The goal was to understand whether law is getting more complex over time.
OpenData can be found here: