Given a large text corpus, find the longest common substring that is repeated most often.

The sample corpus I worked on in the R file consists of Swiss Supreme Court rulings, which frequently copy exact paragraphs from past decisions. The goal was to understand whether the law is getting more complex over time.
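To make the task concrete, here is a minimal sketch of the core idea. It is written in Python for illustration and is not the R code from this repository. The function names `longest_common_substring` and `count_documents_containing`, and the toy rulings, are hypothetical. It uses a simple dynamic-programming approach that is quadratic in the text lengths, so a real corpus would likely need a suffix-based index instead.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest substring shared by a and b (first found on ties)."""
    best_len, best_end = 0, 0
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def count_documents_containing(snippet: str, documents: list[str]) -> int:
    """Count how many documents contain the snippet verbatim."""
    return sum(snippet in doc for doc in documents)


# Toy stand-ins for court rulings (invented text, not real data).
rulings = [
    "The court finds that the appeal is admissible. Costs are awarded.",
    "In this case the appeal is admissible. The claim is dismissed.",
    "The appeal is rejected.",
]

shared = longest_common_substring(rulings[0], rulings[1])
print(repr(shared.strip()), count_documents_containing(shared, rulings))
```

Comparing every pair of rulings this way and keeping the passages with the highest document counts would give a rough picture of which text is copied most often.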

The open data can be found here: https://www.bger.ch/ext/eurospider/live/de/php/clir/http/index_atf.php?lang=de

image (1)

image (2)

image (3)