Seinfeld is my favorite TV show. I wrote this notebook to scrape the scripts of all Seinfeld episodes from the site seinology.com and merge them into a single text corpus that I could train a language model on. I hope you find it useful. Any feedback would be appreciated.
corpus.txt: the corpus, 717576 words in 64919 lines of Seinfeld scripts, ready to train a language model on.
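
If you want to sanity-check the corpus after downloading it, here is a minimal sketch that counts words and lines. It assumes words are whitespace-separated tokens and that the file is UTF-8 encoded; the notebook may have counted differently, so the totals may not match the figures above exactly. The function names `corpus_stats` and `file_stats` are just illustrative and are not part of the notebook.

```python
from pathlib import Path


def corpus_stats(text: str) -> tuple[int, int]:
    """Return (word_count, line_count) for a block of text.

    Assumption: a "word" is any whitespace-separated token.
    """
    words = len(text.split())
    lines = len(text.splitlines())
    return words, lines


def file_stats(path: str = "corpus.txt") -> tuple[int, int]:
    """Read a corpus file (assumed UTF-8) and return its stats."""
    return corpus_stats(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    if Path("corpus.txt").exists():
        words, lines = file_stats("corpus.txt")
        print(f"{words} words, {lines} lines")
```

Splitting on whitespace keeps punctuation attached to words (for example `JERRY:` counts as one word), which is usually fine for a rough size check.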