A Kotlin implementation of the classic Mark V. Shaney. The model learns, for each pair of words (the prefix), which words usually follow that pair (the suffix).
For example, the sentence "Mark V. Shaney wrote on the Usenet." results in:
Prefix | Suffix |
---|---|
Mark V. | Shaney |
V. Shaney | wrote |
Shaney wrote | on |
wrote on | the |
on the | Usenet. |
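The learning step above can be sketched in Kotlin as follows. This is a minimal illustration, not the repository's `shaney.kt`; the function name `learn` and its signature are assumptions.

```kotlin
// Minimal sketch of the learning step: map each two-word prefix to
// the list of words that follow it (names are illustrative only).
fun learn(text: String): Map<Pair<String, String>, List<String>> {
    val words = text.split(Regex("\\s+")).filter { it.isNotEmpty() }
    val table = mutableMapOf<Pair<String, String>, MutableList<String>>()
    // Slide a window of three words: the first two are the prefix,
    // the third is the suffix recorded for that prefix.
    for (i in 0..words.size - 3) {
        val prefix = Pair(words[i], words[i + 1])
        table.getOrPut(prefix) { mutableListOf() }.add(words[i + 2])
    }
    return table
}

fun main() {
    val table = learn("Mark V. Shaney wrote on the Usenet.")
    println(table[Pair("Mark", "V.")])  // prints [Shaney]
}
```

Running this on the example sentence reproduces the table above: five prefixes, each with a single suffix.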
To generate text, a random prefix is chosen and the learned suffixes are then appended one at a time. When the training text is long enough, most prefixes have several suffixes, which adds diversity to the constructed sentences.
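The generation step can be sketched like this. Again a minimal illustration under assumed names (`generate` and the hand-built table are not from the actual program):

```kotlin
// Sketch of the generation step: start from a random prefix, then
// repeatedly append a random learned suffix and slide the prefix window.
fun generate(table: Map<Pair<String, String>, List<String>>, length: Int): String {
    var prefix = table.keys.random()                  // random starting prefix
    val out = mutableListOf(prefix.first, prefix.second)
    repeat(length - 2) {
        // Stop early if this prefix was never followed by anything.
        val next = table[prefix]?.random() ?: return out.joinToString(" ")
        out.add(next)
        prefix = Pair(prefix.second, next)            // slide the window
    }
    return out.joinToString(" ")
}

fun main() {
    // The table learned from "Mark V. Shaney wrote on the Usenet."
    val table = mapOf(
        Pair("Mark", "V.") to listOf("Shaney"),
        Pair("V.", "Shaney") to listOf("wrote"),
        Pair("Shaney", "wrote") to listOf("on"),
        Pair("wrote", "on") to listOf("the"),
        Pair("on", "the") to listOf("Usenet."),
    )
    println(generate(table, 7))
}
```

With only one suffix per prefix the output is a fragment of the original sentence; on a larger text such as `chaos.txt`, the random choice among several suffixes is what produces new combinations.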
The table and the sentence construction are similar to a Markov chain (hence the name 🙃).
The program is an example of an early and extremely simple language model, yet it is similar in spirit to Yoshua Bengio et al.'s neural language model and to current transformer LLMs such as GPT.
- Install the Kotlin command-line compiler
- Compile

  ```shell
  kotlinc shaney.kt -include-runtime -d shaney.jar
  ```

- Run

  ```shell
  java -jar shaney.jar 100 chaos.txt
  ```

Note: Replace `100` with another length for the generated text and `chaos.txt` with another file to learn Markov chains from.
Sample output:

> science was heading for a crisis of increasing specialization,” a Navy official in charge of research money for experiments, has often been called a strange attractor.
The text in `chaos.txt` is from James Gleick's excellent *Chaos: Making a New Science*.