Entry: Pirate Novel #6
Comments
Progress so far: I can extract relevant sentences from a Gutenberg-sourced corpus, based on the vector similarity of their verbs. Here's the top 15 sentences for "attack" from an Arthurian-themed subset:
It's still going to take a lot of clean-up to be directly useable, obviously, but it does introduce a huge amount of variation. Depending on how much time I want to devote to this part of the project, I may just end up using it to extract some source sentences to turn into hand-templated actions. |
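A minimal sketch of ranking sentences by verb similarity, for illustration only. It assumes each sentence has already been paired with its main verb, and it uses made-up three-dimensional vectors in place of real word embeddings; `TOY_VECTORS`, `rank_sentences`, and the sample sentences are all hypothetical and are not taken from the project.

```python
import math

# Made-up toy vectors standing in for real word embeddings (assumption).
TOY_VECTORS = {
    "attack": (0.9, 0.1, 0.0),
    "assault": (0.8, 0.2, 0.1),
    "strike": (0.7, 0.3, 0.0),
    "sail": (0.0, 0.9, 0.2),
    "sleep": (0.1, 0.0, 0.9),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_sentences(query_verb, tagged_sentences, top_n=15):
    """Return the top_n sentences whose verb is most similar to query_verb.

    tagged_sentences is a list of (sentence, main_verb) pairs; sentences
    whose verb has no vector are skipped.
    """
    query = TOY_VECTORS[query_verb]
    scored = [
        (cosine(query, TOY_VECTORS[verb]), sentence)
        for sentence, verb in tagged_sentences
        if verb in TOY_VECTORS
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:top_n]]


# Hypothetical sample sentences, not from the Gutenberg corpus.
SENTENCES = [
    ("The knights assault the gate.", "assault"),
    ("The crew sail at dawn.", "sail"),
    ("He will strike the shield.", "strike"),
    ("The king sleeps.", "sleep"),
]

print(rank_sentences("attack", SENTENCES, top_n=2))
```

With real embeddings the ranking would be noisier, which matches the amount of clean-up described above.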
I also have pirates:
|
Nice! |
Status report: I am currently processing all of Project Gutenberg's English-language fiction books into a Textacy corpus. This may prove to be unworkable: thousands and thousands of fiction books have a very large memory footprint. I may have to reduce my focus to a subset of the total. Meanwhile, I've put some work in on the scene expansion and plot generation parts of the system. The part of scene expansion that I've finished is the Conflict system: two characters with opposing goals engage in a contest or combat in a scene. Right now, the only Conflict the system supports is sword fights:
There's some pronoun cleanup that it isn't doing yet, even though it has the underlying data for it, and some descriptions are missing, but it's a fairly credible output. It repeats a bit, but that's mostly because there isn't enough raw input and because the probability decay on repeating a statement is too rapid. A basic action looks like this:
The prereq list has functions that query the Conflict state, while the effects list has functions that alter the world state (mostly modeled by adding and removing strings from a list of tags). There's probably a way to build a context-free grammar to handle this, but the system also has to process the changes to the world state, so I built this thing. The system uses the prereq queries to find the list of valid actions, and then randomly picks from that list, with the chance of selecting an action being based on the
The Conflict state isn't stored as a static list. Instead, it's dynamically calculated from the stack of actions that have been executed. This lets me do some tricky things with the effects while keeping the conflict state representation as a super simple Python collections.Counter. Note that the conflict state isn't intended to cover the entire novel: once the scene and conflict are done, the results are recorded in the transcript and we can toss the Conflict itself. |
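A rough sketch of how such an action loop could be put together, not the code in the repository. The action fields (`name`, `prereq`, `effects`, `weight`), the tag strings, and every function name here are assumptions made for illustration; in particular the `weight` field is a stand-in for whatever the selection chance is really based on. The state is rebuilt as a `collections.Counter` by replaying the stack of executed actions.

```python
import random
from collections import Counter


def add_tag(tag):
    """Effect: add one copy of a tag to the state."""
    def effect(state):
        state[tag] += 1
    return effect


def has(tag):
    """Prereq query: the tag is currently present."""
    return lambda state: state[tag] > 0


def lacks(tag):
    """Prereq query: the tag is currently absent."""
    return lambda state: state[tag] <= 0


# Hypothetical actions; field names, weights and tags are assumptions.
ACTIONS = [
    {"name": "draw", "prereq": [lacks("swords drawn")],
     "effects": [add_tag("swords drawn")], "weight": 1.0},
    {"name": "lunge", "prereq": [has("swords drawn"), lacks("disarmed")],
     "effects": [add_tag("pressed")], "weight": 3.0},
    {"name": "disarm", "prereq": [has("pressed"), lacks("disarmed")],
     "effects": [add_tag("disarmed")], "weight": 1.0},
]


def conflict_state(stack):
    """Recompute the state by replaying the effects of every executed action."""
    state = Counter()
    for action in stack:
        for effect in action["effects"]:
            effect(state)
    return state


def valid_actions(actions, state):
    """Keep only the actions whose prereq queries all pass."""
    return [a for a in actions if all(query(state) for query in a["prereq"])]


def step(actions, stack, rng):
    """Pick one valid action by weight, push it on the stack, and return it."""
    options = valid_actions(actions, conflict_state(stack))
    if not options:
        return None
    # random.choices needs Python 3.6 or later.
    chosen = rng.choices(options, weights=[a["weight"] for a in options], k=1)[0]
    stack.append(chosen)
    return chosen


def run_conflict(actions, rng, max_steps=20):
    """Run until no action is valid or max_steps is reached."""
    stack = []
    while len(stack) < max_steps and step(actions, stack, rng) is not None:
        pass
    return [a["name"] for a in stack]


print(run_conflict(ACTIONS, random.Random(6)))
```

Because the state is derived from the stack rather than stored, discarding the conflict at the end of a scene only requires dropping the stack.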
Pirate name generator:
Mostly Tracery, with a bit of trickery to give different name orderings. |
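A small sketch of the idea, assuming a Tracery-style grammar where `#symbol#` is replaced by a random expansion of that symbol. This is a hand-rolled expander rather than the Tracery library itself (it ignores modifiers such as `.capitalize`), and the grammar contents and the `expand` function are invented for illustration. Different name orderings are handled here simply by giving `origin` several patterns.

```python
import random
import re

# Hypothetical grammar; every word list here is made up for the example.
GRAMMAR = {
    "origin": [
        "#title# #first# #last#",
        "#first# #last# the #epithet#",
        "#epithet# #first#",
    ],
    "title": ["Captain", "Commodore"],
    "first": ["Anne", "Jack", "Mary"],
    "last": ["Flint", "Hawkes"],
    "epithet": ["Red", "Dread", "Bold"],
}


def expand(symbol, grammar, rng):
    """Pick a rule for symbol and recursively expand every #symbol# inside it."""
    rule = rng.choice(grammar[symbol])
    return re.sub(
        r"#(\w+)#",
        lambda match: expand(match.group(1), grammar, rng),
        rule,
    )


rng = random.Random()
for _ in range(3):
    print(expand("origin", GRAMMAR, rng))
```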
Generating pirate ships and their crews:
|
The book compilation pipeline is complete. I can now press a button and go from generating a book from scratch to outputting a PDF. Advice for the future: getting this set up earlier in the month will probably pay off. Going from iterating on parts of the system to iterating on the book as a whole is a good motivator.
|
A recent bug in the transcript system that resulted in the pirate ships sailing in circles has been fixed, and leads me to conclude that my transcript system is too clever by half. A frequent NaNoGenMo hazard for me. Island generation is in:
Like most of the systems in the generator, it could use a lot more content, but the interesting thing is that it's guaranteed to only output self-consistent island descriptions. The sentences of the descriptions have tags, such as
In theory, these tags can also be picked up and reincorporated in the other generators in the system. The islands are generated at the start of the novel, and the generator stores the information so that, for example, it knows which animal is found on which island, and what sauce the inhabitants use to cook it. I have yet to implement much of anything that makes use of this information, but the data is there, and easy to access. The island generator works completely differently from the conflict generator. Ideally, I'd be able to incorporate a number of generators that use wildly different principles, since I think that results in a much stronger output than spending the same amount of time on one complex generator. I doubt I'm going to have enough time this month to incorporate very many more, though. |
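One way the tag-driven approach could work, as a sketch under assumptions: each sentence carries a small dictionary of properties, a sentence is only accepted if it agrees with the properties already set, and the island's data is whatever the chosen sentences imply. The tag format, the sentences, and `generate_island` are hypothetical, not taken from the project.

```python
import random

# Hypothetical sentence bag: (sentence, properties it asserts).
SENTENCE_BAG = [
    ("The island is ringed with black sand.", {"terrain": "volcanic"}),
    ("Coral reefs guard the lagoon.", {"terrain": "coral"}),
    ("Wild goats roam the hills.", {"animal": "goat"}),
    ("Turtles nest on the beaches.", {"animal": "turtle"}),
    ("The inhabitants roast their meat in pepper sauce.", {"sauce": "pepper"}),
]


def generate_island(bag, rng, length=3):
    """Pick up to `length` sentences that never contradict each other.

    Returns the description text and the properties implied by the
    chosen sentences, so other generators can look them up later.
    """
    properties = {}
    description = []
    candidates = list(bag)
    rng.shuffle(candidates)
    for sentence, tags in candidates:
        if len(description) >= length:
            break
        # Skip any sentence that disagrees with a property already set.
        if any(properties.get(key, value) != value for key, value in tags.items()):
            continue
        properties.update(tags)
        description.append(sentence)
    return " ".join(description), properties


text, props = generate_island(SENTENCE_BAG, random.Random())
print(text)
print(props)
```

Since the properties come from the sentences themselves, the odds of any given animal or sauce showing up depend only on how many sentences mention it.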
Turns out I did have time to add character descriptions, though I still need to add the hooks that will insert them into the transcript.
|
Release version! The Story of Mary Youx. There's a lot more that could be done with this--it doesn't even show off the full potential of the underlying system, and the sword fights aren't in the main narrative--but it's the end of the month. Written in Python 3.5. Contains a lot of text from various Project Gutenberg books, especially from works by Herman Melville, Joseph Conrad, and Richard Henry Dana, Jr., though surprisingly little from Moby Dick. |
Very nice, Isaac. I'm going to give it some close study, especially the Tracery bits. BTW, I got it to run on Ubuntu w/o too much trouble: set up a virtualenv, install tracery, numpy (probably not necessary, since it's almost certainly in ipython), pycorpora and ipython. I have some local installation issues to resolve to get the PDF, all of which are my problem. But I'm getting the output markdown, which is really all I need to study the key parts of your process. Again, I find this very cool. Thanks for posting it! |
I'm kind of impressed that you got it to run, since I did no deployment testing. (Also, the code is a mess.) The PDF output requires Pandoc and a LaTeX environment plus some custom fonts; you may be able to delete a couple of lines from the LaTeX template and have it work. I should really do a writeup of what's going on under the hood here, because, while I didn't quite reach the literary quality I was aiming for, it did demonstrate that some of the approaches I was trying are viable. (And that some can be redesigned to make them simpler, because I learned a lot in the process.) In particular, the Conflict system in action_commands.py is a bit too clever, and I think it can be improved and (most importantly) made more robust. (Also, the swordfight, which can end in several different ways, shows it off much better than the weighing anchor sequences, though Melville would be proud of the detail.) The island generator and the character description generator, on the other hand, turned out better than I was hoping. Tagging the content takes some effort (and they could use some more content), but reversing the assumptions of the usual description generator turned out to be very efficient. Instead of generating the numbers and writing sentences to describe them, pick sentences from the bag and use the characteristics in the sentences to define the island's properties. It means that your probabilities are dictated by the actual content you've written, which is, at least, a more efficient way to get use out of all of your writing. The very-powerful-but-not-connected word2vec system in action_catalog.py and sourcing.py ended up as the great white whale of the project; I used it a little to manually find sentences that would be interesting to include (though I used grep way, way more). But I ran out of time before I could integrate it in an automated way, and I need to fix some bugs and implement some more features before it can actually be useful. |
As usual, I'll probably try a bunch of experiments at the start of the month to see what sticks, but my initial plan is to work on a scene expansion system to write coherent interactions between characters, with the scene parameters dictated by a plot generator. (Either the tree-expansion one I came up with last year, or a new one.)
Other promising avenues include word2vec, neural networks, and wave function collapse, which has had some interesting experiments with text already by @mewo2.