A festival buzz tracker. Or at least, it will be. Currently, it’s a long, long way from complete.
That said, I’m putting it online, because some have expressed an interest in helping out, and I’m not going to turn down such help.
- Create the dev and test databases (MySQL only at this point, as the code uses the RAND() function; work-arounds for PostgreSQL welcome).
- Add your database.yml file, borrowing from the template if you wish.
- Install Sphinx if you don’t have it yet.
- Install CouchDB if you don’t have it yet. You can use ports, compile it from source, or use the pre-made CouchDBX.
- Make sure you have the Bundler gem (0.9.x – I’m using 0.9.10)
bundle install
bundle lock
rake db:migrate db:test:clone
- Clone the CouchDB code and push it to your Couch database.
You should be good to go.
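Because RAND() is MySQL-specific, one possible PostgreSQL work-around is to choose the random-ordering function per database adapter (PostgreSQL and SQLite both spell it RANDOM()). A minimal sketch, assuming the places that currently hard-code RAND() could call a helper like this; random_order_sql and RANDOM_FUNCTIONS are hypothetical names, not part of the app:

```ruby
# Hypothetical helper: map a database adapter name to its SQL function for
# random ordering. MySQL uses RAND(); PostgreSQL and SQLite use RANDOM().
RANDOM_FUNCTIONS = {
  "mysql"      => "RAND()",
  "mysql2"     => "RAND()",
  "postgresql" => "RANDOM()",
  "sqlite"     => "RANDOM()",
  "sqlite3"    => "RANDOM()"
}.freeze

def random_order_sql(adapter_name)
  # Adapter names come back in mixed case (e.g. "MySQL"), so normalise first.
  RANDOM_FUNCTIONS.fetch(adapter_name.to_s.downcase) do
    raise ArgumentError, "no known random function for #{adapter_name}"
  end
end

puts random_order_sql("MySQL")      # RAND()
puts random_order_sql("PostgreSQL") # RANDOM()
```

In a Rails app, the adapter name is available from ActiveRecord::Base.connection.adapter_name, and the returned string could be used wherever an :order option currently says RAND().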
To import show information, run the following:
rake shows:import:2010
You can run this task multiple times without duplicating data, unless you change show names. Sometimes you will want to do that, so keep this in mind.
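Re-running is safe because (as the paragraph above implies) shows are matched by name. A minimal sketch of that behaviour, assuming a name-keyed lookup; Show and import_shows are hypothetical stand-ins for the real model and rake task, and the feed data is made up:

```ruby
# Hypothetical stand-in for the Show model.
Show = Struct.new(:name, :venue)

# Look each row up by name: an existing show is updated in place,
# while an unseen name creates a new record.
def import_shows(store, rows)
  rows.each do |row|
    show = (store[row[:name]] ||= Show.new(row[:name]))
    show.venue = row[:venue]
  end
  store
end

store = {}
feed = [
  { name: "Tripod", venue: "Venue A" },
  { name: "Adam Hills", venue: "Venue B" }
]

import_shows(store, feed)
import_shows(store, feed) # re-running does not duplicate anything
puts store.size           # 2

# Renaming a show in the feed creates a new record; the old one lingers.
import_shows(store, [{ name: "Tripod (Late Show)", venue: "Venue A" }])
puts store.size           # 3
```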
Once you’ve got the shows loaded, you’ll probably want to load at least some tweets for shows:
rake twitter:import
Each run only loads 60 keywords (in a fresh dataset, these are the act names). You can run it as many times as you like, but Twitter has a (rough) limit of 120 requests per hour. The rake task always queries the keywords that have gone the longest since their previous update.
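The "longest since previous update" selection could look something like this. A sketch only, assuming each keyword records when it was last searched (nil for never) and that each keyword costs one search request; Keyword here is a hypothetical Struct, not the app's model:

```ruby
# Hypothetical stand-in for the Keyword model: a search term plus the time
# it was last searched (nil means never searched).
Keyword = Struct.new(:term, :searched_at)

BATCH_SIZE = 60 # keywords loaded per run

# Never-searched keywords sort first, then the oldest timestamps.
def stalest_keywords(keywords, limit = BATCH_SIZE)
  keywords.sort_by { |k| [k.searched_at ? 1 : 0, k.searched_at || Time.at(0)] }
          .first(limit)
end

# Assuming one request per keyword, 120 requests/hour allows two runs.
RUNS_PER_HOUR = 120 / BATCH_SIZE # => 2

now = Time.now
keywords = [
  Keyword.new("Tripod", now - 60),
  Keyword.new("Adam Hills", nil),
  Keyword.new("Some Act", now - 3600)
]
p stalest_keywords(keywords, 2).map(&:term)
# ["Adam Hills", "Some Act"]
```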
You will need to visit http://localhost:3000/users/new to create an account, and then look in the logs for the verification link, as the email probably won’t make it to your inbox.
There’s no distinction yet between normal users and admin users, but we don’t need normal users yet (though we will eventually).
Once you’ve got that sorted, you can tweak show information at http://localhost:3000/admin/shows.
As mentioned above, importing tweets is done by the following rake task:
rake twitter:import
Each tweet is tied to a specific show in the CouchDB database, which is kinda useful, but keep in mind that the keywords need serious tweaking. I’ve not done this myself, but there are the beginnings of a system (via the Keyword model and its association with Show).
Keywords are the raw search queries sent to Twitter – and often, the act name will not get us the results we want. For example:
- Tripod – we want the comedy group, not a camera tripod.
- Adam Hills – we want tweets about his festival show, not Spicks and Specks.
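One way to tighten keywords like these is to add context terms or exclude known false positives. A rough sketch, assuming Twitter search accepts quoted phrases and a leading minus to exclude a term; KeywordSpec and search_query are hypothetical and not how the Keyword model currently works:

```ruby
# Hypothetical keyword spec: the act name, extra terms that should appear,
# and phrases that should be excluded.
KeywordSpec = Struct.new(:phrase, :required, :excluded)

# Wrap multi-word terms in quotes so they are searched as a phrase.
def quote(term)
  term.include?(" ") ? %("#{term}") : term
end

def search_query(spec)
  parts = [quote(spec.phrase)]
  parts.concat(Array(spec.required).map { |t| quote(t) })
  parts.concat(Array(spec.excluded).map { |t| "-#{quote(t)}" })
  parts.join(" ")
end

tripod = KeywordSpec.new("Tripod", ["comedy"], [])
hills  = KeywordSpec.new("Adam Hills", [], ["Spicks and Specks"])
puts search_query(tripod) # Tripod comedy
puts search_query(hills)  # "Adam Hills" -"Spicks and Specks"
```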
Once we have the tweets, we need to process their text by running it through the Pedantic gem, which gives us the important words. Those are far more useful to a Bayesian filter than the raw text.
rake twitter:process
That task should also flag retweets to be ignored.
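To show what "important words" and retweet flagging mean in practice, here is a rough stand-in. It is NOT the Pedantic gem's API (a small stopword list replaces it), and the retweet check assumes the common "RT @user" and "via @user" conventions; the real task may flag retweets differently:

```ruby
# Rough stand-in for the Pedantic step; this is NOT the gem's API.
STOPWORDS = %w[a an and the is it to of in at for on so this that was].freeze

def important_words(text)
  cleaned = text.downcase.gsub(%r{https?://\S+}, " ") # strip links
  cleaned = cleaned.gsub(/@\w+/, " ")                 # strip @mentions
  cleaned.scan(/[a-z']+/).reject { |w| STOPWORDS.include?(w) }
end

# Assumed retweet conventions: "RT @user" at the start, or "via @user".
def retweet?(text)
  text.match?(/\ART @\w+/i) || text.match?(/\bvia @\w+/i)
end

p important_words("Saw Tripod last night, so funny! http://example.com @friend")
# ["saw", "tripod", "last", "night", "funny"]
p retweet?("RT @someone: Tripod were great") # true
p retweet?("Tripod were great")              # false
```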
From there, it’s a matter of building up a reliable dataset for the (as yet non-existent) Bayesian filter. You can go to http://localhost:3000/admin/tweets/unclassified and start marking tweets as positive, negative or ignored, but it’s pretty damn daunting.
It would be better to have some way of doing this on the admin pages for shows, while tweaking keywords, etc. We’ll also need some way of verifying whether tweets are positive or negative once the Bayesian filter is operational.
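Since the Bayesian filter doesn't exist yet, here is one possible shape for it: a plain multinomial naive Bayes classifier with add-one smoothing, trained on the important words of manually classified tweets. A sketch under assumptions; TweetClassifier, its methods, and the training data are hypothetical:

```ruby
# Minimal multinomial naive Bayes with add-one (Laplace) smoothing.
class TweetClassifier
  def initialize
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # label => word => count
    @doc_counts  = Hash.new(0)                            # label => tweets seen
    @vocabulary  = {}                                     # every word seen
  end

  # words: the important words of one tweet; label: e.g. :positive/:negative
  def train(words, label)
    @doc_counts[label] += 1
    words.each do |w|
      @word_counts[label][w] += 1
      @vocabulary[w] = true
    end
  end

  # Return the label with the highest log-probability for these words.
  def classify(words)
    total_docs = @doc_counts.values.sum.to_f
    @doc_counts.keys.max_by { |label| log_score(words, label, total_docs) }
  end

  private

  def log_score(words, label, total_docs)
    counts = @word_counts[label]
    denom = (counts.values.sum + @vocabulary.size).to_f
    score = Math.log(@doc_counts[label] / total_docs) # log prior
    words.each { |w| score += Math.log((counts[w] + 1) / denom) }
    score
  end
end

c = TweetClassifier.new
c.train(%w[tripod hilarious brilliant], :positive)
c.train(%w[loved tripod funny], :positive)
c.train(%w[boring disappointing], :negative)
c.train(%w[awful boring show], :negative)
p c.classify(%w[hilarious funny]) # :positive
p c.classify(%w[boring awful])    # :negative
```

Working in log space avoids multiplying many small probabilities together, and the smoothing keeps an unseen word from zeroing out a label.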
What still needs doing? Um. Almost everything.
I haven’t bothered with the design – Ben is working on that.
I feel that Twitter integration and the design are the key points. Once those flow (relatively) smoothly, I’ll be at a point where I can approach the Comedy Festival, ask for data feeds, and see if they’re interested in supporting the site in other ways.
Built by Pat Allan. The name was thought of by Andy Gelme. Design is being provided by Ben Webster.