The Gutenberg Graphalyzer project aims to provide a means of measuring the structural complexity of works of literature. It is currently hard-coded to support only texts from Project Gutenberg.
All results were based on a corpus created from Project Gutenberg's April 2010 DVD. A processed corpus collection can be found here.
Further documentation is located on my wiki here. The documentation on the project page includes some useful queries, known parsing issues, and other miscellaneous material.
-
graphalyzer.py -- Parses an individual Project Gutenberg text. Assumes the header and footer licensing text is present. Run the script with '-h' for further information.
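The header/footer assumption can be illustrated with a minimal sketch. This is not graphalyzer.py's actual logic; it only shows one common way to isolate the body text between the "*** START OF ..." and "*** END OF ..." marker lines that Project Gutenberg files use:

```python
import re

# Marker patterns used by Project Gutenberg texts; exact wording varies
# between editions, so the regexes are deliberately loose.
START_RE = re.compile(r"\*\*\*\s*START OF (THE|THIS) PROJECT GUTENBERG", re.IGNORECASE)
END_RE = re.compile(r"\*\*\*\s*END OF (THE|THIS) PROJECT GUTENBERG", re.IGNORECASE)

def strip_gutenberg_license(text):
    """Return only the body between the header and footer markers."""
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if START_RE.search(line):
            start = i + 1  # body begins after the START marker line
        elif END_RE.search(line):
            end = i        # body ends before the END marker line
            break
    return "\n".join(lines[start:end]).strip()
```

If neither marker is found, the sketch falls back to returning the whole text, which is why graphalyzer.py's assumption that both are present matters.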
-
make-db-py3.py -- Creates the database from a directory of Project Gutenberg text files and an RDF catalog file. Has three global "constants" that must be set for proper usage.
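The shape of that setup can be sketched as follows. The constant names (TEXT_DIR, RDF_CATALOG, DB_PATH) and the table schema are assumptions for illustration, not the script's actual globals:

```python
import glob
import os
import sqlite3

# Hypothetical globals standing in for the three constants the script needs.
TEXT_DIR = "corpus/"         # directory of Project Gutenberg .txt files
RDF_CATALOG = "catalog.rdf"  # path to the RDF metadata catalog
DB_PATH = "gutenberg.db"     # output SQLite database

def build_db(text_dir=TEXT_DIR, db_path=DB_PATH):
    """Index every .txt file in text_dir into a small SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS books (id TEXT PRIMARY KEY, path TEXT)")
    for path in glob.glob(os.path.join(text_dir, "*.txt")):
        book_id = os.path.splitext(os.path.basename(path))[0]
        conn.execute("INSERT OR REPLACE INTO books VALUES (?, ?)", (book_id, path))
    conn.commit()
    return conn
```

The real script additionally joins in metadata parsed from the RDF catalog, which is omitted here.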
-
run-experiment.sh -- Finds all text files in the corpus and runs the graphalyzer script on each. Uses xargs to run the script in parallel; thanks to SQLite's locking, I have seen no race conditions.
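The same find-then-fan-out pattern can be sketched in Python. This is only an illustration of the shape of the shell script, with the per-file worker passed in as a parameter; the real script invokes graphalyzer.py via xargs:

```python
import pathlib
from concurrent.futures import ThreadPoolExecutor

def run_all(corpus_dir, worker, max_workers=4):
    """Apply worker to every .txt file under corpus_dir, in parallel.

    Threads are fine here because the real work (spawning graphalyzer.py
    per file) is I/O-bound; SQLite's file locking serializes the writes.
    """
    files = sorted(str(p) for p in pathlib.Path(corpus_dir).rglob("*.txt"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, files))
```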
-
remove-duplicates.py -- Takes one command-line argument: the directory containing all the text files. Removes duplicate encodings of the same text, preferring ASCII over ISO and ISO over UTF-8.
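That preference order can be sketched as below. It assumes the Project Gutenberg filename convention where "12345.txt" is ASCII, "12345-8.txt" is the ISO (8-bit) edition, and "12345-0.txt" is UTF-8; the actual script's logic may differ:

```python
import re
from collections import defaultdict

# Assumed suffix convention: "" = ASCII, "-8" = ISO, "-0" = UTF-8.
# Lower rank is preferred.
RANK = {"": 0, "-8": 1, "-0": 2}
NAME_RE = re.compile(r"^(\d+)(-[08])?\.txt$")

def duplicates_to_remove(filenames):
    """Return the less-preferred encoding variants among filenames."""
    groups = defaultdict(list)
    for name in filenames:
        m = NAME_RE.match(name)
        if m:
            groups[m.group(1)].append(name)
    doomed = []
    for names in groups.values():
        names.sort(key=lambda n: RANK[NAME_RE.match(n).group(2) or ""])
        doomed.extend(names[1:])  # keep only the best-ranked copy
    return doomed
```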
-
result-analysis.r -- Generates a set of graphs from SQL queries against the results. Can be used as a guideline for future data exploration in R. Can easily be run from the command line with 'R CMD BATCH result-analysis.r'.
All research results, presentations, and documentation are licensed under Creative Commons Attribution-NonCommercial 3.0. All source code is licensed under the GPL v3.0.
GutenbergGraphalyzer by Nathaniel Husted is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.