Prototype roadmap
Benjamin Ooghe-Tabanou edited this page Dec 21, 2012 · 1 revision
The specifications follow 3 directions:
- proof of concept, define what a web corpus is and make sure the software memory management can handle it;
- functional scope statement, perform a detailed inventory of the possible usage scenarios based on user profiles (field, expertise) and validate the functional analysis;
- architecture, propose a solid software architecture with modules and interfaces.
The aim of the prototype is to test a basic interaction between the crawling function and the corpus building function. The stepping stones are the following:
- end of 2010: specifications and memory usage proof of concept
- summer 2011: a pre-prototype with a user interface and the ability to crawl web corpora
- end of 2011: prototype of the UI, core, memory structure and crawler:
  - a Scrapy outsourcing contract will help us with the Scrapy implementation
  - a Lucene outsourcing contract will help us with the memory structure prototype
From 2012, the development of the HCI software will benefit from the DIME-SHS scientific equipment at Sciences Po, which will base its web corpus service on the HCI tools. Its main web-mining developer will then participate actively in the development of HCI.
Proof of concept of interactions between corpus building and crawling:
- corpus storage: memory structure
- primitive interface design: web entities + qualification
- interface to a crawler: live remote crawling
- interface to a cartographic engine: export to GEXF for Gephi or GEXF Explorer
- timestamps: activity log
- exploration tools
- spreadsheet view on the corpus
- term extraction on the fly or from raw text storage
- interface to an archive engine
- cartographic exploration in builder
- collaborative work
- focused crawling
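
As an illustration of the "export to GEXF" item above, here is a minimal sketch of how a corpus of web entities and the links between them could be serialized to GEXF 1.2 for opening in Gephi. It is not the project's actual exporter: the function name `corpus_to_gexf` and the input shapes (a dict of entity ids to labels, a list of source/target pairs) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the GEXF 1.2 draft format, which Gephi can read.
GEXF_NS = "http://www.gexf.net/1.2draft"


def corpus_to_gexf(entities, links):
    """Serialize a corpus to a GEXF string.

    entities: dict mapping an entity id to a label (hypothetical shape).
    links: iterable of (source_id, target_id) pairs (hypothetical shape).
    """
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(
        root, "graph", {"mode": "static", "defaultedgetype": "directed"}
    )

    # One <node> per web entity.
    nodes = ET.SubElement(graph, "nodes")
    for entity_id, label in entities.items():
        ET.SubElement(nodes, "node", {"id": str(entity_id), "label": label})

    # One <edge> per hyperlink between two entities.
    edges = ET.SubElement(graph, "edges")
    for edge_id, (source, target) in enumerate(links):
        ET.SubElement(
            edges,
            "edge",
            {"id": str(edge_id), "source": str(source), "target": str(target)},
        )

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    xml_text = corpus_to_gexf(
        {"e1": "example.org", "e2": "example.com"},
        [("e1", "e2")],
    )
    with open("corpus.gexf", "w", encoding="utf-8") as handle:
        handle.write(xml_text)
```

The resulting `corpus.gexf` file can then be loaded into Gephi; node attributes such as qualification tags would need an additional `<attributes>` declaration, which this sketch leaves out.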