Prototype roadmap

Table of Contents

objectives
    short term (end of 2010): specification of the hyphen software
    mid term (end 2011): development of a prototype
    long term (from 2012): from prototype to alpha software
program of prototype
    priority 0
    priority 1
    priority 2
    priority 3

objectives

short term (end of 2010): specification of the hyphen software

The specifications follow 3 directions:

proof of concept: define what a web corpus is and make sure the software's memory management can handle it;
functional scope statement: perform a detailed inventory of the possible usage scenarios based on user profiles (field, expertise) and validate the functional analysis;
architecture: propose a solid software architecture with modules and interfaces.

For a detailed view of the short-term roadmap, please refer to the Working_sessions page.

mid term (end 2011): development of a prototype

The aim of the prototype is to test a basic interaction between the crawling function and the corpus building function. The stepping stones are the following:

end of 2010: specifications and memory usage proof of concept
summer 2011: a pre-prototype with a user interface and the ability to crawl web corpora
end 2011: prototype of the UI, core, memory structure and crawler:
- a Scrapy outsourcing contract will support the Scrapy implementation
- a Lucene outsourcing contract will support the memory structure prototype

Here is the basic user scenario we want to implement in this first prototype: First prototype basic user scenario

long term (from 2012): from prototype to alpha software

From 2012, the development of the HCI software will benefit from the DIME-SHS scientific equipment at Sciences Po, which will base its web corpus service on the HCI tools. Its main web-mining developer will then participate actively in the development of HCI.

program of prototype

priority 0

Proof of concept of the interactions between corpus building and crawling:

corpus storage: Memory structure
primitive interface design: Web entities + Qualification
interface to a crawler: Live remote crawling
interface to a cartographic engine: export to GEXF for Gephi or GEXF Explorer
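
The export to a cartographic engine can be illustrated with a minimal sketch. It assumes the corpus reduces to a set of web entities (nodes) and hyperlinks between them (directed edges), and it writes them in the GEXF 1.2draft layout using only the Python standard library. The function name `corpus_to_gexf`, the entity ids and the labels are hypothetical and are not part of the specification; the real memory structure may expose the corpus differently.

```python
import xml.etree.ElementTree as ET

# Namespace of the GEXF 1.2draft format read by Gephi.
GEXF_NS = "http://www.gexf.net/1.2draft"


def corpus_to_gexf(entities, links):
    """Serialize a corpus to a GEXF string (hypothetical helper).

    entities: dict mapping a web entity id to a human-readable label
    links: iterable of (source_id, target_id) pairs, one per hyperlink
    """
    # Emit the GEXF namespace as the default one (no prefix).
    ET.register_namespace("", GEXF_NS)

    def tag(name):
        return "{%s}%s" % (GEXF_NS, name)

    root = ET.Element(tag("gexf"), {"version": "1.2"})
    graph = ET.SubElement(
        root, tag("graph"), {"mode": "static", "defaultedgetype": "directed"}
    )

    # One <node> per web entity.
    nodes = ET.SubElement(graph, tag("nodes"))
    for entity_id, label in entities.items():
        ET.SubElement(nodes, tag("node"), {"id": str(entity_id), "label": label})

    # One <edge> per hyperlink between two entities.
    edges = ET.SubElement(graph, tag("edges"))
    for index, (source, target) in enumerate(links):
        ET.SubElement(
            edges,
            tag("edge"),
            {"id": str(index), "source": str(source), "target": str(target)},
        )

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative data only: two entities and one link between them.
    print(corpus_to_gexf({"e1": "example.org", "e2": "example.net"}, [("e1", "e2")]))
```

Saving the returned string to a `.gexf` file should be enough to open the graph in Gephi. Attributes such as the qualification of each web entity are left out of this sketch.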

priority 1

timestamps: Activity log
exploration tools
spreadsheet view on the corpus

priority 2

term extraction on the fly or from raw text storage
interface to an archive engine
cartographic exploration in builder

priority 3

collaborative work
focused crawling
