The EVT3 parser module is too dependent on the structure of the edition and relies heavily on the intrinsic XML nesting to work properly. I'll refer you to #234 for further analysis of how the current parser works and its problems, but I'll mainly stick to DEPA-related issues here.
First of all, I'll paste the information found in the parsing wiki:
After reading the source file indicated in the proper configuration parameter, EVT parses the structure of the edition. At the moment, everything is based on pages (this will probably change when we add support for critical editions and pageless editions). A page is identified as the list of XML elements included between a <pb/> and the next one (or, for the last page, between a <pb/> and the end of the node containing the main text, which is the <body>).
Each page is represented in the EVT Model as a Page.
The content of each page is therefore represented as an array of objects retrieved by parsing the original XML nodes.
After parsing the structure, for each page identified, we then proceed to parse all the child nodes, by calling the parse method of the GenericParserService.
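To make the page-splitting rule concrete, here is a minimal sketch of grouping nodes between one <pb/> and the next. It does not reproduce the actual EVT code: `SimpleNode`, `PageSketch`, and `splitIntoPages` are hypothetical names, and XML nodes are reduced to plain objects so the example runs without a DOM.

```typescript
// Hypothetical, simplified stand-ins for XML nodes and the Page model;
// the real EVT types are not shown in this issue.
interface SimpleNode {
  tagName: string;
  text?: string;
}

interface PageSketch {
  id: string;
  content: SimpleNode[];
}

// Group the children of the main text node into pages: each <pb/> opens a new
// page, and every following node belongs to it until the next <pb/> or the end.
function splitIntoPages(nodes: SimpleNode[]): PageSketch[] {
  const pages: PageSketch[] = [];
  for (const node of nodes) {
    if (node.tagName === 'pb') {
      pages.push({ id: `page_${pages.length + 1}`, content: [] });
    } else if (pages.length > 0) {
      pages[pages.length - 1].content.push(node);
    }
    // Nodes that appear before the first <pb/> are ignored in this sketch.
  }
  return pages;
}

const samplePages = splitIntoPages([
  { tagName: 'pb' },
  { tagName: 'p', text: 'first page' },
  { tagName: 'pb' },
  { tagName: 'p', text: 'second page' },
  { tagName: 'note', text: 'still second page' },
]);
```

Note that the grouping only looks at sibling order, which is exactly why it depends on where the <pb/> elements sit in the XML tree.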
Parsers are defined in a map that associates a parser with each supported tagName. This map is retrieved by the generic parsing function, which chooses the right parser based on the node type and its tagName. If a tag does not match a specific parser, the ElementParser, which does not add any logic to the parsing results, is used. Tags and parsers are grouped by the TEI module they belong to.
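The lookup described above can be sketched roughly as follows. This is an illustration under assumptions, not the GenericParserService itself: `parserMap`, `elementParser`, `parseNode`, and the `type` strings are hypothetical, and nodes are again plain objects.

```typescript
// Hypothetical simplified node and result shapes.
interface SimpleNode {
  tagName: string;
  text?: string;
}

interface ParsedElementSketch {
  type: string;
  text: string;
}

type Parser = (node: SimpleNode) => ParsedElementSketch;

// Fallback parser: adds no tag-specific logic to the result.
const elementParser: Parser = (node) => ({
  type: 'GenericElement',
  text: node.text ?? '',
});

// One parser per supported tagName (only two illustrative entries here).
const parserMap = new Map<string, Parser>([
  ['bibl', (node) => ({ type: 'Bibliography', text: node.text ?? '' })],
  ['note', (node) => ({ type: 'Note', text: node.text ?? '' })],
]);

// Pick the parser registered for the tagName, or fall back to elementParser.
function parseNode(node: SimpleNode): ParsedElementSketch {
  const parser = parserMap.get(node.tagName) ?? elementParser;
  return parser(node);
}
```

The dispatch key is the tag name alone, so anything that is not expressed as a single element, such as a span delimited by two anchors, has no natural entry in such a map.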
That's great, except for the fact that each resulting type is associated with a component through the ContentViewerComponent:
This is a dynamic component that takes a ParsedElement as input and establishes which component to use for displaying this data based on the type indicated in the type property.
This type is used to manage the component register, which is accessed for dynamic compilation, and it also determines the type of data that the component in question receives as input.
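As a rough picture of that type-driven dispatch, the sketch below replaces Angular components with plain render functions. Everything in it is assumed for illustration: `componentRegister`, `renderContent`, the `type` strings, and the `evt-bibliography` element name are hypothetical, not the real ContentViewerComponent API.

```typescript
// Hypothetical parsed element: `type` selects the renderer.
interface ParsedElementSketch {
  type: string;
  text: string;
}

type Renderer = (data: ParsedElementSketch) => string;

// Fallback used when no renderer is registered for a type.
const plainTextRenderer: Renderer = (data) => data.text;

// Stand-in for the component register: one renderer per parsed type.
const componentRegister = new Map<string, Renderer>([
  ['Bibliography', (data) => `<evt-bibliography>${data.text}</evt-bibliography>`],
]);

// Render each parsed element independently, in array order.
function renderContent(parsedContent: ParsedElementSketch[]): string {
  return parsedContent
    .map((element) => {
      const render = componentRegister.get(element.type) ?? plainTextRenderer;
      return render(element);
    })
    .join('');
}

const rendered = renderContent([
  { type: 'Bibliography', text: 'Smith 2020' },
  { type: 'Unknown', text: ' and more' },
]);
```

Each element is rendered in isolation, knowing only its own scope, which is the coupling between parsed tags and components that the rest of this issue is about.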
And the problem is that the parsing and the visualization are too dependent on XML tags. The content viewer takes the Page parsedContent attribute and, for each element of the array (that is, an XML tag that has been parsed and contains information only about that scope and the nested ones), the ContentViewer associates it with a specific component (e.g. the <bibl> tag is associated with the BibliographyComponent) and visualizes it. But with DEPA it's a real mess, because its major strength is using anchors as delimiters. These anchors can be siblings, or even be children of different tags. Apparatuses can also overlap with each other. Therefore, it is hard to think of a forced solution where we would surround the scope of interest with a fake tag just to visualize it correctly, especially when apparatuses overlap.
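To illustrate why a wrapping tag cannot express this, here is a small sketch that treats each apparatus entry as a range between two anchors over a flat list of tokens. It is only an illustration under assumptions: `Token`, `ApparatusRange`, and `appsPerToken` are hypothetical names, and anchors are simplified to token ids.

```typescript
// Hypothetical flat token and anchor-delimited range shapes.
interface Token {
  id: string;
  text: string;
}

interface ApparatusRange {
  appId: string;
  from: string; // id of the token where the range starts
  to: string;   // id of the token where the range ends (inclusive)
}

// For every token, list the apparatus entries whose range covers it.
function appsPerToken(
  tokens: Token[],
  ranges: ApparatusRange[],
): { id: string; apps: string[] }[] {
  const position = new Map<string, number>();
  tokens.forEach((token, index) => position.set(token.id, index));

  const covering = tokens.map((): string[] => []);
  for (const range of ranges) {
    const start = position.get(range.from);
    const end = position.get(range.to);
    // Skip ranges whose anchors are missing or reversed.
    if (start === undefined || end === undefined || start > end) continue;
    for (let i = start; i <= end; i++) {
      covering[i].push(range.appId);
    }
  }
  return tokens.map((token, index) => ({ id: token.id, apps: covering[index] }));
}

const coverage = appsPerToken(
  [
    { id: '1', text: 'this' },
    { id: '2', text: 'is' },
    { id: '3', text: 'some text' },
    { id: '4', text: 'test' },
  ],
  [
    { appId: 'app1', from: '1', to: '2' },
    { appId: 'app2', from: '2', to: '3' },
    { appId: 'app3', from: '3', to: '4' },
  ],
);
// Token 2 is covered by app1 and app2, token 3 by app2 and app3.
```

Because tokens 2 and 3 each belong to two ranges that only partially overlap, no properly nested set of wrapper elements can enclose all three ranges at once.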
<evt-depa>
<w id="1">this</w>
<evt-depa> <-- this is the starting tag of the second app!? (from="2" to="3")
<w id="2">is</w>
</evt-depa> <-- this is the closing tag of the first app!? (from="1" to="2")
<evt-depa>
<w id="3">some text</w>
</evt-depa> <-- this is the closing tag of the second app!? (from="2" to="3")
<w id="4">test</w>
</evt-depa> <-- this is the closing tag of the third app!? (from="3" to="4")