Non-sequential parsing of TEI elements #234
I've had time to analyse the complexity of implementing a second round of parsing, based on the current situation. I'll be redundant for the sake of clarity.
@xmlParser('sic', SicParser)
export class SicParser extends EmptyParser implements Parser<XMLElement> {
  attributeParser = createParser(AttributeParser, this.genericParse);
  parse(xml: XMLElement): Sic {
    const attributes = this.attributeParser.parse(xml);
    const { type } = attributes;
    return {
      type: Sic,
      sicType: type || '',
      class: getClass(xml),
      content: parseChildren(xml, this.genericParse),
      attributes,
    };
  }
}
There's also that
However, some parsing is not done following this method. For example:
@xmlParser('evt-named-entities-list-parser', NamedEntitiesListParser)
export class NamedEntitiesListParser extends EmptyParser implements Parser<XMLElement> {
  ...
}
That adds complexity to take care of while scanning through the document. Maybe this could be easily fixed by also adding the tag
export class WitnessesParserService {
  private witListTagName = 'listWit';
  private witTagName = 'witness';
  private witNameAttr = 'type="siglum"';
  private groupTagName = 'head';
  private attributeParser = createParser(AttributeParser, parse);
  constructor(
    private genericParserService: GenericParserService,
  ) {
  }
  public parseWitnessesData(document: XMLElement): Witnesses {
    const lists = Array.from(document.querySelectorAll<XMLElement>(this.witListTagName));
    return {
      witnesses: this.parseWitnessesList(lists),
      groups: this.parseWitnessesGroups(lists),
    };
  }
  private parseWitnessesList(lists: XMLElement[]) {
    const parsedList = lists.filter((list) => !isNestedInElem(list, list.tagName))
      .map((list) => this.parseWitnesses(list))
      .reduce((x, y) => x.concat(y), []);
    return parsedList;
  }
It also doesn't include the mapping between
Refactoring those problems is not straightforward and would require thorough discussions with other devs, and from my understanding right now it's not really a good time for that. We should still discuss how to proceed, as it's a pretty important feature for the stand-off apparatus implementation.
[A summary of all subsequent comments so that they aren't lost in Slack] RobertoRDT Lorenzo Bafunno
@Davide I. Cucurnia got around it by duplicating some elements; the implementation can be found in analogue-parser.ts RobertoRDT Because, if I got it right, the problem is that when I have a list in the @Davide I. Cucurnia
RobertoRDT https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-standOff.html As you can see in this example Lorenzo Bafunno I'll try to provide an example RobertoRDT Lorenzo Bafunno
Ideally, we could retrieve data from both the
Let's imagine we store a combination of
Sorry it took me so long, but I needed time to make it clear. I hope it makes sense 😅
Apologies in advance, as this might be an overly naive suggestion, but just in case: would an initial parse of
Thanks @Lorenzo Bafunno (@laurelled), for clarifying so many details on Tuesday. I’m still not proposing this as a ‘sure fire’ solution, but it might work as an interim patch of sorts. The idea is:
It’s not great, and it means another sort-of global/config type variable hanging around. On the other hand, as an interim solution, it’s only a set containing a few strings. The check against it should be very cheap in terms of resources.
As I found over the past few weeks, Andrew's solution is easy to implement. However, another problem has arisen: how the parsing and the subsequent visualization are done. I'll paste the information found in the parsing wiki:
That's great, except for the fact that each resulting type is associated with a component through the ContentViewerComponent in a very general way:
The content viewer takes the Page
The only solution I can think of is creating a fake surrounding tag to handle it in a specific component, "imitating" the way EVT2 handled it.
For DEPA (#239) we are solving the problem with an in-memory dictionary of the selected elements that can be queried when parsing the document. We can think of a similar strategy for these attributes as well.
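To make the dictionary idea concrete, here is a minimal sketch. It is not the DEPA implementation: the `SelectedElement` shape and the `buildDictionary` and `resolveRef` helpers are hypothetical names made up for illustration, and it assumes the selected elements have already been collected into a plain array before the main parse starts.

```typescript
// Hypothetical, simplified record for an element selected up front.
// EVT's real parsed types are richer; this only keeps what the lookup needs.
interface SelectedElement {
  id: string;   // value of xml:id
  tag: string;  // element name, e.g. 'note' or 'bibl'
  text: string; // flattened text content
}

// Build the in-memory dictionary once, keyed by xml:id.
// Object.create(null) avoids clashes with inherited keys such as 'constructor'.
function buildDictionary(elements: SelectedElement[]): Record<string, SelectedElement> {
  const dict: Record<string, SelectedElement> = Object.create(null);
  for (const el of elements) {
    dict[el.id] = el;
  }
  return dict;
}

// Query the dictionary while parsing: accepts '#id' pointers or bare ids.
function resolveRef(
  dict: Record<string, SelectedElement>,
  target: string,
): SelectedElement | undefined {
  const id = target.charAt(0) === '#' ? target.slice(1) : target;
  return dict[id];
}

// Usage: a footnote stored in <back> can be found while parsing <body>.
const dictionary = buildDictionary([
  { id: 'note1', tag: 'note', text: 'A footnote kept in the back matter.' },
  { id: 'bibl1', tag: 'bibl', text: 'A bibliographic entry.' },
]);
const footnote = resolveRef(dictionary, '#note1');
```

The lookup is independent of document order, which is the point: an element parsed early can refer to one that appears later in the file.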
At the time of writing, EVT 3 parsing of TEI elements is sequential, element by element, which makes it problematic to refer to elements that are at a later position than the currently parsed element.
For example, to avoid the problem of data redundancy in TEI encoding for authorial philology, a solution based on the copyof attribute would be very effective:
<mod change="layer-0" xml:id="MS_12_23">original text</mod>
<mod change="layer-1"><del copyof="#MS_12_23"/></mod>
This would also be a very useful mechanism in many other situations, e.g. to put together text based on portions of other separate documents and other stand-off use cases. However, since EVT 3 proceeds sequentially, when it encounters <del copyof="#MS_12_23"/> it would not yet have parsed the contents of <mod change="layer-0" xml:id="MS_12_23"> (authorial philology encoding proceeds in reverse chronological order, most recent text first) and thus that node would not yet exist.
Another currently impossible use case: retrieving the content of a footnote in the <back>, or of a bibliographic entry also in the <back>.
Possible solutions:
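As an illustration only, and not something taken from the EVT 3 codebase, the forward-reference problem can be pictured as a sequential parse followed by a second, fix-up round. The `SimpleNode` and `Parsed` shapes and both function names below are assumptions invented for this sketch. During the sequential pass every element with an xml:id is recorded, and every element with a copyof pointer is put on a pending list; once the pass is over, all targets exist and the pending items can be filled in.

```typescript
// Hypothetical minimal input node; EVT works on DOM XMLElement instead.
interface SimpleNode {
  tag: string;
  attributes: Record<string, string>;
  children: SimpleNode[];
  text?: string;
}

// Hypothetical parsed output.
interface Parsed {
  tag: string;
  id?: string;
  copyOf?: string; // target id, without the leading '#'
  text: string;
  children: Parsed[];
}

// First round: plain sequential parse that records ids and defers copyof pointers.
function parseSequential(
  node: SimpleNode,
  byId: Record<string, Parsed>,
  pending: Parsed[],
): Parsed {
  const parsed: Parsed = { tag: node.tag, text: node.text ?? '', children: [] };
  const id = node.attributes['xml:id'];
  if (id !== undefined) {
    parsed.id = id;
    byId[id] = parsed;
  }
  const pointer = node.attributes['copyof'];
  if (pointer !== undefined) {
    parsed.copyOf = pointer.replace(/^#/, '');
    pending.push(parsed); // the target may not have been parsed yet
  }
  for (const child of node.children) {
    parsed.children.push(parseSequential(child, byId, pending));
  }
  return parsed;
}

// Second round: every id is known now, so deferred pointers can be resolved.
// Note: the copy shares the target's children by reference, and chains of
// copyof pointing at other copyof elements are not handled in this sketch.
function parseWithSecondRound(root: SimpleNode): Parsed {
  const byId: Record<string, Parsed> = Object.create(null);
  const pending: Parsed[] = [];
  const result = parseSequential(root, byId, pending);
  for (const item of pending) {
    const target: Parsed | undefined =
      item.copyOf !== undefined ? byId[item.copyOf] : undefined;
    if (target !== undefined) {
      item.text = target.text;
      item.children = target.children;
    }
  }
  return result;
}
```

The trade-off is that the first round still produces incomplete nodes, so anything that consumes parsed output before the second round has run would see empty copies.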