right now there are a variety of cases where Reconstruction can raise a MultipleReadingsError: when fetching an initial, a rime, or an entire reading. for at least some of these cases, I think we could do something a little smarter. an example:
we see a 長 in the text without an annotation from LDM, and go to the guangyun looking for a reading.
we see that 長 has three available readings: drjangH, drjang, trjangX.
for the initial, we have two options: dr and tr.
for the rime, we have only one option, which we can confidently annotate: jang.
for the tone, we have three options: level, rising, and departing.
there's still ambiguity here, but much less ambiguity than simply giving up and not assigning a reading! if we can come up with a systematic way of noting the ambiguity, as B&S do for their OC reconstruction (using things like brackets), we might still salvage some information that would help an algorithm or a human manually correcting the data. for example:
[dr|tr]jang[X|H|_]
and if we annotate each part in a separate field, this might make it into the CoNLL-U as:
MCInitial=[dr/tr]|MCRime=jang|MCTone=[X/H/_]
(using the / instead of | since that character is reserved to separate annotations in CoNLL-U MISC and FEATS fields.)
this also helps in the (unfortunately many) cases where LDM did provide an annotation, but one or both of the characters in his fanqie happen to be polyphones.
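a minimal sketch of the per-segment merge described above, assuming each guangyun reading has already been split into initial, rime, and tone. the `Reading` tuple and the `merge_segment` and `to_misc` helpers are hypothetical names for illustration, not part of the existing `Reconstruction` API, and `_` stands in for the unmarked level tone as in the example.

```python
from typing import Iterable, List, NamedTuple


class Reading(NamedTuple):
    """One Middle Chinese reading, pre-split into segments (hypothetical type)."""
    initial: str
    rime: str
    tone: str  # "X" rising, "H" departing, "_" level (unmarked)


def merge_segment(values: Iterable[str], sep: str = "/") -> str:
    """Collapse the candidates for one segment; bracket them if they disagree."""
    unique = list(dict.fromkeys(values))  # dedupe, keeping first-seen order
    if len(unique) == 1:
        return unique[0]
    return "[" + sep.join(unique) + "]"


def to_misc(readings: List[Reading]) -> str:
    """Build a CoNLL-U MISC-style string from all candidate readings."""
    if not readings:
        raise ValueError("no readings to merge")
    fields = {
        "MCInitial": merge_segment(r.initial for r in readings),
        "MCRime": merge_segment(r.rime for r in readings),
        "MCTone": merge_segment(r.tone for r in readings),
    }
    # "|" separates annotations, so "/" is used inside the brackets
    return "|".join(f"{key}={value}" for key, value in fields.items())


# the three readings of 長: drjangH, drjang, trjangX
chang = [
    Reading("dr", "jang", "H"),
    Reading("dr", "jang", "_"),
    Reading("tr", "jang", "X"),
]
print(to_misc(chang))  # MCInitial=[dr/tr]|MCRime=jang|MCTone=[H/_/X]
```

one caveat with this design: merging each segment independently loses the link between segments, so the bracketed form also admits combinations that are not attested (for example tr with H). if that matters downstream, the full list of readings would need to be kept alongside.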
This sounds like a brilliant solution compared to our previous approach. And you are right, this makes things much clearer for a human reader, since the structure you are proposing inherently draws attention to what is unclear.
Also, to follow up on the LDM case: logically, this would result in one of two things:
the character LDM provided is still ambiguous, as the relevant syllable segment, for example MCInitial=[dr/tr], is ambiguous
the character is not ambiguous anymore, as LDM refers to the syllable segment that is clear, for example MCRime=jang|MCTone=X
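The two outcomes above can be sketched in a few lines, under the simplifying assumption that the upper fanqie character contributes only the initial and the lower character only the rime and tone. The function names and the `(initial, rime, tone)` tuples are hypothetical, and the second speller below is a made-up placeholder rather than a real gloss from LDM.

```python
def merge(values):
    """Return the single agreed value, or a bracketed list of the candidates."""
    unique = list(dict.fromkeys(values))  # dedupe, keeping first-seen order
    return unique[0] if len(unique) == 1 else "[" + "/".join(unique) + "]"


def annotate_from_fanqie(upper, lower):
    """upper and lower are lists of (initial, rime, tone) candidate readings.

    Only the segment each speller contributes is inspected, so a polyphone
    stays harmless as long as its readings agree on that segment.
    """
    return {
        "MCInitial": merge(initial for initial, _, _ in upper),
        "MCRime": merge(rime for _, rime, _ in lower),
        "MCTone": merge(tone for _, _, tone in lower),
    }


# 長 as the upper speller: its readings disagree on the initial
chang = [("dr", "jang", "H"), ("dr", "jang", "_"), ("tr", "jang", "X")]
# hypothetical lower speller: a polyphone whose readings differ only in the initial
lower = [("dr", "jang", "X"), ("tr", "jang", "X")]

result = annotate_from_fanqie(chang, lower)
print("|".join(f"{key}={value}" for key, value in result.items()))
# MCInitial=[dr/tr]|MCRime=jang|MCTone=X
```

Here the initial stays ambiguous because the polyphone is used for the segment on which its readings differ, while the rime and tone come out clean even though the lower speller is also a polyphone.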