implement conversion from middle chinese to old chinese #3

Open
thatbudakguy opened this issue Jan 22, 2022 · 3 comments
Labels
question Further information is requested

Comments

@thatbudakguy
Member

thatbudakguy commented Jan 22, 2022

depends on #2.

@thatbudakguy thatbudakguy added the question Further information is requested label Jan 22, 2022
@thatbudakguy thatbudakguy transferred this issue from another repository Feb 20, 2022
@GDRom
Member

GDRom commented Jun 7, 2022

Note: there are a few dozen characters that would trigger the following issue:

  • Baxter & Sagart's OCNR providing multiple readings for what is rendered as a single MC reading

Example:

  1. 沈 chén, sink [v.t.]; MC drim < OCNR *C.[d]r[ə]m
  2. 沈 chén, sink [v.i.]; MC drim < OCNR *[d]r[ə]m

Such occurrences ought to pop up predominantly in initial/preinitial positions.

For OC implementation, I'd have to disambiguate manually.
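
One way to list the affected characters before disambiguating by hand is to group lexicon rows by character and MC reading, then keep the groups that carry more than one distinct OC reconstruction. A minimal sketch, assuming the lexicon can be loaded as (character, MC, OC, gloss) rows; the `Entry` layout and the `find_ambiguous` helper are hypothetical names, and the only data used is the 沈 pair above:

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    """One lexicon row (hypothetical layout, not the project's schema)."""
    char: str
    mc: str
    oc: str
    gloss: str


def find_ambiguous(entries):
    """Map (char, mc) -> rows, kept only where the OC readings differ."""
    groups = defaultdict(list)
    for e in entries:
        groups[(e.char, e.mc)].append(e)
    return {
        key: rows
        for key, rows in groups.items()
        if len({r.oc for r in rows}) > 1
    }


entries = [
    Entry("沈", "drim", "*C.[d]r[ə]m", "sink [v.t.]"),
    Entry("沈", "drim", "*[d]r[ə]m", "sink [v.i.]"),
]
ambiguous = find_ambiguous(entries)
for (char, mc), rows in ambiguous.items():
    print(char, mc, [r.oc for r in rows])
```

The resulting dictionary is the worklist for manual review: every key is one MC reading that cannot be mapped back to a single OC form automatically.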

@thatbudakguy
Member Author

It's fascinating that these types of subtle changes nearly always seem to have a syntactic or semantic correlate (here, the transitivity of the verb, which took me a second to notice!).

Is it worth going through OCNR and pulling out all of these quasi-"minimal pairs" to see if we can come up with a rule? I ask because I assume the annotation process for everything we annotate (phonology included) is going to be "automated first, manual later", so if we do that process for POS first, we can then use the POS information to make "smarter" initial predictions for the phonology.

I'm actually not sure how transitivity is represented in CoNLL-U (maybe that's the dependency parse?), so both of these would really be VERB in the POS category anyway; I'm just thinking further about this. It would at least help for Middle Chinese polyphones where the POS (verb vs. noun) actually can disambiguate further.
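
The POS-first idea for polyphones could look roughly like a lookup keyed on character and UPOS tag, falling back to every candidate reading when the tag does not decide. This is only a sketch: the `READINGS` table and the `predict_mc` helper are hypothetical, and the 王 entries (noun vs. verb, in Baxter-style MC transcription) are illustrative values that should be checked against the project's own lexicon.

```python
# Hypothetical table: character -> {UPOS tag -> MC reading}.
# The entries are illustrative, not taken from the project's data.
READINGS = {
    "王": {"NOUN": "hjwang", "VERB": "hjwangH"},
}


def predict_mc(char, upos):
    """Return candidate MC readings for char, narrowed by UPOS when possible."""
    by_pos = READINGS.get(char, {})
    if upos in by_pos:
        return [by_pos[upos]]
    # The POS tag does not decide: keep all candidates for manual review.
    return sorted(set(by_pos.values()))


print(predict_mc("王", "NOUN"))
print(predict_mc("王", "PROPN"))
```

Returning a list rather than a single string keeps the "automated first, manual later" workflow visible: a one-element list is an automatic prediction, anything else is flagged for an annotator.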

@GDRom
Member

GDRom commented Jun 7, 2022

I should have spelled the difference between transitive vs. intransitive out fully; my apologies!

As to your question -- not sure if it's worth it? Maybe? We should discuss the whole process in more detail.

Overall, I'd be tempted to keep the "simple" LDM-model clear of those assumptions (I don't think our friend in the 6th century cared about the difference between intransitive and transitive, as both were read the same and meant [largely] the same to him). Otherwise, we might run into the issue of circular logic (as we'd take Baxter and Sagart's assumptions and build the entire model on them).

We could, however, include the transitive vs. intransitive distinction in a full-on OCNR model; I'm not sure where to put that in CoNLL-U either, though. UD-Kanbun does not make that distinction, I think; implicitly, the dependency parse would provide that information (a VERB followed by a dependent NOUN is transitive; a VERB without one is intransitive). As a consequence, we could test how much better the OCNR model does than the LDM-model (as in: does linguistics help us understand what's going on better than LDM's notes on 音義?)
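
The implicit transitivity signal could be read off a CoNLL-U parse by checking whether a VERB token has a dependent attached to it. A rough sketch in plain Python, with no parser library, assuming the treebank marks direct objects with UD's `obj` relation; the `transitive_verbs` helper is hypothetical, and the sample sentence and its analysis are made up by hand for illustration rather than taken from UD-Kanbun output:

```python
def transitive_verbs(conllu):
    """Return (form, is_transitive) for each VERB in one CoNLL-U sentence."""
    rows = [
        line.split("\t")
        for line in conllu.strip().splitlines()
        if line.strip() and not line.startswith("#")
    ]
    # Keep plain integer IDs; skip multiword ranges (1-2) and empty nodes (1.1).
    tokens = [r for r in rows if r[0].isdigit()]
    # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
    heads_with_obj = {r[6] for r in tokens if r[7] == "obj"}
    return [(r[1], r[0] in heads_with_obj) for r in tokens if r[3] == "VERB"]


# Made-up sentence, built with explicit tabs so the columns survive copy-paste.
sample = "\n".join(
    "\t".join(cols)
    for cols in [
        ["1", "王", "王", "NOUN", "_", "_", "2", "nsubj", "_", "_"],
        ["2", "沈", "沈", "VERB", "_", "_", "0", "root", "_", "_"],
        ["3", "璧", "璧", "NOUN", "_", "_", "2", "obj", "_", "_"],
    ]
)
print(transitive_verbs(sample))  # [('沈', True)]
```

A flag derived this way could feed the OCNR model only, leaving the LDM-model untouched, which keeps the two comparable.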
