Features for DETs and "another" #416
This is an error in GUM, right? I've always understood the English articles to be restricted to "a(n)" and "the", and that's how it is in EWT and the guidelines.
Cunningham's law strikes again! That possibility was why I tagged Amir, at least.
Well, if the guidelines say so, then we have to change either GUM or the guidelines... I'd prefer it to have a PronType because it's really just a fusion of the same "an" we tag as having that feature and the adjective "other". Since we tag and deprel it DT/det, and not amod, I would expect it is supposed to match the behavior of the "an" component, but if others see it differently, I'm willing to copy the EWT behavior.
Historically it is "an" + "other", but "another" as a whole functions differently. (For example, it can take "yet" as an …) While we're at it, I see GUM has …
OK, so Neg for "no", Tot for "both", and nothing for the rest? Maybe also Neg for "neither" and Dem for "yonder"?
Yeah, Dem for "yonder" in its det usage makes sense to me. (If we wanted to decouple the det function from UPOS, like we do for some other deprels, arguably "yonder" is an ADV and maybe we'd want to drop the PronType. But that would be a separate discussion; let's keep DET for now.) In principle there could be values that cover {"either", "neither"} and "another". It doesn't seem we have those at present (but see UniversalDependencies/docs#732), so I'm fine with Neg for "neither" and blank for "either" and "another". Tagging @dan-zeman in case he wants to weigh in.
I do like the idea of them having some kind of feature on them, so if there isn't currently an appropriate feature for "another", perhaps we could add one.
If you want to make that happen, I think the way would be to open an issue on the docs repo and include a table of all determiners with their proposed features (along the lines of https://universaldependencies.org/en/pos/PRON.html). But that will take some discussion; in the meantime we can just use the features we have.
I would use …
I had posted an issue which could be used for building a standard: UniversalDependencies/docs#971. Any thoughts on things such as …
Here's what we converged on in the other thread: https://universaldependencies.org/en/pos/DET.html. @AngledLuffa, PRs to implement this are welcome!
Thanks for documenting – added udver 2
@AngledLuffa, any interest in implementing this? It would be great to have for the UD 2.13 release (deadline Nov. 1).
You have no idea how much of a PITA it's been trying to get Ssurgeon to support empty nodes :/ but I'm almost to the point where simple edits to node features are possible, I think.
CoreNLP didn't support empty nodes at all in the graph objects used for SemanticGraph. Stanza couldn't read or write those nodes either; it just always discarded them. Both of those are now fixed. CoreNLP still can't read or write empty nodes, but I'm just skipping that for now...
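Empty nodes are what made this hard: in CoNLL-U the ID column holds an integer for an ordinary word, a range such as `1-2` for a multiword token, and a decimal such as `5.1` for an empty node. Below is a minimal Python sketch of a reader that keeps empty nodes instead of dropping them. It is not Stanza's or CoreNLP's code; `classify_id` and `read_nodes` are hypothetical names, and the sample sentence is made up.

```python
import re

# The three shapes an ID can take in the CoNLL-U ID column.
WORD_ID = re.compile(r"[1-9][0-9]*")               # ordinary word: 1, 2, ...
RANGE_ID = re.compile(r"[1-9][0-9]*-[1-9][0-9]*")  # multiword token: 1-2
EMPTY_ID = re.compile(r"[0-9]+\.[1-9][0-9]*")      # empty node: 5.1


def classify_id(token_id: str) -> str:
    """Return 'word', 'range' or 'empty' for a CoNLL-U ID value."""
    if WORD_ID.fullmatch(token_id):
        return "word"
    if RANGE_ID.fullmatch(token_id):
        return "range"
    if EMPTY_ID.fullmatch(token_id):
        return "empty"
    raise ValueError(f"not a CoNLL-U ID: {token_id!r}")


def read_nodes(conllu_text: str) -> list[tuple[str, str, str]]:
    """Return (kind, ID, FORM) for every token line, empty nodes included."""
    nodes = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # sentence boundary or comment line
        cols = line.split("\t")
        nodes.append((classify_id(cols[0]), cols[0], cols[1]))
    return nodes


# Made-up gapping example: the empty node 5.1 stands for the elided "likes".
sample = "\n".join(
    "\t".join([i, form] + ["_"] * 8)
    for i, form in [("1", "Sue"), ("2", "likes"), ("3", "coffee"), ("4", "and"),
                    ("5", "Bill"), ("5.1", "likes"), ("6", "tea")]
)
print(read_nodes(sample)[5])  # ('empty', '5.1', 'likes')
```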
I realized I should add these checks to my validation script and went ahead and added the features with some regex replacements.
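A check of this kind can be sketched in Python, assuming the values floated earlier in this thread (Neg for "no" and "neither", Tot for "both", Dem for "yonder", no PronType for "either" and "another") rather than the final table at https://universaldependencies.org/en/pos/DET.html, which may differ. Instead of regex replacements, this sketch parses the FEATS column and re-serializes it in the case-insensitive alphabetical order CoNLL-U requires; `PROPOSED_PRONTYPE` and `check_and_fix` are hypothetical names.

```python
# Hypothetical lemma -> PronType table built only from values proposed in this
# thread; None means "no PronType". The converged DET table may differ.
PROPOSED_PRONTYPE = {
    "no": "Neg",
    "neither": "Neg",
    "both": "Tot",
    "yonder": "Dem",
    "either": None,
    "another": None,
}


def parse_feats(feats: str) -> dict[str, str]:
    """Parse a FEATS value such as 'Definite=Ind|PronType=Art'."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats: dict[str, str]) -> str:
    """Serialize features sorted case-insensitively by name; '_' if empty."""
    if not feats:
        return "_"
    return "|".join(f"{k}={feats[k]}" for k in sorted(feats, key=str.lower))


def check_and_fix(line: str) -> tuple[str, bool]:
    """Return (possibly fixed line, whether it changed) for one CoNLL-U line."""
    cols = line.split("\t")
    if len(cols) != 10 or cols[3] != "DET":
        return line, False  # comments, non-DET tokens, etc. are left alone
    lemma = cols[2].lower()
    if lemma not in PROPOSED_PRONTYPE:
        return line, False
    feats = parse_feats(cols[5])
    expected = PROPOSED_PRONTYPE[lemma]
    if feats.get("PronType") == expected:
        return line, False  # already consistent with the table
    if expected is None:
        feats.pop("PronType", None)
    else:
        feats["PronType"] = expected
    cols[5] = format_feats(feats)
    return "\t".join(cols), True


line = "\t".join(["1", "Another", "another", "DET", "DT", "PronType=Art", "2", "det", "_", "_"])
fixed, changed = check_and_fix(line)
print(changed, fixed.split("\t")[5])  # True _
```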
…be PRON in some circumstances (related to #416)
LGTM, thanks. @amir-zeldes, something similar for GUM etc.? I'll take a look at the PUD and Pronouns datasets.
Yes, it's on my list to implement the feature proposal from the table before the upcoming release; not done yet, though.
In PUD, there are a few lines of …
A larger context looks like this: …
Is it still …
Similarly, should …
If that is relative, it should be PRON, not DET.
Yes, that's half as PDT/DET.
So these …
Yes
UniversalDependencies/UD_English-PUD#20: should the dependencies be …
obj is correct: "a producer that she admired" is a way of conveying "she admired the producer", only with "that" standing in for the producer and moved before "she".
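The same reasoning in annotation form: "that" fills the object slot of "admired", and the relative clause attaches to "producer". Below is a hand-built, simplified Python illustration (not the PUD annotation itself; FEATS are trimmed, and acl:relcl is the usual UD relation for the clause, assumed here). It also encodes the earlier point that a relative "that" should be PRON, not DET; `relative_that_problems` is a hypothetical helper.

```python
# Hand-built, simplified analysis of "a producer that she admired".
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
FRAGMENT = (
    "1\ta\ta\tDET\tDT\tDefinite=Ind|PronType=Art\t2\tdet\t_\t_\n"
    "2\tproducer\tproducer\tNOUN\tNN\tNumber=Sing\t0\troot\t_\t_\n"
    "3\tthat\tthat\tPRON\tWDT\tPronType=Rel\t5\tobj\t_\t_\n"
    "4\tshe\tshe\tPRON\tPRP\tCase=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs\t5\tnsubj\t_\t_\n"
    "5\tadmired\tadmire\tVERB\tVBD\tTense=Past|VerbForm=Fin\t2\tacl:relcl\t_\t_\n"
)


def parse(fragment: str) -> dict[int, dict[str, str]]:
    """Map each word ID to the columns this example uses."""
    words = {}
    for line in fragment.splitlines():
        cols = line.split("\t")
        words[int(cols[0])] = {"form": cols[1], "lemma": cols[2], "upos": cols[3],
                               "feats": cols[5], "head": cols[6], "deprel": cols[7]}
    return words


def relative_that_problems(words: dict[int, dict[str, str]]) -> list[str]:
    """Flag a relative 'that' (PronType=Rel) whose UPOS is not PRON."""
    problems = []
    for i, w in words.items():
        is_rel = "PronType=Rel" in w["feats"].split("|")
        if w["lemma"] == "that" and is_rel and w["upos"] != "PRON":
            problems.append(f"word {i}: relative 'that' should be PRON, not {w['upos']}")
    return problems


words = parse(FRAGMENT)
that = words[3]
governor = words[int(that["head"])]
print(that["deprel"], governor["form"])  # obj admired
print(relative_that_problems(words))     # []
```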
Great, thanks. Based on that, I merged the PR as is.
The Pronouns dataset doesn't have many errors: …
What about …
Yeah, …
The Pronouns change looks good, then?
Here's an implementation: UD_English-EWT/not-to-release/tools/neaten.py Lines 1126 to 1159 in 532631f
What about …
No, if an ADV has features, it would just be comparative or superlative, I think.
That's fair, but I'll just leave it for now.
Here's an update for PUD: …
@amir-zeldes, implemented in GUM yet?
I think so; I implemented the table. "Another" now has just …
Yes, the table at https://universaldependencies.org/en/pos/DET.html. @AngledLuffa, are we done with this issue?
Great, feel free to spot-check my work; it's all in the dev branch.
I think we're done, although it occurs to me that no one updated LinES. Perhaps I can do that with my script.
One thing I found when trying to script the changes to LinES is that they labeled non-English determiners as DET when part of a proper noun.
Different treebanks have different policies regarding the analysis of foreign expressions. Some try to analyze the syntax of the foreign phrase, so …
It depends on whether they decided to annotate foreign phrases following the foreign language's guidelines, which is legitimate in UD, but optional. But even then, foreign multiword names would be a gray zone, because they can also be considered English phrases that happen to be names.
I updated the …
In comparing EWT and GUM, there are two different standards for the word "another". In GUM, it has the feature PronType=Art, whereas in EWT it has no features. Personally, I would think additional features are generally valuable, hence posting this as an issue in EWT. @amir-zeldes
EWT example
GUM example
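One way to reproduce this comparison across treebanks is to count the FEATS values carried by the lemma "another" in each treebank's .conllu files. A minimal Python sketch; `feats_for_lemma` and `det_line` are hypothetical names, and the two one-token inputs are made up to mirror the description above, not taken from EWT or GUM.

```python
from collections import Counter


def feats_for_lemma(conllu_text: str, lemma: str) -> Counter:
    """Count FEATS values on word lines whose LEMMA equals `lemma`."""
    counts = Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Skip comments, blank lines, multiword ranges (1-2) and empty nodes (5.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[2] == lemma:
            counts[cols[5]] += 1
    return counts


def det_line(form: str, feats: str) -> str:
    """Build an illustrative 10-column line for a determiner attached to word 2."""
    return "\t".join(["1", form, form.lower(), "DET", "DT", feats, "2", "det", "_", "_"])


gum_like = det_line("Another", "PronType=Art")  # as described for GUM
ewt_like = det_line("Another", "_")             # as described for EWT
print(feats_for_lemma(gum_like, "another"))  # Counter({'PronType=Art': 1})
print(feats_for_lemma(ewt_like, "another"))  # Counter({'_': 1})
```

On real data, the same function would be called on the full text of each treebank file, read with UTF-8 encoding.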