StanfordParser ignores root dependencies #628
Comments
I think we have no documented policy for handling the dependency tree root yet.

MstParser (DKPro Core version) has the root dependency linking to itself:
"[ 3, 7]Dependency(null) D3,7 G3,7"

MateParser (DKPro Core version) appears not to mark the ROOT explicitly:
"[ 36, 44]DOBJ(OA) D36,44 G4,12"
-> There is no dependency for the root token 4,12 at all.

StanfordParser (DKPro Core version) does the same as MateParser (DKPro Core version):
"[ 0, 2]NSUBJ(nsubj) D0,2 G3,7"
-> There is no dependency for the root token 3,7 at all.

So I guess we have two ways of finding the root dependency:
Added other modules that handle dependency parses.
Conll2006Reader models the root as a dependency looping back to itself.
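As a minimal sketch of how a consumer could locate the root under both conventions seen so far (a self-looping dependency, or no dependency at all for the root token), the following uses plain Python data rather than the DKPro Core or UIMA API. The `Dep` type and the `find_root` function are hypothetical names introduced only for illustration; spans mirror the `D3,7 G3,7` offsets quoted above.

```python
from typing import List, NamedTuple, Optional, Tuple

Span = Tuple[int, int]  # (begin, end) character offsets of a token


class Dep(NamedTuple):
    """Hypothetical stand-in for a Dependency annotation."""
    label: str
    dependent: Span
    governor: Span


def find_root(deps: List[Dep]) -> Optional[Span]:
    """Return the span of the root token of one sentence, or None."""
    # Convention 1: the root is marked by a dependency that loops to itself.
    for d in deps:
        if d.dependent == d.governor:
            return d.dependent
    # Convention 2: the root has no dependency of its own, so it is the
    # token that occurs as a governor but never as a dependent.
    dependents = {d.dependent for d in deps}
    candidates = {d.governor for d in deps} - dependents
    return next(iter(candidates)) if len(candidates) == 1 else None
```

With the self-looping convention the lookup is a single scan; with the implicit convention the consumer has to reconstruct the root from the other dependencies, and gets nothing for a one-token sentence.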
So again, a summary of the different components and how they behave for root nodes when adding a Dependency annotation to the CAS. In most of the following English examples
Legend:
The CONLL writers currently expect a self-looping relation labelled ROOT and render it as head index 0. That is also how the CONLL readers represent the data in the CAS, so a round-trip works here and we have unit tests for this in place. However, this is inconsistent with the output produced by the dependency parsers integrated in DKPro Core. I tend towards changing the parser output to mark roots as self-looping. I am not 100% sure whether the parser output should also be changed so that these self-looping relations are forcibly labelled ROOT, independent of what the parser actually produces.
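To illustrate the writer behaviour described here, the sketch below renders a self-looping relation as head index 0 and every other relation as the 1-based index of its governor. It is not the actual DKPro Core writer code: `conll_rows`, the tuple layout, and the example word forms are assumptions made for this illustration.

```python
def conll_rows(tokens, deps):
    """Build simplified CoNLL-like rows (id, form, head, label).

    tokens: list of (begin, end, form) in sentence order.
    deps:   list of (label, dependent_span, governor_span).
    """
    # 1-based token index per span, as used in the CoNLL ID/HEAD columns.
    index = {(b, e): i for i, (b, e, _) in enumerate(tokens, start=1)}
    by_dependent = {dep: (label, gov) for label, dep, gov in deps}

    rows = []
    for b, e, form in tokens:
        label, gov = by_dependent.get((b, e), (None, None))
        if gov is None:
            head = None          # token has no dependency at all
        elif gov == (b, e):
            head = 0             # self-loop marks the root
        else:
            head = index[gov]
        rows.append((index[(b, e)], form, head, label))
    return rows
```

For a parser that emits no dependency for the root token, the root row would come out with `head = None` here, which is the inconsistency with the parser output mentioned above.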
I agree that the best route would be to make the top node self-looping. A good solution for compatibility with CONLL CAS consumers would be to add two parameters, PARAM_ROOT_INDEX and PARAM_ROOT_LABEL: the first to choose between 0 and the index of the top token as the root node index, the second to choose either 'ROOT' or 'Sentence' as the label of the root element, since both labels are commonly used in dependency tagsets. If neither parameter is set, each parser could continue to behave as it does now, to preserve backward compatibility with existing pipelines. I don't know, though, whether it would be better to make all of them consistent by default (with only a PARAM_ROOT_LABEL to set) and offer an optional parameter for the "old-style" behaviour. This might be the better solution. What should be avoided at all costs is setting no dependency annotation at all for the ROOT element, since this node is the central one in the tree, or graph, and should be visible to consumers and to downstream modules. Once a decision is reached, making all the annotation processes consistent does not seem too complicated, since the parsers themselves construct the required information internally.
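A rough sketch of the "consistent by default" variant, assuming a post-processing step over plain tuples rather than any real parser wrapper: if a sentence has dependencies but no self-looping one, a self-loop is added for each token that governs others without being governed. The function name and the `root_label` argument (standing in for the proposed PARAM_ROOT_LABEL) are hypothetical.

```python
def add_self_looping_root(deps, root_label="ROOT"):
    """Return deps with a self-looping root dependency added if missing.

    deps: list of (label, dependent_span, governor_span) for one sentence.
    root_label: label for the added relation, e.g. "ROOT" or "Sentence".
    """
    # Nothing to do for an empty sentence or if a self-loop already exists.
    if not deps or any(dep == gov for _, dep, gov in deps):
        return list(deps)
    # Root candidates govern other tokens but are never dependents themselves.
    dependents = {dep for _, dep, _ in deps}
    roots = {gov for _, _, gov in deps} - dependents
    return list(deps) + [(root_label, r, r) for r in sorted(roots)]
```

For example, `add_self_looping_root([("nsubj", (0, 2), (3, 7))], root_label="Sentence")` appends `("Sentence", (3, 7), (3, 7))` to the list.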
- Introduce a new dependency UIMA type ROOT and enforce that it is used for root dependencies.
- Enforce that root dependencies loop to themselves and that each sentence has a root dependency (if it has any dependencies at all)
Introduced a new ROOT dependency type that is used to mark the root dependency. Also enforced that all ROOT dependencies loop to themselves. The suggested parameters for the CONLL consumers were not introduced. If these are needed, please open a separate issue.
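The two rules that were enforced can be expressed as a small per-sentence check. This is only a sketch under simplifying assumptions: the ROOT type is modelled as the label string "ROOT" on plain tuples, and `check_root_policy` is a hypothetical helper, not part of DKPro Core.

```python
def check_root_policy(deps):
    """Return a list of policy violations for one sentence (empty if none).

    deps: list of (label, dependent_span, governor_span).
    """
    problems = []
    # Rule 1: a sentence with any dependencies must have a ROOT dependency.
    if deps and not any(label == "ROOT" for label, _, _ in deps):
        problems.append("sentence has dependencies but no ROOT dependency")
    # Rule 2: every ROOT dependency must loop back to its own token.
    for label, dep, gov in deps:
        if label == "ROOT" and dep != gov:
            problems.append(f"ROOT dependency at {dep} does not loop to itself")
    return problems
```

A check like this could run in a unit test over each component's output to confirm that readers and parsers follow the same convention.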
MstParser has the root dependency linking to itself, e.g.:
"[ 3, 7]Dependency(null) D3,7 G3,7",
StanfordParser indeed ignores the root dependency - I concur that is a bad decision and should be fixed!
See also: https://groups.google.com/d/topic/dkpro-core-user/1EID_QeP2P8/discussion