Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StanfordParser ignores root dependencies #628

Closed
reckart opened this issue May 18, 2015 · 7 comments
Closed

StanfordParser ignores root dependencies #628

reckart opened this issue May 18, 2015 · 7 comments

Comments

@reckart
Copy link
Member

reckart commented May 18, 2015

MstParser has the root dependency linking to itself, e.g.:

"[ 3, 7]Dependency(null) D3,7 G3,7",

StanfordParser indeed ignores the root dependency - I concur that is a bad decision and should be fixed!

See also: https://groups.google.com/d/topic/dkpro-core-user/1EID_QeP2P8/discussion

@reckart reckart added this to the 1.7.1 milestone May 18, 2015
@reckart reckart added 🐛Bug Something isn't working Module-stanfordnlp labels May 18, 2015
@reckart
Copy link
Member Author

reckart commented May 18, 2015

I think we have no documented policy about handling the dependency tree root yet.

MstParser (DKPro Core version) has the root dependency linking to itself.

"[ 3, 7]Dependency(null) D3,7 G3,7",

MateParser (DKPro Core version) appears to not explicitly mark the ROOT:

"[ 36, 44]DOBJ(OA) D36,44 G4,12",

-> There is no dependency for the root token 4,12 at all.

StanfordParser (DKPro Core version) does the same as MateParser (DKPro Core version):

"[ 0, 2]NSUBJ(nsubj) D0,2 G3,7",

-> There is no dependency for the root token 3,7 at all.

So I guess we have two ways of finding the root dependency:

  1. check for self-linking
  2. check for tokens that are not the dependent in any dependency relation

@reckart
Copy link
Member Author

reckart commented May 18, 2015

Added other modules that handle dependency parses.

@reckart
Copy link
Member Author

reckart commented May 18, 2015

Conll2006Reader models the root as a dependency looping back to itself.

@reckart
Copy link
Member Author

reckart commented May 18, 2015

So again a summary of different components and how then behave for the root nodes when adding a Dependency annotation the CAS. In most of the following English examples [3,7](need) is the root token.

Component Strategy  Example from unit test
Conll2006Reader self-loop  not available
Conll2009Reader self-loop  not available
ClearNlpDependencyParser dangling root [ 0, 2]Dependency(nsubj) D[0,2](We) G[3,7](need)
MaltParser dangling root [ 0, 2]Dependency(nsubj) D[0,2](We) G[3,7](need)
MateParser dangling root [ 0, 2]Dependency(SBJ) D[0,2](We) G[3,7](need)
MstParser self-loop [ 3, 7]Dependency(null) D3,7 G3,7
StanfordParser dangling root [ 0, 2]NSUBJ(nsubj) D[0,2](We) G[3,7](need)
TcfReader self-loop [ 6, 12]Dependency(ROOT) D[6,12](fliegt) G[6,12](fliegt)

Legend:

  • self-loop - roots are marked by a dependency relation in which they are dependent as well as governor
  • dangling root - roots are tokens that are never in the "dependent" role of any dependency relation

@reckart
Copy link
Member Author

reckart commented May 18, 2015

The CONLL writers currently expect a self-looping relation labelled ROOT and will render that as a 0 index. That is how the CONLL readers render the data into the CAS. Consequently, a round-trip works here and we have unit tests for this in place. However, this is inconsistent with the output produced by the dependency parsers integrated in DKPro Core.

I tend towards changing the parser output to mark roots as self-looping.

I am not 100% sure if the parser output should be changed such that these self-looping relations are forcible labelled as ROOT independent of what the parser actually produces.

@cayorodriguez
Copy link

I agree that the best route would be to make the top node self-looping. A good solution for compatibility with CONLL CAS Consummers would be to set two parameters: PARAM_ROOT_INDEX and PARAM_ROOT_LABEL, respectively, for using 0 as the root node index (or alternatively the index of the top token), and specifying either a 'ROOT' or 'Sentence' label for the root element, since both labels are commonly used in dependency tagsets.

If both parameters are not set, the parsers could continue to behave as each is behaving now, to preserve backward compatibility with whatever pipelines are being used with them now.

I don't know, though, if it would be better to make all of them consistent as default (with only a PARAM_ROOT_LABEL to set), and specify a non-mandatory parameter for "old-style" behaviour. This might be the better solution.

What should be avoided at all costs is to not set at all a dependency annotation for the ROOT element, since this node is the central one in the tree, or graph, and should be visible to Consummers and to downstream modules.

Once a decision is reached, it doesn't seem to be too complicated to make all the annotation processes consistent, since the parsers themselves construct the required information internally.

@reckart reckart modified the milestones: 1.7.1, 1.8.0 Dec 14, 2015
reckart added a commit that referenced this issue Dec 29, 2015
- Introduce a new dependency UIMA type ROOT and enforce that it is used for root dependencies.
- Enforce that root dependencies loop to themselves and that each sentence has a root dependency (if it has any dependencies at all)
reckart added a commit that referenced this issue Dec 29, 2015
- Introduce a new dependency UIMA type ROOT and enforce that it is used for root dependencies.
- Enforce that root dependencies loop to themselves and that each sentence has a root dependency (if it has any dependencies at all)
@reckart reckart self-assigned this Dec 29, 2015
@reckart
Copy link
Member Author

reckart commented Dec 29, 2015

Introduced a new ROOT dependency type that is being used to mark the root dependency. Also enforce that all ROOT dependencies loop to themselves. The suggested parameters for the CONLL consumers were not introduced. If these are needed, please open a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants