Use original (or at least consistent) outcome labels in id2outcome.txt #252

Closed
logological opened this issue Jul 13, 2015 · 1 comment

@logological
Member

Currently id2outcome.txt uses numeric IDs for the classification outcomes, but (at least for cross-validation experiments) these IDs are not consistent from file to file. For example, BrownPosDemo does a two-fold cross-validation. One of the id2outcome.txt files uses the following mapping of numeric IDs to the original labels:

0=NPg 2=JJ 1=(null) 3=RB 5=TO 4=PPS 6=RP 7=NP 8=NN 10=VBN 9=VB 11=pct 12=PPO 13=BE 14=MD 15=DTS 16=VBZ 17=AT 18=IN 19=CS 20=VBG 21=VBD 22=BEDZ 23=NNS 24=CC 25=CD 26=AP 27=PPg

The other id2outcome.txt file uses a slightly different mapping:

0=NPg 2=(null) 1=JJ 3=RB 5=PPS 4=TO 6=RP 7=NP 8=NN 10=VB 9=VBN 11=pct 12=PPO 13=BE 14=MD 15=DTS 16=VBZ 17=AT 18=IN 19=CS 20=VBG 21=VBD 22=BEDZ 23=NNS 24=CC 25=CD 26=AP 27=PPg

In order to get the raw classifications, not only do I need to combine the two files, but I first have to manually un-map all the numeric IDs to their original labels.
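The manual workaround described above amounts to translating each fold with its own mapping before concatenating. A minimal sketch, assuming each fold's mapping is available as a space-separated `id=label` string like the ones quoted above, and that predictions can be read as hypothetical `(instance_id, gold_id, predicted_id)` tuples (the real id2outcome.txt layout may differ). The function names are illustrative only, not part of any project API.

```python
def parse_label_mapping(line):
    """Parse a space-separated 'id=label' string such as '0=NPg 2=JJ 1=(null)'."""
    mapping = {}
    for pair in line.split():
        num, label = pair.split("=", 1)  # split on the first '=' only
        mapping[int(num)] = label
    return mapping


def unmap(records, mapping):
    """Replace numeric gold/predicted IDs with their original labels.

    `records` is a hypothetical list of (instance_id, gold_id, predicted_id)
    tuples; this layout is an assumption, not the actual file format.
    """
    return [(inst, mapping[gold], mapping[pred]) for inst, gold, pred in records]


def combine_folds(folds):
    """Un-map every fold with its own mapping, then concatenate the results.

    `folds` is a list of (records, mapping_line) pairs, one per fold.
    """
    combined = []
    for records, mapping_line in folds:
        combined.extend(unmap(records, parse_label_mapping(mapping_line)))
    return combined
```

Because the two folds assign IDs 1 and 2 differently, the same numeric pair can mean opposite things: `(2, 1)` is JJ/(null) in the first fold, while `(1, 2)` is JJ/(null) in the second. Un-mapping per fold before merging is what makes the combined output correct.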

It would be better if id2outcome.txt didn't use numeric IDs at all, but rather used the original labels. If for some reason the mapping to numeric IDs is necessary, it would be helpful if the mapping were consistent across files.
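One way to get a consistent mapping is to collect the labels seen in every fold and number them in a fixed order. The sketch below is only an illustration of that idea, assuming each fold's mapping is a `dict` from numeric ID to label; the function names are hypothetical, and this is not necessarily how any harmonized output file is produced by the project itself.

```python
def harmonize(mappings):
    """Build one label -> ID mapping shared by all folds.

    Labels are sorted so the numbering is deterministic, independent of the
    order in which each fold happened to encounter them.
    """
    labels = sorted({label for m in mappings for label in m.values()})
    return {label: i for i, label in enumerate(labels)}


def to_harmonized(fold_id, fold_mapping, harmonized):
    """Translate a fold-local numeric ID into the shared numeric ID."""
    return harmonized[fold_mapping[fold_id]]
```

With this scheme, fold-local ID 1 in the first mapping and fold-local ID 2 in the second (both `(null)`) end up with the same shared ID.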

@daxenberger
Member

InnerBatchUsingTCEvaluationReport now generates a file "id2harmonizedOutcome.txt" in the crossvalidation context folder.
