Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coercing column type in get_characters() is broken for polymorphic and uncertain states #186

Closed
hlapp opened this issue Oct 26, 2018 · 0 comments
Labels

Comments

@hlapp
Copy link
Contributor

hlapp commented Oct 26, 2018

The assignment of class for each column in get_characters() based on the data type assigned in the NeXML schema is broken. The result is that polymorphic and uncertain states in a StandardCells (discrete or continuous traits) matrix will be lost (they become NA as a result of the coercion).

The reason this has remained largely obscured is because the current implementation of the coercion has a bug that accidentally limited the coercion to only one of the trait columns. In the tests that have been run with data including polymorphic states that column happened not to be one with polymorphic states. (The problem surfaced when tested with a matrix that contained only one trait column, with a polymorphic state.)

Arguably, the coercion of data type for a column from the character default to a numeric type is useful, and arguably easier done by this library than left to the user. So in the interest of user-friendliness, the type coercion shouldn't just be removed altogether. However, the algorithm needs to change to coerce type to a numeric type only if there are no polymorphic or uncertain states in a column.

@hlapp hlapp added the bug label Oct 26, 2018
hlapp added a commit that referenced this issue Oct 26, 2018
The algorithm implemented here keeps the previous mechanism for column
type coercion, however it is now only applied for characters that don't
have a polymorpic or uncertain state.

Accomplishing this without depending on correctly second-guessing the
symbol requires knowing for each state whether it is polymorphic,
uncertain, or "standard". To not have do this work again if the user
wants to have this knowledge as well, we return this on demand in the
form of a second matrix whose columns are factors with levels `standard`,
`polymorphic`, and `uncertain`. The matrix will have `NA` where the
state is `NA`.

Fixes #186. Closes #175.
cboettig pushed a commit that referenced this issue Oct 26, 2018
…tes (#188)

* Fixes type coercion for morphological characters with polymorphic states

The algorithm implemented here keeps the previous mechanism for column
type coercion, however it is now only applied for characters that don't
have a polymorpic or uncertain state.

Accomplishing this without depending on correctly second-guessing the
symbol requires knowing for each state whether it is polymorphic,
uncertain, or "standard". To not have do this work again if the user
wants to have this knowledge as well, we return this on demand in the
form of a second matrix whose columns are factors with levels `standard`,
`polymorphic`, and `uncertain`. The matrix will have `NA` where the
state is `NA`.

Fixes #186. Closes #175.

* Adds tests for obtaining state type matrix from get_characters

* Regenerates man page for get_characters()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant