MONK's problems categorical features are wrongly represented as continuous/Integer #87

Open
phoeinx opened this issue Nov 24, 2024 · 1 comment


phoeinx commented Nov 24, 2024

Describe the bug
According to the paper that introduced this synthetic dataset, all features of the "MONK's problems" are categorical.
In the UCI ML repository, they are all recorded as Integer. Models trained on them may be put at a disadvantage, because integer codes suggest an ordering and a distance between values that categorical features do not have.
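
A minimal sketch of why this can matter, assuming a distance-based model: with integer codes, value 1 looks closer to 2 than to 3, whereas a one-hot encoding treats all distinct categories as equally far apart. The helper names `one_hot` and `euclidean` and the three-valued feature are illustrative assumptions, not code from this repository.

```python
import math


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 indicator vector."""
    return [1.0 if value == c else 0.0 for c in categories]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical three-valued categorical feature coded as 1, 2, 3.
categories = [1, 2, 3]

# Integer representation: the codes impose an artificial ordering.
int_d12 = euclidean([1.0], [2.0])
int_d13 = euclidean([1.0], [3.0])

# One-hot representation: every pair of distinct values is equidistant.
oh_d12 = euclidean(one_hot(1, categories), one_hot(2, categories))
oh_d13 = euclidean(one_hot(1, categories), one_hot(3, categories))

print("integer codes:", int_d12, int_d13)
print("one-hot codes:", oh_d12, oh_d13)
```

Whether this affects a given model depends on the model: tree-based learners can often split around integer codes, while linear and distance-based models are more sensitive to the implied ordering.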

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://archive.ics.uci.edu/dataset/70/monk+s+problems and look at the variables table, which lists Integer as the type for all features.
  2. Open the MONK's problems competition paper, freely available here: https://www.researchgate.net/publication/2293492_The_MONK's_Problems_A_Performance_Comparison_of_Different_Learning_Algorithms
  3. Look at page 2, section 1.1, "The problem", to see that all features are actually categorical.

Expected behavior
Features a1, a2, a3, a4, a5, and a6 should be represented as categorical.
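
Until the metadata is corrected, a possible workaround on the user side is to cast the columns after loading. This is only a sketch: the small DataFrame below is made-up stand-in data, not rows from the dataset, and it assumes the features arrive as a pandas DataFrame with integer columns named a1 to a6.

```python
import pandas as pd

# Made-up stand-in rows; NOT actual MONK's data.
features = pd.DataFrame(
    {
        "a1": [1, 2, 3],
        "a2": [3, 1, 2],
        "a3": [1, 2, 1],
        "a4": [2, 3, 1],
        "a5": [4, 1, 3],
        "a6": [2, 1, 2],
    }
)

feature_names = ["a1", "a2", "a3", "a4", "a5", "a6"]

# Cast every feature column from integer to pandas' categorical dtype.
categorical = features[feature_names].astype("category")

# Optionally expand to indicator columns for models that need numeric input.
encoded = pd.get_dummies(categorical)

print(categorical.dtypes)
print(encoded.shape)
```

Note that `astype("category")` infers the categories from the values present, so on a subset of the data a column may end up with fewer categories than the feature actually has.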

Thank you!

Member

andrew-wang0 commented Dec 5, 2024

Hi @phoeinx, thanks for the feedback. I'll defer to our librarians @markellekelly and @rlongjohn to confirm and adjust the dataset if necessary.
