Update metadata.json for duration of student_placements dataset #503

Closed
amontanez24 opened this issue Jul 9, 2021 · 0 comments · Fixed by #529
Labels
bug Something isn't working

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDV version: Any
  • Python version: 3.x
  • Operating System: Any

Error Description

When the load_tabular_demo function is used to load the student_placements dataset, the duration column comes back with incorrect types. On further investigation, I found that the metadata.json file for that dataset incorrectly annotates the duration column as a categorical type. The relevant snippet is below:

{
    "tables": {
        "student_placements": {
            "fields": {
                "start_date": {
                    "type": "datetime",
                    "format": "%Y-%m-%d"
                },
                "end_date": {
                    "type": "datetime",
                    "format": "%Y-%m-%d"
                },
                "salary": {
                    "type": "numerical",
                    "subtype": "integer"
                },
                "duration": {
                    "type": "categorical"
                },

As a result, most values in that column are loaded as strings, as shown in the "Steps to reproduce" section, and certain operations in the demo do not work correctly.
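
To illustrate the kind of operation that breaks, here is a minimal sketch using a small hand-made column. The values are hypothetical stand-ins, not taken from the real dataset; they only mimic the mix of str and float entries reported in this issue. The helper try_add_one is likewise made up for the illustration.

```python
import pandas as pd

# Hypothetical values standing in for the real 'duration' column:
# strings where a value is present, float NaN where it is missing.
duration = pd.Series(
    ['3.0', '6.0', float('nan'), '12.0'], dtype=object, name='duration'
)


def try_add_one(series):
    """Return series + 1, or the TypeError message if the addition fails."""
    try:
        return series + 1
    except TypeError as error:
        return f'TypeError: {error}'


# Arithmetic on string values fails instead of producing numbers.
result = try_add_one(duration)
print(result)
```

With string values in the column, the addition raises a TypeError rather than returning a numeric result.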

Solution

The metadata.json file stored on S3 should be updated so that the duration column is labeled with the correct type.
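
Until the metadata is fixed, one possible client-side workaround is to convert the column after loading. This is only a sketch, assuming the strings in the column hold numeric values; the sample data below is hypothetical and does not come from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the loaded 'duration' column (assumption:
# the strings are numeric, and missing entries are float NaN).
duration = pd.Series(
    ['3.0', '6.0', float('nan'), '12.0'], dtype=object, name='duration'
)

# Convert to a numeric dtype; anything unparseable becomes NaN.
fixed = pd.to_numeric(duration, errors='coerce')

print(fixed.dtype)
print(fixed.mean())
```

Using errors='coerce' keeps the conversion from failing on unexpected values, at the cost of silently turning them into NaN, so it is worth checking the number of missing values before and after.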

Steps to reproduce

>>> from sdv.demo import load_tabular_demo
>>> student_placements = load_tabular_demo('student_placements')
>>> x = student_placements['duration'].apply(type)
>>> x
0        <class 'str'>
1        <class 'str'>
2        <class 'str'>
3      <class 'float'>
4        <class 'str'>
            ...
210      <class 'str'>
211      <class 'str'>
212      <class 'str'>
213      <class 'str'>
214    <class 'float'>
Name: duration, Length: 215, dtype: object
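
A more compact way to see the same problem is to count the Python types in the column instead of listing them row by row. The sketch below uses a hypothetical helper, type_counts, and made-up sample values rather than the real dataset.

```python
import pandas as pd


def type_counts(series):
    """Count how many values of each Python type a column holds."""
    return series.map(lambda value: type(value).__name__).value_counts()


# Hypothetical stand-in for the loaded 'duration' column.
duration = pd.Series(
    ['3.0', '6.0', float('nan'), '12.0'], dtype=object, name='duration'
)

counts = type_counts(duration)
print(counts)
```

A column that loaded correctly as numeric would show a single numeric type here; a mix of str and float is a sign that the values were read as strings with NaN marking the missing entries.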