Add detect_from_csvs and detect_from_dataframes methods to MultiTableMetadata #1520

Closed
amontanez24 opened this issue Jul 28, 2023 · 0 comments · Fixed by #1533
Labels: feature:metadata (related to describing the dataset), feature request (request for a new feature)


Problem Description

As a user, it would be convenient to detect the metadata for a whole dataset at once instead of one table at a time. This would speed up the process and also make it possible to detect relationships between tables.

Currently, MultiTableMetadata has the following detect methods: detect_table_from_csv and detect_table_from_dataframe.

These methods detect only one table at a time, which is inconvenient. It also makes it impossible to detect information that spans tables, such as relationships. For this reason, we want to add two new methods that detect an entire folder of csvs or a dictionary of dataframes at once.

Expected behavior

  • Add detect_from_csvs

    • parameters:
      • folder_name: full name of folder where the csvs are stored.
    • behavior: This method should loop through all the csvs in the folder and load each one the same way detect_table_from_csv does. Use the file name without its extension as the table name (i.e., if the file is adults.csv, the table name is adults).
  • Add detect_from_dataframes

    • parameters:
      • data: dictionary mapping table names to dataframes.
    • behavior: This method should loop through the dataframes in the dictionary and detect them one at a time similarly to detect_table_from_dataframe.
metadata.detect_from_csvs(
   folder_name='data/'
)

metadata.detect_from_dataframes(
   data={
       'guests': guests_table,
       'hotels': hotels_table
   }
)
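
The looping behavior described above could be sketched roughly as follows. This is only an illustration, not the merged implementation: it is written as standalone helper functions that take any metadata object, and it assumes that detect_table_from_csv accepts (table_name, filepath) and detect_table_from_dataframe accepts (table_name, data). The error raised for a missing folder and the alphabetical file ordering are also assumptions, not part of the request.

```python
from pathlib import Path


def detect_from_csvs(metadata, folder_name):
    """Detect one table per csv file found in ``folder_name``.

    Assumes ``metadata.detect_table_from_csv(table_name, filepath)`` exists.
    """
    folder = Path(folder_name)
    if not folder.is_dir():
        # Assumed behavior: fail loudly when the folder does not exist.
        raise ValueError(f"'{folder_name}' is not a valid folder.")

    # Sorted so tables are detected in a deterministic order.
    for csv_path in sorted(folder.glob('*.csv')):
        # adults.csv -> table name 'adults'
        metadata.detect_table_from_csv(csv_path.stem, str(csv_path))


def detect_from_dataframes(metadata, data):
    """Detect one table per entry of ``data`` (table name -> dataframe).

    Assumes ``metadata.detect_table_from_dataframe(table_name, data)`` exists.
    """
    for table_name, dataframe in data.items():
        metadata.detect_table_from_dataframe(table_name, dataframe)
```

Because both helpers only delegate to the existing per-table methods, any cross-table step such as relationship detection would have to run after the loop, once every table is known.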