Overwrite specified schema (selectively) #754
This was referenced May 17, 2024
lars-reimann added a commit that referenced this issue on Jan 12, 2025
Closes #875
Closes #877
Closes partially #977

### Summary of Changes

Stabilize the API of the `Table` class. This PR introduces several breaking changes to this class:

- All optional parameters are now keyword-only, so we can reposition them later.
- The `data` parameter of `__init__` is now required.
- Rename `remove_columns_except` to `select_columns`
  - The new method can also be called with a callback that determines which columns to select.
- Rename `add_table_as_columns` to `add_tables_as_columns`
  - Multiple tables can now be passed at once.
- Rename `add_table_as_rows` to `add_tables_as_rows`
  - Multiple tables can now be passed at once.

It also adds new functionality throughout the library:

- New method `Table.add_index_column` to add a new column with auto-incrementing integer values to a table.
- New method `Table.filter_rows` to keep only the rows matched by some predicate.
- New method `Table.filter_rows_by_column` to keep only the rows that have a value in a specific column that matches some predicate.
- New parameter `random_seed` for `Table.shuffle_rows` and `Table.split_rows` to control the pseudorandom number generator. Previously, the methods were deterministic, but the seed was hidden.
- New parameter `missing_value_ratio_threshold` of `Table.remove_columns_with_missing_values` to be able to keep columns with only a few missing values.
- Various static factory methods under `ColumnType` to instantiate column types. This prepares for #754.

Finally, the methods `Table.summarize_statistics` and `Column.summarize_statistics` are now considerably faster.

---------

Co-authored-by: megalinter-bot <[email protected]>
lars-reimann added a commit that referenced this issue on Jan 14, 2025
Is your feature request related to a problem?
Sometimes, the data type of a column is inferred incorrectly.
Desired solution
Allow overwriting the schema when constructing a table.
Possible alternatives (optional)
No response
Screenshots (optional)
No response
Additional Context (optional)
No response