add_row and add_rows should not raise if schemata differ #127

Closed
lars-reimann opened this issue Mar 30, 2023 · 3 comments · Fixed by #432
Labels
enhancement 💡 New feature or request released Included in a release

Comments

@lars-reimann
Member

lars-reimann commented Mar 30, 2023

Is your feature request related to a problem?

At the moment, the add_row and add_rows methods of Table raise an exception if the schemata of the current Table and the new Row differ:

if self._schema != row.schema:
    raise SchemaMismatchError()

This is a particular problem if the Table is empty: its schema has no columns, so it never matches the schema of a Row that has columns.

Desired solution

  • If the schema of the Table is empty (no columns), accept any Row and use the schema of the Row as the schema of the new Table.
  • Otherwise, verify that the list of column names is the same for the Table and the Row. If the types differ, don't fail; instead, widen them so that the resulting type includes all values in the Table as well as the values in the Row. For each column, the resulting type should be the lowest common supertype of the column's type in the Table and its type in the Row.

Example:

  • Schema of Table:
{
   "a": Integer()
}
  • Schema of Row:
{
   "a": RealNumber()
}
  • Schema of resulting Table:
{
   "a": RealNumber() # Common supertype of `Integer()` and `RealNumber()`
}
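A minimal sketch of the desired behavior, assuming schemata are modeled as plain `dict`s from column name to type name and the type hierarchy from this issue is encoded as a parent map. `merge_schemas`, `common_supertype`, `ancestors`, `PARENT` and the use of `ValueError` for mismatched column names are hypothetical choices for illustration, not the library's API. Nullability is left out here.

```python
# Hypothetical sketch: type names and hierarchy are taken from this issue,
# all function and variable names are illustrative only.

# Direct supertype of each type; "Anything" is the root, so it has no entry.
PARENT = {
    "RealNumber": "Anything",
    "Integer": "RealNumber",
    "Boolean": "Anything",
    "String": "Anything",
}


def ancestors(type_name: str) -> list[str]:
    """The type itself followed by all of its supertypes, closest first."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def common_supertype(a: str, b: str) -> str:
    # The first ancestor of `a` that is also an ancestor of `b` is the lowest
    # common supertype; the root "Anything" guarantees that one exists.
    b_ancestors = set(ancestors(b))
    return next(t for t in ancestors(a) if t in b_ancestors)


def merge_schemas(table_schema: dict[str, str], row_schema: dict[str, str]) -> dict[str, str]:
    # An empty Table (no columns) simply adopts the schema of the Row.
    if not table_schema:
        return dict(row_schema)
    # Otherwise the column names must match (assumed: same names, same order).
    if list(table_schema) != list(row_schema):
        raise ValueError("column names of Table and Row differ")
    # Types may differ: widen each column to the lowest common supertype.
    return {name: common_supertype(table_schema[name], row_schema[name]) for name in table_schema}


print(merge_schemas({"a": "Integer"}, {"a": "RealNumber"}))  # {'a': 'RealNumber'}
print(merge_schemas({}, {"a": "String"}))  # {'a': 'String'}
```

The first call reproduces the example above; unrelated types such as `Integer` and `Boolean` would widen all the way to `Anything`.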

Type hierarchy:

  • Anything
    • RealNumber
      • Integer
    • Boolean
    • String

Nullability propagates to the result, i.e. iff either of the types is nullable, the resulting type is nullable as well.
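To cover nullability as well, a column type can carry a nullable flag next to its name. This sketch assumes a hypothetical frozen dataclass `ColumnType(name, is_nullable)` and a hypothetical `widen` function; neither is the library's actual API. It walks up the type hierarchy to find the lowest common supertype and ORs the two flags, so the result is nullable iff either input is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnType:
    """Hypothetical column type: a name from the issue's hierarchy plus nullability."""

    name: str
    is_nullable: bool = False


# Direct supertype of each type; "Anything" is the root, so it has no entry.
PARENT = {
    "RealNumber": "Anything",
    "Integer": "RealNumber",
    "Boolean": "Anything",
    "String": "Anything",
}


def supertypes(name: str) -> list[str]:
    """The type itself followed by all of its supertypes, closest first."""
    chain = [name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def widen(a: ColumnType, b: ColumnType) -> ColumnType:
    # Lowest common supertype: the first supertype of `a` that is also one of `b`.
    # Every chain ends in "Anything", so `next` always finds a match.
    b_supertypes = set(supertypes(b.name))
    name = next(t for t in supertypes(a.name) if t in b_supertypes)
    # Nullability propagates: the result is nullable iff either input is.
    return ColumnType(name, a.is_nullable or b.is_nullable)


print(widen(ColumnType("Integer"), ColumnType("RealNumber", is_nullable=True)))
# ColumnType(name='RealNumber', is_nullable=True)
print(widen(ColumnType("Boolean"), ColumnType("String")))
# ColumnType(name='Anything', is_nullable=False)
```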

Possible alternatives (optional)

No response

Screenshots (optional)

No response

Additional Context (optional)

No response

lars-reimann added the enhancement 💡 New feature or request label Mar 30, 2023
github-project-automation bot moved this to Backlog in Library Mar 30, 2023
lars-reimann added a commit that referenced this issue Mar 30, 2023
### Summary of Changes

We no longer raise an exception when a user creates an empty `Table`
without specifying a schema. `Table([])` is now allowed.

`add_row` and `add_rows` must still be adjusted (#127) to allow adding
rows to such an empty `Table`.

---------

Co-authored-by: lars-reimann <[email protected]>
lars-reimann pushed a commit that referenced this issue Mar 31, 2023
## [0.8.0](v0.7.0...v0.8.0) (2023-03-31)

### Features

* create empty `Table` without schema ([#128](#128)) ([ddd3f59](ddd3f59)), closes [#127](#127)
* improve `ColumnType`s ([#132](#132)) ([1786a87](1786a87)), closes [#113](#113)
* infer schema of row if not passed explicitly ([#134](#134)) ([c5869bb](c5869bb)), closes [#15](#15)
* new method `is_fitted` to check whether a model is fitted ([#130](#130)) ([8e1c3ea](8e1c3ea))
* new method `is_fitted` to check whether a transformer is fitted ([#131](#131)) ([e20954f](e20954f))
* rename `drop_XY` methods of `Table` to `remove_XY` ([#122](#122)) ([98d76a4](98d76a4))
* rename `fit_transform` to `fit_and_transform` ([#119](#119)) ([76a7112](76a7112)), closes [#112](#112)
* rename `shuffle` to `shuffle_rows` ([#125](#125)) ([ea21928](ea21928))
* rename `slice` to `slice_rows` ([#126](#126)) ([20d21c2](20d21c2))
* rename `TableSchema` to `Schema` ([#133](#133)) ([1419d25](1419d25))
zzril moved this from Backlog to Todo in Library May 19, 2023
Marsmaennchen221 self-assigned this May 26, 2023
Marsmaennchen221 moved this from Todo to In Progress in Library May 26, 2023
alex-senger self-assigned this May 26, 2023
@alex-senger
Contributor

We'll add this functionality to the `from_rows` method too.

@Marsmaennchen221
Contributor

Waiting for #322 for the updates to the `ColumnType`s.

Marsmaennchen221 moved this from In Progress to 🧱 Blocked in Library Jun 16, 2023
alex-senger added a commit that referenced this issue Jul 12, 2023
…ma conversion when creating a new table (#432)

Closes #404 
Closes #322  
Closes #127  
This pull request merges the issues #322 and #127.


### Summary of Changes

---------

Co-authored-by: megalinter-bot <[email protected]>
Co-authored-by: sibre28 <[email protected]>
Co-authored-by: Alexander <[email protected]>
github-project-automation bot moved this from 🧱 Blocked to ✔️ Done in Library Jul 12, 2023
lars-reimann pushed a commit that referenced this issue Jul 13, 2023
## [0.15.0](v0.14.0...v0.15.0) (2023-07-13)

### Features

* Add copy method for tables ([#405](#405)) ([72e87f0](72e87f0)), closes [#275](#275)
* add gaussian noise to image ([#430](#430)) ([925a505](925a505)), closes [#381](#381)
* add schema conversions when adding new rows to a table and schema conversion when creating a new table ([#432](#432)) ([6e9ff69](6e9ff69)), closes [#404](#404) [#322](#322) [#127](#127)
* add test for empty tables for the method `Table.sort_rows` ([#431](#431)) ([f94b768](f94b768)), closes [#402](#402)
* added color adjustment feature ([#409](#409)) ([2cbee36](2cbee36)), closes [#380](#380)
* added test_repr table tests ([#410](#410)) ([cb77790](cb77790)), closes [#349](#349)
* discretize table ([#327](#327)) ([5e3da8d](5e3da8d)), closes [#143](#143)
* Improve error handling of TaggedTable ([#450](#450)) ([c5da544](c5da544)), closes [#150](#150)
* Maintain tagging in methods inherited from `Table` class ([#332](#332)) ([bc73a6c](bc73a6c)), closes [#58](#58)
* new error class `OutOfBoundsError` ([#438](#438)) ([1f37e4a](1f37e4a)), closes [#262](#262)
* rename several `Table` methods for consistency ([#445](#445)) ([9954986](9954986)), closes [#439](#439)
* suggest similar columns if column gets accessed that doesn't exist ([#385](#385)) ([6a097a4](6a097a4)), closes [#203](#203)

### Bug Fixes

* added the missing ids in parameterized tests ([#412](#412)) ([dab6419](dab6419)), closes [#362](#362)
* don't warn if `Imputer` transforms column without missing values ([#448](#448)) ([f0cb6a5](f0cb6a5))
* Warnings raised by underlying seaborn and numpy libraries ([#425](#425)) ([c4143af](c4143af)), closes [#357](#357)
@lars-reimann
Member Author

🎉 This issue has been resolved in version 0.15.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

lars-reimann added the released Included in a release label Jul 13, 2023