Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Introduce Abstract Base Class Model and add fit and predict methods to Table #352

Closed
zzril opened this issue Jun 9, 2023 · 2 comments
Labels
cleanup 🧹 Refactorings and other tasks that improve the code wontfix This will not be worked on

Comments

@zzril
Copy link
Contributor

zzril commented Jun 9, 2023

Is your feature request related to a problem?

  • Inconsistency: For Transformers, we have both a transform method in the Transformer class as well as a transform_table in the Table class. For the (models' and transformers') fit methods, there is no such method inside the Table class.
  • Classifiers and Regressors share API design and conceptual functionality, but there is no suitable common super type that allows for uniform handling.

Desired solution

Introduce new class Model as parent class for Classifier and Regressor.

Introduce methods:

  • fit_model(Model) to Table class
  • fit_transformer(Transformer) to Table class
  • predict(Model) to Table class
    (Analogous to transform_table method in Table class.)

Override these appropriately in TaggedTable (see #58).

@zzril zzril added enhancement 💡 cleanup 🧹 Refactorings and other tasks that improve the code labels Jun 9, 2023
@github-project-automation github-project-automation bot moved this to Backlog in Library Jun 9, 2023
@zzril zzril changed the title refactor: Introduce Model ABC and allow for double dispatch refactor: Introduce Abstract Base Class Model and allow for double dispatch Jun 9, 2023
@zzril zzril changed the title refactor: Introduce Abstract Base Class Model and allow for double dispatch refactor: Introduce Abstract Base Class Model and add fit and predict methods to Table Jun 9, 2023
@lars-reimann
Copy link
Member

lars-reimann commented Jun 23, 2023

So far, this superclass is something I've deliberately avoided since I don't see much use in it.

  • There are different types of ML models, which require different training data. Classifiers and regressors are both supervised models, requiring labeled data (TaggedTable). Unsupervised models, for example, could work with a normal Table. Therefore, at best we could create a SupervisedModel class and make Classifier and Regressor subclasses of that.
  • Even then, we usually need to know whether we are dealing with a classifier or regressor, or we wouldn't be able to use any metrics.
  • Because of this, we would also need multiple methods on Table or its subclasses (e.g. fit_supervised_model on TaggedTable or fit_unsupervised_model on Table).
  • Unsupervised models also don't make predictions, so we also can't just add a predict method to Table but need multiple.
  • We might also want to train / make predictions with supervised models on other types of data like images, so those also need additional methods.

Overall, I think this proposal pollutes the data containers too much. Moreover, it introduces a cyclic dependency between the data package and the ml package, which can lead to numerous problems in Python. At the moment we only have a dependency from ml to data, which also matches the conceptual model I have of the domain (you need data to do ML but don't need ML to work with data).

Because of this, I'd strongly vote against

  • Model class
  • fit_model method on Table
  • predict method on Table

I also don't like fit_transformer on Table since it works mostly on the Transformer rather than the Table and, thus, belongs to the Transformer class. This is opposed to transform_table, which mostly works on the Table, so it makes sense to have it in the Table class. The transform methods on Transformer are just necessary to have different transformation methods (Strategy design pattern).

This leaves a potential SupervisedModel ABC, with Classifier and Regressor as subclasses. But I'd only add that once we have a use case for it.

@lars-reimann lars-reimann added the wontfix This will not be worked on label Jun 23, 2023
@lars-reimann
Copy link
Member

Superseded by #377.

@lars-reimann lars-reimann closed this as not planned Won't fix, can't repro, duplicate, stale Jun 23, 2023
@github-project-automation github-project-automation bot moved this from Backlog to ✔️ Done in Library Jun 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup 🧹 Refactorings and other tasks that improve the code wontfix This will not be worked on
Projects
Archived in project
Development

No branches or pull requests

2 participants