
Classification use-case #16

Open
wbecker opened this issue Jun 11, 2017 · 2 comments

Comments

@wbecker
Contributor

wbecker commented Jun 11, 2017

This project doesn't currently support predicting the type of an input, as there is no notion of which type an input value maps to.

Normally, using a classifier is a two-stage process:
1 - fit(X, y), using training input and output data
2 - predict(X), using unseen data and returning the estimated outputs

It would be good if this project presented a similar interface.
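The two-stage contract can be written down as a `typing.Protocol`. This is a minimal sketch: `Classifier` and `MajorityClassifier` are illustrative names, not part of wmd-relax, and the baseline simply predicts the most common training label so that the protocol has a concrete, runnable implementation.

```python
from collections import Counter
from typing import Any, Hashable, List, Protocol, runtime_checkable


@runtime_checkable
class Classifier(Protocol):
    """The fit/predict contract described above (illustrative)."""

    def fit(self, X: List[Any], y: List[Hashable]) -> "Classifier": ...

    def predict(self, X: List[Any]) -> List[Hashable]: ...


class MajorityClassifier:
    """Trivial baseline: always predicts the most common training label."""

    def fit(self, X: List[Any], y: List[Hashable]) -> "MajorityClassifier":
        # Stage 1: learn from labelled training data.
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: List[Any]) -> List[Hashable]:
        # Stage 2: estimate an output for each unseen input.
        return [self.label_ for _ in X]
```

Returning `self` from `fit` follows the scikit-learn convention, so calls can be chained as `MajorityClassifier().fit(X, y).predict(X_new)`.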

I would suggest creating a class, wmd_classifier, which implements these two methods.

fit, which would:

  • take in an array of documents and break them down into bows
  • create a WMD instance
  • cache centroids
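The data preparation in fit can be sketched in plain Python, assuming word embeddings are available as a dict from word to vector. `to_nbow` and `centroid` are hypothetical helper names, and the centroid is taken as the nBOW-weighted mean of the document's word vectors (the usual word-centroid definition). Caching these would presumably let predict shortlist candidates by cheap centroid distance before computing the full WMD.

```python
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]


def to_nbow(tokens: Sequence[str]) -> Dict[str, float]:
    """Normalised bag of words: each word's share of the document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def centroid(nbow: Dict[str, float], embeddings: Dict[str, Vector]) -> Vector:
    """nBOW-weighted mean of the word vectors.

    Words without an embedding are skipped and the remaining weights are
    renormalised so they still sum to one.
    """
    known = {w: weight for w, weight in nbow.items() if w in embeddings}
    if not known:
        raise ValueError("no word of the document has an embedding")
    dim = len(next(iter(embeddings.values())))
    norm = sum(known.values())
    result = [0.0] * dim
    for word, weight in known.items():
        for i, x in enumerate(embeddings[word]):
            result[i] += weight * x / norm
    return result
```

With toy embeddings `{"cat": [1.0, 0.0], "dog": [0.0, 1.0]}`, the document `["cat", "cat", "dog"]` gets weights 2/3 and 1/3, and its centroid is approximately `[2/3, 1/3]`.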

predict, which would:

  • take in a document
  • break it into a bow
  • calculate its centroid
  • call nearest_neighbours
  • calculate the output type, based on the k nearest neighbours, weighted by their closeness
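The last predict step, weighting the k nearest neighbours by their closeness, could look like the sketch below. It assumes the neighbour search yields `(training_index, distance)` pairs and uses inverse-distance weights, which is one common choice rather than something the issue specifies; `weighted_vote` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def weighted_vote(
    neighbours: Sequence[Tuple[int, float]],
    labels: Sequence[Hashable],
    eps: float = 1e-9,
) -> Hashable:
    """Return the label with the largest total inverse-distance weight.

    `neighbours` holds (training_index, distance) pairs for the k nearest
    training documents; `labels[i]` is the type of training document i.
    """
    if not neighbours:
        raise ValueError("need at least one neighbour")
    scores: Dict[Hashable, float] = defaultdict(float)
    for index, distance in neighbours:
        # Closer documents get larger weights; eps avoids division by zero
        # when the query is identical to a training document.
        scores[labels[index]] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)
```

For example, with `labels = ["sport", "politics", "sport", "tech"]` and neighbours `[(1, 0.2), (0, 0.5), (2, 0.6)]`, a plain majority vote would pick `"sport"`, but the much closer `"politics"` document scores 5.0 against roughly 3.67 and wins.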
@wbecker
Contributor Author

wbecker commented Jun 11, 2017

I'd be happy to contribute something like this!

@vmarkovtsev
Collaborator

vmarkovtsev commented Jun 13, 2017

@wbecker This is 👍
An sklearn-like interface would be really useful. Feel free to open a PR.

My only suggestion is to abstract the way a document is transformed into an nBOW, e.g. accept a transform function in __init__ and treat the documents as opaque "objects", with sensible defaults for spaCy documents and plain strings.
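One way to make the nBOW transform pluggable is sketched below. `WmdClassifier` here is only a hypothetical skeleton showing where the transform plugs in; predict is left out and would follow the steps proposed in the issue. The spaCy default is duck-typed on the assumption that a document iterates over tokens exposing `lower_` and `is_punct`, so spaCy itself is not imported.

```python
from collections import Counter
from typing import Any, Callable, Dict, Hashable, List, Optional

NBow = Dict[str, float]


def default_to_nbow(doc: Any) -> NBow:
    """Turn a plain string or a spaCy-like Doc into a normalised bag of words."""
    if isinstance(doc, str):
        tokens = doc.lower().split()
    else:
        # Duck-typed spaCy support: iterate over tokens, drop punctuation.
        tokens = [tok.lower_ for tok in doc if not tok.is_punct]
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("document has no tokens")
    return {word: n / total for word, n in counts.items()}


class WmdClassifier:
    """Hypothetical skeleton: documents are opaque objects mapped to nBOWs."""

    def __init__(self, to_nbow: Optional[Callable[[Any], NBow]] = None, k: int = 5):
        # Fall back to the string/spaCy default when no transform is given.
        self.to_nbow = to_nbow if to_nbow is not None else default_to_nbow
        self.k = k

    def fit(self, X: List[Any], y: List[Hashable]) -> "WmdClassifier":
        if len(X) != len(y):
            raise ValueError("X and y must have the same length")
        self.nbows_ = [self.to_nbow(doc) for doc in X]
        self.labels_ = list(y)
        return self
```

Callers with their own document type only pass a callable, e.g. `WmdClassifier(to_nbow=my_transform)`, and the rest of the classifier never needs to know what a document is.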

And let's name it WmdClassifier. I have just put the contribution guidelines up at https://github.com/src-d/wmd-relax/wiki/Contributions
