tfidf runtime enhancement changes #1571
cc @miguelgfierro Thanks!
@AdityaSoni19031997 approved!
Really good contribution, thanks!
Thanks Miguel! Looking forward to exploring bits and pieces of this repository!
cc @miguelgfierro I am just curious: why doesn't the repository use PyTorch?
Hehe, good question. Parts of the code in this repo are old. Microsoft Research used to use TF and is now moving to PyTorch. In future releases you will probably see more code in PyTorch, but we will keep supporting TF.
Haha, not a TF fan! I like PyTorch more than TF. Is there any branch where people are working on porting the snippets to torch? Thanks.
We are not planning to port the current code to PyTorch; future code will be developed in PyTorch.
Description
The minimal changes made to the `tf_idf_utils` file in this PR help reduce the overall runtime by avoiding looping over and slicing the pandas DataFrame. The PR helps resolve the issue raised in #1568.
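For readers unfamiliar with why looping and slicing is slow, here is a minimal sketch of the general pattern. It is not the code changed in this PR: the function names, the `text` column, and the lowercase/strip cleaning step are all hypothetical and chosen only for illustration.

```python
import pandas as pd


def clean_loop(df, col="text"):
    """Hypothetical slow version: indexes into the DataFrame on every row."""
    out = df.copy()
    for i in range(len(out)):
        # Each iteration does a positional read and a label-based write.
        out.loc[out.index[i], col] = out[col].iloc[i].lower().strip()
    return out


def clean_vectorized(df, col="text"):
    """Hypothetical faster version: one column-wise operation, no Python loop."""
    out = df.copy()
    out[col] = out[col].str.lower().str.strip()
    return out


# Illustrative data only.
df = pd.DataFrame({"text": ["  Hello World ", "TF-IDF Example  "]})
assert clean_loop(df).equals(clean_vectorized(df))
```

Both functions return the same result. The second avoids per-row indexing overhead, which is usually where the runtime goes on large frames.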
Checklist:
staging branch and not to main branch.
Looking forward to the feedback on the PR and any other changes needed.
Best,
Aditya.