Add sentence labeler #3570
I addressed your change request. Is there anything else you need me to change, or can this be merged?
Thanks a lot for adding this and sorry for taking so long to review! See the comments for suggested changes.
Generally, I think this is quite a useful helper function for all Flair dataset classes that have annotations labeled as character offsets.
Regarding the chunking/truncation, it would be nice in the future to have such functionality attached to the Corpus class, similar to the filter_long_sentences method but with truncation or chunking rather than filtering. This way, it could be used for any corpus.
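To make the chunking idea concrete, here is a minimal sketch in plain Python of splitting a text with character-offset annotations into fixed-size chunks and re-basing the offsets. It is not Flair's API and not the implementation from this PR: the function name `chunk_with_offsets`, the `Span` tuple layout, and the `max_chars` parameter are all hypothetical, and dropping annotations that cross a chunk boundary is an assumed policy chosen for simplicity.

```python
from typing import List, Tuple

# Hypothetical annotation format: (start_char, end_char, label), end exclusive.
Span = Tuple[int, int, str]


def chunk_with_offsets(
    text: str, spans: List[Span], max_chars: int
) -> List[Tuple[str, List[Span]]]:
    """Split `text` into windows of at most `max_chars` characters.

    Each annotation that lies fully inside a window is kept and its
    offsets are shifted to be relative to that window. Annotations that
    cross a window boundary are dropped (an assumed, simplistic policy).
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks: List[Tuple[str, List[Span]]] = []
    for start in range(0, len(text), max_chars):
        end = min(start + max_chars, len(text))
        # Keep only spans fully contained in [start, end) and re-base them.
        local = [
            (s - start, e - start, label)
            for s, e, label in spans
            if s >= start and e <= end
        ]
        chunks.append((text[start:end], local))
    return chunks


example_text = "Berlin is in Germany"
example_spans = [(0, 6, "LOC"), (13, 20, "LOC")]
example_chunks = chunk_with_offsets(example_text, example_spans, max_chars=10)
```

A Corpus-level version, as suggested above, would apply the same step to every sentence in every split instead of filtering long sentences out. A real implementation would more likely cut at token or sentence boundaries rather than at a fixed character count.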
This is a simplified version of the chunking utility provided in #3520.