feature/RowProcessor-replaceNewlinesWithSpaces #137
This looks good to me. I originally wrote this, and revisiting it now, I'm not convinced that the default behavior is correct. That is, we should not be altering the incoming data in `RowProcessor` before it is fed to the feature processors. In a future version we'll probably want to change that, but to avoid breaking changes this is probably the best interim solution. The constructor situation on `RowProcessor` is also in great need of harmonization, probably through consolidation into a builder pattern.
Thanks for putting in the work to put it in, and for bringing this code to our attention again.
Thanks. As Jack said, we probably want to revisit this as part of a larger refactor to make |
Description
Add an optional `boolean` flag, `RowProcessor.replaceNewlinesWithSpaces`, that controls whether newlines in values are replaced with spaces before the values are passed to field processors. The flag defaults to `true`, so the default behavior does not change.
Motivation
`TextFieldProcessor`s may wish to be aware of newlines in the input and encode them as features. An example is encoded as a unit test.
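To illustrate what the flag changes, here is a minimal standalone sketch. It is not Tribuo code: `NewlineFlagSketch`, `prepareRow`, and `countLines` are hypothetical names invented for this example, and it assumes only `'\n'` characters are replaced. A value passes through unchanged when the flag is `false`, so a downstream text processor can still see the line structure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the row-preparation step; NOT Tribuo's RowProcessor.
final class NewlineFlagSketch {
    // Mirrors the flag described in this PR; true matches the existing default behavior.
    private final boolean replaceNewlinesWithSpaces;

    NewlineFlagSketch(boolean replaceNewlinesWithSpaces) {
        this.replaceNewlinesWithSpaces = replaceNewlinesWithSpaces;
    }

    // Returns a copy of the row, optionally replacing each '\n' with a space
    // (assumption: only '\n' is handled) before field processors see the values.
    Map<String, String> prepareRow(Map<String, String> row) {
        Map<String, String> prepared = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : row.entrySet()) {
            String value = entry.getValue();
            if (replaceNewlinesWithSpaces) {
                value = value.replace('\n', ' ');
            }
            prepared.put(entry.getKey(), value);
        }
        return prepared;
    }

    // Toy "text field processor" feature: the number of lines in a value.
    static int countLines(String text) {
        return text.split("\n", -1).length;
    }
}
```

With the flag set to `true`, a two-line value collapses to a single line and a line-count feature like the toy one above can no longer distinguish it from single-line input. With `false`, the newline survives and can be encoded as a feature.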