Improve support for reading Delimited files #3424
Most of the functionality is there, but it needs testing, and some error handling is still missing.
A few small comments, but the basics look good, so I'm happy to get it in and then we can iterate.
@@ -0,0 +1,94 @@
from Standard.Base import all
It feels like this file should be in `Internal`, as it is not meant for end users.
operation. By default, a warning is issued, but the operation proceeds.
If set to `Report_Error`, the operation fails with a dataflow error.
If set to `Ignore`, the operation proceeds without errors or warnings.
read_file : Delimited -> File -> Problem_Behavior -> Any
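The three behaviors described in the doc comment above can be sketched roughly as follows. This is a minimal illustration in Python, not the Enso implementation from this PR: the names `ProblemBehavior`, `ParseProblem`, `read_delimited`, and the `REPORT_WARNING` member are hypothetical, the "problem" detected (a row with the wrong number of cells) is only an example, and a Python exception stands in for Enso's dataflow error.

```python
import warnings
from enum import Enum


class ProblemBehavior(Enum):
    # Hypothetical stand-in for the Problem_Behavior options in the doc comment.
    IGNORE = "ignore"
    REPORT_WARNING = "report_warning"  # assumed name for the default behavior
    REPORT_ERROR = "report_error"


class ParseProblem(Exception):
    """Stand-in for a dataflow error; Enso does not use exceptions for this."""


def read_delimited(text, delimiter=",", on_problems=ProblemBehavior.REPORT_WARNING):
    # Split non-empty lines into cells; the first row sets the expected width.
    rows = [line.split(delimiter) for line in text.splitlines() if line]
    if not rows:
        return []
    expected = len(rows[0])
    problems = [
        f"row {i} has {len(row)} cells, expected {expected}"
        for i, row in enumerate(rows)
        if len(row) != expected
    ]
    if problems:
        if on_problems is ProblemBehavior.REPORT_ERROR:
            # Fail the whole operation.
            raise ParseProblem("; ".join(problems))
        if on_problems is ProblemBehavior.REPORT_WARNING:
            # Report each problem, but still return the result.
            for problem in problems:
                warnings.warn(problem)
        # IGNORE: proceed silently.
    return rows
```

With the default setting the caller still gets the parsed rows and can inspect the warnings afterwards; only the error setting stops the read.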
Where is the `Delimited` type defined?
`read_file` feels like the wrong name - `read_delimited_file`?
I thought that you would call it through the module name, `Delimited_Reader.read_file`, so adding `delimited` is a bit redundant.
For example, if you have `StringParser`, you usually call `StringParser.parse` and not `StringParser.parseString` (at least that's how I'd approach it; naming conventions can be a bit subjective, I guess).
I can rename it as you suggest :)
The `Delimited` type is of course defined inside `File_Format`, as it is part of this ADT. Technically it could be moved, but if I move `Delimited_Reader` to `Internal`, the type shouldn't live there, because it is a public part of the API. I put it in `File_Format` because all the other formats are kept there.
Yes, it should be there. I just didn't see it being imported, so I was confused that `Delimited` worked as opposed to `File_Format.Delimited`.
`Delimited_Reader.enso` feels like the wrong name. I'd say just `Delimited.enso`.
The `read_file` inside it feels too general to me, hence my suggestion of `read_delimited`. I agree it is repetitive, but this is an internal function, and otherwise it feels too close to `File.read`. It would also follow the `read_text`, `read_bytes` pattern.
@@ -64,19 +64,18 @@ from_csv : File.File | Text -> Boolean -> Text -> Table ! Parse_Error
from_csv csv has_header=True prefix='C' =
Could this file and Delimited_Reader be merged?
I would avoid it - `Csv` is the old module, which should be deprecated and removed once `Delimited` is on par with its functionality.
Keeping them separate makes it easier to see what is going to be removed. We keep it only because our Table tests rely heavily on this mechanism and `Delimited` is not yet fully featured enough to replace it.
Fair enough - perhaps add a task to remove `Csv.enso`, then, so we don't forget.
std-bits/table/src/main/java/org/enso/table/read/DelimitedReader.java
They block me from merging and are a micro-fix unrelated to this PR. Probably better addressed in the Builtins refactor anyway.
Pull Request Description
Implements https://www.pivotaltracker.com/story/show/181823957