Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve support for reading Delimited files #3424

Merged
merged 13 commits into from
Apr 29, 2022

Conversation

radeusgd
Copy link
Member

@radeusgd radeusgd commented Apr 28, 2022

Pull Request Description

Implements https://www.pivotaltracker.com/story/show/181823957

Important Notes

Checklist

Please include the following checklist in your PR:

  • The documentation has been updated if necessary.
  • All code conforms to the Scala, Java, and Rust style guides.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed: Enso GUI was tested when built using BOTH ./run dist and ./run watch.

@radeusgd radeusgd self-assigned this Apr 28, 2022
@radeusgd radeusgd marked this pull request as ready for review April 28, 2022 21:26
@radeusgd radeusgd requested review from 4e6 and jdunkerley as code owners April 28, 2022 21:26
Copy link
Member

@jdunkerley jdunkerley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few little comments but basics look good so happy to get it in and then we can iterate

@@ -0,0 +1,94 @@
from Standard.Base import all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like this file should be in Internal as not for end user use

operation. By default, a warning is issued, but the operation proceeds.
If set to `Report_Error`, the operation fails with a dataflow error.
If set to `Ignore`, the operation proceeds without errors or warnings.
read_file : Delimited -> File -> Problem_Behavior -> Any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the Delimited type defined?
read_file feels the wrong name - read_delimited_file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that you call it through the module name: Delimited_Reader.read_file so adding delimited is a bit redundant.

Something like, if you have StringParser you usually do StringParser.parse and not StringParser.parseString (at least that's how I'd approach it, naming conventions can be a bit subjective I guess).

I can rename to as you suggest :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Delimited type is of course defined inside of the File_Format, as it is part of this ADT. It can be technically moved, but if I move Delimited_Reader to Internal, it shouldn't be there because it is a public part of the API. I put it in File_Format because all the other formats were kept there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should be there just didn't see it being imported, so was confused that Delimited worked as opposed to File_Format.Delimited.

Delimited_Reader.enso feels the wrong name. I'd say just Delimited.enso.
The read_file inside it feels to general to me hence suggesting read_delimited. Agree is repeating but this is an internal function and otherwise feels too close to File.read and also would follow the read_text, read_bytes pattern.

@@ -64,19 +64,18 @@ from_csv : File.File | Text -> Boolean -> Text -> Table ! Parse_Error
from_csv csv has_header=True prefix='C' =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this file and Delimited_Reader be merged?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid it - Csv is the old module that should be deprecated and removed once Delimited is on-par with its functionality.

So making them separate makes it easier to see what is going to be removed. We keep it only because our Table tests heavily rely on this mechanism and Delimited is not yet fully-featured to replace it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough - perhaps a task to remove Csv.enso then so we don't forget

@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Apr 29, 2022
They block me from merging, are a micro-fix not related to the PR.
Probably better addressed in the Builtins refactor anyway.
@mergify mergify bot merged commit 8219dca into develop Apr 29, 2022
@mergify mergify bot deleted the wip/radeusgd/file-read-delimited-181823957 branch April 29, 2022 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants