Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add appending support for Delimited files #3573

Merged
merged 15 commits into from
Jul 11, 2022

Conversation

radeusgd
Copy link
Member

@radeusgd radeusgd commented Jul 7, 2022

Pull Request Description

Implements https://www.pivotaltracker.com/story/show/182309839

Important Notes

Checklist

Please include the following checklist in your PR:

  • The documentation has been updated if necessary.
  • All code conforms to the
    Scala,
    Java,
    and
    Rust
    style guides.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed: Enso GUI was tested when built using BOTH
      ./run ide dist and ./run ide watch.

@radeusgd radeusgd self-assigned this Jul 7, 2022
@radeusgd radeusgd changed the base branch from develop to wip/radeusgd/error-catch-by-type July 8, 2022 15:02
@radeusgd radeusgd force-pushed the wip/radeusgd/delimited-append-182309839 branch from c980a79 to 4f8aca5 Compare July 8, 2022 15:02
@radeusgd radeusgd force-pushed the wip/radeusgd/error-catch-by-type branch from 3736803 to 5d3ae84 Compare July 8, 2022 16:22
@radeusgd radeusgd force-pushed the wip/radeusgd/delimited-append-182309839 branch from 00242dd to 9abbb56 Compare July 8, 2022 17:26
@radeusgd radeusgd marked this pull request as ready for review July 8, 2022 17:27
@radeusgd radeusgd requested review from 4e6 and jdunkerley as code owners July 8, 2022 17:27
@radeusgd radeusgd requested a review from hubertp July 8, 2022 17:29
Base automatically changed from wip/radeusgd/error-catch-by-type to develop July 11, 2022 08:26
@mergify mergify bot requested a review from JaroslavTulach as a code owner July 11, 2022 08:26
Copy link
Collaborator

@hubertp hubertp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits

## PRIVATE
Reads the beginning of the file to detect the existing headers and column
count.
detect_headers : File -> File_Format.Delimited -> Detected_Headers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It kinda feels like this should use a Maybe and be empty instead of using a separate new marker type No_Data

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I thought it makes sense to create a domain-specific ADT for all the cases here. But your solution also sounds good to me, so I can switch to it. I think I'd use a Nothing as we know it would either be a Nothing or one of the two other ADT cases - using Maybe seems like an overkill to me because I want to represent 3 states and not a presence or lack of two states.

Comment on lines +137 to +140
should_write_headers headers = case headers of
True -> True
Infer -> True
False -> False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
should_write_headers headers = case headers of
True -> True
Infer -> True
False -> False
should_write_headers headers = case headers of
Infer -> True
_ -> headers

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, well it can also be:

case headers of
    False -> False
    _ -> True

but I wanted to make it explicit which of the three options map to which. IMO the codes where the other two options are collapsed are slightly less readable - it is less clear which options map to which.

Comment on lines 311 to 314
if (effectiveColumnNames == null) {
detectHeaders();
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like ensureHeadersDetected could be called here instead for DRY?

skipFirstRows();
Row firstRow = internalReadNextRow();
if (firstRow == null) {
effectiveColumnNames = new String[0];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this return null according to the comment on getDefinedColumnNames?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

effectiveColumnNames and definedColumnNames are two distinct concepts.

test/Table_Tests/data/.gitignore Show resolved Hide resolved
Copy link
Member

@jdunkerley jdunkerley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments but looks good to me.


private record Row(long lineNumber, String[] cells) {}

private final Deque<Row> pendingRows = new ArrayDeque<>(2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a Stack

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so? I need it to be FIFO, but stack is FILO, no? Or am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry meant a queue. We don't need the double ended version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense then.

Still - I don't know a Queue interface implementation in Java's stdlib that is not also a Deque (apart from the BlockingQueue but we certainly don't need the concurrency features here).

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but I can make the type Queue while still instantiating the ArrayDeque. That sounds good.

@radeusgd radeusgd force-pushed the wip/radeusgd/delimited-append-182309839 branch from 9abbb56 to 12b0e42 Compare July 11, 2022 08:53
@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Jul 11, 2022
@mergify mergify bot merged commit df10e4b into develop Jul 11, 2022
@mergify mergify bot deleted the wip/radeusgd/delimited-append-182309839 branch July 11, 2022 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants