Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retire SupportsMoveFromTempLocAfterFillingIt from DataUrlTarget #115

Merged
merged 1 commit into from
Oct 29, 2020

Conversation

vinceatbluelabs
Copy link
Contributor

@vinceatbluelabs vinceatbluelabs commented Oct 27, 2020

It's not needed anymore - the integration tests still pass, and it doesn't make any sense on its face.

I ran across the need to change this interface (see #114) and noticed that it was being used in a nonsensical way here.

It's not needed anymore, presumably due to changes in the move()
algorithm.
processing_instructions=pis)
return self.move_from_records_directory(directory,
processing_instructions)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function said "Move something to this filesystem by first creating a temporary directory ON THIS SAME FILESYSTEM and then copying from there". Kind of silly, since you can also just copy directly to the intended destination (see the

The purpose of move_from_temp_loc_after_filling_it() is for systems like Redshift where the fastest way to copy things in is from an S3 bucket. In those cases, even if the source isn't an S3 bucket, we want to copy to a temporary S3 bucket before loading.

I'm guessing this function in this file was implemented at some point of time in the past in order to work around some wart or another in the move() algorithm. The move() algorithm uses these Python interfaces to figure out a way to get data from the source to the target in a multiple dispatch way.

There's been a lot of incremental development and refactoring among all of these things, and evolutionary dead-ends are not unknown :)

@@ -16,7 +14,6 @@


class DataUrlTarget(SupportsMoveFromDataframes,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Targets in Records Mover represent the second part in a move (e.g., in a Redshift to S3 copy, S3 is the target and Redshift is the source).

DataUrlTarget represents any kind of filesystem-like thing - currently it works with file://, gs:// and s3:// URLs, but will probably get sftp:// support at some point as well.

@vinceatbluelabs
Copy link
Contributor Author

Hi @crvena-sonja! Just pinging on this review. This is review is mainly for a second developer to have a little more familiarity with some Records Mover internals.

Copy link

@crvena-sonja crvena-sonja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

☠️

@vinceatbluelabs vinceatbluelabs merged commit fed4f9d into master Oct 29, 2020
@vinceatbluelabs vinceatbluelabs deleted the retire_protocol branch October 29, 2020 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants