Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retire SupportsMoveFromTempLocAfterFillingIt from DataUrlTarget #115

Merged
merged 1 commit into from
Oct 29, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion metrics/coverage_high_water_mark
Original file line number Diff line number Diff line change
@@ -1 +1 @@
93.5700
93.5600
2 changes: 1 addition & 1 deletion metrics/mypy_high_water_mark
Original file line number Diff line number Diff line change
@@ -1 +1 @@
92.3400
92.3300
22 changes: 0 additions & 22 deletions records_mover/records/targets/data_url.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,6 @@
from ..results import MoveResult
from ..records_directory import RecordsDirectory
from .base import (SupportsMoveFromDataframes,
SupportsMoveFromTempLocAfterFillingIt,
SupportsMoveToRecordsDirectory,
SupportsMoveFromRecordsDirectory)
from ..processing_instructions import ProcessingInstructions
from ...url.base import BaseFileUrl
Expand All @@ -16,7 +14,6 @@


class DataUrlTarget(SupportsMoveFromDataframes,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Targets in Records Mover represent the second part in a move (e.g., in a Redshift to S3 copy, S3 is the target and Redshift is the source).

DataUrlTarget represents any kind of filesystem-like thing - currently it works with file://, gs:// and s3:// URLs, but will probably get sftp:// support at some point as well.

SupportsMoveFromTempLocAfterFillingIt,
SupportsMoveFromRecordsDirectory):
def __init__(self,
output_loc: BaseFileUrl,
Expand Down Expand Up @@ -63,25 +60,6 @@ def move_from_records_directory(self,
records_format.generate_filename('data'): self.output_loc.url
})

def move_from_temp_loc_after_filling_it(self,
records_source:
SupportsMoveToRecordsDirectory,
processing_instructions:
ProcessingInstructions) -> MoveResult:
pis = processing_instructions
records_format = records_source.compatible_format(self)
if records_format is None:
raise NotImplementedError("No compatible records format between "
f"{records_source} and {self}")
with self.output_loc.temporary_directory() as temp_loc:
directory = RecordsDirectory(records_loc=temp_loc)
records_source.\
move_to_records_directory(directory,
records_format=records_format,
processing_instructions=pis)
return self.move_from_records_directory(directory,
processing_instructions)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function said "Move something to this filesystem by first creating a temporary directory ON THIS SAME FILESYSTEM and then copying from there". Kind of silly, since you can also just copy directly to the intended destination (see the

The purpose of move_from_temp_loc_after_filling_it() is for systems like Redshift where the fastest way to copy things in is from an S3 bucket. In those cases, even if the source isn't an S3 bucket, we want to copy to a temporary S3 bucket before loading.

I'm guessing this function in this file was implemented at some point of time in the past in order to work around some wart or another in the move() algorithm. The move() algorithm uses these Python interfaces to figure out a way to get data from the source to the target in a multiple dispatch way.

There's been a lot of incremental development and refactoring among all of these things, and evolutionary dead-ends are not unknown :)

def can_move_from_this_format(self,
source_records_format: BaseRecordsFormat) -> bool:
if self.records_format is None:
Expand Down
35 changes: 0 additions & 35 deletions tests/unit/records/targets/test_data_url.py
Original file line number Diff line number Diff line change
Expand Up @@ -113,38 +113,3 @@ def test_move_from_records_directory(self, mock_MoveResult):
})
mock_records_format.generate_filename.assert_called_with('data')
self.assertEqual(out, mock_MoveResult.return_value)

@patch('records_mover.records.targets.data_url.isinstance')
@patch('records_mover.records.targets.data_url.RecordsDirectory')
@patch('records_mover.records.targets.data_url.DelimitedRecordsFormat')
def test_move_from_temp_loc_after_filling_it(self,
mock_DelimitedRecordsFormat,
mock_RecordsDirectory,
mock_isinstance):
mock_output_url = 'whatever://foo/foo.csv'
mock_output_loc = MagicMock(name='output_loc', spec=BaseFileUrl)
mock_output_loc.url = mock_output_url
mock_default_records_format = mock_DelimitedRecordsFormat.return_value
mock_records_format = mock_default_records_format.alter_hints.return_value
data_url_target = DataUrlTarget(output_loc=mock_output_loc,
records_format=None)

mock_records_source = Mock(name='records_source')
mock_records_format = mock_records_source.compatible_format.return_value
mock_processing_instructions = Mock(name='processing_instructions')
mock_pis = mock_processing_instructions
mock_temp_loc = mock_output_loc.temporary_directory.return_value.__enter__.return_value
mock_directory = mock_RecordsDirectory.return_value
mock_directory.load_format.return_value = mock_records_format

out = data_url_target.move_from_temp_loc_after_filling_it(mock_records_source,
mock_processing_instructions)
mock_RecordsDirectory.assert_called_with(records_loc=mock_temp_loc)
mock_records_source.move_to_records_directory.\
assert_called_with(mock_directory,
records_format=mock_records_format,
processing_instructions=mock_pis)

mock_filename = mock_records_format.generate_filename.return_value
self.assertEqual(out.move_count, None)
self.assertEqual(out.output_urls, {mock_filename: mock_output_loc.url})