-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: adds WSP functional test for local package #107
tests: adds WSP functional test for local package #107
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
hepcrawl/spiders/wsp_spider.py
Outdated
@@ -89,16 +89,24 @@ def __init__(self, package_path=None, ftp_folder="WSP", ftp_host=None, ftp_netrc | |||
def start_requests(self): | |||
"""List selected folder on remote FTP and yield new zip files.""" | |||
if self.package_path: | |||
yield Request(self.package_path, callback=self.handle_package_file) | |||
dummy, new_files = local_list_files( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In python, for non-used variables like this dummy
it's 'convention' to use just _
.
hepcrawl/spiders/wsp_spider.py
Outdated
self.target_folder | ||
) | ||
|
||
for _file in new_files: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of _file
and new_files
, it's better to use file_path
and new_files_paths
so you know it's a string representing the file path, and not a file object (that is usually clear when you write the code yourself, but for someone else it might become very confusing when debugging).
hepcrawl/utils.py
Outdated
|
||
|
||
def local_list_files(local_folder, target_folder): | ||
"""List files from given package folder to target folder.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this function is supposed to do by the comment, can you rephrase? What does it mean that the files 'are listed from a dir to another dir'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it was not clear before either, but better not add more confusion ;)
hepcrawl/utils.py
Outdated
return list_files(local_folder, target_folder, files) | ||
|
||
|
||
def list_files(remote_folder, target_folder, files): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should be called list_missing_files
instead?
The files
parameter would be better called file_names
.
It feels to me that this is doing too many things at the same time... can you think of a way to split getting the list of missing files, with adding the paths to the files? (so there's no need to return both all_files
and missing_files
, making it easier to understand, test and use)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the all_files
parameter does not seem to be used anywhere... we can remove it, then there's no need for a dummy
variable anymore.
tests/functional/WSP/test_wsp.py
Outdated
cleaner(package_location + 'IDAQPv20i01-03160015-1510863') | ||
|
||
|
||
def cleaner(path='/tmp/WSP/'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bad name for a function, cleaner gives the impression to be an object, that you call on (it's an entity that has meaning by itself) while this is actually a function, maybe rename to something like clean_dir
?
Can you also try to separate them in individual commits? (it's a bit painful to do now, so next time try to create a lot of commits instead, that can be squashed easily later ;) ) |
btw. ran locally the functional tests with this pr, and got error (checking if it's my issue):
|
81edfed
to
8c9fab2
Compare
hepcrawl/utils.py
Outdated
|
||
def list_missing_files(remote_folder, target_folder, file_names): | ||
missing_files = [] | ||
for filename in file_names: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the same style for filename
and file_names
tests/functional/WSP/test_wsp.py
Outdated
netrc_location = os.path.join(os.path.dirname( | ||
os.path.realpath(__file__)), | ||
'fixtures/ftp_server/.netrc' | ||
os.path.join('fixtures', 'ftp_server', '.netrc') | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indent here is messed up
tests/functional/WSP/test_wsp.py
Outdated
} | ||
|
||
clean_dir() | ||
clean_dir(package_location + 'IDAQPv20i01-03160015-1510863') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this missing some os.path.join
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for _, dirs, files in next(os.walk(package_location)):
for dir_name in dirs:
clean_dir(dir_name)
for file_name in files:
os.unlink(file_name)
tests/functional/WSP/test_wsp.py
Outdated
def set_up_local_environment(): | ||
package_location = os.path.join(os.path.dirname( | ||
os.path.realpath(__file__)), | ||
'fixtures/ftp_server/WSP/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bad indent
tests/functional/WSP/test_wsp.py
Outdated
) | ||
|
||
assert [override_generated_fields(result) for result in results] == \ | ||
[override_generated_fields(expected) for expected in expected_results] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Save the lists in temp variables, then just compare them.
tests/functional/WSP/test_wsp.py
Outdated
spider='WSP', | ||
settings={}, | ||
**set_up_environment.get('CRAWLER_ARGUMENTS') | ||
**set_up_local_environment.get('CRAWLER_ARGUMENTS') | ||
) | ||
|
||
gottern_results = [override_generated_fields(result) for result in results] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/gottern/gotten/
fe57c0c
to
3824bc9
Compare
tests/functional/WSP/test_wsp.py
Outdated
for dir_name in dirs: | ||
clean_dir(os.path.join(package_location, dir_name)) | ||
for file_name in files: | ||
if '.zip' not in file_name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not file_name.endswith('.zip')
tests/functional/WSP/test_wsp.py
Outdated
|
||
|
||
def remove_generated_files(package_location): | ||
for _, dirs, files in os.walk(package_location): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of a for, you can just use the first result:
_, dirs, files = next(os.walk(...))
for dir_name in dirs:
...
hepcrawl/utils.py
Outdated
|
||
|
||
def local_list_files(local_folder, target_folder): | ||
"""List files from given local path to target folder.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment does not make much sense.
3824bc9
to
1303c08
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing and we merge right away 👍
hepcrawl/utils.py
Outdated
missing_files.append(source_file) | ||
all_files.append(source_file) | ||
return all_files, missing_files | ||
file_names = host.listdir(host.curdir + '/' + server_folder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.path.join
* Adds WSP functional test for local package path. * Refactored existing WSP functional test (add setup-teardown fixtures). * Refactored `utils.ftp_list_files` to `utils.list_files` for reusability. * Fixed WSP local package crawling mechanism. Closes inspirehep#106 Signed-off-by: Spiros Delviniotis <[email protected]>
1303c08
to
94af814
Compare
@david-caro Ready! 💃 |
utils.ftp_list_files
toutils.list_files
for re-usability.Closes #106
Signed-off-by: Spiros Delviniotis [email protected]