desy: introduce desy spider #155
This is still missing the part of the media loader that populates the fft_file_paths, no?
Btw, I want to pair program this with you :)
@david-caro the
@david-caro Sure! That would be great! 😄
tests/unit/test_arxiv_all.py
Outdated
from scrapy.http import TextResponse

def _get_processed_item(record, spider):
    item = pipeline.process_item(record, spider)
    return item
s/item/record
tests/unit/test_arxiv_all.py
Outdated
pipeline = InspireCeleryPushPipeline()
pipeline.open_spider(spider)
return [pipeline.process_item(record, spider) for record in records]

return [_get_processed_item(parsed_item, spider) for parsed_item in parsed_items]
s/_get_processed_item/_get_processed_record/
tests/unit/test_arxiv_all.py
Outdated
def many_results(spider):
    """Return results generator from the arxiv spider. Tricky fields, many
    records.
    """
    from scrapy.http import TextResponse

    def _get_processed_item(record, spider):
s/record/item/
return parsed_record

parsed_item = spider.parse_node(response, nodes)
assert parsed_item
assert parsed_item.record
Signed-off-by: David Caro <[email protected]>
Now it contains more than just crawler format to hep format converters. Signed-off-by: David Caro <[email protected]>
Signed-off-by: Spyridon Delviniotis <[email protected]>
Build fixed! 😄 LGTM
Signed-off-by: Spiros Delviniotis [email protected]
Description
Related Issue
Closes #133
Motivation and Context
Checklist:
RFC
and look for it).